Creating public interest datasets for artificial intelligence for development
Artificial intelligence could play a transformational role in global development. Studies suggest it could support better medical diagnostics, increase agricultural productivity, and improve public service delivery, among other benefits. Progress in artificial intelligence in these domains has been driven by the creation of high-quality labelled datasets used to train machine learning models. Many of these datasets, however, fail to represent wide swaths of the population and overrepresent North American and European geographies, species, cultures, and languages. As a result, many valuable machine learning tools do not work well in the developing world, and some exclude poor and vulnerable communities.
This project will address this problem by funding the creation, expansion, and maintenance of equitably labelled datasets relevant to the Global South. It will also deepen our understanding of how best to fund and support the development and upkeep of such datasets. Initial datasets will be in the domains of healthcare, agriculture, and local languages. All datasets will be openly available.
This project is IDRC’s contribution to the Universal Labelling Project, a pooled fund structure that will distribute small grants to support the creation of public interest datasets. The project is a collaboration between the Rockefeller Foundation, Google.org, and IDRC. Meridian Institute, a charitable organization based in the United States, will act as the financial and administrative secretariat for the pooled fund.