Forced displacement document catalogue

A tool that allows users to extract and analyze the information contained in a comprehensive body of documents published by the Multilateral Development Banks, giving project managers and researchers easy access to that information and helping them identify gaps in knowledge.

13 Jul, 2023

Overall objectives

The objective of the activity is to contribute to monitoring the scope and coverage of the projects and programs related to forced displacement of the Multilateral Development Banks (MDBs), namely the World Bank, the African Development Bank, the Asian Development Bank, the European Bank for Reconstruction and Development, and the Inter-American Development Bank; to identify gaps in their research and analytical work; and to provide tools that give project managers and researchers easy access to relevant information in this vast knowledge repository. The project will build a largely automated system that will extract and organize information contained in a large volume of publicly available documents and provide a detailed description of how the MDBs address forced displacement and related issues over time and across countries.

By analyzing and monitoring operational work and research related to forced displacement, this activity will provide useful information to project managers and researchers to identify knowledge and data gaps, foster new research, improve the quality of MDBs’ operational work, and provide a monitoring indicator of interventions by the MDBs in forced displacement and related fragility issues. The models will indicate when and where research and operations have been conducted that covered issues related to forced displacement, and will provide useful information on the associated topics, but will not provide an assessment of the “quality” (academic, or in terms of impact) of the publications.

Activity description

The MDBs make much of their respective research output and project documents publicly available. An analysis and regular monitoring of the scope and coverage of these documents can reveal the extent to which issues related to forced displacement (and others) have been covered in their analytical and operational work, over time and by country. Considering the large volume of information available, such an analysis/assessment can only be conducted by applying advanced machine learning (natural language processing) techniques.

This activity will build a comprehensive corpus of project and analytical documents published by the MDBs and apply machine learning algorithms to extract, structure, analyze, and monitor the information they contain. The necessary information will be gathered by scraping relevant and publicly available analytical and operational documents from the MDBs’ respective websites. Natural language processing models (word embeddings and Latent Dirichlet Allocation, or LDA) will be applied to this corpus. The LDA model will extract the topic composition of each document. Combining this information with the metadata available for each document (date, geographic coverage, type of document, etc.) will create a database that will allow in-depth and disaggregated analysis and monitoring of the relative importance of each topic in the MDBs’ research output and project documents.
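
As a rough illustration of the last step, the sketch below combines per-document topic shares with document metadata and averages one topic's share by year and by country. It does not run LDA itself: the topic shares are hard-coded, made-up values standing in for model output, and the record layout, field names, topic labels, and the mean_topic_share helper are all assumptions made for this example, not the project's actual schema or code.

```python
from collections import defaultdict

# Hypothetical records: document metadata plus topic shares of the kind an
# LDA model might produce. All values are illustrative, not real data.
DOCUMENTS = [
    {"id": "doc-1", "year": 2015, "country": "A",
     "topics": {"forced_displacement": 0.10, "health": 0.90}},
    {"id": "doc-2", "year": 2015, "country": "B",
     "topics": {"forced_displacement": 0.30, "health": 0.70}},
    {"id": "doc-3", "year": 2016, "country": "A",
     "topics": {"forced_displacement": 0.50, "health": 0.50}},
]


def mean_topic_share(documents, topic, group_by):
    """Average the share of `topic` across documents, grouped by a metadata field."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for doc in documents:
        key = doc[group_by]
        # A topic missing from a document counts as a share of zero.
        totals[key] += doc["topics"].get(topic, 0.0)
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Relative importance of the topic over time and across countries.
by_year = mean_topic_share(DOCUMENTS, "forced_displacement", "year")
by_country = mean_topic_share(DOCUMENTS, "forced_displacement", "country")
```

Grouping on a different metadata field, such as a document type, would only require adding that field to each record and passing its name as group_by.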

Engagement with partners

The project will be implemented by the Data Analytics and Tools unit (DECAT) at the World Bank. The team will engage with the Innovation Team and Division of Resilience and Solutions (DRS) at UNHCR during the design and implementation of activities. Upon completion of the project, the team anticipates engaging with other development and humanitarian partners interested in the activity.

Background and Context

In 2016, the MDBs issued a joint statement expressing their commitment to working together, within their respective institutional mandates, to respond to the global forced displacement crisis[1]. Improving the availability, quality, and operational relevance of data collection and analytical work related to forced displacement is thus a common objective and interest of the JDC and the MDBs. A better understanding of the content and evolution of the research and operational work of the MDBs can provide useful insights into how and where knowledge on forced displacement is generated and used to inform development projects and programs.


For further details on this activity, please contact:

Harriet Mugera, JDC Focal Point, [email protected]

More activities

Integrating World Bank microdata into the UNHCR microdata library

This project aims to provide technical support from the World Bank so that content on forced displacement from the Bank’s microdata library can be shared on UNHCR’s microdata library.

Statistical methods & tools

In fragile contexts, where data is most needed, it is usually outdated or of poor quality due to the challenges in collecting data. This project aims to improve the quality of survey data by improving sampling frames, questionnaire design, and fieldwork.

Refugee Data Finder

A platform of socioeconomic, wellbeing and living standards statistics on forcibly displaced populations, incorporating survey-based indicators into the Refugee Data Finder and aggregating indicators to produce data that is comparable over time and across countries.