Automated Text Analytics for the Joint Data Center on Forced Displacement

Background and Context

The Joint Data Center on Forced Displacement (JDC) was established to enhance the ability of stakeholders to make timely and evidence-informed decisions that can improve the wellbeing of forcibly displaced populations. Among these stakeholders are the multilateral development banks (MDBs), namely the World Bank, the African Development Bank, the Asian Development Bank, the European Bank for Reconstruction and Development, and the Inter-American Development Bank, which play a critical role in financing and implementing their shareholders’ policies and programs.

In 2016, the MDBs issued a joint statement expressing their commitment to working together, within their respective institutional mandates, to respond to the global forced displacement crisis[1]. Improving the availability, quality, and operational relevance of data collection and analytical work related to forced displacement is thus a common objective and interest of the JDC and the MDBs. A better understanding of the content and evolution of the MDBs’ research and operational work can provide useful insights into how and where knowledge on forced displacement is generated and used to inform development projects and programs.


Activity description

The MDBs make much of their research output and project documentation publicly available. Analyzing and regularly monitoring the scope and coverage of these documents can reveal the extent to which forced displacement and other issues have been covered in their analytical and operational work, over time and by country. Given the large volume of information available, such an analysis is only practical with machine learning techniques, specifically natural language processing (NLP).

This activity will build a comprehensive corpus of project and analytical documents published by the MDBs and apply machine learning algorithms to extract, structure, analyze, and monitor the information they contain. The documents will be gathered by scraping relevant, publicly available analytical and operational documents from the MDBs’ respective websites. Natural language processing models (word embeddings and Latent Dirichlet Allocation, or LDA) will be applied to this corpus. The LDA model will extract the topic composition of each document, that is, the share of the document attributed to each topic. Combining this information with the metadata available for each document (date, geographic coverage, type of document, etc.) will create a database that allows in-depth, disaggregated analysis and monitoring of the relative importance of each topic in the MDBs’ research output and project documents.
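As a rough illustration of the last step, the sketch below shows how per-document topic shares could be combined with metadata to track the relative importance of a topic over time and by country. It is a minimal sketch and not the project's actual pipeline: the record layout, the topic labels, the function name mean_topic_share, and all values are hypothetical, and the topic shares are assumed to have already been produced by an LDA model.

```python
from collections import defaultdict

# Hypothetical records: document metadata plus LDA topic shares.
# Field names, topic labels, and values are made up for illustration.
documents = [
    {"year": 2016, "country": "Country A",
     "topics": {"forced_displacement": 0.50, "education": 0.30, "other": 0.20}},
    {"year": 2016, "country": "Country B",
     "topics": {"forced_displacement": 0.10, "education": 0.60, "other": 0.30}},
    {"year": 2017, "country": "Country A",
     "topics": {"forced_displacement": 0.40, "education": 0.20, "other": 0.40}},
]


def mean_topic_share(docs, topic, keys=("year",)):
    """Average share of `topic` across documents, grouped by metadata keys."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for doc in docs:
        group = tuple(doc[k] for k in keys)
        # A topic absent from a document contributes a share of zero.
        totals[group] += doc["topics"].get(topic, 0.0)
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Relative importance of the topic by year, then by year and country.
by_year = mean_topic_share(documents, "forced_displacement")
by_year_country = mean_topic_share(
    documents, "forced_displacement", keys=("year", "country")
)

for group, share in sorted(by_year.items()):
    print(group, round(share, 3))
```

Grouping on a tuple of metadata keys keeps the same function usable for any disaggregation (by year, by country, by document type) without changing the aggregation logic.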

Overall objectives

The objective of the activity is to contribute to monitoring the scope and coverage of the MDBs’ projects and programs related to forced displacement, identify gaps in their research and analytical work, and give project managers and researchers tools to easily access relevant information in this vast knowledge repository. The project will build a largely automated system that extracts and organizes the information contained in a large volume of publicly available documents and provides a detailed description of how the MDBs address forced displacement and related issues over time and across countries.

By analyzing and monitoring operational work and research related to forced displacement, this activity will help project managers and researchers identify knowledge and data gaps, foster new research, and improve the quality of the MDBs’ operational work. It will also provide a monitoring indicator of the MDBs’ interventions in forced displacement and related fragility issues. The models will indicate when and where research and operations have covered issues related to forced displacement and will provide useful information on the associated topics, but they will not assess the “quality” of the publications, whether academic or in terms of impact.

Engagement with partners

The project will be implemented by the Data Analytics and Tools unit (DECAT) at the World Bank. The team will engage with the Innovation Team and Division of Resilience and Solutions (DRS) at UNHCR during the design and implementation of activities. Upon completion of the project, the team anticipates engaging with other development and humanitarian partners interested in the activity.

Contact

For further details on this activity, please contact:


[1] See for example

