Compiling and curating UNHCR’s datasets for the Microdata Library
Background and context
While UNHCR routinely collects a wealth of data, both directly and through its partners, the organization as a whole has not been able to capitalize fully on its significant data collection investments. There are several reasons for this: data are at times lost or forgotten after the data collection and analysis activities are complete; data are not stored in a format that is conducive to further use; or metadata are not documented. In its Data Transformation Strategy[1], UNHCR has committed “…that by 2025, UNHCR is a trusted leader on data and information related to refugees and other affected populations, thereby enabling actions that protect, include and empower.” In the interest of promoting efficiency, transparency and the best use of public funds, open access to and dissemination of data are increasingly promoted and sometimes even mandated by those funding data collection efforts, and many National Statistical Offices (NSOs) now maintain open data portals. To date, UNHCR has shared mainly aggregated data openly and publicly, and access to microdata has largely been regulated by ad hoc data sharing agreements. This project is scaling up UNHCR’s commitment to open and responsible data sharing by discovering, cleaning, cataloguing and anonymizing microdata collected by UNHCR and its partners, and by storing them on two online platforms, one internal-facing and one external-facing, namely the Raw Internal Data Library (RIDL) and the Microdata Library (MDL) (https://microdata.unhcr.org/index.php/home). At the internal level, the project will therefore improve data quality, prevent data loss, prevent duplication of data collection efforts, and reduce the burden of searching for data and of responding to individual requests from other staff members. At the external level, it will contribute to the data value chain by making access easier and thereby promoting further analysis by academics and research centers, the private sector, development actors and other humanitarian organizations.
This analysis can be used to inform programming, policy, and advocacy efforts, generating a positive impact on the lives of people affected by forced displacement.
Activity description
The RIDL and MDL platforms are designed to provide a user-friendly format and a secure location for the storage and re-use of diverse datasets collected by UNHCR and others interested in or working with forced displacement issues. In addition to being the repository for new data activities, these platforms will host a large backlog of datasets which UNHCR or its partners have previously collected. To this end, JDC’s financial support facilitates the work of a Data Curation Team located within the Statistics and Demographic Section of UNHCR’s Global Data Service. The Team’s main task is to ensure that the RIDL and MDL platforms are populated with data that are appropriately cleaned, anonymized in a secure manner consistent with UNHCR’s Data Protection Policy, and documented according to international standards. The Data Curation Team engages operations, partners and technical units to discover, identify, analyze, and prioritize data for storage, curation and potential wider dissemination. The activity will include scoping missions by curators to regions, both to better understand the current data landscape and existing datasets and to train data producers and users on the use of the platforms. Remote support on the platforms is also being provided to UNHCR field staff, regional offices and Headquarters to ensure safe management of UNHCR microdata. Regional training workshops will be organized for targeted technical and Information Management staff on the use of the platforms and other data-related activities. This will improve the buy-in of relevant actors and ensure that data are shared to a higher quality standard. Through these activities, the Curation Team assists in institutionalizing the use of the platforms as corporate applications.
The Curation Team will also assist in obtaining the necessary approvals to publish suitable datasets on the external MDL platform and will manage the release of data from the MDL in response to requests, in line with the appropriate protocols. The Curation Team will also develop and issue technical guidance and training materials for data producers, including best practices for cleaning, anonymization, and creation of metadata (the Data Curation Handbook). Standard language to be included in Project Partnership Agreements to facilitate data sharing from partners (including host governments, NSOs, non-governmental organizations and other organizations) will also be developed. The project will also support the continued technical/IT development of the two platforms.
Overall objectives
The project will continue to discover, clean, document and anonymize the backlog of data present in UNHCR’s various operations and technical sections for publication on the RIDL and MDL. It will also assist in institutionalizing the use of the RIDL as a corporate application and will standardize the curation procedures and methods described in the Data Curation Handbook, through capacity development and training of UNHCR’s Data, Identity Management and Analysis staff. It will further familiarize staff members with the roles and procedures on Data Dissemination. Making data available on the external Microdata Library will facilitate data sharing with partners and reduce the need to engage in ad hoc data sharing agreements. The dissemination of microdata will enable analysis and research that will inform and improve future programmes and policies that aim to improve the lives and well-being of persons affected by forced displacement. Moreover, it will reduce duplication in data collection exercises and the burden on forcibly displaced persons of responding to queries. The project will also engage in communication and promotion activities to encourage the use of the shared data. Overall, it will improve the quality and reliability of data produced by UNHCR, enhance the credibility of UNHCR as an authoritative source of information on forcibly displaced persons (FDPs), and contribute to maximizing the impact of UNHCR’s data collection.
Engagement with partners
The UNHCR Microdata Library application was set up and will continue to be maintained in close coordination with the World Bank.
Contact
For further details on this activity, please contact:
- contact@jointdatacenter.org
- Felix Schmieding, JDC Focal Point, schmiedf@unhcr.org