Big Data for Sampling Design: The Venezuelan Migration Crisis in Ecuador

Juan Eduardo Munoz, Jose Victor Gallegos Munoz and Sergio Daniel Olivieri


Even though Ecuador has a reliable and up-to-date census-based sampling frame, the lack of information on how many displaced Venezuelans live in Ecuador, and where, posed challenges for the design and implementation of the Human Mobility and Host Community Survey (Encuesta de personas en Movilidad y Comunidades de Acogida, EPEC). The survey aimed to gather comparable data on Venezuelan migrants and their host communities. This paper presents a methodology that exploits ‘Big Data’ to generate representative samples of Venezuelan and host households in Ecuador. The analysis is based on Call Detail Records and External Detail Records covering June 2018 to March 2019, provided by Telefónica de Ecuador.

Telefónica de Ecuador analyzed its database to determine how many of its active mobile phones in each primary sampling unit (PSU) were likely to belong to displaced Venezuelans, based on the name of the account holder or the volume of calls and messages to and from Venezuela. To estimate the total number of Venezuelans in each PSU, these counts were adjusted using Telefónica’s market share (to estimate the total number of Venezuelan phones across all companies in each PSU) and the fraction of the population using mobile phones. The analysis revealed:

  • An estimated 470,095 Venezuelans were living in Ecuador in 2019. They were concentrated in the main cities along the corridor running from the Colombian border in the north to the Peruvian border in the south.
  • Their distribution varies widely across cantons, from those hosting almost 90,000 displaced Venezuelans (Guayaquil and Quito) to others with none. More than half of all Venezuelans live and work in just four cantons (Guayaquil, Quito, Manta, and Santo Domingo).
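The adjustment described above amounts to dividing Telefónica’s flagged lines by its market share and then by the mobile-use rate: estimated Venezuelans = flagged Telefónica phones ÷ (market share × mobile-use rate). The sketch below is a minimal illustration under stated assumptions: it assumes each mobile user holds exactly one line and that market share and mobile-use rate are known per PSU. The class and field names are hypothetical, the numbers are made up, and the rule Telefónica used to flag Venezuelan lines is not reproduced.

```python
from dataclasses import dataclass


@dataclass
class PSUPhoneCounts:
    """Hypothetical per-PSU inputs for the scale-up (names are illustrative)."""
    psu_id: str
    telefonica_venezuelan_phones: int  # Telefónica lines flagged as likely Venezuelan
    telefonica_market_share: float     # Telefónica's share of mobile lines, in (0, 1]
    mobile_use_rate: float             # share of the population using a mobile phone, in (0, 1]


def estimate_venezuelans(psu: PSUPhoneCounts) -> float:
    """Scale Telefónica's flagged lines up to an estimated head count.

    Assumes one line per mobile user: dividing by the market share gives
    Venezuelan phones across all operators, and dividing by the mobile-use
    rate adds Venezuelans who do not use a mobile phone.
    """
    if not 0 < psu.telefonica_market_share <= 1:
        raise ValueError("market share must be in (0, 1]")
    if not 0 < psu.mobile_use_rate <= 1:
        raise ValueError("mobile-use rate must be in (0, 1]")
    all_operator_phones = psu.telefonica_venezuelan_phones / psu.telefonica_market_share
    return all_operator_phones / psu.mobile_use_rate


# Toy example with made-up numbers (not figures from the study)
psu = PSUPhoneCounts("PSU-001", telefonica_venezuelan_phones=60,
                     telefonica_market_share=0.3, mobile_use_rate=0.8)
print(round(estimate_venezuelans(psu), 1))  # 250.0
```

With 60 flagged lines, a 30% market share, and 80% mobile use, the sketch implies about 200 Venezuelan phones across all operators and about 250 Venezuelans in the PSU. The one-line-per-user assumption matters: if many users hold several lines, this division would overstate the head count.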

In the first sampling stage, 200 PSUs were stratified into three categories according to Venezuelan migrant density, defined as the ratio of the number of Venezuelan cellphones in the PSU (per Telefónica’s estimates) to the total population of the PSU (per the 2010 population census). Within each stratum, PSUs were selected with probability proportional to the number of households reported by the 2010 Census. In the second sampling stage, all households in each selected PSU were listed and stratified into three categories by nationality and demographic composition. Within each stratum, households were selected by systematic equal-probability sampling.
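The two-stage design above can be sketched as follows. This is a hedged illustration, not the EPEC implementation: the density cutoffs (1% and 5%) are hypothetical placeholders, systematic PPS is assumed as the way of drawing PSUs with probability proportional to households (the text does not name the exact PPS method), and the second stage is shown for a single household stratum in a hypothetical listing of 120 households. All function names and the toy frame are illustrative.

```python
import random
from typing import Sequence


def density_stratum(venezuelan_phones: float, population_2010: int,
                    cutoffs: tuple[float, float] = (0.01, 0.05)) -> int:
    """Stratum 0 (low), 1 (medium) or 2 (high) Venezuelan migrant density.

    Density = Venezuelan phones / 2010 census population. The cutoffs are
    hypothetical placeholders, not the thresholds used for EPEC.
    """
    density = venezuelan_phones / population_2010
    low, high = cutoffs
    if density < low:
        return 0
    return 1 if density < high else 2


def systematic_pps(sizes: Sequence[int], n: int, rng: random.Random) -> list[int]:
    """Systematic PPS: pick n indices with probability proportional to size.

    A unit larger than the sampling interval can be picked more than once.
    """
    step = sum(sizes) / n
    start = rng.random() * step  # random start in [0, step)
    points = [start + k * step for k in range(n)]
    chosen: list[int] = []
    cum, i = 0.0, 0
    for idx, size in enumerate(sizes):
        cum += size
        while i < n and points[i] < cum:  # every point falling in this unit's range
            chosen.append(idx)
            i += 1
    chosen.extend([len(sizes) - 1] * (n - i))  # float guard: leftover points go to the last unit
    return chosen


def systematic_equal(n_units: int, n: int, rng: random.Random) -> list[int]:
    """Systematic equal-probability sample of n of n_units listed units (n <= n_units)."""
    step = n_units / n
    start = rng.random() * step
    # min() guards against float rounding pushing the last index to n_units
    return [min(int(start + k * step), n_units - 1) for k in range(n)]


# Toy frame (made-up numbers): (Venezuelan phones, 2010 population, 2010 households)
frame = [(5, 1200, 300), (80, 1000, 260), (20, 900, 240),
         (150, 1500, 400), (2, 800, 210), (60, 1100, 280)]
rng = random.Random(2019)

# Stage 1: stratify PSUs by density, then draw one PSU per stratum by PPS on households
strata: dict[int, list[int]] = {}
for idx, (phones, pop, _) in enumerate(frame):
    strata.setdefault(density_stratum(phones, pop), []).append(idx)
selected_psus: list[int] = []
for stratum, members in sorted(strata.items()):
    picks = systematic_pps([frame[m][2] for m in members], 1, rng)
    selected_psus.extend(members[p] for p in picks)

# Stage 2: within one household stratum of a listed PSU, draw 12 of 120 households
household_sample = systematic_equal(120, 12, rng)
print(sorted(strata.items()), selected_psus, household_sample)
```

In this toy frame the PSUs fall into strata `[(0, [0, 4]), (1, [2]), (2, [1, 3, 5])]`, so the single medium-density PSU is always selected, while the other strata favour PSUs with more 2010 households. Systematic selection is used in both stages because it is simple to apply to an ordered list; in a real design the per-stratum sample sizes would come from an allocation rule that the text does not specify.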

This methodology can be useful for designing sampling frames in countries with limited information (e.g. no recent census or migration registry) and scarce resources, to rapidly gather socio-economic data on migrants and host communities for policy design. Its applicability may be limited where telecommunication companies lack the capacity to provide the requisite data, or where the company’s market share and coverage are low. Because migrant and refugee populations are often highly mobile, the data and sampling frame also need to be updated regularly.