This paper proposes an approach for extracting ‘signals of violence’ from a collection of news articles, for use in models that forecast forced displacement. The authors test the approach on news articles drawn from the Expanded Open Source dataset, which includes over 680,000 articles on Syria and Iraq from January 2012 to June 2017, together with monthly refugee population data from UNHCR. The approach involves the following steps:
- Automatically processing and analyzing news articles using topic modeling techniques to identify a set of distinct topics for each month, where each topic is characterized by the probability that certain keywords appear together.
- Manually labeling and categorizing the extracted topics for each month, using the following categories: violence/terrorism; economic issues; environmental issues; political issues; religious conflicts; refugee crisis; and relief.
- Estimating a ‘violence score’ for each month, equal to the number of ‘violence’ topics in that month divided by the total number of topics in that month.
- Building prediction models for forecasting the number of refugees from Syria and Iraq.
The authors demonstrate that violence scores, constructed from information extracted from news articles, can be effective in improving the performance of models predicting forced displacement.
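
To make the violence score concrete, the ratio described in the steps above can be sketched as follows. This is only an illustration of the definition as summarized here, not the authors' code: the function name `violence_scores`, the `(month, label)` input layout, and the example labels are all hypothetical, and the sketch assumes that a ‘violence’ topic is one labeled with the violence/terrorism category.

```python
from collections import Counter

# Assumption: 'violence' topics are those manually labeled with this category.
VIOLENCE_LABEL = "violence/terrorism"


def violence_scores(labeled_topics):
    """Return {month: share of that month's topics labeled as violence}.

    labeled_topics is an iterable of (month, label) pairs, one pair per
    topic extracted for that month (a hypothetical layout for this sketch).
    """
    total = Counter()    # number of topics per month
    violent = Counter()  # number of violence topics per month
    for month, label in labeled_topics:
        total[month] += 1
        if label == VIOLENCE_LABEL:
            violent[month] += 1
    # Every month in `total` has at least one topic, so no division by zero.
    return {month: violent[month] / total[month] for month in total}


# Made-up labels for two months, for illustration only.
example = [
    ("2012-01", "violence/terrorism"),
    ("2012-01", "political issues"),
    ("2012-01", "relief"),
    ("2012-01", "violence/terrorism"),
    ("2012-02", "economic issues"),
    ("2012-02", "violence/terrorism"),
    ("2012-02", "refugee crisis"),
    ("2012-02", "political issues"),
]
scores = violence_scores(example)
```

Under this definition the score always lies between 0 and 1, and it depends on the number of topics extracted per month as well as on how the manual labels are assigned.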