Keywords:
Computer science
Abstract:
Social indicators are numerical measures that describe the well-being of individuals or communities. They span multiple dimensions, including food and water, safety, labor, infrastructure, and transportation. Taken as a time series, social indicator data can support several predictive tasks, including predicting social unrest, refugee flows, and even armed conflict. Thus far, such indicators have been generated periodically by organizations such as the United Nations and the World Bank, using tedious and expensive data collection methodologies. Given the availability of massive amounts of open-source data, including social media, satellite data, commodity prices, and traffic data, it should now be possible to (i) map data sources to indicators, (ii) generate social indicators in (near) real time across the globe, and (iii) develop models that capture the interactions among indicators for specific predictive tasks. It is critical to develop models that take these heterogeneous data sources as input and derive representations automatically, without relying heavily on heuristic feature engineering. The modeling task is challenging because (i) events such as social unrest and conflict are driven by multiple instigating factors and can rarely be attributed to a single factor, and (ii) the data vary widely in type (structured, unstructured, numerical, categorical), geospatial granularity, update rate, and so on. Furthermore, while efforts have been made to improve the accuracy of predictive models, explanations are rarely provided. In this dissertation, we present a novel framework that combines heterogeneous data sources to model social indicators covering multiple topics and data types, solves predictive tasks, and generates natural language descriptions that serve as explanations. We focus on two specific tasks: civil unrest prediction, and risk assessment and violence prediction. Extensive experiments and evaluation are pre