Keywords:
Information science
Artificial intelligence
Computer science
Abstract:
Spatial data have tremendous value and are necessary components of many important societal applications. In recent years, the world has witnessed a revolution driven by spatial technologies (e.g., Google Maps, Waze, Uber, Lyft, Grubhub, Lime, autonomous driving). According to a McKinsey Global Institute report, location data will generate about $600 billion in annual revenue by 2020, with applications in energy, health, retail, etc. The world's economy also relies heavily on location and time data from over 2 billion GPS receivers, and these data are essential to sectors such as banking, airlines, police, emergency services, and telecommunications. Meanwhile, new types of spatial data are emerging at unprecedented scales and varieties (e.g., 25 GB/hour per connected vehicle, 47.7 PB per year at NASA by 2022). While spatial data are critical, valuable, and collected at massive scales, they pose great challenges to traditional artificial intelligence (AI) techniques when applied to important societal problems. This thesis addresses three of these challenges. First, spatial data (e.g., crime or disease distribution, air quality) are often directly linked to our lived environments. As a result, decisions made on such data tend to have direct impacts on citizens' lives, and thus require statistical robustness to avoid errors that can carry high economic and social costs (e.g., a false alarm for a crime hotspot). Second, spatial data exhibit interdependency and variability, violating the common i.i.d. (independent and identically distributed) assumption in traditional statistics. This introduces new challenges to traditional optimization problems, where spatial interdependency between nearby locations is often neglected and understudied (e.g., the spatial contiguity required in land allocation). Finally, data and domain knowledge gaps are common in geospatial problems. For example, while Earth observation imagery is available in the tens of petabytes, there is very