Keywords:
Computer science
Artificial intelligence
Abstract:
Enterprises increasingly employ a wide array of tools and processes to make data-driven decisions. However, the enterprise-wide workflow suffers from significant inefficiencies because business workflows are expressed in Natural Language (NL), while the corresponding computational workflows must be manually translated into programs. This thesis presents an initial approach to bridging this gap by targeting the Data Science (DS) component of enterprise workflows. We propose using a conversational agent that allows data scientists to assemble data analytics pipelines. A crucial insight is that while precise interpretation of NL remains challenging, restricted versions of natural language are becoming practical as interfaces in complex decision-making domains. We also recognize that DS workflow components are often templatized. Combining these two insights, we develop an initial prototype system called Ava that uses an unambiguous version of NL known as Controlled Natural Language (CNL) to program DS workflows. The initial Ava system was limited in functionality and focused on training Machine Learning (ML) models through NL conversations. However, real-world data analysis goes far beyond training ML models and involves stages such as exploratory analysis, statistical analysis, and data cleaning. Some analyses may require slicing, dicing, querying, and visualizing the data along different dimensions. To cater to these broader analytics tasks, we propose modifications to Ava's CNL and formalize it via a pattern language. We also propose an end-to-end system architecture for an NL-based data analytics system, which is deployed commercially as DataChat. Finally, we propose ML pipelines, a flexible, scalable, and framework-agnostic approach to composing ML model training workflows using DataChat's CNL. We also present DataChat's AutoML framework, which is built using the ML