Advanced English - Literature Subscription - China Three Gorges University Library

Understanding Unfairness and Its Mitigation in Open-Source Machine Learning Models

Biswas, Sumon

Iowa State University

Keywords: Computer science, Artificial intelligence

Abstract: Machine learning models are increasingly being used in important decision-making software for tasks such as approving bank loans, recommending criminal sentences, and hiring employees. It is important to ensure the fairness of these models so that no discrimination is made based on protected attributes (e.g., race, sex, age) during decision making. Many such algorithmic fairness issues in machine learning software have been reported in the recent past. Research has been conducted to measure unfairness and mitigate it to a certain extent. What unfairness issues exist in open-source models, and how do the mitigation techniques perform? In this thesis, we have focused on the empirical evaluation of fairness and mitigations on real-world machine learning models. We have created a benchmark of 40 top-rated models from Kaggle used for 5 different tasks and then evaluated their fairness using a comprehensive set of fairness metrics. Then, we have applied 7 mitigation techniques to these models and analyzed the fairness, mitigation results, and impacts on performance. We have found that some model optimization techniques induce unfairness in the models. On the other hand, although there are some fairness control mechanisms in machine learning libraries, they are not documented. The mitigation algorithms also exhibit common patterns: mitigation in the post-processing stage is often costly (in terms of performance), and mitigation in the pre-processing stage is preferred in most cases. We have also presented different trade-off choices of fairness mitigation decisions. Our study suggests future research directions to reduce the gap between theoretical fairness-aware algorithms and the software engineering methods to leverage them in practice.
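The abstract does not name the individual fairness metrics it uses. As an illustration only, the sketch below computes two widely used group-fairness measures, statistical parity difference and disparate impact, for binary predictions and a binary protected attribute. The function names, the 0/1 encoding of the groups, and the toy data are assumptions made for this example and are not taken from the thesis.

```python
from typing import Sequence


def positive_rate(predictions: Sequence[int], group: Sequence[int], value: int) -> float:
    """Fraction of favorable (1) predictions among rows whose protected attribute equals `value`."""
    selected = [p for p, g in zip(predictions, group) if g == value]
    if not selected:
        raise ValueError(f"no rows with protected attribute value {value}")
    return sum(selected) / len(selected)


def statistical_parity_difference(predictions: Sequence[int], group: Sequence[int],
                                  unprivileged: int = 0, privileged: int = 1) -> float:
    """Favorable rate of the unprivileged group minus that of the privileged group (0 means parity)."""
    return (positive_rate(predictions, group, unprivileged)
            - positive_rate(predictions, group, privileged))


def disparate_impact(predictions: Sequence[int], group: Sequence[int],
                     unprivileged: int = 0, privileged: int = 1) -> float:
    """Ratio of the unprivileged favorable rate to the privileged favorable rate (1 means parity)."""
    privileged_rate = positive_rate(predictions, group, privileged)
    if privileged_rate == 0:
        raise ValueError("privileged group has no favorable predictions; ratio is undefined")
    return positive_rate(predictions, group, unprivileged) / privileged_rate


# Hypothetical toy data: 1 = favorable decision; group 1 = privileged, 0 = unprivileged.
example_predictions = [1, 0, 1, 1, 0, 1, 0, 0]
example_group = [1, 1, 1, 1, 0, 0, 0, 0]

print(statistical_parity_difference(example_predictions, example_group))
print(disparate_impact(example_predictions, example_group))
```

In this toy data the privileged group receives favorable decisions three times out of four and the unprivileged group once out of four, so both measures report a gap. A mitigation technique would aim to move the difference toward 0 and the ratio toward 1, usually at some cost in predictive performance.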

Measurement of Magnetic Susceptibility in Brain Cortical Tissue By Magnetic Resonance Imaging

Pazmino Jorge Campos

McGill University (Canada)

Unsupervised Domain Adaptation for Visual Recognition

Zhang, Youshan

Lehigh University

Keywords: Computer science, Artificial intelligence

Abstract: While huge volumes of unlabeled data are generated and made available in many domains, the demand for automated understanding of visual data is higher than ever before. Most existing machine learning models rely on massive amounts of labeled training data to achieve high performance. Unfortunately, such a requirement cannot be met in real-world applications: the number of labels is limited, and manually annotating data is expensive and time-consuming. Manual labeling thus becomes a burden and a bottleneck for a new unlabeled domain, and it is often necessary to transfer knowledge from an existing labeled domain to the new domain. However, model performance degrades because of the differences between domains (domain shift or dataset bias). To overcome the burden of annotation, Domain Adaptation (DA) aims to mitigate the domain shift problem when transferring knowledge from one domain to another similar but different domain. Unsupervised DA (UDA) deals with a labeled source domain and an unlabeled target domain. The principal objective of UDA is to reduce the domain discrepancy between the labeled source data and the unlabeled target data and to learn domain-invariant representations across the two domains during training. Despite several successes of existing DA models, the domain differences remain challenging to minimize. Target domain performance is still significantly poorer than performance on problems without domain shift, which means that the proposed models have not yet reached a satisfactory level in real-world applications. The objective of this dissertation is to address domain shift in visual recognition tasks via domain adaptation methods in computer vision. This work first reviews recent DA-related papers and introduces a taxonomy that groups published methods for unsupervised domain adaptation. It then presents several novel improvements for UDA based on both traditional and deep learning methods. 
This dissertation i

Detecting Surface Interactions via a Wearable Microphone to Improve Augmented Reality Text Entry

Habibi, Reza

Michigan Technological University

Konstitutivní Model Porušení Dřeva Trhlinami Při Tahově-Smykovém Namáhání

Šmídová Eliška

Czech Technical University

Computación inteligente aplicada al mantenimiento del sector de la generación eléctrica insular

Calvo Daniel González

Universidad de La Laguna (Canary Islands, Spain)

Flood Risk Analysis on Terrains

Lowe, Aaron

Duke University

Keywords: Computer science, Hydrologic sciences, Geography, Applied mathematics, Civil engineering

Abstract: An important problem in terrain analysis is modeling how water flows across a terrain and creates floods by filling up depressions. This thesis examines a number of flood-risk-related problems. One such problem is answering terrain-flood queries: given a terrain represented as a triangulated xy-monotone surface, a rain distribution, and a volume of rain, determine which portions of the terrain are flooded. The first part of this thesis develops efficient algorithms for terrain-flood queries under the single-flow-direction (SFD) and multiflow-direction (MFD) models, in which water at a point flows along a single downslope edge or multiple downslope edges, respectively. Algorithms are first given for the more specific SFD model, and it is then shown how to answer queries under the more general MFD model. Available terrain data is also often subject to uncertainty, which must be incorporated into the terrain analysis. For instance, the digital elevation models of terrains have to be refined to incorporate underground pipes, tunnels, and waterways under bridges, but there is often uncertainty about their existence. By representing the uncertainty in the terrain data explicitly, methods for flood risk analysis can be developed that properly incorporate terrain uncertainty when reporting which areas are at risk of flooding. The second part of the thesis shows how the flood-risk algorithms can be extended to handle "uncertain" terrains using a standard Monte Carlo method. Finally, the third part of the thesis develops efficient algorithms for flow-query-related problems, which determine how much water is flowing over a given vertex or edge as a function of time. We show how to compute the 1D flow rate and develop a model for computing 2D channels. A number of the algorithms are implemented, and their efficacy and efficiency are tested on real terrains of different types (urban, suburban, and mountainous).
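To make the single-flow-direction idea concrete, the sketch below assigns each vertex of a small terrain graph one downslope edge and follows those edges to the depression where the water collects. It is a minimal illustration, not the thesis's algorithm: the abstract does not say how the single downslope edge is chosen, so picking the lowest neighbor is an assumption, and the function names, vertex labels, and elevations are hypothetical.

```python
from typing import Dict, List, Optional


def sfd_receivers(elevation: Dict[str, float],
                  neighbors: Dict[str, List[str]]) -> Dict[str, Optional[str]]:
    """Pick one downslope neighbor per vertex (assumed rule: the lowest one).

    A vertex with no strictly lower neighbor is a local minimum and maps to None.
    """
    receivers: Dict[str, Optional[str]] = {}
    for vertex, height in elevation.items():
        lower = [u for u in neighbors.get(vertex, []) if elevation[u] < height]
        receivers[vertex] = min(lower, key=lambda u: elevation[u]) if lower else None
    return receivers


def sink_of(vertex: str, receivers: Dict[str, Optional[str]]) -> str:
    """Follow single-flow edges downhill until a local minimum is reached.

    The walk terminates because elevation strictly decreases along every edge.
    """
    while receivers[vertex] is not None:
        vertex = receivers[vertex]
    return vertex


# Hypothetical terrain: vertex -> elevation, plus an undirected adjacency list.
example_elevation = {"a": 5.0, "b": 3.0, "c": 4.0, "d": 1.0, "e": 2.0}
example_neighbors = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "e"],
    "d": ["b"],
    "e": ["c"],
}

example_receivers = sfd_receivers(example_elevation, example_neighbors)
for v in sorted(example_elevation):
    print(v, "drains to", sink_of(v, example_receivers))
```

Under the MFD model a vertex would instead split its water across all of its downslope edges, so `sfd_receivers` would return a list of receivers with weights rather than a single one. A Monte Carlo treatment of uncertain terrain could rerun such a computation on many sampled terrains and report how often each area floods.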

The State of Practice for Security Unit Testing: Towards Data Driven Strategies to Shift Security into Developer's Automated Testing Workflows

Gonzalez, Danielle Nicole

Rochester Institute of Technology

Keywords: Computer science, Artificial intelligence

Abstract: The pressing need to “shift security left” in the software development lifecycle has motivated efforts to adapt the iterative and continuous process models used in practice today. Security unit testing is praised by practitioners and recommended by expert groups, usually in the context of DevSecOps and achieving “continuous security”. In addition to vulnerability testing and standards adherence, this technique can help developers verify that security controls are implemented correctly, i.e., functional security testing. Further, the means by which security unit testing can be integrated into developer workflows is distinct from that of other standalone tools, as it is an adaptation of practices and infrastructure developers are already familiar with. Yet software engineering researchers have so far failed to include this technique in their empirical studies on secure development, and little is known about the state of practice for security unit testing. This dissertation is motivated by the disconnect between the promotion of security unit testing and the lack of empirical evidence on how it is and can be applied. The goal of this work was to address this disconnect by identifying actionable strategies to promote wider adoption and mitigate observed challenges. Three mixed-method empirical studies were conducted in which practitioner-authored unit test code, Q&A posts, and grey literature were analyzed through three lenses: Practices (what they do), Perspectives and Guidelines (what and how they think it should be done), and Pain Points (what challenges they face), to incorporate both technical and human factors of this phenomenon. Accordingly, this work contributes novel and important insights into how developers write functional unit tests for at least nine security controls, including a taxonomy of 53 authentication unit test cases derived from real code and a detailed analysis of seven unique pain points that developers seek help with from peers on Q&A sites. 
Recommendations g

From Data to Insights through Conversation

Leo John, Rogers Jeffrey

The University of Wisconsin - Madison

Keywords: Computer science, Artificial intelligence

Abstract: Enterprises increasingly employ a wide array of tools and processes to make data-driven decisions. However, there are significant inefficiencies in the enterprise-wide workflow that stem from the fact that business workflows are expressed in Natural Language (NL), while the actual computational workflow must be manually translated into computational programs. This thesis presents an initial approach to bridge this gap by targeting the Data Science (DS) component of enterprise workflows. In this initial approach, we propose using a conversational agent to allow data scientists to assemble data analytics pipelines. A crucial insight is that while precise interpretation of NL continues to be challenging, restricted versions of natural languages are starting to become practical as natural interfaces in complex decision-making domains. We also recognize that DS workflow components are often templatized. Putting these two insights together, we develop an initial prototype system called Ava that uses an unambiguous version of NL known as Controlled Natural Language (CNL) to program DS workflows. The initial Ava system was limited in functionality and focussed on training Machine Learning (ML) models using NL conversations. However, real-world data analysis scenarios go far beyond just training ML models and involve various stages such as exploratory analysis, statistical analysis, and data cleaning. Some analyses may require slicing, dicing, querying, and visualizing the data in different dimensions. To cater to these broader analytics tasks, we propose modifications to Ava's CNL and formalize it via a pattern language. We also propose an end-to-end system architecture for an NL-based data analytics system, which is deployed commercially as DataChat. Finally, we propose ML pipelines, a flexible, scalable, and framework-agnostic approach to composing ML model training workflows using DataChat's CNL. 
We also present DataChat's AutoML framework, which is built using the ML

Classification of COVID-19 Infection Using Deep Learning and Radiomic Features Extracted from Computed Tomography Scans of Patients’ Lungs

Sam Sugumar Jones

University of Maryland, Baltimore County
