Machine Learning Classification and Prediction of Monitoring Data - Custom Topics - China Three Gorges University Library

Prediction of Backbreak in Open-Pit Blasting Operations Using the Machine Learning Method

Khandelwal, Manoj; Monjezi, M.

Maharana Pratap Univ Agr & Technol Coll Technol & Engn Udaipur 313001 India; Tarbiat Modares Univ Fac Engn Tehran Iran

Machine learning methods to forecast temperature in buildings

Mateo, Fernando; Jose Carrasco, Juan; Sellami, Abderrahim; Millan-Giraldo, Monica; Dominguez, Manuel; Soria-Olivas, Emilio

Univ Valencia Intelligent Data Anal Lab ETSE E-46100 Valencia Spain; Univ Jaume 1 Inst New Imaging Technol Dept Comp Languages & Syst Castellon de La Plana 12071 Spain; Univ Leon SUPPRESS Res Grp Leon 24007 Spain

ETHNOPRED: a novel machine learning method for accurate continental and sub-continental ancestry identification and population stratification correction

Hajiloo, Mohsen; Sapkota, Yadav; Mackey, John R.; Robson, Paula; Greiner, Russell; Damaraju, Sambasivarao

Univ Alberta Dept Comp Sci Edmonton AB Canada; Univ Alberta Alberta Innovates Ctr Machine Learning Edmonton AB Canada; Univ Alberta Dept Lab Med & Pathol Edmonton AB Canada; Univ Alberta Dept Oncol Edmonton AB Canada; Alberta Hlth Serv Edmonton AB Canada; Univ Alberta Dept Agr Food & Nutr Sci Edmonton AB Canada

Source: BioMed Central journal

Keywords: Decision Tree; Genome Wide Association Study; Population Stratification; Ancestry Informative Marker; Individual Decision Tree

Abstract: Background: Population stratification is a systematic difference in allele frequencies between subpopulations. This can lead to spurious association findings in the case-control genome wide association studies (GWASs) used to identify single nucleotide polymorphisms (SNPs) associated with disease-linked phenotypes. Methods such as self-declared ancestry, ancestry informative markers, genomic control, structured association, and principal component analysis are used to assess and correct population stratification, but each has limitations. We provide an alternative technique to address population stratification. Results: We propose a novel machine learning method, ETHNOPRED, which uses the genotype and ethnicity data from the HapMap project to learn ensembles of disjoint decision trees, capable of accurately predicting an individual's continental and sub-continental ancestry. To predict an individual's continental ancestry, ETHNOPRED produced an ensemble of 3 decision trees involving a total of 10 SNPs, with 10-fold cross validation accuracy of 100% using the HapMap II dataset. We extended this model to involve 29 disjoint decision trees over 149 SNPs, and showed that this ensemble has an accuracy of >= 99.9%, even if some of those 149 SNP values were missing. On an independent dataset, predominantly of Caucasian origin, our continental classifier showed 96.8% accuracy and improved genomic control's λ from 1.22 to 1.11. We next used the HapMap III dataset to learn classifiers to distinguish European subpopulations (North-Western vs. Southern), East Asian subpopulations (Chinese vs. Japanese), African subpopulations (Eastern vs. Western), North American subpopulations (European vs. Chinese vs. African vs. Mexican vs. Indian), and Kenyan subpopulations (Luhya vs. Maasai). In these cases, ETHNOPRED produced ensembles of 3, 39, 21, 11, and 25 disjoint decision trees, respectively involving 31, 502, 526, 242 and 271 SNPs, with 10-fold cross validation accuracy of 86.5% +/- 2.4%
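
The central idea in this abstract, an ensemble of trees built on disjoint SNP sets so that a majority vote still works when some genotypes are missing, can be sketched roughly as follows. This is a toy illustration and not ETHNOPRED itself: each "tree" is reduced to a one-level lookup on a single SNP, genotypes are assumed to be coded 0/1/2, and the data and the names `fit_stump`, `fit_ensemble` and `predict_ensemble` are invented for the example.

```python
from collections import Counter, defaultdict

def fit_stump(rows, labels, snp):
    """Learn a one-level tree on one SNP: genotype value -> majority label."""
    votes = defaultdict(Counter)
    for row, label in zip(rows, labels):
        if row[snp] is not None:
            votes[row[snp]][label] += 1
    return {genotype: c.most_common(1)[0][0] for genotype, c in votes.items()}

def fit_ensemble(rows, labels, snps):
    """One stump per SNP; no SNP is shared, so the members are disjoint."""
    return {snp: fit_stump(rows, labels, snp) for snp in snps}

def predict_ensemble(ensemble, row):
    """Majority vote; a stump abstains if its SNP is missing or unseen."""
    ballots = Counter()
    for snp, stump in ensemble.items():
        genotype = row[snp]
        if genotype in stump:
            ballots[stump[genotype]] += 1
    return ballots.most_common(1)[0][0] if ballots else None

# Synthetic genotypes (assumed 0/1/2 coding) for three SNPs, two populations.
train_rows = [
    (0, 0, 2), (0, 1, 2), (0, 0, 2),
    (2, 2, 0), (2, 1, 0), (2, 2, 0),
]
train_labels = ["A", "A", "A", "B", "B", "B"]
ensemble = fit_ensemble(train_rows, train_labels, snps=[0, 1, 2])

complete = predict_ensemble(ensemble, (0, 0, 2))
with_missing = predict_ensemble(ensemble, (2, None, 0))  # second SNP missing
```

Because every member looks at a different SNP, a missing genotype removes at most one vote, which is one plausible reading of why the reported accuracy holds up when some SNP values are absent.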

The Use of Machine Learning Methodologies to Analyse Antibiotic and Biocide Susceptibility in Staphylococcus aureus

Coelho, Joana Rosado; Carrico, Joao Andre; Knight, Daniel; Martinez, Jose-Luis; Morrissey, Ian; Oggioni, Marco Rinaldo; Freitas, Ana Teresa

Univ Tecn Lisboa INESC ID IST Lisbon Portugal; Univ Lisbon Fac Med Inst Mol Med Inst Microbiol P-1699 Lisbon Portugal; Quotient Biores Fordham England; Univ Western Australia Queen Elizabeth II Med Ctr Nedlands WA 6009 Australia; CSIC Ctr Nacl Biotecnol Dept Biotecnol Microbiana Madrid Spain; IHMA Europe Sarl Epalinges Switzerland; Univ Siena Dipartimento Biotecnol I-53100 Siena Italy

Keywords: Bacterial diseases; Bacterial pathogens; Biological data management; Biology; Computational biology; Drug research and development; Drugs and devices; Emerging infectious diseases; Epidemiology; Infectious disease epidemiology; Infectious diseases; Medicine; Microbial control; Microbiology; Pharmacology; Public Health and Epidemiology; Research Article; Staphylococci

Abstract: Background: The rise of antibiotic resistance in pathogenic bacteria is a significant problem for the treatment of infectious diseases. Resistance is usually selected by the antibiotic itself; however, biocides might also co-select for resistance to antibiotics. Although resistance to biocides is poorly defined, different in vitro studies have shown that mutants presenting low susceptibility to biocides also have reduced susceptibility to antibiotics. However, studies with natural bacterial isolates are more limited and there are no clear conclusions as to whether the use of biocides results in the development of multidrug resistant bacteria. Methods: The main goal is to perform an unbiased blind-based evaluation of the relationship between antibiotic and biocide reduced susceptibility in natural isolates of Staphylococcus aureus. One of the largest data sets ever studied, comprising 1632 human clinical isolates of S. aureus originating worldwide, was analysed. The phenotypic characterization of 13 antibiotics and 4 biocides was performed for all the strains. Complex links between reduced susceptibility to biocides and antibiotics are difficult to elucidate using the standard statistical approaches in phenotypic data. Therefore, machine learning techniques were applied to explore the data. Results: In this pioneering study, we demonstrated that reduced susceptibility to two common biocides, chlorhexidine and benzalkonium chloride, which belong to different structural families, is associated with multidrug resistance. We have consistently found that a minimum inhibitory concentration greater than 2 mg/L for both biocides is related to antibiotic non-susceptibility in S. aureus. Conclusions: Two important results emerged from our work, one methodological and the other relevant to the field of antibiotic resistance. We could not determine whether the use of antibiotics selects for biocide resistance or vice versa. However, the observation of association between multiple

Classification of Amyloidogenic Hexapeptides with Machine Learning Methods

Kotulska, Malgorzata; Unold, Olgierd; Stanislawski, Jerzy

Wroclaw Univ Technol PL-50370 Wroclaw Poland

Machine learning methods can replace 3D profile method in classification of amyloidogenic hexapeptides

Stanislawski, Jerzy; Kotulska, Malgorzata; Unold, Olgierd

Wroclaw Univ Technol Inst Comp Engn Control & Robot PL-50370 Wroclaw Poland; Wroclaw Univ Technol Inst Biomed Engn & Instrumentat PL-50370 Wroclaw Poland

Source: BioMed Central journal

Keywords: Amyloid; 3D profile; WEKA; Alternating decision tree; Neural network

Abstract: Background: Amyloids are proteins capable of forming fibrils. Many of them underlie serious diseases, such as Alzheimer disease. The number of amyloid-associated diseases is constantly increasing. Recent studies indicate that amyloidogenic properties can be associated with short segments of amino acids, which transform the structure when exposed. A few hundred such peptides have been found experimentally. Experimental testing of all possible amino acid combinations is currently not feasible. Instead, they can be predicted by computational methods. 3D profile is a physicochemical-based method that has generated the most numerous dataset, ZipperDB. However, it is computationally very demanding. Here, we show that dataset generation can be accelerated. Two methods to increase the classification efficiency of amyloidogenic candidates are presented and tested: simplified 3D profile generation and machine learning methods. Results: We generated a new dataset of hexapeptides, using a more economical 3D profile algorithm, which showed very good classification overlap with ZipperDB (93.5%). The new part of our dataset contains 1779 segments, with 204 classified as amyloidogenic. The dataset of 6-residue sequences with their binary classification, based on the energy of the segment, was applied for training machine learning methods. A separate set of sequences from ZipperDB was used as a test set. The most effective methods were Alternating Decision Tree and Multilayer Perceptron. Both methods obtained area under ROC curve of 0.96, accuracy 91%, true positive rate ca. 78%, and true negative rate 95%. A few other machine learning methods also achieved a good performance. The computational time was reduced from 18-20 CPU-hours (full 3D profile) to 0.5 CPU-hours (simplified 3D profile) to seconds (machine learning). Conclusions: We showed that the simplified profile generation method does not introduce an error with regard to the original method, while increasing the computational
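
The accuracy, true positive rate and true negative rate quoted in this abstract all come from a binary confusion matrix. A minimal sketch of that arithmetic follows; the labels are made up for illustration (they are not the paper's data) and the function name `binary_metrics` is an assumption of this example.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, true positive rate and true negative rate for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "tpr": tp / (tp + fn),  # share of amyloidogenic peptides recovered
        "tnr": tn / (tn + fp),  # share of non-amyloidogenic peptides rejected
    }

# Invented example: 1 = amyloidogenic, 0 = non-amyloidogenic.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

Reporting the two rates separately matters when classes are imbalanced: with a class balance like the 204 amyloidogenic segments out of 1779 mentioned above, a classifier that always answers "non-amyloidogenic" would already reach roughly 88% accuracy while having a true positive rate of zero.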

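Research on Cancer Classification and Diagnosis Methods Based on Markov Chain Models

Li Ding (李鼎)

University of Science and Technology of China

Keywords: Cancer diagnosis; High-throughput biotechnology; Gene expression data; Markov chain model; Machine learning methods

Abstract: Cancer is an increasingly prominent problem, and lagging methods for cancer classification and diagnosis have long been a major bottleneck in cancer treatment. Over the past few decades, researchers have sought an effective cancer diagnosis method based on gene expression levels. The emergence of high-throughput biotechnology has made it much easier to understand cancer pathology at the gene level, and the continued application of machine learning and statistical methods to cancer diagnosis has further advanced the field. Gene expression data are inherently noisy and highly variable, with high dimensionality and small sample sizes. How to model gene expression data using prior biological knowledge is one of the central questions in current bioinformatics research. This thesis proposes modeling gene pathways with Markov chain theory and develops cancer classification and diagnosis methods built on an effective characterization of gene pathway activity. The main contributions are as follows:
1) A constrained discretization method for gene expression data is proposed. To obtain a better discretization, the number of discrete states per gene is constrained according to the three-state gene regulation hypothesis, and the data are then discretized by maximizing the difference between class distributions. The method was validated experimentally on three standard gene expression datasets (leukemia, prostate cancer, and liver cancer). Comparison with existing methods shows that it is simple and intuitive, has low time complexity, and yields good cancer classification performance.
2) Two Markov chain based methods for modeling gene pathways are proposed. The models use a state transition probability matrix between genes to characterize how genetic information or transduction signals propagate through a pathway, and to quantify the pathway's activity. A gene pathway graph is decomposed into multiple gene chains. Within a gene chain, genetic information flows in order from the first gene to the last. Each gene chain is treated as a Markov chain over genes: every gene in the chain is regarded as one time point of the Markov chain and can take one of three discrete states. Let G = {g1, g2, …, gn} denote a gene chain of length n; its Markov chain can be written as {X(g), g ∈ G}, where X(g) is the regulatory state of the g-th gene. Assuming different state transition schemes gives two gene pathway models: a homogeneous Markov chain model and a non-homogeneous Markov chain model.
3) A cancer classification and diagnosis method based on gene pathway networks is proposed. To explore the intrinsic link between gene pathway information and cancer classification, Markov chain theory is used to build a separate Markov chain model for the samples of each class. The models are then used to estimate the probability of an unknown sample under each class, and the class of the sample is predicted accordingly. This is a supervised method. Compared with traditional black-box cancer diagnosis models, the classifier combines statistical methods with prior biological knowledge, achieving higher accuracy while also helping to reveal the underlying mechanisms of cancer. The algorithm is also simple, intuitive, and easy to implement. The method was validated on two standard datasets, the leukemia and liver cancer datasets, using three cancer-related gene pathways from the KEGG database.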
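
The classification scheme described in points 2) and 3), one Markov chain per class over a gene chain with three discrete states, then assigning an unknown sample to the class under which it is most probable, might look something like the sketch below. It covers only the homogeneous case (one transition matrix shared by every step of the chain). The state coding, the add-one smoothing, the toy sequences and all function names are assumptions of this example and do not come from the thesis.

```python
import math

STATES = (0, 1, 2)  # assumed coding: down-regulated, unchanged, up-regulated

def fit_chain(samples, alpha=1.0):
    """Estimate initial-state and transition probabilities with add-alpha smoothing."""
    init = {s: alpha for s in STATES}
    trans = {a: {b: alpha for b in STATES} for a in STATES}
    for seq in samples:
        init[seq[0]] += 1
        for a, b in zip(seq, seq[1:]):
            trans[a][b] += 1
    init_total = sum(init.values())
    init = {s: c / init_total for s, c in init.items()}
    for a in STATES:
        row_total = sum(trans[a].values())
        trans[a] = {b: c / row_total for b, c in trans[a].items()}
    return init, trans

def log_likelihood(model, seq):
    """Log-probability of a state sequence along the gene chain under one class model."""
    init, trans = model
    ll = math.log(init[seq[0]])
    for a, b in zip(seq, seq[1:]):
        ll += math.log(trans[a][b])
    return ll

def classify(models, seq):
    """Pick the class whose chain gives the sample the highest likelihood."""
    return max(models, key=lambda label: log_likelihood(models[label], seq))

# Toy discretized expression states along a 4-gene chain (invented data).
training = {
    "tumor": [(2, 2, 2, 1), (2, 2, 1, 2), (2, 2, 2, 2)],
    "normal": [(0, 0, 1, 0), (0, 1, 0, 0), (0, 0, 0, 0)],
}
models = {label: fit_chain(seqs) for label, seqs in training.items()}
predicted = classify(models, (2, 2, 1, 2))
```

A non-homogeneous variant would keep a separate transition matrix for each position in the chain instead of pooling all transitions into one.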

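Machine Learning Methods for Optimal Aquaculture Layout

Ding Hui (丁惠)

Ocean University of China

Keywords: Optimal aquaculture layout; Machine learning methods; Support vector machine; Marine organisms

Abstract: The oceans host a great variety of organisms and provide people with large quantities of chemical, mineral, food, and medicinal resources as well as space, so studying the marine economy is essential to sustainable development. The stocking density and spatial layout of cultured marine organisms affect the tides and tidal currents of the surrounding waters. Adapting aquaculture to local conditions is therefore a prerequisite for good economic returns, and research on optimal aquaculture layout is of considerable practical importance.
As a relatively new machine learning method grounded in statistical learning theory, the support vector machine (SVM) handles practical problems such as small samples, nonlinearity, overfitting, and high dimensionality well, and has been applied successfully to prediction, classification, and regression. In the ocean, however, observational data are relatively scarce, many factors are at play, and the physical processes are complex, so much of the data is hard to extract and use accurately. For these reasons, this thesis combines SVMs with hydrodynamics to address the aquaculture layout problem.
For the specific aquaculture area of Sanggou Bay, where only a small amount of observational data is currently available, the thesis studies how to make full use of the observations, together with a hydrodynamic numerical model and statistical learning methods, to extract useful information from limited data. The drag exerted on the flow field by aquaculture rafts is added to the hydrodynamic model and its effect on the flow field is studied; the improved model is then optimized, and an SVM is used to solve for the variation in current velocity across the bay.
In addition, because the target values are variable and diverse owing to the inherent characteristics of the bay, the thesis proposes combining a multi-output SVM with the hydrodynamic model to solve for the variation in current velocity in the bay. Repeated experiments show that the multi-output SVM both reduces computation time and gives fairly accurate results. The work supplies a large amount of data for oceanographic research, makes it possible to represent the real environment of the high-density aquaculture zone in Sanggou Bay scientifically, and provides a scientific basis for optimizing aquaculture layout and improving the aquaculture environment.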

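Active Learning Techniques Incorporating Semi-Supervised Learning

Li Mingzhi (李明致)

University of Science and Technology of China

Keywords: Machine learning methods; Active learning; Semi-supervised learning; Margin sampling; Labeling methods

Abstract: Machine learning studies algorithms by which computer systems improve their performance through automated learning. For many machine learning problems, such as hyperspectral remote sensing image classification, learning to rank for search engines, and speech recognition, the generalization performance of the learned model depends on labeled samples. For these problems, however, the number of labeled samples available is often insufficient relative to the scale of the problem, or samples are costly to obtain. How to train a sufficiently good model from a limited labeled set is an active research question in machine learning.
Active learning and semi-supervised learning are two approaches to this problem. Active learning studies how to select training samples so as to achieve the best possible generalization with as few labeled samples as possible. Semi-supervised learning studies training and learning models that combine labeled and unlabeled samples, establishing connections between the two to obtain better generalization. In practice, researchers have found that semi-supervised learning can complement active learning and have tried to combine the two. We divide active learning methods that incorporate semi-supervised learning into two categories. In the first, semi-supervised learning serves mainly as a sampling technique for active learning; this thesis calls this category Active Learning with Semi-Supervised Heuristic (ALSSH). In the second, semi-supervised learning is used as a pseudo-labeling technique that works alongside active learning to label unlabeled samples; this thesis calls this category Collaborative Active and Semi-Supervised Labeling (CASSL).
In CASSL-type algorithms, the learned model cannot be guaranteed to produce entirely correct labels. In the early stages of iterative sampling, the accuracy of the model the algorithm can learn is limited, and problem-specific rules for selecting pseudo-labeled samples are needed to compensate. Incorrect pseudo-labeled samples, once added, can affect subsequent model learning and training and degrade performance, so removing them promptly is essential. We propose an active learning labeling method that incorporates semi-supervised learning and is based on collaborative verification, which we call Ensured Collaborative Active and Semi-Supervised Labeling (ECASSL). ECASSL uses an SVM as the base learning model and margin sampling as the base sampling method. In each iteration, the new model is used to verify the pseudo-labeled samples, which are then corrected or removed according to the verification result. Experimental results show that ECASSL effectively improves learning and labeling performance.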
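
One iteration of the loop described above (margin sampling to choose queries, confident predictions as pseudo-labels, then re-checking those pseudo-labels against the next model) can be sketched as follows. To keep the example self-contained, the SVM is replaced by a fixed linear decision function whose weights are simply given. The confidence threshold and the "relabel if still confident, otherwise drop" rule are assumptions of this sketch, since the abstract does not spell out how ECASSL verifies labels.

```python
def decision_value(weights, bias, x):
    """Signed score of a linear classifier; small magnitude means near the boundary."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def margin_sampling(weights, bias, pool, k):
    """Indices of the k unlabeled samples closest to the decision boundary."""
    ranked = sorted(
        range(len(pool)),
        key=lambda i: abs(decision_value(weights, bias, pool[i])),
    )
    return ranked[:k]

def pseudo_label(weights, bias, pool, threshold):
    """Index -> +1/-1 for samples the current model scores confidently."""
    labels = {}
    for i, x in enumerate(pool):
        score = decision_value(weights, bias, x)
        if abs(score) >= threshold:
            labels[i] = 1 if score > 0 else -1
    return labels

def verify_pseudo_labels(weights, bias, pool, labels, threshold):
    """Re-check earlier pseudo-labels with a newer model.

    Samples the new model is still confident about take its label
    (which may correct the old one); the rest are dropped.
    """
    verified = {}
    for i in labels:
        score = decision_value(weights, bias, pool[i])
        if abs(score) >= threshold:
            verified[i] = 1 if score > 0 else -1
    return verified

# Invented 2-D unlabeled pool and two successive hypothetical models.
pool = [(0.1, 0.0), (2.0, 1.0), (-1.5, -0.5), (0.5, -0.25), (-3.0, 0.5)]
old_w, new_w, bias = (1.0, 1.0), (1.0, 6.0), 0.0

queried = margin_sampling(old_w, bias, pool, k=2)      # sent to the annotator
pseudo = pseudo_label(old_w, bias, pool, threshold=1.0)
verified = verify_pseudo_labels(new_w, bias, pool, pseudo, threshold=1.0)
```

In this toy run the sample at index 4 receives a pseudo-label from the first model but lands on the second model's boundary, so it is removed instead of being carried forward as a possibly wrong label.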

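Classification and Prediction of H1N1 Neuraminidase Inhibitors Using Machine Learning Methods

Lü Wei (吕巍); Xue Ying (薛英); Meng Qingwei (孟庆伟)

State Key Laboratory of Crop Biology, College of Life Sciences, Shandong Agricultural University, Tai'an, Shandong 271018; Postdoctoral Research Station of Biology, Shandong Agricultural University, Tai'an, Shandong 271018; Key Laboratory of Green Chemistry and Technology (Ministry of Education), College of Chemistry, Sichuan University, Chengdu 610064; State Key Laboratory of Biotherapy, Sichuan University, Chengdu 610041

Keywords: Machine learning methods; H1N1 influenza virus; Neuraminidase inhibitors; Support vector machine

Abstract: Influenza is a major respiratory infectious disease with a high incidence in the general population and a high mortality rate among elderly and high-risk patients. Studies have shown that inhibiting neuraminidase (NA) can block viral RNA replication, which makes NA an important drug target for effective treatment of H1N1 influenza. Computational virtual screening and prediction of NA inhibitors has become increasingly important, and structure-based rational drug design targeting the enzyme's active site to develop H1N1 neuraminidase inhibitors is now an active area of drug research. In this work, several machine learning methods, namely support vector machine (SVM), k-nearest neighbors (k-NN), and the C4.5 decision tree (C4.5 DT), were used to build classification models distinguishing known neuraminidase inhibitors (NAIs) from non-inhibitors (non-NAIs). A set of 227 structurally diverse compounds (72 NAIs and 155 non-NAIs) was used to test the classification system, and recursive feature elimination was used to select the property descriptors relevant to NAI classification in order to improve prediction accuracy. On the independent validation set, overall prediction accuracy was 75.9%-92.6%, accuracy for NA inhibitors was 64.3%-78.6%, and accuracy for non-inhibitors was 77.5%-97.5%. SVM gave the best overall accuracy (92.6%). The study shows that machine learning methods such as SVM can effectively predict potential NA inhibitors in unseen datasets and help identify the molecular descriptors associated with them.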
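
Of the three classifiers named in this abstract, k-NN is the simplest to show in a few lines. The sketch below labels a compound by majority vote among its nearest neighbours in descriptor space. The two-dimensional descriptor vectors are invented purely for illustration (the study's actual descriptors are not given here), and `knn_predict` is a hypothetical helper, not code from the paper.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Majority label among the k training points nearest to the query (Euclidean)."""
    neighbours = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

# Invented, already-scaled descriptor vectors for a handful of compounds.
train_x = [
    (0.9, 0.8), (1.0, 1.1), (0.8, 1.0),
    (0.1, 0.2), (0.2, 0.1), (0.0, 0.3), (0.3, 0.0),
]
train_y = ["NAI", "NAI", "NAI", "non-NAI", "non-NAI", "non-NAI", "non-NAI"]

likely_inhibitor = knn_predict(train_x, train_y, (0.85, 0.9))
likely_inactive = knn_predict(train_x, train_y, (0.15, 0.15))
```

Because k-NN relies directly on distances, irrelevant descriptors can swamp the informative ones, which is one reason a descriptor selection step such as the recursive elimination mentioned above tends to help.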
