Keywords:
Databases
Software
Genomics
Metabolites
Computer science
Genetics
Abstract:
Background: Natural products (NPs) from plants and microbes are a rich source of bioactive compounds essential for human life. A large part of agriculture, lifestyle and healthcare practice relies on metabolites derived from natural sources. To examine the biosynthetic potential of organisms and to guide NP discovery efforts, researchers increasingly use metabolomic, transcriptomic and genomic approaches. The co-location of metabolic genes in microbial genomes (termed a Biosynthetic Gene Cluster, or BGC) paves the way for inexpensive, high-throughput surveys of natural products. While focused analyses that target a specific family of known compound chemistry have proven successful in optimizing a compound’s utility, a truly global overview that reveals the actual extent of novel chemistry still unexplored in nature remains hampered by the limitations of the (high-quality) data, techniques and bioinformatic tools currently available. Results: A computational prediction tool, PlantiSMASH, was developed to enable the exploration of putative plant BGCs; it combines genomic and transcriptomic data to give insights into plant secondary metabolism and evolution. To support large-scale annotation and analysis of BGCs, a reference database of known BGCs (MIBiG) was markedly improved in both quality and quantity, providing a 73% increase in data over its initial release. A large-scale study of BGC and Gene Cluster Family (GCF) diversity across taxa was carried out, enabled by the development of a novel bioinformatics tool that can process 1.2 million BGCs within ten days of computing time. Finally, an online database of more than 25,000 GCFs was released for the first time, giving the community a means of crowdsourced curation, which in turn would support the annotation and discovery of putative or novel BGCs of their own. Conclusions: The works presented in this thesis provide the foundation for a global diversity-informed NP disc