Sequence Annotation of Edible Fungus Calocybe indica Using Different Bioinformatics Tools






Calocybe indica, also known as the milky white mushroom (MWM), is an edible fungus whose fruiting body is widely consumed in South Asia. A notable feature of the MWM is its ectomycorrhizal symbiosis with the roots of plants such as Borassus flabellifer (doub palm), Cocos nucifera (coconut tree), Tamarindus indica (tamarind tree), and Peltophorum pterocarpum (yellow poinciana tree). C. indica is a member of the Basidiomycetes, which contain many of the known ectomycorrhizal fungal species. Ectomycorrhizal fungi are essential in agriculture because many trees depend on mutualistic symbiotic relationships to meet their nutrient needs and to protect against pathogenic organisms. In addition, ectomycorrhizal fungi facilitate the nutrient and carbon cycles in farmland, maintaining soil health and improving productivity. Other distinctive features of C. indica include a long shelf-life and the ability to grow in tropical climates. A previous study also found that its solvent extracts have anti-proliferative properties against MCF-7 breast cancer cell lines. Such findings suggest that MWM extracts are a rich source of phytochemicals such as flavonoids, terpenoids, and alkaloids with the potential to treat breast cancer. However, no literature is currently available on C. indica genomic sequencing data, or on its secondary metabolites and the biosynthetic gene clusters (BGCs) and pathways responsible for producing anti-cancer metabolites. This master's thesis project addresses the whole-genome sequencing, assembly, and annotation of C. indica, the first fungus in the Calocybe genus to be thoroughly studied. Various bioinformatics tools were used to identify carbohydrate-active enzymes (CAZymes), transfer RNA, ribosomal RNA, and non-ribosomal peptide synthetases, and to conduct phylogenetic analysis with other edible mushrooms such as Lentinula edodes, Agaricus bisporus, and Pleurotus ostreatus.
Nanopore long-read sequencing data, Illumina DNA short-read data, and Illumina RNA short-read data received from collaborators were used to complete the annotation. Preliminary phylogenetic analysis of C. indica and other common edible mushrooms using orthologous proteins revealed a close evolutionary relationship to Tricholoma matsutake, a high-value mushroom from Japan. The annotation results in this study have allowed us to characterize the wood- and chitin-degrading machinery in C. indica and other edible mushrooms and to investigate pathways that produce novel secondary metabolites. This study also revealed that C. indica possesses distinct enzymes linked with various metabolic pathways, as well as abundant hydrophobin, glycosyltransferase, and laccase genes. Only one tyrosinase gene was identified in C. indica, markedly fewer than the six tyrosinase genes in A. bisporus, and this difference may be responsible for its long shelf-life. CAZyme analysis showed an exceptionally high number of genes in the glycoside hydrolase (GH) family and a CAZyme distribution most consistent with white-rot fungi. To the best of our knowledge, this work represents the first genome-scale assembly and annotation of this species. The findings of this study help clarify the significance and usefulness of C. indica in the biotechnology, food, and agricultural industries.



Calocybe indica, Edible mushroom, Genomics, WGS, Genome annotation, Genome assembly, Bioinformatics