The Microbiome Data Landscape in 2026: A Practical Guide to Every Database You Should Know
Where to find microbiome data, what each source is good for, and how they connect
If you’re starting a microbiome research project and want to know what’s already known about your organisms of interest, you’re going to hit at least half a dozen databases. Each covers a different angle. None of them talk to each other particularly well.
This is a practical guide — not a review paper, but a working reference for researchers who need to find data.
Taxonomy & Classification
| Database | What it is | When to use it |
| --- | --- | --- |
| NCBI Taxonomy | The canonical taxonomy: 1.1M+ named taxa with hierarchical classification | Mapping organism names to stable IDs. Your reference point for “what is this thing?” |
| GTDB (Genome Taxonomy Database) | Genome-based taxonomy that often disagrees with NCBI | When you need phylogenetically consistent classification, especially for archaea |
| SILVA | Curated ribosomal RNA gene database | Reference for 16S/18S amplicon analysis. Three versions: full, non-redundant, and clustered |
| Greengenes2 | Unified 16S + WGS taxonomy | If you’re using QIIME2 and want to compare amplicon and shotgun results |
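Mapping names to stable IDs is easy to script. Below is a minimal sketch, assuming the public NCBI E-utilities ESearch endpoint and only the Python standard library. The helper names (taxonomy_search_url, parse_taxids, lookup_taxids) are our own illustrations, not part of any NCBI client library.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities ESearch endpoint (public; fine for light, rate-limited use)
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def taxonomy_search_url(name: str) -> str:
    """Build an ESearch query that looks an organism name up in NCBI Taxonomy."""
    params = {"db": "taxonomy", "term": name, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_taxids(payload: dict) -> list[int]:
    """Pull TaxIDs out of an ESearch JSON response; an empty list means no match."""
    return [int(tid) for tid in payload.get("esearchresult", {}).get("idlist", [])]


def lookup_taxids(name: str) -> list[int]:
    """Network call: return every TaxID that matches `name`."""
    with urlopen(taxonomy_search_url(name), timeout=30) as response:
        return parse_taxids(json.load(response))
```

A name can resolve to zero, one, or several TaxIDs, so check the list length rather than assuming a single hit. Without an API key, NCBI asks clients to stay at or below three requests per second.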
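Key tension: NCBI Taxonomy follows published nomenclature and the literature, which has historically leaned on phenotypic and 16S evidence. GTDB builds its taxonomy from genome-wide phylogenies of conserved marker genes. The two disagree on hundreds of genus- and family-level assignments. Know which one your pipeline uses.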
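To see where the two taxonomies diverge for a given organism, compare lineages rank by rank. This sketch assumes you already have a GTDB lineage string (GTDB writes lineages as semicolon-separated ranks with d__, p__, c__, o__, f__, g__, s__ prefixes) and an NCBI lineage that you have mapped onto the same rank names yourself. The lineages in the test are placeholders, not real assignments.

```python
# GTDB rank prefixes mapped to plain rank names
RANKS = {"d": "domain", "p": "phylum", "c": "class", "o": "order",
         "f": "family", "g": "genus", "s": "species"}


def parse_gtdb_lineage(lineage: str) -> dict[str, str]:
    """Split 'd__Bacteria;p__...;...' into {rank: name}, skipping empty ranks like 'o__'."""
    parsed = {}
    for part in lineage.split(";"):
        prefix, _, name = part.strip().partition("__")
        if prefix in RANKS and name:
            parsed[RANKS[prefix]] = name
    return parsed


def rank_disagreements(gtdb: dict[str, str], ncbi: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Ranks where both sources assign a name but the names differ: {rank: (ncbi, gtdb)}."""
    return {
        rank: (ncbi[rank], gtdb[rank])
        for rank in RANKS.values()
        if rank in gtdb and rank in ncbi and gtdb[rank] != ncbi[rank]
    }
```

Expect many disagreements to be GTDB's lettered splits (a suffix like _A on a phylum or genus name) rather than wholesale reassignments, which is why the normalization step in the next section matters.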
Disease & Phenotype Associations
| Database | What it is | When to use it |
| --- | --- | --- |
| Disbiome | Curated microbe-disease associations from the literature (~3,000 entries) | Finding which taxa are increased/decreased in a condition |
| BugSigDB | Standardized microbiome signatures from published studies | Meta-analysis of differential abundance results. Richer metadata than Disbiome |
| gutMDisorder | Gut microbiota and disorders | Focused on gut-specific associations with free-text context |
| GMrepo | Curated human gut metagenome repository | When you want actual abundance data, not just associations |
Practical tip: These databases overlap significantly but use different organism naming conventions. Cross-referencing requires taxonomy normalization. This is one of the reasons we built MicroMap — it reconciles entities across all of these.
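Here is a deliberately crude illustration of taxonomy normalization before cross-referencing two association exports. Assumptions: each source has already been flattened to (taxon name, disease, direction) tuples, and the rules below (drop NCBI-style square brackets such as “[Ruminococcus] gnavus”, drop GTDB-style letter suffixes such as “_A”, collapse whitespace, lowercase) are simplifications of our own. Stripping GTDB suffixes intentionally merges genera that GTDB keeps separate; mapping names to NCBI TaxIDs is the more robust route.

```python
import re
from collections import defaultdict


def normalize_taxon(name: str) -> str:
    """Lossy string normalization for matching taxon names across databases."""
    name = name.replace("[", "").replace("]", "")    # "[Ruminococcus] gnavus" -> "Ruminococcus gnavus"
    name = re.sub(r"_[A-Z]+\b", "", name)            # "Blautia_A" -> "Blautia" (merges GTDB splits!)
    name = re.sub(r"\s+", " ", name).strip()         # collapse runs of whitespace
    return name.lower()


def join_associations(source_a, source_b):
    """Group (taxon, disease, direction) records by normalized taxon.

    Returns {taxon: (records_from_a, records_from_b)} for taxa found in BOTH sources.
    """
    index = defaultdict(lambda: ([], []))
    for name, disease, direction in source_a:
        index[normalize_taxon(name)][0].append((disease, direction))
    for name, disease, direction in source_b:
        index[normalize_taxon(name)][1].append((disease, direction))
    return {taxon: pair for taxon, pair in index.items() if pair[0] and pair[1]}
```

Taxa that appear in only one source are dropped from the join; keep them in a separate report, since “missing from database B” is itself useful information.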
Metabolites & Metabolomics
| Database | What it is | When to use it |
| --- | --- | --- |
| HMDB (Human Metabolome Database) | 220,000+ metabolite entries with biological context | Understanding what metabolites are, where they’re found, and what pathways they participate in |
| PubChem | Chemical structures, properties, and bioactivities | Cross-referencing compound identifiers and finding chemical properties |
| MiMeDB (Microbial Metabolome Database) | Metabolites produced or modified by microbes | When you specifically need microbial-origin metabolites |
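Cross-referencing compound identifiers often starts by resolving a name to a PubChem CID. A minimal sketch, assuming PubChem’s PUG REST interface and the standard library; the function names are illustrative.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cid_lookup_url(compound_name: str) -> str:
    """PUG REST URL mapping a compound name to PubChem CIDs (name is URL-escaped)."""
    return f"{PUG_REST}/compound/name/{quote(compound_name, safe='')}/cids/JSON"


def parse_cids(payload: dict) -> list[int]:
    """Extract CIDs from a PUG REST IdentifierList response."""
    return payload.get("IdentifierList", {}).get("CID", [])


def lookup_cids(compound_name: str) -> list[int]:
    """Network call. PUG REST answers unknown names with HTTP 404, which
    urlopen raises as urllib.error.HTTPError; callers should catch it."""
    with urlopen(cid_lookup_url(compound_name), timeout=30) as response:
        return parse_cids(json.load(response))
```

Once you have a CID, it works as a join key against other resources that store PubChem identifiers, which is usually more reliable than matching on metabolite names and their many synonyms.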
Pathways & Functional Analysis
| Database | What it is | When to use it |
| --- | --- | --- |
| KEGG | Metabolic pathways, modules, and orthologs | Pathway enrichment analysis, functional annotation of metagenomes |
| Reactome | Curated biological pathways with visualization | When you need detailed mechanistic pathway maps, especially for human biology |
| MetaCyc | Metabolic pathways from all domains of life | Reference for microbial metabolic capabilities |
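In its simplest form, pathway enrichment is an over-representation test: given a list of KEGG orthologs of interest, does a pathway show up more often than chance predicts? Below is a pure-Python sketch of the one-sided hypergeometric test plus Benjamini-Hochberg correction across pathways. It assumes you have already counted genes per pathway. Dedicated tools add refinements (background choice, pathway size filters) that this sketch omits.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: background size (e.g., all KOs detected in your samples)
    K: background genes annotated to the pathway
    n: size of your gene list of interest
    k: genes in your list that fall in the pathway
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """BH-adjusted p-values (FDR), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

The choice of background N matters as much as the test itself: using every KO in KEGG rather than those detectable in your data tends to inflate significance.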
Antimicrobial Resistance
| Database | What it is | When to use it |
| --- | --- | --- |
| CARD (Comprehensive Antibiotic Resistance Database) | AMR gene sequences, resistance mechanisms, and ontology | Identifying resistance genes in metagenomes, understanding resistance mechanisms |
| ResFinder | AMR gene detection in sequencing data | Screening clinical or environmental isolates for resistance |
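Whichever AMR tool you use, the raw output is a list of alignment hits that you typically filter by percent identity and by how much of the reference gene is covered. This sketch assumes you have already parsed the tool’s report into the hypothetical AmrHit records below. The 90% identity and 60% coverage cutoffs are placeholders; check your tool’s documentation for its actual defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AmrHit:
    """Hypothetical, tool-agnostic view of one alignment hit."""
    contig: str
    gene: str          # gene or ontology term name from the reference database
    identity: float    # percent identity of the alignment
    coverage: float    # percent of the reference gene length covered


def filter_hits(hits, min_identity: float = 90.0, min_coverage: float = 60.0):
    """Drop hits below either cutoff, then keep the best-identity hit per (contig, gene)."""
    best = {}
    for hit in hits:
        if hit.identity < min_identity or hit.coverage < min_coverage:
            continue
        key = (hit.contig, hit.gene)
        if key not in best or hit.identity > best[key].identity:
            best[key] = hit
    return sorted(best.values(), key=lambda h: (h.contig, h.gene))
```

Low-coverage hits on short metagenomic contigs are often fragments of real genes, so consider reporting them separately instead of silently discarding them.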
Drugs & Targets
| Database | What it is | When to use it |
| --- | --- | --- |
| ChEMBL | Bioactive drug-like compounds and their targets | Understanding drug-microbiome interactions, finding compounds that target specific pathways |
| DrugBank | Approved and experimental drugs with pharmacological data | Clinical drug information, drug-microbiome interaction research |
Literature & Papers
| Database | What it is | When to use it |
| --- | --- | --- |
| PubMed | 36M+ biomedical citations | Literature search. Always the starting point |
| bioRxiv/medRxiv | Preprints (not peer-reviewed) | Staying current: many microbiome studies appear here 6-12 months before publication |
| Semantic Scholar | AI-powered academic search with citation graphs | Finding papers by concept rather than keyword, citation analysis |
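PubMed searches are worth scripting once you run them repeatedly. A minimal sketch using the same E-utilities ESearch endpoint, here restricted to a publication-date window; the helper names are illustrative, and the query string in the usage note is only an example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(query: str, min_year: int, max_year: int, retmax: int = 100) -> str:
    """ESearch query against PubMed limited to a publication-date range."""
    params = {
        "db": "pubmed",
        "term": query,
        "datetype": "pdat",        # filter on publication date
        "mindate": str(min_year),
        "maxdate": str(max_year),
        "retmax": retmax,          # PMIDs per page; use retstart to page further
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_search(payload: dict) -> tuple[int, list[str]]:
    """Return (total hit count, PMIDs on this page)."""
    result = payload.get("esearchresult", {})
    return int(result.get("count", 0)), result.get("idlist", [])


def search_pubmed(query: str, min_year: int, max_year: int) -> tuple[int, list[str]]:
    """Network call: one page of PMIDs plus the total count."""
    with urlopen(pubmed_search_url(query, min_year, max_year), timeout=30) as response:
        return parse_search(json.load(response))
```

For example, search_pubmed('"Gastrointestinal Microbiome"[MeSH] AND "Parkinson Disease"[MeSH]', 2020, 2025) returns the total count and the first page of PMIDs. When the count exceeds the page size, request further pages with retstart.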
The Integration Problem
Here’s the core issue: these databases were built independently, by different teams, with different identifiers, different update schedules, and different quality standards. To answer even a moderately complex question, such as “What metabolites produced by gut bacteria associated with Parkinson’s disease are involved in dopamine metabolism?”, you need to traverse at least four of them: taxonomy, disease associations, metabolites, and pathways.
This is why we built MicroMap as a knowledge graph that integrates across these sources. But even if you don’t use our tool, the mental model is valuable: think of biological data as a graph of entities and relationships, not as isolated tables.
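To make the graph mental model concrete, here is a toy in-memory triple store that answers the shape of the Parkinson’s question as a path query: disease ← taxon → metabolite → pathway. Everything here is hypothetical: the class, the predicate names, and the placeholder entities. It is not MicroMap’s schema or API, and it encodes no real biology.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Minimal graph of (subject, predicate, object) triples, indexed both ways."""

    def __init__(self):
        self.out = defaultdict(set)   # (subject, predicate) -> {objects}
        self.inc = defaultdict(set)   # (object, predicate) -> {subjects}

    def add(self, subj: str, pred: str, obj: str) -> None:
        self.out[(subj, pred)].add(obj)
        self.inc[(obj, pred)].add(subj)

    def objects(self, subj: str, pred: str) -> set:
        return self.out.get((subj, pred), set())

    def subjects(self, pred: str, obj: str) -> set:
        return self.inc.get((obj, pred), set())


def metabolites_linking(kg: KnowledgeGraph, disease: str, pathway: str) -> set:
    """Metabolites produced by taxa associated with `disease` that participate in `pathway`."""
    taxa = kg.subjects("associated_with", disease)
    produced = set().union(*(kg.objects(t, "produces") for t in taxa))
    return {m for m in produced if pathway in kg.objects(m, "participates_in")}
```

In practice the hard part is upstream: each edge comes from a different source database, and its endpoints only line up if the identifiers have been reconciled first.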
Recommendations for new researchers:
Pick your taxonomy reference early and stick with it. NCBI Taxonomy is the safest default.
Check multiple association databases. Disbiome and BugSigDB often have complementary coverage.
Always check provenance. “This taxon is associated with disease X” is only as good as the study behind it.
Bookmark HMDB and KEGG. You’ll use them constantly for metabolite and pathway context.
Use MicroMap (kgdev.graphomics.com/docs) if you want a single API that spans all of the above. Ask us for an API key at graphomics.com.
Next time: A deep dive into how alpha and beta diversity metrics actually work — intuition, math, and when to use each one.
