The Microbiome Data Landscape in 2026: A Practical Guide to Every Database You Should Know

Where to find microbiome data, what each source is good for, and how they connect

Apr 24, 2026

If you’re starting a microbiome research project and want to know what’s already known about your organisms of interest, you’re going to hit at least half a dozen databases. Each covers a different angle. None of them talk to each other particularly well.

This is a practical guide — not a review paper, but a working reference for researchers who need to find data.

Taxonomy & Classification

| Database | What it is | When to use it |
|---|---|---|
| NCBI Taxonomy | The canonical taxonomy: 1.1M+ named taxa with hierarchical classification | Mapping organism names to stable IDs. Your reference point for “what is this thing?” |
| GTDB (Genome Taxonomy Database) | Genome-based taxonomy that often disagrees with NCBI | When you need phylogenetically consistent classification, especially for archaea |
| SILVA | Curated ribosomal RNA gene database | Reference for 16S/18S amplicon analysis. Distributed as three sets: Parc (all aligned sequences), Ref (quality-filtered), and Ref NR 99 (Ref clustered at 99% identity to remove redundancy) |
| Greengenes2 | Unified 16S + WGS taxonomy | If you’re using QIIME 2 and want to compare amplicon and shotgun results |

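Key tension: NCBI Taxonomy follows published nomenclature and historical classification, much of it rooted in phenotype and 16S data. GTDB derives its taxonomy from genome-wide phylogeny. They disagree on hundreds of genus- and family-level assignments. Know which one your pipeline uses.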
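One way to see where two classifications part ways is to compare lineages rank by rank. The sketch below parses GTDB-style lineage strings (rank prefixes `d__` through `s__`) and lists the ranks at which a second lineage differs. The function names and both example lineages are illustrative placeholders, not entries from any particular release, and the NCBI side is assumed to have already been converted to the same prefixed form.

```python
# Rank prefixes used in GTDB-style lineage strings, highest rank first.
RANKS = {
    "d": "domain", "p": "phylum", "c": "class", "o": "order",
    "f": "family", "g": "genus", "s": "species",
}

def parse_lineage(lineage: str) -> dict[str, str]:
    """Split 'd__X;p__Y;...' into {'domain': 'X', 'phylum': 'Y', ...}."""
    parsed = {}
    for field in lineage.split(";"):
        field = field.strip()
        if not field:
            continue
        prefix, sep, name = field.partition("__")
        if sep and prefix in RANKS:
            parsed[RANKS[prefix]] = name
    return parsed

def differing_ranks(lineage_a: str, lineage_b: str) -> list[str]:
    """Return the ranks (highest first) where the two lineages disagree."""
    a, b = parse_lineage(lineage_a), parse_lineage(lineage_b)
    return [rank for rank in RANKS.values() if a.get(rank) != b.get(rank)]

# Placeholder lineages (hypothetical names, for illustration only).
gtdb_like = "d__Bacteria;p__PhylumA;c__ClassA;o__OrderA;f__FamilyA;g__GenusA_B;s__GenusA_B speciesx"
ncbi_like = "d__Bacteria;p__PhylumA;c__ClassA;o__OrderA;f__FamilyA;g__GenusA;s__GenusA speciesx"

print(differing_ranks(gtdb_like, ncbi_like))  # ['genus', 'species']
```

Running a check like this over a whole sample sheet gives a quick count of how many of your organisms would change name if you switched reference taxonomies.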

Disease & Phenotype Associations

| Database | What it is | When to use it |
|---|---|---|
| Disbiome | Curated microbe-disease associations from literature | Finding which taxa are increased/decreased in a condition. ~3,000 entries |
| BugSigDB | Standardized microbiome signatures from published studies | Meta-analysis of differential abundance results. Richer metadata than Disbiome |
| gutMDisorder | Gut microbiota and disorders | Focused on gut-specific associations with free-text context |
| GMrepo | Curated human gut metagenome repository | When you want actual abundance data, not just associations |

Practical tip: These databases overlap significantly but use different organism naming conventions. Cross-referencing requires taxonomy normalization, that is, mapping every organism name to a single identifier scheme such as NCBI taxon IDs before comparing records. This is one of the reasons we built MicroMap: it reconciles entities across all of these.
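A minimal sketch of that normalization step, assuming you have exported records from two association databases as (organism name, direction) pairs. The synonym table is a hand-built stand-in for a real name-to-taxid lookup, and the association records are made-up toy rows for an unnamed condition, not real findings. The three NCBI taxon IDs themselves (562, 853, 1496) are real.

```python
# Hand-built synonym table (illustrative); in practice this would come
# from a full NCBI Taxonomy names dump or a lookup service.
NAME_TO_TAXID = {
    "escherichia coli": 562,
    "e. coli": 562,
    "clostridioides difficile": 1496,
    "clostridium difficile": 1496,  # older name for the same taxon
    "faecalibacterium prausnitzii": 853,
}

def to_taxid(name):
    """Normalize case, underscores, and spacing, then look up the taxon ID."""
    key = " ".join(name.lower().replace("_", " ").split())
    return NAME_TO_TAXID.get(key)

def index_by_taxid(records):
    """Group (name, direction) records by taxon ID; collect unmapped names."""
    indexed, unmapped = {}, []
    for name, direction in records:
        taxid = to_taxid(name)
        if taxid is None:
            unmapped.append(name)
        else:
            indexed.setdefault(taxid, set()).add(direction)
    return indexed, unmapped

# Toy records for a hypothetical "condition X" (not real data).
source_a = [("Clostridium difficile", "increased"),
            ("Faecalibacterium_prausnitzii", "decreased")]
source_b = [("Clostridioides difficile", "increased"),
            ("E. coli", "increased"),
            ("Unnamed gut isolate", "decreased")]

a_index, a_unmapped = index_by_taxid(source_a)
b_index, b_unmapped = index_by_taxid(source_b)

shared = sorted(a_index.keys() & b_index.keys())
print(shared)      # [1496]
print(b_unmapped)  # ['Unnamed gut isolate']
```

Keeping the unmapped names in a separate list matters: silently dropping them is how coverage gaps between databases go unnoticed.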

Metabolites & Metabolomics

| Database | What it is | When to use it |
|---|---|---|
| HMDB (Human Metabolome Database) | 220,000+ metabolite entries with biological context | Understanding what metabolites are, where they’re found, and what pathways they participate in |
| PubChem | Chemical structures, properties, and bioactivities | Cross-referencing compound identifiers and finding chemical properties |
| MiMeDB (Microbial Metabolome Database) | Metabolites produced or modified by microbes | When you specifically need microbial-origin metabolites |
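One practical wrinkle when cross-referencing compound identifiers: HMDB accessions moved from a five-digit form (for example `HMDB00073`) to a seven-digit form (`HMDB0000073`), and older papers and supplementary tables still carry the short one. The helper below is a small sketch that pads legacy accessions so both forms join on the same key. The function name is illustrative and not part of any HMDB tooling.

```python
import re

# Accept either the legacy 5-digit or the current 7-digit accession form.
_HMDB_PATTERN = re.compile(r"^HMDB(\d{5}|\d{7})$", re.IGNORECASE)

def normalize_hmdb_id(accession: str) -> str:
    """Return the 7-digit HMDB accession, zero-padding legacy 5-digit IDs."""
    match = _HMDB_PATTERN.match(accession.strip())
    if match is None:
        raise ValueError(f"not an HMDB accession: {accession!r}")
    return "HMDB" + match.group(1).zfill(7)

print(normalize_hmdb_id("HMDB00073"))    # HMDB0000073
print(normalize_hmdb_id("hmdb0000073"))  # HMDB0000073
```

Raising on anything that does not match keeps stray identifiers (PubChem CIDs, KEGG compound IDs) from being silently coerced into HMDB keys.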

Pathways & Functional Analysis

| Database | What it is | When to use it |
|---|---|---|
| KEGG | Metabolic pathways, modules, and orthologs | Pathway enrichment analysis, functional annotation of metagenomes |
| Reactome | Curated biological pathways with visualization | When you need detailed mechanistic pathway maps, especially for human biology |
| MetaCyc | Metabolic pathways from all domains of life | Reference for microbial metabolic capabilities |

Antimicrobial Resistance

| Database | What it is | When to use it |
|---|---|---|
| CARD (Comprehensive Antibiotic Resistance Database) | AMR gene sequences, resistance mechanisms, and ontology | Identifying resistance genes in metagenomes, understanding resistance mechanisms |
| ResFinder | AMR gene detection in sequencing data | Screening clinical or environmental isolates for resistance |

Drugs & Targets

| Database | What it is | When to use it |
|---|---|---|
| ChEMBL | Bioactive drug-like compounds and their targets | Understanding drug-microbiome interactions, finding compounds that target specific pathways |
| DrugBank | Approved and experimental drugs with pharmacological data | Clinical drug information, drug-microbiome interaction research |

Literature & Papers

| Database | What it is | When to use it |
|---|---|---|
| PubMed | 36M+ biomedical citations | Literature search. Always the starting point |
| bioRxiv/medRxiv | Preprints (not peer-reviewed) | Staying current: many microbiome studies appear here 6-12 months before publication |
| Semantic Scholar | AI-powered academic search with citation graphs | Finding papers by concept rather than keyword, citation analysis |

The Integration Problem

Here’s the core issue: these databases were built independently, by different teams, with different identifiers, different update schedules, and different quality standards. Even a moderately complex question, such as “What metabolites produced by gut bacteria associated with Parkinson’s disease are involved in dopamine metabolism?”, requires traversing at least four of them: a disease-association source, a taxonomy, a metabolite database, and a pathway database.

This is why we built MicroMap as a knowledge graph that integrates across these sources. But even if you don’t use our tool, the mental model is valuable: think of biological data as a graph of entities and relationships, not as isolated tables.
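To make the graph mental model concrete, here is a toy sketch of a multi-hop question like the one above, written as a traversal over typed edges. Every entity is a placeholder (`taxon:A`, `metabolite:M1`, and so on) and the relation names are invented for illustration. None of this reflects MicroMap's actual schema or API, or any real biological finding.

```python
from collections import defaultdict

# Toy (subject, relation, object) triples with placeholder entities.
EDGES = [
    ("taxon:A", "associated_with", "disease:D1"),
    ("taxon:B", "associated_with", "disease:D1"),
    ("taxon:C", "associated_with", "disease:D2"),
    ("taxon:A", "produces", "metabolite:M1"),
    ("taxon:B", "produces", "metabolite:M2"),
    ("taxon:C", "produces", "metabolite:M1"),
    ("metabolite:M1", "participates_in", "pathway:P1"),
    ("metabolite:M2", "participates_in", "pathway:P2"),
]

def build_index(edges):
    """Index edges in both directions, keyed by (node, relation)."""
    forward = defaultdict(set)
    backward = defaultdict(set)
    for subject, relation, obj in edges:
        forward[(subject, relation)].add(obj)
        backward[(obj, relation)].add(subject)
    return forward, backward

def metabolites_linking(disease, pathway, edges=EDGES):
    """Find metabolites in `pathway` produced by taxa associated with `disease`.

    Returns {metabolite: set of producing taxa}.
    """
    forward, backward = build_index(edges)
    results = {}
    for taxon in backward[(disease, "associated_with")]:
        for metabolite in forward[(taxon, "produces")]:
            if pathway in forward[(metabolite, "participates_in")]:
                results.setdefault(metabolite, set()).add(taxon)
    return results

print(metabolites_linking("disease:D1", "pathway:P1"))
# {'metabolite:M1': {'taxon:A'}}
```

Each hop here corresponds to a different source database in practice, which is why consistent identifiers at every join matter more than the traversal logic itself.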

Recommendations for new researchers:

Pick your taxonomy reference early and stick with it. NCBI Taxonomy is the safest default.
Check multiple association databases. Disbiome and BugSigDB often have complementary coverage.
Always check provenance. “This taxon is associated with disease X” is only as good as the study behind it.
Bookmark HMDB and KEGG. You’ll use them constantly for metabolite and pathway context.
Use MicroMap (kgdev.graphomics.com/docs) if you want a single API that spans all of the above. Ask us for an API key at graphomics.com.

Next time: A deep dive into how alpha and beta diversity metrics actually work — intuition, math, and when to use each one.

Graphomics
