COCONUT Database Downloads

COCONUT offers several download options for obtaining the chemical structures of natural products in widely accepted, machine-readable formats. The download files are regenerated monthly and include all changes made to the database during the previous month.

Database Downloads

Postgres Database Dump

A complete database dump containing the following tables:

  • citables
  • citations
  • collection_molecule
  • collections
  • entries
  • geo_location_molecule
  • geo_locations
  • molecule_organism
  • molecule_related
  • molecules
  • organism_parts
  • organisms
  • properties
  • structures
  • taggables
  • tags

CSV Format Files

Two CSV formatted files are available:

1. Lite Format

Contains essential molecular information including:

  • Identifiers (id, identifier, name, iupac_name)
  • Structural representations (canonical_smiles, standard_inchi, standard_inchi_key)
  • Physical properties (molecular_weight, formula, etc.)
  • Molecular descriptors (alogp, topological_polar_surface_area, etc.)
  • Structural features (rotatable_bond_count, hydrogen_bond_acceptors, etc.)
  • Classification data (chemical_class, chemical_sub_class, etc.)

2. Full Format

  • Includes all columns from the Lite format
  • Additional data: organisms, collections, DOIs, synonyms, CAS registry numbers
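
As a minimal sketch of working with the Lite CSV, the snippet below streams rows with Python's standard csv module and keeps a few of the columns named above. The function name `iter_molecules`, the chosen column subset, and the sample rows are illustrative assumptions; the actual file name and the complete column list should be taken from the downloads page.

```python
import csv
import io

# A few columns named in the Lite format description; the real file has more.
WANTED = ("identifier", "name", "canonical_smiles", "molecular_weight")

def iter_molecules(handle, wanted=WANTED):
    """Yield one dict per CSV row, restricted to the wanted columns."""
    reader = csv.DictReader(handle)
    missing = [c for c in wanted if c not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"columns not found in CSV header: {missing}")
    for row in reader:
        record = {c: row[c] for c in wanted}
        # Convert the weight to float when present; map empty cells to None.
        if "molecular_weight" in record:
            mw = record["molecular_weight"]
            record["molecular_weight"] = float(mw) if mw else None
        yield record

# Made-up rows standing in for the downloaded file (not real COCONUT data).
sample = io.StringIO(
    "identifier,name,canonical_smiles,molecular_weight\n"
    "EXAMPLE-1,ethanol,CCO,46.07\n"
    "EXAMPLE-2,,C,\n"
)
molecules = list(iter_molecules(sample))
```

For a real download, the same function can be given a file opened with `open(path, newline="", encoding="utf-8")`; streaming row by row avoids loading the whole file into memory.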

SDF (Structure-Data File) Format

Generated using RDKit, these files provide molecular structures in a standard chemical file format:

1. 2D Coordinate Files

  • Lite: 2D coordinate data with COCONUT identifier
  • Full: 2D coordinates plus comprehensive molecular data including:
    • Structural identifiers
    • Physical properties
    • Molecular descriptors
    • Classification data
    • Biological sources
    • Literature references

2. 3D Coordinate Files

  • Single file containing 3D coordinates with corresponding COCONUT identifiers
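
The sketch below shows one way to read the data fields of an SDF file using only the Python standard library. It relies on the general SDF layout (a molblock ending in `M  END`, then `> <field>` data items, then a `$$$$` record delimiter). The helper names and the `identifier` field name in the sample are assumptions; the actual tag names should be checked in the downloaded file.

```python
import io

def iter_sdf_records(handle):
    """Yield (molblock, fields) for each record in an SDF stream."""
    lines = []
    for raw in handle:
        line = raw.rstrip("\n")
        if line.strip() == "$$$$":  # record delimiter
            yield _split_record(lines)
            lines = []
        else:
            lines.append(line)

def _split_record(lines):
    # Everything up to and including "M  END" is the molblock (coordinates).
    end = next(i for i, text in enumerate(lines) if text.startswith("M  END"))
    molblock = "\n".join(lines[: end + 1])
    fields, key = {}, None
    for line in lines[end + 1:]:
        if line.startswith(">"):
            # A data header looks like "> <field_name>".
            key = line[line.index("<") + 1 : line.rindex(">")]
            fields[key] = []
        elif key is not None and line.strip():
            fields[key].append(line)
        elif not line.strip():
            key = None  # a blank line closes the current data item
    return molblock, {k: "\n".join(v) for k, v in fields.items()}

# Hypothetical one-atom record; the field name "identifier" is an assumption.
sample = io.StringIO("\n".join([
    "EXAMPLE-1",
    "  sketch",
    "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
    "> <identifier>",
    "EXAMPLE-1",
    "",
    "$$$$",
    "",
]))
records = list(iter_sdf_records(sample))
```

When chemistry-aware parsing is needed (for example, to obtain molecule objects rather than raw text), RDKit's `Chem.SDMolSupplier` is the more usual route; the plain-text reader above is only meant for inspecting identifiers and data fields.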

Collection-Specific Downloads

Individual collection pages offer targeted SDF downloads containing:

  • 2D coordinate information (generated by RDKit)
  • Comprehensive molecular data, including:
    • Structural identifiers
    • Physical properties
    • Molecular descriptors
    • Classification data
    • Biological sources
    • Literature references

Understanding Molecule Count Differences

The number of molecules in a per-collection download may differ from the count shown on the search pages:

Context        Count (example)   Inclusion criteria
Search pages   12,759            Active molecules, excluding parents
Downloads      14,494            Active molecules, including parents that have no variants

This difference is demonstrated by the following SQL queries:

```sql
-- Search count: active molecules that are not parents
SELECT count(*)
FROM molecules m
JOIN collection_molecule cm ON m.id = cm.molecule_id
JOIN collections c ON cm.collection_id = c.id
WHERE c.title = 'Australian natural products'
  AND m.active = true
  AND m.is_parent = false;

-- Download count: active molecules that are not parents, plus parents without variants
SELECT count(*)
FROM molecules m
JOIN collection_molecule cm ON m.id = cm.molecule_id
JOIN collections c ON cm.collection_id = c.id
WHERE c.title = 'Australian natural products'
  AND m.active = true
  AND (m.is_parent = false OR (m.is_parent = true AND m.has_variants = false));
```
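
To make the effect of the two filters concrete, here is a small self-contained sketch that runs both conditions against an in-memory SQLite database. The toy schema, the 'Toy collection' title, and the four rows are invented for illustration and are not COCONUT data; booleans are stored as 0/1 because the sketch uses SQLite rather than Postgres.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE molecules (
    id INTEGER PRIMARY KEY, active INTEGER, is_parent INTEGER, has_variants INTEGER
);
CREATE TABLE collections (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE collection_molecule (collection_id INTEGER, molecule_id INTEGER);
INSERT INTO collections VALUES (1, 'Toy collection');
INSERT INTO molecules VALUES
    (1, 1, 0, 0),  -- active, not a parent: counted in both
    (2, 1, 1, 1),  -- active parent with variants: counted in neither
    (3, 1, 1, 0),  -- active parent without variants: downloads only
    (4, 0, 0, 0);  -- inactive: counted in neither
INSERT INTO collection_molecule VALUES (1, 1), (1, 2), (1, 3), (1, 4);
""")

BASE = """
SELECT count(*)
FROM molecules m
JOIN collection_molecule cm ON m.id = cm.molecule_id
JOIN collections c ON cm.collection_id = c.id
WHERE c.title = ? AND m.active = 1 AND {parent_filter}
"""
SEARCH_FILTER = "m.is_parent = 0"
DOWNLOAD_FILTER = "(m.is_parent = 0 OR (m.is_parent = 1 AND m.has_variants = 0))"

def count(parent_filter, title="Toy collection"):
    """Run the shared query with one of the two parent filters."""
    return conn.execute(BASE.format(parent_filter=parent_filter), (title,)).fetchone()[0]

search_count = count(SEARCH_FILTER)      # 1: only molecule 1
download_count = count(DOWNLOAD_FILTER)  # 2: molecules 1 and 3
```

The gap between the two counts equals the number of active parent molecules without variants in the collection, which is the only group the download filter adds.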

Important Notice

The COCONUT dataset is subject to specific terms of use and licensing restrictions. Please review and comply with all associated terms and conditions.