COCONUT Database Downloads
COCONUT provides several download options for obtaining chemical structures of natural products in widely accepted, machine-readable formats. The downloads are updated monthly, and each update incorporates all changes made to the database during the previous month.
Downloads Page
Postgres Database Dump
A complete database dump containing the following tables:
- citables
- citations
- collection_molecule
- collections
- entries
- geo_location_molecule
- geo_locations
- molecule_organism
- molecule_related
- molecules
- organism_parts
- organisms
- properties
- structures
- taggables
- tags
CSV Format Files
Two CSV formatted files are available:
1. Lite Format
   Contains essential molecular information, including:
   - Identifiers (id, identifier, name, iupac_name)
   - Structural representations (canonical_smiles, standard_inchi, standard_inchi_key)
   - Physical properties (molecular_weight, formula, etc.)
   - Molecular descriptors (alogp, topological_polar_surface_area, etc.)
   - Structural features (rotatable_bond_count, hydrogen_bond_acceptors, etc.)
   - Classification data (chemical_class, chemical_sub_class, etc.)
2. Full Format
   - Includes all columns from the Lite format
   - Additional data: organisms, collections, DOIs, synonyms, CAS registry numbers
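As a minimal sketch of working with the Lite CSV, the snippet below filters rows by `molecular_weight` using only the Python standard library. The column names come from the list above; the sample rows, the identifier values, and the helper name `load_light_molecules` are made up for illustration, and the real file name and column order are not specified here.

```python
import csv
import io

# Hypothetical two-row sample standing in for the Lite CSV download.
SAMPLE_CSV = """identifier,name,canonical_smiles,molecular_weight
CNP0000001,example_a,CCO,46.07
CNP0000002,example_b,c1ccccc1,78.11
"""


def load_light_molecules(handle, max_weight):
    """Return (identifier, canonical_smiles) pairs with molecular_weight <= max_weight."""
    selected = []
    for row in csv.DictReader(handle):
        weight_text = row.get("molecular_weight", "")
        if not weight_text:
            continue  # skip rows with no weight recorded
        if float(weight_text) <= max_weight:
            selected.append((row["identifier"], row["canonical_smiles"]))
    return selected


light = load_light_molecules(io.StringIO(SAMPLE_CSV), max_weight=50.0)
print(light)  # [('CNP0000001', 'CCO')]
```

For a real download, the `io.StringIO` object would be replaced by an open file handle, for example `open(path, newline="")`.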
SDF (Structure-Data File) Format
Generated using RDKit, these files provide molecular structures in standard chemical format:
1. 2D Coordinate Files
   - Lite: 2D coordinate data with COCONUT identifier
   - Full: 2D coordinates plus comprehensive molecular data, including:
     - Structural identifiers
     - Physical properties
     - Molecular descriptors
     - Classification data
     - Biological sources
     - Literature references
2. 3D Coordinate Files
   - Single file containing 3D coordinates with corresponding COCONUT identifiers
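To illustrate how the data fields in an SDF file can be read, here is a rough sketch that relies only on the general SDF layout: records end with `$$$$`, and each data item starts with a `> <name>` header followed by its value lines and a blank line. The field name `identifier`, the sample records, and the function name `parse_sdf_records` are assumptions for illustration; the actual tag names in the COCONUT files may differ. To read the structures themselves, a toolkit such as RDKit (for example `Chem.SDMolSupplier`) would be the usual choice.

```python
# Schematic two-record sample; the data field name "identifier" is an assumption.
SAMPLE_SDF = """mol_a
  sketch

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
M  END
>  <identifier>
CNP0000001

$$$$
mol_b
  sketch

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
M  END
>  <identifier>
CNP0000002

$$$$
"""


def parse_sdf_records(text):
    """Split SDF text on the record delimiter and collect each record's data fields."""
    records = []
    for block in text.split("$$$$"):
        if not block.strip():
            continue  # nothing after the final delimiter
        lines = block.strip("\n").splitlines()
        fields = {}
        i = 0
        while i < len(lines):
            line = lines[i]
            # A data header looks like "> <name>"; the molblock lines are skipped.
            if line.startswith(">") and "<" in line and ">" in line[1:]:
                name = line[line.index("<") + 1 : line.rindex(">")]
                i += 1
                values = []
                while i < len(lines) and lines[i].strip():
                    values.append(lines[i])
                    i += 1
                fields[name] = "\n".join(values)
            i += 1
        records.append(fields)
    return records


records = parse_sdf_records(SAMPLE_SDF)
print([r["identifier"] for r in records])  # ['CNP0000001', 'CNP0000002']
```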
Collection-Specific Downloads
Individual collection pages offer targeted SDF downloads containing:
- 2D coordinate information (generated by RDKit)
- Comprehensive molecular data, including:
  - Structural identifiers
  - Physical properties
  - Molecular descriptors
  - Classification data
  - Biological sources
  - Literature references
Understanding Molecule Count Differences
The number of molecules in collection-wise downloads may differ from counts shown on search pages:
| Context | Count Example | Inclusion Criteria |
|---|---|---|
| Search pages | 12,759 | Active molecules, excluding parents |
| Downloads | 14,494 | Active molecules, including parents without variants |
This difference is demonstrated by the following SQL queries:
-- Search count: active molecules that are not parents
SELECT count(*)
FROM molecules m
JOIN collection_molecule cm ON m.id = cm.molecule_id
JOIN collections c ON cm.collection_id = c.id
WHERE c.title = 'Australian natural products'
  AND m.active = true
  AND m.is_parent = false;
-- Download count: active molecules that are not parents, plus parents without variants
SELECT count(*)
FROM molecules m
JOIN collection_molecule cm ON m.id = cm.molecule_id
JOIN collections c ON cm.collection_id = c.id
WHERE c.title = 'Australian natural products'
  AND m.active = true
  AND (m.is_parent = false OR (m.is_parent = true AND m.has_variants = false));
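The effect of the two filters can be reproduced on a toy table. The sketch below uses an in-memory SQLite database with four made-up rows and drops the collection joins for brevity. It is not the COCONUT schema, only the three flag columns named in the queries above, with booleans stored as 0/1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE molecules (
    id INTEGER PRIMARY KEY,
    active INTEGER,
    is_parent INTEGER,
    has_variants INTEGER
);
INSERT INTO molecules VALUES
    (1, 1, 0, 0),  -- active, not a parent: counted by both queries
    (2, 1, 1, 0),  -- active parent without variants: counted for downloads only
    (3, 1, 1, 1),  -- active parent with variants: counted by neither
    (4, 0, 0, 0);  -- inactive: counted by neither
""")

# Same condition as the search query, without the collection joins.
search_count = conn.execute(
    "SELECT count(*) FROM molecules WHERE active = 1 AND is_parent = 0"
).fetchone()[0]

# Same condition as the download query, without the collection joins.
download_count = conn.execute(
    "SELECT count(*) FROM molecules "
    "WHERE active = 1 "
    "AND (is_parent = 0 OR (is_parent = 1 AND has_variants = 0))"
).fetchone()[0]

print(search_count, download_count)  # 1 2
```

The single extra row in the download count is the active parent that has no variants, which is the same category that accounts for the gap between 12,759 and 14,494 in the table above.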
Important Notice
The COCONUT dataset is subject to specific terms of use and licensing restrictions. Please review and comply with all associated terms and conditions.