EMICSS

  • Structured EM annotations.
  • Links on the EMDB entry pages.
  • Dynamic visualisation.
  • Available in the search engine and REST API.

EMICSS stands for EMDB Integration with Complexes, Structures and Sequences. The service provides weekly updated cross-reference information for all EMDB entries, comprising both entry-level annotations (e.g., publication, corresponding PDB and EMPIAR entries) and sample-level annotations (e.g., UniProt identifiers, AlphaFold DB models). The EMDB website uses the information from EMICSS to provide relevant links and annotations for individual entries and sample components. The search system also uses this data to enable advanced queries that would not otherwise be possible.

EMICSS is generated by an automated pipeline that produces XML files (one per EMDB entry) and TSV files (one per resource used by EMICSS). All EMICSS data is generated afresh with every weekly EMDB release. The structure and content of the EMICSS XML files are described by an XSD data model (available from https://ftp.ebi.ac.uk/pub/databases/em_ebi/emdb_related/emicss/emicss-schema/current/emdb_emicss.xsd). The EMICSS XML file for an EMDB entry can be accessed through the corresponding entry page on the EMDB website, either via the 'Links' tab or using the 'Download' dropdown menu. Sample-level annotations are also shown in the sample tab of every EMDB entry page. The EMICSS FTP area contains both the XML and TSV files. EMICSS can also be accessed through the EMDB annotation API.
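As a rough illustration of working with a per-entry XML file, the sketch below counts how often each element name occurs in a document. It deliberately assumes nothing about the EMICSS schema (the XSD remains the authoritative description), and the function names summarise_emicss_xml and local_name are hypothetical helpers introduced here, not part of any EMDB tooling.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Union


def local_name(tag: str) -> str:
    """Strip a namespace prefix of the form '{uri}name', if present."""
    return tag.rsplit("}", 1)[-1]


def summarise_emicss_xml(xml_data: Union[bytes, str]) -> Counter:
    """Count element names in an XML document without assuming its schema.

    xml_data is the content of a file such as emd_8117_emicss.xml,
    already read from disk or downloaded by the caller.
    """
    root = ET.fromstring(xml_data)
    # root.iter() walks the root element and all of its descendants.
    return Counter(local_name(element.tag) for element in root.iter())
```

Calling summarise_emicss_xml on the contents of a downloaded file gives a quick overview of which annotation elements an entry carries; for anything beyond exploration, the element names should be taken from the XSD data model.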

If you use EMICSS, please cite: A. Duraisamy, N. Fonseca, G. J. Kleywegt and A. Patwardhan, "EMICSS: Value-added annotations for EMDB entries", manuscript in preparation (2023).

All EMICSS data is free to download and use; we encourage resources that index or expose EMDB data to make use of it. If you have any questions or suggestions regarding EMICSS, please contact the EMDB helpdesk.

The table below shows the provenance of the various information items collected by EMICSS. Note that much of the data about individual molecules that have been modelled into the EM volume is obtained from the PDBe/UniProt SIFTS resource.

Information            Source
EMPIAR id(s)           EMPIAR
PDB id(s)              EMDB
Sample Weight          EMDB/PDBe
DOI                    EMDB/Europe PMC
PubMed id              EMDB/Europe PMC
PMC id                 EMDB/Europe PMC
ISSN                   EMDB
ORCID identifiers      Europe PMC
UniProt id(s)          EMDB/UniProt
PDBe-KB links          UniProt
Complex Portal id(s)   Complex Portal/UniProt
Gene Ontology terms    PDBe/UniProt/QuickGO/EMDB
InterPro mappings      PDBe/UniProt/InterPro/EMDB
Pfam domains           PDBe/UniProt/Pfam/EMDB
CATH domains           PDBe/UniProt
SCOP domains           PDBe/UniProt
SCOP2 domains          PDBe/UniProt
ChEMBL id(s)           PDBe CCD
ChEBI id(s)            PDBe CCD
DrugBank id(s)         PDBe CCD
AlphaFold DB links     UniProt/AlphaFold DB

Downloads

Directory Description
https://ftp.ebi.ac.uk/pub/databases/em_ebi/emdb_related/emicss/ EMICSS FTP area
https://ftp.ebi.ac.uk/pub/databases/em_ebi/emdb_related/emicss/emicss-schema/current/ EMICSS data model
https://ftp.ebi.ac.uk/pub/databases/em_ebi/emdb_related/emicss/entries/ EMICSS XML files for EMDB entries
https://ftp.ebi.ac.uk/pub/databases/em_ebi/emdb_related/emicss/entries.tar.gz All EMICSS XML files for EMDB entries as a single compressed archive (tar.gz)
https://ftp.ebi.ac.uk/pub/databases/em_ebi/emdb_related/emicss/resources/ EMICSS TSV files for external resources
https://ftp.ebi.ac.uk/pub/databases/em_ebi/emdb_related/emicss/resources.tar.gz All EMICSS TSV files for external resources as a single compressed archive (tar.gz)

Organisation of the per-entry information

For EMDB entries with a 4-digit identifier (e.g., EMD-8117), the directories are grouped by the first two digits (in the example, /81/). The next level of the directory tree consists of the entire 4-digit code. In this example, the full directory path is thus: https://ftp.ebi.ac.uk/pub/databases/em_ebi/emdb_related/emicss/entries/81/8117/. The file name in that directory is emd_8117_emicss.xml.

For EMDB entries with a 5-digit identifier (e.g., EMD-28754), there is an additional level. The first level consists of the first two digits (/28/), the second level of the third digit (/7/) and the lowest level of the entire 5-digit code. In this example, the full directory path is thus: https://ftp.ebi.ac.uk/pub/databases/em_ebi/emdb_related/emicss/entries/28/7/28754/. The file name in that directory is emd_28754_emicss.xml.
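The directory rules for 4-digit and 5-digit identifiers can be sketched as a small helper that builds the URL of an entry's EMICSS XML file. The function name emicss_xml_url is a hypothetical illustration, not an EMDB-provided API. Because only the 4-digit and 5-digit layouts are described here, the sketch rejects identifiers of any other length rather than guessing a layout for them.

```python
import re

# Base URL of the per-entry EMICSS XML files, taken from the Downloads table.
EMICSS_ENTRIES_BASE = (
    "https://ftp.ebi.ac.uk/pub/databases/em_ebi/emdb_related/emicss/entries"
)


def emicss_xml_url(emdb_id: str) -> str:
    """Build the EMICSS XML URL for an ID such as 'EMD-8117' or 'EMD-28754'.

    Accepts the identifier with or without the 'EMD-' prefix.
    """
    match = re.fullmatch(r"(?:EMD-)?(\d{4,5})", emdb_id.strip(), flags=re.IGNORECASE)
    if match is None:
        # Assumption: only the two documented identifier lengths are supported.
        raise ValueError(f"Expected a 4- or 5-digit EMDB identifier, got {emdb_id!r}")
    code = match.group(1)
    if len(code) == 4:
        # 4 digits: <first two digits>/<full code>/
        directory = f"{code[:2]}/{code}"
    else:
        # 5 digits: <first two digits>/<third digit>/<full code>/
        directory = f"{code[:2]}/{code[2]}/{code}"
    return f"{EMICSS_ENTRIES_BASE}/{directory}/emd_{code}_emicss.xml"
```

For the two examples above, emicss_xml_url("EMD-8117") ends in entries/81/8117/emd_8117_emicss.xml and emicss_xml_url("EMD-28754") ends in entries/28/7/28754/emd_28754_emicss.xml.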

Note: if you want to download the EMICSS XML files for all entries, it is significantly faster to download the tarball (see the table above). Note also that EMICSS mappings are transient: all EMICSS files are regenerated every week, so every file will differ from its version of a week earlier, even if the mappings themselves have not changed.
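For bulk processing, a locally downloaded copy of entries.tar.gz can be read without unpacking it to disk. The sketch below is a minimal example under two assumptions: the tarball has already been saved locally, and the per-entry files inside it keep the emd_<code>_emicss.xml naming described above. The internal directory layout of the archive is not assumed, and iter_emicss_xml is a hypothetical helper name.

```python
import tarfile
from typing import Iterator, Tuple


def iter_emicss_xml(tar_path: str) -> Iterator[Tuple[str, bytes]]:
    """Yield (member name, file content) for each EMICSS XML file in a tarball.

    tar_path is the local path of a downloaded entries.tar.gz.
    """
    with tarfile.open(tar_path, mode="r:gz") as archive:
        for member in archive:
            # Select regular files by the documented file-name suffix only.
            if member.isfile() and member.name.endswith("_emicss.xml"):
                handle = archive.extractfile(member)
                if handle is not None:
                    yield member.name, handle.read()
```

Since every file is regenerated weekly, a fresh copy of the tarball is needed for each release; comparing file contents or timestamps between weeks will not reveal which mappings actually changed.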

Statistics

The graph directly below provides information regarding the EMICSS coverage for the most recent release. Each bar indicates how many EMDB entries have one or more references to that resource (hover over a bar to see the exact count). The bottom graph tracks the development of this coverage over time.

Funding

The work on EMDB and EMICSS is funded by the Wellcome Trust (grant 212977/Z/18/Z) and EMBL-EBI.