
Suggested data repositories  


Where can I deposit my data?

We have organized a list of data repositories recommended by the following sources: NIDDK domain experts, Nature Scientific Data, PLOS One, NLM NIH Data Sharing Repositories, and Science. It is generally best practice to deposit data in a discipline-specific, community-recognized repository if one is available, or in an institutional or generalist repository (see the Generalist Repository Comparison Chart, developed by representatives of participating generalist repositories) if no suitable specialist repository is available.

By scientific disciplines

De-identified human clinical research data

    Researchers are encouraged to submit clinical trial data to ClinicalTrials.gov even when submission is not required. For studies that include human genomic and associated phenotypic data, consider the NIH database of Genotypes and Phenotypes (dbGaP). Another option is ICPSR, which hosts a variety of human data, including many demographic and social science studies. Information on uploading data to ICPSR can be found here. Before uploading, make sure the data are de-identified, follow all of your institutional IRB's requirements, and obtain its approval. For completed phase I-IV interventional studies, you can also share anonymized data at Vivli.


NIDDK-specific repositories

Repository Name RRID Description Type of Data Recommended By
Accelerating Medicines Partnership Type 2 Diabetes Knowledge Portal (AMP-T2D)
RRID:SCR_003743
   
Portal and database of DNA sequence, functional and epigenomic information, and clinical data from studies on type 2 diabetes, with analytic tools to analyze these data. Provides data and tools to promote understanding and treatment of type 2 diabetes and its complications. Used for identifying genetic biomarkers correlated with type 2 diabetes and for development of novel drugs for this disease.
Type of Data: Array, exome sequencing, whole genome sequencing data
Recommended By: NLM, NIDDK
NIDDK Central Repository
RRID:SCR_006542
   
NIDDK Central Repositories are two separate contract-funded components that work together to store data and samples from significant NIDDK-funded studies. The first component is the Biorepository, which gathers, stores, and distributes biological samples from studies. The Biorepository works with investigators in new and ongoing studies as a real-time storage facility for archival samples. The second component is the Data Repository, which gathers, stores, and distributes incremental or finished datasets from NIDDK-funded studies. The Data Repository helps active data coordinating centers prepare databases and incremental datasets for archiving and for carrying out restricted queries of stored databases. The Data Repository serves as Data Coordinating Center and website manager for the NIDDK Central Repositories website.
Type of Data: Genetic and other data collected in designated NIDDK-funded clinical studies
Recommended By: NLM, NIDDK
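
Each repository in these tables carries an RRID such as RRID:SCR_003743. When copying identifiers into a data management plan or manuscript, it can help to check that they are well formed. The following is a minimal sketch, not part of the original list: the function names are hypothetical, and the pattern covers only the "SCR_" plus six digits form that appears on this page. RRIDs issued for other resource types use different prefixes and are not handled.

```python
import re

# Pattern for the SCR-style RRIDs shown on this page, e.g. "RRID:SCR_003743".
# Assumption: only the "SCR_" + six-digit form is accepted; other RRID
# prefixes exist and would need their own patterns.
SCR_RRID = re.compile(r"^RRID:SCR_(\d{6})$")


def parse_scr_rrid(text):
    """Return the six-digit registry number as a string, or None if malformed."""
    match = SCR_RRID.match(text.strip())
    return match.group(1) if match else None


def malformed_rrids(candidates):
    """Return the entries that do not match the SCR-style pattern."""
    return [c for c in candidates if parse_scr_rrid(c) is None]


if __name__ == "__main__":
    ids = ["RRID:SCR_003743", "RRID:SCR_006542", "SCR_006542", "RRID:SCR_65"]
    print(malformed_rrids(ids))
```

Surrounding whitespace is ignored, but a missing "RRID:" prefix or a short number is reported as malformed, so copy-and-paste slips are caught before submission.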



NIH-supported repositories (for complete and current list of NIH repositories click here)

Repository Name RRID Description Type of Data Recommended By
Accelerating Medicines Partnership Type 2 Diabetes Knowledge Portal (AMP-T2D)
RRID:SCR_003743
   
Portal and database of DNA sequence, functional and epigenomic information, and clinical data from studies on type 2 diabetes, with analytic tools to analyze these data. Provides data and tools to promote understanding and treatment of type 2 diabetes and its complications. Used for identifying genetic biomarkers correlated with type 2 diabetes and for development of novel drugs for this disease.
Type of Data: Array, exome sequencing, whole genome sequencing data
Recommended By: NLM, NIDDK
Analysis, Visualization, and Informatics Lab-space (AnVIL)
RRID:SCR_017469
  
Portal to facilitate integration and computing on and across large datasets generated by NHGRI programs, as well as initiatives funded by the National Institutes of Health or by other agencies that support human genomics research. Resource for the genomic scientific community that leverages cloud-based infrastructure for democratizing genomic data access, sharing, and computing across large genomic and genomic-related data sets. Component of a federated data ecosystem, expected to collaborate and integrate with other genomic data resources through adoption of FAIR (Findable, Accessible, Interoperable, Reusable) principles, as their specifications emerge from the scientific community. Will provide a collaborative environment where datasets and analysis workflows can be shared within a consortium and be prepared for public release to the broad scientific community through AnVIL user interfaces.
Type of Data: Genomic data
Recommended By: NLM, NIDDK
Cancer Imaging Archive (TCIA)
RRID:SCR_008927
    
Archive of medical images of cancer accessible for public download. All images are stored in DICOM file format and organized as Collections, typically patients related by a common disease (e.g. lung cancer), image modality (MRI, CT, etc.) or research focus. Neuroimaging data sets include clinical outcomes, pathology, and genomics in addition to DICOM images. Proposals to submit data are welcome.
Type of Data: Primary DICOM image datasets from cancer patients and analysis datasets
Recommended By: NLM, NIDDK, PLoS ONE, Sci Data
Cancer Nanotechnology Laboratory (caNanoLab)
RRID:SCR_013717
   
Data sharing portal designed to facilitate information sharing across the international biomedical nanotechnology research community to expedite and validate use of nanotechnology in biomedicine.
Type of Data: Physico-chemical, in vitro and in vivo assay data that characterize nanomaterials. *This is a curated resource which may not accept direct submission of data. Contact the database directly for further information.
Recommended By: NLM, PLoS ONE, Sci Data
Cell Image Library (CIL)
RRID:SCR_003510
    
Freely accessible, public repository of vetted and annotated microscopic images, videos, and animations of cells from a variety of organisms, showcasing cell architecture, intracellular functionalities, and both normal and abnormal processes. Explore by Cell Process, Cell Component, Cell Type or Organism. The library includes images acquired from historical and modern collections, publications, and by recruitment.
Type of Data: Microscopic imaging data
Recommended By: NLM, NIDDK
ClinicalTrials.gov
RRID:SCR_002309
    
Registry and results database of federally and privately supported clinical trials conducted in the United States and around the world. Provides information about the purpose of each trial, who may participate, locations, and phone numbers for more details. This information should be used in conjunction with advice from health care professionals. Offers information for locating federally and privately supported clinical trials for a wide range of diseases and conditions. A clinical trial is a research study in human volunteers to answer specific health questions. Interventional trials determine whether experimental treatments or new ways of using known therapies are safe and effective under controlled environments. Observational trials address health issues in large groups of people or populations in natural settings. ClinicalTrials.gov contains trials sponsored by the National Institutes of Health, other federal agencies, and private industry. Studies listed in the database are conducted in all 50 states and in 178 countries.
Type of Data: Clinical trial
Recommended By: NLM, NIDDK, Sci Data
DNA DataBank of Japan (DDBJ)
RRID:SCR_002359
    
Maintains and provides archival, retrieval and analytical resources for biological information. The central DDBJ resource consists of public, open-access nucleotide sequence databases including raw sequence reads, assembly information and functional annotation. Database content is exchanged with EBI and NCBI within the framework of the International Nucleotide Sequence Database Collaboration (INSDC). In 2011, DDBJ launched two new resources: DDBJ Omics Archive (DOR) and BioProject. DOR is an archival database of functional genomics data generated by microarray and highly parallel new generation sequencers. Data are exchanged between ArrayExpress at EBI and DOR in the common MAGE-TAB format. BioProject provides an organizational framework to access metadata about research projects and data from projects that are deposited into different databases.
Type of Data: Gene sequence
Recommended By: NLM, NIDDK, Science, PLoS ONE, Sci Data
Database of Interacting Proteins (DIP)
RRID:SCR_003167
    
Database to catalog experimentally determined interactions between proteins, combining information from a variety of sources to create a single, consistent set of protein-protein interactions that can be downloaded in a variety of formats. The data were curated both manually and automatically, using computational approaches that utilize the knowledge about the protein-protein interaction networks extracted from the most reliable, core subset of the DIP data. Because the reliability of experimental evidence varies widely, methods of quality assessment have been developed and utilized to identify the most reliable subset of the interactions. This CORE set can be used as a reference when evaluating the reliability of high-throughput protein-protein interaction data sets, for development of prediction methods, as well as in studies of the properties of protein interaction networks. Tools are available to analyze, visualize and integrate users' own experimental data with the information about protein-protein interactions available in the DIP database. The DIP database lists protein pairs that are known to interact with each other, meaning that two amino acid chains were experimentally identified to bind to each other. The database lists such pairs to aid those studying a particular protein-protein interaction, those investigating entire regulatory and signaling pathways, and those studying the organization and complexity of the protein interaction network at the cellular level. Registration is required to gain access to most of the DIP features. Registration is free to members of the academic community. Trial accounts for commercial users are also available.
Type of Data: Protein interaction data
Recommended By: NLM, NIDDK, PLoS ONE, Sci Data
European Nucleotide Archive (ENA)
RRID:SCR_006515
    
Public archive providing a comprehensive record of the world's nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation. All submitted data, once public, will be exchanged with the NCBI and DDBJ as part of the INSDC data exchange agreement. The European Nucleotide Archive (ENA) captures and presents information relating to experimental workflows that are based around nucleotide sequencing. A typical workflow includes the isolation and preparation of material for sequencing, a run of a sequencing machine in which sequencing data are produced, and a subsequent bioinformatic analysis pipeline. ENA records this information in a data model that covers input information (sample, experimental setup, machine configuration), output machine data (sequence traces, reads and quality scores) and interpreted information (assembly, mapping, functional annotation). Data arrive at ENA from a variety of sources including submissions of raw data, assembled sequences and annotation from small-scale sequencing efforts, data provision from the major European sequencing centers, and routine and comprehensive exchange with their partners in the International Nucleotide Sequence Database Collaboration (INSDC). Provision of nucleotide sequence data to ENA or its INSDC partners has become a central and mandatory step in the dissemination of research findings to the scientific community. ENA works with publishers of scientific literature and funding bodies to ensure compliance with these principles and to provide optimal submission systems and data access tools that work seamlessly with the published literature. ENA is made up of a number of distinct databases that include the EMBL Nucleotide Sequence Database (EMBL-Bank), the newly established Sequence Read Archive (SRA) and the Trace Archive. The main tool for downloading ENA data is the ENA Browser, which is available through REST URLs for easy programmatic use. All ENA data are available through the ENA Browser. Note: EMBL Nucleotide Sequence Database (EMBL-Bank) is entirely included within this resource.
Type of Data: Sequence reads, genome assemblies, targeted assembled and annotated sequences
Recommended By: NLM, NIDDK, Science, PLoS ONE, Sci Data
FlyBase
RRID:SCR_006549
   
Database of Drosophila genetic and genomic information with information about stock collections and fly genetic tools. Gene Ontology (GO) terms are used to describe three attributes of wild-type gene products: their molecular function, the biological processes in which they play a role, and their subcellular location. Additionally, FlyBase accepts data submissions. FlyBase can be searched for genes, alleles, aberrations and other genetic objects, phenotypes, sequences, stocks, images and movies, controlled terms, and Drosophila researchers using the tools available from the "Tools" drop-down menu in the Navigation bar.
Type of Data: Proteomics data, microarrays, tiling BACs
Recommended By: NLM, NIDDK
GenBank
RRID:SCR_002760
    
NIH genetic sequence database that provides an annotated collection of all publicly available DNA sequences for almost 280,000 formally described species (Jan 2014). These sequences are obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole-genome shotgun (WGS) and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and GenBank staff assign accession numbers upon data receipt. It is part of the International Nucleotide Sequence Database Collaboration, and daily data exchange with the European Nucleotide Archive (ENA) and DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. GenBank is accessible through the NCBI Entrez retrieval system, which integrates data from major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP.
Type of Data: Gene sequence
Recommended By: NLM, NIDDK, Science, PLoS ONE, Sci Data
Gene Expression Omnibus (GEO)
RRID:SCR_005012
    
Functional genomics data repository supporting MIAME-compliant data submissions. Includes microarray-based experiments measuring the abundance of mRNA, genomic DNA, and protein molecules, as well as non-array-based technologies such as serial analysis of gene expression (SAGE) and mass spectrometry proteomic technology. Array- and sequence-based data are accepted. Collection of curated gene expression DataSets, as well as original Series and Platform records. The database can be searched using keywords, organism, DataSet type and authors. DataSet records contain additional resources including cluster tools and differential expression queries.
Type of Data: Microarray; next-generation sequencing (NGS)
Recommended By: NLM, NIDDK, Science, PLoS ONE, Sci Data
Metabolomics Workbench
RRID:SCR_013794
    
Repository for metabolomics data and metadata which provides analysis tools and access to various resources. NIH grantees may upload data and general users can search the metabolomics database. Provides protocols for sample preparation and analysis, information about the NIH Metabolomics Program, data sharing guidelines, funding opportunities, services offered by its Regional Comprehensive Metabolomics Resource Cores (RCMRCs), and training workshops.
Type of Data: Metabolomics data and metadata
Recommended By: NLM, NIDDK, PLoS ONE
NCBI Sequence Read Archive (SRA)
RRID:SCR_004891
    
Repository of raw sequencing data from next-generation sequencing platforms including Roche 454 GS System, Illumina Genome Analyzer, Applied Biosystems SOLiD System, Helicos Heliscope, Complete Genomics, and Pacific Biosciences SMRT. In addition to raw sequence data, SRA now stores alignment information in the form of read placements on a reference sequence. Data submissions are welcome. Archive of high-throughput sequencing data, part of the international partnership of archives (INSDC) at NCBI, the European Bioinformatics Institute and the DNA Data Bank of Japan. Data submitted to any of these three organizations are shared among them.
Type of Data: NGS data only; Multiple sequence alignment
Recommended By: NLM, PLoS ONE, Sci Data
NCBI database of Genotypes and Phenotypes (dbGaP)
RRID:SCR_002709
    
Database developed to archive and distribute clinical data and results from studies that have investigated interaction of genotype and phenotype in humans. Database to archive and distribute results of studies including genome-wide association studies, medical sequencing, molecular diagnostic assays, and association between genotype and non-clinical traits.
Type of Data: Genotyping and phenotyping information in human subjects
Recommended By: NLM, NIDDK, PLoS ONE, Sci Data
NIDDK Central Repository
RRID:SCR_006542
   
NIDDK Central Repositories are two separate contract-funded components that work together to store data and samples from significant NIDDK-funded studies. The first component is the Biorepository, which gathers, stores, and distributes biological samples from studies. The Biorepository works with investigators in new and ongoing studies as a real-time storage facility for archival samples. The second component is the Data Repository, which gathers, stores, and distributes incremental or finished datasets from NIDDK-funded studies. The Data Repository helps active data coordinating centers prepare databases and incremental datasets for archiving and for carrying out restricted queries of stored databases. The Data Repository serves as Data Coordinating Center and website manager for the NIDDK Central Repositories website.
Type of Data: Genetic and other data collected in designated NIDDK-funded clinical studies
Recommended By: NLM, NIDDK
NIMH Data Archive
RRID:SCR_004434
     
The National Institute of Mental Health Data Archive (NDA) makes available human subjects data collected from hundreds of research projects across many scientific domains. Research data repository for data sharing and collaboration among investigators. Used to accelerate scientific discovery through data sharing across all of mental health and other research communities, data harmonization and reporting of research results. Infrastructure created by National Database for Autism Research (NDAR), Research Domain Criteria Database (RDoCdb), National Database for Clinical Trials related to Mental Illness (NDCT), and NIH Pediatric MRI Repository (PedsMRI).
Type of Data: Imaging data; neurosignal recordings data; autism spectrum disorder-relevant data (all data types); clinical data, genomic data, phenotype data
Recommended By: NLM, NIDDK, PLoS ONE
National Addiction and HIV Data Archive Program (NAHDAP)
RRID:SCR_000636
    
Archive that acquires, preserves and disseminates data relevant to drug addiction and HIV research. Collection of data on drug addiction and HIV infection in the United States. Most of the datasets are raw data from surveys, interviews, and administrative records. They were originally gathered in research projects and for administrative purposes. Some datasets have been used in published studies, and bibliographies of these studies are available. Provides access to research data and technical assistance for data depositors. Provides e-workshops on data preparation and data systems.
Type of Data: Wide variety of data related to drug abuse
Recommended By: NLM, NIDDK, PLoS ONE
NeuroMorpho.Org
RRID:SCR_002145
     
Centrally curated inventory of digitally reconstructed neurons associated with peer-reviewed publications that contains some of the most complete axonal arborizations digitally available in the community. Each neuron is represented by a unique identifier, general information (metadata), the original and standardized ASCII files of the digital morphological reconstruction, and a set of morphometric features. It contains contributions from over 100 laboratories worldwide and is continuously updated as new morphological reconstructions are collected, published, and shared. Users may browse by species, brain region, cell type or lab name. Users can also download morphological reconstructions for research and analysis. Deposition and distribution of reconstruction files ultimately prevents data loss. Centralized curation and annotation aims at minimizing the effort required by data owners while ensuring a unified format. It also provides a one-stop entry point for all available reconstructions, thus maximizing data visibility and impact.
Type of Data: 3D neuronal reconstructions and associated metadata
Recommended By: NLM, NIDDK, PLoS ONE, Sci Data
OpenNeuro
RRID:SCR_005031
    
Open platform for analyzing and sharing neuroimaging data from human brain imaging research studies. Brain Imaging Data Structure (BIDS) compliant database. Formerly known as OpenfMRI. Data archive to hold magnetic resonance imaging data. Platform for sharing MRI, MEG, EEG, iEEG, and ECoG data.
Type of Data: Neuroscience research data
Recommended By: NLM, NIH BRAIN Initiative, NIDDK, PLoS ONE, Sci Data
PeptideAtlas
RRID:SCR_006783
    
Multi-organism, publicly accessible compendium of peptides identified in a large set of tandem mass spectrometry proteomics experiments. Mass spectrometer output files are collected for human, mouse, yeast, and several other organisms, and searched using the latest search engines and protein sequences. All results of sequence and spectral library searching are subsequently processed through the Trans Proteomic Pipeline to derive a probability of correct identification for all results in a uniform manner to ensure a high quality database, along with false discovery rates at the whole atlas level. The raw data, search results, and full builds can be downloaded for other uses. All results of sequence searching are processed through PeptideProphet to derive a probability of correct identification for all results in a uniform manner, ensuring a high quality database. All peptides are mapped to Ensembl and can be viewed as custom tracks on the Ensembl genome browser. The long term goal of the project is full annotation of eukaryotic genomes through a thorough validation of expressed proteins. The PeptideAtlas provides a method and a framework to accommodate proteome information coming from high-throughput proteomics technologies. The online database administers experimental data in the public domain. You are encouraged to contribute to the database.
Type of Data: Tandem mass spectrometry proteomics data of peptides
Recommended By: NLM, NIDDK, PLoS ONE, Sci Data
PhysioNet
RRID:SCR_007345
    
Resource for dissemination and exchange of recorded biomedical signals and open-source software for analyzing them. Provides facilities for cooperative analysis of data and evaluation of proposed new algorithms. Provides free electronic access to PhysioBank data and PhysioToolkit software. Offers service and training via online tutorials to assist users at entry and more advanced levels. In cooperation with the annual Computing in Cardiology conference, PhysioNet hosts a series of challenges in which researchers and students address unsolved problems of clinical or basic scientific interest using data and software provided by PhysioNet. All data included in PhysioBank, and all software included in PhysioToolkit, are carefully reviewed. Researchers are further invited to contribute data and software for review and possible inclusion in PhysioBank and PhysioToolkit. Please review guidelines before submitting material.
Type of Data: Physiologic data
Recommended By: NLM, NIDDK, PLoS ONE
Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB)
RRID:SCR_012820
   
Collection of structural data of biological macromolecules. Database of information about 3D structures of large biological molecules, including proteins and nucleic acids. Users can perform queries on data and analyze and visualize results.
Type of Data: Three-dimensional (3D) biomolecular structure information, protein structure data, nucleic acids structure data
Recommended By: NLM, NIDDK
SPARC Project
RRID:SCR_017041
    
The SPARC data repository as of 2023 is an open data repository developed as part of the NIH SPARC initiative and has been used by SPARC-funded investigator groups to curate and publish high quality datasets related to the autonomic nervous system. As of August 2022, SPARC is accepting datasets from investigators that are not funded through the NIH SPARC program. The NIH Common Fund's Stimulating Peripheral Activity to Relieve Conditions (SPARC) program aims to transform our understanding of nerve-organ interactions and ultimately advance the neuromodulation field toward precise treatment of diseases and conditions for which conventional therapies fall short.
Type of Data: All research data. Multi-modal data is submitted via the SPARC Dataset Structure.
Recommended By: NIDDK, NLM
The Immunology Database and Analysis Portal (ImmPort)
RRID:SCR_012804
    
Data sharing repository of clinical trials, associated mechanistic studies, and other basic and applied immunology research programs. Platform to store, analyze, and exchange datasets for immune mediated diseases. Data supplied by NIAID/DAIT funded investigators and genomic, proteomic, and other data relevant to research of these programs extracted from public databases. Provides data analysis tools and immunology focused ontology to advance research in basic and clinical immunology.
Type of Data: Immunology research data including experimental data and clinical trial data
Recommended By: NLM, NIDDK, PLoS ONE, Sci Data
UniProtKB
RRID:SCR_004426
    
Central repository for collection of functional information on proteins, with accurate and consistent annotation. In addition to capturing core data mandatory for each UniProtKB entry (mainly, the amino acid sequence, protein name or description, taxonomic data and citation information), as much annotation information as possible is added. This includes widely accepted biological ontologies, classifications and cross-references, and experimental and computational data. The UniProt Knowledgebase consists of two sections, UniProtKB/Swiss-Prot and UniProtKB/TrEMBL. UniProtKB/Swiss-Prot (reviewed) is a high quality manually annotated and non-redundant protein sequence database which brings together experimental results, computed features, and scientific conclusions. UniProtKB/TrEMBL (unreviewed) contains protein sequences associated with computationally generated annotation and large-scale functional characterization that await full manual annotation. Users may browse by taxonomy, keyword, gene ontology, enzyme class or pathway.
Type of Data: Amino acid sequence and annotation
Recommended By: NLM, Science, PLoS ONE, Sci Data
dbSNP
RRID:SCR_002338
    
Central repository for both single base nucleotide substitutions and short deletion and insertion polymorphisms. Distinguishes a report of how to assay a SNP from the use of that SNP with individuals and populations. This separation simplifies some issues of data representation. However, initial reports describing how to assay a SNP will often be accompanied by SNP experiments measuring allele occurrence in individuals and populations. The community can contribute to this resource.
Type of Data: Simple genetic polymorphisms or structural variations
Recommended By: NLM, NIDDK, PLoS ONE, Sci Data
dbVar
RRID:SCR_003219
    
Structural variation database designed to store data on variant DNA >= 1 bp in size from all organisms. Associations of defined variants with phenotype information are also provided. Users can browse data containing the number of variant calls from each study, and filter studies by organism, study type, method and genomic variant. Organisms include human, mouse, cattle and several additional animals.
Type of Data: Simple genetic polymorphisms or structural variations
Recommended By: NLM, NIDDK, PLoS ONE, Sci Data



Institutional repository

Does your institution have a repository for managing and publishing your data? Usually, these services are provided by the library.
The Directory of Open Access Repositories (OpenDOAR) lists institutional repositories.



All research data types

Repository Name RRID Description Type of Data Recommended By
Dataverse Network Project
RRID:SCR_001997
      
Project portal for publishing, citing, sharing and discovering research data. Software, protocols, and community connections for creating research data repositories that automate professional archival practices, guarantee long term preservation, and enable researchers to share, retain control of, and receive web visibility and formal academic citations for their data contributions. Researchers, data authors, publishers, data distributors, and affiliated institutions all receive appropriate credit. Hosts multiple dataverses. Each dataverse contains studies or collections of studies, and each study contains cataloging information that describes the data plus the actual data files and complementary files. Data related to social sciences, health, medicine, humanities or other sciences with an emphasis on human behavior are uploaded to the IQSS Dataverse Network (Harvard). You can create your own dataverse for free and start adding studies for your data files and complementary material (documents, software, etc.). You may install your own Dataverse Network for your university or organization.
Type of Data: All research data
Recommended By: NIDDK, PLoS ONE, Sci Data
Dryad Digital Repository
RRID:SCR_005910
      
International, curated, digital repository that makes the data underlying scientific publications discoverable, freely reusable, and citable, particularly data for which no specialized repository exists. Provides the infrastructure for, and promotes the re-use of, data underlying the scholarly literature. Governed by a nonprofit membership organization. Membership is open to any stakeholder organization, including but not limited to journals, scientific societies, publishers, research institutions, libraries, and funding organizations. Most data are associated with peer-reviewed articles, although data associated with non-peer reviewed publications from reputable academic sources, such as dissertations, are also accepted. Used to validate published findings, explore new analysis methodologies, repurpose data for research questions unanticipated by the original authors, and perform synthetic studies. The UC system is a member organization of the Dryad general subject data repository.
Type of Data: All research data. Most types of files can be submitted (e.g., text, spreadsheets, video, photographs, code) including compressed archives of multiple files. Dryad encourages and accepts quality research data to be published and preserved, regardless of whether they are associated with a journal article, book chapter, or other publication.
Recommended By: NIDDK, Science, PLoS ONE, Sci Data
FigShare
RRID:SCR_004328
     
Repository for all data, figures, theses, publications, posters, presentations, filesets, videos, datasets, and negative data in a citable, shareable and discoverable manner with Digital Object Identifiers. Allows users to upload any file format, which is made viewable in the browser so that figures, datasets, media, papers, posters, presentations and filesets can be disseminated in a way that the current scholarly publishing model does not allow. Features integration with ORCID and Symplectic Elements, can import items from GitHub, and is a source tracked by Altmetric.com. Figshare gives users unlimited public space and 1GB of private storage space for free. Data are digitally preserved by CLOCKSS. Supported by Digital Science, a division of Macmillan Publishers Limited, as a community-based, open science project that retains its autonomy.
Type of Data: All research data
Recommended By: NIDDK, PLoS ONE, Sci Data
Mendeley Data
RRID:SCR_015671
      
Cloud-based data repository for storing, publishing and accessing scientific data. Mendeley Data creates a permanent location and issues FORCE11-compliant citations for uploaded data.
Type of Data: All research data
Recommended By: NIDDK
NIH Figshare Archive
RRID:SCR_017580
    
Repository to make datasets resulting from NIH-funded research more accessible, citable, shareable, and discoverable. Submitted data are reviewed before publication to ensure that there is no personally identifiable information in the data or metadata and that they are in line with the FAIR (Findable, Accessible, Interoperable, and Reusable) principles. Data published on Figshare are assigned a persistent, citable DOI (Digital Object Identifier) and are discoverable in Google, Google Scholar, Google Dataset Search, and more. The archive was completed in July 2020. Researchers can continue to share NIH-funded data and other research products on figshare.com.
All research data, datasets and spreadsheet data, video, audio, code (NIH-funded)
NIDDK
SPARC Project
RRID:SCR_017041
    
As of 2023, the SPARC data repository is an open data repository developed as part of the NIH SPARC initiative; SPARC-funded investigator groups use it to curate and publish high-quality datasets related to the autonomic nervous system. As of August 2022, SPARC accepts datasets from investigators who are not funded through the NIH SPARC program. The NIH Common Fund's Stimulating Peripheral Activity to Relieve Conditions (SPARC) program aims to transform understanding of nerve-organ interactions and ultimately advance the neuromodulation field toward precise treatment of diseases and conditions for which conventional therapies fall short.
All research data. Multi-modal data are submitted via the SPARC Dataset Structure.
NIDDK, NLM
ZENODO
RRID:SCR_004129
      
Repository for all research outputs from across all fields of science in any file format, including both positive and negative results. Zenodo assigns every publicly available upload a Digital Object Identifier (DOI) to make the upload easily and uniquely citable. It supports harvesting of all content via the OAI-PMH protocol, promotes peer-reviewed openly accessible research, and curates uploads. Zenodo allows users to create their own collection and accept or reject all uploads to it. Uploads can be made under a multitude of different licenses and access levels.
Multidisciplinary research results including text, spreadsheet, audio, video, images, source code
NIDDK, PLoS ONE, Sci Data
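
The ZENODO entry mentions harvesting via the OAI-PMH protocol. As a rough illustration, the sketch below only builds OAI-PMH request URLs; it does not contact any server. The verbs (`Identify`, `ListRecords`) and the `metadataPrefix=oai_dc` argument come from the OAI-PMH specification, while the endpoint URL and the helper name `build_oai_url` are assumptions made for this example and should be checked against the repository's own documentation.

```python
from urllib.parse import urlencode

# Assumed endpoint for illustration only; confirm it in the repository's docs.
ASSUMED_OAI_ENDPOINT = "https://zenodo.org/oai2d"


def build_oai_url(verb, endpoint=ASSUMED_OAI_ENDPOINT, **arguments):
    """Return an OAI-PMH request URL for the given verb and arguments.

    OAI-PMH requests are plain HTTP requests whose query string carries
    a 'verb' plus verb-specific arguments such as 'metadataPrefix'.
    """
    query = {"verb": verb}
    query.update(arguments)
    return f"{endpoint}?{urlencode(query)}"


# Ask the repository to describe itself.
print(build_oai_url("Identify"))

# Request records in unqualified Dublin Core, the format every
# OAI-PMH repository is required to support.
print(build_oai_url("ListRecords", metadataPrefix="oai_dc"))
```

The resulting URLs can be fetched with any HTTP client; large result sets are paged with a `resumptionToken`, which this sketch does not handle.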



Chemistry and chemical biology and biochemistry

Repository Name RRID Description Type of Data Recommended By
Cancer Nanotechnology Laboratory (caNanoLab)
RRID:SCR_013717
   
Data sharing portal designed to facilitate information sharing across the international biomedical nanotechnology research community to expedite and validate the use of nanotechnology in biomedicine.
Physico-chemical, in vitro and in vivo assay data that characterize nanomaterials. *This is a curated resource which may not accept direct submission of data. Contact the database directly for further information.
NLM, PLoS ONE, Sci Data
ChEMBL
RRID:SCR_014042
   
Collection of bioactive drug-like small molecules that contains 2D structures, calculated properties and abstracted bioactivities. Used for drug discovery and chemical biology research. Clinical progress of new compounds is continuously integrated into the database.
Bioactivity data containing information manually extracted from the medicinal chemistry literature. *This is a curated resource which may not accept direct submission of data. Contact the database directly for further information.
NIDDK, Sci Data
Kinetic Models of Biological Systems (KiMoSys)
RRID:SCR_017423
   
Web application for quantitative KInetic MOdels of biological SYStems. The platform includes a public data repository of relevant published measurements, including metabolite concentrations, flux data, and enzyme measurements, and tools for building ODE-based kinetic models. Designed to search, exchange and disseminate experimental data and associated kinetic models for the systems modeling community.
Kinetic models
NIDDK, PLoS ONE
Mass spectrometry Interactive Virtual Environment (MassIVE)
RRID:SCR_013665
    
Mass spectrometry Interactive Virtual Environment (MassIVE) is a community resource developed by the NIH-funded Center for Computational Mass Spectrometry to promote the global, free exchange of mass spectrometry data. Data repository for proteomics data.
Mass spectrometry data
NIDDK, PLoS ONE
PubChem Substance
RRID:SCR_004742
    
As one of three primary databases of PubChem (Pcsubstance, Pccompound, and PCBioAssay), PubChem Substance Database contains descriptions of chemical samples, from a variety of sources, and links to PubMed citations, protein 3D structures, and biological screening results that are available in PubChem BioAssay. If the contents of a chemical sample are known, the description includes links to PubChem Compound. A PubChem FTP is available and new data are accepted into the repository. Pcsubstance contains more than 81 million records (2011).
Community-provided information about chemical entities
NIDDK, PLoS ONE, Sci Data
STRENDA
RRID:SCR_017422
    
Storage and search platform supported by the Beilstein-Institut that incorporates the STRENDA Guidelines. For authors preparing manuscripts that contain functional enzymology data, STRENDA DB provides a means to ensure that data sets are complete and valid before they are submitted to a journal.
Functional enzymology data
NIDDK, PLoS ONE, Sci Data



Clinical Research

Repository Name RRID Description Type of Data Recommended By
NCBI database of Genotypes and Phenotypes (dbGaP)
RRID:SCR_002709
    
Database developed to archive and distribute clinical data and results from studies that have investigated the interaction of genotype and phenotype in humans. Archives and distributes results of studies including genome-wide association studies, medical sequencing, molecular diagnostic assays, and association between genotype and non-clinical traits.
Genotyping and phenotyping information in human subjects
NLM, NIDDK, PLoS ONE, Sci Data
Vivli
RRID:SCR_018080
     
Independent, non-profit organization that has developed a global data-sharing and analytics platform to promote, coordinate, and facilitate scientific sharing and reuse of clinical research data through the creation and implementation of a sustainable global data-sharing enterprise. Its focus is on sharing individual participant-level data from completed clinical trials. Users can search listed studies, request data sets from data contributors, aggregate data, or share data of their own. Vivli (Center for Clinical Research Data) is launching a portal to share participant-level data from COVID trials.
Human subject clinical research data; individual participant-level data from completed clinical trials



Cytometry and Immunology

Repository Name RRID Description Type of Data Recommended By
FlowRepository
RRID:SCR_013779
    
A database of flow cytometry experiments where users can query and download data collected and annotated according to the MIFlowCyt data standard.
Flow cytometry data
NIDDK, PLoS ONE, Sci Data
The Immunology Database and Analysis Portal (ImmPort)
RRID:SCR_012804
    
Data sharing repository of clinical trials, associated mechanistic studies, and other basic and applied immunology research programs. Platform to store, analyze, and exchange datasets for immune-mediated diseases. Data are supplied by NIAID/DAIT-funded investigators, and genomic, proteomic, and other data relevant to the research of these programs are extracted from public databases. Provides data analysis tools and an immunology-focused ontology to advance research in basic and clinical immunology.
Immunology research data including experimental data and clinical trial data
NLM, NIDDK, PLoS ONE, Sci Data



Functional Genomics

Repository Name RRID Description Type of Data Recommended By
ArrayExpress
RRID:SCR_002964
    
International functional genomics data collection generated from microarray or next-generation sequencing (NGS) platforms. Repository of functional genomics data supporting publications. Provides gene expression data for reuse to the research community, where they can be queried and downloaded. Integrated with the Gene Expression Atlas and the sequence databases at the European Bioinformatics Institute. Contains a subset of curated and re-annotated Archive data which can be queried for individual gene expression under different biological conditions across experiments. Data are collected to MIAME and MINSEQE standards. Data are submitted by users or are imported directly from the NCBI Gene Expression Omnibus.
Microarray; next-generation sequencing (NGS)
NIDDK, Science, PLoS ONE, Sci Data
Database of Interacting Proteins (DIP)
RRID:SCR_003167
    
Database to catalog experimentally determined interactions between proteins, combining information from a variety of sources to create a single, consistent set of protein-protein interactions that can be downloaded in a variety of formats. The data were curated both manually and automatically, using computational approaches that draw on knowledge about the protein-protein interaction networks extracted from the most reliable, core subset of the DIP data. Because the reliability of experimental evidence varies widely, methods of quality assessment have been developed and used to identify the most reliable subset of the interactions. This CORE set can be used as a reference when evaluating the reliability of high-throughput protein-protein interaction data sets, for development of prediction methods, and in studies of the properties of protein interaction networks. Tools are available to analyze, visualize and integrate a user's own experimental data with the information about protein-protein interactions available in the DIP database. The DIP database lists protein pairs that are known to interact with each other; here, "interact" means that two amino acid chains were experimentally identified to bind to each other. The database lists such pairs to aid those studying a particular protein-protein interaction, those investigating entire regulatory and signaling pathways, and those studying the organization and complexity of the protein interaction network at the cellular level. Registration is required to gain access to most of the DIP features. Registration is free to members of the academic community. Trial accounts for commercial users are also available.
Protein interaction data
NLM, NIDDK, PLoS ONE, Sci Data
European Genome phenome Archive
RRID:SCR_004944
    
Web service for permanent archiving and sharing of all types of personally identifiable genetic and phenotypic data resulting from biomedical research projects. The repository allows you to explore datasets from numerous genotype experiments, supplied by a range of data providers. The EGA's role is to provide secure access to data that otherwise could not be distributed to the research community. The EGA contains exclusive data collected from individuals whose consent agreements authorize data release only for specific research use or to bona fide researchers. Strict protocols govern how information is managed, stored and distributed by the EGA project. As an example, only members of the EGA team are allowed to process data in a secure computing facility. Once processed, all data are encrypted for dissemination and the encryption keys are delivered offline. The EGA also supports data access only for the consortium members prior to publication.
Data linking genotyping and phenotyping information in human subjects, including sequence data, array-based data, and phenotype data.
NIDDK, Science, PLoS ONE, Sci Data
Gene Expression Omnibus (GEO)
RRID:SCR_005012
    
Functional genomics data repository supporting MIAME-compliant data submissions. Includes microarray-based experiments measuring the abundance of mRNA, genomic DNA, and protein molecules, as well as non-array-based technologies such as serial analysis of gene expression (SAGE) and mass spectrometry proteomic technology. Array- and sequence-based data are accepted. Collection of curated gene expression DataSets, as well as original Series and Platform records. The database can be searched using keywords, organism, DataSet type and authors. DataSet records contain additional resources including cluster tools and differential expression queries.
Microarray; next-generation sequencing (NGS)
NLM, NIDDK, Science, PLoS ONE, Sci Data
GenomeRNAi
RRID:SCR_013088
   
GenomeRNAi is a database of phenotypes from systematic RNA interference (RNAi) screens in cultured Drosophila cells. The phenotype database can be searched by keywords, RNAi identifiers or Drosophila gene sequences. Searches with homologous sequences from human or C. elegans are also possible. Integrated tools evaluate the specificity of long double-stranded RNAs (RNAi probes) by similarity searches against all predicted Drosophila transcripts. This site can also be used to identify pre-designed RNAi probes from available Drosophila RNAi libraries. Caenorhabditis elegans genome, human genome.
RNAi screening data (phenotype and/or reagent)
NIDDK, PLoS ONE, Sci Data
IntAct
RRID:SCR_006944
    
Open source database system and analysis tools for molecular interaction data. All interactions are derived from literature curation or direct user submissions. Direct user submissions of molecular interaction data are encouraged, and may be deposited prior to publication in a peer-reviewed journal. As of June 2014, the IntAct database contained 447,368 interactions, 33,021 experiments, 12,698 publications, and 82,745 interactors. IntAct provides a two-tiered view of the interaction data. The search interface allows the user to iteratively develop complex queries, exploiting the detailed annotation with hierarchical controlled vocabularies. Results are provided at any stage in a simplified, tabular view. Specialized views then allow "zooming in" on the full annotation of interactions, interactors and their properties. IntAct source code and data are freely available.
Molecular interaction data
NIDDK, PLoS ONE, Sci Data
Japanese Genotype-phenotype Archive (JGA)
RRID:SCR_003118
    
A service for permanent archiving and sharing of all types of personally identifiable genetic and phenotypic data resulting from biomedical research projects. The JGA contains exclusive data collected from individuals whose consent agreements authorize data release only for specific research use or to bona fide researchers. Strict protocols govern how information is managed, stored and distributed by the JGA. Once processed, all data are encrypted. The JGA accepts only de-identified data approved by JST-NBDC. The JGA implements an access-granting policy whereby the decision of who will be granted access to the data resides with the JST-NBDC. After data submission, the JGA team processes the data into databases and archives the original data files. The accepted data types include manufacturer-specific raw data formats from the array-based and new sequencing platforms. Processed data, such as genotypes and structural variants or any summary-level statistical analyses from the original study authors, are stored in databases. The JGA also accepts and distributes any phenotype data associated with the samples. For other human biological data, please contact the NBDC human data ethical committee.
Data linking genotyping and phenotyping information in human subjects; all types of individual-level genetic and de-identified phenotypic data
NIDDK, PLoS ONE, Sci Data
NCBI database of Genotypes and Phenotypes (dbGaP)
RRID:SCR_002709
    
Database developed to archive and distribute clinical data and results from studies that have investigated the interaction of genotype and phenotype in humans. Archives and distributes results of studies including genome-wide association studies, medical sequencing, molecular diagnostic assays, and association between genotype and non-clinical traits.
Genotyping and phenotyping information in human subjects
NLM, NIDDK, PLoS ONE, Sci Data
PubChem BioAssay
RRID:SCR_010734
    
Data and information collection and repository for biological activities of small molecules and small interfering RNAs (siRNAs) hosted by the US National Institutes of Health (NIH). Used to select and summarize the bioactivities of tested substances.
Small-molecule and RNAi screening data; bioactivity and toxicity data
NIDDK, Sci Data



Imaging

Repository Name RRID Description Type of Data Recommended By
Cancer Imaging Archive (TCIA)
RRID:SCR_008927
    
Archive of medical images of cancer accessible for public download. All images are stored in DICOM file format and organized as Collections, typically patients related by a common disease (e.g., lung cancer), image modality (MRI, CT, etc.) or research focus. Neuroimaging data sets include clinical outcomes, pathology, and genomics in addition to DICOM images. Proposals to submit data are welcome.
Primary DICOM image datasets from cancer patients and analysis datasets
NLM, NIDDK, PLoS ONE, Sci Data
Cell Image Library (CIL)
RRID:SCR_003510
    
Freely accessible, public repository of vetted and annotated microscopic images, videos, and animations of cells from a variety of organisms, showcasing cell architecture, intracellular functionalities, and both normal and abnormal processes. Explore by Cell Process, Cell Component, Cell Type or Organism. The Cell Image Library includes images acquired from historical and modern collections, publications, and by recruitment.
Microscopic imaging data
NLM, NIDDK
Coherent X-Ray Imaging Data Bank (CXIDB)
RRID:SCR_014722
    
Database of Coherent X-ray Imaging (CXI) experiments. The data are accessible to researchers worldwide.
Peptides and proteins; Coherent X-ray Imaging (CXI) data
NIDDK, PLoS ONE, Sci Data
Image Data Resource (IDR)
RRID:SCR_017421
     
Public repository of reference image datasets from published scientific studies. Platform for publishing, mining and integrating bioimaging data, following FAIR principles and the Euro-BioImaging/ELIXIR imaging strategy, using OMERO and Bio-Formats open source software built by the Open Microscopy Environment. Deployed on an OpenStack cloud running on EMBL-EBI's Embassy resource, it includes image data linked to independent studies from genetic, RNAi, chemical, localisation and geographic high content screens, super resolution microscopy, and digital pathology.
Reference image datasets supporting an original scientific publication
NIDDK, Sci Data
SICAS Medical Image Repository
RRID:SCR_017420
    
Medical image repository to store medical research data.
Brain images, segmentations, SSM, DICOM, ITK-based images and statistical models
NIDDK, PLoS ONE, Sci Data
SPARC Project
RRID:SCR_017041
    
As of 2023, the SPARC data repository is an open data repository developed as part of the NIH SPARC initiative; SPARC-funded investigator groups use it to curate and publish high-quality datasets related to the autonomic nervous system. As of August 2022, SPARC accepts datasets from investigators who are not funded through the NIH SPARC program. The NIH Common Fund's Stimulating Peripheral Activity to Relieve Conditions (SPARC) program aims to transform understanding of nerve-organ interactions and ultimately advance the neuromodulation field toward precise treatment of diseases and conditions for which conventional therapies fall short.
All research data. Multi-modal data are submitted via the SPARC Dataset Structure.
NIDDK, NLM



Metabolomics

Repository Name RRID Description Type of Data Recommended By
MetaboLights
RRID:SCR_014663
    
A cross-species, cross-technique database for metabolomics experiments, data, and derived information. It includes metabolite structures and their reference spectra, their biological roles, locations and concentrations, and experimental data from metabolic experiments.
Metabolite structure
NIDDK, PLoS ONE, Sci Data
Metabolomics Workbench
RRID:SCR_013794
    
Repository for metabolomics data and metadata which provides analysis tools and access to various resources. NIH grantees may upload data and general users can search the metabolomics database. Provides protocols for sample preparation and analysis, information about the NIH Metabolomics Program, data sharing guidelines, funding opportunities, services offered by its Regional Comprehensive Metabolomics Resource Cores (RCMRCs), and training workshops.
Metabolomics data and metadata
NLM, NIDDK, PLoS ONE



Molecular and supramolecular structure

Repository Name RRID Description Type of Data Recommended By
Biological Magnetic Resonance Data Bank (BMRB)
RRID:SCR_002296
    
Public depository that collects, annotates, archives, and disseminates important spectral and quantitative data derived from nuclear magnetic resonance spectroscopic investigations of biological macromolecules and metabolites. Provides reference information and maintains a collection of NMR pulse sequences and computer software for biomolecular NMR.
Peptides and proteins; NMR spectroscopy data
NIDDK, Science, PLoS ONE, Sci Data
Cambridge Crystallographic Data Centre (CCDC)
RRID:SCR_014707
    
Institution which compiles and distributes small molecule crystallography data from the Cambridge Structural Database (CSD), a repository of experimentally determined organic and metal-organic crystal structures. CCDC also produces associated knowledge-based application software for structural chemists. Structures deposited with CCDC are made publicly available for download at the point of publication or with the consent of the depositor.
Small-molecule crystal structure
NIDDK, Science, PLoS ONE
Crystallography Open Database (COD)
RRID:SCR_005874
    
Database of crystal structures of organic, inorganic, metal-organic compounds and minerals, excluding biopolymers. It contained about 291,204 entries (July 2014) in crystallographic information file (CIF) format, with nearly full coverage of the International Union of Crystallography publications, and is growing in size and quality. Deposit your data: an interface allows you to upload, validate and edit CIF files before submitting them for deposition.
Structural data for small molecules; peptides and proteins
NIDDK, PLoS ONE, Sci Data
EMDataResource.org
RRID:SCR_003207
    
Portal for deposition and retrieval of cryo electron microscopy (3DEM) density maps, atomic models, and associated metadata. Global resource for 3-Dimensional Electron Microscopy structure data archiving and retrieval, news, events, software tools, data standards, and validation methods.
Peptides and proteins; larger assemblies; 3-Dimensional Electron Microscopy (3DEM)
NIDDK, PLoS ONE, Sci Data
Electron Microscopy Data Bank at PDBe (MSD-EBI)
RRID:SCR_006506
    
Repository for electron microscopy density maps of macromolecular complexes and subcellular structures at the Protein Data Bank in Europe. Covers techniques including single-particle analysis, electron tomography, and electron (2D) crystallography.
Peptides and proteins; electron microscopy density maps
NIDDK, Science
Inorganic Crystal Structure Database (ICSD)
RRID:SCR_017429
  
Database for completely identified inorganic crystal structures. Collection of known inorganic crystal structures published since 1913, including their atomic coordinates. Includes only data which have passed thorough quality checks. Tool for materials research.
Inorganic crystal structure
NIDDK, Science
PDBe - Protein Data Bank in Europe
RRID:SCR_004312
    
The European resource for the collection, organization and dissemination of data on biological macromolecular structures. In collaboration with the other worldwide Protein Data Bank (wwPDB) partners - the Research Collaboratory for Structural Bioinformatics (RCSB) and BioMagResBank (BMRB) in the USA and the Protein Data Bank of Japan (PDBj) - PDBe works to collate, maintain and provide access to the global repository of macromolecular structure data. The main objectives of the work at PDBe are: (1) to provide an integrated resource of high-quality macromolecular structures and related data and make it available to the biomedical community via intuitive user interfaces; (2) to maintain in-house expertise in all the major structure-determination techniques (X-ray, NMR and EM) in order to stay abreast of technical and methodological developments in these fields, and to work with the community on issues of mutual interest (such as data representation, harvesting, formats and standards, or validation of structural data); and (3) to provide high-quality deposition and annotation facilities for structural data as one of the wwPDB deposition sites. Several sophisticated tools are also available for the structural analysis of macromolecules.
Macromolecular structure data
NIDDK, Science
PDBj - Protein Data Bank Japan
RRID:SCR_008912
    
PDBj (Protein Data Bank Japan) maintains a centralized PDB archive of macromolecular structures and provides integrated tools, in collaboration with the RCSB and the BMRB in the USA and the PDBe in the EU.
Macromolecular structure data
Science
Protein Circular Dichroism Data Bank (PCDDB)
RRID:SCR_017428
    
Public repository for archiving circular dichroism spectroscopic data and associated bioinformatics and experimental metadata. Authors can deposit experimental data as well as detailed information on methods and calculations associated with published work. Each entry includes links to bioinformatics databases. Data are freely available to users either as single files or as complete data bank downloads.
Peptides and proteins
NIDDK, PLoS ONE, Sci Data
Structural Biology Grid
RRID:SCR_003511
    
Provides the computing resources structural biologists need to discover the shapes of the molecules of life, including access to web-enabled structural biology applications, data sharing facilities, biological data sets, and other resources valuable to the computational structural biology community. The consortium includes X-ray crystallography, NMR and electron microscopy laboratories worldwide. The SBGrid Service Center is located at Harvard Medical School. SBGrid's NIH-compliant Service Center supports SBGrid operations and provides members with access to Software Maintenance, Computing Access, and Training. Consortium benefits include: remote management of a customized collection of structural biology applications on Linux and Mac workstations; access to commercial applications exclusively licensed to members of the Consortium, such as NMRPipe, Schrodinger Suite (limited tokens) and the Incentive version of Pymol; remote management of supporting scientific applications (e.g., bioinformatics, computational chemistry and utilities); access to SBGrid seminars and events; and advice about hardware configurations, operating system installations and high performance computing. Membership is restricted to academic/non-profit research laboratories that use X-ray crystallography, 2D crystallography, NMR, EM, tomography and other experimental structural biology technologies in their research. Most new members are fully integrated with SBGrid within 2 weeks of the initial application.
Peptides and proteins
NIDDK, Sci Data



Neuroscience

Repository Name RRID Description Type of Data Recommended By
NeuroMorpho.Org
RRID:SCR_002145
     
Centrally curated inventory of digitally reconstructed neurons associated with peer-reviewed publications that contains some of the most complete axonal arborizations digitally available in the community. Each neuron is represented by a unique identifier, general information (metadata), the original and standardized ASCII files of the digital morphological reconstruction, and a set of morphometric features. It contains contributions from over 100 laboratories worldwide and is continuously updated as new morphological reconstructions are collected, published, and shared. Users may browse by species, brain region, cell type or lab name. Users can also download morphological reconstructions for research and analysis. Deposition and distribution of reconstruction files ultimately prevents data loss. Centralized curation and annotation aims at minimizing the effort required by data owners while ensuring a unified format. It also provides a one-stop entry point for all available reconstructions, thus maximizing data visibility and impact.
3D neuronal reconstructions and associated metadata
NLM, NIDDK, PLoS ONE, Sci Data
OpenNeuro
RRID:SCR_005031
    
Open platform for analyzing and sharing neuroimaging data from human brain imaging research studies. Brain Imaging Data Structure (BIDS) compliant database. Formerly known as OpenfMRI. Data archive to hold magnetic resonance imaging data. Platform for sharing MRI, MEG, EEG, iEEG, and ECoG data.
Neuroscience research data
NLM, NIH BRAIN Initiative, NIDDK, PLoS ONE, Sci Data
SPARC Project
RRID:SCR_017041
    
As of 2023, the SPARC data repository is an open data repository developed as part of the NIH SPARC initiative; SPARC-funded investigator groups use it to curate and publish high-quality datasets related to the autonomic nervous system. As of August 2022, SPARC accepts datasets from investigators who are not funded through the NIH SPARC program. The NIH Common Fund's Stimulating Peripheral Activity to Relieve Conditions (SPARC) program aims to transform understanding of nerve-organ interactions and ultimately advance the neuromodulation field toward precise treatment of diseases and conditions for which conventional therapies fall short.
All research data. Multi-modal data are submitted via the SPARC Dataset Structure.
NIDDK, NLM



Nucleic acid sequence

Repository Name RRID Description Type of Data Recommended By
DNA DataBank of Japan (DDBJ)
RRID:SCR_002359
    
Maintains and provides archival, retrieval and analytical resources for biological information. The central DDBJ resource consists of public, open-access nucleotide sequence databases including raw sequence reads, assembly information and functional annotation. Database content is exchanged with EBI and NCBI within the framework of the International Nucleotide Sequence Database Collaboration (INSDC). In 2011, DDBJ launched two new resources: DDBJ Omics Archive (DOR) and BioProject. DOR is an archival database of functional genomics data generated by microarrays and highly parallel new-generation sequencers. Data are exchanged between ArrayExpress at EBI and DOR in the common MAGE-TAB format. BioProject provides an organizational framework to access metadata about research projects and data from projects that are deposited into different databases.
Gene sequence
NLM, NIDDK, Science, PLoS ONE, Sci Data
Database of Genomic Variants Archive (DGVa)
RRID:SCR_004896
   
Public repository that accepts direct submissions and provides archiving, accessioning and distribution of publicly available genomic structural variants, in all species. Variants are accessioned at the study and sample level, granting stable identifiers that can be used in publications. DGVa data are integrated with other EBI resources, including the comprehensive EBI search and the Ensembl genome browser. Exchanges data with its companion database, dbVar, at the National Center for Biotechnology Information. NOTE: DGVa has not accepted submissions since 2019. Please submit data to the European Variation Archive (EVA) instead.
Genomic structural variants
NIDDK, Science, PLoS ONE, Sci Data
European Nucleotide Archive (ENA)
RRID:SCR_006515
    
Public archive providing a comprehensive record of the world's nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation. All submitted data, once public, will be exchanged with the NCBI and DDBJ as part of the INSDC data exchange agreement. The European Nucleotide Archive (ENA) captures and presents information relating to experimental workflows that are based around nucleotide sequencing. A typical workflow includes the isolation and preparation of material for sequencing, a run of a sequencing machine in which sequencing data are produced and a subsequent bioinformatic analysis pipeline. ENA records this information in a data model that covers input information (sample, experimental setup, machine configuration), output machine data (sequence traces, reads and quality scores) and interpreted information (assembly, mapping, functional annotation). Data arrive at ENA from a variety of sources including submissions of raw data, assembled sequences and annotation from small-scale sequencing efforts, data provision from the major European sequencing centers and routine and comprehensive exchange with their partners in the International Nucleotide Sequence Database Collaboration (INSDC). Provision of nucleotide sequence data to ENA or its INSDC partners has become a central and mandatory step in the dissemination of research findings to the scientific community. ENA works with publishers of scientific literature and funding bodies to ensure compliance with these principles and to provide optimal submission systems and data access tools that work seamlessly with the published literature. ENA is made up of a number of distinct databases that include the EMBL Nucleotide Sequence Database (EMBL-Bank), the newly established Sequence Read Archive (SRA) and the Trace Archive. The main tool for downloading ENA data is the ENA Browser, which is available through REST URLs for easy programmatic use. All ENA data are available through the ENA Browser. Note: the EMBL Nucleotide Sequence Database (EMBL-Bank) is entirely included within this resource.
Sequence reads, genome assemblies, targeted assembled and annotated sequences
NLM, NIDDK, Science, PLoS ONE, Sci Data
European Variation Archive (EVA)
RRID:SCR_017425
    
Open access database of all types of genetic variation data from all species. Users can download data from any study, or submit their own data to the archive. All variants can also be queried by study, gene, chromosomal location or dbSNP identifier using the Variant Browser.
Gene sequence
NIDDK, Science, PLoS ONE, Sci Data
GenBank
RRID:SCR_002760
    
NIH genetic sequence database that provides an annotated collection of all publicly available DNA sequences for almost 280 000 formally described species (Jan 2014). These sequences are obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole-genome shotgun (WGS) and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and GenBank staff assign accession numbers upon data receipt. It is part of the International Nucleotide Sequence Database Collaboration, and daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. GenBank is accessible through the NCBI Entrez retrieval system, which integrates data from major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP.
Gene sequence
NLM, NIDDK, Science, PLoS ONE, Sci Data
MGnify
RRID:SCR_016429
    
Portal for the analysis and exploration of metagenomic, metatranscriptomic, amplicon and assembly data. Provides functional and taxonomic analyses of user-submitted sequences, as well as analysis of publicly available metagenomic datasets held within the European Nucleotide Archive (ENA). Microbiome analysis resource in 2020.
Metagenomics sequence data and associated metadata
NIDDK, Science, PLoS ONE, Sci Data
NCBI Assembly Archive Viewer
RRID:SCR_012917
  
Database providing information on structure of assembled genomes, assembly names and other meta-data, statistical reports, and links to genomic sequence data. The Archive links the raw sequence information found in the Trace Archive with assembly information found in publicly available sequence repositories (GenBank/EMBL/DDBJ).
Genome assemblies (collection of genomic sequences that are used to represent the genome of an organism)
NIDDK, Sci Data
NCBI Sequence Read Archive (SRA)
RRID:SCR_004891
    
Repository of raw sequencing data from next-generation sequencing platforms, including Roche 454 GS System, Illumina Genome Analyzer, Applied Biosystems SOLiD System, Helicos Heliscope, Complete Genomics, and Pacific Biosciences SMRT. In addition to raw sequence data, SRA now stores alignment information in the form of read placements on a reference sequence. Data submissions are welcome. Archive of high-throughput sequencing data, part of an international partnership of archives (INSDC) at NCBI, the European Bioinformatics Institute and the DNA Data Bank of Japan. Data submitted to any of these three organizations are shared among them.
NGS data only; Multiple sequence alignment
NLM, PLoS ONE, Sci Data
dbSNP
RRID:SCR_002338
    
Database serving as a central repository for both single-base nucleotide substitutions and short deletion and insertion polymorphisms. It distinguishes a report of how to assay a SNP from the use of that SNP with individuals and populations. This separation simplifies some issues of data representation. However, the initial reports describing how to assay a SNP will often be accompanied by SNP experiments measuring allele occurrence in individuals and populations. The community can contribute to this resource.
Simple genetic polymorphisms or structural variations
NLM, NIDDK, PLoS ONE, Sci Data
dbVar
RRID:SCR_003219
    
Structural variation database designed to store data on variant DNA ≥ 1 bp in size from all organisms. Associations of defined variants with phenotype information are also provided. Users can browse data containing the number of variant calls from each study, and filter studies by organism, study type, method and genomic variant. Organisms include human, mouse, cattle and several additional animals.
Simple genetic polymorphisms or structural variations
NLM, NIDDK, PLoS ONE, Sci Data
miRBase
RRID:SCR_003152
   
Central online repository for microRNA nomenclature, sequence data, annotation and target prediction. Collection of published miRNA sequences and annotation.
miRNA sequences and annotation
NIDDK, PLoS ONE
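
The ENA entry in the table above notes that the ENA Browser is available through REST URLs for programmatic use. The listing does not give the URL layout, so the following is only a minimal sketch: the base URL, the idea that the output format is a path segment (for example `fasta`), and the accession `A00145` are all assumptions used for illustration, and should be checked against the ENA documentation before use. The helper name `ena_record_url` is hypothetical.

```python
from urllib.parse import quote

# Assumed base URL of the ENA Browser REST API; not stated in the listing above.
ENA_BROWSER_API = "https://www.ebi.ac.uk/ena/browser/api"


def ena_record_url(accession: str, fmt: str = "fasta") -> str:
    """Build a download URL for a single record.

    `fmt` is assumed to be a path segment naming the output format
    (for example "fasta" or "embl"); `accession` is a record identifier.
    """
    accession = accession.strip()
    if not accession:
        raise ValueError("accession must be non-empty")
    # Percent-encode the accession so it is safe to embed in a URL path.
    return f"{ENA_BROWSER_API}/{fmt}/{quote(accession)}"


if __name__ == "__main__":
    # "A00145" is a placeholder accession used only for illustration.
    print(ena_record_url("A00145"))
```

The sketch only builds the URL and makes no network request; the resulting string could be passed to any HTTP client, such as `urllib.request.urlopen`, to fetch the record.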



Other domain-specific repositories

Repository Name RRID Description Type of Data Recommended By
Influenza Research Database (IRD)
RRID:SCR_006641
   
The Influenza Research Database (IRD) serves as a public repository and analysis platform for flu sequence, experiment, surveillance and related data.
Flu sequence, experiment and surveillance data
PLoS ONE
NIMH Data Archive
RRID:SCR_004434
     
The National Institute of Mental Health Data Archive (NDA) makes available human subjects data collected from hundreds of research projects across many scientific domains. It is a research data repository for data sharing and collaboration among investigators, used to accelerate scientific discovery through data sharing across mental health and other research communities, data harmonization and reporting of research results. Its infrastructure was created by the National Database for Autism Research (NDAR), the Research Domain Criteria Database (RDoCdb), the National Database for Clinical Trials related to Mental Illness (NDCT), and the NIH Pediatric MRI Repository (PedsMRI).
Imaging data; neurosignal recordings data; autism spectrum disorder-relevant data (all data types); clinical data, genomic data, phenotype data
NLM, NIDDK, PLoS ONE
National Addiction and HIV Data Archive Program (NAHDAP)
RRID:SCR_000636
    
Archive that acquires, preserves and disseminates data relevant to drug addiction and HIV research. Collection of data on drug addiction and HIV infection in the United States. Most of the datasets are raw data from surveys, interviews, and administrative records. They were originally gathered in research projects and for administrative purposes. Some datasets have been used in published studies, and bibliographies of these studies are available. Provides access to research data and technical assistance for data depositors. Provides e-workshops on data preparation and data systems.
Wide variety of data related to drug abuse
NLM, NIDDK, PLoS ONE
SPARC Project
RRID:SCR_017041
    
The SPARC data repository, as of 2023, is an open data repository developed as part of the NIH SPARC initiative. It has been used by SPARC-funded investigator groups to curate and publish high-quality datasets related to the autonomic nervous system. As of August 2022, SPARC also accepts datasets from investigators who are not funded through the NIH SPARC program. The NIH Common Fund's Stimulating Peripheral Activity to Relieve Conditions (SPARC) program aims to transform the understanding of nerve-organ interactions and ultimately advance the neuromodulation field toward precise treatment of diseases and conditions for which conventional therapies fall short.
All research data. Multi-modal data is submitted via the SPARC Dataset Structure.
NIDDK, NLM



Protein sequence

Repository Name RRID Description Type of Data Recommended By
UniProtKB
RRID:SCR_004426
    
Central repository for collection of functional information on proteins, with accurate and consistent annotation. In addition to capturing core data mandatory for each UniProtKB entry (mainly, the amino acid sequence, protein name or description, taxonomic data and citation information), as much annotation information as possible is added. This includes widely accepted biological ontologies, classifications and cross-references, and experimental and computational data. The UniProt Knowledgebase consists of two sections, UniProtKB/Swiss-Prot and UniProtKB/TrEMBL. UniProtKB/Swiss-Prot (reviewed) is a high quality manually annotated and non-redundant protein sequence database which brings together experimental results, computed features, and scientific conclusions. UniProtKB/TrEMBL (unreviewed) contains protein sequences associated with computationally generated annotation and large-scale functional characterization that await full manual annotation. Users may browse by taxonomy, keyword, gene ontology, enzyme class or pathway.
Amino acid sequence and annotation
NLM, Science, PLoS ONE, Sci Data



Proteomics

Repository Name RRID Description Type of Data Recommended By
Global Proteome Machine Database (GPM DB)
RRID:SCR_006617
   
The Global Proteome Machine Organization was set up so that scientists involved in proteomics using tandem mass spectrometry could use that data to analyze proteomes. The projects supported by the GPMO have been selected to improve the quality of analysis, make the results portable and to provide a common platform for testing and validating proteomics results. The Global Proteome Machine Database was constructed to utilize the information obtained by GPM servers to aid in the difficult process of validating peptide MS/MS spectra as well as protein coverage patterns. This database has been integrated into GPM server pages, allowing users to quickly compare their experimental results with the best results that have been previously observed by other scientists.
Protein and peptides; Mass spectrometry
NIDDK, PLoS ONE
ProteomeXchange
RRID:SCR_004055
    
A data repository for proteomic data sets. The ProteomeXchange consortium, as a whole, aims to provide a coordinated submission of MS proteomics data to the main existing proteomics repositories, as well as to encourage optimal data dissemination. ProteomeXchange provides access to a number of public databases, and users can access and submit data sets to the consortium's PRIDE database and PASSEL/PeptideAtlas.
Tandem mass spectrometry proteomics data of peptides
NIDDK, PLoS ONE, Sci Data
Proteomics Identifications (PRIDE)
RRID:SCR_003411
    
Centralized, standards-compliant, public data repository for proteomics data, including protein and peptide identifications, post-translational modifications and supporting spectral evidence. Originally it was developed to provide a common data exchange format and repository to support proteomics literature publications. This remit has grown with PRIDE, with the hope that PRIDE will provide a reference set of tissue-based identifications for use by the community. The future development of PRIDE has become closely linked to HUPO PSI. PRIDE encourages and welcomes direct user submissions of protein and peptide identification data to be published in peer-reviewed publications. Users may browse public datasets, use PRIDE BioMart for custom queries, or download the data directly from the FTP site. PRIDE has been developed through a collaboration of the EMBL-EBI, Ghent University in Belgium, and the University of Manchester.
Protein and peptide identification/quantification data from mass spectrometry
PLoS ONE, Sci Data