Open Data, Software and Code Guidelines

These guidelines relate to the F1000Research policy on data availability, which requires all authors to share the data underlying their article. The full policy text is available on the F1000Research website.
If you cannot share your data, for example for ethical reasons, a limited number of exceptions to these guidelines are provided below.
For more information on each of the requirements, please see Further Guidance.
What is required when submitting an article
  1. Your dataset(s) must be deposited in an appropriate data repository.
  2. Your dataset(s) must have a license applied which allows reuse by others (CC0 or CC-BY).
  3. Your dataset(s) must have a persistent identifier (e.g. a DOI) allocated by a data repository.
  4. You must provide a data availability statement as a section at the end of your article, including elements 1-3.
  5. You must include a data citation and add the dataset to your reference list.
  6. Your dataset(s) must not contain any sensitive information, for example in relation to human research participants.
  7. You should share any related software and code.
  8. Your dataset(s) must be useful and reusable by others, adhere to any relevant data sharing standards in your discipline and align with the FAIR Data Principles.
  9. Your dataset(s) should link back to your article, if possible.
If you fail to adhere to these guidelines when submitting, the publication of your article may be delayed, and your article may ultimately be rejected.
Further guidance
1. Your dataset(s) must be deposited in an appropriate data repository
Before submission, you should deposit your data in an appropriate data repository and ensure that the dataset is published openly on the web. The data should be stored in an open file format. The repository you choose must supply you with a persistent identifier (for example a DOI or accession code) and allow you to apply an open license, which must be CC0, CC-BY 4.0 or equivalent. Please include descriptive legends and, where applicable, coding schemas alongside your datasets.
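As an illustration of an open file format with an accompanying coding schema, the Python sketch below writes questionnaire answers to CSV and describes each column in a JSON codebook. The column names, the 1/0 coding and the file names `data.csv` and `codebook.json` are hypothetical; use whatever structure and standards apply in your discipline.

```python
import csv
import io
import json

# Hypothetical questionnaire responses: 1 = correct answer, 0 = incorrect answer.
responses = [
    {"participant": "P001", "q1": 1, "q2": 0},
    {"participant": "P002", "q1": 0, "q2": 1},
]

# Codebook (coding schema) describing every column and the meaning of its values.
codebook = {
    "participant": {"description": "Anonymised participant code", "type": "string"},
    "q1": {"description": "Answer to question 1", "values": {"1": "correct", "0": "incorrect"}},
    "q2": {"description": "Answer to question 2", "values": {"1": "correct", "0": "incorrect"}},
}


def to_csv(rows, fieldnames):
    """Serialise rows to CSV text, an open, non-proprietary format."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# The codebook keys define the column order, so every column is documented.
data_csv = to_csv(responses, list(codebook))
codebook_json = json.dumps(codebook, indent=2)

# In practice, save both and deposit them together, e.g.:
# with open("data.csv", "w", newline="") as f: f.write(data_csv)
# with open("codebook.json", "w") as f: f.write(codebook_json)
print(data_csv)
```

Keeping the codebook in a separate machine-readable file means a reader can interpret the 1/0 values without consulting the article.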
Most repositories do not charge a fee for deposit; however, a fee may apply if the repository provides data checking or curation services, or if you are storing very large datasets (for example over 100 GB).
Discipline-specific repositories
F1000Research strongly encourages the use of community-recognized and discipline-specific repositories where they are available.
For some data types (crystallographic data, expression and sequence data, metabolomics data and proteomics data), depositing data into specific data repositories is mandatory. A list of appropriate data repositories for disciplinary data is available below.
Generalist repositories
If there is no appropriate discipline-specific repository available, please deposit your data in a generalist data repository, an institutional data repository (for example provided by your university), or a national data repository.
Controlled access repositories
If you cannot share your data openly, for example to protect the privacy of your research participants, you may choose to use a repository which restricts or controls who can access your data and for what purposes.
2. Your dataset(s) must be openly licensed
To allow the maximum possible reuse, your dataset(s) should be published with a CC0 Public Domain Dedication, which waives all rights to the data. Alternatively, a Creative Commons Attribution 4.0 (CC-BY 4.0) license, which requires others to attribute you when they use the data, is acceptable. Your chosen repository should allow you to apply a CC0 Public Domain Dedication, CC-BY 4.0 license or equivalent to your data.
For software and source code, we strongly advise you to use an OSI-approved license.
3. Your dataset(s) must have a persistent identifier
Persistent identifiers allow datasets to be uniquely identified on the web. Commonly used persistent identifiers include DOIs and accession numbers, but other persistent identifiers such as PURLs, ARKs, Handles or URNs are also acceptable. Your chosen data repository should provide you with a persistent identifier for each dataset that you deposit.
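The Python sketch below shows one way to check that an identifier quoted in your article looks like a DOI and to express it in the resolvable https://doi.org/ form. It uses a regular expression widely cited from Crossref guidance for matching most modern DOIs; the pattern is an assumption that will not match every valid DOI, and stripping a trailing full stop (copied from prose) is a hypothetical convenience, not a repository rule.

```python
import re

# Pattern cited by Crossref for matching most modern DOIs (case-insensitive).
# Assumption: it does not cover every DOI ever registered.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)

# Common ways a DOI is written in text; the first matching prefix is removed.
PREFIXES = (
    "https://doi.org/", "http://doi.org/",
    "https://dx.doi.org/", "http://dx.doi.org/",
    "https://www.doi.org/", "http://www.doi.org/",
    "doi:",
)


def normalize_doi(value: str) -> str:
    """Return the DOI as an https://doi.org/ URL, or raise ValueError."""
    doi = value.strip()
    for prefix in PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    doi = doi.rstrip(".")  # assumption: a trailing full stop comes from the sentence
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognisable DOI: {value!r}")
    return "https://doi.org/" + doi


print(normalize_doi("http://www.doi.org/10.17037/DATA.00001882"))
```

Accession numbers (for example PRIDE's PXD identifiers) follow repository-specific formats and would need their own checks.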
We also recommend that you use an appropriate Research Resource Identifier (RRID) to unambiguously identify any antibodies, model organisms, cell lines, plasmids, or other tools (software, databases, services) which you used in your research. RRIDs can be found on the Resource Identification Portal and should be included in your Methods section.
4. You must provide a data availability statement
You must include a data availability statement at the end of your article, before the reference list, describing each dataset and including a link to the relevant repository and the dataset’s persistent identifier.
When drafting the statement, please include:
  • The name of the repository used;
  • A brief description of the contents of each dataset;
  • A statement that the dataset has a CC0 Public Domain Dedication or CC-BY 4.0 license applied.
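The three elements above can be assembled mechanically, as in the Python sketch below. The class and function names are hypothetical; the CC-BY 4.0 sentence follows the Figshare example in these guidelines, while the CC0 sentence is an assumed phrasing you should adapt.

```python
from dataclasses import dataclass, field
from typing import Dict

# Statement wording per license. The CC-BY 4.0 sentence follows the example in these
# guidelines; the CC0 sentence is an assumed phrasing.
LICENSE_SENTENCES = {
    "CC-BY 4.0": "Data are available under the terms of the Creative Commons "
                 "Attribution 4.0 International license (CC-BY 4.0).",
    "CC0": "Data are available under the terms of the Creative Commons Zero "
           "\"No rights reserved\" data waiver (CC0 1.0 Public domain dedication).",
}


@dataclass
class DepositedDataset:
    repository: str                 # name of the repository used
    title: str                      # dataset title as registered in the repository
    doi_url: str                    # persistent identifier, as a resolvable URL
    files: Dict[str, str] = field(default_factory=dict)  # file name -> brief description
    license: str = "CC-BY 4.0"      # must be a key of LICENSE_SENTENCES


def availability_statement(ds: DepositedDataset) -> str:
    """Assemble a statement naming the repository, the contents and the license."""
    if ds.license not in LICENSE_SENTENCES:
        raise ValueError(f"unsupported license: {ds.license}")
    lines = [f"{ds.repository}: {ds.title}. {ds.doi_url}", "",
             "This project contains the following underlying data:"]
    lines += [f"  • {name} ({description})" for name, description in ds.files.items()]
    lines += ["", LICENSE_SENTENCES[ds.license]]
    return "\n".join(lines)


example = DepositedDataset(
    repository="Figshare",
    title="Dietary knowledge assessment among the patients with type 2 diabetes "
          "in Madinah: A cross-sectional study",
    doi_url="https://doi.org/10.6084/m9.figshare.22122656.v1",
    files={"Data.xlsx": "anonymised answers to questionnaire"},
)
print(availability_statement(example))
```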
If your data must be restricted for legal, ethical, or other reasons, please see below for further information on what should be included in your data availability statement.
Examples:
Data type: Data deposited into a generalist repository
Data availability statement example:
Figshare: Dietary knowledge assessment among the patients with type 2 diabetes in Madinah: A cross-sectional study. https://doi.org/10.6084/m9.figshare.22122656.v1.

The project contains the following underlying data:
  • Data.xlsx. (Anonymised answers to questionnaire, correct answers – 1, incorrect answers - 0).

Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).
Data citation example:
Alharbi M: Dietary knowledge assessment among the patients with type 2 diabetes in Madinah: A cross-sectional study. [Dataset]. figshare. 2023. https://doi.org/10.6084/m9.figshare.22122656.v1

Example taken from: Alharbi M, Alharbi M, Surrati A et al. Dietary knowledge assessment among the patients with type 2 diabetes in Madinah: A cross-sectional study  [version 2; peer review: 2 approved]. F1000Research 2024, 12:416 (https://doi.org/10.12688/f1000research.131518.2)

Data type: Data deposited into a repository with accession codes
Data availability statement example:
The underlying data has been deposited in the ProteomeXchange Consortium via the PRIDE partner repository, accession number PXD027611: https://identifiers.org/pride.project:PXD027611.
Data citation example:
Wright, J and Choudhary, J. Identifying and characterizing Thrap3, Bclaf1 and Erh direct interactions using cross-linking mass spectrometry. PRIDE. 2021. https://identifiers.org/pride.project:PXD027611.

Example taken from: Shcherbakova L, Pardo M, Roumeliotis T and Choudhary J. Identifying and characterising Thrap3, Bclaf1 and Erh interactions using cross-linking mass spectrometry. Wellcome Open Res 2021, 6:260 (https://doi.org/10.12688/wellcomeopenres.17160.1)

Data type: Data with access restrictions
Data availability statement example:
LSHTM Data Compass: Treatment of child wasting: Child Health Research Initiative (CHNRI) prioritisation exercise dataset, https://doi.org/10.17037/DATA.00001882.
This project contains the following underlying data:
  • Underlying data file 1: dataset (NWL-CHNRI-dataset) (restricted access)
  • Underlying data file 2: dataset description (NWL-CHNRI-dataset-codebook) (unrestricted access)
Due to the fact that open posting of data on a repository was not included in the study information sheet at the time the survey was done, data access will be granted once users have consented to the data sharing agreement and provided written plans and justification for what is proposed with the data. Data access may be obtained by submitting a request to the No Wasted Lives, Action Against Hunger authors via the LSHTM Data Compass repository.
Requests will be reviewed by Action Against Hunger/ No Wasted Lives (the lead agency for this study) and key collaborators as named on the repository.
Data citation example:
Kerac M, Angood C, Mayberry A, et al.: Treatment of child wasting: Child Health Research Initiative (CHNRI) prioritisation exercise dataset. LSHTM Data Compass. 2020. http://www.doi.org/10.17037/DATA.00001882

Example taken from: Angood C, Kerac M, Black R et al. Treatment of child wasting: results of a child health and nutrition research initiative (CHNRI) prioritisation exercise. F1000Research 2021, 10:126 (https://doi.org/10.12688/f1000research.46544.1)

Data type: Articles without data
Data availability statement example: No data associated with this article.
Data citation example: None required.

Data type: Articles where the data consists of bibliographic references
Data availability statement example: The data for this article consists of bibliographic references, which are included in the References section.
Data citation example: Standard bibliographic references.
5. You must include a data citation and add the dataset to your reference list
Your dataset should be cited in the body of your article, and you should add the dataset to your reference list as you would any other bibliographic citation.
You may use your preferred referencing style but should include, at a minimum:
Dataset creator; Publication year; Dataset title; Name of repository where the data is located; Persistent Identifier (e.g. DOI).
Please add [Dataset] to the reference to denote its type.
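A reference containing these minimum elements can be generated from the same metadata you enter in the repository. The Python sketch below is one possible layout; the function name is hypothetical, and listing at most three creators before "et al." simply mirrors the examples in these guidelines rather than a required style.

```python
from typing import Sequence


def format_dataset_citation(creators: Sequence[str], year: int, title: str,
                            repository: str, identifier: str) -> str:
    """Build a dataset reference with the minimum required elements.

    Assumed layout: up to three creators, then "et al."; type marked as [Dataset].
    """
    if not creators:
        raise ValueError("at least one dataset creator is required")
    names = ", ".join(creators[:3])
    if len(creators) > 3:
        names += ", et al."
    return f"{names}: {title}. [Dataset]. {repository}. {year}. {identifier}"


print(format_dataset_citation(
    ["Alharbi M"], 2023,
    "Dietary knowledge assessment among the patients with type 2 diabetes "
    "in Madinah: A cross-sectional study",
    "figshare", "https://doi.org/10.6084/m9.figshare.22122656.v1"))
```

With these arguments the output reproduces the Figshare data citation example given earlier in these guidelines.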
6. Your dataset(s) must not contain any sensitive information
It is your responsibility to share data ethically and, where relevant, protect the privacy of your research participants. You should ensure that your datasets have been de-identified in accordance with the Safe Harbor method before submission.
Data sensitivity is not only connected to human research participants, so please check your datasets for other sensitive elements, for example the locations of endangered species or protected archaeological sites.
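Safe Harbor is the de-identification method of the US HIPAA Privacy Rule; it requires removing 18 categories of identifiers, keeping only the year of dates, aggregating ages over 89 into a single category and truncating ZIP codes to three digits (or "000" for sparsely populated areas). The Python sketch below illustrates only a few of these transformations on a single record. The field names, the identifier set and the `restricted_zip3` argument are hypothetical, and this is not a complete or validated Safe Harbor implementation.

```python
from datetime import date
from typing import AbstractSet

# Hypothetical direct-identifier columns in this example; Safe Harbor lists 18
# categories (names, contact details, record numbers, etc.) that must all be removed.
DIRECT_IDENTIFIERS = {"name", "email", "phone", "medical_record_number"}


def generalize_record(record: dict, restricted_zip3: AbstractSet[str] = frozenset()) -> dict:
    """Apply a few Safe Harbor-style transformations to one record.

    `restricted_zip3` holds three-digit ZIP prefixes whose combined area has
    20,000 or fewer inhabitants; these become "000". The set must come from
    current census data and is supplied by the caller. Only birth dates are handled.
    """
    out = {key: value for key, value in record.items() if key not in DIRECT_IDENTIFIERS}
    # Dates directly related to an individual keep only the year.
    if "birth_date" in out:
        out["birth_year"] = out.pop("birth_date").year
    # Ages over 89 are aggregated into "90+"; the birth year is dropped as well,
    # because it would reveal the exact age.
    age = out.get("age")
    if age is not None and age > 89:
        out["age"] = "90+"
        out.pop("birth_year", None)
    # Geographic detail is reduced to a three-digit ZIP prefix.
    if "zip" in out:
        prefix = str(out.pop("zip"))[:3]
        out["zip3"] = "000" if prefix in restricted_zip3 else prefix
    return out


print(generalize_record({"name": "Participant A", "age": 47,
                         "birth_date": date(1977, 6, 2), "zip": "90210", "score": 1}))
```

Records like these still need review for rare combinations of values (quasi-identifiers), which no mechanical rule set fully covers.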
7. You should share any related software and code
All articles should include details of any software and code that are required to view the datasets described or to replicate the analysis.
For software
For all software used, please state the version, where the software can be accessed, and any variable parameters that could affect the results. If you have developed software in-house, the source code should be written in (or be compatible with) an open source programming language, archived under an open license and shared. For code stored in GitHub, you should create a ‘public registration’ for your project to obtain a DOI.
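Versions and parameters are easiest to report accurately when they are recorded by the analysis itself. The Python sketch below collects the interpreter, operating system and installed package versions alongside the run's parameters; the package list and parameter names are hypothetical placeholders for your own.

```python
import platform
from importlib import metadata
from typing import Dict, Iterable


def environment_report(packages: Iterable[str], parameters: Dict[str, object]) -> dict:
    """Collect interpreter, OS and package versions plus analysis parameters."""
    report = {
        "python": platform.python_version(),
        "os": platform.platform(),
        "packages": {},
        "parameters": dict(parameters),
    }
    for name in packages:
        try:
            report["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report["packages"][name] = "not installed"
    return report


# Hypothetical package list and parameters for an analysis run.
report = environment_report(["numpy", "pandas"], {"random_seed": 42, "alpha": 0.05})
for key, value in report.items():
    print(f"{key}: {value}")
```

Saving this report with the analysis outputs gives you the version details to copy into the Methods section and the software availability statement.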
Information about software should be included in a software availability statement, which you can add to the end of your article, before the references list.
When drafting the statement, please include:
  • Software available from: URL of the website from which the software can be downloaded, if applicable.
  • Source code available from: URL of the version control system (for example GitHub).
  • Archived source code at time of publication: DOI and citation for project in Zenodo (please select the appropriate DOI for the version which underlies your article).
  • License: Must be an open license and preferably an OSI-approved license.
Where third-party proprietary software has been used, the author should suggest a non-proprietary, open source alternative so that all readers can replicate the analysis or research. We recognize that this may not always be feasible. Please see the limited exceptions to these guidelines for more information.
If there are ethical or privacy considerations as to why the source code may not be made available, please contact the editorial team.
For analysis code
If you have created custom analysis code, this should be archived under an open license and shared. For analysis code stored in GitHub, you should create a ‘public registration’ for your project to obtain a DOI. We recommend using an OSI-approved license, but CC-BY 4.0 is also acceptable.
Information about your archived analysis code should be included in your data availability statement, which you can add to the end of your article, before the references list.
When drafting the statement, please include, under the heading “Extended Data”:
  • Analysis code available from: URL of the version control system (for example GitHub).
  • Archived analysis code at the time of publication: DOI and citation, e.g. from Zenodo (please select the appropriate DOI for the version which underlies your article).
  • License: Must be an open license and preferably an OSI-approved license or CC-BY 4.0.
Code and software should be cited in the body of your article and added to your reference list as you would any other bibliographic citation.
You may use your preferred referencing style but should include, at a minimum:
Creator(s); Publication year; Title; Publication venue; Publication date; Persistent Identifier (e.g. DOI); Version.
Please add either [Software] or [Code] to the reference to denote its type.
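The Python sketch below builds a software or code reference containing each of the minimum elements. The function name, the element order and the punctuation are an assumed layout rather than a required style, and the example DOI suffix is a placeholder, not a real Zenodo record.

```python
from datetime import date
from typing import Sequence


def format_code_citation(creators: Sequence[str], title: str, venue: str,
                         published: date, identifier: str, version: str,
                         kind: str = "Software") -> str:
    """Build a software or code reference; the year is taken from the publication date."""
    if kind not in ("Software", "Code"):
        raise ValueError("kind must be 'Software' or 'Code'")
    names = ", ".join(creators)
    return (f"{names} ({published.year}): {title}. [{kind}]. {venue}, "
            f"{published.isoformat()}. {identifier}. Version {version}.")


# Hypothetical example; "XXXXXXX" stands in for a real Zenodo record number.
print(format_code_citation(["Doe J"], "example-analysis-scripts", "Zenodo",
                           date(2024, 3, 1), "https://doi.org/10.5281/zenodo.XXXXXXX",
                           "1.0.0", kind="Code"))
```

Deriving the year from the publication date keeps the two elements consistent with each other.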
8. Your dataset(s) must be useful and reusable by others, adhere to any relevant data sharing standards in your discipline and align with the FAIR Data Principles
The FAIR Data Principles: F1000Research endorses the FAIR Data Principles as a framework to promote the broadest reuse of research data. Datasets which are “FAIR” are Findable, Accessible, Interoperable and Reusable. More information on the FAIR Data Principles, and how you can align your data sharing methods with them, is available in The FAIR Data Principles section below.
Relevant data sharing standards: Data standards help you to align with commonly used data sharing practices in your field, for example how your data should be structured, formatted and annotated. Please check FAIRsharing.org for details of data standards specific to the topic of your research.
9. Your dataset(s) should link back to your article
Some data repositories provide functionality which allows you to add links to any published articles associated with your dataset. If possible, we recommend that you update your metadata record in the data repository to include a link to your published article. You can link to the article using your article DOI, which will be emailed to you when your article is published.
Limited exceptions to these guidelines
Ethical or security considerations
If data access is restricted for ethical or security reasons, please use your data availability statement to include a description of the restrictions on the data and all necessary information required for a reader or reviewer to apply for access to the data and the conditions under which access will be granted.
Data protection and participant privacy
Where human data cannot be sufficiently de-identified to protect participant privacy, we recommend depositing the data into a controlled access repository, if your ethical approval and participant consent permits you to do so.
If you cannot share the data in a repository, please include in your data availability statement: an explanation of the data protection concern; what, if anything, the relevant Institutional Review Board (IRB) or equivalent said about data sharing; and, where applicable, all necessary information required for a reader or reviewer to apply for access to the data and the conditions under which access will be granted.
Large data
Where data is too large to be feasibly hosted by an F1000Research-approved repository, please include all necessary information required for a reader or reviewer to access the data with a description of the access process as part of your data availability statement.
Data under license or provided by a third party
In cases where data has been obtained from a third party and restrictions apply to the availability of the data, the data availability statement must include all the information a reader or reviewer needs to access the data by the same means as the authors, and details of any publicly available data that is representative of the analysed dataset and can be used to apply the methodology described in the article.
Proprietary software
Where third-party proprietary software has been used, an open source alternative must be provided in the article to allow for the replication of the analysis or research by all readers. Exceptions may be made if the chosen proprietary software performs specific functions and there is no open source alternative that can carry out these functions in the same manner.
If this applies to your article, your data availability statement should include a clear description of the third-party proprietary software used, including the name and version number, and what it was used for in the research. The article must also include a detailed Methods section that allows for replication; for example, the mathematics underpinning any of the simulations or calculations run using the proprietary software. You must also share any output data or analysis code generated during the research, openly and ideally in an open file format, and these must also be described in the data availability statement.
If you are unable to share your data, software or code for any reason not included here, or have additional questions about data sharing, please let our editorial team know and we will be happy to advise.
The FAIR Data Principles
F1000Research endorses the FAIR Data Principles as a framework to promote the broadest reuse of research data.
Additional practical guidance can be found on the GoFAIR website.
For research software, please consult the FAIR4RS Principles.
Findable
Findable data should be easy for both humans and machines to find.
Findable data requires that:
  • F1. (Meta)data are assigned a globally unique and persistent identifier.
  • F2. Data are described with rich metadata (defined by R1 below).
  • F3. Metadata clearly and explicitly include the identifier of the data they describe.
  • F4. (Meta)data are registered or indexed in a searchable resource.
The best way to achieve Findable data is by:
  • Depositing your dataset into a recognized data repository which assigns globally unique persistent identifiers (such as DOIs).
  • Adding as much contextual information (metadata) as possible when depositing your dataset into the repository.
Accessible
Accessible data refers to data that can be accessed once found; this may involve authentication of the user and authorization of access.
Accessible data requires that:
  • A1. (Meta)data are retrievable by their identifier using a standardized communications protocol
    • A1.1 The protocol is open, free, and universally implementable
    • A1.2 The protocol allows for an authentication and authorization procedure, where necessary
  • A2. Metadata are accessible, even when the data are no longer available
The best way to achieve Accessible data is by:
  • Depositing your dataset into a recognized data repository which uses standard communications protocols such as HTTP or HTTPS.
  • Ensuring that the data repository you choose gives continued access to metadata even when datasets are removed.
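As an example of retrieving metadata by identifier over an open, standard protocol (principle A1), doi.org supports HTTP content negotiation for DOIs registered with Crossref and DataCite: requesting the DOI URL with an `Accept: application/vnd.citationstyles.csl+json` header returns machine-readable citation metadata. The Python sketch below builds such a request with the standard library; the function names are hypothetical, and the actual fetch needs network access.

```python
import json
import urllib.request


def metadata_request(doi: str) -> urllib.request.Request:
    """Build an HTTPS request asking doi.org for CSL JSON metadata about a DOI."""
    return urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
    )


def fetch_metadata(doi: str, timeout: float = 10.0) -> dict:
    """Resolve the DOI and parse the returned metadata (requires network access)."""
    with urllib.request.urlopen(metadata_request(doi), timeout=timeout) as response:
        return json.load(response)


# Example (requires network access):
# record = fetch_metadata("10.6084/m9.figshare.22122656.v1")
# print(record.get("title"))
```

The same request works whether the reader is a person's script or an indexing service, which is what makes the metadata accessible to machines as well as humans.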
Interoperable
Interoperable data refers to data that can be compared and combined with data from different sources, by both humans and machines.
Interoperable data requires that:
  • I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
  • I2. (Meta)data use vocabularies that follow FAIR principles
  • I3. (Meta)data include qualified references to other (meta)data
The best way to achieve Interoperable data is by:
  • Checking FAIRsharing.org for the standards that apply to your data type and using them.
  • Ensuring that the data repository you choose allows you to include links or references to other related data.
  • Using open, non-proprietary file formats for your data.
Reusable
Sharing data which can be reused by others is the main goal of the FAIR Principles.
Reusable data requires that:
  • R1. (Meta)data are richly described with a plurality of accurate and relevant attributes
    • R1.1. (Meta)data are released with a clear and accessible data usage license
    • R1.2. (Meta)data are associated with detailed provenance
    • R1.3. (Meta)data meet domain-relevant community standards
The best way to achieve Reusable data is by:
  • Adding as much contextual information (metadata) as possible when depositing your dataset into a repository.
  • Applying an open license to your data, preferably CC0 or CC-BY 4.0.
  • Checking FAIRsharing.org for the standards that apply to your data type and using them.
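Before depositing, it can help to check your metadata against the principles above. The Python sketch below flags missing fields and non-open licenses in a metadata record. The field names and the record are hypothetical (each repository defines its own schema), and only the two licenses named in these guidelines are recognised, given as their SPDX identifiers; "equivalent" licenses would need to be added by hand.

```python
from typing import Dict, List

# Hypothetical minimal metadata fields, each mapped to the FAIR principle it supports.
REQUIRED_FIELDS = {
    "identifier": "F1/F3: persistent identifier of the data",
    "title": "F2/R1: descriptive metadata",
    "description": "F2/R1: descriptive metadata",
    "license": "R1.1: clear data usage license",
    "provenance": "R1.2: how the data were produced",
}
# SPDX identifiers for CC0 1.0 and CC-BY 4.0.
OPEN_LICENSES = {"CC0-1.0", "CC-BY-4.0"}


def fair_gaps(record: Dict[str, str]) -> List[str]:
    """Return human-readable problems found in a metadata record (empty if none)."""
    problems = [f"missing {name} ({why})"
                for name, why in REQUIRED_FIELDS.items() if not record.get(name)]
    license_id = record.get("license")
    if license_id and license_id not in OPEN_LICENSES:
        problems.append(f"license {license_id!r} is not CC0 or CC-BY 4.0")
    return problems


# Hypothetical record with placeholder values; provenance has been left out.
record = {
    "identifier": "https://doi.org/10.xxxx/example",
    "title": "Example survey dataset",
    "description": "Anonymised questionnaire answers (1 = correct, 0 = incorrect).",
    "license": "CC-BY-4.0",
}
for problem in fair_gaps(record):
    print(problem)
```

A check like this only confirms that fields are present; whether the descriptions are rich and accurate still needs a human reader.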
F1000Research-approved repositories
Below is a list of repositories that have already been approved for hosting data alongside an F1000Research article.
If you are an author who wishes to use a repository not already on this list, including institutional data repositories, please contact us.
If you manage a repository and would like to be included on the list, please complete our Repository Evaluation form and return it to us.
In addition to your research data, you should ensure that your research materials and supporting documents are also deposited into an appropriate repository.
Some types of data benefit from visualization within the article. F1000Research welcomes the submission of articles featuring Plot.ly interactive figures and Code Ocean compute capsules. Videos and images can be displayed through a widget provided by Figshare. If you think your dataset would benefit from visualization, please contact us.
Datasets for which there is no discipline-specific repository; research materials and supporting documents
Data type | Where to submit* | What to include in the data availability section of your article
Any | Figshare$ | Title, DOI
Any, but especially deposits with mixed data and code | Zenodo | Title, DOI
Any | Dryad | Title, DOI
Any, but especially data in SAV and POR formats | Dataverse | Title, DOI
Any, but especially deposits with mixed data, materials and documents | Open Science Framework† | Title, DOI
Deposits of mixed data and code | Code Ocean | Title, DOI, embed code for interactive reanalysis tool
Any biological data, but especially data linked to studies in other databases | BioStudies | Title, accession number
Research materials | Any appropriate public repository, such as Addgene, American Type Culture Collection, Arabidopsis Biological Resource Center, Bloomington Drosophila Stock Center, Caenorhabditis Genetics Center, DSMZ, European Conditional Mouse Mutagenesis Program, European Mouse Mutant Archive, Knockout Mouse Project, Jackson Laboratory, Mutant Mouse Regional Resource Centers and RIKEN Bioresource Centre | Accession number(s) or unique identifier(s)
* Please note that many repositories have a limit on the size (usually 2 or 5 GB) of single file uploads and charge for larger data files.
$ If you think your data are suitable for visualization within your article through the Figshare viewer, please contact us.
† Deposits must be made public and your project must be registered to ensure that a record will remain persistent and unchangeable.
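Because many repositories cap the size of single-file uploads, it can save time to check your files before starting a deposit. The Python sketch below lists files in a folder that exceed a given limit; the function name is hypothetical, and the default of 2 × 10^9 bytes assumes decimal gigabytes, whereas some repositories may count in binary units (2^30 bytes), so check your repository's documentation.

```python
from pathlib import Path
from typing import List, Tuple


def oversized_files(folder, limit_bytes: int = 2 * 10**9) -> List[Tuple[Path, int]]:
    """List (path, size in bytes) for files under `folder` larger than `limit_bytes`."""
    too_big = []
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            size = path.stat().st_size
            if size > limit_bytes:
                too_big.append((path, size))
    return too_big
```

For example, `oversized_files("my_dataset", limit_bytes=5 * 10**9)` would list the files in a hypothetical `my_dataset` folder that are larger than 5 GB; such files may need to be split, compressed or deposited under a repository's large-data arrangements.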
Software & source code
Data type | Where to submit | What to include in the data availability section of your article
Latest source code | GitHub or Bitbucket | URL
Archived source code | Zenodo | Title, DOI and license* used
Deposits of mixed data and code | Code Ocean | Title, DOI, embed code for interactive reanalysis tool
Software | Authors may host software where they wish, though using a stable URL is strongly recommended | URL
* An open license must be assigned and we strongly advise authors to use an OSI-approved license.
3D-printable models
Data type | Where to submit | What to include in the data availability section of your article
All 3D-printable models (including molecular, cellular, medical/anatomical and labware models) | NIH 3D Print Exchange | Title, model ID, URL
Health data (allowing restricted access to protect anonymity of participants)
Data type | Where to submit | What to include in the data availability section of your article
Addiction and HIV data | National Addiction & HIV Data Archive Program | Title, DOI, Route of access
Cancer imaging | Cancer Imaging Archive | Title, DOI, Route of access
Cancer-related clinical trial data | Project Datasphere | Title, DOI, Route of access
Clinical trial data | Vivli | Title, DOI, Route of access
Humanities and social science data
Data type | Where to submit | What to include in the data availability section of your article
Any | DANS-EASY* | Title, DOI
Any, but reserved for ICPSR member institutions | Open ICPSR | Title, DOI
Any | UK Data Archive* | Title, DOI
Social and economic data | UK Data Service | Title, DOI
Qualitative social science data | The Qualitative Data Repository | Title, DOI
* Deposits must be open access.
Transcript data
Qualitative data resulting from recordings of interviews or focus group discussions should be anonymised by redaction and uploaded to a general data repository (see above). If it is not possible to anonymise the data sufficiently by redaction, a restricted route of data access should be provided by the authors and a comprehensive statement must be added to the Data Availability section of the article (see above for data that cannot be shared). If the transcript data cannot be shared under any circumstances, please contact the editorial team, who will be able to advise you.
Environmental and ecological data
Data type | Where to submit | What to include in the data availability section of your article
Complex environmental and ecological data | The Knowledge Network for Biocomplexity* | Title, DOI
Environmental data collected by NERC-funded researchers | NERC data centres | Data centre name, title and DOI
Geospatial | PANGAEA | Title, DOI
Geochemical | EarthChem | Title, DOI
Climate data | World Data Center for Climate (WDCC) | Title, DOI
* Data entries must be made public.
Chemical and macromolecular structures
Data type | Where to submit | What to include in the data availability section of your article
X-ray Crystallographic Information Files (CIFs), structure factors and checkCIF reports* | Cambridge Crystallographic Data Centre | Compound name, CCDC deposition number
3D protein structures | Protein Data Bank | PDB number
Crystallography* | Crystallography Open Database | COD ID
X-ray images | Coherent X-ray Imaging Data Bank | Title, DOI
Electron Microscopy | Electron Microscopy Data Resource (EMDB) | Accession number(s)
NMR Spectroscopy | Biological Magnetic Resonance Data Bank (BMRB) | Accession number(s)
Chemical structures, annotations and associated bioassay test results | PubChem | CID(s)
Chemical structures, spectra and syntheses | ChemSpider | ChemSpider ID
* X-ray crystallography validation reports should be submitted (as a PDF) directly to F1000Research via the submission system.
Neuroimaging data
Data type | Where to submit | What to include in the data availability section of your article
Raw fMRI datasets | OpenfMRI | Title and accession number(s)
MRI and PET unthresholded statistical maps | NeuroVault* | Title and URL (which includes a unique data ID)
* Please note that authors will still be expected to deposit their raw neuroimaging data in an appropriate repository. Also, once submitted, administrative powers will be transferred to F1000Research. This is necessary to ensure stability of the dataset; this transfer does not affect the CC0 license assigned to all NeuroVault submissions.
Sequence and omics data
Data type | Where to submit | What to include in the data availability section of your article
Expression and sequence data (including nucleotide/protein sequence, microarray, SNP/SNV, GWAS, phenotype or sequence-based reagent data); systems and chemical biology data (including chemical entities, chemical reactions, computational models, metabolic profiles, or molecular interactions) | Any appropriate INSDC member repository, e.g. DDBJ, ENA or NCBI repositories.* The GSA, which is working towards INSDC membership, is also acceptable. Researchers in China may alternatively use the CNGB Sequence Archive. | Accession number(s). For SNP/SNV data, please provide HGVS name(s), local ID(s) and rs/ss number(s).
Metabolomic data | Metabolomics Workbench$ | Project DOI, Study ID
Proteomic data | Any appropriate ProteomeXchange member repository | Accession number(s)
* Some higher-level repositories, such as BioProject and BioStudies, provide access to data deposited in various archival databases. In these cases, please cite the accession numbers that are assigned to the data submissions by the archival databases in addition to the higher-level identifier.
$ Or any appropriate INSDC member repository, see above.
Physics
Data type | Where to submit | What to include in the data availability section of your article
High Energy Physics | HEPData | Title, DOI
Materials Science
Data type | Where to submit | What to include in the data availability section of your article
Ab initio electronic structures | NOMAD Repository | Title, DOI
Computational, but especially calculations with full provenance | Materials Cloud | Title, DOI