Metadata management - INRAE UMR 1332 BFP / Bordeaux Metabolome  


Metadata documentation



DEFINITION

Short name

  • Give your dataset a short name that can serve as a persistent identifier or as a reference, such as a project name. This name is mainly used to identify the dataset and to reference its metadata page via a URL, so it is not the main title of the dataset (see Full title). Do not use spaces or special characters: only letters, numbers and underscores are allowed. The name is limited to 20 characters.
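
The naming rule above can be checked mechanically. The following is a minimal sketch in Python and is not part of the tool itself: the function name `is_valid_short_name` and the example names are hypothetical, and it assumes that "letters" means unaccented ASCII letters and that an empty name is not allowed.

```python
import re

# Assumption: "letters" means ASCII letters; between 1 and 20 characters in total.
SHORT_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,20}")


def is_valid_short_name(name: str) -> bool:
    """Return True if `name` contains only letters, digits and underscores
    and is at most 20 characters long."""
    return SHORT_NAME_PATTERN.fullmatch(name) is not None


print(is_valid_short_name("tomato_2021"))  # True
print(is_valid_short_name("my dataset"))   # False: contains a space
print(is_valid_short_name("a" * 21))       # False: longer than 20 characters
```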

Full title

  • Concise and precise title of your dataset. This title will appear as the main title in the data repository (Dataverse and Zenodo). You can reuse the title of the related publication if it is not too long.

Subject

  • The area of study relevant to the dataset. This field is mandatory. The list of terms is imposed by Dataverse: adding, modifying or deleting terms may prevent uploading to the repository. See the OECD and European Commission classifications of fields of science and technology.

Description of the dataset

  • A summary describing the purpose, nature and the scope of the dataset. It is the equivalent of the abstract for an article.

Notes

  • A free-text note that will be pushed to the data repository (Dataverse and Zenodo). Typically, use it to reference funders who do not have an identifier recognized in international registers.

STATUS

Status of the dataset

  • Choose a status for the dataset. This field is used for data management and will not subsequently be reported in the data repository (Dataverse or Zenodo).

    • Processed : The data is available and has been curated, meaning that it is in a stable version that can be redistributed as is. Any dataset that is to be published must have this status.

    • In progress : Some data is available but more is to come; some of it is stable and the rest is not. Any dataset with this status must only be shared between partners of the same project.

    • Unprocessed : The data is available but not curated, meaning that it is not yet stable and therefore cannot be distributed as is. Any dataset with this status must only be shared between partners of the same project.
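
The sharing rule attached to each status can be summarised in a few lines of Python. This is only an illustrative sketch: the names `DatasetStatus`, `can_be_published` and `sharing_scope` are hypothetical, and the two scope labels are wording chosen for this example rather than values used by the tool.

```python
from enum import Enum


class DatasetStatus(Enum):
    """The three status values listed above."""
    PROCESSED = "Processed"
    IN_PROGRESS = "In progress"
    UNPROCESSED = "Unprocessed"


def can_be_published(status: DatasetStatus) -> bool:
    """Only a dataset with the Processed status may be published."""
    return status is DatasetStatus.PROCESSED


def sharing_scope(status: DatasetStatus) -> str:
    """Return a label (hypothetical wording) describing who may receive the data."""
    if can_be_published(status):
        return "publishable"
    return "project partners only"


for status in DatasetStatus:
    print(f"{status.value}: {sharing_scope(status)}")
```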

Access rights to data

  • Indicates the status of the data regarding access and dissemination. This field is used for data management and will not subsequently be reported in the data repository (Dataverse or Zenodo).

    • Public : Either a link to the data is provided as a resource with password-free access (e.g. filebrowser), or the data is accessible to anyone with access to the storage space (locally or remotely via VPN).

    • Private : Access requiring specific rights (e.g. filebrowser or on storage space).

    • Mixte (mixed) : A mix of public and private access.

Language

  • The language in which the dataset files are documented.

Life cycle step

  • Specifies the event in the data life cycle that is considered significant enough to document; any stage of the life cycle may apply. The list of terms is imposed by Dataverse. Adding, modifying or deleting them may prevent uploading to the repository. See DDI-CV.
    Note: It can be omitted if you don't know what to put.

License

  • License/Data Use Agreement. See Choose a license and CC-BY licenses.
    Warning: Very important field because it precisely defines the rights and duties of users of your data.
    Note: Etalab 2.0 is the recommended license, in accordance with France's open data policy.

Start of collection

  • Start date of sample or data collection. A full date in ISO-8601 format is required even if the month and day are fictitious. This date indicates when the experimental process began.

End of collection

  • End date of sample or data collection. A full date in ISO-8601 format is required even if the month and day are fictitious. This date indicates when data collection was completed.
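
As an illustration of what "a full date in ISO-8601 format" means for the two collection dates, here is a minimal Python sketch. The function names are hypothetical, the sketch accepts only the YYYY-MM-DD form, and the check that the end date does not precede the start date is an assumption of this example, not a rule stated in this documentation.

```python
import re
from datetime import date

# Full calendar date only: four-digit year, two-digit month, two-digit day.
_FULL_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def parse_full_date(text: str) -> date:
    """Parse a full YYYY-MM-DD date; reject partial dates such as "2021"."""
    if _FULL_DATE.fullmatch(text) is None:
        raise ValueError(f"not a full YYYY-MM-DD date: {text!r}")
    # fromisoformat also rejects impossible dates such as 2021-02-30.
    return date.fromisoformat(text)


def check_collection_period(start: str, end: str) -> tuple:
    """Return (start, end) as dates.

    Assumption: the end of collection must not be earlier than the start.
    """
    start_date = parse_full_date(start)
    end_date = parse_full_date(end)
    if end_date < start_date:
        raise ValueError("end of collection is before start of collection")
    return start_date, end_date


# Only the year is known here, so the month and day are fictitious placeholders.
print(check_collection_period("2021-01-01", "2021-12-31"))
```

Which placeholder month and day to use when only the year is known is left to the user; the first and last day of the year shown here are just one possible choice.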

DMP identifier

  • Preferably a URL pointing directly to the DMP but it can also be a permanent identifier (DOI, etc.). See what-is-a-dmp-id.

MANAGEMENT

Contacts

  • Person that users of the dataset can contact with questions. Based on the People dictionary. Note that people defined as contact must have an email address defined via the dictionary.

Authors

  • Person that created the dataset. Based on the People dictionary.

Data collectors

  • Person responsible for finding, gathering/collecting data under the guidelines of the author(s) or Principal Investigator (PI). Based on the People dictionary.

Data curators

  • Person who organizes, integrates, and annotates data collected from various sources in order that the value of the data is maintained over time and the data remains available for reuse and preservation. Based on the People dictionary.

Project members

  • Person on the membership list of a designated project or project team. Based on the People dictionary.

Project leader

  • Person officially designated as head of the project team or sub-project team instrumental in the work necessary to the development of the resource. Based on the People dictionary.

WP leader

  • The Work Package Leader is responsible for ensuring the comprehensive contents, versioning, and availability of the Work Package during the development of the resource. Based on the People dictionary.

Depositor

  • Person or organization that deposited the dataset in the repository.

Producer

  • The entity that serves to produce the dataset. Based on the Producer dictionary.

Grant Information

  • Information about the dataset's financial support. Based on the Grant dictionary.

DESCRIPTORS

Keywords

  • A key term that describes an important aspect of the dataset and information about any controlled vocabulary used. Based on BioPortal ontologies : EFO, JERM, EDAM, MS, NM, NCI, OBI, PO, PTO, AGRO, ECOCORE, IOBC, NCBITAXON.

Topic Classification

  • Indicates a broad, important topic or subject that the dataset covers and information about any controlled vocabulary used. Based on Thesaurus-INRAE.

Kind of Data

  • The type of data included in the files (e.g. survey data, machine-readable text, experimental data tables). The list of terms is imposed by Dataverse. Adding, modifying or deleting them may prevent uploading to the repository. See DDI-CV.

Data origin

  • Origin of the data. The list of terms is imposed by Dataverse. Adding, modifying or deleting them may prevent uploading to the repository.

Experimental Factor

  • Specify the experimental factors, i.e. the controlled independent variables. Based on the Vocabulary dictionary.

Measurement type

  • Specify the types of measurements carried out, e.g. metabolites, phenotypic data, environmental data. Based on the Vocabulary dictionary.

Technology type

  • Specify the types of instruments used to carry out all or part of the measurements, e.g. NMR, LC-MS. Based on the Vocabulary dictionary.

Publication - Citation

  • The full bibliographic citation for the related publication.

Publication - ID Type

  • The type of identifier that uniquely identifies a related publication. Based on the DataCite CV.

    • Allowed values: ark, arXiv, bibcode, doi, ean13, eissn, handle, isbn, issn, istc, lissn, lsid, pmid, purl, upc, url, urn.
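
A value for this field can be checked against the list above with a simple lookup. The sketch below is illustrative only: `normalize_id_type` is a hypothetical helper, and matching without regard to letter case is an assumption of this example (the list itself spells "arXiv" with mixed case).

```python
# Identifier types exactly as spelled in the list above.
PUBLICATION_ID_TYPES = (
    "ark", "arXiv", "bibcode", "doi", "ean13", "eissn", "handle", "isbn",
    "issn", "istc", "lissn", "lsid", "pmid", "purl", "upc", "url", "urn",
)

# Lookup table from lower-cased spelling to the listed spelling.
_BY_LOWERCASE = {name.lower(): name for name in PUBLICATION_ID_TYPES}


def normalize_id_type(value: str) -> str:
    """Return the listed spelling of `value`, or raise ValueError.

    Assumption: surrounding whitespace and letter case are ignored.
    """
    key = value.strip().lower()
    if key not in _BY_LOWERCASE:
        raise ValueError(f"unknown publication ID type: {value!r}")
    return _BY_LOWERCASE[key]


print(normalize_id_type("DOI"))    # doi
print(normalize_id_type("arxiv"))  # arXiv
```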

Publication - ID Number

  • The identifier for a related publication corresponding to the specified identifier type.

Publication - URL

  • The URL of the publication web page, e.g. a journal article webpage.

OTHER

Additional information

  • Although it is preferable to put all data documents online via appropriate repositories (e.g. protocols.io for protocols) or in an electronic laboratory notebook (e.g. eLabFTW), you can nevertheless add any comments about the data that could prove useful for understanding how they were generated, how to interpret them or where to find them.

RESOURCES

Resource Type

  • Choose the type of the resource. Based on the DataCite CV.

Media Type

  • Choose a media type if applicable. Based on the DataCite CV.

Description

  • Provide a concise and accurate description of the resource. Must not exceed 30 characters.

Location

  • Preferably a URL to an external resource accessible to all, but it can also be a password-protected resource (e.g. disk space in the cloud) or text clearly indicating where the resource is located (e.g. internal disk space). Finally, it can be the name of a file deposited on the same disk space as the metadata file, so that it can be pushed to the data repository at the same time as the metadata (see Publication in the online documentation).
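
The constraints on the Description and Location fields of a resource can be sketched as follows. This is a hedged example rather than the tool's own validation: the function names are hypothetical, the example values are invented, and treating only `http` and `https` links as URLs is an assumption made here.

```python
from urllib.parse import urlparse

MAX_DESCRIPTION_LENGTH = 30  # limit stated for the Description field


def is_valid_description(description: str) -> bool:
    """Return True if the description is non-empty and within the length limit."""
    return 0 < len(description.strip()) <= MAX_DESCRIPTION_LENGTH


def looks_like_url(location: str) -> bool:
    """Return True if `location` parses as an http(s) URL with a host name.

    Anything else (free text, a file name) is still an allowed Location value;
    this helper only tells the two cases apart.
    """
    parts = urlparse(location.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(is_valid_description("Raw NMR spectra"))             # True
print(looks_like_url("https://example.org/data/spectra"))  # True
print(looks_like_url("spectra.zip"))                       # False: a file name
```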