Demonstration of the functionalities of the R ODAM package

Description

  • ‘ODAM’ (Open Data for Access and Mining) is an experimental data table management software that makes research data accessible and available for reuse with minimal effort on the part of the data provider. Designed to make experimental data tables easy for users to manage, ODAM provides a model for structuring both data and metadata that facilitates data handling and analysis. It also encourages data dissemination according to FAIR principles by making the data interoperable and reusable by both humans and machines, so that a dataset can be explored and then extracted in whole or in part as needed.

  • The Rodam package has only one class, odamws, which provides methods to retrieve online data through the ‘ODAM’ web services. This requires that the data be implemented according to the ‘ODAM’ approach, namely that the data subsets were deposited in a suitable data repository as TSV files, together with their metadata, also described in TSV files.

  • The R ODAM package offers a set of functions to retrieve the data and metadata of datasets implemented with ODAM, the “Experimental Data Table Management System” (EDTMS) whose name stands for “Open Data for Access and Mining”.

  • See https://inrae.github.io/ODAM/ for further information.


Load the R ODAM package

library(Rodam)
## Loading required package: httr


Initialize the ODAM object

Initialize the ‘ODAM’ object with the name of the desired dataset and the URL of its web service.

dh <- new('odamws', wsURL='https://pmb-bordeaux.fr/getdata/', dsname='frim1')


Get the Data Tree

options(width=256)
options(warn=-1)
options(stringsAsFactors=FALSE)

show(dh)
##                        levelName SetID Identifier WSEntry                             Description Count
## 1  plants                            1    PlantID   plant                          Plant features   552
## 2   °--samples                       2   SampleID  sample                         Sample features  1287
## 3       ¦--aliquots                  3  AliquotID aliquot                       Aliquots features   530
## 4       ¦   ¦--cellwall_metabo       4  AliquotID aliquot      Cell wall Compound quantifications    75
## 5       ¦   ¦--cellwall_metaboFW     5  AliquotID aliquot Cell Wall Compound quantifications (FW)    75
## 6       ¦   ¦--activome              6  AliquotID aliquot                       Activome Features   266
## 7       ¦   ¦--plato_hexosesP       10  AliquotID aliquot                       Hexoses Phosphate   266
## 8       ¦   ¦--lipids_AG            11  AliquotID aliquot                               Lipids AG    57
## 9       ¦   °--AminoAcid            12  AliquotID aliquot                             Amino Acids    69
## 10      °--pools                     7     PoolID    pool                Pools of remaining pools   195
## 11          ¦--qMS_metabo            8     PoolID    pool             MS Compounds quantification    25
## 12          °--qNMR_metabo           9     PoolID    pool            NMR Compounds quantification    64


Get all WebService entries

Get all WebService entries defined in the data subset ‘samples’

dh$getWSEntryByName("samples")
##     Subset     Attribute   WSEntry
## 1   plants       PlantID     plant
## 2   plants          Rank       row
## 3   plants      PlantNum  plantnum
## 4   plants     Treatment treatment
## 5  samples      SampleID    sample
## 6  samples         Truss     truss
## 7  samples      DevStage     stage
## 8  samples      FruitAge       age
## 9  samples FruitPosition  position
## 10 samples FruitDiameter  diameter
## 11 samples   FruitHeight    height
## 12 samples       FruitFW  weightfw
## 13 samples       FruitDW  weightdw

NOTE:

A ‘WSEntry’ is an alias associated with an attribute; it lets users query the data subset with a filter condition (i.e. a selection constraint) on that attribute. Not all attributes have a WSEntry, only a few, mainly those in the identifier and factor categories. For instance, the WSEntry of the ‘SampleID’ attribute is ‘sample’. Thus, to select only the samples whose ID equals 365, specify the filter condition as ‘sample/365’.
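A filter condition is therefore just the WSEntry, a slash, and the value. A minimal sketch, assuming the table returned by getWSEntryByName() is a data frame with the Attribute and WSEntry columns printed above; the helper buildFilter() is hypothetical and not part of Rodam:

```r
# Excerpt of the WSEntry table printed by dh$getWSEntryByName("samples")
wsTable <- data.frame(
  Subset    = c("plants", "plants", "samples", "samples"),
  Attribute = c("PlantID", "Treatment", "SampleID", "DevStage"),
  WSEntry   = c("plant", "treatment", "sample", "stage"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build an 'entry/value' filter string for a given attribute
buildFilter <- function(wsTable, attribute, value) {
  entry <- wsTable$WSEntry[wsTable$Attribute == attribute]
  if (length(entry) == 0)
    stop("No WSEntry defined for attribute '", attribute, "'")
  paste0(entry[1], "/", value)   # first match, in case the attribute appears in several subsets
}

buildFilter(wsTable, "SampleID", 365)   # "sample/365"
```

The resulting string is what getDataByName() expects as its second argument.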



Get data from ‘samples’ subset with a constraint

data <- dh$getDataByName('samples','sample/365')
data
##   PlantID Rank PlantNum Treatment SampleID Truss DevStage FruitAge HarvestDate HarvestHour FruitPosition FruitDiameter FruitHeight FruitFW  FruitDW DW
## 1     E35    E      311   Control      365    T6    FR.02    47DPA       40423         0.5             5         55.46       48.98   83.32 5.299152 NA
## 2     A17    A       17   Control      365    T6    FR.02    47DPA       40423         0.5             3         56.59       47.77   82.02 5.216472 NA
## 3      A8    A        8   Control      365    T6    FR.02    47DPA       40423         0.5             5         55.11       44.90   71.82 4.567752 NA
## 4      D3    D      210   Control      365    T6    FR.02    47DPA       40423         0.5             5         49.28       44.35   58.28 3.706608 NA
## 5     H11    H      356   Control      365    T6    FR.02    47DPA       40423         0.5             6         46.68       38.69   49.25 3.132300 NA


If the WSEntry concept is unclear to you, you can instead retrieve the full data subset and then perform a local selection, as shown below:

data <- dh$getDataByName('samples') 
data[data$SampleID==365, ]
##     PlantID Rank PlantNum Treatment SampleID Truss DevStage FruitAge HarvestDate HarvestHour FruitPosition FruitDiameter FruitHeight FruitFW  FruitDW DW
## 658     E35    E      311   Control      365    T6    FR.02    47DPA       40423         0.5             5         55.46       48.98   83.32 5.299152 NA
## 659     A17    A       17   Control      365    T6    FR.02    47DPA       40423         0.5             3         56.59       47.77   82.02 5.216472 NA
## 660      A8    A        8   Control      365    T6    FR.02    47DPA       40423         0.5             5         55.11       44.90   71.82 4.567752 NA
## 661      D3    D      210   Control      365    T6    FR.02    47DPA       40423         0.5             5         49.28       44.35   58.28 3.706608 NA
## 662     H11    H      356   Control      365    T6    FR.02    47DPA       40423         0.5             6         46.68       38.69   49.25 3.132300 NA
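A local selection can also combine several conditions, or match several IDs at once with %in%. A minimal base-R sketch on a small data frame holding a few columns of the five rows printed above (the ID 366 in the second query only illustrates %in% and is not taken from the dataset):

```r
# A few columns of the rows printed above for SampleID 365
samples <- data.frame(
  PlantID       = c("E35", "A17", "A8", "D3", "H11"),
  SampleID      = c(365, 365, 365, 365, 365),
  FruitPosition = c(5, 3, 5, 5, 6),
  FruitFW       = c(83.32, 82.02, 71.82, 58.28, 49.25),
  stringsAsFactors = FALSE
)

# Rows of sample 365 whose fruit was at position 5
sel <- samples[samples$SampleID == 365 & samples$FruitPosition == 5, ]

# Several IDs at once, combined with a threshold on fresh weight
sel2 <- subset(samples, SampleID %in% c(365, 366) & FruitFW > 60)
```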


Convert the numeric date and time values into a human-readable format

data$HarvestDate <- dh$dateToStr(data$HarvestDate)
data$HarvestHour <- dh$timeToStr(data$HarvestHour)
data[data$SampleID==365, ]
##     PlantID Rank PlantNum Treatment SampleID Truss DevStage FruitAge HarvestDate HarvestHour FruitPosition FruitDiameter FruitHeight FruitFW  FruitDW DW
## 658     E35    E      311   Control      365    T6    FR.02    47DPA  2010-09-02        12h0             5         55.46       48.98   83.32 5.299152 NA
## 659     A17    A       17   Control      365    T6    FR.02    47DPA  2010-09-02        12h0             3         56.59       47.77   82.02 5.216472 NA
## 660      A8    A        8   Control      365    T6    FR.02    47DPA  2010-09-02        12h0             5         55.11       44.90   71.82 4.567752 NA
## 661      D3    D      210   Control      365    T6    FR.02    47DPA  2010-09-02        12h0             5         49.28       44.35   58.28 3.706608 NA
## 662     H11    H      356   Control      365    T6    FR.02    47DPA  2010-09-02        12h0             6         46.68       38.69   49.25 3.132300 NA
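The raw values are consistent with spreadsheet serial numbers: 40423 days after 1899-12-30 is 2010-09-02, and 0.5 of a day is 12:00. A minimal base-R sketch reproducing the conversions above under that assumption; serialToDate() and fractionToTime() are hypothetical helpers, not the Rodam implementations of dateToStr() and timeToStr():

```r
# Hypothetical helper: spreadsheet-style serial day count (origin 1899-12-30) -> "YYYY-MM-DD"
serialToDate <- function(x) {
  format(as.Date(x, origin = "1899-12-30"), "%Y-%m-%d")
}

# Hypothetical helper: fraction of a day -> "<hours>h<minutes>", as printed above
fractionToTime <- function(x) {
  mins <- round(x * 24 * 60)
  sprintf("%dh%d", mins %/% 60, mins %% 60)
}

serialToDate(40423)   # "2010-09-02"
fractionToTime(0.5)   # "12h0"
```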



Get ‘activome’ data subset

Get ‘activome’ data subset along with its metadata

ds <- dh$getSubsetByName('activome')
ds$samples   # Show the identifier defined in the data subset
## NULL
ds$facnames  # Show all factors defined in the data subset
## [1] "Treatment" "DevStage"  "FruitAge"
ds$varnames  # Show all quantitative variables defined in the data subset
##  [1] "PGM"             "cFBPase"         "PyrK"            "CitS"            "PFP"             "Aconitase"       "PFK"             "FruK"           
##  [9] "pFBPase"         "GluK"            "NAD_ISODH"       "Enolase"         "NADP_ISODH"      "PEPC"            "Aldolase"        "Succ_CoA_ligase"
## [17] "NAD_MalDH"       "AlaAT"           "Fumarase"        "AspAT"           "NADP_GluDH"      "NAD_GAPDH"       "NADP_GAPDH"      "NAD_GluDH"      
## [25] "TPI"             "PGK"             "Neutral_Inv"     "Acid_Inv"        "G6PDH"           "UGPase"          "SuSy"            "NAD_ME"         
## [33] "ShiDH"           "NADP_ME"         "PGI"             "StarchS"         "AGPase"          "SPS"
ds$qualnames # Show all qualitative variables defined in the data subset
## [1] "Rank"  "Truss"
ds$WSEntry   # Show all WS entries defined in the data subset
##      Subset       Attribute             WSEntry
## 1    plants         PlantID               plant
## 2    plants            Rank                 row
## 3    plants        PlantNum            plantnum
## 4    plants       Treatment           treatment
## 5   samples        SampleID              sample
## 6   samples           Truss               truss
## 7   samples        DevStage               stage
## 8   samples        FruitAge                 age
## 9   samples   FruitPosition            position
## 10  samples   FruitDiameter            diameter
## 11  samples     FruitHeight              height
## 12  samples         FruitFW            weightfw
## 13  samples         FruitDW            weightdw
## 14 aliquots        SampleID              sample
## 15 aliquots       AliquotID             aliquot
## 16 activome       AliquotID             aliquot
## 17 activome             PGM  Phosphoglucomutase
## 18 activome         pFBPase      bisphosphatase
## 19 activome             PGK              kinase
## 20 activome             SPS            synthase
## 21 activome             PFK phosphofructokinase
## 22 activome       Aconitase           Aconitase
## 23 activome            FruK        fructokinase
## 24 activome            GluK         Glucokinase
## 25 activome           ShiDH       dehydrogenase
## 26 activome         Enolase             Enolase
## 27 activome            PEPC         Carboxylase
## 28 activome        Aldolase            aldolase
## 29 activome Succ_CoA_ligase              ligase
## 30 activome           AlaAT        transaminase
## 31 activome        Fumarase            fumarase
## 32 activome           AspAT    aminotransferase
## 33 activome      NADP_GAPDH                NADP
## 34 activome       NAD_GAPDH                 NAD
## 35 activome       NAD_GluDH                 NAP
## 36 activome             PGI           isomerase
## 37 activome        Acid_Inv           invertase
## 38 activome          UGPase       phosphorylase
## 39 activome         NADP_ME              enzyme


Boxplot of all variables defined in ds$varnames (log10 scale, border colour by order of magnitude)

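# Order of magnitude of each variable (rounded mean of its log10 values), used to pick a border colour
Rank <- sapply(ds$varnames, function(x) round(mean(log10(ds$data[, x]), na.rm=TRUE)))
cols <- c('red', 'orange', 'darkgreen', 'blue', 'purple')
Rank <- pmin(pmax(Rank, 1), length(cols))   # keep indices within 1..length(cols)
boxplot(log10(ds$data[, ds$varnames]), outline=FALSE, horizontal=TRUE, border=cols[Rank], las=2, cex.axis=0.8)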
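The colour index is each variable's order of magnitude, i.e. the rounded mean of its log10 values. A minimal sketch of that mapping on a small matrix built from the sample values printed earlier, with the index clamped to the range of cols so that very small or very large variables still receive a colour:

```r
cols <- c('red', 'orange', 'darkgreen', 'blue', 'purple')

# Three columns taken from the 'samples' rows printed earlier
m <- cbind(FruitFW       = c(83.32, 82.02, 71.82, 58.28, 49.25),
           FruitDW       = c(5.299152, 5.216472, 4.567752, 3.706608, 3.132300),
           FruitDiameter = c(55.46, 56.59, 55.11, 49.28, 46.68))

# Order of magnitude per column, clamped to a valid index of 'cols'
magnitude <- apply(m, 2, function(x) round(mean(log10(x), na.rm = TRUE)))
magnitude <- pmin(pmax(magnitude, 1), length(cols))
colourOf  <- cols[magnitude]   # "orange" "red" "orange"
```

Fresh weights and diameters (tens of units) land in the same colour, while dry weights (a few units) get a different one.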


Count the identifiers shared by each pair of subsets

Based on the subset tree, the common ID to be considered is the “SampleID” identifier.

refID <- "SampleID"
subsetList <- c("samples", "activome", "qNMR_metabo", "cellwall_metabo")
n <- length(subsetList)
Mintersubsets <- matrix(data=0, nrow=n, ncol=n)
# Fill only the upper triangle: number of common IDs for each pair of subsets
for (i in 1:(n-1))
    for (j in (i+1):n)
        Mintersubsets[i,j] <- length(dh$getCommonID(refID, subsetList[i], subsetList[j]))

rownames(Mintersubsets) <- subsetList
colnames(Mintersubsets) <- subsetList
Mintersubsets[-n, -1]   # drop the empty last row and first column
##             activome qNMR_metabo cellwall_metabo
## samples          254         188              70
## activome           0         188              70
## qNMR_metabo        0           0              23
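Conceptually, each entry is the size of the intersection of the identifier values found in two subsets (the zeros below the diagonal are simply not computed). A minimal base-R sketch of the same pairwise count, assuming each subset has already been reduced to a vector of SampleID values; the ID vectors are hypothetical toy values, and the sketch ignores how Rodam maps identifiers across levels of the data tree:

```r
# Hypothetical SampleID vectors, one per subset
ids <- list(samples  = c(1, 2, 3, 4, 5, 6),
            activome = c(2, 3, 4, 9),
            qNMR     = c(3, 4, 5))

n <- length(ids)
common <- matrix(0L, n, n, dimnames = list(names(ids), names(ids)))

# Upper triangle only, as in the loop above
for (i in seq_len(n - 1))
  for (j in (i + 1):n)
    common[i, j] <- length(intersect(ids[[i]], ids[[j]]))

common[-n, -1]
```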


Get the merged data of two data subsets based on their common identifiers

setNameList <- c("activome", "qNMR_metabo" )
dsMerged <- dh$getSubsetByName(setNameList)
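Merging two subsets amounts to joining their rows on the shared identifier. A minimal sketch of the idea with base R merge() on two hypothetical toy data frames (Compound1 is an illustrative column name); whether getSubsetByName() performs exactly this inner join is an assumption:

```r
# Hypothetical toy subsets sharing the SampleID identifier
activome <- data.frame(SampleID = c(1, 2, 3, 4), PGM = c(10.2, 11.5, 9.8, 12.1))
qnmr     <- data.frame(SampleID = c(2, 3, 4, 5), Compound1 = c(40.1, 38.7, 41.3, 39.0))

# Inner join: keep only the SampleIDs present in both subsets (rows sorted by SampleID)
merged <- merge(activome, qnmr, by = "SampleID")
```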

Boxplot of all variables defined in dsMerged$varnames, coloured by subset

# One border colour per subset, repeated for each of its variables
cols <- c( rep('red', length(dsMerged$varsBySubset[[setNameList[1]]])), 
           rep('darkgreen', length(dsMerged$varsBySubset[[setNameList[2]]])) )
boxplot(log10(dsMerged$data[, dsMerged$varnames]), outline=FALSE, horizontal=TRUE, border=cols, las=2, cex.axis=0.8)
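The colour vector above is written out by hand for two subsets. A minimal sketch generalising it to any number of subsets, assuming dsMerged$varsBySubset is a named list of variable-name vectors (as its use above suggests) and that dsMerged$varnames lists the variables in the same subset order; the list below is a hypothetical stand-in:

```r
# Hypothetical stand-in for dsMerged$varsBySubset
varsBySubset <- list(activome    = c("PGM", "cFBPase", "PyrK"),
                     qNMR_metabo = c("v1", "v2"))
subsetCols <- c('red', 'darkgreen', 'blue', 'purple')

# One colour per subset, repeated once per variable of that subset
cols <- rep(subsetCols[seq_along(varsBySubset)], times = lengths(varsBySubset))
```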





R Session Information

options(width=128)
sessionInfo()
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19045)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252    LC_MONETARY=French_France.1252 LC_NUMERIC=C                  
## [5] LC_TIME=French_France.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] Rodam_0.1.14 httr_1.4.2  
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.27     R6_2.5.0          jsonlite_1.7.2    magrittr_2.0.1    evaluate_0.14     highr_0.8        
##  [7] rlang_0.4.11      stringi_1.5.3     curl_4.3.1        jquerylib_0.1.3   bslib_0.2.4       rmarkdown_2.17   
## [13] data.tree_1.0.0   tools_4.0.3       stringr_1.4.0     xfun_0.29         yaml_2.2.1        compiler_4.0.3   
## [19] htmltools_0.5.1.1 knitr_1.31        sass_0.4.2