Core Scientific Dataset Model: A lightweight and portable model and file format for multi-dimensional scientific data


Authors: Deepansh J. Srivastava aff001; Thomas Vosegaard aff002; Dominique Massiot aff003; Philip J. Grandinetti aff001
Author affiliations: Department of Chemistry, Ohio State University, 100 West 18th Avenue, Columbus, OH 43210, United States of America aff001; Laboratory for Biomolecular NMR Spectroscopy, Department of Molecular and Structural Biology, University of Aarhus, DK-8000 Aarhus C, Denmark aff002; CEMHTI UPR3079 CNRS, Univ. Orléans, F-45071 Orléans, France aff003
Published in: PLoS ONE 15(1)
Category: Research Article
doi: 10.1371/journal.pone.0225953

Abstract

The Core Scientific Dataset (CSD) model with JavaScript Object Notation (JSON) serialization is presented as a lightweight, portable, and versatile standard for intra- and interdisciplinary scientific data exchange. The model supports datasets with a p-component dependent variable, {U_0, …, U_q, …, U_{p−1}}, discretely sampled at M unique points in a d-dimensional space of independent variables (X_0, …, X_k, …, X_{d−1}). The sampling is over an orthogonal grid, regular or rectilinear, whose principal coordinate axes are the independent variables. The model can also hold correlated datasets, provided that the different physical quantities (dependent variables) are sampled on the same orthogonal grid of independent variables. The model encapsulates the sampled data values of the dependent variables and the minimum metadata needed to represent these data accurately in an appropriate coordinate system of independent variables. The CSD model can serve as a reusable building block in the development of more sophisticated portable scientific dataset file standards.
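To make the structure described above concrete, the following is a minimal Python sketch of a dataset with d = 2 independent variables and two dependent variables sharing one orthogonal grid. The key names (`dimensions`, `coordinates`, `dependent_variables`, `components`) and the helper functions are assumptions chosen for illustration; the abstract does not specify the actual CSD schema, so this should not be read as the published file format.

```python
import json
from math import prod


def grid_size(dimensions):
    """Return M, the number of grid points: the product of the axis lengths."""
    return prod(len(dim["coordinates"]) for dim in dimensions)


def validate(dataset):
    """Check that every component of every dependent variable holds M samples.

    Returns M on success and raises ValueError otherwise.
    """
    m = grid_size(dataset["dimensions"])
    for var in dataset["dependent_variables"]:
        for q, component in enumerate(var["components"]):
            if len(component) != m:
                raise ValueError(
                    f"{var['label']}: component {q} has {len(component)} "
                    f"samples, expected {m}"
                )
    return m


# Hypothetical layout (not the published schema): two independent variables.
dataset = {
    "dimensions": [
        # Regular axis: evenly spaced coordinates.
        {"label": "x", "unit": "m", "coordinates": [0.0, 0.5, 1.0]},
        # Rectilinear axis: unevenly spaced coordinates.
        {"label": "t", "unit": "s", "coordinates": [0.0, 1.0, 10.0, 100.0]},
    ],
    "dependent_variables": [
        # p = 1: a scalar quantity sampled at all 12 grid points.
        {
            "label": "temperature",
            "unit": "K",
            "components": [[300.0 + i for i in range(12)]],
        },
        # p = 2: a two-component quantity correlated on the same grid.
        {
            "label": "velocity",
            "unit": "m/s",
            "components": [
                [float(i) for i in range(12)],
                [float(-i) for i in range(12)],
            ],
        },
    ],
}

m = validate(dataset)

# JSON serialization and round trip using only the standard library.
serialized = json.dumps(dataset)
restored = json.loads(serialized)
```

In this sketch the grid coordinates are stored explicitly along each axis, which covers both the regular and the rectilinear case with one representation; a real format could store a regular axis more compactly as a count and an increment.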

Keywords:

Data acquisition – Latitude – Longitude – Metadata – NMR spectroscopy – Programming languages – Scientists – Transmission electron microscopy



Article published in the journal

PLOS One


2020, Issue 1