Scientific DataSet (SDS) is a managed library for reading, writing and sharing array-oriented scientific data: time series, matrices, satellite/medical imagery and multidimensional numerical grids. The library is optimized for data in the form of arrays, such as time series and tables, vectors and matrices, and multidimensional grids. SDS bundles several related arrays and their associated metadata in a single self-descriptive package and enforces certain constraints on the arrays' shapes to ensure data consistency. The underlying SDS data model is based on long-term community experience and has much in common with Unidata's Common Data Model (CDM), which was chosen because it has stood the test of time. The CDM is widely used by scientists working with data and is quite simple.
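A minimal sketch of the kind of shape constraint described above, assuming CDM-style shared dimensions: every array that uses a given dimension name must have the same length along it. `ToyDataSet`, `Variable` and `add_variable` are hypothetical names chosen for illustration and are not part of the SDS API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class Variable:
    """Hypothetical: a named array described by named dimensions plus metadata."""
    name: str
    dims: Tuple[str, ...]
    shape: Tuple[int, ...]
    metadata: Dict[str, Any] = field(default_factory=dict)


class ToyDataSet:
    """Hypothetical illustration of an SDS-like package: it bundles arrays and
    metadata and rejects any array whose shape conflicts with shared dimensions."""

    def __init__(self) -> None:
        self.dimensions: Dict[str, int] = {}
        self.variables: Dict[str, Variable] = {}
        self.metadata: Dict[str, Any] = {}

    def add_variable(self, name: str, dims: Tuple[str, ...],
                     shape: Tuple[int, ...], **metadata: Any) -> Variable:
        if len(dims) != len(shape):
            raise ValueError(f"{name}: rank mismatch between dims and shape")
        # Validate against a copy first, so a rejected variable leaves
        # the dataset unchanged (also catches a dimension repeated in one call).
        pending = dict(self.dimensions)
        for dim, length in zip(dims, shape):
            expected = pending.setdefault(dim, length)
            if expected != length:
                raise ValueError(
                    f"{name}: dimension '{dim}' has length {length}, expected {expected}"
                )
        self.dimensions = pending
        var = Variable(name, tuple(dims), tuple(shape), dict(metadata))
        self.variables[name] = var
        return var


# Usage: two consistent arrays share the "time" dimension; a third conflicts.
ds = ToyDataSet()
ds.metadata["title"] = "Station temperatures"
ds.add_variable("time", ("time",), (365,), units="days")
ds.add_variable("temperature", ("time", "station"), (365, 12), units="degC")
try:
    ds.add_variable("pressure", ("time",), (360,))
except ValueError as err:
    print(err)  # pressure: dimension 'time' has length 360, expected 365
```

Validating before mutating keeps the package consistent even when an update is rejected, which is the property the shape constraints are meant to protect.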
The idea of the Scientific DataSet is to provide a single data model with implementations for multiple specific data formats. Applications can store and retrieve data uniformly through an abstract view of various custom data stores. This makes an application less dependent on data formats and significantly eases data transfer between software components.
Scientific DataSet features:
- Rich metadata to create self-descriptive data packages.
- Support for multiple data formats that are popular in scientific computing, such as NetCDF.
- The ability to scale out from simple text files to multi-terabyte Microsoft Azure archives.
- Concurrent access to the data from multiple computing agents in multicore and distributed settings.
- The ability to perform consistency checks and transactional updates.
Using SDS in your computational program gives you the following advantages:
- Your program is more interoperable. It can import/export data in different formats.
- Your program is more scalable. It can seamlessly switch from the readable text files that are useful in small scale experiments and debugging to high performance binary data formats in production mode.
- Your program can immediately become part of a sophisticated concurrent data flow system.
- It is easy to visualize results of your program using DataSet Viewer.
An extensible set of dynamically loadable providers allows you to choose from different storage formats and different data access mechanisms. For example, depending on the DataSet URI supplied, different runs of the same program can read or write data differently: as text files in CSV format, as binary NetCDF files, or through another format or communication mechanism.
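The provider mechanism can be pictured as a registry that maps the format named in a URI to a storage back end. The sketch below is hypothetical: the URI syntax `msds:<format>?file=...` is an assumption modeled loosely on SDS-style URIs, and `PROVIDERS`, `register` and `open_dataset` are illustrative names, not the actual SDS provider API. The providers here only record their parameters instead of doing real I/O.

```python
from typing import Callable, Dict
from urllib.parse import parse_qs, urlsplit


class Provider:
    """Hypothetical base class: a real provider would implement storage I/O."""

    def __init__(self, params: Dict[str, str]) -> None:
        self.params = params


# Hypothetical registry: format name -> provider factory.
PROVIDERS: Dict[str, Callable[[Dict[str, str]], Provider]] = {}


def register(name: str):
    """Class decorator adding a provider to the registry under `name`."""
    def wrap(cls):
        PROVIDERS[name] = cls
        return cls
    return wrap


@register("csv")
class CsvProvider(Provider):
    """Would read/write human-readable CSV text files."""


@register("nc")
class NetCdfProvider(Provider):
    """Would read/write binary NetCDF files."""


def open_dataset(uri: str) -> Provider:
    """Choose a provider from a URI such as 'msds:csv?file=out.csv' (assumed syntax)."""
    parts = urlsplit(uri)  # scheme='msds', path='csv', query='file=out.csv'
    if parts.scheme != "msds":
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    try:
        factory = PROVIDERS[parts.path]
    except KeyError:
        raise ValueError(f"no provider registered for {parts.path!r}") from None
    # Keep the last value given for each query parameter.
    params = {key: values[-1] for key, values in parse_qs(parts.query).items()}
    return factory(params)


# Usage: the same program code, two different storage back ends.
run_a = open_dataset("msds:csv?file=results.csv")
run_b = open_dataset("msds:nc?file=results.nc")
print(type(run_a).__name__, run_a.params)  # CsvProvider {'file': 'results.csv'}
print(type(run_b).__name__, run_b.params)  # NetCdfProvider {'file': 'results.nc'}
```

Because the program only ever calls `open_dataset`, switching from debug-friendly CSV to binary NetCDF is a change to the URI string, not to the code; adding a format means registering one more provider.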
The Scientific DataSet package includes the following components:
If you need to run an application that uses SDS on a machine where SDS is not installed, see this topic.