https://github.com/niconoe/darwinsql
DarwinCore Archives expressed as SQLite
https://github.com/niconoe/darwinsql
biodiversity biodiversity-informatics biodiversity-standards gbif tdwg
Last synced: 7 months ago
JSON representation
DarwinCore Archives expressed as SQLite
- Host: GitHub
- URL: https://github.com/niconoe/darwinsql
- Owner: niconoe
- License: bsd-2-clause
- Created: 2017-07-04T11:26:42.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2017-07-27T12:16:40.000Z (about 8 years ago)
- Last Synced: 2025-01-11T20:31:31.260Z (9 months ago)
- Topics: biodiversity, biodiversity-informatics, biodiversity-standards, gbif, tdwg
- Size: 4.88 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 4
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# DarwinSQL
Like DarwinCore Archive, but using SQLite## Rationale
The [Darwin Core Archive](https://en.wikipedia.org/wiki/Darwin_Core_Archive) format is widely for biodiversity
informatics (within the [GBIF](http://www.gbif.org) community for example), and has largely shown its effectiveness
as an exchange format.Due to its nature (basically, a bunch of CSV files zipped together) it is however a poor candidate for data use and
analysis: many users get data in Darwin Core Archive, but then immediately extract data from the Archive and transfer it
to some custom/non-standard format that's easier to manipulate, such as a spreadsheet or a relational database.## DarwinSQL: the concept
The aim of DarwinSQL is to propose an alternative, standardized file format that can be used for exchanging data,
but also for simple data use and analysis.There are two main milestones to this project:
- Defining a new file format with the following characteristics:
- Can hold the same content (data and metadata) as Darwin Core Archives
- Keeps the perks of DwC-A: simple, contained in a single file, ...
- Based on [SQLite](https://www.sqlite.org/):
- improved performance
- queryable with SQL
- directly compatible with the huge SQLite ecosystem (admin tools, GUI, converters, ...)
- Stable, with the goal to ultimately become an official standard
- Providing a basic implementation of this format:
- A command-line tool to automatically convert (no questions asked): Darwin Core Archive -> DarwinSQL
- (Possibly) a small webapp to perform this conversion
- (Possibly) a DarwinSQL -> Darwin Core Archive converter
- (Possibly) some examples
- ...## The DarwinSQL format
**Work in progress, subject to change at any time, comments are welcome!**
- DarwinSQL files are SQLite3 databases
- They use the *.dwsql* file extension
- It contains a table for each data/csv file in the source DwC-A (core and extensions). Rows/columns match the content
of the source DwC-A. All fields are of TEXT type. Those tables *may* contains more fields for interpreted values. Those
fields are named: '_interpreted'
- It contains an `info` table describing the content of the DarwinSQL in a standardized format, similar to the
`Metafile` of a Darwin Core Archives.
- It contains a `files` table that hold the raw content of each file of the Archive, except the data (CSV) files. 2 fields:
`path` (relative path of the file in the originating Darwin Core Archive) and `content` (BLOB)