https://github.com/dativebase/lingsync2old
Migrate your LingSync corpus to an OLD (Online Linguistic Database)
- Host: GitHub
- URL: https://github.com/dativebase/lingsync2old
- Owner: dativebase
- License: apache-2.0
- Created: 2015-12-16T22:18:18.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2016-08-04T17:30:28.000Z (almost 10 years ago)
- Last Synced: 2025-10-06T03:01:42.451Z (9 months ago)
- Language: Python
- Size: 69.3 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.rst
- License: LICENSE
================================================================================
LingSync-to-OLD Migrator
================================================================================
This is a command-line utility that migrates a LingSync corpus to an Online
Linguistic Database (OLD). Both the source LingSync corpus and the destination
OLD must be accessible at URLs (possibly local) via HTTP.
Warnings/disclaimers
--------------------------------------------------------------------------------
- DEPENDENCY: requires that the Python Requests library be installed. All other
  imports are from the standard library.
- It is assumed that the destination OLD is empty. Migrating a LingSync corpus
  to an OLD that already has data in it may result in errors or corrupted data.
- Some LingSync data points (entire documents or specific fields/attributes)
  are purposefully not migrated. You will need to check the resulting OLD to
  verify that the conversion is satisfactory.
Usage
--------------------------------------------------------------------------------
Just run ``lingsync2old.py`` and you will be prompted for the required arguments::

    $ ./lingsync2old.py

You can also supply the required arguments as options::

    $ ./lingsync2old.py \
            --ls-url=https://corpus.lingsync.org \
            --ls-corpus=my-lingsync-corpus-name \
            --ls-username=my-lingsync-username \
            --ls-password=my-lingsync-password \
            --old-url=my-old-url \
            --old-username=my-old-username \
            --old-password=my-old-password

Full param/option listing::

    --force-download: boolean that, when `True`, forces the downloading of the
        LingSync/CouchDB data, even if we have already downloaded it. Default
        is `False`.
    --force-convert: boolean that, when `True`, forces the converting of the
        LingSync JSON data to OLD JSON data, even if we have already converted
        it. Default is `False`.
    --force-file-download: boolean that, when `True`, forces the downloading of
        a LingSync file (e.g., audio), even if we have already downloaded and
        saved it.
    --verbose: boolean that makes this script say more about what it's doing.
    --ls-url: The LingSync CouchDB URL that we can make requests to for
        extracting the LingSync data. Defaults to 'https://corpus.lingsync.org'.
    --ls-corpus: The name of the LingSync corpus that we want to
        migrate.
    --ls-username: The username of a user who has sufficient privileges to
        request the LingSync corpus' data from the CouchDB API.
    --ls-password: The password corresponding to the LingSync
        username.
    --old-url: The OLD URL that we will upload the converted LingSync
        data to.
    --old-username: The username of a user on the destination OLD who
        has sufficient privileges to make create, update and delete requests,
        i.e., an admin or a contributor.
    --old-password: The password corresponding to the OLD username.

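
For orientation, the options above could be declared with the standard-library
``argparse`` module roughly as follows. This is only a sketch: the
``build_parser`` function is hypothetical, treating the booleans as
presence-only switches is an assumption, and the script's actual parsing (and
its interactive prompting for missing values) may work differently.

.. code-block:: python

    import argparse

    def build_parser():
        """Hypothetical parser mirroring the option listing above."""
        parser = argparse.ArgumentParser(
            prog='lingsync2old.py',
            description='Migrate a LingSync corpus to an OLD.')
        # Assumption: booleans are switches (absent = False, present = True).
        for flag in ('--force-download', '--force-convert',
                     '--force-file-download', '--verbose'):
            parser.add_argument(flag, action='store_true', default=False)
        # Only --ls-url has a documented default value.
        parser.add_argument('--ls-url', default='https://corpus.lingsync.org')
        # The remaining options default to None when not supplied.
        for option in ('--ls-corpus', '--ls-username', '--ls-password',
                       '--old-url', '--old-username', '--old-password'):
            parser.add_argument(option)
        return parser

    args = build_parser().parse_args([
        '--ls-corpus=my-lingsync-corpus-name',
        '--old-url=my-old-url',
        '--verbose'])
    print(args.ls_url, args.ls_corpus, args.force_download, args.verbose)

Options left as ``None`` here are the ones a caller would still need to collect,
for example by prompting.
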
Algorithm
--------------------------------------------------------------------------------
It's essentially a three-step algorithm:

1. Download. Request LingSync data as JSON using the CouchDB API (and save it
   locally).
2. Convert. Build a JSON structure (from 1) that the OLD can digest (and save it
   locally).
3. Upload. Use the output of (2) to send JSON/REST POST requests to the relevant
   OLD web service.
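
The three steps, together with the "save it locally" caching that the
``--force-download`` and ``--force-convert`` options override, can be sketched
as below. The function names, the file names and the callables passed in are
all hypothetical stand-ins; the real script talks to CouchDB and the OLD over
HTTP rather than using in-memory lambdas.

.. code-block:: python

    import json
    import os
    import tempfile

    def cached_step(path, produce, force=False):
        """Load JSON from `path` if present; otherwise call `produce` and save."""
        if os.path.isfile(path) and not force:
            with open(path) as f:
                return json.load(f)
        data = produce()
        with open(path, 'w') as f:
            json.dump(data, f)
        return data

    def migrate(download, convert, upload, workdir,
                force_download=False, force_convert=False):
        """Run download -> convert -> upload, reusing saved intermediate files."""
        ls_path = os.path.join(workdir, 'lingsync.json')  # hypothetical name
        old_path = os.path.join(workdir, 'old.json')      # hypothetical name
        ls_data = cached_step(ls_path, download, force_download)
        old_data = cached_step(old_path, lambda: convert(ls_data), force_convert)
        return upload(old_data)

    # Stand-in callables in place of real HTTP requests.
    uploaded = migrate(
        download=lambda: {'datums': [{'id': 1}]},
        convert=lambda ls: {'forms': ls['datums']},
        upload=lambda old: len(old['forms']),
        workdir=tempfile.mkdtemp())
    print(uploaded)

Note that in this sketch forcing a fresh download does not by itself force a
fresh conversion; both flags would be needed to rebuild everything.
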
Here is the general mapping from LingSync documents (or implicit entities) to
OLD resources:

+------------+-------------+
| LingSync | OLD |
+============+=============+
| tags | tags |
+------------+-------------+
| users | users |
+------------+-------------+
| speakers | speakers |
+------------+-------------+
| files | files |
+------------+-------------+
| datums | forms |
+------------+-------------+
| datalists | corpora |
+------------+-------------+
| sessions | collections |
+------------+-------------+
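
Expressed as data, the table amounts to a simple lookup. The sketch below only
renames the top-level groups; it assumes the downloaded data is a dict keyed by
LingSync document type, and it does not attempt the per-field conversion that a
real migration needs. The ``regroup`` helper and the ``comments`` type in the
example are hypothetical.

.. code-block:: python

    # LingSync document type -> OLD resource name, copied from the table above.
    LINGSYNC_TO_OLD = {
        'tags': 'tags',
        'users': 'users',
        'speakers': 'speakers',
        'files': 'files',
        'datums': 'forms',
        'datalists': 'corpora',
        'sessions': 'collections',
    }

    def regroup(lingsync_data):
        """Re-key a {lingsync_type: [documents]} dict by OLD resource name.

        Types absent from the mapping are dropped, in the spirit of the
        warning that some LingSync data is purposefully not migrated.
        """
        return {LINGSYNC_TO_OLD[kind]: docs
                for kind, docs in lingsync_data.items()
                if kind in LINGSYNC_TO_OLD}

    print(regroup({'datums': [{'id': 1}], 'comments': [{'id': 2}]}))
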
Questions
--------------------------------------------------------------------------------
1. Are there tags in LingSync sessions?
2. Are there files in LingSync sessions?
3. Should we fill in empty values with the values of other attributes? E.g., if
   the morpheme_break value is empty, should the transcription value be copied
   to it?
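
If the answer to question 3 turned out to be yes, the fallback could look
something like this. It is purely illustrative of one possible answer: the
``fill_empty`` helper is hypothetical, and only the ``morpheme_break`` and
``transcription`` attribute names come from the question itself.

.. code-block:: python

    def fill_empty(form, fallbacks):
        """Return a copy of `form` with empty attributes filled from fallbacks.

        `fallbacks` maps a target attribute to the attribute whose value is
        copied when the target is missing or empty.
        """
        filled = dict(form)
        for target, source in fallbacks.items():
            if not filled.get(target) and filled.get(source):
                filled[target] = filled[source]
        return filled

    form = {'transcription': 'chiens', 'morpheme_break': ''}
    print(fill_empty(form, {'morpheme_break': 'transcription'}))

Non-empty values are left untouched, so the fill never overwrites data that
came from LingSync.
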
TODOs
--------------------------------------------------------------------------------
- Uploading large files (> 20 MB) to the OLD is not yet implemented.
- Downloading LingSync image files is not yet implemented.
- Make this script sensitive to OLD versions, and perhaps to LingSync versions
  too.