Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/imperial-genomics-facility/limsmetadataparsing
A pyspark based codebase for fetching and formatting metadata from a LIMS db for IGF
apache-arrow apache-spark pandas pyodbc python-3-6 sparksql
- Host: GitHub
- URL: https://github.com/imperial-genomics-facility/limsmetadataparsing
- Owner: imperial-genomics-facility
- License: apache-2.0
- Created: 2019-11-21T13:19:10.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2021-07-22T18:00:13.000Z (over 3 years ago)
- Last Synced: 2024-12-03T16:17:25.697Z (about 1 month ago)
- Topics: apache-arrow, apache-spark, pandas, pyodbc, python-3-6, sparksql
- Language: Python
- Size: 42 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# LimsMetadataParsing
A PySpark-based codebase for fetching and formatting metadata from a LIMS database for IGF.

## Set up environment
* Step 1: Get Miniconda
  `wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh`
  `bash Miniconda3-latest-Linux-x86_64.sh`
* Step 2: Clone the git repo
  `git clone https://github.com/imperial-genomics-facility/LimsMetadataParsing.git`
* Step 3: Create the conda environment from the environment.yml file
  `conda env create -n ENV_NAME --file environment.yml`
* Step 4: Build an egg file for the LimsMetadataParsing repo
  `python setup.py bdist_egg`
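
Before running the Spark job, it can help to confirm that the new environment provides the main libraries. The sketch below is a minimal check, assuming the module names `pyspark`, `pandas`, `pyarrow` and `pyodbc`; these are inferred from the repository topics rather than read from environment.yml, so adjust the list to match that file. The helper name `missing_modules` is hypothetical.

```python
import importlib.util

# Hypothetical list: module names inferred from the repository topics
# (apache-spark, pandas, apache-arrow, pyodbc); environment.yml may differ.
REQUIRED_MODULES = ["pyspark", "pandas", "pyarrow", "pyodbc"]


def missing_modules(modules):
    """Return the top-level modules that cannot be located in this environment."""
    # find_spec only locates a module; it does not import it.
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found")
```

Run it with the environment activated (`conda activate ENV_NAME`). Because `find_spec` only locates modules, a package can be found and still fail on import, for example if a native library it links against is missing.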
## Get UCanAccess
Download UCanAccess from the following link and unzip the downloaded archive:

- [http://ucanaccess.sourceforge.net/site.html](http://ucanaccess.sourceforge.net/site.html)

## Usage
    parseAccessDbForMetadata.py [-h] -a ACCESS_DB_PATH -q QUOTE_FILE_PATH
                                -o OUTPUT_PATH -k KNOWN_PROJECTS_LIST -j
                                UCANACCESS_JAR_PATH

    optional arguments:
      -h, --help            show this help message and exit
      -a ACCESS_DB_PATH, --access_db_path ACCESS_DB_PATH
                            Path to Access LIMS db
      -q QUOTE_FILE_PATH, --quote_file_path QUOTE_FILE_PATH
                            Path to quote xls file
      -o OUTPUT_PATH, --output_path OUTPUT_PATH
                            Output dir path for metadata files
      -k KNOWN_PROJECTS_LIST, --known_projects_list KNOWN_PROJECTS_LIST
                            File containing list of known projects
      -j UCANACCESS_JAR_PATH, --ucanaccess_jar_path UCANACCESS_JAR_PATH
                            Path to ucanaccess jar files
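In the usage line, every flag except `-h` appears outside square brackets, so all five are required even though argparse (before Python 3.10) lists them under "optional arguments". The sketch below is a hypothetical reconstruction of such a parser, not the script's actual code. It also shows one way the `-j` directory and the `-a` database path could be turned into a jar list and a JDBC URL. The helper names are illustrative, and the layout of the unzipped UCanAccess release (a top-level `ucanaccess-*.jar` with dependencies under `lib/`) is an assumption.

```python
import argparse
from pathlib import Path

# JDBC driver class provided by the UCanAccess jar.
UCANACCESS_DRIVER = "net.ucanaccess.jdbc.UcanaccessDriver"


def build_parser():
    """Hypothetical parser matching the help text; the real script may differ."""
    parser = argparse.ArgumentParser(prog="parseAccessDbForMetadata.py")
    # required=True is why the usage line shows these flags without brackets.
    parser.add_argument("-a", "--access_db_path", required=True,
                        help="Path to Access LIMS db")
    parser.add_argument("-q", "--quote_file_path", required=True,
                        help="Path to quote xls file")
    parser.add_argument("-o", "--output_path", required=True,
                        help="Output dir path for metadata files")
    parser.add_argument("-k", "--known_projects_list", required=True,
                        help="File containing list of known projects")
    parser.add_argument("-j", "--ucanaccess_jar_path", required=True,
                        help="Path to ucanaccess jar files")
    return parser


def collect_ucanaccess_jars(ucanaccess_dir):
    """List ucanaccess-*.jar plus dependency jars under lib/ (assumed layout)."""
    root = Path(ucanaccess_dir)
    main_jars = sorted(root.glob("ucanaccess-*.jar"))
    if not main_jars:
        raise FileNotFoundError(f"no ucanaccess-*.jar in {root}")
    lib_dir = root / "lib"
    lib_jars = sorted(lib_dir.glob("*.jar")) if lib_dir.is_dir() else []
    return [str(p) for p in main_jars + lib_jars]


def ucanaccess_jdbc_url(access_db_path):
    """Build a URL of the form jdbc:ucanaccess://<absolute path to .accdb>."""
    return "jdbc:ucanaccess://" + str(Path(access_db_path).absolute())
```

With PySpark, these pieces could feed `spark.read.format("jdbc")` through its `url`, `driver` and `dbtable` options, with the jars supplied via `--jars` or `spark.jars`. The repository topics also mention pyodbc, so the actual script may read the database differently.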
## Run Spark code

    spark-submit \
      --master local[NUMBER_OF_CPUS] \
      --py-files /path/igfLimsParsing-0.0.1-py3.6.egg \
      /path/LimsMetadataParsing/scripts/parseAccessDbForMetadata.py \
      -a /path/Database.accdb \
      -q /path/Quotes.xlsx \
      -o /path/csv_dir \
      -k /path/project_list.csv \
      -j /path/UCanAccess-4.0.4-bin
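When this job is launched from a wrapper script, building the command as an argument list avoids shell quoting issues. In an interactive shell, `local[NUMBER_OF_CPUS]` is an unquoted glob pattern that is passed through literally only as long as no file name matches it. The sketch below assumes the placeholder paths from the command above; `build_spark_submit_cmd` and `to_shell` are hypothetical helpers, not part of the repository.

```python
import shlex


def build_spark_submit_cmd(num_cpus, egg_path, script_path, access_db_path,
                           quote_file_path, output_path, known_projects_list,
                           ucanaccess_jar_path):
    """Hypothetical helper: assemble the spark-submit call as an argv list."""
    return [
        "spark-submit",
        "--master", f"local[{num_cpus}]",  # run Spark locally on num_cpus cores
        "--py-files", egg_path,            # put the egg on the Python path for Spark
        script_path,
        "-a", access_db_path,
        "-q", quote_file_path,
        "-o", output_path,
        "-k", known_projects_list,
        "-j", ucanaccess_jar_path,
    ]


def to_shell(cmd):
    """Render argv as a copy-pasteable shell string (shlex.quote works on 3.6)."""
    return " ".join(shlex.quote(arg) for arg in cmd)
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)`, which runs `spark-submit` without a shell, while `to_shell(cmd)` prints an equivalent command with `local[4]` safely quoted.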