https://github.com/skyeav/tablassert3.0.0
Tablassert is a multipurpose tool that crafts knowledge assertions from tabular data, augments knowledge with configuration, and exports knowledge as Knowledge Graph Exchange (KGX) consistent TSVs.
- Host: GitHub
- URL: https://github.com/skyeav/tablassert3.0.0
- Owner: SkyeAv
- License: mit
- Created: 2025-02-11T22:46:45.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-02-20T21:46:23.000Z (8 months ago)
- Last Synced: 2025-02-20T22:30:56.187Z (8 months ago)
- Topics: bioinformatics, knowledge-graph, ncats-translator, python, table-mining
- Language: Python
- Homepage:
- Size: 85 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Tablassert(3.0.0)
## By Skye Goetz (ISB) & Gwênlyn Glusman (ISB)
Tablassert is a multipurpose tool that crafts knowledge assertions from tabular data, augments knowledge with configuration, and exports knowledge as Knowledge Graph Exchange (KGX) consistent TSVs.
### Dependencies (Python)
```bash
pip install scikit-learn
pip install requests
pip install openpyxl
pip install pyyaml
pip install pandas
pip install numpy
pip install nltk
pip install xlrd
```
### Usage (Unix)
```bash
pip install -e .
main
```
### KG Config
A KG Config is the YAML configuration file for a new Tablassert-generated knowledge graph. It holds basic information about the graph, the databases to connect, the Table Configs to include, the vectorizers and models to use, and knowledge graph-wide hyperparameters.
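To make the idea of a graph-wide hyperparameter concrete, here is a minimal sketch of how a value such as `p_value_cutoff` (one of the keys in the template in this section) might be applied to candidate edges. The function name `apply_p_value_cutoff`, the edge dictionaries, and the inclusive comparison are assumptions made for illustration; this is not Tablassert's actual implementation.

```python
def apply_p_value_cutoff(edges, p_value_cutoff):
    """Keep edges whose p-value is at or below the cutoff (inclusive by assumption)."""
    return [edge for edge in edges if edge["p"] <= p_value_cutoff]


# Hypothetical values, standing in for a parsed KG Config and mined edges.
kg_config = {"p_value_cutoff": 0.05, "max_workers": 4}
edges = [
    {"subject": "A", "object": "B", "p": 0.01},
    {"subject": "A", "object": "C", "p": 0.20},
    {"subject": "A", "object": "D", "p": 0.05},
]

kept_edges = apply_p_value_cutoff(edges, kg_config["p_value_cutoff"])
print([edge["object"] for edge in kept_edges])  # ['B', 'D']
```

Because the cutoff lives in the KG Config rather than in each Table Config, a single change would affect every table in the graph.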
```yaml
knowledge_graph_name :
vesion_number : # Must be a string like '1.0.0'
max_workers : # Max Parallel Processes
p_value_cutoff : # Max P Value
progress_handler_timeout : # For all SQL Databases and Queries

config_directories:
  - # List of Directories That Contain Table Configs
override_sqlite : # Paths
supplement_sqlite :
babel_sqlite :
kg2_sqlite :
predicates_sqlite :

confidence_model : # Pretrained Sklearn Linear Regression Model
tfidf_vectorizer : # Pretrained Sklearn TFIDF Vectorizer Model
```
### Table Configs
A Table Config is the YAML configuration file for one source of tabular data incorporated into a knowledge graph. It specifies what to mine for knowledge, how to mine it, what adjustments to make to that knowledge, and the hyperparameters that dictate how Tablassert should behave. A Tablassert-generated knowledge graph typically has multiple Table Configs.
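Since a knowledge graph usually has several Table Configs, and the KG Config lists them by directory under `config_directories`, the following sketch shows one way those directories could be scanned. The helper name `find_table_configs`, the non-recursive search, and the `.yaml`/`.yml` extension filter are all assumptions for illustration, not documented Tablassert behavior.

```python
import tempfile
from pathlib import Path


def find_table_configs(config_directories):
    """Return sorted paths of .yaml/.yml files found directly in each directory."""
    found = []
    for directory in config_directories:
        for path in Path(directory).iterdir():
            if path.is_file() and path.suffix.lower() in {".yaml", ".yml"}:
                found.append(path)
    return sorted(found)


# Build a throwaway directory with two hypothetical Table Configs and one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "table_a.yaml").write_text("column_style: normal\n")
    Path(tmp, "table_b.yml").write_text("column_style: numeric\n")
    Path(tmp, "notes.txt").write_text("not a config\n")
    names = [path.name for path in find_table_configs([tmp])]

print(names)  # ['table_a.yaml', 'table_b.yml']
```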
```yaml
# USE "~" FOR "None"

column_style: # alphabetic (A-ZZ), numeric (1-100), else normal
method_notes : # Additional details describing the methodology of the tabular data
data_location :
  path_to_file : #
  delimiter : # ONLY if CSV, TSV, TXT File
  sheet_to_use : # ONLY if XLS or XLSX File
  first_line : # First Line to Use / ONLY if XLS or XLSX File
  last_line : # Last Line to Use / ONLY if XLS or XLSX File

provenance :
  publication : # PMC/PMID/doi Identifier
  publication_name : # Paper Title
  authors :
  year_published :
  journal :
  table_url : # Valid URL Telling Tablassert Where to Download the Desired Table
  yaml_curator :
  curator_organization :

subject :
  curie : #
  # value :
  # curie_column_name :
  # value_column_name :
  expected_classes : # List
    - # biolink:Class
  taxons : # Only For biolink:Gene Filtering / List
    - # NCBITaxon:Taxon
  regex_replacements : # List
    - pattern :
      replacement :

predicate : # biolink:Predicate

object :
  value_column_name : # Name of Column with Values
  prefix : # List
    - prefix : # Prefix
  suffix :
    - suffix : # Suffix
  explode : # Delimiter to Split Values by Before Exploding to Separate Rows
  fill_values : # How to Fill Empty Values in Column (ffill or bfill)

reindex : # List
  - mode : # Mode (greater_than_or_equal_to, less_than_or_equal_to, not_equal_to)
    column : # Goes By Final Column Names ONLY if Column is Included in the Final KG
    value :

attributes :
  p : # P-Value
    value : #
    # column_name :
    math : # List
      - operation : #
        parameter : # Optional:
        order_last : # Optional:
        # order_last is Required when parameter is Specified (Vice-Versa)
  n : # Sample Size
  relationship_strength : # Field Describing the Strength of an Edge
  relationship_type : # Method for Strength
  p_correction_method : # Field Describing If/How P-Value was Corrected
  knowledge_level :
  agent_type :

sections : # Can List Multiple
  - #
    # For example...
    # attributes :
    #   p :
    #     value :
    # object :
    #   curie :
    #   prefix :
    #     - prefix :
```
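Several Table Config keys describe simple row-level transformations. The sketch below illustrates, in plain Python, what `column_style: alphabetic`, `regex_replacements`, `explode`, and `reindex` could mean in practice. All four helper functions are hypothetical, and the zero-based column index and the "keep rows where the comparison holds" reading of `reindex` are assumptions; Tablassert's own code may behave differently.

```python
import re


def column_letters_to_index(label):
    """Convert a spreadsheet-style label (A, ..., Z, AA, ..., ZZ) to a zero-based index."""
    index = 0
    for char in label.upper():
        index = index * 26 + (ord(char) - ord("A") + 1)
    return index - 1


def apply_regex_replacements(value, replacements):
    """Apply each pattern/replacement pair in order using re.sub."""
    for item in replacements:
        value = re.sub(item["pattern"], item["replacement"], value)
    return value


def explode_rows(rows, column, delimiter):
    """Split `column` on `delimiter` and emit one row per piece."""
    exploded = []
    for row in rows:
        for piece in row[column].split(delimiter):
            exploded.append({**row, column: piece.strip()})
    return exploded


def passes_reindex(row, rule):
    """Evaluate one reindex rule (mode, column, value) against a row."""
    comparisons = {
        "greater_than_or_equal_to": lambda a, b: a >= b,
        "less_than_or_equal_to": lambda a, b: a <= b,
        "not_equal_to": lambda a, b: a != b,
    }
    return comparisons[rule["mode"]](row[rule["column"]], rule["value"])


# Hypothetical rows and config fragments, for illustration only.
rows = [{"gene": "TP53; BRCA1 (human)", "n": 12}, {"gene": "EGFR", "n": 3}]
replacements = [{"pattern": r"\s*\(.*\)$", "replacement": ""}]
rule = {"mode": "greater_than_or_equal_to", "column": "n", "value": 5}

rows = explode_rows(rows, "gene", ";")
for row in rows:
    row["gene"] = apply_regex_replacements(row["gene"], replacements)
rows = [row for row in rows if passes_reindex(row, rule)]

print(column_letters_to_index("A"), column_letters_to_index("ZZ"))  # 0 701
print(rows)  # [{'gene': 'TP53', 'n': 12}, {'gene': 'BRCA1', 'n': 12}]
```

Here the exploded rows inherit the other columns of their source row, so both `TP53` and `BRCA1` keep `n = 12` and survive the `reindex` filter, while `EGFR` is dropped.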