https://github.com/translator-catrax/curie-clean
A Clojure utility for analyzing and resolving duplicate nodes in knowledge graph exchange (KGX) formatted knowledge graphs, with Tablassert integration and Biolink Model compliance. It rewrites Tablassert-integrated "table_configs" through a CLI.
clojure knowledge-graph ncats-translator quality-control tablassert yaml-configuration
- Host: GitHub
- URL: https://github.com/translator-catrax/curie-clean
- Owner: Translator-CATRAX
- License: other
- Created: 2025-03-26T23:26:20.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2025-03-26T23:36:41.000Z (7 months ago)
- Last Synced: 2025-03-27T00:26:14.809Z (7 months ago)
- Topics: clojure, knowledge-graph, ncats-translator, quality-control, tablassert, yaml-configuration
- Language: Clojure
- Size: 0 Bytes
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# curie-clean (1.0.0)
## Skye Goetz (ISB) 03/26/2025
A Clojure utility for analyzing and resolving duplicate nodes in knowledge graph exchange **(KGX)** formatted knowledge graphs, with [**Tablassert**](https://github.com/SkyeAv/Tablassert3.0.0) integration and **Biolink Model** compliance. It **updates** Tablassert-integrated **"table_configs"** through a **CLI**.
## Features
- **Duplicate Detection**: Identifies duplicate entries in TSV files
- **Interactive Resolution**: CLI prompts for handling conflicts
- **YAML Configuration**: Generates config files to prevent future duplicates

## Requirements
- **Clojure 1.10+**
- Java JDK 8+
- Leiningen

## Usage
To list the available arguments...
```bash
# With Leiningen
lein run -h
```

**To resolve duplicates...**
```bash
# With Leiningen
lein run -n nodes.tsv -e edges.tsv
```

To test the application...
```bash
# With Leiningen
lein test
```

## Directory Structure
```txt
curie-clean/
├── src/
│   └── duplicate_utility/
│       ├── io/
│       │   ├── tsv.clj
│       │   └── yaml.clj
│       ├── processing/
│       │   ├── duplicates.clj
│       │   └── resolution.clj
│       ├── core.clj
│       └── validation.clj
└── test/
    └── duplicate_utility/
        ├── io/
        │   ├── tsv_test.clj
        │   └── yaml_test.clj
        ├── processing/
        │   ├── duplicates_test.clj
        │   └── resolution_test.clj
        ├── core_test.clj
        ├── test_utils.clj
        └── validation_test.clj
```
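
## Example: Spotting Duplicates by Hand
Before running the tool, it can help to see what a duplicate looks like in a TSV file. The sketch below is not part of curie-clean and does not reproduce its logic. It is a hypothetical shell helper, `find_duplicates`, that assumes the file has a header row and that duplicates are values repeated in a single column (`id` by default, which is an assumption about your `nodes.tsv`).

```bash
#!/usr/bin/env bash
# Hypothetical pre-check (NOT part of curie-clean): list values that occur
# more than once in one column of a TSV file that has a header row.
# Usage: find_duplicates FILE [COLUMN_NAME]   (COLUMN_NAME defaults to "id")
find_duplicates() {
  local file="$1" column="${2:-id}"
  awk -F'\t' -v col="$column" '
    NR == 1 {
      # Locate the requested column in the header row.
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
      next
    }
    # Count how often each value appears in that column.
    { count[$idx]++ }
    END {
      # Print each repeated value followed by its number of occurrences.
      for (value in count) if (count[value] > 1) print value "\t" count[value]
    }
  ' "$file" | sort
}
```

After sourcing the function, `find_duplicates nodes.tsv` prints one line per repeated `id` with its count, and `find_duplicates nodes.tsv name` checks the `name` column instead. Empty output means no repeats were found in that column.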