https://github.com/ggonnella/expectation_rules
A collection of expectations about the contents of prokaryotic genomes
- Host: GitHub
- URL: https://github.com/ggonnella/expectation_rules
- Owner: ggonnella
- Created: 2023-03-25T21:41:01.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-05-17T07:08:37.000Z (about 2 years ago)
- Last Synced: 2025-01-16T16:23:20.734Z (4 months ago)
- Language: Python
- Size: 1.55 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
This is a database of expectation rules
about the contents of prokaryotic genomes,
manually extracted from scientific literature.

# Format
The rules are expressed in EGC format,
described in the manuscript
"EGC: a format for expressing prokaryotic genomes content expectations",
available at
https://doi.org/10.48550/arXiv.2303.08758

An implementation in TextFormats of the EGC format
is available at
https://github.com/ggonnella/egc-spec

# Organization
```
1_find_articles         creation of article lists for searching expectations
|
|- protocols            protocols / pipelines for the creation
|  |
|  |- H                 related to genomes of prokaryotes in hydrothermal
|  |                    vents; done using Pubmed queries
|  |
|  |- AB                related to bacterial (B) and archaeal (A) genomes;
|                       from entries in the NCBI assembly database
|
|- statistics           analysis of the results, e.g. basic statistics

2_process_articles      extraction of expectations from articles
|
|- protocols            protocols (scratchpad) for the H, A and B lists
|
|- results              list of processed documents;
|                       extracted sentences/tables/paragraphs
|
|- validation           scripts/pipeline for results validation
|
|- statistics           basic statistics about the processed articles
|                       and the results

scripts                 scripts are contained here and linked in the
                        protocols and statistics directories

3_process_extracts      final EGC files with the rules collections
|
|- validation           validation of the EGC files
|
|- statistics           statistics about the contents of the EGC files
```

## Acknowledgements
This rule collection was created in the context of the DFG project GO 3192/1-1
“Automated characterization of microbial genomes and metagenomes by collection
and verification of association rules”. The funders had no role in study
design, data collection and analysis.