https://github.com/baderlab/geneeval
A Python library for evaluating gene embeddings.
- Host: GitHub
- URL: https://github.com/baderlab/geneeval
- Owner: BaderLab
- License: apache-2.0
- Created: 2020-07-15T17:58:40.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2023-03-06T15:57:21.000Z (about 3 years ago)
- Last Synced: 2025-06-29T08:40:55.896Z (11 months ago)
- Language: Python
- Homepage:
- Size: 281 KB
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 13
Metadata Files:
- Readme: README.md
- License: LICENSE
README


# GeneEval
A Python library for benchmarking gene function prediction.
## Installation
Latest PyPI release:
```bash
pip install geneeval
```
From source:
```bash
# Install Poetry for your system: https://python-poetry.org/docs/#installation
curl -sSL https://install.python-poetry.org | python3 -
# Clone and move into the repo
git clone https://github.com/BaderLab/GeneEval.git
cd GeneEval
# Install the package with poetry
poetry install
```
If you plan on evaluating fixed-length feature vectors (see [Usage](#usage)), please install with `pip install "geneeval[features]"` (or `poetry install -E "features"` if installing from source).
## Usage
First, download the benchmark with the `prepare` command:
```bash
geneeval prepare "./benchmark.json"
```
There are two ways to run the evaluation, depending on your method.
### Methods that produce fixed-length feature vectors
If your method produces a fixed-length feature vector for each gene ID in the benchmark, collect these in a comma-separated file, e.g.
```
Q8W5R2, 0.2343, -0.1242, 0.5431, -0.3475, 0.9373
Q99732, -0.9323, 0.2212, -0.4331, -0.8634, 0.8373
P83774, 0.5633, -0.6242, 0.3723, -0.2375, -0.1673
Q1ENB6, 0.1433, -0.3242, 0.5323, -0.9975, -0.4573
Q9XF19, 0.5621, -0.4272, 0.9743, -0.1373, -0.2173
```
> You can prepare a `.csv`, `.tsv`, `.txt` (separated by spaces) or a `.json` file (where the vectors are keyed by gene IDs). We will correctly parse the file based on its file extension.
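If your embeddings are already in memory, a minimal sketch like the one below can write them in any of these layouts. It assumes a plain Python `dict` mapping gene IDs to equal-length lists of floats. `write_features` is a hypothetical helper, not part of geneeval, and it writes rows without a space after each delimiter, which we assume the parser accepts.

```python
from __future__ import annotations

import csv
import json
from pathlib import Path

# Delimiter for each plain-text layout listed above.
DELIMITERS = {".csv": ",", ".tsv": "\t", ".txt": " "}


def write_features(embeddings: dict[str, list[float]], path: str) -> None:
    """Hypothetical helper (not part of geneeval): write gene embeddings to `path`.

    The layout is chosen from the file extension, mirroring the formats above.
    """
    # Every vector must have the same length (fixed-length features).
    lengths = {len(vector) for vector in embeddings.values()}
    if len(lengths) > 1:
        raise ValueError("all feature vectors must have the same length")

    suffix = Path(path).suffix.lower()
    if suffix == ".json":
        # JSON layout: vectors keyed by gene ID.
        with open(path, "w") as f:
            json.dump(embeddings, f)
    elif suffix in DELIMITERS:
        # Delimited layout: one row per gene, gene ID first, then the vector.
        with open(path, "w", newline="") as f:
            writer = csv.writer(f, delimiter=DELIMITERS[suffix])
            for gene_id, vector in embeddings.items():
                writer.writerow([gene_id, *vector])
    else:
        raise ValueError(f"unsupported extension: {suffix!r}")
```

For example, `write_features({"Q8W5R2": [0.2343, -0.1242], "Q99732": [-0.9323, 0.2212]}, "features.csv")` writes a two-row, comma-separated file in the layout shown above.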
Then call the `evaluate features` command:
```bash
geneeval evaluate features "./features.csv"
```
These features are used as input to simple classifiers, which are tuned with a grid search and evaluated on each of the benchmark tasks.
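The exact classifiers, hyperparameter grids and metrics are not spelled out here, so the following is only an illustrative sketch of the general idea: a simple classifier whose hyperparameter is chosen by grid search on the validation split and then scored on the test split. It uses a hypothetical k-nearest-neighbours classifier in plain Python with accuracy as the metric; geneeval's actual evaluation may differ.

```python
from __future__ import annotations

import math
from collections import Counter


def knn_predict(train_X: list[list[float]], train_y: list[str], x: list[float], k: int) -> str:
    """Predict a label for `x` by majority vote among its k nearest training vectors."""
    # Sort training indices by Euclidean distance to x (math.dist needs Python 3.8+).
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    # Ties go to the label that appears first among the nearest neighbours.
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, X, y, k) -> float:
    """Fraction of (X, y) pairs that k-NN fitted on (train_X, train_y) labels correctly."""
    correct = sum(knn_predict(train_X, train_y, x, k) == label for x, label in zip(X, y))
    return correct / len(y)


def grid_search_knn(train, valid, test, ks=(1, 3, 5)) -> tuple[int, float]:
    """Pick k on the validation split, then report test accuracy for that k.

    Each split is a (feature_vectors, labels) pair.
    """
    train_X, train_y = train
    # max() keeps the first k in `ks` when several tie on validation accuracy.
    best_k = max(ks, key=lambda k: accuracy(train_X, train_y, *valid, k))
    return best_k, accuracy(train_X, train_y, *test, best_k)
```

Selecting the hyperparameter on the validation split and reporting only the test score keeps the test genes out of model selection.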
### Methods that do not produce fixed-length feature vectors
For all other methods, you simply need to produce predictions for each task in the benchmark that you wish to evaluate on. Predictions should be collected in a `.json` file keyed by task name, data partition, and gene ID, e.g.
```json
{
  "subcellular_localization": {
    "train": {
      "Q8W5R2": "M",
      "Q99732": "M",
      "P83774": "S"
    },
    "valid": {
      "Q1ENB6": "S"
    },
    "test": {
      "Q9XF19": "S"
    }
  }
}
```
> We make no assumptions about how these predictions are obtained.
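As one illustration, the sketch below assembles such a file from any prediction function. Both inputs are hypothetical: `splits` maps each task name to its partitions and their gene IDs (for example, read from the prepared benchmark, whose exact layout is not described here), and `predict` stands in for your own method.

```python
from __future__ import annotations

import json
from typing import Callable


def build_predictions(
    splits: dict[str, dict[str, list[str]]],
    predict: Callable[[str, str], str],
) -> dict[str, dict[str, dict[str, str]]]:
    """Nest predictions as task name -> data partition -> gene ID -> label."""
    return {
        task: {
            partition: {gene_id: predict(task, gene_id) for gene_id in gene_ids}
            for partition, gene_ids in partitions.items()
        }
        for task, partitions in splits.items()
    }


def write_predictions(predictions: dict, path: str) -> None:
    """Save the nested predictions as an indented JSON file."""
    with open(path, "w") as f:
        json.dump(predictions, f, indent=2)
```

Because `predict` receives the task name as well as the gene ID, a single function can serve every task in the benchmark.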
Then use the `evaluate predictions` command to score your predictions on each task:
```bash
geneeval evaluate predictions "./predictions.json"
```
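The scoring itself is done by geneeval and its metrics are not described here. Purely to illustrate what a per-task score could look like, the sketch below computes plain accuracy for each task and partition, assuming gold labels stored in the same nested layout as the predictions. Both the metric and the `gold` input are assumptions, not geneeval's implementation.

```python
from __future__ import annotations


def accuracy_per_partition(
    predictions: dict[str, dict[str, dict[str, str]]],
    gold: dict[str, dict[str, dict[str, str]]],
) -> dict[str, dict[str, float]]:
    """Hypothetical scorer: accuracy for every (task, partition) present in `gold`."""
    scores: dict[str, dict[str, float]] = {}
    for task, partitions in gold.items():
        for partition, labels in partitions.items():
            if not labels:
                continue  # nothing to score in an empty partition
            predicted = predictions.get(task, {}).get(partition, {})
            # A gene with no prediction counts as an error.
            correct = sum(predicted.get(gene_id) == label for gene_id, label in labels.items())
            scores.setdefault(task, {})[partition] = correct / len(labels)
    return scores
```

Counting missing genes as errors means a partially filled predictions file cannot score higher than a complete one.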