https://github.com/hubmapconsortium/soft-assay-rules
Rules for "soft" assay classification, and tools to generate and test them.
- Host: GitHub
- URL: https://github.com/hubmapconsortium/soft-assay-rules
- Owner: hubmapconsortium
- License: mit
- Created: 2023-12-21T21:54:11.000Z
- Default Branch: main
- Last Pushed: 2025-08-21T20:20:29.000Z
- Last Synced: 2025-08-21T22:46:25.253Z
- Topics: ot2od030545
- Language: Python
- Homepage:
- Size: 319 KB
- Stars: 0
- Watchers: 13
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
# soft-assay-rules
Rules for "soft" assay classification, and tools to generate and test them.
## About
Between the time a dataset is submitted by a data provider and the time it is accessed
by a potential user, many steps must occur.
* The provided dataset or upload must be validated as syntactically correct.
* The data must be "ingested", so that its type, location, and properties are known to the
larger system.
* The dataset must be processed to make its content useful. For example, image stitching or
RNA analysis may be required. The steps required depend on the detailed structure of the
data.
* The data and the results of the analysis must be displayed to the user. This again
depends on the detailed structure of the data and that of the derived data produced by any
analysis.
The *Soft Assay Classifier Rule Engine* is one mechanism by which these relationships are
managed. A set of rules is applied to a detailed description of the original data format. Rules
that match the data are activated, yielding a summary of the properties of the data, which
downstream components can use to decide how to describe, display, or process the data. This
repo contains the development history of the rule chain, plus tools to generate and test it.
When a new version of the rule chain is ready, it is exported to another repo, from which it is
installed in the rule engine.
Once installed, the rule chain can be triggered in response to a POST request containing
a metadata.tsv record in JSON form, or in response to a GET request including a uuid or
HuBMAP/SenNet ID. In the former case, the rule chain is passed only the given JSON, with
one added key/value pair: the key "sample_is_human" with a boolean value. This POST
mechanism is used when validating and ingesting new external data.
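For the POST path, the input the rule chain sees can be sketched as follows: take one metadata.tsv row as a JSON object and add the `sample_is_human` pair. The helper name and the column names (`assay_type`, `analyte_class`) are hypothetical. The sketch also does not decide which side of the request adds the extra key.

```python
import csv
import io
import json
from typing import Any, Dict, List


def rule_chain_inputs_from_tsv(tsv_text: str, sample_is_human: bool) -> List[Dict[str, Any]]:
    """Hypothetical helper: one JSON-ready dict per metadata.tsv row, plus sample_is_human."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    inputs = []
    for row in reader:
        record: Dict[str, Any] = dict(row)           # column name -> cell text
        record["sample_is_human"] = sample_is_human  # the single added key/value pair
        inputs.append(record)
    return inputs


tsv_text = "assay_type\tanalyte_class\nRNAseq\tRNA\n"
records = rule_chain_inputs_from_tsv(tsv_text, sample_is_human=True)
print(json.dumps(records[0]))
# → {"assay_type": "RNAseq", "analyte_class": "RNA", "sample_is_human": true}
```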
When called with a GET request and a uuid or ID, the entity JSON block for the given
entity is fetched, and several values are derived from that metadata where possible,
including:
* the ingest metadata, if present
* the entity type, typically 'Dataset' or 'Publication'
* information from the DAG provenance list, or an empty list if it is unavailable
* `data_types` information
* the entity creation action
* `sample_is_human`, as inferred from the entity provenance
These values are used to construct a JSON block which is passed to the rule chain.
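The construction of that JSON block can be sketched as below. Every key name here (`ingest_metadata`, `entity_type`, `dag_provenance_list`, `data_types`, `creation_action`) is an assumption about the entity JSON and the output shape, and the function name is hypothetical. `sample_is_human` is passed in rather than derived, because the provenance lookup is out of scope for the sketch.

```python
from typing import Any, Dict


def rule_chain_input_from_entity(entity: Dict[str, Any], sample_is_human: bool) -> Dict[str, Any]:
    """Hypothetical: derive the rule chain input from an entity-api JSON block.

    All key names are assumptions made for illustration.
    """
    ingest_metadata = entity.get("ingest_metadata")
    result: Dict[str, Any] = {
        "entity_type": entity.get("entity_type"),
        # Use an empty list when no DAG provenance information is available.
        "dag_provenance_list": (ingest_metadata or {}).get("dag_provenance_list", []),
        "data_types": entity.get("data_types", []),
        "creation_action": entity.get("creation_action"),
        # Inferred from entity provenance elsewhere; supplied by the caller here.
        "sample_is_human": sample_is_human,
    }
    if ingest_metadata is not None:  # include the ingest metadata only if present
        result["ingest_metadata"] = ingest_metadata
    return result


entity = {
    "entity_type": "Dataset",
    "data_types": ["RNAseq"],
    "creation_action": "Create Dataset Activity",  # illustrative value
}
block = rule_chain_input_from_entity(entity, sample_is_human=True)
print(block["dag_provenance_list"])  # → []
```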
## Unit Tests
Assuming the Python environment specified in `requirements.txt` is in place, unit tests can be
run with the `test.sh` script in the top-level directory:
```
bash ./test.sh
```
The tests exercise the rule chain using the examples stored in `src/soft_assay_rules/test_examples`,
drawing on cached entity-api output where necessary (see below). The function `source_is_human()` is
also tested against cached entity-api output.
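When an example fails, a per-key report is usually more useful than a bare "not equal". The helper below is a hypothetical sketch of such a comparison between expected and actual rule chain output. It is not the repo's test code.

```python
from typing import Any, Dict, List


def diff_outputs(expected: Dict[str, Any], actual: Dict[str, Any]) -> List[str]:
    """Describe each key where the actual rule chain output departs from the expected output."""
    problems = []
    for key in sorted(expected.keys() | actual.keys()):
        if key not in actual:
            problems.append(f"missing key: {key}")
        elif key not in expected:
            problems.append(f"unexpected key: {key}")
        elif expected[key] != actual[key]:
            problems.append(f"{key}: expected {expected[key]!r}, got {actual[key]!r}")
    return problems


expected = {"assay_type": "RNAseq", "is_image": False}
actual = {"assay_type": "RNAseq", "is_image": True, "extra": 1}
for line in diff_outputs(expected, actual):
    print(line)
# unexpected key: extra
# is_image: expected False, got True
```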
## Running Other Test Routines
The `src/soft_assay_rules` directory contains two test routines, `rule_tester.py` and `local_rule_tester.py`.
Both use the samples in the `test_examples` subdirectory. `local_rule_tester.py` uses cached values previously
fetched from the appropriate services (see the section on cached REST endpoint responses below).
`rule_tester.py` accesses an ingest-api URL to run tests against a remotely running rule engine,
and thus requires a live token, which is provided through the environment variable `AUTH_TOK`. Since
operations in the SENNET context differ slightly from those in the HUBMAP context, the context must
also be provided, through `APP_CTX`. For example,
```
env AUTH_TOK= APP_CTX= python rule_tester.py test_examples/*
```
tests the remote rule engine against all the samples in the `test_examples` directory. If the SENNET
context is specified, examples taken from the HuBMAP side will fail, and vice versa.
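A remote test run can fail late and confusingly if either environment variable is missing. The sketch below validates both up front. The context names `HUBMAP` and `SENNET` are assumed from the wording above, the function name is hypothetical, and the Bearer-token header format is likewise an assumption.

```python
import os
from typing import Dict, Mapping, Optional

# Assumed context names; check the repo's scripts for the exact APP_CTX values.
VALID_CONTEXTS = {"HUBMAP", "SENNET"}


def remote_test_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Validate AUTH_TOK and APP_CTX before any request is sent to a remote rule engine."""
    source: Mapping[str, str] = os.environ if env is None else env
    token = source.get("AUTH_TOK", "").strip()
    app_ctx = source.get("APP_CTX", "").strip().upper()
    if not token:
        raise RuntimeError("AUTH_TOK is empty; a live token is required")
    if app_ctx not in VALID_CONTEXTS:
        raise RuntimeError(f"APP_CTX must be one of {sorted(VALID_CONTEXTS)}, got {app_ctx!r}")
    # The Bearer header format is an assumption made for this sketch.
    return {"Authorization": f"Bearer {token}", "app_ctx": app_ctx}
```

Accepting an explicit mapping, rather than always reading `os.environ`, keeps the check easy to exercise without touching the real environment.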
`local_rule_tester.py` instantiates a local rule engine and installs the rules found in the
current `testing_rule_chain.json` file, so it can be used to test new rules. Because it cannot query
entity-api when a uuid is specified, it must use cached results from the necessary queries (see
the section on cached REST endpoint responses below). It is invoked as follows:
```
$ python ./local_rule_tester.py test_examples/*
```
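An offline tester needs a lookup that reads cached responses and fails clearly when one is missing, instead of silently falling back to the network. This is a minimal sketch under a hypothetical `cache_dir/<kind>/<key>.json` layout; the actual cache layout used by the repo may differ.

```python
import json
from pathlib import Path
from typing import Any


def load_cached(cache_dir: Path, kind: str, key: str) -> Any:
    """Read a cached response instead of querying a live service.

    The layout cache_dir/<kind>/<key>.json is hypothetical.
    """
    path = cache_dir / kind / f"{key}.json"
    if not path.is_file():
        raise FileNotFoundError(
            f"no cached {kind} response for {key!r} at {path}; prefetch it first"
        )
    with path.open(encoding="utf-8") as f:
        return json.load(f)
```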
## Cached REST Endpoint Responses
The utility routines `cache_responses.py` and `cache_ubkg_responses.py`
can be used to prefetch and save the entity-api, ingest-api, and UBKG metadata JSON
blocks associated with a given uuid, HuBMAP/SenNet ID, or UBKG code. They are called as follows:
```
env AUTH_TOK= APP_CTX= python cache_responses.py uuid1 [uuid2 [uuid3...]]
env AUTH_TOK= APP_CTX= python cache_ubkg_responses.py ubkg_code
```
The first fetches the entity-api JSON content for each uuid, plus the corresponding
ingest-api/assayclassifier/metadata JSON content, and stores them locally. It also prints the JSON
returned by the deployed version of the rule chain, for convenience in setting up new unit tests.
The second does the same for the UBKG response associated with the given code.
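Storing each fetched response as its own JSON file keeps the cache easy to inspect and review. The function below is a hypothetical sketch of that step, using an assumed `cache_dir/<kind>/<key>.json` layout with `kind` set to something like "entity" or "ubkg"; it does not describe the actual file layout used by `cache_responses.py`.

```python
import json
from pathlib import Path
from typing import Any


def save_cached(cache_dir: Path, kind: str, key: str, payload: Any) -> Path:
    """Write one fetched response to cache_dir/<kind>/<key>.json (hypothetical layout)."""
    target_dir = cache_dir / kind
    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / f"{key}.json"
    # Sorted keys and fixed indentation keep re-fetched files stable under version control.
    path.write_text(json.dumps(payload, indent=2, sort_keys=True) + "\n", encoding="utf-8")
    return path
```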
Thus a new unit test corresponding to a specific uuid in a specific `APP_CTX` can be set up by:
* prefetching and saving the appropriate JSON using `cache_responses.py`
* prefetching and saving the UBKG response for the ubkg_code used by that output, using `cache_ubkg_responses.py`
* creating a new test case using that uuid, or the ingest metadata for that uuid
* saving the expected JSON output of the rule chain as the desired test output
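The second step is easy to miss when an expected output refers to several UBKG codes. A small check can list the codes that still need prefetching. This sketch assumes that codes appear as string values under a key named `ubkg_code` somewhere in the output. That key name and the sample codes are hypothetical.

```python
from typing import Any, Iterator, Set


def find_codes(obj: Any, key: str = "ubkg_code") -> Iterator[str]:
    """Yield every string stored under `key` anywhere in a nested JSON value."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key and isinstance(v, str):
                yield v
            else:
                yield from find_codes(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_codes(item, key)


def codes_still_to_cache(expected_output: Any, cached_codes: Set[str]) -> Set[str]:
    """Codes referenced by the expected output that have not been prefetched yet."""
    return set(find_codes(expected_output)) - cached_codes


expected_output = {"ubkg_code": "C200001", "parts": [{"ubkg_code": "C200002"}]}
print(sorted(codes_still_to_cache(expected_output, {"C200001"})))  # → ['C200002']
```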