{"id":15063975,"url":"https://github.com/bnediction/scboolseq","last_synced_at":"2026-03-09T19:13:49.224Z","repository":{"id":62591255,"uuid":"475625610","full_name":"bnediction/scBoolSeq","owner":"bnediction","description":"scBoolSeq: scRNA-Seq data binarisation and synthetic generation from Boolean dynamics","archived":false,"fork":false,"pushed_at":"2025-01-30T19:31:23.000Z","size":260,"stargazers_count":9,"open_issues_count":1,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-18T00:29:52.230Z","etag":null,"topics":["bioinformatics","boolean-networks","computational-biology","machine-learning","pandas","python3","scikit-learn","scrna-seq","single-cell-rna-seq"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bnediction.png","metadata":{"files":{"readme":"README.md","changelog":"changelog.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-03-29T21:29:41.000Z","updated_at":"2025-01-30T18:55:43.000Z","dependencies_parsed_at":"2024-04-05T17:42:46.602Z","dependency_job_id":"03adbe35-cc5c-41dd-847d-7fefa5bb5c07","html_url":"https://github.com/bnediction/scBoolSeq","commit_stats":null,"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bnediction%2FscBoolSeq","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bnediction%2FscBoolSeq/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bnediction%2FscBoolSeq/releases","manifests_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/repositories/bnediction%2FscBoolSeq/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bnediction","download_url":"https://codeload.github.com/bnediction/scBoolSeq/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248208634,"owners_count":21065203,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","boolean-networks","computational-biology","machine-learning","pandas","python3","scikit-learn","scrna-seq","single-cell-rna-seq"],"created_at":"2024-09-25T00:09:36.163Z","updated_at":"2026-03-09T19:13:49.178Z","avatar_url":"https://github.com/bnediction.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# scBoolSeq\n\nscRNA-Seq data binarisation and synthetic generation from Boolean dynamics.\n\n## Installation\n\n### Pip\n\n```\npip install scboolseq\n```\n\n### Conda\n\n```\nconda install -c conda-forge -c colomoto scboolseq\n```\n\n### Docker\n\n`scBoolSeq` is included in the [ColoMoTo Docker](http://colomoto.org/notebook) distribution.\n\n## Usage\n\n\u003c!--\n### Command line\n\nscBoolSeq provides a rich CLI allowing programmatic access to its main functionalities, namely the binarization of RNA-Seq data (`binarize`) and the \ngeneration of synthetic RNA-Seq data (`synthesize`) reflecting activation states from Boolean Network simulations. Once correctly installed, \nthe help output of the tool and of each subcommand explains all the possible parameters. 
Some minimal examples are presented here.\n\n#### Main CLI\n\n```bash\n$ scBoolSeq -h\nusage: scBoolSeq \u003ccommand\u003e [\u003cargs\u003e]\n\nAvailable commands:\n\t* binarize\t Binarize a RNA-Seq dataset.\n\t* synthesize\t Simulate a RNA-Seq experiment from Boolean dynamics.\n\t* from_file\t Repeat a binarization or synthethic generation experiment, based on a config file.\n\nNOTE on TSV/CSV file specs:\n* If '.csv', the file is assumed to use the standard separator for columns ','.\n* The index (gene or sample identifiers) is assumed to be the first column.\n* The scBoolSeq is designed with consistency in mind. \n  The `output` (binarized or synthetic expression frame) will have the same disposition \n  (genes x observations | observations x genes) as the `input`. \n  If a `reference` is specified, its disposition must match the `input`'s.\n\nscBoolSeq: bulk and single-cell RNA-Seq data binarization and synthetic generation from Boolean dynamics.\n\npositional arguments:\n  command     Subcommand to run\n\noptional arguments:\n  -h, --help  show this help message and exit\n```\n\n#### Binarization\n\nMinimal example of binarization, specifying some optional parameters.\n\n```bash\ncurl -fOL https://github.com/pinellolab/STREAM/raw/master/stream/tests/datasets/Nestorowa_2016/data_Nestorowa.tsv.gz\n\nls\n# data_Nestorowa.tsv.gz\ntime scBoolSeq binarize data_Nestorowa.tsv.gz --genes-are-rows\\\n--output Nestorowa_binarized.csv --n-threads 10 --dump-config --dump-criteria\n# ________________________________________________________\n# Executed in   34.49 secs   fish           external \n#   usr time   30.04 secs  1211.00 micros   30.04 secs \n#   sys time    3.90 secs  171.00 micros    3.89 secs \n\nls\n# data_Nestorowa.tsv.gz    scBoolSeq_criteria_data_Nestorowa_2022-04-27_15h14m27.tsv\n# Nestorowa_binarized.csv  scBoolSeq_experiment_config_2022-04-27_15h14m27.toml\n\n# Visualize the binarized expression frame. 
\n# Note that some entries are undefined (NaN).\n# These might be discarded genes for which no binarization or synthesis can occur,\n# or observations which did not pass the thresholds to be set to 0 or 1.\npython -c 'import pandas as pd; print(pd.read_csv(\"Nestorowa_binarized.csv\", index_col=0).iloc[0:7, 0:7])'\n#             Clec1b  Kdm3a  Coro2b  8430408G22Rik  Clec9a  Phf6  Usp14\n# HSPC_025       NaN    1.0     NaN            NaN     NaN   0.0    0.0\n# HSPC_031       NaN    1.0     NaN            NaN     NaN   0.0    0.0\n# HSPC_037       NaN    0.0     1.0            NaN     NaN   0.0    1.0\n# LT-HSC_001     NaN    0.0     1.0            NaN     NaN   1.0    0.0\n# HSPC_001       NaN    0.0     1.0            NaN     NaN   1.0    0.0\n# HSPC_008       1.0    1.0     NaN            NaN     NaN   1.0    0.0\n# HSPC_014       NaN    0.0     NaN            NaN     NaN   0.0    1.0\n```\n\n#### Synthetic generation from Boolean states\n\n```bash\ncat minimal_boolean_example.csv \n# the output is not commented out so that it can be copied\n# and perhaps be read with `x = pandas.read_clipboard(sep=',', index_col=0)`\n,HSPC_025,HSPC_031,HSPC_037,LT-HSC_001,HSPC_001,HSPC_008,HSPC_014,HSPC_020,HSPC_026,HSPC_038,LT-HSC_002,HSPC_002,HSPC_009,HSPC_015,HSPC_021\nKdm3a,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0\nCoro2b,1.0,1.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0\n8430408G22Rik,1.0,0.0,0.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0\nClec9a,1.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0\nPhf6,0.0,0.0,0.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0\n\n\n# Generate 20 samples per Boolean state, using 12 threads.\n# Setting the random number generator's seed ensures reproducibility.\ntime scBoolSeq synthesize --genes-are-rows minimal_boolean_example_T.csv --reference data_Nestorowa.tsv.gz\\\n--n-samples 20 --output new_synthetic.tsv --n-threads 12 --rng-seed 1234\n# ________________________________________________________\n# 
Executed in   43.85 secs   fish           external \n#    usr time   22.08 secs    0.00 millis   22.08 secs \n#    sys time    3.65 secs    3.31 millis    3.65 secs \n\n# visualize the newly generated synthetic scRNA-Seq experiment\npython -c 'import pandas as pd; print(pd.read_csv(\"new_synthetic.tsv\", index_col=0, sep=\"\\t\").iloc[0:3, 0:7])'\n#                HSPC_025  HSPC_031  HSPC_037  LT-HSC_001  HSPC_001  HSPC_008  HSPC_014\n# Kdm3a          7.328819  8.536391  0.000000    0.000000  0.821561  7.030519  1.891949\n# Coro2b         0.000000  0.000000  6.457878    5.479887  0.000000  0.000000  5.503554\n# 8430408G22Rik  0.000000  0.005110  0.000000    0.000000  0.000000  6.428994  0.000000\n```\n--\u003e\n\n### Python API\n\nA minimal example is presented here, using the same dataset as the CLI usage guide.\nFor further information, please check the documentation.\n\n```python\nimport pandas as pd\nfrom scboolseq import scBoolSeq\n\n# read in the normalized expression data\nnestorowa = pd.read_csv(\"data_Nestorowa.tsv.gz\", index_col=0, sep=\"\\t\")\nnestorowa.iloc[1:5, 1:5] \n#                HSPC_031  HSPC_037  LT-HSC_001  HSPC_001\n# Kdm3a          6.877725  0.000000    0.000000  0.000000\n# Coro2b         0.000000  6.913384    8.178374  9.475577\n# 8430408G22Rik  0.000000  0.000000    0.000000  0.000000\n# Clec9a         0.000000  0.000000    0.000000  0.000000\n#\n# NOTE : here, genes are rows and observations are columns\n\nscbool_nestorowa = scBoolSeq()\n\n##\n## Binarization\n##\n\n# scBoolSeq expects genes to be columns, thus we transpose the DataFrame.\nscbool_nestorowa.fit(nestorowa.T) # compute binarization criteria\n\nbinarized = scbool_nestorowa.binarize(nestorowa.T)\nbinarized.iloc[1:5, 1:5] \n#             Kdm3a  Coro2b  8430408G22Rik  Phf6\n# HSPC_031      1.0     NaN            NaN   0.0\n# HSPC_037      0.0     1.0            NaN   0.0\n# LT-HSC_001    0.0     1.0            NaN   1.0\n# HSPC_001      0.0     1.0            NaN   1.0\n\n\n##\n## Synthetic 
RNA-Seq generation from Boolean states\n##\n\n# We load in a boolean trace obtained from the simulation of a Boolean model\nboolean_trace = pd.read_csv(\"boolean_dynamics.csv\", index_col=0)\nboolean_trace\n#             Kdm3a  Coro2b  8430408G22Rik  Phf6\n# init          1.0     0.0            1.0   0.0\n# transient_1   0.0     1.0            1.0   0.0\n# transient_2   0.0     1.0            0.0   1.0\n# stable_state  0.0     1.0            1.0   1.0\n\nsynthetic_scrna_pseudocounts = scbool_nestorowa.sample_counts(boolean_trace) \n```\n\n## Contributors\n\n* [Gustavo Magaña López](https://github.com/gmagannaDevelop)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbnediction%2Fscboolseq","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbnediction%2Fscboolseq","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbnediction%2Fscboolseq/lists"}