{"id":48577374,"url":"https://github.com/ptrebert/sciddo","last_synced_at":"2026-04-08T15:47:01.527Z","repository":{"id":55960588,"uuid":"130335585","full_name":"ptrebert/sciddo","owner":"ptrebert","description":"Home of the SCIDDO tool","archived":false,"fork":false,"pushed_at":"2022-09-15T17:51:40.000Z","size":1052,"stargazers_count":1,"open_issues_count":3,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2026-04-08T15:47:00.849Z","etag":null,"topics":["bioinformatics","chromatin","epigenomics","tool"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ptrebert.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-04-20T08:46:38.000Z","updated_at":"2021-08-02T05:21:11.000Z","dependencies_parsed_at":"2023-01-18T09:15:39.247Z","dependency_job_id":null,"html_url":"https://github.com/ptrebert/sciddo","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ptrebert/sciddo","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ptrebert%2Fsciddo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ptrebert%2Fsciddo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ptrebert%2Fsciddo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ptrebert%2Fsciddo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ptrebert","download_url":"https://codeload.github.com/ptrebert/sciddo/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/repositories/ptrebert%2Fsciddo/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31562696,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-08T14:31:17.711Z","status":"ssl_error","status_checked_at":"2026-04-08T14:31:17.202Z","response_time":54,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","chromatin","epigenomics","tool"],"created_at":"2026-04-08T15:47:01.408Z","updated_at":"2026-04-08T15:47:01.513Z","avatar_url":"https://github.com/ptrebert.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SCIDDO: Score-based identification of differential chromatin domains\n\n## Publication\nManuscript: [DOI: 10.1093/bioinformatics/btaa960](https://doi.org/10.1093/bioinformatics/btaa960)\n\nbioRxiv preprint: [DOI: 10.1101/441766 ](https://doi.org/10.1101/441766) \n\n## Use cases\nSCIDDO is a tool for the differential analysis of histone chromatin data.\nSCIDDO uses chromatin state segmentation maps, e.g., as generated by ChromHMM or EpiCSeg,\nfor identifying regions of differential chromatin state between individual samples\nor groups of replicated samples.\n\nThe detected differential chromatin domains can be expected to overlap largely \nwith regulatory regions or differentially expressed genes (see our manuscript \npreprint 
for detailed results). Moreover, the score-based approach implemented \nin SCIDDO affords a straightforward customization of scoring chromatin state \ndifferences to emphasize different aspects of chromatin dynamics.\n\n## Code maturity\nSCIDDO is currently in BETA status\n\nmaster branch:\n[![Build Status](https://travis-ci.org/ptrebert/sciddo.svg?branch=master)](https://travis-ci.org/ptrebert/sciddo)\n\ndev branch:\n[![Build Status](https://travis-ci.org/ptrebert/sciddo.svg?branch=develop)](https://travis-ci.org/ptrebert/sciddo)\n\n## Setup\nSCIDDO supports only Linux environments (that is unlikely to change in the future) and is developed using Python3.6.\nOther Python3.x versions may or may not work, but are not officially supported.\n\nFor easy setup, it is highly recommended to install SCIDDO inside a dedicated Conda environment.\nA suitable environment is specified in `environments/sciddo_env.yml`.\n\nOtherwise, install the HDF5 library (tested with version 1.8.18) as appropriate for your local environment,\nand the necessary Python dependencies from the `requirements.txt` file:\n\n```bash\nsudo apt-get install libhdf5\nsudo pip install -r requirements.txt\n```\n\nEmpirically, the setup of PyTables and HDF5 can create some headaches.\nIn this case, the best advice is to use Conda.\n\nAfter all dependencies have been installed successfully,\nrun the SCIDDO setup as appropriate for your environment:\n\n```bash\n[sudo] python setup.py install\n```\n\n## Execution\n\n### Input and output data formats\nSCIDDO supports common text-based input and output data formats. Chromatin state segmentations as tabular (BED-like) files\nshould be compatible as long as they have a fixed bin width of at least 100 bp. 
Output files from ChromHMM or EpiCSeg\nare supported out-of-the-box, and SCIDDO is designed to be used immediately downstream of these tools (e.g., SCIDDO knows\nthat ChromHMM segmentation files have the suffix \"_segments.bed\" and will strip that from file names before determining\npossible sample labels). Auxiliary files such as chromatin state label or color mappings are supported in the form of simple\ntab-separated \"key-value\" text files.\n\nSCIDDO's internal data management is realized with the popular [pandas Python package](https://pandas.pydata.org/), and\ndata are stored in HDF5 files (*.h5) that are created with pandas. The main reason for using\nHDF5 files for storing data and metadata is efficiency, but all contents of an HDF5 file can be dumped to text.\nAfter the first step of a SCIDDO analysis, which converts the input data to HDF5, all subsequent operations are performed\non this HDF5 file.\n\nWhen dumping identified differential chromatin domains (DCDs) or raw candidate regions to text, the output adheres to the\nBED column layout (with header) `chromosome, start, end, name, score`, plus additional columns containing statistics and sample/group names.\nIf downstream tools cannot work with non-standard BED-like text files, a simple\n`cut -f 1,2,3,4,5 \u003cSCIDDO_TABLE\u003e.tsv \u003e \u003cSCIDDO_TABLE\u003e.bed` can be used to restrict the output to the first five,\nBED-compliant columns.\n\n### Getting help\n\n`sciddo.py --help` or `sciddo.py \u003cSUBCOMMAND\u003e --help` is your friend.\n\nFor step-by-step help on how to use SCIDDO, please refer to the [tutorial hosted as part of this repository](testdata/tutorial.md).\n\n### Standard analysis run\n\nA standard SCIDDO analysis run is split into several distinct steps that are realized by different code modules.\nBesides module-specific parameters, there are several global parameters to adjust SCIDDO's runtime behavior.\nImportantly, these global parameters always have to be specified before 
the subcommand, i.e.,\n\n```\nsciddo.py [GLOBAL_PARAMETERS] \u003cSUBCOMMAND\u003e [MODULE_PARAMETERS]\n```\n\nThe global parameters are:\n\n```bash\n--workers: number of CPUs to use (no sanity checks!)\n--debug: print debug messages to stderr; otherwise, SCIDDO operates silently\n--config-dump: folder to dump run configuration (JSON); defaults to current working directory\n--no-dump: do not dump run configuration\n```\n\n#### Step 1: convert\n \nConvert all input data (state segmentations plus metadata) into a binary HDF5 file. Currently, ChromHMM\nand EpiCSeg output files are supported out-of-the-box. This creates the SCIDDO DATA file.\n\n```bash\nsciddo.py [GLOBAL_PARAMETERS] convert --help\n```\n\n#### Step 2: stats\n\nCompute a bunch of statistics (e.g., state composition per sample) that are potentially needed downstream.\n\n```bash\nsciddo.py [GLOBAL_PARAMETERS] stats --help\n```\n\n#### Step 3: score\n\nAdd scoring schemes (matrices) to the dataset. These can be derived automatically from the state segmentation\nmodel emissions (if provided during the convert step), or can be supplied in the form of a user-defined file.\nNote that, in principle, an arbitrary number of scoring schemes can be added to a dataset.\n\n```bash\nsciddo.py [GLOBAL_PARAMETERS] score --help\n```\n\n#### Step 4: scan\n\nScan the dataset for differential chromatin domains. 
As opposed to the previous commands, this creates a separate\noutput file per run, i.e., the SCIDDO RUN file.\n\n```bash\nsciddo.py [GLOBAL_PARAMETERS] scan --help\n```\n\n#### Step 5: dump\n\nAll data and metadata in the SCIDDO DATA and RUN file can be dumped to text files (e.g., TSV tables or BED files) for downstream analysis.\n\n```bash\nsciddo.py [GLOBAL_PARAMETERS] dump --help\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fptrebert%2Fsciddo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fptrebert%2Fsciddo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fptrebert%2Fsciddo/lists"}