{"id":35122027,"url":"https://github.com/bialimed/miniti","last_synced_at":"2026-05-21T19:38:09.019Z","repository":{"id":238583869,"uuid":"550846970","full_name":"bialimed/MInITI","owner":"bialimed","description":"Detecting microsatellites instability by high throughput sequencing.","archived":false,"fork":false,"pushed_at":"2024-06-09T18:45:51.000Z","size":14512,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-09-05T01:43:03.355Z","etag":null,"topics":["machine-learning","msi","ngs"],"latest_commit_sha":null,"homepage":"","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bialimed.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-10-13T12:28:12.000Z","updated_at":"2024-06-09T18:45:54.000Z","dependencies_parsed_at":null,"dependency_job_id":"130b1222-000d-42db-bbb0-58998cf57883","html_url":"https://github.com/bialimed/MInITI","commit_stats":null,"previous_names":["bialimed/miniti"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/bialimed/MInITI","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bialimed%2FMInITI","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bialimed%2FMInITI/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bialimed%2FMInITI/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bialimed%2FMInITI/manifests","owner_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/owners/bialimed","download_url":"https://codeload.github.com/bialimed/MInITI/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bialimed%2FMInITI/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33311954,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-21T12:23:38.849Z","status":"ssl_error","status_checked_at":"2026-05-21T12:22:11.673Z","response_time":62,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["machine-learning","msi","ngs"],"created_at":"2025-12-28T00:09:09.819Z","updated_at":"2026-05-21T19:38:09.010Z","avatar_url":"https://github.com/bialimed.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"# \u003cimg src=\"doc/img/logo/MInITI_logo.png\" width=60 /\u003eMInITI: Microsatellites INstability from hIgh Throughput sequencIng\n\n![license](https://img.shields.io/badge/license-GPLv3-blue)\n[![DOI](https://zenodo.org/badge/550846970.svg)](https://zenodo.org/doi/10.5281/zenodo.11536493)\n\n## Table of Contents\n* [Description](#description)\n* [Workflows steps](#workflows-steps)\n* [Installation](#installation)\n* [Usage](#usage)\n* [Performances](#performances)\n* [Copyright](#copyright)\n\n## Description\nThis workflow classifies microsatellite instability from high-throughput\nsequencing data produced on Illumina instruments.\n\nSample 
classification is based on comparison with a learning model created from\na panel of stable and unstable samples. As a consequence, it does not need a\nmatched normal tissue for the evaluated sample, and the application comes with\ntwo workflows: learn and tag.\n\n`Learn` produces the learning model from a list of samples, their known\nclassification and the list of MSI targets. It must be run on data coming from\nyour laboratory process and the resulting model should be used to classify data\ngenerated using the same protocols.\n\n`Tag` classifies loci and samples, produces confidence scores for these\nclassifications and writes an interactive report.\n\n## Workflows steps\n### 1. MInITI learn\n\u003cfigure\u003e\n    \u003cimg src=\"doc/img/wf/MInITI_learn.png\" /\u003e\n    \u003cfigcaption align = \"center\"\u003e\u003cb\u003eFig.1 - Learn steps\u003c/b\u003e\u003c/figcaption\u003e\n\u003c/figure\u003e\n\nThe `learn` workflow produces the learning model from a list of samples, their\nknown classification and the list of MSI targets. It only needs to be run once\nfor your panel. The resulting model is then one of the inputs of every MInITI\ntag run on the same panel with the same laboratory protocol.\n\nWorkflow (see Fig.1):\n* If you start from FastQ files, the first steps are read alignment and\nduplicate marking.\n* Then, the distribution of read lengths for each locus is retrieved.\n* Finally, the features used in the classifiers' decisions (MInITI tag) are\ncalculated. Length distributions, classifier features and the status of each\nlocus are then stored in the model file.\n\n### 2. 
MInITI tag\n\u003cfigure\u003e\n    \u003cimg src=\"doc/img/wf/MInITI_tag.png\" /\u003e\n    \u003cfigcaption align = \"center\"\u003e\u003cb\u003eFig.2 - Tag steps\u003c/b\u003e\u003c/figcaption\u003e\n\u003c/figure\u003e\n\nThe `tag` workflow classifies loci and samples, produces confidence scores for\nthese classifications and writes an interactive report.\n\nWorkflow (see Fig.2):\n* If you start from FastQ files, the first steps are read alignment and\nduplicate marking.\n* Then, the distribution of read lengths for each locus is retrieved.\n* This distribution is used by three independent classifiers to tag loci by\ncomparison with the model. The sample class is then inferred from the\ninstability ratio of these loci. The classifiers used on loci are:\n * A reimplementation of [mSINGS](https://bitbucket.org/uwlabmed/msings/src/master/),\n * A reimplementation of the pro algorithm from\n [MSISensor-pro](https://github.com/xjtu-omics/msisensor-pro),\n * A classifier from [sklearn](https://scikit-learn.org/stable/) (default: random\n forest).\n* Finally, results from all classifiers are merged and a report is produced.\n\n## Installation\n### 1. Download code\nUse one of the following:\n\n* [user way] Download the latest released version from\n`https://github.com/bialimed/miniti/archive/releases`.\n* [developer way] Clone the repository to get the latest unreleased version:\n\n      git clone --recurse-submodules git@github.com:bialimed/miniti.git\n\n### 2. 
Install dependencies\n* conda (\u003e=4.6.8):\n\n      # Install conda\n      wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \u0026\u0026 \\\n      sh Miniconda3-latest-Linux-x86_64.sh\n\n      # Install mamba\n      conda activate base\n      conda install -c conda-forge mamba\n\n  More details on the Miniconda installation [here](https://docs.conda.io/en/latest/miniconda.html).\n\n* snakemake (\u003e=5.4.2):\n\n      mamba create -c conda-forge -c bioconda -n miniti snakemake==6.15.0\n\n  More details on the Snakemake installation [here](https://snakemake.readthedocs.io/en/stable/getting_started/installation.html).\n\n* In the folder `${APP_DIR}/test/config`, set the variable corresponding to the\ngenome (`##${GENOME_HG38}##`) in `wf_learn_config.yml` and `wf_tag_config.yml`.\n\n* Install the rules' dependencies (bwa, picard, ...):\n\n      conda activate miniti\n      cd ${APP_DIR}/test\n      snakemake \\\n        --cores 1 \\\n        --use-conda \\\n        --conda-prefix ${application_env_dir} \\\n        --conda-create-envs-only \\\n        --snakefile ${APP_DIR}/Snakefile_learn \\\n        --configfile config/wf_learn_config.yml\n\n### 3. Test install\n* Launch the test with the following command:\n\n      conda activate miniti\n      ${APP_DIR}/test/launch_wf.sh \\\n        ${CONDA_ENVS_DIR} \\\n        ${WORK_DIR} \\\n        ${CLUSTER_PARAMS}\n\n  Example with the Slurm scheduler:\n\n      conda activate miniti\n      ~/soft/miniti/test/launch_wf.sh \\\n        /work/$USER/conda_envs/envs \\\n        /work/$USER/test_miniti \\\n        'sbatch --partition={resources.partition} --mem={resources.mem} --cpus-per-task={threads}'\n\n* See the results in `${WORK_DIR}/tag/report/run.html`.\n\n## Usage\n### 1. MInITI learn\n#### Important considerations\n\nNumber of samples for model creation: to reach an accuracy compatible with\nclinical use, the model must capture all the diversity of your data for\neach class. 
A large number of samples in each class makes this goal easier to reach and\nreduces bias. If only a small number of samples is available, at least 30 stable\nand 30 unstable samples per locus should be considered a minimum to create a\nmodel for one pathology.\n\nStatus per locus: MInITI classifies each microsatellite by comparing its\nrepeat lengths (from reads) with a set of stable and unstable length\ndistributions for the same locus. Since an unstable sample may contain both\nstable and unstable loci, it is best to define the status of each locus for each\nsample in the model. However, it is possible not to differentiate the status of\nthe loci. In this case, give every locus the same status as its sample in the\nfile set by the `input.known_status` parameter (see configuration). Be aware that\nthis configuration is not recommended and requires two conditions:\n* A larger number of samples in the model.\n* A majority of samples in which most loci have the same status as the sample.\n\n#### Configuration\nCopy `${APP_DIR}/config/config_learn_tpl.yml` to your current directory and\nchange the values before launching. Minimum required changes:\n * Set `input.aln_pattern` if you start from duplicate-marked alignments, or set\n `input.R1_pattern` and `input.R2_pattern` if you start from FastQ.\n * Set the path to the file describing statuses in `input.known_status`. This\n TSV file contains the status (MSI, MSS or Undetermined) of each analysed locus\n (columns) for each sample (rows). 
See the example in `test/config/known_status.tsv`.\n * Set the path to the file containing the locations of the targeted\n microsatellites in `reference.microsatelites` (BED format).\n * Set the path to the reference genome file in `reference.sequences` (FASTA\n format). It must be indexed (FAI) and, if you start from FastQ, it must also be\n indexed for BWA.\n\nDetails on each parameter can be found in `config_learn_tpl.yml`.\n\n#### Launch command\n    conda activate miniti\n    snakemake \\\n      --use-conda \\\n      --conda-prefix ${application_env_dir} \\\n      --jobs ${nb_jobs} \\\n      --jobname \"miniti.{rule}.{jobid}\" \\\n      --latency-wait 100 \\\n      --snakefile ${application_dir}/Snakefile_learn \\\n      --cluster \"sbatch --partition={resources.partition} --mem={resources.mem} --cpus-per-task={threads}\" \\\n      --configfile workflow_parameters.yml \\\n      --directory ${out_dir} \\\n      \u003e ${out_dir}/wf_log.txt \\\n      2\u003e ${out_dir}/wf_stderr.txt\n\n#### Output directory\nThe main elements of the output directory are the following:\n\n    out_dir/\n    ├── ...\n    └── microsat/\n        ├── microsatModel_info.tsv\n        └── microsatModel.json\n\n`${out_dir}/microsat/microsatModel_info.tsv` contains the number of samples\nkept in the model for each locus and status (see Fig.3).\n\n\u003cfigure\u003e\n    \u003cimg src=\"doc/img/reports/learn.png\" /\u003e\n    \u003cfigcaption align = \"center\"\u003e\u003cb\u003eFig.3 - Model report\u003c/b\u003e\u003c/figcaption\u003e\n\u003c/figure\u003e\n\n`${out_dir}/microsat/microsatModel.json` contains the length distributions,\npre-calculated classifier features and associated statuses in a machine-readable\nformat defined by the [AnaCore](https://github.com/bialimed/AnaCore) library.\n\n### 2. MInITI tag\n#### Configuration\nCopy `${APP_DIR}/config/config_tag_tpl.yml` to your current directory and change\nthe values before launching. 
Minimum required changes:\n * Set the path to the learning model generated by MInITI learn in\n `classifier.model`.\n * Set the instability threshold for your panel in\n `classifier.sample.instability_threshold`.\n * Set `input.aln_pattern` if you start from duplicate-marked alignments, or set\n `input.R1_pattern` and `input.R2_pattern` if you start from FastQ.\n * Set the path to the file containing the locations of the targeted\n microsatellites in `reference.microsatelites` (BED format).\n * Set the path to the reference genome file in `reference.sequences` (FASTA\n format). It must be indexed (FAI) and, if you start from FastQ, it must also be\n indexed for BWA.\n\nDetails on each parameter can be found in `config_tag_tpl.yml`.\n\n#### Launch command\n    conda activate miniti\n    snakemake \\\n      --use-conda \\\n      --conda-prefix ${application_env_dir} \\\n      --jobs ${nb_jobs} \\\n      --jobname \"miniti.{rule}.{jobid}\" \\\n      --latency-wait 100 \\\n      --snakefile ${application_dir}/Snakefile_tag \\\n      --cluster \"sbatch --partition={resources.partition} --mem={resources.mem} --cpus-per-task={threads}\" \\\n      --configfile workflow_parameters.yml \\\n      --directory ${out_dir} \\\n      \u003e ${out_dir}/wf_log.txt \\\n      2\u003e ${out_dir}/wf_stderr.txt\n\n#### Output directory\nThe main elements of the output directory are the following:\n\n    out_dir/\n    ├── ...\n    └── report/\n        ├── data/\n        │   └── sample-A_stabilityStatus.json\n        ├── ...\n        ├── run.html\n        └── sample-A.html\n\n`${out_dir}/report/data/${sample}_stabilityStatus.json` contains classification\ninformation about the sample in a machine-readable format defined by the\n[AnaCore](https://github.com/bialimed/AnaCore) library.\n\n`${out_dir}/report/${sample}.html` (see Fig.4) is an interactive report\nto inspect:\n * Sample classification and confidence scores from all classifiers.\n * Loci sequencing depths, length distribution profiles (see Fig.5),\n classifications and confidence scores 
from all classifiers.\n\n\u003cfigure\u003e\n    \u003cimg src=\"doc/img/reports/tag.png\" /\u003e\n    \u003cfigcaption align = \"center\"\u003e\u003cb\u003eFig.4 - Sample report\u003c/b\u003e\u003c/figcaption\u003e\n\u003c/figure\u003e\n\u003cfigure\u003e\n    \u003cimg src=\"doc/img/reports/tag_example_distrib.png\" /\u003e\n    \u003cfigcaption align = \"center\"\u003e\u003cb\u003eFig.5 - Lengths distribution panel\u003c/b\u003e\u003c/figcaption\u003e\n\u003c/figure\u003e\n\n## Performances\nPerformance was evaluated on a dataset from 120 colorectal cancer patients.\nSamples were sequenced from FFPE blocks with a targeted panel (mutation hotspots\nand MSI). The results are summarized in [assessment/report.html](assessment/report.html).\nCommands and configurations used in the evaluation process can be found in the\n`assessment` folder.\n\n## Copyright\n2022 Laboratoire d'Anatomo-Cytopathologie du CHU Toulouse\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbialimed%2Fminiti","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbialimed%2Fminiti","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbialimed%2Fminiti/lists"}