{"id":43513913,"url":"https://github.com/moka-guys/samplesheet_validator","last_synced_at":"2026-02-03T13:17:38.901Z","repository":{"id":76535454,"uuid":"304252206","full_name":"moka-guys/samplesheet_validator","owner":"moka-guys","description":"A package to validate the formatting of Illumina SampleSheets and which notifies the user of any issues","archived":false,"fork":false,"pushed_at":"2026-01-30T10:05:39.000Z","size":175,"stargazers_count":0,"open_issues_count":2,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-01-31T02:22:48.915Z","etag":null,"topics":["samplesheet","validation-tool"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/moka-guys.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2020-10-15T07:59:29.000Z","updated_at":"2026-01-21T11:28:09.000Z","dependencies_parsed_at":"2025-06-06T16:53:18.431Z","dependency_job_id":"8c6474c4-7fe1-45d3-967c-013f07ea25eb","html_url":"https://github.com/moka-guys/samplesheet_validator","commit_stats":null,"previous_names":["moka-guys/samplesheet_validator"],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/moka-guys/samplesheet_validator","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/moka-guys%2Fsamplesheet_validator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/moka-guys%2Fsamplesheet_validator/tags","release
s_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/moka-guys%2Fsamplesheet_validator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/moka-guys%2Fsamplesheet_validator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/moka-guys","download_url":"https://codeload.github.com/moka-guys/samplesheet_validator/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/moka-guys%2Fsamplesheet_validator/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29046503,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-03T10:09:22.136Z","status":"ssl_error","status_checked_at":"2026-02-03T10:09:16.814Z","response_time":96,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["samplesheet","validation-tool"],"created_at":"2026-02-03T13:17:38.757Z","updated_at":"2026-02-03T13:17:38.891Z","avatar_url":"https://github.com/moka-guys.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Samplesheet Validator\n\nThis tool is designed to validate NGS samplesheets prior to downstream processing by performing a series of checks.\n\nIt can be used as a standalone process but was designed for integration into automated workflows through instantiation of the 
SamplesheetCheck class, which records the validation outcome in a boolean attribute (self.errors) and the error messages in a dict (self.errors_dict).\n\n## Use case\n\nThe tool has been designed for:\n1. Illumina sequencing runs, with samplesheet names expected to end in \"_SampleSheet.csv\".\n2. AVITI runs.\n\nExpected run types include:\n1. Panel-based NGS testing\n2. TSO500\n3. Oncodeep\n4. Archer\n5. MSK\n\n**Please note** this tool has been specifically designed for the Genome Informatics Service at Synnovis (including the use of the [seglh-naming](https://github.com/moka-guys/seglh-naming/) library) and therefore might require modifications for integration into alternative workflows.\n\n\n## Protocol\n\nSamplesheet validation is carried out in a series of consecutive steps. Any errors identified are recorded in the log file, as per the [config file](samplesheet_validator/config.py).\n\nChecks:\n1. Samplesheet path provided is valid.\n2. Samplesheet matches expected naming:\n    - Illumina: checked against the [seglh-naming](https://github.com/moka-guys/seglh-naming/) library.\n    - AVITI: samplesheet name matches run folder name.\n3. The sequencer_id is in the allowed/validated list of sequencers for that run type.\n4. The samplesheet is not empty (\u003e10 bytes).\n5. Whether the run is a development run. **N.B.** If the run is a dev run, no further samplesheet validation is performed. Further checks are only carried out for clinical runs.\n6. Samplesheet contains the minimum expected section headers.\n7. Content in columns \"Sample_ID\" and \"Sample_Name\" matches for each sample in the samplesheet.\n8. Samplesheet doesn't contain any illegal characters.\n9. Sample name matches expected naming convention for all samples. Assessed against the [seglh-naming](https://github.com/moka-guys/seglh-naming/) library.\n10. The test code (pannumber) for each sample is in the list of expected test codes for the run type.\n11. 
Whether any TSO samples have been included on the run (sets a boolean attribute to true).\n12. Whether any OKD samples are included on the run (sets a boolean attribute to true).\n\n\n## Installation \u0026 Usage\n\n### From Python package\n\n1. Clone a copy of the repository locally\n\n    `git clone https://github.com/moka-guys/samplesheet_validator.git`\n\n2. `cd` into the project root directory\n\n3. Install the Python package\n\n    `python3 setup.py install`\n\n    **N.B.** Requires setuptools to be installed. Use the --user flag or install into a virtualenv/pipenv if not installing globally.\n\n4. Execute functionality from within a Python script.\n\n    ```python\n\n    from samplesheet_validator.samplesheet_validator import SamplesheetCheck\n\n    sscheck_obj = SamplesheetCheck(\n        samplesheet_path,  # str\n        sequencer_ids,  # list\n        panels,  # list\n        tso_panels,  # list\n        okd_panels,  # list\n        dev_pannos,  # list\n        logdir,  # str\n        illumina,  # bool\n        runname,  # str\n    )\n    sscheck_obj.ss_checks()  # Carry out samplesheet validation\n\n    print(sscheck_obj.errors_dict)  # View the dictionary of error messages\n    ```\n\n### Command line\n\nTo use the validator from the command line, set up an environment as follows:\n```bash\npython3 -m venv venv\nsource venv/bin/activate\npip3 install -r requirements.txt\n```\n\nThe script can then be executed with the following options:\n```bash\nusage: Used to validate a samplesheet using the seglh-naming conventions\n\nGiven an input samplesheet, will validate the samplesheet using seglh-naming conventions and output a logfile\n\noptions:\n  -h, --help            show this help message and exit\n  -S SAMPLESHEET_PATH, --samplesheet_path SAMPLESHEET_PATH\n                        Path to samplesheet requiring validation\n  -SI SEQUENCER_IDS, --sequencer_ids SEQUENCER_IDS\n                        Comma separated string of allowed sequencer IDS\n  -P PANELS, --panels PANELS\n         
               Comma separated string of allowed panel numbers\n  -T TSO_PANELS, --tso_panels TSO_PANELS\n                        Comma separated string of tso panels\n  -O OKD_PANELS, --okd_panels OKD_PANELS\n                        Comma separated string of okd panels\n  -D DEV_PANNOS, --dev_pannos DEV_PANNOS\n                        Comma separated development pan numbers\n  -L LOGDIR, --logdir LOGDIR\n                        Directory to save the output logfile to\n  -NSH NO_STREAM_HANDLER, --no_stream_handler NO_STREAM_HANDLER\n                        Provide flag when we don't want a stream handler (prevents\n                        duplication of log messages to terminal if using another\n                        logging instance)\n  -R RUN_FOLDER_NAME, --runname RUN_FOLDER_NAME\n                        Str for processed folder name\n```\n\n## Testing\n\nThis repository currently has **93% test coverage**.\n\nTest datasets are stored in [/test/data](../test/data). The script has a full test suite:\n* [test_samplesheet_validator.py](../test/test_samplesheet_validator.py)\n\nSee [test/README.md](test/README.md) for details about test cases.\n\nThese tests should be run before pushing any code to ensure all tests in the GitHub Actions workflow pass. They can be run as follows:\n\n```bash\npython3 -m pytest\n```\n**N.B. Tests and test cases/files MUST be maintained and updated accordingly in conjunction with script development. This includes ensuring that the arguments passed to pytest in the [pytest.ini](pytest.ini) file are kept up to date.**\n\n\n## Logging\n\nLogging is performed by [ss_logger](samplesheet_validator/ss_logger.py). The directory to save the log file to is supplied as an argument. 
The output log file is named by the script as follows:\n- `$LOGFILE_DIR/$RUNFOLDER_NAME_$TIMESTAMP_samplesheet_validator.log`\n\nThe script also collects the error messages as it runs, which can be used by other scripts when this script is used as an import.\n\n\n### Developed by the Synnovis Genome Informatics Team\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmoka-guys%2Fsamplesheet_validator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmoka-guys%2Fsamplesheet_validator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmoka-guys%2Fsamplesheet_validator/lists"}