{"id":42819768,"url":"https://github.com/nbtm-sh/samplesheet-utils","last_synced_at":"2026-01-30T06:48:50.089Z","repository":{"id":257536926,"uuid":"858524464","full_name":"nbtm-sh/samplesheet-utils","owner":"nbtm-sh","description":"Utility script(s) used for creating samplesheets from various sources","archived":false,"fork":false,"pushed_at":"2025-07-24T01:55:43.000Z","size":109,"stargazers_count":0,"open_issues_count":2,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-09T12:58:22.371Z","etag":null,"topics":["bioinformatics","python","samplesheet","tool","utils"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nbtm-sh.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-09-17T03:46:22.000Z","updated_at":"2025-07-24T01:47:55.000Z","dependencies_parsed_at":"2025-09-09T12:58:23.434Z","dependency_job_id":null,"html_url":"https://github.com/nbtm-sh/samplesheet-utils","commit_stats":null,"previous_names":["australian-structural-biology-computing/create-samplesheet","nbtm-sh/samplesheet-utils"],"tags_count":17,"template":false,"template_full_name":null,"purl":"pkg:github/nbtm-sh/samplesheet-utils","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nbtm-sh%2Fsamplesheet-utils","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nbtm-sh%2Fsamplesheet-utils/tags","releases_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/repositories/nbtm-sh%2Fsamplesheet-utils/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nbtm-sh%2Fsamplesheet-utils/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nbtm-sh","download_url":"https://codeload.github.com/nbtm-sh/samplesheet-utils/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nbtm-sh%2Fsamplesheet-utils/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28906985,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-30T06:42:00.998Z","status":"ssl_error","status_checked_at":"2026-01-30T06:41:58.659Z","response_time":66,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","python","samplesheet","tool","utils"],"created_at":"2026-01-30T06:48:49.130Z","updated_at":"2026-01-30T06:48:50.075Z","avatar_url":"https://github.com/nbtm-sh.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# samplesheet-utils\n![Testing](https://github.com/Australian-Structural-Biology-Computing/create-samplesheet/actions/workflows/python-app.yml/badge.svg)\n\n`samplesheet-utils` (or `samplesheetutils`) is a collection of scripts and utilities for working with samplesheets and FASTA files at the command line. 
It is primarily designed for use within pipelines.\n\n## Installation\n### pip\n```bash\npip3 install samplesheetutils\n```\n### git\n```bash\ngit clone https://github.com/nbtm-sh/create-samplesheet\ncd create-samplesheet\npip3 install .\n```\n\n## Commands\n### sample-name\nThis command reads the sample name(s) from a FASTA file. This is useful for dynamically creating directories based on the actual sample name.\n```bash\nsample-name [ARGS] [FASTA(s)]\n```\n- `-i --index`: Index of the sample you wish to read the name from. This can be an integer, -1 for the last sample, or a range `(1:5)`\n- `--sanitize --sanitise`: Replaces any problematic characters in the sample name(s) with an underscore\n- `-d --delim`: Change the delimiter between each sample name. By default this is a new-line character\n\n### create-samplesheet\nThis command creates a samplesheet from different inputs, including a single sequence string and directories containing FASTA files.\n```bash\ncreate-samplesheet [ARGS]\n```\n- `-a --aa-string`: Input a single amino acid sequence\n- `-d --directory`: Input a directory containing FASTA files\n- `-o --output-file`: Samplesheet filename. Default is `samplesheet.[ext]`, where `[ext]` depends on the output mode\n- `-j --json`: Output JSON formatted samplesheet\n- `-y --yaml`: Output YAML formatted samplesheet\n- `-m --msa-dir`: Directory to search for corresponding MSA files in (only available in YAML output mode)\n- `--yaml-rfaa`: Output YAML formatted sequence files with a samplesheet.csv file\n\n#### `--msa-dir`\nWhen using the YAML output mode (`-y`, `--yaml`), you can provide a path to a directory containing each sample's pre-computed multiple sequence alignment files (`.a3m` files). 
For each of these files to be associated automatically with its corresponding sample, its filename must follow this format:\n\n```\n[SAMPLE NAME].a3m\n```\n\n**Example Usage:**\n```bash\ncreate-samplesheet --directory /home/nathan/experiment/fastas --msa-dir /home/nathan/experiment/fastas/msas --yaml\n```\n\n**Directory Structure**\n```\n/home/nathan/experiment/fastas\n├── A1.fasta\n├── A2.fasta\n└── msas\n    ├── A1.a3m\n    └── A2.a3m\n```\n\u003e **_NOTE:_** This example assumes that each FASTA file contains a sample with the same name as the file itself. `create-samplesheet` will search for a3m files based on the **sample name** in the FASTA file, not the FASTA filename itself.\n\n### truncate-msa\nThis command edits an a3m file to target a specific region of the alignment for special use cases. The module will preserve the first sample of the file, and truncate the remaining entries. \n```bash\ntruncate-msa [ARGS] [INPUT_FILE] [REGION_START] [REGION_END]\n```\n- `-i --in-place`: Modify the input file directly, instead of making a new file.\n- `-r --inverse`: Invert the output, removing the target region instead.\n- `--version`: Output version information.\n- `--debug`: Display debug information.\n\n### TODO\n- [ ] Finish documentation\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnbtm-sh%2Fsamplesheet-utils","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnbtm-sh%2Fsamplesheet-utils","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnbtm-sh%2Fsamplesheet-utils/lists"}