{"id":35262463,"url":"https://github.com/yodeng/fsplit","last_synced_at":"2026-04-06T06:33:12.365Z","repository":{"id":43491043,"uuid":"491797792","full_name":"yodeng/fsplit","owner":"yodeng","description":"for split fastq from mix bcl or fastq data by barcode/index","archived":false,"fork":false,"pushed_at":"2024-03-20T05:50:03.000Z","size":15813,"stargazers_count":5,"open_issues_count":1,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-03-20T06:39:55.971Z","etag":null,"topics":["barcode","bcl2fastq","split"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yodeng.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2022-05-13T07:21:55.000Z","updated_at":"2024-03-13T03:21:23.000Z","dependencies_parsed_at":"2024-01-02T03:30:49.987Z","dependency_job_id":"9c4d05dc-bf44-4dba-82c0-f600bbffeccb","html_url":"https://github.com/yodeng/fsplit","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/yodeng/fsplit","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yodeng%2Ffsplit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yodeng%2Ffsplit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yodeng%2Ffsplit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yodeng%2Ffsplit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yodeng","download_url":"https://codeload.github.com/yodeng/fspl
it/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yodeng%2Ffsplit/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31463014,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-05T21:22:52.476Z","status":"online","status_checked_at":"2026-04-06T02:00:07.287Z","response_time":112,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["barcode","bcl2fastq","split"],"created_at":"2025-12-30T09:23:25.310Z","updated_at":"2026-04-06T06:33:12.360Z","avatar_url":"https://github.com/yodeng.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# fsplit\n\n`fsplit` is a tool that splits per-sample data out of mixed `BCL` or `fastq` data according to `barcode` information.\n\n\n\n## Requirements\n\n+ python \u003e=2.7.10, \u003c=3.11\n+ bcl2fastq\n+ linux\n\n\n\n## Installation\n\n```\npip install git+https://github.com/yodeng/fsplit.git\n```\n\n\n\n## Usage\n\n`fsplit` can split either `fastq` or `bcl` data.\n\n\n\n### Splitting fastq data\n\nMixed fastq data that contains barcodes can be split into per-sample data according to the barcode information.\n\nAn index file should be created for the fastq data in advance to speed up the run.\n\n#### 1) fsplit index\n\nVersions before `1.0.4` use an index-based multiprocess implementation; version `1.0.4` and later no longer need an index.\n\nCreate a fai index file for a fastq file. The command below writes `test.fastq.fai`; `gzip`-compressed input is detected automatically.\n\n```\nfsplit index -i test.fastq.gz\n```\n\nThe index can also be built with `samtools`; `fsplit` is compatible with the index format produced by `samtools fqidx`.\n\n\n\n#### 2) fsplit split\n\nSplit the fastq data belonging to each sample out of a fastq file according to the barcode sequences. If the fastq index file does not exist, it is created first and then the split runs.\n\nVersion `1.0.4` and later no longer need an index; the fastq file is read and processed directly.\n\nRun `fsplit split --help` to see the help:\n\n| Parameter     
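 | Description                                                  |\n| ------------- | ------------------------------------------------------------ |\n| -i/--input    | Input fastq file |\n| -I/--Input    | Input paired fastq file (read2) |\n| -b/--barcode  | Barcode file with two or three columns: column 1 is the sample name, column 2 the barcode1 sequence, column 3 the barcode2 sequence |\n| -m/--mismatch | Number of mismatched bases allowed when matching barcodes; default 0 (no mismatches allowed) |\n| -o/--output   | Output directory; created automatically if it does not exist |\n| -d/--drup     | Whether to remove the barcode sequence from the output; not removed by default |\n| -rc1/--rc-bc1 | Search for the reverse complement of barcode1 |\n| -rc2/--rc-bc2 | Search for the reverse complement of barcode2 |\n| --output-gzip | Write gzip-compressed fastq files; uses the `python zlib` interface, which slows the run |\n\n\n\n### Splitting BCL data\n\nRaw BCL flowcell sequencing data can also be split. `fsplit` wraps the bcl2fastq software and splits the data into per-sample fastq files according to the barcode information; both single-index and dual-index splitting are supported.\n\n\n\n#### Parameters\n\nUse the `fsplit bcl2fq` command to split `bcl` data. Its parameters are:\n\n| Parameter        | Description                                                  |\n| ---------------- | ------------------------------------------------------------ |\n| -i/--input       | Input BCL flowcell directory |\n| -s/--sample      | Sample sheet file with two or three whitespace-separated columns: column 1 is the sample name, column 2 the index1 (i7) sequence, column 3 the index2 (i5) sequence |\n| -m/--mismatch    | Number of mismatched bases allowed when matching barcodes; default 1 (one mismatched base allowed) |\n| -t/--threads     | Number of CPU cores to use |\n| -o/--output      | Output directory; created automatically if it does not exist |\n| -rc1/--rc-index1 | Reverse-complement the index1 (i7) sequence |\n| -rc2/--rc-index2 | Reverse-complement the index2 (i5) sequence |\n| --bcl2fq         | Path to the bcl2fastq executable; if not given, it is looked up in $PATH or sys.prefix |\n\n\n\n## Changelog\n\n#### version 1.0.0\n\n+ Designed the multiprocess concurrent reading and processing scheme\n+ Supports fastq data splitting only\n+ Requires a fastq index\n\n\n\n#### version 1.0.1\n\n+ Added run-time logging\n+ Optimized the shared inter-process queue; output is processed in batches\n\n\n\n#### version 1.0.2\n\n+ Added single-index splitting of BCL data\n+ Optimized index-based fastq reading\n\n\n\n#### version 1.0.3\n\n+ Added dual-index splitting of BCL data\n+ Added logging output to the screen\n+ 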
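Optimized the fastq index step: a sparse index reduces the index file size and speeds up reading\n+ Replaced the shared inter-process queue with a mutex lock\n\n\n\n#### version 1.0.4\n\n+ Single-threaded reading with decompression in a subprocess; processed reads are written directly to file. The indexing step, multiprocess handling, and file mutex lock were removed\n+ Added a `golang` implementation of the `split` step: [gsplit](src/gsplit.go).\n\n\n\n#### version 1.0.5\n\n+ Added the bcl2fq subcommand, which wraps bcl2fastq, for splitting bcl data\n\n#### version 1.0.6\n\n+ Added support for splitting paired-end fastq files to the split subcommand\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyodeng%2Ffsplit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyodeng%2Ffsplit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyodeng%2Ffsplit/lists"}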