{"id":16618733,"url":"https://github.com/stuntspt/4pipe4","last_synced_at":"2025-10-29T19:31:46.675Z","repository":{"id":2342823,"uuid":"3305207","full_name":"StuntsPT/4Pipe4","owner":"StuntsPT","description":"A NGS data analysis pipeline with emphasis on SNP detection","archived":false,"fork":false,"pushed_at":"2016-11-22T11:25:22.000Z","size":9741,"stargazers_count":11,"open_issues_count":0,"forks_count":4,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-02-02T04:41:12.479Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"http://cobig2.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/StuntsPT.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2012-01-30T13:51:20.000Z","updated_at":"2023-03-08T18:36:07.000Z","dependencies_parsed_at":"2022-08-26T12:51:53.340Z","dependency_job_id":null,"html_url":"https://github.com/StuntsPT/4Pipe4","commit_stats":null,"previous_names":[],"tags_count":12,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StuntsPT%2F4Pipe4","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StuntsPT%2F4Pipe4/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StuntsPT%2F4Pipe4/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StuntsPT%2F4Pipe4/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/StuntsPT","download_url":"https://codeload.github.com/StuntsPT/4Pipe4/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count
":238882492,"owners_count":19546529,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-12T02:21:20.999Z","updated_at":"2025-10-29T19:31:45.679Z","avatar_url":"https://github.com/StuntsPT.png","language":"Python","readme":"### INTRODUCTION\n\nFirst of all, thank for downloading CoBiG² 4Pipe4 analysis pipeline. We hope you\nfind it useful. 4Pipe4 allows you to automate NGS data analysis process into a\nsimple script run. It is designed to be as simple to use as possible.\n4Pipe4 can now be used with illumina data, however, keep in mind that 4Pipe4 was\ndesigned for 454 data, and although everything should run using illumina data,\nit's performance was not nearly was tested as the performance with 454 data.\nTherefore, the software is usable, but YMMV with the default values for illumina\ndata. In fact, any feedback with this sort of data is **very** appreciated.\n\n### INSTALLING\n\n4Pipe4 currently has no installing system. It is simply a set of scripts that\nyou can run from anywhere you want. 
That said, it is recommended\n(for simplicity's sake) that you either copy them somewhere in your $PATH\nor add 4Pipe4's directory to your $PATH:\n\n```\nPATH=$PATH:/path/to/4Pipe4/\n```\n\nPlease check the README.md section on the helper scripts for a semi-automatic\nway to install all of 4Pipe4's dependencies.\n\n### FILES\n\n4Pipe4 contains the following files (in alphabetical order):\n\n* 4Pipe4.py - Main file: this is the script you want to run;\n* 4Pipe4rc - Configuration file;\n* BAM_to_TCS.py - Module to convert .bam files into the TCS format;\n* LICENSE - License file;\n* Metrics.py - Module for generating dataset metrics;\n* ORFmaker.py - Module for finding ORFs;\n* pipeutils.py - Module with code common to several other modules;\n* README.md - This file;\n* Reporter.py - Module for generating putative SNP reports;\n* SAM_to_BAM.py - Module for converting the .sam files into .bam files;\n* sff_extractor.py - Module for extracting fasta and fasta.qual from sff files. Originally developed by José Blanca, but forked and ported to Python 3 since it was removed from the original website;\n* SNPgrabber.py - Module for organizing SNP information;\n* SSRfinder.py - Module for finding SSRs;\n* Templates/Report.html - Template for the report \"front page\";\n* Testdata/4Pipe4_test.sff - Test data file;\n* Testdata/README.md - Documentation on the test data.\n\nAs time progresses and 4Pipe4 sees new development, this list will be updated.\n\n### REQUIREMENTS\n\n4Pipe4 is written in Python 3; therefore, an installation of Python 3 is required\nto run 4Pipe4. If you are using Linux, you can get Python 3 from your\ndistribution's package manager (sudo apt-get install python3 on Ubuntu) or get\nit from the website (http://python.org/download/). 
Also required are the python3\nheader files (the package name in Ubuntu is python3-dev).\nDue to the latest release of pysam, python \u003e= 3.4 is now required.\nNot strictly required, but highly recommended for best results, are the\nexternal programs that 4Pipe4 uses in its processes. By default, these are:\n\n* seqclean (http://compbio.dfci.harvard.edu/tgi/software/)\n* mira 4.x series (http://mira-assembler.sourceforge.net/)\n* getorf (http://emboss.sourceforge.net/apps/cvs/emboss/apps/getorf.html)\n* blast (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/)\n* blast2go4pipe (http://www.blast2go.com/data/blast2go/b2g4pipe_v2.5.zip) - *Seems to have been discontinued*\n* etandem (http://helixweb.nih.gov/emboss/html/etandem.html)\n* 7zip (http://www.7-zip.org/)\n* pysam (https://github.com/pysam-developers/pysam)\n\nThese programs are mentioned as \"optional\" since you may, for example, have an\nalready assembled dataset and just want to run the SNP detection routines,\nstarting the pipeline from step #4. This would not require sff_extract, seqclean,\nor mira to be installed.\nAll of these programs are required if you wish to run all the steps in 4Pipe4.\nBeware that 4Pipe4 currently relies on the 4.x version of mira, and is not\nbackward compatible with 3.x versions. 
If you wish to use a 3.x version of\nmira for your assembly, you can use an older version of 4Pipe4 (v1.1.3 and\nbelow).\n\nYou should also have a local database of NCBI's\nUnivec (http://www.ncbi.nlm.nih.gov/VecScreen/UniVec.html)\nand nr (ftp://ftp.ncbi.nlm.nih.gov/blast/db/) or equivalent.\nOnce again, if you are using Linux, remember that some of these programs are\nlikely in your distribution's repositories (such as 7zip or blast).\n\n### HELPER SCRIPTS\n\nInside the directory \"helper-scripts\" you will find 4 shell scripts:\n\n* user-installer.sh\n* emboss-user-installer.sh\n* database-downloader.sh\n* rc-generator.sh\n\nIf they are run in the order they are shown here, they will:\n\n1. Download and locally install the programs \"sff_extract\", \"seqclean\", \"mira\",\n\"blast\", \"p7zip\", \"pysam\", \"cython\" and \"setuptools\".\n2. Download, compile and locally install emboss' \"getorf\" and \"etandem\". (This\nscript requires build tools such as \"make\" and \"gcc\", which should be readily\navailable on any \\*nix machine, even one where you don't have root access.)\n3. Download local copies of NCBI's \"Univec\" and \"nr\" databases.\n4. Generate pre-configured entries for all of the above, ready to be copied \u0026\npasted into 4Pipe4rc.\n\nThese scripts should significantly speed up the installation process of these\nexternal 4Pipe4 programs.\n\nBy default, these scripts will install all the software to \"~/Software\", but this\ncan easily be changed in the scripts themselves.\n\n### USAGE\n\nUsing 4Pipe4 should be relatively simple. 
Simply calling \"4Pipe4.py -h\" or\n\"4Pipe4.py --help\" should print the following help message:\n\n--------------------------------------------\n\n```\n\nusage: 4Pipe4 [-h] -i sff_file -o basefile [-c configfile] [-s [RUN_LIST]]\n\noptional arguments:\n  -h, --help     show this help message and exit\n  -i input_file    Provide the full path to your target input file\n  -o basefile    Provide the full path to your results directory, plus the name you want to give your results\n  -c configfile  Provide the full path to your configuration file. If none is provided, the program will look in the current working directory and then in ~/.config/4Pipe4rc (in this order) for one. If none is found the program will stop\n  -s [RUN_LIST]  Specify the numbers corresponding to the pipeline steps that will be run. The string after -s must be given inside quotation marks, and numbers can be joined together or separated by any symbol. The numbers are the pipeline steps that should be run. This is an optional argument and its omission will run all steps by default. The numbers, from 1 to 9, represent the following steps:\n                        1 - SFF extraction\n                        2 - SeqClean\n                        3 - Mira\n                        4 - DiscoveryTCS\n                        5 - SNP grabber\n                        6 - ORF finder\n                        7 - Blast2go\n                        8 - SSR finder\n                        9 - 7zip the report\n\n  -d 454/solexa    Declare the type of data being used. Currently supported are 454 (454) and Illumina (solexa). Default is 454.\n  -p [True/False]  Is the data paired end? True/False, default is False.\n\nThe idea here is that to resume an analysis that was interrupted, for example after the assembly process, you should issue -s '4,5,6,7,8,9' or -s '456789'. Note that some steps depend on the output of previous steps, so using some combinations can cause errors. 
The arguments can be given in any order.\n```\n\n--------------------------------------------\n\nIf you wish to run the entire pipeline on 454 data, just issue something like:\n\n```\npython3 4Pipe4.py -i /path/to/file.sff -o /path/to/results/basefilename\n```\n\nHowever, if you wish to run the pipeline with Illumina data, skip steps 1 and 2,\nand add the \"-d solexa\" switch:\n\n```\npython3 4Pipe4.py -i /path/to/reads.fastq -o /path/to/results/basefilename\\\n-d solexa -s 3,4,5,6,7,8,9\n```\n\nUse the -s option to specify only the steps you wish to run from the analysis,\nand the -c option to point 4Pipe4 to a specific configuration file.\n\nIn the directory \"Testdata\" you will find an example sff file for testing\npurposes, as well as documentation on how to do an example run of 4Pipe4.\n\n### CONFIGURATION\n\nThe configuration file contains information on every option. You should change\nthose options to reflect your own system and SNP detection preferences.\nDo not forget that the helper scripts will generate most of the config file\nfor you if you wish.\n\n### CONTACT\n\nIf you have questions or feedback, you can contact the author by email:\nf.pinamartins@gmail.com\nFor bug reporting, you can use the GitHub issue tracker:\nhttps://github.com/StuntsPT/4Pipe4/issues\nFor other programs, please also be sure to check out our group's website:\nhttp://cobig2.com\n\n### CITING\n\n[Francisco Pina-Martins, Bruno M. Vieira, Sofia G. Seabra, Dora Batista and Octávio S. Paulo 2016. 4Pipe4 – A 454 data analysis pipeline for SNP detection in datasets with no reference sequence or strain information. 
BMC Bioinformatics, 17:46.](http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-0892-1)\n\nOld citation method:\nZenodo  - [![DOI](https://zenodo.org/badge/3305207.svg)](https://zenodo.org/badge/latestdoi/3305207)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstuntspt%2F4pipe4","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstuntspt%2F4pipe4","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstuntspt%2F4pipe4/lists"}