Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/robaina/parallelbam

Tools to parallelize operations on large BAM files
https://github.com/robaina/parallelbam

Last synced: 10 days ago
JSON representation

Tools to parallelize operations on large BAM files

Awesome Lists containing this project

README

        

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Parallelizing operations on SAM/BAM files\n",
"\n",
"SAM/BAM files are typically large, thus, operations on these files are time intensive. This project provides tools to parallelize operations on SAM/BAM files. The workflow follows:\n",
"\n",
"1. Split BAM/SAM file in _n_ chunks\n",
"2. Perform operation in each chunk in a dedicated process and save resulting SAM/BAM chunk \n",
"3. Merge results back into a single SAM/BAM file\n",
"\n",
"Depends on:\n",
"\n",
"1. Samtools\n",
"\n",
"# Installation\n",
"\n",
"1. Git clone project\n",
"2. cd to cloned project directory\n",
"3. ```sudo python setup.py install```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Usage\n",
"\n",
"There is one main function named ```parallelizedBAMoperation```. This function takes as mandatory arguments:\n",
"\n",
"1. path to original bam file (should be ordered)\n",
"2. a callable function to perform the operation on each bam file chunk\n",
"\n",
"The callable function must accept the following two first arguments: (i) path to input bam file and (ii) path to resulting output bam file, in this order.\n",
"\n",
"# Note\n",
"\n",
"Preparing a bam file to run an operation in parallel takes a while, thus is not worth it when the operatin itself takes a short time. For example, preparing a typical bam file for parallelization (in 8 processes) can take almost a minute."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import shutil\n",
"from parallelbam.parallelbam import parallelizeBAMoperation, getNumberOfReads"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"def foo(input_bam, output_bam):\n",
" shutil.copyfile(input_bam, output_bam)\n",
" \n",
" \n",
"parallelizeBAMoperation('parallelbam2/tests/toy_sample.bam',\n",
" foo, output_path=None,\n",
" n_processes=4)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1000"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"getNumberOfReads('parallelbam2/tests/toy_sample.bam')"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1000"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"getNumberOfReads('parallelbam2/tests/processed.bam')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.10.6 ('parallelbam')",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
},
"vscode": {
"interpreter": {
"hash": "d7acfd6951bc0fa0485d92016d3c99713d079b52e001e9aed733ba0bc441773f"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}