https://github.com/rraadd88/beditor

A Computational Workflow for Designing Libraries of sgRNAs for CRISPR-Mediated Base Editing, and much more
Host: GitHub
URL: https://github.com/rraadd88/beditor
Owner: rraadd88
License: gpl-3.0
Created: 2018-07-27T20:50:04.000Z (almost 7 years ago)
Default Branch: master
Last Pushed: 2024-05-26T18:34:04.000Z (about 1 year ago)
Last Synced: 2025-04-15T09:53:05.140Z (about 2 months ago)
Topics: base-editing, crispr, genome-wide-targeted-mutagenesis, guide-rna-library
Language: Python
Homepage:
Size: 1.87 MB
Stars: 19
Watchers: 5
Forks: 4
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project

awesome-CRISPR - beditor - [python] - A Computational Workflow for Designing Libraries of sgRNAs for CRISPR-Mediated Base Editing. (Guide design)
README

``beditor`` ([v2](#v2))
====================

A Computational Workflow for Designing Libraries of sgRNAs for CRISPR-Mediated Base Editing, and much more

[![build](https://img.shields.io/github/actions/workflow/status/rraadd88/beditor/build.yml?style=for-the-badge)](https://github.com/rraadd88/beditor/actions/workflows/build.yml)

[![Issues](https://img.shields.io/github/issues/rraadd88/beditor.svg?style=for-the-badge)](https://github.com/rraadd88/beditor/issues)

[![Downloads](https://img.shields.io/pypi/dm/beditor?style=for-the-badge)](https://pepy.tech/project/beditor)

[![GNU License](https://img.shields.io/github/license/rraadd88/beditor.svg?style=for-the-badge)](https://github.com/rraadd88/beditor/blob/master/LICENSE)

# Usage

## 🖱️ GUI-mode

```

beditor gui

```

[![](./examples/gui.jpg)](#usage)

Note: The GUI is recommended for designing small libraries and for prioritizing the guides.

## ▶️ CLI-mode

```

beditor cli --editor BE1 -m path/to/mutations.tsv -o path/to/output_directory/ --species human --ensembl-release 110

# or, using a configuration file:

beditor cli -c beditor_config.yml

```

  Parameters

    usage: beditor cli  [--editor EDITOR] [-m MUTATIONS_PATH] [-o OUTPUT_DIR_PATH]

                        [--species SPECIES] [--ensembl-release ENSEMBL_RELEASE]

                        [--genome-path GENOME_PATH] [--gtf-path GTF_PATH] [-r RNA_PATH] [-p PRT_PATH]

                        [-c CONFIG_PATH]

                        [--search-window SEARCH_WINDOW] [-n]

                        [-w WD_PATH] [-t THREADS] [-k KERNEL_NAME] [-v VERBOSE] [-i IGV_PATH_PREFIX] [--ext EXT] [-f] [-d] [--skip SKIP]

    

    optional arguments:

      -h, --help            show this help message and exit

      --editor EDITOR       base-editing method, available methods can be listed using command: 'beditor resources'

      -m MUTATIONS_PATH, --mutations-path MUTATIONS_PATH

                            path to the mutation file, the format of which is available at https://github.com/rraadd88/beditor/README.md#Input-format.

      -o OUTPUT_DIR_PATH, --output-dir-path OUTPUT_DIR_PATH

                            path to the directory where the outputs should be saved.

      --species SPECIES     species name.

      --ensembl-release ENSEMBL_RELEASE

                            Ensembl release number.

      --genome-path GENOME_PATH

                            path to the genome file, which is not available on Ensembl.

      --gtf-path GTF_PATH   path to the gene annotations file, which is not available on Ensembl.

      -r RNA_PATH, --rna-path RNA_PATH

                            path to the transcript sequences file, which is not available on Ensembl.

      -p PRT_PATH, --prt-path PRT_PATH

                            path to the protein sequences file, which is not available on Ensembl.

      --search-window SEARCH_WINDOW

                            number of bases to search on either side of a target, if not specified, it is inferred by beditor.

      -n, --not-be          False

                            do not process as a base editor.

      -c CONFIG_PATH, --config-path CONFIG_PATH

                            path to the configuration file.

      -w WD_PATH, --wd-path WD_PATH

                            path to the working directory.

      -t THREADS, --threads THREADS

                            1

                            number of threads for parallel processing.

      -k KERNEL_NAME, --kernel-name KERNEL_NAME

                            'beditor'

                            name of the jupyter kernel.

      -v VERBOSE, --verbose VERBOSE

                            'WARNING'

                            verbose, logging levels: DEBUG > INFO > WARNING > ERROR (default) > CRITICAL.

      -i IGV_PATH_PREFIX, --igv-path-prefix IGV_PATH_PREFIX

                            prefix to be added to the IGV URL.

      --ext EXT             file extensions of the output tables.

      -f, --force           False

      -d, --dbug            False

      --skip SKIP           skip sections of the workflow

        

    Examples:

        

        

    Notes:

        Required parameters for assigning a species:

            species

            ensembl_release

            or

            genome_path

            gtf_path

            rna_path

            prt_path
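
The note above says a species is assigned either by `species` plus `ensembl_release`, or by the four custom file paths. A minimal sketch of that rule follows, written as a hypothetical helper (`build_beditor_command` is not part of beditor) that assembles the argument list from the flags documented in the help text:

```python
from typing import List, Optional


def build_beditor_command(
    editor: str,
    mutations_path: str,
    output_dir_path: str,
    species: Optional[str] = None,
    ensembl_release: Optional[int] = None,
    genome_path: Optional[str] = None,
    gtf_path: Optional[str] = None,
    rna_path: Optional[str] = None,
    prt_path: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for `beditor cli` (hypothetical helper)."""
    cmd = ["beditor", "cli", "--editor", editor,
           "-m", mutations_path, "-o", output_dir_path]
    custom_paths = [genome_path, gtf_path, rna_path, prt_path]
    if species is not None and ensembl_release is not None:
        # Species indexed on Ensembl: name plus release number.
        cmd += ["--species", species,
                "--ensembl-release", str(ensembl_release)]
    elif all(path is not None for path in custom_paths):
        # Species not on Ensembl: all four files must be supplied.
        cmd += ["--genome-path", genome_path, "--gtf-path", gtf_path,
                "--rna-path", rna_path, "--prt-path", prt_path]
    else:
        raise ValueError(
            "give species and ensembl_release, or all of "
            "genome_path, gtf_path, rna_path and prt_path"
        )
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; building a list instead of a shell string avoids quoting problems in paths.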

# Installation

    

## Virtual environment and naming the kernel (recommended)

```

conda create -n beditor python=3.9                 # options: conda/mamba, python=3.9/3.8

python -m ipykernel install --user --name beditor

```

## Installation of the package

```

pip install beditor[all]                           

```

## Optional dependencies, as required:

```

pip install beditor                                # only cli

pip install beditor[gui]                           # plus gui

```

For fast processing of large genomes (highly recommended for human genome):

```

conda install bioconda::ucsc-fatotwobit bioconda::ucsc-twobittofa bioconda::ucsc-twobitinfo # options: conda/mamba

```

Otherwise, for moderately fast processing:

```

conda install bioconda::bedtools                   # options: conda/mamba

```

# Input format 

Note: The coordinates are 1-based (i.e. `X:1-1` instead of `X:0-1`) and IDs correspond to the chosen genome assembly (e.g. from Ensembl).

Point mutations

```

chrom start  end strand mutation

    5  1123 1123 +      C

```

Position scanning

```

chrom start  end strand

    5  1123 1123 +     

```

Region scanning

```

chrom start  end strand

    5  1123 2123 +     

```

Protein point mutations

```

protein id aa pos mutation

  ENSP1123     43        S    

```

Protein position scanning

```

protein id aa pos

  ENSP1123     43    

```

Protein region scanning

```

protein id aa start aa end

  ENSP1123       43    143

```

Note: Ensembl protein IDs are used.  
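
As an illustration of the point-mutation layout above, the following sketch writes such a table with Python's standard `csv` module. It assumes the file is tab-separated (suggested by the `mutations.tsv` name in the usage example) and that the column names are exactly those shown; `write_point_mutations` is a hypothetical helper, not a beditor function.

```python
import csv
import io

COLUMNS = ["chrom", "start", "end", "strand", "mutation"]


def write_point_mutations(rows, handle) -> None:
    """Write point mutations as a tab-separated table (hypothetical helper)."""
    writer = csv.DictWriter(
        handle, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        # Input coordinates are 1-based, so positions below 1 are invalid.
        if int(row["start"]) < 1 or int(row["end"]) < int(row["start"]):
            raise ValueError(f"invalid coordinates: {row}")
        writer.writerow(row)


# Example: the single point mutation shown in the table above.
buffer = io.StringIO()
write_point_mutations(
    [{"chrom": "5", "start": 1123, "end": 1123, "strand": "+", "mutation": "C"}],
    buffer,
)
```

To write to disk, replace the `io.StringIO` buffer with `open("mutations.tsv", "w", newline="")`.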

# Output format

Note: The output uses 0-based coordinates.

```

guide sequence          guide locus          offtargets score {columns in the input}

AGCGTTTGGCAAATCAAACAAAA 4:1003215-1003238(+)          0     1 ..

```
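
Because the input is 1-based and the output is 0-based, converting between the two is a common step. The sketch below parses a guide locus string of the form shown above and shifts it to 1-based coordinates. It assumes the output interval is half-open: the example locus spans 23 positions, the same as the 23-nt guide sequence, which is consistent with that reading but is an inference rather than a documented guarantee. `Locus`, `parse_locus` and `to_one_based` are hypothetical names.

```python
import re
from typing import NamedTuple


class Locus(NamedTuple):
    chrom: str
    start: int
    end: int
    strand: str


_LOCUS_RE = re.compile(
    r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)\((?P<strand>[+-])\)$"
)


def parse_locus(text: str) -> Locus:
    """Parse a string such as '4:1003215-1003238(+)'."""
    match = _LOCUS_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a locus string: {text!r}")
    return Locus(match["chrom"], int(match["start"]),
                 int(match["end"]), match["strand"])


def to_one_based(locus: Locus) -> Locus:
    """Convert 0-based half-open to 1-based inclusive (start shifts by one)."""
    return locus._replace(start=locus.start + 1)
```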

# Supported base editing methods

| method      | nucleotide | nucleotide mutation | window start | window end | guide length | PAM    | PAM position |
|-------------|------------|---------------------|--------------|------------|--------------|--------|--------------|
| A3A-BE3     | C          | T                   | 4            | 8          | 20           | NGG    | down         |
| ABE7.10     | A          | G                   | 4            | 7          | 20           | NGG    | down         |
| ABE7.10*    | A          | G                   | 4            | 8          | 20           | NGG    | down         |
| ABE7.9      | A          | G                   | 5            | 8          | 20           | NGG    | down         |
| ABESa       | A          | G                   | 6            | 12         | 21           | NNGRRT | down         |
| BE-PLUS     | C          | T                   | 4            | 14         | 20           | NGG    | down         |
| BE1         | C          | T                   | 4            | 8          | 20           | NGG    | down         |
| BE2         | C          | T                   | 4            | 8          | 20           | NGG    | down         |
| BE3         | C          | T                   | 4            | 8          | 20           | NGG    | down         |
| BE4-Gam     | C          | T                   | 4            | 8          | 20           | NGG    | down         |
| BE4/BE4max  | C          | T                   | 4            | 8          | 20           | NGG    | down         |
| Cas12a-BE   | C          | T                   | 10           | 12         | 23           | TTTV   | up           |
| eA3A-BE3    | C          | T                   | 4            | 8          | 20           | NGG    | down         |
| EE-BE3      | C          | T                   | 5            | 6          | 20           | NGG    | down         |
| HF-BE3      | C          | T                   | 4            | 8          | 20           | NGG    | down         |
| Sa(KKH)-ABE | A          | G                   | 6            | 12         | 21           | NNNRRT | down         |
| SA(KKH)-BE3 | C          | T                   | 3            | 12         | 21           | NNNRRT | down         |
| SaBE3       | C          | T                   | 3            | 12         | 21           | NNGRRT | down         |
| SaBE4       | C          | T                   | 3            | 12         | 21           | NNGRRT | down         |
| SaBE4-Gam   | C          | T                   | 3            | 12         | 21           | NNGRRT | down         |
| Target-AID  | C          | T                   | 2            | 4          | 20           | NGG    | down         |
| Target-AID  | C          | T                   | 2            | 4          | 20           | NG     | down         |
| VQR-ABE     | A          | G                   | 4            | 6          | 20           | NGA    | down         |
| VQR-BE3     | C          | T                   | 4            | 11         | 20           | NGAN   | down         |
| VRER-ABE    | A          | G                   | 4            | 6          | 20           | NGCG   | down         |
| VRER-BE3    | C          | T                   | 3            | 10         | 20           | NGCG   | down         |
| xBE3        | C          | T                   | 4            | 8          | 20           | NG     | down         |
| YE1-BE3     | C          | T                   | 5            | 7          | 20           | NGG    | down         |
| YE2-BE3     | C          | T                   | 5            | 6          | 20           | NGG    | down         |
| YEE-BE3     | C          | T                   | 5            | 6          | 20           | NGG    | down         |

Favorite base editor not listed?  

Please send the required info using a PR, or an issue.
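
To make the table columns concrete, here is a minimal sketch of how a PAM pattern, a guide length and an activity window can be combined to select candidate guides. It is not beditor's implementation. It only scans the forward strand, only handles a PAM located downstream of the guide, and assumes that window positions are counted from 1 at the PAM-distal end of the guide. `pam_to_regex` and `find_guides` are hypothetical names.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def pam_to_regex(pam: str) -> str:
    """Translate a PAM such as 'NGG' into a regular expression."""
    return "".join(IUPAC[base] for base in pam.upper())


def find_guides(seq, pam="NGG", guide_length=20, window=(4, 8), target_base="C"):
    """Return (guide, PAM, editable positions) for forward-strand sites."""
    seq = seq.upper()
    # A lookahead reports overlapping PAM sites as well.
    pattern = re.compile(f"(?=({pam_to_regex(pam)}))")
    hits = []
    for match in pattern.finditer(seq):
        pam_start = match.start()
        guide_start = pam_start - guide_length
        if guide_start < 0:
            continue  # not enough sequence upstream of the PAM
        guide = seq[guide_start:pam_start]
        # Positions are 1-based from the PAM-distal end (an assumption).
        editable = [
            pos for pos in range(window[0], window[1] + 1)
            if guide[pos - 1] == target_base
        ]
        if editable:
            hits.append((guide, match.group(1), editable))
    return hits
```

The defaults correspond to the BE3 row of the table (C target, window 4 to 8, 20-nt guide, NGG PAM downstream).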

# Change log

## v2

**New features**:  

1. Design libraries for base or amino acid mutational scanning, at defined positions and regions. 

2. The `gui` contains library filtering and prioritization options.

3. Non-base-editing applications, e.g. CRISPR tiling, using the `not_be` option.  

**Key updates**:  

1. Quicker installation due to reduced number of dependencies (`bwa` comes in the package, and `samtools` not needed).

2. Faster run-time, compared to v1, because of the improvements in the dependencies e.g. `pandas` etc.  

3. Faster run-time on large genomes e.g. human genome, because of the use of 2bit tools.  

4. Direct command-line options for non-model species, e.g. those that are not indexed on Ensembl.  

5. Configuration made optional.

**Technical updates**:

1. The `gui` is powered by `mercury`, thus overcoming the limitations of v1.

2. Use of one base editor (`method`) per run, instead of multiple.  

3. Due to overall faster run-times, parallelization within a run is disabled. However, multiple runs can be parallelized externally, e.g. using Python's built-in `multiprocessing`.

4. Only the sgRNAs whose target lies within the optimal activity window are reported. The penalty for a target lying outside the activity window is therefore no longer applied, but the related options are retained for backward compatibility.  

5. Many refactored functions can now be imported and executed independently for "much more" applications.  

6. Reports are generated for each run in the form of a Jupyter notebook.

7. Automated testing on GitHub for continuous integration.  

8. The `cli` is compatible with Python 3.8 and 3.9 (and possibly higher, untested versions); however, the `gui` is not supported on Python 3.7 due to a lack of dependencies.
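
The external parallelization of runs mentioned in the list can be sketched as follows. This is one possible arrangement, not a recommended or tested recipe: it uses `multiprocessing.pool.ThreadPool`, which is sufficient here because each run is launched as a separate operating-system process. The demo commands start the current Python interpreter as a stand-in so that the sketch runs without beditor installed; the configuration file names in the comment are hypothetical.

```python
import subprocess
import sys
from multiprocessing.pool import ThreadPool
from typing import List, Sequence


def run_command(cmd: Sequence[str]) -> int:
    """Run one command to completion and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def run_many(commands: Sequence[Sequence[str]], workers: int = 2) -> List[int]:
    """Run several commands concurrently; exit codes keep the input order."""
    with ThreadPool(processes=workers) as pool:
        return pool.map(run_command, commands)


# Stand-in commands that simply start and exit the Python interpreter.
# For real runs, use one command per editor or input, for example:
#   ["beditor", "cli", "-c", "run_BE3.yml"]   (hypothetical file name)
demo_commands = [[sys.executable, "-c", "pass"] for _ in range(2)]
```

Calling `run_many(demo_commands)` returns one exit code per command; a non-zero code indicates a failed run.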

# Future directions, for which contributions are welcome:  

- [ ] Adding option to provide 0-based co-ordinates in the input.

# Similar projects:

- http://www.rgenome.net/be-designer/

- http://yang-laboratory.com/BEable-GPS

- https://github.com/maxwshen/be_predict_bystander

- https://github.com/maxwshen/be_predict_efficiency

- https://fgcz-shiny.uzh.ch/PnBDesigner/

# How to cite?  

## v2

1. Using BibTeX:   

```

@software{Dandage_beditor,

  title   = {beditor: A Computational Workflow for Designing Libraries of sgRNAs for CRISPR-Mediated Base Editing},

  author  = {Dandage, Rohan},

  year    = {2024},

  url     = {https://doi.org/10.5281/zenodo.10648264},

  version = {v2.0.1},

  note    = {The URL is a DOI link to the permanent archive of the software.},

}

```

2. DOI link: [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.10648264.svg)](https://doi.org/10.5281/zenodo.10648264), or  

3. Using citation information from [CITATION.CFF file](https://github.com/rraadd88/beditor/blob/main/CITATION.cff).

## v1

1. Using BibTeX:

```

@software{Dandage_beditorv1,

  title   = {beditor: A Computational Workflow for Designing Libraries of sgRNAs for CRISPR-Mediated Base Editing},

  author  = {Dandage, Rohan},

  year    = {2019},

  url     = {https://doi.org/10.1534/genetics.119.302089},

  version = {v1},

}

```

  


# API



## module `beditor.lib.get_mutations`

Mutation co-ordinates using pyensembl 

---



### function `get_protein_cds_coords`

```python

get_protein_cds_coords(annots, protein_id: str) → DataFrame

```

Get protein CDS coordinates 

**Args:**

 

 - `annots`:  pyensembl annotations 

 - `protein_id` (str):  protein ID 

**Returns:**

 

 - `pd.DataFrame`:  output table 

---



### function `get_protein_mutation_coords`

```python

get_protein_mutation_coords(data: DataFrame, aapos: int, test=False) → tuple

```

Get protein mutation coordinates 

**Args:**

 

 - `data` (pd.DataFrame):  input table 

 - `aapos` (int):  amino acid position 

 - `test` (bool, optional):  test-mode. Defaults to False. 

**Raises:**

 

 - `ValueError`:  invalid positions 

**Returns:**

 

 - `tuple`:  aapos,start,end,seq 
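
The arithmetic behind mapping an amino acid position to its codon can be sketched as below. This is an illustration on a plain CDS string, not the function above, which takes a table as input; `codon_coords` is a hypothetical name and the returned start and end are 0-based, half-open offsets within the CDS.

```python
def codon_coords(cds: str, aapos: int) -> tuple:
    """Return (aapos, start, end, codon) for a 1-based amino acid position."""
    if aapos < 1 or aapos * 3 > len(cds):
        raise ValueError(f"invalid amino acid position: {aapos}")
    start = (aapos - 1) * 3  # 0-based offset of the first codon base
    end = start + 3          # exclusive end of the codon
    return aapos, start, end, cds[start:end]
```

For example, position 2 of the CDS `ATGGCCTAA` maps to offsets 3 to 6, the codon `GCC`.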

---



### function `map_coords`

```python

map_coords(df_: DataFrame, df1_: DataFrame, verbose: bool = False) → DataFrame

```

Map coordinates 

**Args:**

 

 - `df_` (pd.DataFrame):  input table 

**Returns:**

 

 - `pd.DataFrame`:  output table 

---



### function `get_mutation_coords_protein`

```python

get_mutation_coords_protein(

    df0: DataFrame,

    annots,

    search_window: int,

    outd: str = None,

    force: bool = False,

    verbose: bool = False

) → DataFrame

```

Get mutation coordinates for protein 

**Args:**

 

 - `df0` (pd.DataFrame):  input table 

 - `annots` (_type_):  pyensembl annotations 

 - `search_window` (int):  search window length on either side of the target 

 - `outd` (str, optional):  output directory path. Defaults to None. 

 - `force` (bool, optional):  force. Defaults to False. 

 - `verbose` (bool, optional):  verbose. Defaults to False. 

**Returns:**

 

 - `pd.DataFrame`:  output table 

---



### function `get_mutation_coords`

```python

get_mutation_coords(

    df0: DataFrame,

    annots,

    search_window: int,

    verbose: bool = False,

    **kws_protein

) → DataFrame

```

Get mutation coordinates 

**Args:**

 

 - `df0` (pd.DataFrame):  input table 

 - `annots` (_type_):  pyensembl annotation 

 - `search_window` (int):  search window length on either side of the target 

 - `verbose` (bool, optional):  verbose. Defaults to False. 

**Returns:**

 

 - `pd.DataFrame`:  output table 



## module `beditor.lib.get_scores`

Scores 

---



### function `get_ppamdist`

```python

get_ppamdist(

    guide_length: int,

    pam_len: int,

    pam_pos: str,

    ppamdist_min: int

) → DataFrame

```

Get penalties set based on distances of the mismatch/es from PAM 

**Args:**

 - `guide_length`:  length of the guide sequence 

 - `pam_len`:  length of the PAM sequence 

 - `pam_pos`:  PAM location, 3' or 5' 

 - `ppamdist_min`:  minimum penalty 

 - `pmutatpam`:  penalty for a mismatch at the PAM 

TODOs:  Use different scoring function for different methods. 

---



### function `get_beditorscore_per_alignment`

```python

get_beditorscore_per_alignment(

    NM: int,

    alignment: str,

    pam_len: int,

    pam_pos: str,

    pentalty_genic: float = 0.5,

    pentalty_intergenic: float = 0.9,

    pentalty_dist_from_pam: float = 0.1,

    verbose: bool = False

) → float

```

Calculates beditor score per alignment between guide and genomic DNA. 

**Args:**

 - `NM`:  Hamming distance 

 - `mismatches_max`:  maximum number of mismatches allowed in the alignment 

 - `alignment`:  alignment string in which `|` means a match, `.` means a mismatch and a space means a gap, e.g. `|||||.||||||||||.||||.|` 

 - `pentalty_genic`:  penalty for a genic alignment 

 - `pentalty_intergenic`:  penalty for an intergenic alignment 

 - `pentalty_dist_from_pam`:  maximum penalty for a mismatch at the PAM 

**Returns:**

 - beditor score per alignment. 

---



### function `get_beditorscore_per_guide`

```python

get_beditorscore_per_guide(

    guide_seq: str,

    strategy: str,

    align_seqs_scores: DataFrame,

    dBEs: DataFrame,

    penalty_activity_window: float = 0.5,

    test: bool = False

) → float

```

Calculates beditor score per guide. 

**Args:**

 - `guide_seq`:  guide sequence (23 nt) 

 - `strategy`:  strategy string, e.g. `ABE;+;@-14;ACT:GCT;T:A;` 

 - `align_seqs_scores`:  beditor scores per alignment, for all the alignments between the guide and the genomic DNA 

 - `penalty_activity_window`:  penalty applied if the editable base is not in the activity window. Defaults to 0.5. 

**Returns:**

 - beditor score per guide. 

---



### function `revcom`

```python

revcom(s)

```
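
`revcom` is undocumented; judging by its name it returns the reverse complement of a sequence. A minimal sketch of that operation, assuming a DNA alphabet of A, C, G, T and N (`reverse_complement` is a hypothetical name, not the library function):

```python
# Translation table mapping each base to its complement, in both cases.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]
```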

---



### function `calc_cfd`

```python

calc_cfd(wt, sg, pam)

```

---



### function `get_cfdscore`

```python

get_cfdscore(wt, off)

```



## module `beditor.lib.get_specificity`

Specificities 

---



### function `run_alignment`

```python

run_alignment(

    src_path: str,

    genomep: str,

    guidesfap: str,

    guidessamp: str,

    guidel: int,

    mismatches_max: int = 2,

    threads: int = 1,

    force: bool = False,

    verbose: bool = False

) → str

```

Run alignment 

**Args:**

 

 - `src_path` (str):  source path 

 - `genomep` (str):  genome path 

 - `guidesfap` (str):  guide fasta path 

 - `guidessamp` (str):  guide sam path 

 - `threads` (int, optional):  threads. Defaults to 1. 

 - `force` (bool, optional):  force. Defaults to False. 

 - `verbose` (bool, optional):  verbose. Defaults to False. 

**Returns:**

 

 - `str`:  alignment file. 

---



### function `read_sam`

```python

read_sam(align_path: str) → DataFrame

```

Read alignment file 

**Args:**

 

 - `align_path` (str):  path to the alignment file 

**Returns:**

 

 - `pd.DataFrame`:  output table 

**Notes:**

> | Tag | Meaning |
> |-----|---------|
> | NM  | Edit distance |
> | MD  | Mismatching positions/bases |
> | AS  | Alignment score |
> | BC  | Barcode sequence |
> | X0  | Number of best hits |
> | X1  | Number of suboptimal hits found by BWA |
> | XN  | Number of ambiguous bases in the reference |
> | XM  | Number of mismatches in the alignment |
> | XO  | Number of gap opens |
> | XG  | Number of gap extensions |
> | XT  | Type: Unique/Repeat/N/Mate-sw |
> | XA  | Alternative hits; format: `(chr,pos,CIGAR,NM;)*` |
> | XS  | Suboptimal alignment score |
> | XF  | Support from forward/reverse alignment |
> | XE  | Number of supporting seeds |

>Reference: https://bio-bwa.sourceforge.net/bwa.shtml 
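
The tags listed above appear as optional `TAG:TYPE:VALUE` fields after the 11 mandatory columns of a SAM record. The following sketch extracts them from a single line using only the standard library. It is not how `read_sam` is implemented, and `parse_sam_tags` and the example record are made up for illustration.

```python
def parse_sam_tags(line: str) -> dict:
    """Return the optional tags of one SAM alignment line as a dictionary."""
    fields = line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:  # optional fields follow 11 mandatory columns
        tag, type_code, value = field.split(":", 2)
        if type_code == "i":
            tags[tag] = int(value)
        elif type_code == "f":
            tags[tag] = float(value)
        else:
            tags[tag] = value
    return tags


# A made-up record: QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT,
# TLEN, SEQ, QUAL, followed by two optional tags.
example_line = "\t".join([
    "guide1", "0", "4", "1003216", "37", "23M", "*", "0", "0",
    "AGCGTTTGGCAAATCAAACAAAA", "*", "NM:i:0", "XA:Z:4,+908051,23M,0;",
])
```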

---



### function `parse_XA`

```python

parse_XA(XA: str) → DataFrame

```

Parse XA tags 

**Args:**

 

 - `XA` (str):  XA tag 

**Notes:**

> format: (chr,pos,CIGAR,NM;) 

>

**Example:**

 XA='4,+908051,23M,0;4,+302823,23M,0;4,-183556,23M,0;4,+1274932,23M,0;4,+207765,23M,0;4,+456906,23M,0;4,-1260135,23M,0;4,+454215,23M,0;4,-1177442,23M,0;4,+955254,23M,1;4,+1167921,23M,1;4,-613257,23M,1;4,+857893,23M,1;4,-932678,23M,2;4,-53825,23M,2;4,+306783,23M,2;' 
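
A minimal sketch of parsing such an XA string follows. It returns a list of dictionaries rather than the DataFrame returned by `parse_XA`, and the key names are chosen for illustration; `parse_xa` is a hypothetical name.

```python
def parse_xa(xa: str) -> list:
    """Split an XA tag value into one record per alternative hit."""
    records = []
    for entry in xa.strip().split(";"):
        if not entry:
            continue  # the string ends with ';', leaving an empty last item
        chrom, pos, cigar, nm = entry.split(",")
        records.append({
            "chrom": chrom,
            "strand": pos[0],    # the sign in front of the position
            "pos": int(pos[1:]),
            "cigar": cigar,
            "NM": int(nm),
        })
    return records
```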

---



### function `get_extra_alignments`

```python

get_extra_alignments(

    df1: DataFrame,

    genome: str,

    bed_path: str,

    alignments_max: int = 10,

    threads: int = 1

) → DataFrame

```

Get extra alignments 

**Args:**

 

 - `df1` (pd.DataFrame):  input table 

 - `alignments_max` (int, optional):  alignments max. Defaults to 10. 

 - `threads` (int, optional):  threads. Defaults to 1. 

**Returns:**

 

 - `pd.DataFrame`:  output table 

TODOs: 1. apply parallel processing to get_seq 

---



### function `to_pam_coord`

```python

to_pam_coord(

    pam_pos: str,

    pam_len: int,

    align_start: int,

    align_end: int,

    strand: str

) → tuple

```

Get PAM coords 

**Args:**

 

 - `pam_pos` (str):  PAM position 

 - `pam_len` (int):  PAM length 

 - `align_start` (int):  alignment start 

 - `align_end` (int):  alignment end 

 - `strand` (str):  strand 

**Returns:**

 

 - `tuple`:  start,end 

---



### function `get_alignments`

```python

get_alignments(

    align_path: str,

    genome: str,

    alignments_max: int,

    pam_pos: str,

    pam_len: int,

    guide_len: int,

    pam_pattern: str,

    pam_bed_path: str,

    extra_bed_path: str,

    **kws_xa

) → DataFrame

```

Get alignments 

**Args:**

 

 - `align_path` (str):  alignment path 

 - `genome` (str):  genome path 

 - `pam_pos` (str):  PAM position 

 - `pam_len` (int):  PAM length 

 - `guide_len` (int):  sgRNA length 

 - `pam_pattern` (str):  PAM pattern 

 - `pam_bed_path` (str):  PAM bed path 

**Returns:**

 

 - `pd.DataFrame`:  output table 

---



### function `get_penalties`

```python

get_penalties(

    aligns: DataFrame,

    guides: DataFrame,

    annots: DataFrame

) → DataFrame

```

Get penalties 

**Args:**

 

 - `aligns` (pd.DataFrame):  alignments 

 - `guides` (pd.DataFrame):  guides 

 - `annots` (pd.DataFrame):  annotations 

**Returns:**

 

 - `pd.DataFrame`:  output table 

---



### function `score_alignments`

```python

score_alignments(

    df4: DataFrame,

    pam_len: int,

    pam_pos: str,

    pentalty_genic: float = 0.5,

    pentalty_intergenic: float = 0.9,

    pentalty_dist_from_pam: float = 0.1,

    verbose: bool = False

) → tuple

```

Score alignments 

**Args:**

 

 - `df4` (pd.DataFrame):  input table 

 - `pam_pos` (str):  PAM position 

 - `pentalty_genic` (float, optional):  penalty for offtarget in genic locus. Defaults to 0.5. 

 - `pentalty_intergenic` (float, optional):  penalty for offtarget in intergenic locus. Defaults to 0.9. 

 - `pentalty_dist_from_pam` (float, optional):  penalty for offtarget wrt distance from PAM. Defaults to 0.1. 

 - `verbose` (bool, optional):  verbose. Defaults to False. 

**Returns:**

 

 - `tuple`:  tables 

**Note:**

> 1. Low value corresponds to high penalty and vice versa, because values are multiplied.
> 2. High penalty means consequential offtarget alignment and vice versa. 
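
The multiplicative convention in the note can be illustrated with a short sketch. It shows only the principle that factors closer to 0 reduce the product more; it does not reproduce the scoring performed by `score_alignments`. `combine_penalties` is a hypothetical name, and the values 0.5 and 0.9 are the defaults from the signature above.

```python
from math import prod  # available from Python 3.8


def combine_penalties(penalties) -> float:
    """Multiply penalty factors; a smaller product means a stronger penalty."""
    return prod(penalties)


# With the default factors, a genic off-target alignment (0.5) lowers the
# product more than an intergenic one (0.9).
genic = combine_penalties([0.5])
intergenic = combine_penalties([0.9])
```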

---



### function `score_guides`

```python

score_guides(

    guides: DataFrame,

    scores: DataFrame,

    not_be: bool = False

) → DataFrame

```

Score guides 

**Args:**

 

 - `guides` (pd.DataFrame):  guides 

 - `scores` (pd.DataFrame):  scores 

 - `not_be` (bool, optional):  not a base editor. Defaults to False. 

**Returns:**

 

 - `pd.DataFrame`:  output table 

Changes: penalty_activity_window disabled as only the sgRNAs with target in the window are reported. 



## module `beditor.lib.io`

Input/Output 

---



### function `download_annots`

```python

download_annots(species_name: str, release: int) → bool

```

Download annotations using pyensembl 

**Args:**

 

 - `species_name` (str):  species name 

 - `release` (int):  release number 

**Returns:**

 

 - `bool`:  whether annotation is downloaded or not 

---



### function `cache_subdirectory`

```python

cache_subdirectory(

    reference_name: str = None,

    annotation_name: str = None,

    annotation_version: int = None,

    CACHE_BASE_SUBDIR: str = 'beditor'

) → str

```

Which cache subdirectory to use for a given annotation database over a particular reference. All arguments can be omitted to just get the base subdirectory for all pyensembl cached datasets. 

**Args:**

 

 - `reference_name` (str, optional):  reference name. Defaults to None. 

 - `annotation_name` (str, optional):  annotation name. Defaults to None. 

 - `annotation_version` (int, optional):  annotation version. Defaults to None. 

 - `CACHE_BASE_SUBDIR` (str, optional):  cache path. Defaults to 'beditor'. 

**Returns:**

 

 - `str`:  output path 

---



### function `cached_path`

```python

cached_path(path_or_url: str, cache_directory_path: str)

```

When downloading remote files, the default behavior is to name local files the same as their remote counterparts. 

---



### function `to_downloaded_cached_path`

```python

to_downloaded_cached_path(

    url: str,

    annots=None,

    reference_name: str = None,

    annotation_name: str = 'ensembl',

    ensembl_release: str = None,

    CACHE_BASE_SUBDIR: str = 'pyensembl'

) → str

```

To downloaded cached path 

**Args:**

 

 - `url` (str):  URL 

 - `annots` (optional):  pyensembl annotation. Defaults to None. 

 - `reference_name` (str, optional):  reference name. Defaults to None. 

 - `annotation_name` (str, optional):  annotation name. Defaults to 'ensembl'. 

 - `ensembl_release` (str, optional):  ensembl release. Defaults to None. 

 - `CACHE_BASE_SUBDIR` (str, optional):  cache path. Defaults to 'pyensembl'. 

**Returns:**

 

 - `str`:  output path 

---



### function `download_genome`

```python

download_genome(

    species: str,

    ensembl_release: int,

    force: bool = False,

    verbose: bool = False

) → str

```

Download genome 

**Args:**

 

 - `species` (str):  species name 

 - `ensembl_release` (int):  release 

 - `force` (bool, optional):  force. Defaults to False. 

 - `verbose` (bool, optional):  verbose. Defaults to False. 

**Returns:**

 

 - `str`:  output path 

---



### function `read_genome`

```python

read_genome(genome_path: str, fast=True)

```

Read genome 

**Args:**

 

 - `genome_path` (str):  genome path 

 - `fast` (bool, optional):  fast mode. Defaults to True. 

---



### function `to_fasta`

```python

to_fasta(

    sequences: dict,

    output_path: str,

    molecule_type: str,

    force: bool = True,

    **kws_SeqRecord

) → str

```

Save fasta file. 

**Args:**

 

 - `sequences` (dict):  dictionary mapping the sequence name to the sequence. 

 - `output_path` (str):  path of the fasta file. 

 - `force` (bool):  overwrite if file exists. 

**Returns:**

 

 - `output_path` (str):  path of the fasta file 
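
For readers unfamiliar with the format, here is a minimal standard-library sketch of writing a name-to-sequence dictionary as FASTA. The `kws_SeqRecord` parameter name suggests that `to_fasta` relies on Biopython, but that is an inference; the sketch below does not use it, and `write_fasta` is a hypothetical name.

```python
import io


def write_fasta(sequences: dict, handle, width: int = 60) -> None:
    """Write each sequence as a '>name' header followed by wrapped lines."""
    for name, seq in sequences.items():
        handle.write(f">{name}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


# Example: one guide written to an in-memory buffer.
fasta_buffer = io.StringIO()
write_fasta({"guide1": "AGCGTTTGGCAAATCAAACAAAA"}, fasta_buffer)
```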

---



### function `to_2bit`

```python

to_2bit(

    genome_path: str,

    src_path: str = None,

    force: bool = False,

    verbose: bool = False

) → str

```

To 2bit 

**Args:**

 

 - `genome_path` (str):  genome path 

 - `src_path` (str, optional):  source path. Defaults to None. 

 - `verbose` (bool, optional):  verbose. Defaults to False. 

**Returns:**

 

 - `str`:  output path 

---



### function `to_fasta_index`

```python

to_fasta_index(

    genome_path: str,

    bgzip: bool = False,

    bgzip_path: str = None,

    threads: int = 1,

    verbose: bool = True,

    force: bool = False,

    indexed: bool = False

) → str

```

To fasta index 

**Args:**

 

 - `genome_path` (str):  genome path 

 - `bgzip_path` (str, optional):  bgzip path. Defaults to None. 

 - `threads` (int, optional):  threads. Defaults to 1. 

 - `verbose` (bool, optional):  verbose. Defaults to True. 

 - `force` (bool, optional):  force. Defaults to False. 

 - `indexed` (bool, optional):  indexed or not. Defaults to False. 

**Returns:**

 

 - `str`:  output path 

---



### function `to_bed`

```python

to_bed(

    df: DataFrame,

    outp: str,

    cols: list = ['chrom', 'start', 'end', 'locus', 'score', 'strand']

) → str

```

To bed path 

**Args:**

 

 - `df` (pd.DataFrame):  input table 

 - `outp` (str):  output path 

 - `cols` (list, optional):  columns. Defaults to ['chrom','start','end','locus','score','strand']. 

**Returns:**

 

 - `str`:  output path 

---



### function `read_bed`

```python

read_bed(

    p: str,

    cols: list = ['chrom', 'start', 'end', 'locus', 'score', 'strand']

) → DataFrame

```

Read bed file 

**Args:**

 

 - `p` (str):  path 

 - `cols` (list, optional):  columns. Defaults to ['chrom','start','end','locus','score','strand']. 

**Returns:**

 

 - `pd.DataFrame`:  output table 

---



### function `to_viz_inputs`

```python

to_viz_inputs(

    gtf_path: str,

    genome_path: str,

    output_dir_path: str,

    output_ext: str = 'tsv',

    threads: int = 1,

    force: bool = False

) → dict

```

To viz inputs for the IGV 

**Args:**

 

 - `gtf_path` (str):  GTF path 

 - `genome_path` (str):  genome path 

 - `output_dir_path` (str):  output directory path 

 - `output_ext` (str, optional):  output extension. Defaults to 'tsv'. 

 - `threads` (int, optional):  threads. Defaults to 1. 

 - `force` (bool, optional):  force. Defaults to False. 

**Returns:**

 

 - `dict`:  configuration 

---



### function `to_igv_path_prefix`

```python

to_igv_path_prefix() → str

```

Get IGV path prefix 

**Returns:**

 

 - `str`:  URL 

---



### function `to_session_path`

```python

to_session_path(p: str, path_prefix: str = None, outp: str = None) → str

```

To session path 

**Args:**

 

 - `p` (str):  session configuration path 

 - `path_prefix` (str, optional):  path prefix. Defaults to None. 

 - `outp` (str, optional):  output path. Defaults to None. 

**Returns:**

 

 - `str`:  output path 

---



### function `read_cytobands`

```python

read_cytobands(

    cytobands_path: str,

    col_chrom: str = 'chromosome',

    remove_prefix: str = 'chr'

) → DataFrame

```

Read cytobands 

**Args:**

 

 - `cytobands_path` (str):  path 

 - `col_chrom` (str, optional):  column with contig. Defaults to 'chromosome'. 

**Returns:**

 

 - `pd.DataFrame`:  output table 

---



### function `to_output`

```python

to_output(inputs: DataFrame, guides: DataFrame, scores: DataFrame) → DataFrame

```

To output table 

**Args:**

 

 - `inputs` (pd.DataFrame):  inputs 

 - `guides` (pd.DataFrame):  guides 

 - `scores` (pd.DataFrame):  scores 

**Returns:**

 

 - `pd.DataFrame`:  output table 



## module `beditor.lib.make_guides`

Designing the sgRNAs 

---



### function `get_guide_pam`

```python

get_guide_pam(

    match: str,

    pam_stream: str,

    guidel: int,

    seq: str,

    pos_codon: int = None

)

```

---



### function `get_pam_searches`

```python

get_pam_searches(dpam: DataFrame, seq: str, pos_codon: int) → DataFrame

```

Search for PAM occurrences 

**Args:**

 - `dpam`:  table with the PAM sequences 

 - `seq`:  target sequence 

 - `pos_codon`:  reading frame 

 - `test`:  debug mode on 

**Returns:**

 - `dpam_searches`:  table with the positions of the PAMs 

---



### function `get_guides`

```python

get_guides(

    data: DataFrame,

    dpam: DataFrame,

    guide_len: int,

    base_fraction_max: float = 0.8

) → DataFrame

```

Get guides 

**Args:**

 

 - `data` (pd.DataFrame):  input table 

 - `dpam` (pd.DataFrame):  table with PAM info 

 - `guide_len` (int):  guide length 

 - `base_fraction_max` (float, optional):  base fraction max. Defaults to 0.8. 

**Returns:**

 

 - `pd.DataFrame`:  output table 

---



### function `to_locusby_pam`

```python

to_locusby_pam(

    chrom: str,

    pam_start: int,

    pam_end: int,

    pam_position: str,

    strand: str,

    length: int,

    start_off: int = 0

) → str

```

To locus by PAM from PAM coords. 

**Args:**

 

 - `chrom` (str):  chrom 

 - `pam_start` (int):  PAM start 

 - `pam_end` (int):  PAM end 

 - `pam_position` (str):  PAM position 

 - `strand` (str):  strand 

 - `length` (int):  length 

**Returns:**

 

 - `str`:  locus 

---



### function `to_pam_coord`

```python

to_pam_coord(

    startf: int,

    endf: int,

    startp: int,

    endp: int,

    strand: str

) → tuple

```

To PAM coordinates 

**Args:**

 

 - `startf` (int):  start flank start 

 - `endf` (int):  start flank end 

 - `startp` (int):  start PAM start 

 - `endp` (int):  start PAM end 

 - `strand` (str):  strand 

**Returns:**

 

 - `tuple`:  start,end 

---



### function `get_distances`

```python

get_distances(df2: DataFrame, df3: DataFrame, cfg_method: dict) → DataFrame

```

Get distances 

**Args:**

 

 - `df2` (pd.DataFrame):  input table #1 

 - `df3` (pd.DataFrame):  input table #2 

 - `cfg_method` (dict):  config for the method 

**Returns:**

 

 - `pd.DataFrame`:  output table 

---



### function `get_windows_seq`

```python

get_windows_seq(s: str, l: str, wl: str, verbose: bool = False) → str

```

Sequence by guide strand 

**Args:**

 

 - `s` (str):  sequence 

 - `l` (str):  locus 

 - `wl` (str):  window locus 

 - `verbose` (bool, optional):  verbose. Defaults to False. 

**Returns:**

 

 - `str`:  window sequence 

---



### function `filter_guides`

```python

filter_guides(

    df1: DataFrame,

    cfg_method: dict,

    verbose: bool = False

) → DataFrame

```

Filter sgRNAs 

**Args:**

 

 - `df1` (pd.DataFrame):  input table 

 - `cfg_method` (dict):  config of the method 

 - `verbose` (bool, optional):  verbose. Defaults to False. 

**Returns:**

 

 - `pd.DataFrame`:  output table 

---



### function `get_window_target_overlap`

```python

get_window_target_overlap(

    tstart: int,

    tend: int,

    wl: str,

    ws: str,

    nt: str,

    verbose: bool = False

) → tuple

```

Get window target overlap 

**Args:**

 

 - `tstart` (int):  target start 

 - `tend` (int):  target end 

 - `wl` (str):  window locus 

 - `ws` (str):  window sequence 

 - `nt` (str):  nucleotide 

 - `verbose` (bool, optional):  verbose. Defaults to False. 

**Returns:**

 

 - `tuple`:  window_overlaps_the_target,wts,nt_in_overlap,wtl 

---



### function `get_mutated_codon`

```python

get_mutated_codon(

    ts: str,

    tl: str,

    tes: str,

    tel: str,

    strand: str,

    verbose: bool = False

) → str

```

Get mutated codon 

**Args:**

 

 - `ts` (str):  target sequence 

 - `tl` (str):  target locus 

 - `tes` (str):  target edited sequence 

 - `tel` (str):  target edited locus 

 - `strand` (str):  strand 

 - `verbose` (bool, optional):  verbose. Defaults to False. 

**Returns:**

 

 - `str`:  mutated codon 

---



### function `get_coedits_base`

```python

get_coedits_base(

    ws: str,

    wl: str,

    wts: str,

    wtl: str,

    nt: str,

    verbose: bool = False

) → str

```

Get co-edited bases 

**Args:**

 

 - `ws` (str):  window sequence 

 - `wl` (str):  window locus 

 - `wts` (str):  window target overlap sequence 

 - `wtl` (str):  window target overlap locus 

 - `nt` (str):  nucleotide 

 - `verbose` (bool, optional):  verbose. Defaults to False. 

**Returns:**

 

 - `str`:  coedits 



## module `beditor.lib`



## module `beditor.lib.methods`

**Global Variables**

---------------

- **multint2reg**

- **multint2regcomplement**

---



### function `dpam2dpam_strands`

```python

dpam2dpam_strands(dpam: DataFrame, pams: list) → DataFrame

```

Duplicates dpam dataframe to be compatible for searching PAMs on - strand 

**Args:**

 

 - `dpam` (pd.DataFrame):  dataframe with pam information 

 - `pams` (list):  pams to be used for actual designing of guides. 

**Returns:**

 

 - `pd.DataFrame`:  table 

---



### function `get_be2dpam`

```python

get_be2dpam(

    din: DataFrame,

    methods: list = None,

    test: bool = False,

    cols_dpam: list = ['PAM', 'PAM position', 'guide length']

) → dict

```

Make BE to dpam mapping i.e. dict 

**Args:**

 

 - `din` (pd.DataFrame):  table with BE and PAM info all cols_dpam needed 

 - `methods` (list, optional):  method names. Defaults to None. 

 - `test` (bool, optional):  test-mode. Defaults to False. 

 - `cols_dpam` (list, optional):  columns to be used. Defaults to ['PAM', 'PAM position', 'guide length']. 

**Returns:**

 

 - `dict`:  output dictionary. 



## module `beditor.lib.utils`

Utilities 

**Global Variables**

---------------

- **cols_muts**

- **multint2reg**

- **multint2regcomplement**

---



### function `get_src_path`

```python

get_src_path() → str

```

Get the beditor source directory path. 

**Returns:**

 

 - `str`:  path 

---



### function `runbashcmd`

```python

runbashcmd(cmd: str, test: bool = False, logf=None)

```

Run a bash command 

**Args:**

 

 - `cmd` (str):  command 

 - `test` (bool, optional):  test-mode. Defaults to False. 

 - `logf` (optional):  log file instance. Defaults to None. 

---



### function `log_time_elapsed`

```python

log_time_elapsed(start)

```

Log time elapsed. 

**Args:**

 

 - `start` (datetime):  start time 

**Returns:**

 

 - `datetime`:  difference in time. 

---



### function `rescale`

```python

rescale(

    a: np.array,

    mn: float = None

) → np.array

```

Rescale a vector. 

**Args:**

 

 - `a` (np.array):  vector. 

 - `mn` (float, optional):  minimum value. Defaults to None. 

**Returns:**

 

 - `np.array`:  output vector 

---



### function `get_nt2complement`

```python

get_nt2complement()

```

---



### function `s2re`

```python

s2re(s: str, ss2re: dict) → str

```

String to regex patterns 

**Args:**

 

 - `s` (str):  string 

 - `ss2re` (dict):  substrings to regex patterns. 

**Returns:**

 

 - `str`:  string with regex patterns. 
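
As a rough illustration of turning a string into a regex pattern, the sketch below replaces each substring found in a mapping with its pattern in a single pass. It is an assumption about the general idea, not beditor's code; the function name `s2re_sketch` and the example mapping of IUPAC codes are hypothetical.

```python
import re


def s2re_sketch(s: str, ss2re: dict) -> str:
    """Replace every key of ss2re found in s with its regex pattern."""
    if not ss2re:
        return s
    # Longest keys first so that longer substrings win over their prefixes.
    keys = sorted(ss2re, key=len, reverse=True)
    finder = re.compile("|".join(re.escape(k) for k in keys))
    # A single pass avoids re-substituting characters inside inserted patterns.
    return finder.sub(lambda m: ss2re[m.group(0)], s)


iupac = {"N": "[ATGC]", "R": "[AG]"}
print(s2re_sketch("NGG", iupac))
```

A single-pass substitution is chosen here because chained `str.replace` calls could rewrite letters that belong to an already inserted character class.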

---



### function `parse_locus`

```python

parse_locus(s: str, zero_based: bool = True) → tuple

```

Parse a locus string. 

**Args:**

 

 - `s` (str):  location string. 

 - `zero_based` (bool, optional):  zero-based coordinates. Defaults to True. 

**Returns:**

 

 - `tuple`:  chrom, start, end, strand 

**Notes:**

> beditor outputs (including bed files) use 0-based loci, whereas pyensembl and IGV use 1-based locations. 
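
The difference between the two coordinate systems can be sketched as follows. This assumes that the 0-based loci are half-open intervals, as in the BED format, and that the 1-based locations are inclusive. Both helper names are hypothetical and are not part of beditor.

```python
def zero_to_one_based(start: int, end: int) -> tuple:
    """0-based half-open [start, end) -> 1-based inclusive [start + 1, end]."""
    return start + 1, end


def one_to_zero_based(start: int, end: int) -> tuple:
    """1-based inclusive [start, end] -> 0-based half-open [start - 1, end)."""
    return start - 1, end


# The interval length is preserved: end - start in 0-based half-open form
# equals end - start + 1 in 1-based inclusive form.
print(zero_to_one_based(99, 200))
```

Only the start shifts under these conventions; the end coordinate is numerically the same in both systems.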

---



### function `get_pos`

```python

get_pos(s: str, l: str, reverse: bool = True, zero_based: bool = True) → Series

```

Expand locus to positions mapped to nucleotides. 

**Args:**

 

 - `s` (str):  sequence 

 - `l` (str):  locus 

 - `reverse` (bool, optional):  reverse the - strand. Defaults to True. 

 - `zero_based` (bool, optional):  zero based coordinates. Defaults to True. 

**Returns:**

 

 - `pd.Series`:  output. 

---



### function `get_seq`

```python

get_seq(

    genome: str,

    contig: str,

    start: int,

    end: int,

    strand: str,

    out_type: str = 'str',

    verbose: bool = False

) → str

```

Extract a sequence from a genome file based on start and end positions using streaming. 

**Args:**

 

 - `genome` (str):  The path to the genome file in FASTA format. 

 - `contig` (str):  chrom 

 - `start` (int):  start 

 - `end` (int):  end 

 - `strand` (str):  strand 

 - `out_type` (str, optional):  type of the output. Defaults to 'str'. 

 - `verbose` (bool, optional):  verbose. Defaults to False. 

**Raises:**

 

 - `ValueError`:  invalid strand. 

**Returns:**

 

 - `str`:  The extracted sequence. 
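
The idea of streaming a FASTA file rather than loading it whole can be sketched with the standard library alone. This is an illustration under stated assumptions, not beditor's implementation: coordinates are taken as 0-based half-open, only `A`, `C`, `G`, `T` and `N` are complemented, and the name `get_seq_sketch` is hypothetical.

```python
import os
import tempfile

# Complement table for unambiguous bases plus N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def get_seq_sketch(genome_path: str, contig: str, start: int, end: int, strand: str) -> str:
    """Stream a FASTA file and return the bases in [start, end) of one contig."""
    if strand not in ("+", "-"):
        raise ValueError(f"invalid strand: {strand!r}")
    pieces = []
    offset = 0  # number of bases of the wanted contig read so far
    in_contig = False
    with open(genome_path) as handle:
        for raw in handle:
            line = raw.strip()
            if line.startswith(">"):
                if in_contig:
                    break  # the wanted contig has ended
                name = line[1:].split()
                in_contig = bool(name) and name[0] == contig
                continue
            if not in_contig or not line:
                continue
            line_start, line_end = offset, offset + len(line)
            if line_end > start and line_start < end:
                # Keep only the part of this line that overlaps the request.
                pieces.append(line[max(start - line_start, 0):end - line_start])
            offset = line_end
            if offset >= end:
                break  # nothing further is needed
    seq = "".join(pieces).upper()
    if strand == "-":
        seq = seq.translate(COMPLEMENT)[::-1]  # reverse complement
    return seq


# Demo on a tiny temporary FASTA file.
with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False) as tmp:
    tmp.write(">chr1 demo\nACGTAC\nGTTTGA\n>chr2\nGGGG\n")
plus = get_seq_sketch(tmp.name, "chr1", 3, 9, "+")
minus = get_seq_sketch(tmp.name, "chr1", 3, 9, "-")
os.remove(tmp.name)
print(plus, minus)
```

Reading line by line keeps memory use independent of genome size, at the cost of scanning from the top of the file for every request.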

---



### function `read_fasta`

```python

read_fasta(

    fap: str,

    key_type: str = 'id',

    duplicates: bool = False,

    out_type='dict'

) → dict

```

Read fasta 

**Args:**

 

 - `fap` (str):  path 

 - `key_type` (str, optional):  key type. Defaults to 'id'. 

 - `duplicates` (bool, optional):  duplicates present. Defaults to False. 

**Returns:**

 

 - `dict`:  data. 

**Notes:**

> If `duplicates` is True, `key_type` is set to `description` instead of `id`. 
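
Why duplicate records call for the description as the key can be seen in a small standard-library sketch. It is not beditor's parser: the name `read_fasta_sketch` is hypothetical, the input is a list of lines rather than a path, and the record id is assumed to be the first whitespace-delimited token of the header.

```python
def read_fasta_sketch(lines: list, key_type: str = "id") -> dict:
    """Parse FASTA lines into {key: sequence}, keyed by id or full description."""
    records = {}
    key = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            description = line[1:]
            parts = description.split()
            key = parts[0] if (key_type == "id" and parts) else description
            records[key] = []  # a repeated key silently replaces the earlier record
        elif key is not None:
            records[key].append(line)
    return {k: "".join(v) for k, v in records.items()}


lines = [">s1 first copy", "ACGT", "AC", ">s1 second copy", "GGGG"]
by_id = read_fasta_sketch(lines, key_type="id")
by_description = read_fasta_sketch(lines, key_type="description")
print(by_id, by_description)
```

With `id` keys the two `s1` records collapse into one, while `description` keys keep both.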

---



### function `format_coords`

```python

format_coords(df: DataFrame) → DataFrame

```

Format coordinates 

**Args:**

 

 - `df` (pd.DataFrame):  table 

**Returns:**

 

 - `pd.DataFrame`:  formatted table 

---



### function `fetch_sequences_bp`

```python

fetch_sequences_bp(p: str, genome: str) → DataFrame

```

Fetch sequences using biopython. 

**Args:**

 

 - `p` (str):  path to the bed file. 

 - `genome` (str):  genome path. 

**Returns:**

 

 - `pd.DataFrame`:  sequences. 

---



### function `fetch_sequences`

```python

fetch_sequences(

    p: str,

    genome_path: str,

    outp: str = None,

    src_path: str = None,

    revcom: bool = True,

    method='2bit',

    out_type='df'

) → DataFrame

```

Fetch sequences 

**Args:**

 

 - `p` (str):  path to the bed file 

 - `genome_path` (str):  genome path 

 - `outp` (str, optional):  output path for fasta file. Defaults to None. 

 - `src_path` (str, optional):  source path. Defaults to None. 

 - `revcom` (bool, optional):  reverse-complement. Defaults to True. 

 - `method` (str, optional):  method name. Defaults to '2bit'. 

 - `out_type` (str, optional):  type of the output. Defaults to 'df'. 

**Returns:**

 

 - `pd.DataFrame`:  sequences. 

---



### function `get_sequences`

```python

get_sequences(

    df1: DataFrame,

    p: str,

    genome_path: str,

    outp: str = None,

    src_path: str = None,

    revcom: bool = True,

    out_type: str = 'df',

    renames: dict = {},

    **kws_fetch_sequences

) → DataFrame

```

Get sequences for the loci in a table 

**Args:**

 

 - `df1` (pd.DataFrame):  input table 

 - `p` (str):  path to the bed file 

 - `genome_path` (str):  genome path 

 - `outp` (str, optional):  output path. Defaults to None. 

 - `src_path` (str, optional):  source path. Defaults to None. 

 - `revcom` (bool, optional):  reverse complement. Defaults to True. 

 - `out_type` (str, optional):  output type. Defaults to 'df'. 

 - `renames` (dict, optional):  renames. Defaults to {}. 

**Returns:**

 

 - `pd.DataFrame`:  output sequences 

**Notes:**

> Input is 1-based. Output is 0-based. Saves the bed file and gets the sequences. 

---



### function `to_locus`

```python

to_locus(

    chrom: str = 'chrom',

    start: str = 'start',

    end: str = 'end',

    strand: str = 'strand',

    x: Series = None

) → str

```

To locus 

**Args:**

 

 - `chrom` (str, optional):  chrom. Defaults to 'chrom'. 

 - `start` (str, optional):  start. Defaults to 'start'. 

 - `end` (str, optional):  end. Defaults to 'end'. 

 - `strand` (str, optional):  strand. Defaults to 'strand'. 

 - `x` (pd.Series, optional):  row of the dataframe. Defaults to None. 

**Returns:**

 

 - `str`:  locus 

---



### function `get_flanking_seqs`

```python

get_flanking_seqs(

    df1: DataFrame,

    targets_path: str,

    flanks_path: str,

    genome: str = None,

    search_window: list = None

) → DataFrame

```

Get flanking sequences 

**Args:**

 

 - `df1` (pd.DataFrame):  input table 

 - `targets_path` (str):  target sequences path 

 - `flanks_path` (str):  flank sequences path 

 - `genome` (str, optional):  genome path. Defaults to None. 

 - `search_window` (list, optional):  search window around the target. Defaults to None. 

**Returns:**

 

 - `pd.DataFrame`:  output table with sequences 

---



### function `get_strand`

```python

get_strand(

    genome,

    df1: DataFrame,

    col_start: str,

    col_end: str,

    col_chrom: str,

    col_strand: str,

    col_seq: str

) → DataFrame

```

Get strand by comparing the aligned and fetched sequence 

**Args:**

 

 - `genome`:  genome instance 

 - `df1` (pd.DataFrame):  input table. 

 - `col_start` (str):  start 

 - `col_end` (str):  end 

 - `col_chrom` (str):  chrom 

 - `col_strand` (str):  strand 

 - `col_seq` (str):  sequences 

**Returns:**

 

 - `pd.DataFrame`:  output table 

**Notes:**

> used for tests. 

---



### function `reverse_complement_multintseq`

```python

reverse_complement_multintseq(seq: str, nt2complement: dict) → str

```

Reverse complement multi-nucleotide sequence 

**Args:**

 

 - `seq` (str):  sequence 

 - `nt2complement` (dict):  nucleotide to complement 

**Returns:**

 

 - `str`:  sequence 
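
A minimal sketch of reverse-complementing a sequence with a nucleotide-to-complement mapping follows. It assumes that "multi-nucleotide" refers to IUPAC ambiguity codes such as `N`, `R` and `Y`. The mapping shown is a small illustrative subset, not the one returned by `get_nt2complement`, and the function name is hypothetical.

```python
def reverse_complement_sketch(seq: str, nt2complement: dict) -> str:
    """Complement each nucleotide, reading the sequence from its last base."""
    return "".join(nt2complement[nt] for nt in reversed(seq))


# Illustrative subset: R (A or G) pairs with Y (C or T); N pairs with N.
nt2complement = {
    "A": "T", "T": "A", "G": "C", "C": "G",
    "R": "Y", "Y": "R", "N": "N",
}
print(reverse_complement_sketch("NGG", nt2complement))
```

Reverse-complementing a PAM such as NGG gives the pattern to look for on the + strand when the guide lies on the - strand.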

---



### function `reverse_complement_multintseqreg`

```python

reverse_complement_multintseqreg(

    seq: str,

    multint2regcomplement: dict,

    nt2complement: dict

) → str

```

Reverse complement multi-nucleotide regex patterns 

**Args:**

 

 - `seq` (str):  sequence 

 - `multint2regcomplement` (dict):  mapping. 

 - `nt2complement` (dict):  nucleotide to complement 

**Returns:**

 

 - `str`:  regex pattern 

---



### function `hamming_distance`

```python

hamming_distance(s1: str, s2: str) → int

```

Return the Hamming distance between equal-length sequences 

**Args:**

 

 - `s1` (str):  sequence #1 

 - `s2` (str):  sequence #2 

**Raises:**

 

 - `ValueError`:  Undefined for sequences of unequal length 

**Returns:**

 

 - `int`:  distance. 
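
For reference, the Hamming distance counts the positions at which two equal-length sequences differ. The sketch below shows the usual way to compute it and mirrors the documented `ValueError`. It is not copied from beditor, and the name `hamming_distance_sketch` is hypothetical.

```python
def hamming_distance_sketch(s1: str, s2: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    if len(s1) != len(s2):
        raise ValueError("Undefined for sequences of unequal length")
    return sum(a != b for a, b in zip(s1, s2))


print(hamming_distance_sketch("GATTACA", "GACTATA"))
```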

---



### function `align`

```python

align(

    q: str,

    s: str,

    test: bool = False,

    psm: float = 2,

    pmm: float = 0.5,

    pgo: float = -3,

    pge: float = -1

) → str

```

Creates pairwise local alignment between sequences. 

**Args:**

 

 - `q` (str):  query 

 - `s` (str):  subject 

 - `test` (bool, optional):  test-mode. Defaults to False. 

**Returns:**

 

 - `str`:  alignment with symbols. 

**Notes:**

> REF: http://biopython.org/DIST/docs/api/Bio.pairwise2-module.html 

> The match parameters are: 

> | CODE | DESCRIPTION |
> |------|-------------|
> | x | No parameters. Identical characters have score of 1, otherwise 0. |
> | m | A match score is the score of identical chars, otherwise mismatch score. |
> | d | A dictionary returns the score of any pair of characters. |
> | c | A callback function returns scores. |

> The gap penalty parameters are: 

> | CODE | DESCRIPTION |
> |------|-------------|
> | x | No gap penalties. |
> | s | Same open and extend gap penalties for both sequences. |
> | d | The sequences have different open and extend gap penalties. |
> | c | A callback function returns the gap penalties. |

---



### function `get_orep`

```python

get_orep(seq: str) → int

```

Get the overrepresentation 

---



### function `get_polyt_length`

```python

get_polyt_length(s: str) → int

```

Counts the length of the longest polyT stretch (RNA pol3 terminator) in sequence 

**Args:**

 

 - `s` (str):  sequence in string format 
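
One way to find the longest polyT stretch is a regular expression over runs of `T`, as sketched below. This is an illustration rather than beditor's code. The name `get_polyt_length_sketch` is hypothetical, and upper-casing the input is an assumption.

```python
import re


def get_polyt_length_sketch(s: str) -> int:
    """Return the length of the longest run of consecutive T's (0 if none)."""
    runs = re.findall(r"T+", s.upper())
    return max((len(run) for run in runs), default=0)


print(get_polyt_length_sketch("ACTTTTGTT"))
```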

---



### function `get_annots_installed`

```python

get_annots_installed() → DataFrame

```

Get a list of annotations installed. 

**Returns:**

 

 - `pd.DataFrame`:  output. 

---



### function `get_annots`

```python

get_annots(

    species_name: str = None,

    release: int = None,

    gtf_path: str = None,

    transcript_path: str = None,

    protein_path: str = None,

    reference_name: str = 'assembly',

    annotation_name: str = 'source',

    verbose: bool = False,

    **kws_Genome

)

```

Get pyensembl annotation instance 

**Args:**

 

 - `species_name` (str, optional):  species name. Defaults to None. 

 - `release` (int, optional):  release number. Defaults to None. 

 - `gtf_path` (str, optional):  GTF path. Defaults to None. 

 - `transcript_path` (str, optional):  transcripts path. Defaults to None. 

 - `protein_path` (str, optional):  protein path. Defaults to None. 

 - `reference_name` (str, optional):  reference name. Defaults to 'assembly'. 

 - `annotation_name` (str, optional):  annotation name. Defaults to 'source'. 

 - `verbose` (bool, optional):  verbose. Defaults to False. 

**Returns:**

 pyensembl annotation instance 

---



### function `to_pid`

```python

to_pid(annots, gid: str) → str

```

To protein ID 

**Args:**

 

 - `annots`:  pyensembl annotation instance 

 - `gid` (str):  gene ID 

**Returns:**

 

 - `str`:  protein ID 

---



### function `to_one_based_coordinates`

```python

to_one_based_coordinates(df: DataFrame) → DataFrame

```

To one based coordinates 

**Args:**

 

 - `df` (pd.DataFrame):  input table 

**Returns:**

 

 - `pd.DataFrame`:  output table. 



## module `beditor.lib.viz`

Visualizations. 

---



### function `to_igv`

```python

to_igv(

    cfg: dict = None,

    gtf_path: str = None,

    genome_path: str = None,

    output_dir_path: str = None,

    threads: int = 1,

    output_ext: str = None,

    force: bool = False

) → str

```

To IGV session file. 

**Args:**

 

 - `cfg` (dict, optional):  configuration of the run. Defaults to None. 

 - `gtf_path` (str, optional):  path to the gtf file. Defaults to None. 

 - `genome_path` (str, optional):  path to the genome file. Defaults to None. 

 - `output_dir_path` (str, optional):  path to the output directory. Defaults to None. 

 - `threads` (int, optional):  threads. Defaults to 1. 

 - `output_ext` (str, optional):  extension of the output. Defaults to None. 

 - `force` (bool, optional):  force. Defaults to False. 

**Returns:**

 

 - `str`:  path to the session file. 

---



### function `get_nt_composition`

```python

get_nt_composition(seqs: list) → DataFrame

```

Get nt composition. 

**Args:**

 

 - `seqs` (list):  list of sequences 

**Returns:**

 

 - `pd.DataFrame`:  table with the frequencies of the nucleotides. 
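
A nucleotide composition table can be sketched without pandas as a list of per-position frequency dictionaries. Two assumptions apply: frequencies are computed per position across equal-length sequences, and plain dictionaries stand in for the `pd.DataFrame` that the function returns. The name `nt_composition_sketch` is hypothetical.

```python
from collections import Counter


def nt_composition_sketch(seqs: list) -> list:
    """Return one {nucleotide: frequency} dict per sequence position."""
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must have equal length")
    table = []
    # zip(*seqs) yields the nucleotides of all sequences at each position.
    for column in zip(*seqs):
        counts = Counter(column)
        table.append({nt: counts.get(nt, 0) / len(seqs) for nt in "ACGT"})
    return table


table = nt_composition_sketch(["ACGT", "ACGA", "TCGA", "ACTA"])
print(table[0])
```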

---



### function `plot_ntcompos`

```python

plot_ntcompos(

    seqs: list,

    pam_pos: str,

    pam_len: int,

    window: list = None,

    ax: Axes = None,

    color_pam: str = 'lime',

    color_window: str = 'gold'

) → Axes

```

Plot nucleotide composition 

**Args:**

 

 - `seqs` (list):  list of sequences. 

 - `pam_pos` (str):  PAM position. 

 - `pam_len` (int):  PAM length. 

 - `window` (list, optional):  activity window bounds. Defaults to None. 

 - `ax` (plt.Axes, optional):  subplot. Defaults to None. 

 - `color_pam` (str, optional):  color of the PAM. Defaults to 'lime'. 

 - `color_window` (str, optional):  color of the window. Defaults to 'gold'. 

**Returns:**

 

 - `plt.Axes`:  subplot 

---



### function `plot_ontarget`

```python

plot_ontarget(

    guide_loc: str,

    pam_pos: str,

    pam_len: int,

    guidepam_seq: str,

    window: list = None,

    show_title: bool = False,

    figsize: list = [10, 2],

    verbose: bool = False,

    kws_sg: dict = {}

) → Axes

```

Plot the on-target sgRNA. 

**Args:**

 

 - `guide_loc` (str):  sgRNA locus 

 - `pam_pos` (str):  PAM position 

 - `pam_len` (int):  PAM length 

 - `guidepam_seq` (str):  sgRNA and PAM sequence 

 - `window` (list, optional):  activity window bounds. Defaults to None. 

 - `show_title` (bool, optional):  show the title. Defaults to False. 

 - `figsize` (list, optional):  figure size. Defaults to [10,2]. 

 - `verbose` (bool, optional):  verbose. Defaults to False. 

 - `kws_sg` (dict, optional):  keyword arguments to plot the sgRNA. Defaults to {}. 

**Returns:**

 

 - `plt.Axes`:  subplot 

**TODOs:**

 1. convert to 1-based coordinates 

 2. features from the GTF file 

---



### function `get_plot_inputs`

```python

get_plot_inputs(df2: DataFrame) → list

```

Get plot inputs. 

**Args:**

 

 - `df2` (pd.DataFrame):  table. 

**Returns:**

 

 - `list`:  list of tables. 

---



### function `plot_library_stats`

```python

plot_library_stats(

    dfs: list,

    palette: dict = {True: 'b', False: 'lightgray'},

    cutoffs: dict = None,

    not_be: bool = True,

    dbug: bool = False,

    figsize: list = [10, 2.5]

) → list

```

Plot library stats 

**Args:**

 

 - `dfs` (list):  list of tables. 

 - `palette` (dict, optional):  color palette. Defaults to {True:'b',False:'lightgray'}. 

 - `cutoffs` (dict, optional):  cutoffs to be applied. Defaults to None. 

 - `not_be` (bool, optional):  not a base editor. Defaults to True. 

 - `dbug` (bool, optional):  debug mode. Defaults to False. 

 - `figsize` (list, optional):  figure size. Defaults to [10,2.5]. 

**Returns:**

 

 - `list`:  list of subplots. 



## module `beditor.run`

Command-line options 

---



### function `validate_params`

```python

validate_params(parameters: dict) → bool

```

Validate the parameters. 

**Args:**

 

 - `parameters` (dict):  parameters 

**Returns:**

 

 - `bool`:  whether the parameters are valid or not 

---



### function `cli`

```python

cli(

    editor: str = None,

    mutations_path: str = None,

    output_dir_path: str = None,

    species: str = None,

    ensembl_release: int = None,

    genome_path: str = None,

    gtf_path: str = None,

    rna_path: str = None,

    prt_path: str = None,

    search_window: int = None,

    not_be: bool = False,

    config_path: str = None,

    wd_path: str = None,

    threads: int = 1,

    kernel_name: str = 'beditor',

    verbose='WARNING',

    igv_path_prefix=None,

    ext: str = None,

    force: bool = False,

    dbug: bool = False,

    skip=None,

    **kws

)

```

beditor command-line (CLI)  

**Args:**

 

 - `editor` (str, optional):  base-editing method, available methods can be listed using command: 'beditor resources'. Defaults to None. 

 - `mutations_path` (str, optional):  path to the mutation file, the format of which is available at https://github.com/rraadd88/beditor/README.md#Input-format. Defaults to None. 

 - `output_dir_path` (str, optional):  path to the directory where the outputs should be saved. Defaults to None. 

 - `species` (str, optional):  species name. Defaults to None. 

 - `ensembl_release` (int, optional):  Ensembl release number. Defaults to None. 

 - `genome_path` (str, optional):  path to the genome file, which is not available on Ensembl. Defaults to None. 

 - `gtf_path` (str, optional):  path to the gene annotations file, which is not available on Ensembl. Defaults to None. 

 - `rna_path` (str, optional):  path to the transcript sequences file, which is not available on Ensembl. Defaults to None. 

 - `prt_path` (str, optional):  path to the protein sequences file, which is not available on Ensembl. Defaults to None. 

 - `search_window` (int, optional):  number of bases to search on either side of a target, if not specified, it is inferred by beditor. Defaults to None. 

 - `not_be` (bool, optional):  do not process as a base editor. Defaults to False. 

 - `config_path` (str, optional):  path to the configuration file. Defaults to None. 

 - `wd_path` (str, optional):  path to the working directory. Defaults to None. 

 - `threads` (int, optional):  number of threads. Defaults to 1. 

 - `kernel_name` (str, optional):  name of the jupyter kernel. Defaults to "beditor". 

 - `verbose` (str, optional):  verbosity, one of the logging levels: DEBUG > INFO > WARNING (default) > ERROR > CRITICAL. Defaults to "WARNING". 

 - `igv_path_prefix` (str, optional):  prefix to be added to the IGV url. Defaults to None. 

 - `ext` (str, optional):  file extensions of the output tables. Defaults to None. 

 - `force` (bool, optional):  overwrite the outputs if they exist. Defaults to False. 

 - `dbug` (bool, optional):  debug mode (developer). Defaults to False. 

 - `skip` (_type_, optional):  skip sections of the workflow (developer). Defaults to None. 

**Examples:**

 beditor cli -c inputs/mutations/protein/positions.yml 

**Notes:**

> Required parameters for a run: `editor`, `mutations_path` and `output_dir_path`, or `config_path`. 
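
The rule in the note can be expressed as a small check over a parameter dictionary. This is a sketch of the stated requirement only, not the logic of `validate_params`. The helper name `has_required_params`, the editor value and the output directory are hypothetical placeholders.

```python
def has_required_params(parameters: dict) -> bool:
    """True if config_path is given, or editor, mutations_path and output_dir_path all are."""
    if parameters.get("config_path") is not None:
        return True
    required = ("editor", "mutations_path", "output_dir_path")
    return all(parameters.get(name) is not None for name in required)


# Placeholder values; list the real editor names with `beditor resources`.
explicit = {
    "editor": "EXAMPLE_EDITOR",
    "mutations_path": "mutations.tsv",
    "output_dir_path": "outputs/",
}
from_config = {"config_path": "inputs/mutations/protein/positions.yml"}
incomplete = {"editor": "EXAMPLE_EDITOR"}
print(has_required_params(explicit), has_required_params(from_config), has_required_params(incomplete))
```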

---



### function `gui`

```python

gui()

```

---



### function `resources`

```python

resources()

```