# phred-to-error-rate
Convert FASTQ phred scores to probabilities

## Calculating the score
A phred score Q encodes an error probability of 10^(-Q/10), so converting a
score directly requires computing a power of 10 for every base. This is much
slower than the table lookup described below and is not recommended.
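For reference, a minimal sketch of the direct calculation
(`phred_char_to_error_rate` is a hypothetical helper name; the offset of 33
converts the ASCII-encoded FASTQ quality character to a phred score):

```python3
def phred_char_to_error_rate(char: str) -> float:
    """Convert a single FASTQ quality character to an error probability."""
    phred = ord(char) - 33  # FASTQ qualities are ASCII-encoded with offset 33
    if not 0 <= phred <= 93:
        raise ValueError(f"invalid quality character: {char!r}")
    return 10 ** (-phred / 10)

# 'I' is ASCII 73, hence phred 40: an error probability of 1 in 10,000.
assert phred_char_to_error_rate("I") == 10 ** -4
```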

## Using a lookup table

FASTQ phred scores have at most 94 discrete values: the ASCII-encoded lower
bound is 33 and the upper bound is 126. The fastest way to convert them is a
lookup table, which can easily be generated in Python:

```python3
# Index by the raw quality byte; entries below index 33 never occur in
# valid FASTQ data.
PHRED_TO_SCORE_LOOKUP = [10 ** (-(i - 33) / 10) for i in range(127)]
```

Using [dnaio](https://github.com/marcelm/dnaio) one can easily calculate
the average error probability for each read:
```python3
import math

import dnaio

PHRED_TO_SCORE_LOOKUP = [10 ** (-(i - 33) / 10) for i in range(127)]

for read in dnaio.open("my.fastq"):
    phreds = read.qualities_as_bytes()
    total_expected_errors = 0.0
    for phred in phreds:  # iterating over bytes yields integers
        total_expected_errors += PHRED_TO_SCORE_LOOKUP[phred]
    average_error_rate = total_expected_errors / len(read)
    read_phred_score = -10 * math.log10(average_error_rate)
```
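Note that the example averages the error probabilities and only converts back
to a phred score at the end. Averaging the phred scores themselves would
overstate quality, because the scale is logarithmic. A small illustration,
using an assumed two-base read:

```python3
import math

# Assume a two-base read with phred scores 10 and 40,
# i.e. error probabilities 0.1 and 0.0001.
error_rates = [0.1, 0.0001]
average_error_rate = sum(error_rates) / len(error_rates)  # 0.05005
read_phred_score = -10 * math.log10(average_error_rate)   # ~13.0

# Naively averaging the phred scores gives 25, corresponding to an error
# rate of ~0.003: far too optimistic for a read that is wrong about 5%
# of the time on average.
naive_phred = (10 + 40) / 2  # 25.0
```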

This is how to do it in C.

```C
#include <stdint.h>
#include <stddef.h>

#include "score_to_error_rate.h"  /* defines the SCORE_TO_ERROR_RATE table */

static inline double
average_error_rate(const uint8_t *phreds, size_t phreds_length) {
    const uint8_t *end_ptr = phreds + phreds_length;
    const uint8_t *cursor = phreds;
    double error_rate = 0.0;
    while (cursor < end_ptr) {
        uint8_t phred = *cursor - 33;
        /* Because phred is unsigned, we only have to check the upper bound. */
        if (phred > 93) {
            /* The error rate should always be positive, so -1.0 is a safe
               error value. */
            return -1.0;
        }
        error_rate += SCORE_TO_ERROR_RATE[phred];
        cursor += 1;
    }
    return error_rate / (double)phreds_length;
}
```

Code to generate score_to_error_rate.h is included
[here](score_to_error_rate.py); a pre-generated copy can be found
[here](score_to_error_rate.h).
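For illustration, a minimal sketch of such a generator (not the linked
script itself; it only assumes the header defines the `SCORE_TO_ERROR_RATE`
array used in the C example above):

```python3
# Sketch of a score_to_error_rate.h generator; the real generator is in
# score_to_error_rate.py. Indices 0-93 correspond to phred scores after
# the ASCII offset of 33 has been subtracted.
entries = ",\n    ".join(f"{10 ** (-q / 10):.10e}" for q in range(94))
header = (
    "#ifndef SCORE_TO_ERROR_RATE_H\n"
    "#define SCORE_TO_ERROR_RATE_H\n\n"
    "static const double SCORE_TO_ERROR_RATE[94] = {\n"
    f"    {entries}\n"
    "};\n\n"
    "#endif\n"
)
with open("score_to_error_rate.h", "wt") as f:
    f.write(header)
```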