## ramrsbd
An example of a Reed-Solomon based error-correcting block device backed
by RAM.

```
corrupted:
' ..:::::7::::b:.. ' ' 8 .:::....:::.96a:9:859.e:.dfbed5e 94ab8 4c:.4
. 48 .:::::::::::::::::::.. ' :::::::::::::44ef696b 6d'6:e:79e5.748f.8fc'9
8 ' .::::::::::':::':::::::::. 8::::::::::' e75a6ac555.f8cf8.b:9'ed57ad4a ec
.:::::::::::::::::::::::::::. ' . ' . :::::::::: 'c4ddf9fd:9.7e ef 7a5c 7fa6'c4'
:::::'::::::7:::7.::::::::::::: ... :: :':::'::' bb'c:da5a7 d6b.87d5b57b 9f7a: a:
4::::::::::::::'::::::::::::::'' .. '::: ' ''' '7 aece5:de .68467c79fdd'49:8597
::::::::::::::::::::::'::::''.:::::: ' . . fb6b.6:'44f4'dcdf5e'.fac..:6fa8e
:7:::::::::::::::::::::'' ..:::::'' ' ' dbaa4ba6e5a'c9 5a adb5e:7f8b554e
8 8:::::::::::::::::::'' ..:::::'' .. . .' 'be4d.dd4b6c64:a5e'ba4:b578d :
8 ..: ::::::::::::::''' ..:::::'' ..::: .a55f.cf:4':9.b8:44ea9c5bfd9c7e.
8.::::' :::::::::'' ..::::::'' ..::::::: . f8.'c7d.c9b:49a'48a:7 5c:9cc.6:b
:::'4 '7::'' ...::::::''.5.:::::::::: . edfe94e8:.b4ff':c655 5d6f7f5d:7'
::' ...::::::6' ..:::::::::::::' 8 8 c88f.'96f8fbe5787bbad4f.bf:d9b
9...:::::::'' ..::::::::::'::::.:' 4 ebe5eecda4fdee9b5:58:ee f:cafe::
'::::::::'''4. ::::::::::::::::7::'' . 8 ece:'b 6:7:a7.e5a'6a dfe'b967fe:
'':::::::::::''' .. '99ade89:df5 7b7b e475ad:c98efaac
```

```
corrected:
..:::::::::::... .:::....:::.96a:9:859.e:.dfbed5e 94ab c:.4
.:::::::::::::::::::.. :::::::::::::44ef696b 6d'6:e:79e5.748f.afc'9
.::::::::::::::::::::::::. ::::::::::' e75a6ac555.f8cf8.b:9'ad56ad5a ec
.:::::::::::::::::::::::::::. . :::::::::: 'c4dcf9fd:9.7e ef 7a5c67fa6'c4'
.:::::::::::::::::::::::::::::: ... :: :'::::::' bb'c:da5a74d6b.87d5b57b 9f7a: a:
:::::::::::::::::::::::::::::'' .. '::: ' ''' '7 aece5:de .68467c:9fdd'c9:8597
:::::::::::::::::::::::::::''..::::: ' fb6b.6:'44f4'dcdf5e'.fac..:6fa8e
:::::::::::::::::::::::'' ..:::::'' dbaa4ba6e5''c9 5b.adb5e:6f8b554e
:::::::::::::::::::'' ..:::::'' . ' 'be4d.dd4b6c64:a5ea.a4:b578d :
..: ::::::::::::::''' ..:::::'' ..::: .a55f.cf:4':9.b8:446a8c5bfd9c7e.
..:::' :::::::::'' ..::::::'' ..::::::: f8 'c6d.c9b:49a'48a:7 5c79cc.6:b
:::' ':::'' ...::::::''...:::::::::: edfe94e8:.b4ff':c65585d'f7b5d:7'
::' ...::::::'' ..:::::::::::::' c88f:'96f8fbc5787bbad4f.bf:d9b8
....:::::::'' ..:::::::::::::::::' ebe5eecda4fdee8b5:58:ee e:cafe::
'::::::::''' :::::::::::::::::::'' ece:'b 6:7:af9e5a'6' dfe'b967fe:
'':::::::::::''' 99adf89:df5 7b7b e475ad:c98efaac
```

Reed-Solomon codes are sort of the big brother to CRCs. Operating on
bytes instead of bits, they are both flexible and powerful, capable of
detecting and correcting a configurable number of byte errors
efficiently.

Assuming `ecc_size` bytes of ECC, ramrsbd can correct `floor(ecc_size/2)`
byte errors in up to 255 byte codewords (yes, 255 bytes, not 256 :/).

The main tradeoff is they are quite a bit more complex mathematically,
which adds some code, RAM, and brain cost.

A quick comparison of current ram-ecc-bds:
| | code | tables | stack | buffers | runtime |
|:-----------|-------:|-------:|------:|---------:|-------------------------:|
| ramcrc32bd | 940 B | 64 B | 88 B | 0 B | $O\left(n^e\right)$ |
| ramrsbd    | 1506 B | 512 B  | 128 B | n + 4e B | $O\left(ne + e^2\right)$ |

See also:
- [littlefs][littlefs]
- [ramcrc32bd][ramcrc32bd]

## RAM?
Right now, [littlefs's][littlefs] block device API is limited in terms of
composability. While it would be great to fix this in a major API change,
in the meantime, a RAM-backed block device provides a simple example of
error-correction that users may be able to reimplement in their own
block devices.

## Testing
Testing is a bit jank right now, relying on littlefs's test runner:
``` bash
$ git clone https://github.com/littlefs-project/littlefs -b v2.9.3 --depth 1
$ make test -j
```

## Words of warning
Before we get into how the algorithm works, a couple words of warning:
1. I'm not a mathematician! Some of the definitions here are a bit
handwavey, and I'm skipping over the history of [BCH][w-bch] codes,
[PGZ][w-pgz], [Euclidean methods][w-euclidean], etc. I'd encourage you
to also explore [Wikipedia][w-rs] and other relevant articles to learn
   more.

   My goal is to explain, to the best of my (limited) knowledge, how to
   implement Reed-Solomon codes, and how/why they work.

2. The following math relies heavily on [finite-fields][w-gf] (sometimes
   called Galois-fields) and the related theory.

   If you're not familiar with finite-fields, they are an abstraction we
   can make over finite numbers (bytes for [GF(256)][w-gf256], bits for
   [GF(2)][w-gf2]) that lets us do most of our math without worrying about
   pesky things like integer overflow.

   But there's not enough space here to fully explain how they work, so
   I'd suggest reading some of the above articles first.

3. This is some pretty intense math! Reed-Solomon has evolved over
   several decades and many PhDs. Just look how many names are involved
   in this thing. So don't get discouraged if it feels impenetrable.

   Feel free to take a break to remind yourself that math is not real and
   can not hurt you.

   I'd also encourage you to use GitHub's table-of-contents to jump
   around and keep track of where you are.

## How it works
Like CRCs, Reed-Solomon codes are implemented by concatenating the
message with the remainder after division by a predetermined "generator
polynomial", giving us a [systematic code][w-systematic-code].

However, two important differences:
1. Instead of using a binary polynomial in [GF(2)][w-gf2], we use a
polynomial in a higher-order [finite-field][w-gf], usually
   [GF(256)][w-gf256] because operating on bytes is convenient.

2. We intentionally construct the polynomial to tell us information about
   any errors that may occur.

   Specifically, we constrain our polynomial (and implicitly our
codewords) to evaluate to zero at $n$ fixed points. As long as we have
$e \le \frac{n}{2}$ errors, we can solve for errors using these fixed
   points.

Note this isn't really possible with CRCs in GF(2), because the only
non-zero binary number is, well, 1. GF(256) has 255 non-zero elements,
which we'll see becomes quite important.

### Constructing a generator polynomial
Ok, first step, constructing a generator polynomial.
If we want to correct $e$ byte-errors, we will need $n = 2e$ fixed
points. We can construct a generator polynomial $P(x)$ with $n$ fixed
points at $g^i$ where $i < n$ like so:

$$
P(x) = \prod_{i=0}^{n-1} \left(x - g^i\right)
$$

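To make this a bit more concrete, here's a rough sketch in C of building
$P(x)$ by multiplying out the $(x - g^i)$ terms one at a time. The
`gf256_mul` helper, the reduction polynomial 0x11d, and the choice of
$g=2$ are assumptions for the sketch, not ramrsbd's actual API (ramrsbd
uses log/pow tables for multiplication):

``` c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

// GF(256) multiply via shift-and-xor, reducing by the polynomial
// x^8+x^4+x^3+x^2+1 (0x11d)
static uint8_t gf256_mul(uint8_t a, uint8_t b) {
    uint8_t r = 0;
    while (b) {
        if (b & 1) {
            r ^= a;
        }
        a = (a << 1) ^ ((a & 0x80) ? 0x1d : 0);
        b >>= 1;
    }
    return r;
}

// build P(x) = prod_i=0^n-1 (x - g^i), stored little-endian so p[k] is
// the coefficient of x^k; p needs n+1 bytes
static void gf256_p_gen(uint8_t *p, size_t n) {
    memset(p, 0, n+1);
    p[0] = 1;                        // start with P(x) = 1

    uint8_t gi = 1;                  // g^0
    for (size_t i = 0; i < n; i++) {
        // multiply P(x) by (x - g^i), subtraction is xor in GF(256)
        for (size_t k = i+1; k > 0; k--) {
            p[k] = p[k-1] ^ gf256_mul(gi, p[k]);
        }
        p[0] = gf256_mul(gi, p[0]);
        gi = gf256_mul(gi, 2);       // g^(i+1), assuming g = 2
    }
}
```
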
We could choose any arbitrary set of fixed points, but usually we choose
$g^i$ where $g$ is a [generator][w-generator] in GF(256), since it
provides a convenient mapping of integers to unique non-zero elements in
GF(256).

Note that for any fixed point $g^i$:
And since multiplying anything by zero is zero, this will make our entire
product zero. So for any fixed point $g^i$, $P(g^i)$ should evaluate to
zero:
This gets real nifty when you look at the definition of our Reed-Solomon
code for codeword $C(x)$ given a message $M(x)$:

$$
C(x) = M(x) x^n - \left(M(x) x^n \bmod P(x)\right)
$$

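In code, this encoding step looks a lot like a CRC: divide $M(x) x^n$ by
$P(x)$ and append the remainder. A minimal sketch, reusing the assumed
`gf256_mul` helper from the sketch above and a generator polynomial
stored highest degree first with its leading 1 implicit (the real
division lives in `ramrsbd_gf_p_divmod1` and differs in details):

``` c
// systematically encode: C(x) = M(x) x^n - (M(x) x^n mod P(x));
// msg is msg_size bytes, ecc is n bytes and ends up holding the
// remainder, p holds the n low-degree coefficients of P(x), highest
// degree first, with the leading 1 implicit
static void rs_encode(uint8_t *ecc, size_t n,
        const uint8_t *msg, size_t msg_size,
        const uint8_t *p) {
    memset(ecc, 0, n);
    for (size_t i = 0; i < msg_size; i++) {
        // one step of synthetic division by P(x)
        uint8_t b = msg[i] ^ ecc[0];
        memmove(ecc, ecc+1, n-1);
        ecc[n-1] = 0;
        for (size_t k = 0; k < n; k++) {
            ecc[k] ^= gf256_mul(b, p[k]);
        }
    }
    // the codeword is then msg followed by ecc
}
```
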
As is true with normal math, subtracting the remainder after division
gives us a polynomial that is a multiple of $P(x)$. And since multiplying
anything by zero is zero, for any fixed point $g^i$, $C(g^i)$ should also
evaluate to zero:
#### Modeling errors
Ok, but what if there are errors?
We can think of introducing errors as adding an error polynomial $E(x)$
to our original codeword, where $E(x)$ contains up to $e$ non-zero terms:

$$
C'(x) = C(x) + E(x)
$$

Check out what happens if we plug in our fixed point $g^i$:

$$
C'(g^i) = C(g^i) + E(g^i) = E(g^i)
$$

The original codeword drops out! Leaving us with an equation defined only
by the error polynomial.

We call these evaluations our "syndromes" $S_i$, since they tell us
information about the errors in our codeword:

$$
S_i = C'(g^i) = E(g^i)
$$

We usually refer to the unknowns in this equation as the
"error-locations" $X_j = g^j$, and the "error-magnitudes" $Y_j = E_j$:

$$
S_i = \sum_{j \in E} Y_j X_j^i
$$

Note that finding $X_j$ also gives us $j$, since $j = \log_g X_j$. We
usually write it this way just to avoid adding a bunch of $g^j$
everywhere.

If we can figure out both the error-locations and error-magnitudes, we
have enough information to reconstruct our original codeword:

$$
C(x) = C'(x) - E(x)
$$

### Finding the error locations
Ok, let's say we received a codeword $C'(x)$ with $e$ errors. Evaluating
at our fixed points $g^i$, where $i < n$ and $n \ge 2e$, gives us our
syndromes $S_i$:

$$
S_i = C'(g^i)
$$

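A sketch of this step in C, reusing the assumed `gf256_mul` helper and
$g=2$ from the earlier sketch (ramrsbd evaluates these with Horner's
method and log tables, so the real code differs):

``` c
// evaluate the syndromes S_i = C'(g^i); c is the received codeword
// (message then ecc, highest degree first), s needs ecc_size bytes
static void rs_syndromes(uint8_t *s, size_t ecc_size,
        const uint8_t *c, size_t code_size) {
    uint8_t gi = 1;                  // g^0
    for (size_t i = 0; i < ecc_size; i++) {
        // evaluate C'(g^i) with Horner's method
        uint8_t si = 0;
        for (size_t k = 0; k < code_size; k++) {
            si = gf256_mul(si, gi) ^ c[k];
        }
        s[i] = si;
        gi = gf256_mul(gi, 2);       // next fixed point g^(i+1)
    }
}
```
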
The next step is figuring out the error-locations $X_j$.
To help with this, we introduce a very special polynomial, the
"error-locator polynomial" $\Lambda(x)$:

$$
\Lambda(x) = \prod_{j \in E} \left(1 - X_j x\right)
$$

This polynomial has some rather useful properties:
1. For any error-location $X_j$, $\Lambda(X_j^{-1})$ evaluates to zero:
This is for similar reasons why $P(g^i) = 0$. For any error-location
   $X_j$:

   $$
   1 - X_j X_j^{-1} = 1 - 1 = 0
   $$

And since multiplying anything by zero is zero, the product reduces to
   zero.

2. $\Lambda(0)$ evaluates to one:

   $$
   \Lambda(0) = 1
   $$

   This can be seen by plugging in 0:

   $$
   \Lambda(0) = \prod_{j \in E} \left(1 - X_j \cdot 0\right) = \prod_{j \in E} 1 = 1
   $$

   This prevents trivial solutions and is what makes $\Lambda(x)$ useful.

What's _really_ interesting is that these two properties allow us to
solve for $\Lambda(x)$ with only our syndromes $S_i$.

We know $\Lambda(x)$ has $e$ roots, which means we can expand it into a
polynomial with $e+1$ terms. We also know that $\Lambda(0) = 1$, so the
constant term must be 1. Giving the coefficients of this expanded
polynomial the arbitrary names
$\Lambda_1, \Lambda_2, \cdots, \Lambda_e$, we find another definition for
$\Lambda(x)$:
Note this doesn't actually change our error-locator $\Lambda(x)$, it
still has all of its original properties. For example, if we plug in
$X_j^{-1}$ it should still evaluate to zero:
And since multiplying anything by zero is zero, we can multiply this by,
say, $Y_j X_j^i$, and the result should still be zero:
We can even add a bunch of these together and the result should still be
zero:
Wait a second...
Aren't these our syndromes? $S_i = \sum_{j \in E} Y_j X_j^i$?
They are! We can rearrange this into an equation for $S_i$ using only our
coefficients $\Lambda_k$ and $e$ previously seen syndromes
$S_{i-1}, S_{i-2}, \cdots, S_{i-e}$:

$$
S_i = \Lambda_1 S_{i-1} + \Lambda_2 S_{i-2} + \cdots + \Lambda_e S_{i-e}
$$

If we repeat this $e$ times, for syndromes
$S_e, S_{e+1}, \cdots, S_{n-1}$, we end up with $e$ equations and
$e$ unknowns. A system that is, in theory, solvable:
This is where the $n=2e$ requirement comes from, and why we need $n=2e$
syndromes to solve for $e$ errors at unknown locations.

#### Berlekamp-Massey
Ok that's the theory, but solving this system of equations efficiently is
still quite difficult.

Enter [Berlekamp-Massey][w-bm].
A key observation by Massey is that solving for $\Lambda(x)$ is
equivalent to constructing an [LFSR][w-lfsr] that generates the sequence
$S_e, S_{e+1}, \dots, S_{n-1}$ given the initial state
$S_0, S_1, \dots, S_{e-1}$:

```
.---- + <- + <- + <- + <--- ... --- + <--.
| ^ ^ ^ ^ ^ |
| *Λ1 *Λ2 *Λ3 *Λ4 ... *Λe-1 *Λe
| ^ ^ ^ ^ ^ ^
| .-|--.-|--.-|--.-|--.-- --.-|--.-|--.
'-> |Se-1|Se-2|Se-3|Se-4| ... | S1 | S0 | -> Sn-1 Sn-2 ... Se+3 Se+2 Se+1 Se Se-1 Se-2 ... S3 S2 S1 S0
'----'----'----'----'-- --'----'----'
```

Pretty wild huh.
We can describe such an LFSR with a [recurrence relation][w-recurrence-relation]
that might look a bit familiar:

$$
s_i = \Lambda_1 s_{i-1} + \Lambda_2 s_{i-2} + \cdots + \Lambda_{|L|} s_{i-|L|}
$$

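As a small sketch of that recurrence in C (here `lambda` holds the
coefficients $1, \Lambda_1, \cdots, \Lambda_{|L|}$, constant term first,
and `gf256_mul` is the same assumed helper as earlier):

``` c
// generate the next symbol of the sequence from the previous |L| symbols
static uint8_t lfsr_next(const uint8_t *lambda, size_t l,
        const uint8_t *s, size_t i) {
    uint8_t si = 0;
    for (size_t k = 1; k <= l; k++) {
        si ^= gf256_mul(lambda[k], s[i-k]);
    }
    return si;
}
```
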
Berlekamp-Massey relies on two key observations:
1. If an LFSR $L(i)$ of size $|L|$ generates the sequence
$s_0, s_1, \dots, s_{n-1}$, but fails to generate the sequence
   $s_0, s_1, \dots, s_{n-1}, s_n$, then an LFSR $L'(i)$ that _does_
   generate the sequence must have a size of at least:

   $$
   |L'| \ge n+1-|L|
   $$

Massey's proof of this is very math heavy.
Consider the equation for our LFSR $L(i)$ at $n$:
If we have another LFSR $L'(i)$ that generates
$s_{n-|L|}, s_{n-|L|+1}, \cdots, s_{n-1}$, we can substitute it in for
$s_{n-k}$:
Multiplication is distributive, so we can move our summations around:
And note that the right summation looks a lot like $L(i)$. If $L(i)$
generates $s_{n-|L'|}, s_{n-|L'|+1}, \cdots, s_{n-1}$, we can replace
it with $s_{n-k'}$:
Oh hey! That's the definition of $L'(i)$:
So if $L'(i)$ generates $s_n$, $L(i)$ must also generate $s_n$.
The only way to make $L'(i)$ generate a different $s_n$ would be to
make $|L'| \ge n+1-|L|$ so that $L(i)$ can no longer generate
   $s_{n-|L'|}, s_{n-|L'|+1}, \cdots, s_{n-1}$.

2. Once we've found the best LFSR $L(i)$ for a given size $|L|$, its
definition provides an optimal strategy for changing only the last
   element of the generated sequence.

   This is assuming $L(i)$ failed of course. If $L(i)$ generated the
   whole sequence, our algorithm is done!

   If $L(i)$ failed, we assume it correctly generated
$s_0, s_1, \cdots, s_{n-1}$, but failed at $s_n$. We call the
difference from the expected symbol the discrepancy $d$:
If we know $s_i$ (which requires a larger LFSR), we can rearrange this
to be a bit more useful. We call this our connection polynomial
$C(i)$:
Now, if we have a larger LFSR $L'(i)$ with size $|L'| \gt |L|$ and we
want to change only the symbol $s'_n$ by $d'$, we can add
$d' \cdot C(i)$, and only $s'_n$ will be affected:
If you can wrap your head around those two observations, you have
understood most of Berlekamp-Massey.

The actual algorithm itself is relatively simple:
1. Using the current LFSR $L(i)$, generate the next symbol and calculate
the discrepancy $d$ between it and the expected symbol $s_n$:
2. If $d=0$, great! Move on to the next symbol.
3. If $d \ne 0$, we need to tweak our LFSR:

   1. First check if our LFSR is big enough. If $n \ge 2|L|$, we need a
      bigger LFSR:

      $$
      |L'| = n+1-|L|
      $$

      If we're changing the size, save the best LFSR at the current size
      for future tweaks:

   2. Now we can fix the LFSR by adding our last $C(i)$ (not $C'(i)$!),
      shifting and scaling so only $s_n$ is affected:
Where $m$ is the value of $n$ when we saved the last $C(i)$. If we
shift $C(i)$ every step of the algorithm, we usually don't need to
track $m$ explicitly.

This is all implemented in `ramrsbd_find_λ`.

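For reference, here's a rough sketch of the algorithm in C, with the
same assumed `gf256_mul` helper as the earlier sketches plus a (very)
naive inverse; the real `ramrsbd_find_λ` differs in details:

``` c
// naive GF(256) inverse via a^254 = a^-1; ramrsbd uses log tables instead
static uint8_t gf256_inv(uint8_t a) {
    uint8_t r = 1;
    for (int i = 0; i < 254; i++) {
        r = gf256_mul(r, a);
    }
    return r;
}

// find Λ(x) from n syndromes s via Berlekamp-Massey; lambda and conn
// each hold n/2+1 coefficients (lambda[0] is the constant term),
// assuming no more than n/2 errors; returns the LFSR size |L|
static size_t find_lambda(uint8_t *lambda, uint8_t *conn,
        const uint8_t *s, size_t n) {
    size_t size = n/2 + 1;
    memset(lambda, 0, size); lambda[0] = 1;   // Λ(x) = 1
    memset(conn,   0, size); conn[0]   = 1;   // C(x) = 1
    size_t l = 0;   // current LFSR size |L|
    uint8_t b = 1;  // the discrepancy when C(x) was last saved

    for (size_t i = 0; i < n; i++) {
        // shift C(x) every step so we don't need to track m explicitly
        memmove(conn+1, conn, size-1);
        conn[0] = 0;

        // 1. find the discrepancy d between the symbol our LFSR
        //    generates and the expected symbol s_i
        uint8_t d = s[i];
        for (size_t k = 1; k <= l && k < size; k++) {
            d ^= gf256_mul(lambda[k], s[i-k]);
        }

        // 2. if d = 0, great! move on to the next symbol
        if (d == 0) {
            continue;
        }

        // 3. otherwise, tweak our LFSR by adding the scaled C(x) so
        //    only s_i is affected
        uint8_t scale = gf256_mul(d, gf256_inv(b));
        if (2*l <= i) {
            // too small, save the current LFSR for future tweaks and
            // grow to |L'| = i+1-|L|
            for (size_t k = 0; k < size; k++) {
                uint8_t t = lambda[k];
                lambda[k] ^= gf256_mul(scale, conn[k]);
                conn[k] = t;
            }
            l = i+1 - l;
            b = d;
        } else {
            for (size_t k = 0; k < size; k++) {
                lambda[k] ^= gf256_mul(scale, conn[k]);
            }
        }
    }

    return l;
}
```
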
#### Solving binary LFSRs for fun
Taking a step away from [GF(256)][w-gf256] for a moment, let's look at a
simpler LFSR in [GF(2)][w-gf2], aka binary.

Consider this binary sequence generated by a minimal LFSR that I know and
you don't :)

```
1 1 0 0 1 1 1 1
```

Can you figure out the original LFSR?
Click here to solve with Berlekamp-Massey
---
Using Berlekamp-Massey:
To start, let's assume our LFSR is an empty LFSR that just spits out a
stream of zeros. Not the most creative LFSR, but we have to start
somewhere!

```
|L0| = 0
L0(i) = 0
C0(i) = s_i

L0 = 0 -> Output: 0
Expected: 1
d = 1
```

Ok, so immediately we see a discrepancy. Clearly our output is not a
string of all zeros, and we need _some_ LFSR:

```
|L1| = 0+1-|L0| = 1
L1(i) = L0(i) + C0(i-1) = s_i-1
C1(i) = s_i + L0(i) = s_i

.-----.
| |
| .-|--.
L1 = '-> | 1 |-> Output: 1 1
'----' Expected: 1 1
d = 0
```

That's looking much better. This little LFSR will actually get us
decently far into the sequence:

```
|L2| = |L1| = 1
L2(i) = L1(i) = s_i-1
C2(i) = C1(i-1) = s_i-1

.-----.
| |
| .-|--.
L2 = '-> | 1 |-> Output: 1 1 1
'----' Expected: 1 1 1
d = 0
```

```
|L3| = |L2| = 1
L3(i) = L2(i) = s_i-1
C3(i) = C2(i-1) = s_i-2

.-----.
| |
| .-|--.
L3 = '-> | 1 |-> Output: 1 1 1 1
'----' Expected: 1 1 1 1
d = 0
```

```
|L4| = |L3| = 1
L4(i) = L3(i) = s_i-1
C4(i) = C3(i-1) = s_i-3

.-----.
| |
| .-|--.
L4 = '-> | 1 |-> Output: 1 1 1 1 1
'----' Expected: 0 1 1 1 1
d = 1
```

Ah! A discrepancy!
We're now at step 4 with only a 1-bit LFSR. $4 \ge 2\cdot1$, so a bigger
LFSR is needed.

Resizing our LFSR to $4+1-1 = 4$, we can then add $C(i-1)$ to fix the
discrepancy, save the previous LFSR as the new $C(i)$, and continue:

```
|L5| = 4+1-|L4| = 4
L5(i) = L4(i) + C4(i-1) = s_i-1 + s_i-4
C5(i) = s_i + L4(i) = s_i + s_i-1

.---- + <------------.
| ^ |
| .-|--.----.----.-|--. Expected: 0 0 1 1 1 1
L5 = '-> | 1 | 1 | 1 | 1 |-> Output: 1 0 1 1 1 1
'----'----'----'----' d = 1
```

Another discrepancy! This time we don't need to resize the LFSR, just add
the shifted $C(i-1)$.

Thanks to math, we know this should have no effect on any of the
previously generated symbols, but feel free to regenerate the sequence to
prove this to yourself. This property is pretty unintuitive!

```
|L6| = |L5| = 4
L6(i) = L5(i) + C5(i-1) = s_i-2 + s_i-4
C6(i) = C5(i-1) = s_i-1 + s_i-2

.--------- + <-------.
| ^ |
| .----.-|--.----.-|--.
L6 = '-> | 1 | 1 | 1 | 1 |-> Output: 1 0 0 1 1 1 1
'----'----'----'----' Expected: 1 0 0 1 1 1 1
d = 0
```

No discrepancy this time, let's keep going:
```
|L7| = |L6| = 4
L7(i) = L6(i) = s_i-2 + s_i-4
C7(i) = C6(i-1) = s_i-2 + s_i-3

.--------- + <-------.
| ^ |
| .----.-|--.----.-|--.
L7 = '-> | 1 | 1 | 1 | 1 |-> Output: 1 1 0 0 1 1 1 1
'----'----'----'----' Expected: 1 1 0 0 1 1 1 1
d = 0
```

And now that we've generated the whole sequence, we have our LFSR:
```
|L8| = |L7| = 4
L8(i) = L7(i) = s_i-2 + s_i-4

.--------- + <-------.
| ^ |
| .----.-|--.----.-|--.
L8 = '-> | 1 | 1 | 1 | 1 |-> Output: 1 1 0 0 1 1 1 1
'----'----'----'----' Expected: 1 1 0 0 1 1 1 1
```

---
#### bm-lfsr-solver.py
In case you want to play around with this algorithm more (and for my own
experiments), I've ported this LFSR solver to Python in
[bm-lfsr-solver.py][bm-lfsr-solver.py]. Feel free to try your own binary
sequences:

``` bash
$ ./bm-lfsr-solver.py 1 1 0 0 1 1 1 1

... snip ...
.--------- + <-------.
| ^ |
| .----.-|--.----.-|--.
L8 = '-> | 1 | 1 | 1 | 1 |-> Output: 1 1 0 0 1 1 1 1
'----'----'----'----' Expected: 1 1 0 0 1 1 1 1
```

```
$ ./bm-lfsr-solver.py 01101000 01101001 00100001

... snip ...
.---- + <---------------- + <- + <- + <------ + <-------.
| ^ ^ ^ ^ ^ |
| .-|--.----.----.----.-|--.-|--.-|--.----.-|--.----.-|--.----.
L24 = '-> | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |-> Output: 0 1 1 0 1 0 0 0 0 1 1 0 1 0 0 1 0 0 1 0 0 0 0 1
'----'----'----'----'----'----'----'----'----'----'----'----' Expected: 0 1 1 0 1 0 0 0 0 1 1 0 1 0 0 1 0 0 1 0 0 0 0 1
```

I've also implemented a similar script,
[bm-lfsr256-solver.py][bm-lfsr256-solver.py], for full GF(256) LFSRs,
though it's a bit harder to follow unless you can do full GF(256)
multiplications in your head:

```
$ ./bm-lfsr256-solver.py 30 80 86 cb a3 78 8e 00

... snip ...
.---- + <- + <- + <--.
| ^ ^ ^ |
| *f0 *04 *df *ea
| ^ ^ ^ ^
| .-|--.-|--.-|--.-|--.
L8 = '-> | a3 | 78 | 8e | 00 |-> Output: 30 80 86 cb a3 78 8e 00
'----'----'----'----' Expected: 30 80 86 cb a3 78 8e 00
```

Is Berlekamp-Massey a good compression algorithm? Probably not.
#### Locating the errors
Coming back to Reed-Solomon, thanks to Berlekamp-Massey we can solve the
following recurrence for the terms $\Lambda_k$ given at least $n \ge 2e$
syndromes $S_i$:

$$
S_i = \Lambda_1 S_{i-1} + \Lambda_2 S_{i-2} + \cdots + \Lambda_e S_{i-e}
$$

These terms define our error-locator polynomial, which we can use to
find the locations of errors:

$$
\Lambda(x) = 1 + \Lambda_1 x + \Lambda_2 x^2 + \cdots + \Lambda_e x^e
$$

All we have left to do is figure out where $\Lambda(X_j^{-1})=0$.
The easiest way to do this is brute force: just plug in every
location $X_j=g^j$ in our codeword, and if $\Lambda(X_j^{-1}) = 0$, we
know $X_j$ is the location of an error:
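
A sketch of this brute force scan in C, with the same assumed
`gf256_mul`/`gf256_inv` helpers and $g=2$ as the earlier sketches:

``` c
// scan every possible error-location X_j = g^j in the codeword and
// record the js where Λ(X_j^-1) = 0; lambda holds the e+1 coefficients
// 1, Λ_1, ..., Λ_e; returns the number of errors found
static size_t rs_find_errors(size_t *locs,
        const uint8_t *lambda, size_t e,
        size_t code_size) {
    size_t count = 0;
    uint8_t xj = 1;                  // X_0 = g^0
    for (size_t j = 0; j < code_size; j++) {
        // evaluate Λ(X_j^-1) with Horner's method
        uint8_t xj_inv = gf256_inv(xj);
        uint8_t r = 0;
        for (size_t k = e+1; k > 0; k--) {
            r = gf256_mul(r, xj_inv) ^ lambda[k-1];
        }

        if (r == 0) {
            // found an error at position j (X_j = g^j)
            locs[count++] = j;
        }

        xj = gf256_mul(xj, 2);       // X_j+1 = g^(j+1)
    }
    return count;
}
```
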
Wikipedia and other resources often mention an optimization called
[Chien's search][w-chien] being applied here, but from reading up on the
algorithm it seems to only be useful for hardware implementations. In
software Chien's search doesn't actually improve our runtime over brute
force with Horner's method and log tables ( $O(ne)$ vs $O(ne)$ ).

### Finding the error magnitudes
Once we've found the error-locations $X_j$, the next step is to find the
error-magnitudes $Y_j$.

This step is relatively straightforward... sort of...
Recall the definition of our syndromes $S_i$:

$$
S_i = \sum_{j \in E} Y_j X_j^i
$$

With $e$ syndromes, this can be rewritten as a system with $e$ equations
and $e$ unknowns, which we can, in theory, solve for:
#### Forney's algorithm
But again, solving this system of equations is easier said than done.
Enter [Forney's algorithm][w-forney].
Assuming we know an error-location $X_j$, the following formula will spit
out an error-magnitude $Y_j$:

$$
Y_j = X_j \frac{\Omega\left(X_j^{-1}\right)}{\Lambda'\left(X_j^{-1}\right)}
$$

Where $\Omega(x)$, called the "error-evaluator polynomial", is defined
like so:

$$
\Omega(x) = S(x) \Lambda(x) \bmod x^n
$$

$S(x)$, the "syndrome polynomial", is defined like so (we just pretend
our syndromes are a polynomial now):

$$
S(x) = \sum_{i=0}^{n-1} S_i x^i
$$

And $\Lambda'(x)$, the [formal derivative][w-formal-derivative] of the
error-locator, can be calculated like so:

$$
\Lambda'(x) = \sum_{k=1}^{e} k \cdot \Lambda_k x^{k-1}
$$

Though note $k$ is not a field element, so multiplication by $k$
represents normal repeated addition. And since addition is xor in our
field, this really just cancels out every other term.

The end result is a simple formula for our error-magnitudes $Y_j$.

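A sketch of the $\Omega(x) = S(x) \Lambda(x) \bmod x^n$ step in C, under
the same assumptions as the earlier sketches (coefficient arrays stored
constant term first, assumed `gf256_mul`):

``` c
// compute the error-evaluator Ω(x) = S(x) Λ(x) mod x^n; s holds the n
// syndromes (S_0 first), lambda holds 1, Λ_1, ..., Λ_e, and omega
// needs n bytes
static void rs_find_omega(uint8_t *omega, size_t n,
        const uint8_t *s,
        const uint8_t *lambda, size_t e) {
    memset(omega, 0, n);
    for (size_t i = 0; i < n; i++) {
        // only terms below x^n survive the mod
        for (size_t k = 0; k <= e && k <= i; k++) {
            omega[i] ^= gf256_mul(s[i-k], lambda[k]);
        }
    }
}
```
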
#### WTF
Haha, I know right? Where did this equation come from? Why does it work?
How did Forney even come up with this?

I don't know the answer to most of these questions; there's very little
information online about where/how/what this formula comes from.

But at the very least we can prove that it _does_ work.
#### The error-evaluator polynomial
Let's start with the syndrome polynomial $S(x)$:
Substituting in the definition of our syndromes
$S_i = \sum_{k \in E} Y_k X_k^i$:
The sum on the right turns out to be a [geometric series][w-geometric-series]:
If we then multiply with our error-locator polynomial
$\Lambda(x) = \prod_{l \in E} \left(1 - X_l x\right)$:
We see exactly one term in each summand cancel out, the term where
$l = k$.

At this point, if we plug in $X_j^{-1}$, $S(X_j^{-1})\Lambda(X_j^{-1})$
still evaluates to zero thanks to the error-locator polynomial
$\Lambda(x)$.

But if we expand the multiplication, something interesting happens:
On the left side of the subtraction, all terms are $\le$ degree
$x^{e-1}$. On the right side of the subtraction, all terms are $\ge$
degree $x^n$.

Imagine how these both contribute to the expanded form of the equation:
If we truncate this polynomial, $\bmod x^n$ in math land, we can
effectively delete part of this equation:
Giving us an equation for the error-evaluator polynomial, $\Omega(x)$:
Note the $l \ne k$ condition.
The error-evaluator polynomial $\Omega(x)$ still contains a big chunk
of our error-locator polynomial $\Lambda(x)$, so if we plug in an
error-location $X_j^{-1}$, _most_ of the terms evaluate to zero.
Except one! The one where $j = k$:
And right there is our error-magnitude $Y_j$! Sure we end up with a
bunch of extra gobbledygook, but $Y_j$ _is_ there.

The good news is that gobbledygook depends only on our error-locations
$X_j$, which we _do_ know and can in theory remove with more math.

#### The formal derivative of the error-locator polynomial
But Forney has one last trick up his sleeve: The
[formal derivative][w-formal-derivative] of the error-locator polynomial,
$\Lambda'(x)$.

What the heck is a formal derivative?
Well we can't use normal derivatives in a finite-field like GF(256)
because they depend on the notion of a limit which depends on the field
being, well, not finite.

But derivatives are so useful mathematicians use them anyways.
Applying a formal derivative looks a lot like a normal derivative in
normal math:

$$
f(x) = \sum_i f_i x^i \quad\Rightarrow\quad f'(x) = \sum_i i \cdot f_i x^{i-1}
$$

Except $i$ here is not a finite-field element, so instead of doing
finite-field multiplication, we do normal repeated addition. And since
addition is xor in our field, this just has the effect of canceling out
every other term.

Quite a few properties of derivatives still hold in finite-fields. Of
particular interest to us is the [product rule][w-product-rule]:
Applying this to our error-locator polynomial $\Lambda(x)$:
Recall the other definition of our error-locator polynomial $\Lambda(x)$:
Applying the product rule:
Starting to look familiar?
Just like the error-evaluator polynomial $\Omega(x)$, plugging in an
error-location $X_j^{-1}$ causes _most_ of the terms to evaluate to
zero, except the one where $j = k$, revealing $X_j$ times our
gobbledygook!
#### Evaluating the errors
So for a given error-location $X_j$, the error-evaluator polynomial
$\Omega(X_j^{-1})$ gives us the error-magnitude times some gobbledygook:
And the formal derivative of the error-locator polynomial $\Lambda'(X_j^{-1})$
gives us the error-location times the same gobbledygook:
If we divide $\Omega(X_j^{-1})$ by $\Lambda'(X_j^{-1})$, all that
gobbledygook cancels out, leaving us with a simple equation containing
only $Y_j$ and $X_j$:

$$
\frac{\Omega\left(X_j^{-1}\right)}{\Lambda'\left(X_j^{-1}\right)} = Y_j X_j^{-1}
$$

All that's left is to cancel out the $X_j$ term to get our
error-magnitude $Y_j$:

$$
Y_j = X_j \frac{\Omega\left(X_j^{-1}\right)}{\Lambda'\left(X_j^{-1}\right)}
$$

### Putting it all together
Once we've figured out the error-locator polynomial $\Lambda(x)$, the
error-evaluator polynomial $\Omega(x)$, and the derivative of the
error-locator polynomial $\Lambda'(x)$, we get to the fun part, fixing
the errors!

For each location $j$ in the malformed codeword $C'(x)$, calculate the
error-location $X_j = g^j$ and plug its inverse $X_j^{-1}$ into the
error-locator $\Lambda(x)$. If $\Lambda(X_j^{-1}) = 0$ we've found the
location of an error!
To fix the error, plug the error-location $X_j$ and its inverse
$X_j^{-1}$ into Forney's algorithm to find the error-magnitude
$Y_j$. Xor $Y_j$ into the codeword to fix this error!
Repeat for all errors in the malformed codeword $C'(x)$, and with any
luck we'll find the original codeword $C(x)$!
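
As a rough sketch in C, under the same assumptions as the earlier
sketches (codeword stored highest degree first, $g=2$, assumed
`gf256_mul`/`gf256_inv` helpers), fixing a single error found at
position $j$ might look like:

``` c
// fix one error at position j (X_j = g^j) using Forney's algorithm;
// lambda holds 1, Λ_1, ..., Λ_e and omega holds Ω(x), constant term first
static void rs_fix_error(uint8_t *c, size_t code_size, size_t j,
        const uint8_t *lambda, size_t e,
        const uint8_t *omega, size_t omega_size) {
    // X_j = g^j and its inverse
    uint8_t xj = 1;
    for (size_t i = 0; i < j; i++) {
        xj = gf256_mul(xj, 2);
    }
    uint8_t xj_inv = gf256_inv(xj);

    // evaluate Ω(X_j^-1) with Horner's method
    uint8_t o = 0;
    for (size_t k = omega_size; k > 0; k--) {
        o = gf256_mul(o, xj_inv) ^ omega[k-1];
    }

    // evaluate Λ'(X_j^-1); the formal derivative keeps only Λ(x)'s odd
    // terms, each shifted down one degree
    uint8_t dl = 0;
    uint8_t xpow = 1;                // (X_j^-1)^(k-1)
    for (size_t k = 1; k <= e; k++) {
        if (k % 2 == 1) {
            dl ^= gf256_mul(lambda[k], xpow);
        }
        xpow = gf256_mul(xpow, xj_inv);
    }

    // Y_j = X_j Ω(X_j^-1) / Λ'(X_j^-1), then xor into the codeword,
    // where position j holds the coefficient of x^j
    uint8_t yj = gf256_mul(xj, gf256_mul(o, gf256_inv(dl)));
    c[code_size-1-j] ^= yj;
}
```
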
Unfortunately we're not _quite_ done yet. All of this math assumed we had
$e \le \frac{n}{2}$ errors. If we had more errors, we might have just
made things worse.

It's worth recalculating the syndromes after fixing errors to see if we
ended up with a valid codeword:

$$
S_i = C(g^i) = 0
$$

If the syndromes are all zero, chances are high we successfully repaired
our codeword.

Unless of course we had enough errors to end up overcorrecting to a
different codeword, but there's not much we can do in that case. No
error-correction is perfect.

This is all implemented in `ramrsbd_read`, if you're curious what it
looks like in code.

## Tricks
Heavy math aside, there are a couple minor implementation tricks worth
noting:

1. Truncate the generator polynomial to `ecc_size`.
Calculating the generator polynomial for $n$ bytes of ECC gives us a
polynomial with $n+1$ terms. This is a bit annoying since `ecc_size`
is often a power-of-two:
Fortunately, because of math reasons, the first term will always be 1.
So, just like with our CRC polynomials, we can leave off the leading 1
   and make it implicit.

   Division with an implicit 1 is implemented in `ramrsbd_gf_p_divmod1`,
which has the extra benefit of being able to skip the normalization
step during [synthetic division][w-synthetic-division], so that's
   nice.

2. Store the generator polynomial in ROM.
We don't really need to recompute the generator polynomial every time
we initialize the block device, it's just convenient API-wise when we
   don't know `ecc_size`.

   If you only need to support a fixed set of block device geometries,
precomputing and storing the generator polynomial in ROM will save a
   couple (`ecc_size`) bytes of RAM.

   [rs-poly.py][rs-poly.py] can help generate the generator polynomial:
``` bash
$ ./rs-poly.py 8
// generator polynomial for ecc_size=8
//
// P(x) = prod_i^n-1 (x - g^i)
//
static const uint8_t RAMRSBD_P[8] = {
0xff, 0x0b, 0x51, 0x36, 0xef, 0xad, 0xc8, 0x18,
};
```

   Which can then be provided to ramrsbd's `p` config option to avoid
   allocating the `p_buffer` in RAM.

   Unfortunately we still pay the code cost for generating the generator
   polynomial, which is difficult to avoid with the current API.

3. Minimizing the number of polynomial buffers.
We have quite a few polynomials flying around:
- $C(x)$ - Codeword buffer - `code_size` bytes
- $P(x)$ - Generator polynomial - `ecc_size` bytes (truncated)
- $S_i$ - Syndrome buffer - `ecc_size` bytes
- $\Lambda(x)$ - Error-locator polynomial - `ecc_size/2+1` (`<=ecc_size`) bytes
- $C(i)$ - Connection polynomial - `ecc_size/2+1` (`<=ecc_size`) bytes
- $\Omega(x)$ - Error-evaluator polynomial - `ecc_size/2` (`<=ecc_size`) bytes
- $\Lambda'(x)$ - Derivative of the error-locator polynomial - `ecc_size/2` (`<=ecc_size`) bytes

These get a bit annoying in a malloc-less system.
Fortunately, there are a couple places we can reuse buffers:
1. The connection polynomial $C(i)$ is only needed for
Berlekamp-Massey and we can throw it away as soon as the
error-locator $\Lambda(x)$ is found.

2. We only need the syndrome buffer $S_i$ to find $\Lambda(x)$,
$\Omega(x)$, and $\Lambda'(x)$. After that we have everything we
need to start fixing errors.

By sharing these buffers with polynomials computed later, such as the
error-evaluator $\Omega(x)$ and derivative of the error-locator
$\Lambda'(x)$, we can reduce the total number of buffers needed down
to 1 `code_size` buffer and 4 `ecc_size` buffers (3 if the generator
polynomial is stored in ROM).

4. Fused derivative evaluation.
The formal derivative is a funny operation:
Unlike other steps, it's a simple transformation that only affects one
term at a time. This makes it easy to apply lazily without needing to
copy the original polynomial.

We already need to keep the error-locator polynomial $\Lambda(x)$
around to, you know, locate the errors. So if we merge the derivative
and evaluation of the error-locator into a single operation, we don't
actually need to store the derivative of the error-locator
$\Lambda'(x)$ as a separate polynomial. In theory reducing the number
of buffers needed during error-evaluation from 3 down to 2.

This sort of fused derivative evaluation is implemented in
`ramrsbd_gf_p_deval` via a modified [Horner's method][w-horner].

Unfortunately, we still need at least 3 buffers for Berlekamp-Massey
( $S_i$, $\Lambda(i)$, and $C(i)$ ), so this doesn't actually save us
anything.

But it doesn't really cost us anything either, cleans up the code a
little bit, and lets us avoid clobbering the syndrome buffer $S_i$
which is useful for debugging.

## Caveats
And some caveats:
1. For any error-correcting code, attempting to **correct** errors
   reduces the code's ability to **detect** errors.

   In Reed-Solomon's case, for $n$ bytes of ECC, we can detect up to
$n$ byte-errors, but only correct up to
$\left\lfloor\frac{n}{2}\right\rfloor$ byte-errors. Attempting to
correct more errors can cause us to end up with a valid, but wrong,
   codeword.

   In practice this isn't that big of a problem. Fewer byte-errors are
more common, and correcting byte-errors is usually more useful. At
$n+1$ byte-errors you're going to end up with undetectable errors
   anyways.

   Still, it's good to be aware of this tradeoff.
ramrsbd's `error_correction` config option lets you control exactly
how many byte-errors to attempt to repair in case better detection is
   more useful.

2. Limited to 255 byte codewords - the non-zero elements of GF(256).
An important step in Reed-Solomon is mapping each possible error
location to a non-zero element of our finite field $X_j=g^j$.
Unfortunately our finite-field is, well, finite, so there's only so
many non-zero elements we can use before error-locations start to
   alias.

   This gives us a maximum codeword size of 255 bytes in GF(256),
   including the bytes used for ECC. A bit annoying, but math is math.

   In theory you can increase the maximum codeword size by using a larger
finite-field, but this gets a bit tricky because the log/pow table
approach used in ramrsbd stops being practical. 512 bytes of tables
for GF(256) is fine, but 128 KiBs of tables for GF(2^16)? Not so
   much...

   1. If you have [carryless-multiplication][w-clmul] hardware available,
GF(2^n) multiplication can be implemented efficiently by combining
      multiplication and [Barrett reduction][w-barret-reduction].

      Division can then be implemented on top of multiplication by
leveraging the fact that $a^{2^n-2} = a^{-1}$ for any element $a$
in GF(2^n). [Binary exponentiation][w-binary-exponentiation] can
      make this somewhat efficient.

   2. In the same way GF(256) is defined as an
[extension field][w-extension-field] of GF(2), we can define
GF(2^16) as an extension field of GF(256), where each element is a
      2 byte polynomial containing digits in GF(256).

      This can be convenient if you already need GF(256) tables for other
      parts of the codebase.

   Or, a simpler alternative, you can just pack multiple "physical"
   codewords into one "logical" codeword.

   You could even consider interleaving the physical codewords if you
want to maintain the systematic encoding or are trying to protect
   against specific error patterns.

3. Support for known-location "erasures" left as an exercise for the
reader.All of the above math assumes we don't know the location of errors,
   which is the most common case for block devices.

   But it turns out if we _do_ know the location of errors, via parity
bits or some other side-channel, we can do quite a bit better. We
   usually call these known-location errors "erasures".

   With Reed-Solomon, each unknown-location error requires 2 bytes of ECC
to find and repair, while known-location erasures require only 1 byte
of ECC to repair. You can even mix and match $e$ errors and $f$
   erasures as long as you have $n$ bytes of ECC such that:

   $$
   2e + f \le n
   $$

This isn't implemented in ramrsbd, but, _in theory_, the math isn't
   too difficult to extend.

   First note we can split $\Lambda(x)$ into a separate error-locator
   polynomial $\Lambda_E(x)$ and erasure-locator polynomial
   $\Lambda_F(x)$:

   $$
   \Lambda(x) = \Lambda_E(x) \Lambda_F(x)
   $$

We know the location of the known-location erasures, so the
erasure-locator $\Lambda_F(x)$ is trivial to calculate:
Before we can find the error-locator polynomial $\Lambda_E(x)$, we
need to modify our syndromes to hide the effects of the
erasure-locator polynomial. These are often called the Forney
syndromes $S_{Fi}$:
Note that the Forney syndromes $S_{Fi}$ still satisfy the equation for
$\Omega(x)$:
We can then use Berlekamp-Massey with the Forney syndromes $S_{Fi}$ to
   find the error-locator polynomial $\Lambda_E(x)$.

   Combining the error-locator polynomial $\Lambda_E(x)$ and the
erasure-locator polynomial $\Lambda_F(x)$ gives us the creatively
named error-and-erasure-locator-polynomial $\Lambda(x)$, which
contains everything we need to know to find the location of both
errors and erasures:
At this point we can continue Reed-Solomon as normal, finding the
error/erasure locations where $\Lambda(X_j^{-1})=0$, and repairing
them with Forney's algorithm,
$Y_j = X_j \frac{\Omega(X_j^{-1})}{\Lambda'(X_j^{-1})}$:
## References
- [Massey, J. L. - Shift-register synthesis and BCH decoding][massey-srsbchd]
- [Gill, J. - EE387 Notes #7][gill-notes7]
- [Truong, T. K., Hsu, I. S., Eastman, W. L., Reed, I. S. - A Simplified Procedure for Correcting Both Errors and Erasures of a Reed-Solomon Code Using the Euclidean Algorithm][truong-spcbeerscuea]
- [Wikiversity - Reed Solomon for coders][wikiversity-rs]
- [Wikipedia - Reed Solomon][w-rs]
- [Wikipedia - Berlekamp-Massey algorithm][w-bm]
- [Wikipedia - Forney Algorithm][w-forney]
- [Wikipedia - BCH Code][w-bch]
- [Wikipedia - Finite field arithmetic][w-gf]
- [Wikipedia - GF(2)][w-gf2]
- [Wikipedia - Systematic Code][w-systematic-code]
- [Wikipedia - Primitive element (finite field)][w-generator]
- [Wikipedia - Linear-feedback shift register (LFSR)][w-lfsr]
- [Wikipedia - Recurrence Relation][w-recurrence-relation]
- [Wikipedia - Chien search][w-chien]
- [Wikipedia - Formal derivative][w-formal-derivative]
- [Wikipedia - Geometric series][w-geometric-series]
- [Wikipedia - Product rule][w-product-rule]
- [Wikipedia - Synthetic division][w-synthetic-division]
- [Wikipedia - Horner's method][w-horner]
- [Wikipedia - Carry-less product][w-clmul]
- [Wikipedia - Barrett reduction][w-barret-reduction]
- [Wikipedia - Exponentiation by squaring][w-binary-exponentiation]
- [Wikipedia - Field extension][w-extension-field]

[massey-srsbchd]: http://crypto.stanford.edu/~mironov/cs359/massey.pdf
[gill-notes7]: https://web.archive.org/web/20140630172526/http://web.stanford.edu/class/ee387/handouts/notes7.pdf
[truong-spcbeerscuea]: https://ntrs.nasa.gov/api/citations/19880003316/downloads/19880003316.pdf
[wikiversity-rs]: https://en.wikiversity.org/wiki/Reed%E2%80%93Solomon_codes_for_coders
[w-rs]: https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction
[w-bm]: https://en.wikipedia.org/wiki/Berlekamp%E2%80%93Massey_algorithm
[w-forney]: https://en.wikipedia.org/wiki/Forney_algorithm
[w-bch]: https://en.wikipedia.org/wiki/BCH_code
[w-pgz]: https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction#Peterson%E2%80%93Gorenstein%E2%80%93Zierler_decoder
[w-euclidean]: https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction#Euclidean_decoder
[w-gf]: https://en.wikipedia.org/wiki/Finite_field_arithmetic
[w-gf2]: https://en.wikipedia.org/wiki/GF(2)
[w-gf256]: https://en.wikipedia.org/wiki/Finite_field_arithmetic#Effective_polynomial_representation
[w-systematic-code]: https://en.wikipedia.org/wiki/Systematic_code
[w-generator]: https://en.wikipedia.org/wiki/Primitive_element_(finite_field)
[w-lfsr]: https://en.wikipedia.org/wiki/Linear-feedback_shift_register
[w-recurrence-relation]: https://en.wikipedia.org/wiki/Recurrence_relation
[w-chien]: https://en.wikipedia.org/wiki/Chien_search
[w-formal-derivative]: https://en.wikipedia.org/wiki/Formal_derivative
[w-geometric-series]: https://en.wikipedia.org/wiki/Geometric_series
[w-product-rule]: https://en.wikipedia.org/wiki/Product_rule
[w-synthetic-division]: https://en.wikipedia.org/wiki/Synthetic_division
[w-horner]: https://en.wikipedia.org/wiki/Horner%27s_method
[w-clmul]: https://en.wikipedia.org/wiki/Carry-less_product
[w-barret-reduction]: https://en.wikipedia.org/wiki/Barrett_reduction
[w-binary-exponentiation]: https://en.wikipedia.org/wiki/Exponentiation_by_squaring
[w-extension-field]: https://en.wikipedia.org/wiki/Field_extension

[littlefs]: https://github.com/littlefs-project/littlefs
[ramcrc32bd]: https://github.com/geky/ramcrc32bd

[bm-lfsr-solver.py]: bm-lfsr-solver.py
[bm-lfsr256-solver.py]: bm-lfsr256-solver.py
[rs-poly.py]: rs-poly.py