Host: GitHub
URL: https://github.com/geky/ramcrc32bd
Owner: geky
License: bsd-3-clause
Created: 2024-10-08T15:45:04.000Z (about 1 month ago)
Default Branch: master
Last Pushed: 2024-10-23T18:30:22.000Z (28 days ago)
Last Synced: 2024-10-24T04:08:22.462Z (28 days ago)
Language: C
Size: 50.8 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md

## ramcrc32bd

An example of a CRC-32 based error-correcting block device backed by RAM.

```

corrupted:

        '         ..:::::::::::.:.                  .:::....:::.. :.'  ''::..:':: '.:.::'.: '':.

               .:::::::::::::::::::..           .   ::'::::::::...:::.: :'.'::: .' ..:.: :.'  '.

            '.::::::::::::::::::::::::.          '  ::::::::::' . .' ': .  :'' . ': .: :. ..:::

           .:::::::::::::::::::::::::::.    .    . ::::':::::   ' . ''.:. ..:. : ::.':.'::. ....

   .      .::::::::::::::::::::::::::::::    ... :: :'::::::'    :.:....:..:  .  .   ' '':.:   :

          :::::::::::::::::::::::::.:::'' .. '::: '     '''     :..'::'.'''.:'. :''::. ' '  .'..

         ::::::::::::'::::::::::::::''..:::::''                 : .'' ::'. : .:'.''  .'' ' .. ..

         :::::::::::::::::::::::'' ..:::::''                  . ''' :'.: :'.:. :': .':'..': :.'.

         :::::':::::::::::::'' ..:::::'' .'                    '' .::.   . . :: .''::'.  '.. ...

     ..:'::::::::::::::''' ..:::::'' ...::                      '' ::..':' :'  .....:'' :'.  ::.

  . :::'  :::::::::'  ..::::::': ..:::::::        .              '' .::  :. '.: .''': :: .:.'' :

 :::'     ':::'' ...:.::::':...::::::::::                       ': :.:':: ..: .'' : ::..: .'' :

::'         ...::::::'' ..:::::':':::::'               '      . . .:' :  ''':':'.:..' :'::':.'':

     ....::::::::' . ::::::::::.::::::'     .                   ::.: :' . '.:. .. .' ...'.:':.'

'::::::::'''    :::::::::::::::::::''    '                      . :: ::' :' ':' .'...'':  ' ':':

                  '':::::::::::''' .   .       '          '     ::::. '.'. :::   . . '.' '  '..'

```

```

corrected:

                  ..:::::::::::...                  .:::....:::.. :.'  '':: .:::: '.:.:::.: '':.

               .:::::::::::::::::::..               ::::::::::::..:::.: :'.':': .' ..:.: :.'  '.

             .::::::::::::::::::::::::.             ::::::::::' . .' ': .  :'' . ': .: :. ..:::

           .:::::::::::::::::::::::::::.         . ::::::::::   ' . ''.:. ..:. : ::.':.'::. ....

          .::::::::::::::::::::::::::::::    ... :: :'::::::'    :.:....:..:  .  .   ' '':.:   :

          :::::::::::::::::::::::::::::'' .. '::: '     '''     :..'::'.'''.:'..:''::. ' '  .'..

         :::::::::::::::::::::::::::''..::::: '                 : .'' ::'. : .::.''  .'' ' .. ..

         :::::::::::::::::::::::'' ..:::::''                    ''' :'.: ''.:. :': :':'..': :.'.

         :::::::::::::::::::'' ..:::::'' .                      ' .::.   . . :: .''::'.  '.. ...

     ..: ::::::::::::::''' ..:::::'' ..:::                      '' ::..':' :'  .....:'' :'.  ::.

  ..:::'  :::::::::'' ..::::::'' ..:::::::                       '' .::  :. '.: .''': :: .:.'' :

 :::'     ':::'' ...::::::''...::::::::::                       ': ' :':: ..: .'' : ::..: .'' :

::'         ...::::::'' ..:::::::::::::'                        . .:' :  ''':':'.:..' :'::':.'':

     ....:::::::'' ..:::::::::::::::::'                         ::.: :' . '.:. .. .' ...'.:':.'

'::::::::'''    :::::::::::::::::::''                           . :: ::' :' ':' .'...'':  ' ':':

                  '':::::::::::'''                              :::'. '.'. :::   . . '.' '  '..'

```

Often overlooked, the humble [CRC][w-crc] can already provide a simple

form of error detection and correction, capable of repairing a handful of

bit-errors.

Assuming a Hamming distance `HD` for a given codeword size, ramcrc32bd

can correct up to `floor((HD-1)/2)` bit errors. In its current

configuration, ramcrc32bd can correct:

- 1 bit error up to ~512 MiB codewords  (HD=3 up to 4294967263 bits)

- 2 bit errors up to 371 byte codewords (HD=5 up to 2974 bits)

- 3 bit errors up to 21 byte codewords  (HD=7 up to 171 bits)

It does scale poorly, $O(n^e)$ for $n$-bit codewords and $e$ bit-errors,
but if you're only worried about the occasional one or two bit errors, it
may be sufficient. It's hard to beat the simplicity, low cost, and
hardware availability of CRCs.

This block device uses littlefs's CRC-32, since we assume it's already

available. But the same idea can be extended to any other CRC, as long

as it has a sufficient [Hamming distance](#hamming-distance) for the

desired codeword size.

A quick comparison of current ram-ecc-bds:

|            | code   | tables | stack | buffers  | runtime                  |
|:-----------|-------:|-------:|------:|---------:|-------------------------:|
| ramcrc32bd |  940 B |   64 B |  88 B |      0 B |      $O\left(n^e\right)$ |
| ramrsbd    | 1506 B |  512 B | 128 B | n + 4e B | $O\left(ne + e^2\right)$ |

See also:

- [littlefs][littlefs]

- [ramrsbd][ramrsbd]

## RAM?

Right now, [littlefs's][littlefs] block device API is limited in terms of

composability. While it would be great to fix this on a major API change,

in the meantime, a RAM-backed block device provides a simple example of

error-correction that users may be able to reimplement in their own

block devices.

## Testing

Testing is a bit jank right now, relying on littlefs's test runner:

``` bash

$ git clone https://github.com/littlefs-project/littlefs -b v2.9.3 --depth 1

$ make test -j

```

## How it works

First, a quick primer on [CRCs][w-crc].

Part of why CRCs are so prevalent is that they are mathematically quite
pure. You view your message as a big [binary polynomial][w-polynomial-ring],
divide it by a predetermined "generator polynomial" (choosing a good
polynomial is the hard part), and the remainder is your CRC:

```

message = "hi!":

    = 01101000 01101001 00100001

polynomial = 0x107:

    = 1 00000111

binary division:

    = 01101000 01101001 00100001 00000000

    ^  1000001 11

    = 00101001 10101001 00100001 00000000

    ^   100000 111

    = 00001001 01001001 00100001 00000000

    ^     1000 00111

    = 00000001 01110001 00100001 00000000

    ^        1 00000111

    = 00000000 01110110 00100001 00000000

    ^           1000001 11

    = 00000000 00110111 11100001 00000000

    ^            100000 111

    = 00000000 00010111 00000001 00000000

    ^             10000 0111

    = 00000000 00000111 01110001 00000000

    ^               100 000111

    = 00000000 00000011 01101101 00000000

    ^                10 0000111

    = 00000000 00000001 01100011 00000000

    ^                 1 00000111

    = 00000000 00000000 01100100 00000000

    ^                    1000001 11

    = 00000000 00000000 00100101 11000000

    ^                     100000 111

    = 00000000 00000000 00000101 00100000

    ^                        100 000111

    = 00000000 00000000 00000001 00111100

    ^                          1 00000111

    -------------------------------------

    = 00000000 00000000 00000000 00111011

                                 '--.---'

        .---------------------------'

        v

crc = 0x3b

```
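
The long division above maps onto a short bit-at-a-time loop. The following is only a sketch for the 8-bit example, not ramcrc32bd's code: the name `crc8` is made up here, and it assumes a zero initial value, no bit reflection, and no final xor, which is what the worked example uses.

```c
#include <stddef.h>
#include <stdint.h>

// Bit-at-a-time CRC-8 over the polynomial 0x107 (x^8 + x^2 + x + 1),
// assuming a zero initial value, no bit reflection, and no final xor
uint8_t crc8(const uint8_t *data, size_t size) {
    uint8_t crc = 0;
    for (size_t i = 0; i < size; i++) {
        // bring the next 8 message bits into the running remainder
        crc ^= data[i];
        for (int j = 0; j < 8; j++) {
            if (crc & 0x80) {
                // the leading bit is about to shift into x^8, so
                // subtract (xor) the polynomial; the x^8 terms cancel,
                // which leaves only the low byte 0x07 to apply
                crc = (uint8_t)((crc << 1) ^ 0x07);
            } else {
                crc = (uint8_t)(crc << 1);
            }
        }
    }
    return crc;
}
```

Calling `crc8` on the three bytes of `"hi!"` gives `0x3b`, the same remainder as the division above.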

You can describe this mathematically in [GF(2)][w-gf2], but depending on

your experience with GF(2) and other finite-fields, the above example may

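be easier to understand:

$$
C(x) = M(x) x^{|P|} - \left(M(x) x^{|P|} \bmod P(x)\right)
$$

Here $M(x)$ is the message, $P(x)$ is the generator polynomial, $|P|$ is
the size of the CRC in bits, and $C(x)$ is the resulting codeword.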

The extra $x^{|P|}$ multiplications represent shifting the message to
make space for the CRC, and give us what's called a

["systematic code"][w-systematic-code]. Alternatively we could actually

multiply the message with our polynomial to get valid codewords, but that

would just make interacting with the message more annoying without much

benefit...
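
As a rough sketch of what "systematic" means in practice: the message bytes are copied untouched and the CRC is appended, and a codeword is valid when dividing all of it, CRC included, leaves no remainder. The helper names `crc8_encode` and `crc8_isvalid` are hypothetical, and the CRC-8 here assumes the same zero-init, unreflected convention as the worked example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// CRC-8 over 0x107, zero initial value, no reflection, no final xor
uint8_t crc8(const uint8_t *data, size_t size) {
    uint8_t crc = 0;
    for (size_t i = 0; i < size; i++) {
        crc ^= data[i];
        for (int j = 0; j < 8; j++) {
            crc = (crc & 0x80) ? (uint8_t)((crc << 1) ^ 0x07)
                               : (uint8_t)(crc << 1);
        }
    }
    return crc;
}

// Build a systematic codeword: the message, unchanged, followed by one
// CRC byte. The caller provides room for msg_size + 1 bytes.
void crc8_encode(const uint8_t *msg, size_t msg_size, uint8_t *codeword) {
    memcpy(codeword, msg, msg_size);
    codeword[msg_size] = crc8(msg, msg_size);
}

// A codeword is valid if the remainder over message + CRC is zero
int crc8_isvalid(const uint8_t *codeword, size_t codeword_size) {
    return crc8(codeword, codeword_size) == 0;
}
```

The message stays directly readable at the front of the codeword, which is the main convenience of a systematic code.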

The neat thing is that this remainder operation does a real good job of

mixing up all the bits. So if you choose a good CRC polynomial, it's very

unlikely a message with a bit-error will result in the same CRC:

```

a couple 1-bit errors:

    = 01101010 01101001 00100001 00000000 => 11101101 (0xed != 0x3b)

    = 01101000 01101000 00100001 00000000 => 00101110 (0x2e != 0x3b)

    = 01101000 01101001 01100001 00000000 => 11111100 (0xfc != 0x3b)

    = 01101000 01101001 00100001 00001000 => 00110011 (0x33 != 0x3b)

```

### Hamming distance

How unlikely? Well thanks to [Philip Koopman's exhaustive CRC work][koopman-crc],

we know exactly how many bit-errors we need to see a collision for a

given CRC polynomial and message size. This is called the

[Hamming distance][w-hd], and is a very useful metric for an

error-correcting code.

Koopman's [NOTES page][koopman-notes] may be the best starting point for

interpreting these results. They're dense but an amazing resource!

For this 8-bit CRC, [p=0x107][koopman-p=0x107], Philip Koopman's work

shows a Hamming distance of 4 up to a message size of

119 bits (14 bytes), which means our 3-byte message should have no

collisions up until 4 bit-errors:

```

a 4-bit collision:

    = 01101000 01101000 00100110 00000000 => 00111011 (0x3b == 0x3b)

```
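
For codewords this small, the Hamming distance can also be checked by exhaustive search. The sketch below is an illustration rather than anything from ramcrc32bd: it packs the 3-byte message and its CRC into a 32-bit word and looks for the fewest bit-flips that produce another valid codeword. All function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

// CRC-8 over 0x107, zero initial value, no reflection, no final xor
uint8_t crc8(const uint8_t *data, size_t size) {
    uint8_t crc = 0;
    for (size_t i = 0; i < size; i++) {
        crc ^= data[i];
        for (int j = 0; j < 8; j++) {
            crc = (crc & 0x80) ? (uint8_t)((crc << 1) ^ 0x07)
                               : (uint8_t)(crc << 1);
        }
    }
    return crc;
}

// Remainder of a 32-bit codeword (3 message bytes + 1 CRC byte),
// most-significant byte first; zero means the codeword is valid
uint8_t crc8_u32(uint32_t codeword) {
    uint8_t buf[4] = {
        (uint8_t)(codeword >> 24), (uint8_t)(codeword >> 16),
        (uint8_t)(codeword >> 8),  (uint8_t)(codeword >> 0),
    };
    return crc8(buf, 4);
}

// Can flipping exactly `flips` more bits, all at positions below
// `limit`, turn `word` into a valid codeword?
int crc8_collides(uint32_t word, int flips, int limit) {
    if (flips == 0) {
        return crc8_u32(word) == 0;
    }
    for (int i = 0; i < limit; i++) {
        // only recurse into lower positions so each set of bits is
        // tried once
        if (crc8_collides(word ^ ((uint32_t)1 << i), flips - 1, i)) {
            return 1;
        }
    }
    return 0;
}

// Fewest bit-flips from a valid codeword to another valid codeword,
// or 0 if none is found within max_flips
int crc8_hamming_distance(uint32_t codeword, int max_flips) {
    for (int flips = 1; flips <= max_flips; flips++) {
        if (crc8_collides(codeword, flips, 32)) {
            return flips;
        }
    }
    return 0;
}
```

Starting from `0x6869213b` (`"hi!"` followed by its CRC), the search finds nothing at 1, 2, or 3 flips, and the first collisions show up at 4 flips, which agrees with the HD=4 figure above.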

But the interesting thing about Hamming distance is that it's, well, a

distance.

A Hamming distance of 4 means that there are at least 3 invalid codewords

between every valid codeword:

```

                      Hamming distance = 4

.-----------------------------'-----------------------------.

o <-bit-flip-> x <-bit-flip-> x <-bit-flip-> x <-bit-flip-> o

^              '--------------.--------------'              ^

|                             |                             |

valid codeword         invalid codewords       valid codeword

```

If we assume our message has a single bit-error, it will be 1 bit-flip

away from the original codeword, and at least 3 bit-flips away from any

other codeword. It's not until we have 2 bit-errors that the original

codeword becomes ambiguous.

But this is only an 8-bit CRC. With more bits, we can usually find a

better CRC. littlefs's 32-bit CRC, [p=0x104c11db7][koopman-p=0x104c11db7],

for example, has a Hamming distance of 7 up to 171 bits (21 bytes), which

means for any message $\le$ 21 bytes we can reliably correct up to

3 bit-errors.

In general the number of bits we can reliably correct is

$\left\lfloor\frac{\text{HD}-1}{2}\right\rfloor$.
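
To make the arithmetic concrete, here is a small sketch that turns a Hamming distance into a correctable bit-error count. The function names are hypothetical, and the lookup simply hard-codes the three CRC-32 limits quoted at the top of this README; it makes no claim beyond those figures.

```c
#include <stdint.h>

// Number of bit-errors that can be reliably corrected at a given
// Hamming distance: floor((HD-1)/2)
unsigned hd_correctable(unsigned hd) {
    return (hd == 0) ? 0 : (hd - 1) / 2;
}

// Hamming distance of the CRC-32 polynomial 0x104c11db7 for a given
// size in bits, using only the limits quoted in this README
unsigned crc32_hd(uint64_t bits) {
    if (bits <= 171) {
        return 7;
    } else if (bits <= 2974) {
        return 5;
    } else if (bits <= 4294967263ULL) {
        return 3;
    } else {
        // beyond the quoted limits, assume nothing is correctable
        return 0;
    }
}
```

For example, `hd_correctable(crc32_hd(21*8))` gives 3, while `hd_correctable(4)` gives 1, matching the 8-bit example where only a single bit-error can be repaired.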

### There's always brute force

Ok, but that's enough about theory. How do we actually correct these
bit-errors?

The simple/naive/cheap answer is brute force. Try every bit-flip until we

find a matching CRC. Since we know our Hamming distance is $\ge$ 3, this

should only ever find one valid codeword, the original codeword:

```

brute force search:

    = 01101000 01101001 01100001 00000000 => 11111100 (0xfc != 0x3b)

    ^ 1                                   => 11110111 (0xf7 != 0x3b)

    ^  1                                  => 01111010 (0x7a != 0x3b)

    ^   1                                 => 10111111 (0xbf != 0x3b)

    ^    1                                => 01011110 (0x5e != 0x3b)

    ^     1                               => 10101101 (0xad != 0x3b)

    ^      1                              => 01010111 (0x57 != 0x3b)

    ^       1                             => 00101010 (0x2a != 0x3b)

    ^        1                            => 10010111 (0x97 != 0x3b)

    ^          1                          => 01001010 (0x4a != 0x3b)

    ^           1                         => 10100111 (0xa7 != 0x3b)

    ^            1                        => 01010010 (0x52 != 0x3b)

    ^             1                       => 10101011 (0xab != 0x3b)

    ^              1                      => 01010100 (0x54 != 0x3b)

    ^               1                     => 10101000 (0xa8 != 0x3b)

    ^                1                    => 11010110 (0xd6 != 0x3b)

    ^                 1                   => 11101001 (0xe9 != 0x3b)

    ^                   1                 => 01110101 (0x75 != 0x3b)

    ^                    1                => 00111011 (0x3b == 0x3b) !!! found our bit-error

corrected message:

    = 01101000 01101001 00100001 00000000 => 00111011 (0x3b == 0x3b)

```

If we don't find a valid codeword, we must have had at least 2

bit-errors, making our original codeword unrecoverable.
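
A minimal sketch of this naive search for the 8-bit example, assuming a codeword laid out as message bytes followed by one CRC byte. The name `crc8_correct1` and its return convention are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

// CRC-8 over 0x107, zero initial value, no reflection, no final xor
uint8_t crc8(const uint8_t *data, size_t size) {
    uint8_t crc = 0;
    for (size_t i = 0; i < size; i++) {
        crc ^= data[i];
        for (int j = 0; j < 8; j++) {
            crc = (crc & 0x80) ? (uint8_t)((crc << 1) ^ 0x07)
                               : (uint8_t)(crc << 1);
        }
    }
    return crc;
}

// Try to repair a single bit-error in place.
//
// Returns 0 if the codeword was already valid, 1 if one bit was
// corrected, or -1 if no single bit-flip gives a valid codeword.
int crc8_correct1(uint8_t *codeword, size_t size) {
    if (crc8(codeword, size) == 0) {
        return 0;
    }
    for (size_t i = 0; i < 8*size; i++) {
        uint8_t mask = (uint8_t)(0x80 >> (i % 8));
        // try flipping bit i, most-significant bit first
        codeword[i / 8] ^= mask;
        if (crc8(codeword, size) == 0) {
            return 1;
        }
        // no luck, undo the flip and keep looking
        codeword[i / 8] ^= mask;
    }
    return -1;
}
```

Note this version recomputes the whole CRC for every candidate, so it does more work than strictly necessary. That is fine for a 4-byte codeword, but it is the first thing to optimize for anything larger.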

This idea can be extended to CRCs with larger Hamming distances by brute

force searching multiple bit-errors with nested loops. See

`ramcrc32bd_read` for an example of up to 3 bit-errors with littlefs's

CRC-32.
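
The nested-loop idea might look roughly like the following for 2 bit-errors. This is a standalone sketch, not the code in `ramcrc32bd_read`: `crc32_simple` and `crc32_correct2` are hypothetical names, and the CRC-32 here assumes a zero initial value, no bit reflection, and the CRC stored most-significant byte first, which may differ from littlefs's conventions.

```c
#include <stddef.h>
#include <stdint.h>

// Bit-at-a-time CRC-32 over the polynomial 0x104c11db7, assuming a
// zero initial value, no bit reflection, and no final xor
uint32_t crc32_simple(const uint8_t *data, size_t size) {
    uint32_t crc = 0;
    for (size_t i = 0; i < size; i++) {
        crc ^= (uint32_t)data[i] << 24;
        for (int j = 0; j < 8; j++) {
            crc = (crc & 0x80000000) ? (crc << 1) ^ 0x04c11db7
                                     : (crc << 1);
        }
    }
    return crc;
}

// Flip one bit of the codeword, counting from the most-significant bit
void crc32_flip(uint8_t *codeword, size_t bit) {
    codeword[bit / 8] ^= (uint8_t)(0x80 >> (bit % 8));
}

// Try to repair up to 2 bit-errors in place, where the codeword is the
// message followed by its 4-byte CRC.
//
// Returns the number of bit-errors corrected, or -1 if no valid
// codeword is found within 2 bit-flips.
int crc32_correct2(uint8_t *codeword, size_t size) {
    size_t n = 8*size;
    if (crc32_simple(codeword, size) == 0) {
        return 0;
    }
    // 1 bit-error, O(n) candidates
    for (size_t i = 0; i < n; i++) {
        crc32_flip(codeword, i);
        if (crc32_simple(codeword, size) == 0) {
            return 1;
        }
        crc32_flip(codeword, i);
    }
    // 2 bit-errors, O(n^2) candidates
    for (size_t i = 0; i < n; i++) {
        crc32_flip(codeword, i);
        for (size_t j = i+1; j < n; j++) {
            crc32_flip(codeword, j);
            if (crc32_simple(codeword, size) == 0) {
                return 2;
            }
            crc32_flip(codeword, j);
        }
        crc32_flip(codeword, i);
    }
    return -1;
}
```

This is only safe to use while the codeword is small enough for HD $\ge$ 5, otherwise the first valid codeword found may not be the original one.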

## Tricks

There are a couple implementation tricks worth noting in ramcrc32bd:

1. Try the faster solutions first.

   Correcting 1 bit-error, $O(n)$, is much faster than correcting
   2 bit-errors, $O(n^2)$, and 1 bit-errors are also much more common. It
   makes sense to only search for more bit-errors when a solution with
   fewer bit-errors can't be found.

   By trying fewer bit-errors first, ramcrc32bd should return quickly in

   the common case of few/no bit-errors.

   Though this does risk degraded performance over time as bit-errors

   develop.

2. We don't actually need to permute the message to try every bit-flip.

   First note that since CRCs are a glorified remainder operation,

   shifting a message (multiplying by $x$ in GF(2)) and then calculating

   the CRC is equivalent to shifting the CRC and then calculating the

   remainder:

   ```

   crc(a << 1):

       = 00111001 10110100 00110110 00000000 => 01000010 (0x42)

       s 01110011 01101000 01101100 00000000 => 10000100 (0x84)

       s 11100110 11010000 11011000 00000000 => 00001111 (0x0f)

   (crc(a) << 1) % p:

       = 00111001 10110100 00110110 00000000 =>     01000010 (0x42)

                                                s 0 10000100

                                                ^ 0 00000000

                                                =   10000100 (0x84)

                                                s 1 00001000

                                                ^ 1 00000111

                                                =   00001111 (0x0f)

   ```

   We can use this to quickly iterate through all CRCs that represent a

   single bit:

   ```

   a = (a << 1) % p:

       =                            00000001 => 00000001 (0x01)

       s                            00000010 => 00000010 (0x02)

       s                            00000100 => 00000100 (0x04)

       s                            00001000 => 00001000 (0x08)

       s                            00010000 => 00010000 (0x10)

       s                            00100000 => 00100000 (0x20)

       s                            01000000 => 01000000 (0x40)

       s                            10000000 => 10000000 (0x80)

       s                         (1)00000000 => 00000111 (0x07)

       s                        (1) 00000000 => 00001110 (0x0e)

       s                       (1)  00000000 => 00011100 (0x1c)

       s                      (1)   00000000 => 00111000 (0x38)

       s                     (1)    00000000 => 01110000 (0x70)

       s                    (1)     00000000 => 11100000 (0xe0)

       s                   (1)      00000000 => 11000111 (0xc7)

       s                  (1)       00000000 => 10001001 (0x89)

   ```

   Combining this with the fact that CRCs are linear, i.e. the CRC of the

   xor of two messages (addition in GF(2)) is equivalent to the xor of

   two CRCs:

   ```

   crc(a ^ b):

       = 01100001 01100100 01100100 00000000

       ^ 01111000 01101111 01110010 00000000

       = 00011001 00001011 00010110 00000000 => 01101101 (0x6d)

   crc(a) ^ crc(b):

       = 01100001 01100100 01100100 00000000 =>   00110100 (0x34)

       ^ 01111000 01101111 01110010 00000000 => ^ 01011001 (0x59)

                                                = 01101101 (0x6d)

   ```

   And we can quickly test the effect of every possible bit-flip by
   shifting a single register per simulated bit-flip and xoring it
   into our original CRC:

   ```

   fancy brute force search:

       = 01101000 01101001 01100001 00000000 => 11111100 (0xfc != 0x3b)

       ^                            00000001 => 11111101 (0xfd != 0x3b)

       ^                            00000010 => 11111110 (0xfe != 0x3b)

       ^                            00000100 => 11111000 (0xf8 != 0x3b)

       ^                            00001000 => 11110100 (0xf4 != 0x3b)

       ^                            00010000 => 11101100 (0xec != 0x3b)

       ^                            00100000 => 11011100 (0xdc != 0x3b)

       ^                            01000000 => 10111100 (0xbc != 0x3b)

       ^                            10000000 => 01111100 (0x7c != 0x3b)

       ^                         (1)00000111 => 11111011 (0xfb != 0x3b)

       ^                        (1) 00001110 => 11110010 (0xf2 != 0x3b)

       ^                       (1)  00011100 => 11100000 (0xe0 != 0x3b)

       ^                      (1)   00111000 => 11000100 (0xc4 != 0x3b)

       ^                     (1)    01110000 => 10001100 (0x8c != 0x3b)

       ^                    (1)     11100000 => 00011100 (0x1c != 0x3b)

       ^                   (1)      11000111 => 00111011 (0x3b == 0x3b) !!! found our bit-error

   corrected message:

       = 01101000 01101001 00100001 00000000 => 00111011 (0x3b == 0x3b)

   ```

   The end result is still $O(n^e)$, but limited only by your CPU's

   shift, xor, and branching hardware. No memory accesses required.

   See `ramcrc32bd_read` for an implementation of this.
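
The second trick can be sketched for the 8-bit example as follows. This is an illustration under the same assumptions as the worked examples (polynomial 0x107, zero initial value, no reflection), with hypothetical function names, and is not a copy of `ramcrc32bd_read`. The message is read once to compute the CRC; after that the search only shifts and compares a single register.

```c
#include <stddef.h>
#include <stdint.h>

// CRC-8 over 0x107, zero initial value, no reflection, no final xor
uint8_t crc8(const uint8_t *data, size_t size) {
    uint8_t crc = 0;
    for (size_t i = 0; i < size; i++) {
        crc ^= data[i];
        for (int j = 0; j < 8; j++) {
            crc = (crc & 0x80) ? (uint8_t)((crc << 1) ^ 0x07)
                               : (uint8_t)(crc << 1);
        }
    }
    return crc;
}

// Find which single bit-flip explains a CRC mismatch, where diff is
// the calculated CRC xored with the stored CRC.
//
// Returns the bit position counted from the least-significant bit of
// the codeword, or -1 if no single bit-flip matches.
int crc8_find1(uint8_t diff, size_t codeword_bits) {
    // s holds the CRC contribution of a lone bit at position i,
    // starting with the last bit of the codeword
    uint8_t s = 0x01;
    for (size_t i = 0; i < codeword_bits; i++) {
        if (s == diff) {
            return (int)i;
        }
        // s = (s << 1) % p
        s = (s & 0x80) ? (uint8_t)((s << 1) ^ 0x07) : (uint8_t)(s << 1);
    }
    return -1;
}

// Repair a single bit-error in place, where the codeword is the
// message followed by one CRC byte.
//
// Returns 0 if there was nothing to fix, 1 if one bit was corrected,
// or -1 if the error is not a single bit-flip.
int crc8_correct1_fast(uint8_t *codeword, size_t size) {
    uint8_t diff = crc8(codeword, size-1) ^ codeword[size-1];
    if (diff == 0) {
        return 0;
    }
    int bit = crc8_find1(diff, 8*size);
    if (bit < 0) {
        return -1;
    }
    // bit 0 is the least-significant bit of the last byte
    codeword[size-1 - (size_t)bit/8] ^= (uint8_t)(1 << (bit % 8));
    return 1;
}
```

For the corrupted codeword `68 69 61 3b`, the mismatch is `0xfc ^ 0x3b = 0xc7`, which `crc8_find1` locates at bit 14, the same bit found by the fancy brute force search above.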

## Caveats

And some caveats:

1. For any error-correcting code, attempting to **correct** errors

   reduces the code's ability to **detect** errors.

   In the HD=4 example, we assumed 1 bit-error. If we were wrong and

   there were actually 3 bit-errors, we would have "corrected" to the

   wrong codeword.

   In practice this isn't that big of a problem. Fewer bit-errors are

   usually more common, and correcting bit-errors is usually more useful.

   At 4 bit-errors you're going to end up with full collisions anyways.

   Still, it's good to be aware of this tradeoff.

   ramcrc32bd's `error_correction` config option lets you control exactly

   how many bit-errors to attempt to repair in case better detection is

   more useful.

2. Brute force doesn't really scale.

   The error-correction implemented here grows $O(n^e)$ for $e$

   bit-errors, which really isn't great.

   That being said, larger CRC Hamming distances are also pretty limited

   in terms of message size, so this performance may be excusable if

   messages are small and bit-errors are rare.

   ramcrc32bd's `error_correction` config option can also help here by

   limiting how many bit-errors we attempt to repair. If you set

   `error_correction=1`, for example, the runtime reduces to $O(n)$ worst

   case, which is roughly the same runtime it takes to read the data from

   the underlying storage.

   But if you need a performant error-correcting block device, consider

   ramcrc32bd's big brother, [ramrsbd][ramrsbd], which brings the

   decoding cost down to $O(ne + e^2)$.
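
The first caveat is easy to reproduce with the 8-bit example. In the sketch below, `crc8_read` and its `max_errors` parameter are hypothetical stand-ins, similar in spirit to the `error_correction` option but not ramcrc32bd's actual API. With `max_errors=0` a 3 bit-error codeword is correctly reported as bad; with `max_errors=1` it is silently "corrected" to a different valid codeword.

```c
#include <stddef.h>
#include <stdint.h>

// CRC-8 over 0x107, zero initial value, no reflection, no final xor
uint8_t crc8(const uint8_t *data, size_t size) {
    uint8_t crc = 0;
    for (size_t i = 0; i < size; i++) {
        crc ^= data[i];
        for (int j = 0; j < 8; j++) {
            crc = (crc & 0x80) ? (uint8_t)((crc << 1) ^ 0x07)
                               : (uint8_t)(crc << 1);
        }
    }
    return crc;
}

// Validate a codeword (message followed by one CRC byte), attempting
// to repair at most max_errors bit-errors, where max_errors is 0 or 1.
//
// Returns the number of bit-errors repaired, or -1 if the codeword is
// invalid and could not be repaired.
int crc8_read(uint8_t *codeword, size_t size, int max_errors) {
    if (crc8(codeword, size) == 0) {
        return 0;
    }
    if (max_errors >= 1) {
        for (size_t i = 0; i < 8*size; i++) {
            uint8_t mask = (uint8_t)(0x80 >> (i % 8));
            codeword[i / 8] ^= mask;
            if (crc8(codeword, size) == 0) {
                // looks valid, but may not be the original codeword
                return 1;
            }
            codeword[i / 8] ^= mask;
        }
    }
    return -1;
}
```

Flipping the low 3 bits of the last message byte turns `68 69 21 3b` into `68 69 26 3b`. Detection alone rejects it, but a single bit-flip away sits `68 68 26 3b`, the 4-bit collision from earlier, so 1-bit correction returns the wrong message without complaint.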

## References

- [Koopman, P. - CRC Polynomial Zoo][koopman-crc]

- [Koopman, P. - CRC Polynomial Zoo NOTES][koopman-notes]

- [Wikipedia - Cyclic Redundancy Check (CRC)][w-crc]

- [Wikipedia - Hamming Distance (HD)][w-hd]

- [Wikipedia - Polynomial Ring][w-polynomial-ring]

- [Wikipedia - GF(2)][w-gf2]

- [Wikipedia - Systematic Code][w-systematic-code]

[w-crc]: https://en.wikipedia.org/wiki/Cyclic_redundancy_check

[w-hd]: https://en.wikipedia.org/wiki/Hamming_distance

[w-polynomial-ring]: https://en.wikipedia.org/wiki/Polynomial_ring

[w-gf2]: https://en.wikipedia.org/wiki/GF(2)

[w-systematic-code]: https://en.wikipedia.org/wiki/Systematic_code

[koopman-crc]: https://users.ece.cmu.edu/~koopman/crc

[koopman-notes]: https://users.ece.cmu.edu/~koopman/crc/notes.html

[koopman-p=0x107]: https://users.ece.cmu.edu/~koopman/crc/c08/0x83_len.txt

[koopman-p=0x104c11db7]: https://users.ece.cmu.edu/~koopman/crc/c32/0x82608edb_len.txt

[littlefs]: https://github.com/littlefs-project/littlefs

[ramrsbd]: https://github.com/geky/ramrsbd