Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/afnanenayet/libgalois

The Galois field arithmetic library from University of Tennessee adapted to work with Modern C++ and CMake
https://github.com/afnanenayet/libgalois

arithmetic cmake cpp field galois modern

Last synced: 19 days ago
JSON representation

The Galois field arithmetic library from University of Tennessee adapted to work with Modern C++ and CMake

Awesome Lists containing this project

README

        

Fast Galois Field Arithmetic Library in C/C++

Fast Galois Field Arithmetic Library in C/C++


James S. Plank


[email protected]


http://www.cs.utk.edu/~plank



Technical Report UT-CS-07-593

Department of Computer Science
University of Tennessee


The online home for this document is:


http://www.cs.utk.edu/~plank/plank/papers/CS-07-593




If You Use This Code

Please send me email ([email protected])
and let me know. One of the ways in which I am evaluated is
the impact of my work, and hard data on how many people use this code is
important.

Acknowledgements

This web site is based upon work supported by the National Science Foundation
under Grants CNS-0615221 and CNS-0437508. I would also like to thank Cheng
Huang, Lihao Xu, Jin Li and Mochan Shrestha for helpful discussions.



Introduction

Galois Field arithmetic is fundamental to many applications, especially
Reed-Solomon coding. With a Galois Field GF(2w),
addition, subtraction, multiplication and division operations are defined
over the numbers 0, 1, ..., 2w-1. in such a way that:



  • They are closed -- if a and b are elements of the field,
    then so are (a+b), (a-b), (a*b) and (a/b).




  • They adhere to all the well-known properties of addition/subtraction/multiplication/division.
    For example, addition and multiplication are commutative; addition, multiplication and
    division are all associative; addition/multiplication are distributive; etc.




  • Every element has a multiplicative inverse. This is the most important property.
    It means that if a is an element of the field and a ≠ 0, then
    there exists an element b that is also an element of the field such that ab = 1.



When w = 8, the Galois Field GF(28) comprises the elements
0, 1, ..., 255. This is an important field because it allows you to perform arithmetic
that adheres to the above properties on single bytes.
Again, this is essential in Reed-Solomon coding
and other kinds of coding. Similarly, w=16 and w=32 are other important
fields.


There are useful applications for fields with other values of w. For example,
Cauchy Reed-Solomon coding employs Galois Fields with other values of w, and
converts them to strictly binary (XOR) codes.


Galois Fields are covered in standard texts on Error Correcting Codes such as
Peterson & Weldon [PW72],
MacWilliams and Sloane [MS77],
and Van Lint [VL82]. These treatments are thorough, and take
a bit of time to understand. For a briefer and more pragmatic
treatment, see Plank's 1997 tutorial on Reed-Solomon coding [P97].



The Library Files

This library comes as a tar file: galois.tar.
The files that compose galois.tar are:



Using the Command Line Tools

Type make to compile the library and the command line tools. Testing the
command line tools is a nice way to see how things work with Galois Fields. First,
addition and subtraction in a Galois Field are equivalent -- they are equal to the
bitwise exclusive-or operation. The program gf_xor allows you to take the
bitwise exclusive-or of two numbers:


Unix-Prompt> gf_xor 15 7
8
Unix-Prompt> gf_xor 8 7
15
Unix-Prompt> gf_xor 230498 2738947
2772833
Unix-Prompt> gf_xor 2772833 230498
2738947
Unix-Prompt>


The program gf_mult takes three arguments: a,
b and w, and prints out the product of
a and b in GF(2w). The
program gf_div performs division, and the program
gf_inverse returns the multiplicative inverse of
a number in GF(2w):


Unix-Prompt> gf_mult 3 7 4
9
Unix-Prompt> gf_div 9 3 4
7
Unix-Prompt> gf_div 9 7 4
3
Unix-Prompt> gf_mult 1234567 2345678 32
1404360778
Unix-Prompt> gf_div 1404360778 1234567 32
2345678
Unix-Prompt> gf_div 1404360778 2345678 32
1234567
Unix-Prompt> gf_inverse 1404360778 32
106460795
Unix-Prompt> gf_mult 1404360778 106460795 32
1
Unix-Prompt>


The other command line tools are discussed at the end of this document.



Using the Library


The files galois.h and
galois.c implement a library
of procedures for Galois Field Arithmetic in GF(2w)
for w between 1 and 32. The library is written in C, but will work
in C++ as well.
It is especially tailored for w equal to
8, 16 and 32, but it is also applicable for any other value of w. For the
smaller values of w (where multiplication or logarithm tables fit into memory),
these procedures should be very fast.


In the following sections, we describe the procedures implemented by the library.


1. General purpose multiplication and division:
galois_single_multiply() and galois_single_divide()

The syntax of these two calls is:



int galois_single_multiply(int a, int b, int w);
int galois_single_divide(int a, int b, int w);


Galois_single_multiply()
returns the product of a and b in GF(2w),
and galois_single_divide()
returns the qoutient of a and b in GF(2w).
w may have any value from 1 to 32.


The decision to make this procedure use regular, signed integers instead of
unsigned integers was largely for convenience. It only makes a difference
when w equals 32, in which case the sign bit of a, b,
or the return values may be set. If it matters, simply convert the integers
to unsigned integers. The procedures in this library will work regardless --
when w equals 32, they are treated as streams of bits and not integers.


It is anticipated that most applications that need to perform single multiplications
and divisions only need reasonable performance, which is what these two procedures
give you. If you need faster multiplication and division, then see the
procedures below, which allow you to get much faster performance.

2. Multiplying a region of bytes by a single number in
GF(28),
GF(216) and
GF(232)

A common use of Galois Field arithmetic is multplying a region of bytes by a single
number. This is the basic operation of Reed-Solomon encoding and decoding.
This library provides the following three procedures for performing region multiplication:



void galois_w08_region_multiply(char *region, int multby, int nbytes, char *r2, int add);
void galois_w16_region_multiply(char *region, int multby, int nbytes, char *r2, int add);
void galois_w32_region_multiply(char *region, int multby, int nbytes, char *r2, int add);

These multiply the region of bytes specified by region and nbytes by the
number multby in the field specified by the procedure's name.
Region should be long-word aligned, otherwise these routines will generate a bus
error. There are three
separate functionalities of these procedures denoted by the values of r2 and add.



  1. If r2 is NULL, the bytes in region are replaced by their products
    with multby.

  2. If r2 is not NULL and add is zero, then the products are placed in
    the nbytes of memory starting with r2. The two regions should not overlap unless
    r2 is less than region.

  3. If r2 is not NULL and add is one, then the products are calculated
    and then XOR'd into existing bytes of r2.

The performance of these procedures has been tuned to be very fast. A multiplication
table is employed when w=8. Log and inverse log tables are employed with w=16,
and seven multiplication tables are employed when w=32.

3. XOR-ing a region of bytes

The following procedure allows you to perform the bitwise exclusive-or of a region
of bytes.



void galois_region_xor(char *r1, char *r2, char *r3, int nbytes);

R3 may equal either r1 or r2 if you wish to overwrite either of them.
Again, all pointers should be long-word aligned.



Advanced uses -- fast single multplications and divisions

While galois_single_multiply() and
galois_single_divide() are nice general-purpose tools, their generality makes
them slower than they need to be. For high-performance, you may want to employ their
underlying implementations, described below:


4. Using a multiplication table: galois_multtable_multiply() and galois_multtable_divide()

When w is small, the fastest way to perform multiplication and division is to
employ multiplication and division tables. These tables consume 2(2w+2)
bytes each, so they are only applicable when w is reasonably small. For example,
when w=8, this is 256 KB per table.


To use multiplication and division tables directly, use one or more of the following
routines:




int galois_create_mult_tables(int w);
int galois_multtable_multiply(int a, int b, int w);
int galois_multtable_divide(int a, int b, int w);
int *galois_get_mult_table(int w);
int *galois_get_div_table(int w);

Galois_create_mult_tables(w) creates multiplication and division tables for a
given value of w and stores them internally. If you call it twice with the
same value of w, it will not create new tables the second time. You may call
it with different values of w and the tables will be stored separately.


If successful, galois_create_mult_tables() will return 0. Otherwise, it will
return -1, and any allocated memory will be freed.


Galois_multtable_multiply() and galois_multtable_divide() work just like
galois_single_multiply() and galois_single_divide(), except they assume
that you have called galois_create_mult_tables() for the appropriate value of
w and that it was successful. They do not error-check, so if you have not
called galois_create_mult_tables(), they will seg-fault. This decision was made
for speed -- although for small values of w (between 1 and 9), galois_single_multiply() uses
galois_multtable_multiply(), it is significantly slower because of the error checking
that it does.


Finally, to free yourself of procedure call overhead, the routines galois_get_mult_table()
and galois_get_div_table() return the tables themselves. The product/quotient of a
and b is in element a*2w+b, which of course may be computed quickly via
bit arithmetic as ( (a << w) | b).
You do not need to call galois_create_mult_tables() before calling galois_get_mult_table()
or galois_get_div_table().


5. Using log/anti-log tables: galois_logtable_multiply() and galois_logtable_divide()

When multiplication tables cannot be employed, the next fastest way to
perform multiplication and division is to
use log and inverse log tables, as described in [P97].
The log table consumes
2(w+2) bytes and the inverse log table consumes
3*2(w+2) bytes,
which means that middling values of w may
be handled. For example, when w=16, this is 1 MB of tables.


To use the log tables, use one or more of the following routines:



int galois_create_log_tables(int w);
int galois_logtable_multiply(int a, int b, int w);
int galois_logtable_divide(int a, int b, int w);
int galois_log(int value, int w);
int galois_ilog(int value, int w);
int *galois_get_log_table(int w);
int *galois_get_ilog_table(int w);

Galois_create_log_tables(w) creates log and inverse log tables for the
given value of w, and stores them internally. If you call it twice with the
same value of w, it will not create new tables the second time. You may call
it with different values of w and the tables will be stored separately.


If successful, galois_create_log_tables() will return 0. Otherwise, it will
return -1, and any allocated memory will be freed.


Galois_logtable_multiply() and galois_logtable_divide() work just like
galois_single_multiply() and galois_single_divide(), except they assume
that you have called galois_create_log_tables() for the appropriate value of
w and that it was successful. They do not error-check, so if you have not
called galois_create_log_tables(), they will seg-fault. This decision was made
for speed -- although for medium values of w (between 10 and 22),
galois_single_multiply() uses
galois_logtable_multiply(), it is significantly slower because of the error checking
that it does.


Galois_log() and
galois_ilog() return the log and inverse log of an element of GF(2w).
You can use them to multiply using the following identity:



a * b = ilog[ (log[a] + log[b]) % ((1 << w)-1) ]

a / b = ilog[ (log[a] - log[b] + (1 << w)) % ((1 << w)-1) ]


The division identity takes into account C's weird definition of modular arithmetic
with negative numbers.


To perform the fastest multiplication and division with these tables, you should get access
to the tables themselves using galois_get_log_table(int w)
and galois_get_ilog_table(int w). Then you may calculate the product/quotient of a and b
as:



a * b = ilog [ log[a] + log[b] ]

a / b = ilog [ log[a] - log[b] ]


You do not have to worry about modular arithmetic because the ilog table contains three
copies of the inverse logs, and is defined for indices between -2w+1 and
22w-2. This saves a few instructions.


6. Shift-multiplication and slow division: galois_shift_multiply() and galois_shift_divide()

When tables are unusable, general-purpose multiplication and division is
implemented with the following two procedures:



int galois_shift_multiply(int a, int b, int w);
int galois_shift_divide(int a, int b, int w);

Galois_shift_multiply() converts b into a w * w bit matrix
and multiplies it by the bit vector a to create the product vector. You may
see a quasi-tutorial description of this technique in the paper
[P05].
It is significantly slower than the methods that use tables. However, it is general-purpose
and requires no preallocation of memory.


For division, Galois_shift_divide() also converts b into a
bit matrix, inverts it, and then multiplies the inverse by a. As such,
it is *really* slow. If I get a clue how to implement this one faster, I will.


Galois_single_multiply() uses galois_shift_multiply() for
w between 23 and 31.
Galois_single_divide() uses galois_shift_divide() for
w between 23 and 32.


7. The special case of w=32: galois_split_w8_multiply()

Finally, for w = 32 the following procedures are defined:



int galois_create_split_w8_tables();
int galois_split_w8_multiply(int a, int b);


Galois_create_split_w8_tables() creates seven tables that are 256 MB each.
Galois_split_w8_multiply() employs these tables to multiply the 32-bit numbers
by breaking them into four eight-bit numbers each, and then performing sixteen
multiplications and exclusive-ors to calculate the product. It's a cool technique
suggested to me by Cheng Huang of Microsoft, and is a good 16 times faster than
using galois_shift_multiply().


"But couldn't you use this technique for other values of w?" Yes, you
could, but I'm not implementing it, because I don't think it's that important.
If that view changes, I'll fix it.


Galois_single_multiply() uses galois_split_w8_multiply() for
w = 32.



Thread Safety

The only possible race conditions in these codes are when the various tables
are created. For that reason,
galois_create_mult_tables() and
galois_create_log_tables() should be protected by a mutex if thread
safety is a concern.


Since galois_single_multiply() and galois_single_divide() call
the table creation routines whenever the tables do not exist, if you are
worried about thread safety, then for each value of w that you will
use, you should make sure that the first call to
galois_single_multiply() or galois_single_divide() is protected.
After that, no protection is required.



Testing Applications

The programs gf_mult, gf_div, gf_log, gf_ilog
gf_inverse and gf_xor are straightforward and allow you to
test the various routines for various values of w.


Gf_basic_tester and gf_xor_tester test both correctness and speed.
Call gf_xor_tester with no arguments to test the speed of gf_region_xor()
on your system. Here it is on a Macbook whose CPU is a little busy doing other things:


Unix-Prompt> gf_xor_tester
XOR Tester
Seeding random number generator with 1172533188
Passed correctness test -- doing 10-second timing
1827.79986 Megabytes of XORs per second
Unix-Prompt>


Gf_basic_tester takes the following command line arguments:



  • W: 1 through 32

  • Method: This is one of the following words: default,
    multtable,
    logtable,
    shift,
    splitw8. It specifies how multiplication/division will be performed.
    Default uses gf_single_multiply() and
    gf_single_divide().

  • Ntrials: This specifies how many random multiplies/divides to test for
    correctness.

After testing for correctness, gf_basic_tester tests for speed. There are three
special cases. When method=default and w is 8, 16 and 32,
gf_basic_tester also tests the speed of gf_wxx_region_multiply().


For example (again, on my MacBook):


Unix-Prompt> gf_basic_tester 16 default 100000
W: 16
Method: default
Seeding random number generator with 1172533569
Doing 100000 trials for single-operation correctness.
Passed Single-Operations Correctness Tests.

Doing galois_w16_region_multiply correctness test.
Passed galois_w16_region_multiply correctness test.

Speed Test #1: 10 Seconds of Multiply operations
Speed Test #1: 42.23862 Mega Multiplies per second
Speed Test #2: 10 Seconds of Divide operations
Speed Test #2: 43.42448 Mega Divides per second

Doing 10 seconds worth of region_multiplies - Three tests:
Test 0: Overwrite initial region
Test 1: Products to new region
Test 2: XOR products into second region

Test 0: 253.45548 Megabytes of Multiplies per second
Test 1: 238.59569 Megabytes of Multiplies per second
Test 2: 167.99968 Megabytes of Multiplies per second



References



  • [MS77]
    F. J. MacWilliams and N. J. A. Sloane,
    The Theory of Error-Correcting Codes, Part I,
    North-Holland Publishing Company,
    Amsterdam, New York, Oxford,
    1977.




  • [PW72]
    W. W. Peterson and E. J. Weldon, Jr.,
    Error-Correcting Codes, Second Edition,
    The MIT Press,
    Cambridge, Massachusetts,
    1972, ISBN: 0-262-16-039-0.




  • [P97]
    J. S. Plank,
    "A Tutorial on Reed-Solomon Coding for
    Fault-Tolerance in RAID-like Systems,"
    Software -- Practice & Experience,
    27(9), September, 1997, pp. 995-1012.

    http://www.cs.utk.edu/~plank/plank/papers/SPE-9-97.html
    .




  • [P97]
    J. S. Plank,
    "Optimizing Cauchy Reed-Solomon Codes for Fault-Tolerant Storage Applications",
    Technical Report CS-TR-05-569, University of Tennessee, November, 2005.

    http://www.cs.utk.edu/~plank/plank/papers/CS-05-569.html
    .




  • [VL82]
    J. H. van Lint,
    Introduction to Coding Theory,
    Springer-Verlag,
    New York,
    1982.