https://github.com/cherubrock-seb/prmers

Mersenne prime search using integer arithmetic and an IDBWT via an NTT executed on the GPU through OpenCL.
https://github.com/cherubrock-seb/prmers
ecm elliptic-curve-cryptography elliptic-curves factoring factoring-algorithms factoring-integers fft gpu gpu-computing lucas lucas-lehmer mathematics mersenne mersenne-numbers mersenne-prime ntt opencl p-1 prime-numbers
Last synced: 3 months ago
JSON representation
Mersenne prime search using integer arithmetic and an IDBWT via an NTT executed on the GPU through OpenCL.
Host: GitHub
URL: https://github.com/cherubrock-seb/prmers
Owner: cherubrock-seb
License: mit
Created: 2025-02-25T12:19:16.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2026-01-22T21:26:55.000Z (5 months ago)
Last Synced: 2026-01-23T14:19:20.174Z (5 months ago)
Topics: ecm, elliptic-curve-cryptography, elliptic-curves, factoring, factoring-algorithms, factoring-integers, fft, gpu, gpu-computing, lucas, lucas-lehmer, mathematics, mersenne, mersenne-numbers, mersenne-prime, ntt, opencl, p-1, prime-numbers
Language: C++
Homepage:
Size: 59.6 MB
Stars: 9
Watchers: 3
Forks: 4
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project

README

          PrMers: GPU-accelerated Mersenne Primality Testing

==================================================

https://github.com/cherubrock-seb/PrMers/

PrMers is a high-performance GPU application for Lucas–Lehmer (LL), PRP, P‑1 and ECM

testing of Mersenne numbers. It uses OpenCL and an integer NTT / IBDWT engine modulo 2^64 − 2^32 + 1 (Uses IBDWT-style transforms over Z / (2^64 − 2^32 + 1) Z) and is designed for long, reliable runs with checkpointing.

The project also supports PRP tests of cofactors and Wagstaff numbers, and includes

a web-based GUI and a GPU VRAM tester.

Overview of Algorithms and Backends

-----------------------------------

PrMers has two main computational backends:

- Marin backend (default)

  - External library: https://github.com/galloty/marin

  - Efficient modular exponentiation with Gerbicz–Li style error checking (in PRP and LL safe).

  - Uses IBDWT-style transforms over Z / (2^64 − 2^32 + 1) Z.

  - Supports PRP, LL, P‑1 and ECM on Mersenne numbers.

- Internal NTT backend

  - Integer NTT / IBDWT implementation inside PrMers.

  - Used when the Marin backend is disabled.

  - Option for experimentation and comparison.

Select backend:

- Default: Marin backend enabled.

- `-marin`: disable Marin and use the internal NTT backend instead.

Supported Modes

---------------

Numbers

- **Mersenne**:            N = 2^p − 1

- **Wagstaff**:            W = (2^p + 1) / 3  (via `-wagstaff`)

- **Cofactors**:           N / product(known factors), PRP‑tested as generic integer

Main modes

- **PRP (default)**

  - Probable-prime test with Gerbicz–Li error checking.

  - Works for Mersenne, Wagstaff and cofactors.

  - Produces Res64 and full residue (optionally a proof).

- **Lucas–Lehmer (LL)**

  - Three LL modes exist (GPU). See the dedicated section *“Lucas–Lehmer modes and safety”* below.

    - `-ll`        → **LL (safe)** (default LL mode)

    - `-llunsafe`  → **LL (classic/unsafe)**

    - `-llsafe2`   → **LL (safe, “doubling” variant)**

- **P‑1 factoring**

  - Stage 1 and Stage 2 on N = 2^p − 1.

  - Stage is error checked with Gerbicz–Li.

  - Targets factors q of N such that q − 1 is B1‑smooth (with Stage 2 extension).

  - **GPU (Marin) only.** Stage 2 supports both the classic prime-sweep and an **n^K (Crandall) variant** (see `-K` and `-nmax`).

  - **Interoperability & resume files** (after Stage 1 or Stage 2, see `-resume` / `-p95`):

    - Export a **GMP‑ECM `.save`** resume and/or a **Prime95 `.p95`** resume.

    - You can **extend Stage 1** from an existing **`.save` or `.p95`** using `-b1old ` (auto‑detects the matching file).

- **ECM**

  - Elliptic Curve Method on N = 2^p − 1.

  - Multiple curve models (Edwards/Montgomery) and torsion variants.

- **Wagstaff**

  - With `-wagstaff`, runs PRP/ECM/P‑1 on W = (2^p + 1) / 3.

Requirements

------------

- OpenCL 1.2 runtime (OpenCL 2.0 recommended) and a supported GPU.

- C++20 compiler (g++/clang++ on Linux/macOS; MSVC or MinGW on Windows).

- GMP (ECM and CPU‑side helpers).

Debian/Ubuntu packages:

    sudo apt-get update

    sudo apt-get install -y g++ make \

        ocl-icd-opencl-dev opencl-headers \

        libgmp-dev

Building from Source

--------------------

Clone the repository:

    git clone https://github.com/cherubrock-seb/PrMers.git

    cd PrMers

Linux / macOS (Makefile)

- Build:

    make -j$(nproc)

- Install executable and kernels:

    sudo make install

This installs:

- Executable:   /usr/local/bin/prmers

- Kernels:      /usr/local/share/prmers/

The binary embeds a `KERNEL_PATH` pointing to the installation directory so that

PrMers can find its OpenCL kernels after installation.

Windows with CMake + vcpkg (recommended)

- Install CMake, Visual Studio and vcpkg.

- From the PrMers directory:

    git clone https://github.com/microsoft/vcpkg.git

    cd vcpkg

    .\bootstrap-vcpkg.bat

    cd ..

    cmake -S . -B build ^

      -DCMAKE_TOOLCHAIN_FILE=./vcpkg/scripts/buildsystems/vcpkg.cmake ^

      -DCMAKE_BUILD_TYPE=Release

    cmake --build build --config Release

Copy the required DLLs from `vcpkg\installed\x64-windows\bin` next to `prmers.exe`

or add that directory to your PATH.

Windows with MSYS2 / MinGW (UCRT64)

- In MSYS2 UCRT64 shell:

    pacman -Syu

    pacman -S --noconfirm make \

        mingw-w64-ucrt-x86_64-gcc \

        mingw-w64-ucrt-x86_64-opencl-headers \

        mingw-w64-ucrt-x86_64-opencl-icd-loader \

        mingw-w64-ucrt-x86_64-gmp

- Build:

    make -j$(nproc)

Prebuilt binaries

- Linux / Windows / macOS builds are on GitHub Releases:

  https://github.com/cherubrock-seb/PrMers/releases

macOS notes

- OpenCL is present on supported macOS versions.

- On first run, Gatekeeper may block the binary (not notarized). Allow it in

  *System Settings → Security & Privacy*, then run again.

Quick Start

-----------

Basic PRP on a Mersenne exponent:

    ./prmers 136279841

- Mode: PRP (default, Marin backend).

- Checkpoint every 120 s (default).

- Results go to `results.txt` and `_prp_result.json`.

Lucas–Lehmer test (safe mode):

    ./prmers 127 -ll

P‑1 (stage 1 and stage 2):

    ./prmers 367 -pm1 -b1 11981 -b2 38971

Export Stage‑1 resume to GMP‑ECM `.save` and Prime95 `.p95`:

    ./prmers 367 -pm1 -b1 11981 -resume      # both .save and .p95

    ./prmers 367 -pm1 -b1 11981 -p95         # .p95 only

Extend Stage‑1 from a previous B1 using `.save` or `.p95` (auto‑detected):

    ./prmers 367 -pm1 -b1 38971 -b1old 11981

ECM on a Mersenne number:

    ./prmers 701 -ecm -b1 6000 -K 8

    ./prmers 701 -ecm -b1 6000 -b2 33333 -K 8

Test a Wagstaff number W = (2^p + 1) / 3:

    ./prmers 100003 -wagstaff

Use worktodo.txt:

    ./prmers -worktodo ./worktodo.txt

Use a config file:

    ./prmers -config ./settings.cfg

Disable Marin and use the internal backend:

    ./prmers 136279841 -marin

Lucas–Lehmer modes and safety

-----------------------------

PrMers implements three LL variants:

1) **LL (safe)** - `-ll`  

   - Uses the split representation *s = a + b√3* so each squaring is computed as:

     (a + b√3)^2 = (a^2 + 3b^2) + (2ab)√3.  

     Implemented as four transforms per iteration:  

     A=T(a), B=T(b), invT(A^2 + T(3)·B^2), 2·invT(A·B) (start from (a,b)=(2,1); final check is (−1,0)).  

   - **Error checking:** protected by Gerbicz–Li style verification with periodic

     roll‑back/restore to the last verified checkpoint. Default check cadence is

     ~10 minutes; tune with `-checklevel ` (higher = more frequent). Disable

     with `-gerbiczli` (not recommended except for benchmarking).

   - **Speed:** safer but slower than the classic LL (more transforms per step).

   - **When to use:** when you need a reliable LL run on GPU.

2) **LL (classic / unsafe)** - `-llunsafe`  

   - Classical recurrence S_0=4; S_{i+1}=S_i^2−2 modulo M_p.  

   - **Error checking:** *none*. Fastest LL but susceptible to silent errors on

     marginal hardware or aggressive overclocks.

   - **When to use:** quick checks / debugging. Prefer PRP or LL safe for proofs.

3) **LL (safe2, doubling)** - `-llsafe2` [optional `-llsafeb `]  

   - Block‑doubling consistency check variant. Work is split into blocks of size *B*

     (by default B≈⌊√p⌋); at block boundaries a

     doubling identity is used to verify progress and roll back on mismatch.

   - **Error checking:** periodic; lighter‑weight than full GL but still robust.

   - **Tuning:** set block size with `-llsafeb ` (auto if omitted).


**Notes**

- LL modes are only for genuine Mersenne numbers. For cofactors, use PRP.

- You can inject an error to test detection/restart with `-erroriter `.

- `-res64_display_interval N` prints Res64 every N iterations (0 disables).


Command Line Options (summary)

------------------------------

For the full list, run:

    ./prmers -h

Common options

Positional

- `
`                   Exponent of the Mersenne or Wagstaff number.

Device and performance

- `-d `               OpenCL device id (default: 0).

- `-O `            OpenCL compiler opts, e.g. `fastmath mad`.

- `-c `            Local carry propagation depth.

- `-profile`              Enable kernel profiling.

- `-memtest`              Run GPU VRAM test.

Modes

- `-prp`                  Force PRP (default).

- `-ll`                   LL safe (a + b√3 with GL checks).

- `-llunsafe`             LL classic (no checks).

- `-llsafe2`              LL safe “doubling” variant (block checks).

- `-wagstaff`             Test W = (2^p + 1)/3 instead of 2^p − 1.

LL safety / diagnostics

- `-checklevel `       Force GL check every ~B×k iters (B≈√p by default).

- `-gerbiczli`            Disable Gerbicz–Li checks (PRP/LL‑safe). Not recommended.

- `-llsafeb `          Block size for `-llsafe2` (default ≈ √p).

- `-erroriter `        Inject an error at iteration *i* to test detection.

- `-res64_display_interval `  Show Res64 every N iterations (0=off).


P‑1

- `-pm1`                  P‑1 factoring mode.

- `-b1 `              Stage 1 bound.

- `-b2 `              Stage 2 bound.

- `-b1old `        Extend an existing Stage 1 run (auto‑loads `.save` or `.p95`).

- `-resume`               After Stage 1/2, write GMP‑ECM `.save` and Prime95 `.p95` resumes.

- `-p95`                  After Stage 1/2, write Prime95 `.p95` only.

- `-filemers `      Convert `
pm.mers` → GMP‑ECM `.save` (helper).

- `-K `                Enable n^K (Crandall) Stage‑2 variant with K powers.

- `-nmax `             Upper bound for the n^K variant.

ECM

- `-ecm`                  ECM on 2^p − 1.

- `-b1 `              Stage 1 bound.

- `-b2 `              Stage 2 bound (optional).

- `-K `           Number of curves.

- `-montgomery`           Use a Montgomery model.

- `-torsion16`            Force torsion‑16 (or `-notorsion` to disable).

Checkpoints and backup

- `-t `              Checkpoint interval (default: 120s).

- `-f `             Directory for checkpoints (default: current).

Worktodo / config

- `-worktodo [path]`      Read GIMPS‑style `worktodo.txt` (first PRP= line).

- `-config `        Read options from a config file.

Backend / expert

- `-marin`                Disable Marin; use internal NTT backend.

- More expert flags exist (local sizes, enqueue caps, etc.). See `-h`.

Web-based GUI

-------------

Start:

    ./prmers -gui -http 3131

- Default host: first non‑loopback IPv4; default port: 3131.

- Override host/port with `-host` and `-http`.

- Then open the printed URL, e.g. http://127.0.0.1:3131/

The GUI lets you:

- Monitor progress, residues and logs in real time.

- Build and edit `worktodo.txt` entries.

- Inspect results and settings.

worktodo.txt

------------

PrMers understands GIMPS‑style `PRP=` lines. Example:

    PRP=DEADBEEFCAFEBABEDEADBEEFCAFEBABE,1,2,197493337,-1,76,0;

This is k*b^n+c with k=1, b=2, n=197493337, c=−1 → the Mersenne 2^197493337−1.

Usage:

    ./prmers -worktodo

    ./prmers -worktodo ./worktodo.txt

- If no exponent is on the CLI and a valid `PRP=` is found, that exponent is used.

- Only the first valid `PRP=` line is read currently.

Gerbicz–Li Error Checking (PRP & LL safe)

-----------------------------------------

Principle

- Split long exponentiations into blocks of size B≈√p.

- Maintain a *current* rolling product and a *reference* product from the last

  verified checkpoint.

- Every ~10 minutes (tunable via `-checklevel`), recompute the theoretical product

  from the stored checkpoint, compare, and either advance the “last‑correct”

  marker or roll back and replay.

Persistence

- Files `.bufd`, `.lbufd`, `.gli`, `.isav`, `.jsav` and the state file contain

  everything needed for deterministic restart after a mismatch, crash, or Ctrl‑C.

Testing

- Inject an error to validate recovery:

    ./prmers 6972593 -erroriter 19500

Disabling

- Use `-gerbiczli` to disable (benchmarks only; not recommended for production).

P‑1 Factoring

-------------

Target: N = 2^p − 1 with q | N such that q = 2kp + 1.

Stage 1

- Choose B1. Let E = lcm(1,…,B1). Compute x = 3^(E·2p) mod N and g = gcd(x − 1, N).

- If 1 < g < N, a non‑trivial factor is found.

- **Export resumes** (optional): with `-resume` PrMers writes `resume_p
_B1_.save` (GMP‑ECM) and `resume_p
_B1_.p95` (Prime95). With `-p95`, write only the `.p95`.

Example:

    ./prmers 541 -pm1 -b1 8099

    ./prmers 541 -pm1 -b1 8099 -resume

**Extend Stage 1**

- To extend from B1old to a higher B1 using an existing `.save` or `.p95`:

  

    ./prmers 541 -pm1 -b1 20000 -b1old 8099

  The program auto‑detects the matching `resume_p
_B1_.save` or `.p95` in the current directory.

Stage 2

- Choose B2 > B1.

- From Stage 1’s state H, compute

  Q = ∏_{q ∈ (B1,B2]} (H^q − 1) mod N,

  with standard optimizations (prime gaps, cached powers).

- **n^K (Crandall) variant**: enable with `-K ` and optionally bound exponents with `-nmax `.

- **Export resumes** (optional): with `-resume`, PrMers writes `resume_p
_B1__B2_.save` and `.p95` after Stage 2.

Example:

    ./prmers 367 -pm1 -b1 11981 -b2 38971

    ./prmers 367 -pm1 -b1 11981 -b2 38971 -resume

    ./prmers 367 -pm1 -b1 11981 -b2 38971 -K 8 -nmax 200000   # n^K variant

Implementations

- **GPU P‑1 using the Marin backend** (Stage 1 and Stage 2, including n^K).

ECM on Mersenne Numbers

-----------------------

    ./prmers p -ecm -b1 B1 -b2 B2 -K curves [curve options]

- Stage 1 bound B1, optional Stage 2 bound B2, K curves.

- Defaults: Edwards curve with torsion optimizations.

- Options: `-montgomery`, `-torsion16`, `-notorsion`, `-seed `.

- Interoperability: P‑1 resumes use **GMP‑ECM**’s `.save` textual format in addition to Prime95’s `.p95` when requested via `-resume`.

GPU Memory Test (-memtest)

--------------------------

    ./prmers -memtest

    ./prmers -memtest -d 2

- Scans as much VRAM as possible (subject to device limits).

- Patterns include address‑derived values, inversion toggles, and modulo‑stride

  sequences with multiple offsets.

- Reports coverage, traffic, bandwidth and any detected errors.

NTT Transform Sizes

-------------------

For a given exponent p, PrMers chooses an NTT/IBDWT size N:

| Exponent p range | N | Structure |

|---|---:|---|

| 3–113 | 4 | 2^2 |

| 127–239 | 8 | 2^3 |

| 241–463 | 16 | 2^4 |

| 467–919 | 32 | 2^5 |

| 929–1153 | 40 | 5·2^3 |

| 1163–1789 | 64 | 2^6 |

| 1801–2239 | 80 | 5·2^4 |

| 2243–3583 | 128 | 2^7 |

| 3593–4463 | 160 | 5·2^5 |

| 4481–6911 | 256 | 2^8 |

| 6917–8629 | 320 | 5·2^6 |

| 8641–13807 | 512 | 2^9 |

| 13829–17257 | 640 | 5·2^7 |

| 17291–26597 | 1024 | 2^10 |

| 26627–33247 | 1280 | 5·2^8 |

| 33287–53239 | 2048 | 2^11 |

| 53267–66553 | 2560 | 5·2^9 |

| 66569–102397 | 4096 | 2^12 |

| 102407–127997 | 5120 | 5·2^10 |

| 128021–204797 | 8192 | 2^13 |

| 204803–255989 | 10240 | 5·2^11 |

| 256019–393209 | 16384 | 2^14 |

| 393241–491503 | 20480 | 5·2^12 |

| 491527–786431 | 32768 | 2^15 |

| 786433–982981 | 40960 | 5·2^13 |

| 983063–1507321 | 65536 | 2^16 |

| 1507369–1884133 | 81920 | 5·2^14 |

| 1884193–3014653 | 131072 | 2^17 |

| 3014659–3768311 | 163840 | 5·2^15 |

| 3768341–5767129 | 262144 | 2^18 |

| 5767169–7208951 | 327680 | 5·2^16 |

| 7208977–11534329 | 524288 | 2^19 |

| 11534351–14417881 | 655360 | 5·2^17 |

| 14417927–22020091 | 1048576 | 2^20 |

| 22020127–27525109 | 1310720 | 5·2^18 |

| 27525131–44040187 | 2097152 | 2^21 |

| 44040253–55050217 | 2621440 | 5·2^19 |

| 55050253–83886053 | 4194304 | 2^22 |

| 83886091–104857589 | 5242880 | 5·2^20 |

| 104857601–167772107 | 8388608 | 2^23 |

| 167772161–209715199 | 10485760 | 5·2^21 |

| 209715263–318767093 | 16777216 | 2^24 |

| 318767107–398458859 | 20971520 | 5·2^22 |

| 398458889–637534199 | 33554432 | 2^25 |

| 637534277–796917757 | 41943040 | 5·2^23 |

| 796917763–1207959503 | 67108864 | 2^26 |

| 1207959559–1509949421 | 83886080 | 5·2^24 |

Benchmarks

----------

PrMers performance depends on

- GPU model

- clock rates and power limits

- OpenCL driver

- code version and options

Numbers below are approximate and obtained on specific setups. Treat them as

order‑of‑magnitude guidance only. For more detail, see the Mersenne Forum thread:

    https://www.mersenneforum.org/node/1086124/page3

### Quick overview (PRP on Mersenne exponents, Marin backend)

PRP throughput for p ≈ 136,279,841 (PRP mode, Marin, auto NTT).

| GPU                                       | User / system                | PRMERS_SCORE | Iter/s @ p ≈ 1.36e8 | Approx PRP ETA | Notes                                  |

|-------------------------------------------|------------------------------|--------------|---------------------|----------------|----------------------------------------|

| NVIDIA GeForce RTX 5090                   | Resolver (vast.ai)           | n/a          | ≈ 2230              | ≈ 17 h         | High‑end NVIDIA Ada/Blackwell          |

| NVIDIA GeForce RTX 4090                   | Resolver                     | 100.00/100   | ≈ 1225              | ≈ 31 h         | Reference 100/100 score                |

| NVIDIA GeForce RTX 5070 Laptop            | beepthebee                   | 62.69/100    | ≈ 356               | ≈ 4.4–4.6 days | +200 MHz core, +500 MHz VRAM (OC)      |

| NVIDIA GeForce RTX 4060 Ti                | Lorenzo                      | 69.14/100    | ≈ 318               | ≈ 5 days       | Desktop midrange                       |

| NVIDIA GeForce RTX 4070 Laptop GPU        | Phantomas                    | 52.24/100    | ≈ 255               | ≈ 6 days       | Gaming laptop GPU                      |

| NVIDIA GeForce RTX 2060                   | hwt; Artoria2e5              | 45.76/100    | ≈ 240–259           | ≈ 5.9–6.7 days | Undervolt / power cap in some reports  |

| NVIDIA GeForce GTX 1660 Ti                | Phantomas (MSI GL73)         | n/a          | ≈ 234               | ≈ 6.8 days     | Older Turing GPU                       |

| AMD Radeon VII                            | cherubrock (author)          | 50.57/100    | ≈ 350               | ≈ 4.5 days     | Reference dev card                     |

| Apple M4 Pro (Mac mini / MacBook)         | wigglefruit                  | 30.29/100    | ≈ 164               | ≈ 9.6 days     | Apple silicon, 18‑core GPU             |

| Apple M2 (MacBook Air, 8 GB unified RAM)  | cherubrock (author)          | n/a          | ≈ 25                | ≈ 62 days      | Thin‑and‑light laptop                  |

All runs above:

- use the Marin backend in PRP mode;

- let PrMers choose NTT sizes automatically;

- were executed with reasonably tuned (not extreme) power settings.

Your results will vary with clocks, thermals, drivers and PrMers version.

Cleaning and Uninstall

----------------------

Clean build artifacts:

    make clean

Uninstall installed files:

    sudo make uninstall

Backend and code

----------------

- Marin backend by Yves Gallot  

  - https://github.com/galloty/marin

- Integer NTT / IBDWT techniques  

  - Based on ideas discussed by Nick Craig-Wood and others in the context of

    modular arithmetic for Mersenne numbers. In particular, NTT and IBDWT

    using modular arithmetic modulo 2^64 - 2^32 + 1.

- Gerbicz-Li proof scheme  

  - Used for PRP error checking in PrMers (see the paper in the "Must read papers"

    section below).

Related inspiration

-------------------

- GPUOwl (Preda)  

- Genefer22 (Yves Gallot)  

- GIMPS and the Mersenne Forum community  

- GMP-ECM and related work on elliptic curve factoring:  

  - https://gitlab.inria.fr/zimmerma/ecm

- Repositories by Yves Gallot containing many useful resources:  

  - https://github.com/galloty  

  - https://github.com/galloty/f12ecm  

  - https://github.com/galloty/FastMultiplication

- Work by Nick Craig-Wood:  

  - IOCCC 2012 entry: https://github.com/ncw/ioccc2012  

  - Armprime project: https://github.com/ncw/  

  - ARM Prime Math (background on the math behind Armprime):  

    https://www.craig-wood.com/nick/armprime/math/

Must read papers

----------------

### Multiplication by FFT

- Discrete Weighted Transforms and Large Integer Arithmetic  

  Richard Crandall and Barry Fagin, 1994  

  https://www.ams.org/journals/mcom/1994-62-205/S0025-5718-1994-1185244-1/S0025-5718-1994-1185244-1.pdf

- Rapid Multiplication Modulo the Sum And Difference of Highly Composite Numbers  

  Colin Percival, 2002  

  https://www.daemonology.net/papers/fft.pdf

### P-1 factoring

- An FFT Extension to the P-1 Factoring Algorithm  

  Peter L. Montgomery and Robert D. Silverman, 1990  

  https://www.ams.org/journals/mcom/1990-54-190/S0025-5718-1990-1011444-3/S0025-5718-1990-1011444-3.pdf

- Improved Stage 2 to P+/-1 Factoring Algorithms  

  Peter L. Montgomery and Alexander Kruppa, 2008  

  https://inria.hal.science/inria-00188192v3/document

### Proof schemes (Gerbicz-Li)

- An Efficient Modular Exponentiation Proof Scheme  

  Darren Li, Yves Gallot, 2022–2023  

  arXiv: https://arxiv.org/abs/2209.15623  

  Presents an efficient proof scheme for left-to-right modular exponentiation,

  generalizing the Gerbicz-Pietrzak approach to arbitrary exponents. It allows

  an = r (mod m) to be proven with overhead negligible compared to the

  exponentiation itself and has been deployed at PrimeGrid to validate long

  runs.

Author

------

Author of PrMers:

- cherubrock (Sebastien), with contributions and feedback from users on

  mersenneforum.org and GitHub.

For bug reports, feature requests, or contributions, please use:

    https://github.com/cherubrock-seb/PrMers/issues

    ### Example PRP throughput (Marin backend, PRP mode)

Below are some concrete examples for different GPUs and exponents. All are for

Mersenne numbers M_p = 2^p − 1 in PRP mode.

#### NVIDIA GeForce RTX 5090 (Resolver, vast.ai instance)

Transform sizes were chosen automatically by PrMers.

- p = 57 885 161, NTT size 8  

  - About 2350 iter/s, ETA around 6 h 50 min.

- p = 74 207 281, NTT size 8  

  - About 2230 iter/s, ETA around 9 h 15 min.

- p = 82 589 933, NTT size 8  

  - About 1970 iter/s, ETA around 11 h 40 min.

- p = 136 279 841, NTT size 8  

  - About 2230 iter/s, ETA around 17 h.

#### AMD Radeon VII (cherubrock, local dev machine)

- p = 57 885 161, NTT size 8  

  - About 510 iter/s, ETA around 31 h.

- p = 74 207 281, NTT size 8  

  - About 436 iter/s, ETA around 48 h.

- p = 82 589 933, NTT size 8  

  - About 402 iter/s, ETA around 52 h.

- p = 136 279 841, NTT size 8  

  - About 350 iter/s, ETA around 4.5 days.

#### NVIDIA GeForce RTX 4090 (Resolver)

- p = 57 885 161, NTT size 8  

  - About 1030 iter/s, ETA around 15 h.

- p = 74 207 281, NTT size 8  

  - About 910 iter/s, ETA around 22 h.

- p = 82 589 933, NTT size 8  

  - About 840 iter/s, ETA around 27 h.

- p = 136 279 841, NTT size 8  

  - About 1225 iter/s, ETA around 31 h.

#### NVIDIA GeForce RTX 4060 Ti (Lorenzo)

- p = 57 885 161, NTT size 8  

  - About 420 iter/s, ETA around 37 h.

- p = 74 207 281, NTT size 8  

  - About 366 iter/s, ETA around 55 h.

- p = 82 589 933, NTT size 8  

  - About 337 iter/s, ETA around 59 h.

- p = 136 279 841, NTT size 8  

  - About 318 iter/s, ETA just under 5 days.

#### NVIDIA GeForce RTX 4070 Laptop GPU (Phantomas)

- p = 57 885 161, NTT size 8  

  - About 370 iter/s, ETA around 42 h.

- p = 74 207 281, NTT size 8  

  - About 320 iter/s, ETA around 63 h.

- p = 82 589 933, NTT size 8  

  - About 283 iter/s, ETA around 71 h.

- p = 136 279 841, NTT size 8  

  - About 255 iter/s, ETA a bit over 6 days.

#### NVIDIA GeForce GTX 1660 Ti (Phantomas, MSI GL73 notebook)

- p = 57 885 161, NTT size 8  

  - About 330 iter/s, ETA around 47 h.

- p = 74 207 281, NTT size 8  

  - About 288 iter/s, ETA around 69 h.

- p = 82 589 933, NTT size 8  

  - About 262 iter/s, ETA around 76 h.

- p = 136 279 841, NTT size 8  

  - About 234 iter/s, ETA around 6.8 days.

#### NVIDIA GeForce RTX 2060 (hwt; Artoria2e5)

Typical ranges seen (power-capped / undervolted in some runs):

- p = 57 885 161  

  - ≈ 491–502 iter/s, ETA ≈ 1 d 7 h – 1 d 20 h.

- p = 74 207 281  

  - ≈ 499 iter/s, ETA ≈ 1 d 16 h.

- p = 82 589 933  

  - ≈ 499–502 iter/s, ETA ≈ 1 d 15 h – 1 d 20 h.

- p = 136 279 841  

  - ≈ 240–259 iter/s, ETA ≈ 5 d 21 h – 6 d 18 h.

#### NVIDIA GeForce RTX 5070 Laptop (beepthebee)

- p = 57 885 161, NTT size 8  

  - About 858 iter/s, ETA around 18 h 45 min.

- p = 74 207 281, NTT size 8  

  - About 882 iter/s, ETA around 1 d 0 h.

- p = 82 589 933, NTT size 8  

  - About 875 iter/s, ETA around 1 d 2 h.

- p = 136 279 841, NTT size 8  

  - About 356 iter/s, ETA around 4 d 10 h.

#### Apple M4 Pro (wigglefruit)

- p = 57 885 161, NTT size 8  

  - About 264 iter/s, ETA around 58 h.

- p = 74 207 281, NTT size 8  

  - About 231 iter/s, ETA around 87 h.

- p = 82 589 933, NTT size 8  

  - About 213 iter/s, ETA around 94 h.

- p = 136 279 841, NTT size 8  

  - About 164 iter/s, ETA around 9.6 days.

#### Apple M2 (MacBook Air 8 GB, cherubrock)

- p = 57 885 161, NTT size 8  

  - About 42 iter/s, ETA around 15 h.

- p = 74 207 281, NTT size 8  

  - About 38 iter/s, ETA around 25 h.

- p = 82 589 933, NTT size 8  

  - About 32 iter/s, ETA around 29 h.

- p = 136 279 841, NTT size 8  

  - About 25 iter/s, ETA around 62 days.
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/cherubrock-seb/prmers

Awesome Lists containing this project

README