{"id":15049301,"url":"https://github.com/jodithetigger/meow_fft","last_synced_at":"2026-03-04T02:32:11.952Z","repository":{"id":17272140,"uuid":"81597701","full_name":"JodiTheTigger/meow_fft","owner":"JodiTheTigger","description":"A simple, C99, header only, 0-Clause BSD Licensed, fast fourier transform (FFT).","archived":false,"fork":false,"pushed_at":"2024-07-01T05:28:29.000Z","size":67,"stargazers_count":111,"open_issues_count":1,"forks_count":14,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-12-23T09:28:15.532Z","etag":null,"topics":["c99","fft","header-only","public-domain"],"latest_commit_sha":null,"homepage":null,"language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JodiTheTigger.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-02-10T19:07:59.000Z","updated_at":"2024-11-12T22:21:49.000Z","dependencies_parsed_at":"2024-10-12T17:31:18.697Z","dependency_job_id":null,"html_url":"https://github.com/JodiTheTigger/meow_fft","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JodiTheTigger%2Fmeow_fft","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JodiTheTigger%2Fmeow_fft/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JodiTheTigger%2Fmeow_fft/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JodiTheTigger%2Fmeow_fft/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners
/JodiTheTigger","download_url":"https://codeload.github.com/JodiTheTigger/meow_fft/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":231845037,"owners_count":18435049,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c99","fft","header-only","public-domain"],"created_at":"2024-09-24T21:19:36.606Z","updated_at":"2026-03-04T02:32:11.869Z","avatar_url":"https://github.com/JodiTheTigger.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"meow_fft\n========\n\nMy Easy Oresome Wonderfull FFT\n\nBy Richard Maxwell\n\nA simple, C99, header only, 0-Clause BSD Licensed, fast fourier transform (FFT).\n\nExample\n=======\n\n```C\n    #define MEOW_FFT_IMPLEMENTATION\n    #include \u003cmeow_fft.h\u003e\n\n    #include \u003cstdlib.h\u003e\n\n    int main(int argc, char** argv)\n    {\n        (void) argv;\n        (void) argc;\n\n        unsigned          N    = 1024;\n        unsigned          N_2  = ((N + 1) / 2);\n        float*            in   = malloc(sizeof(float) * N);\n        Meow_FFT_Complex* out  = malloc(sizeof(Meow_FFT_Complex) * N_2);\n        Meow_FFT_Complex* temp = malloc(sizeof(Meow_FFT_Complex) * N_2);\n        // Real only FFTs output half the amount of inputs (N/2 rounded up)\n        // Full complex FFTs output all of them (N)\n        // This applies to the temp array as well (N/2 rounded up)\n\n        // prepare data for \"in\" array.\n        // ...\n\n        size_t workset_bytes = meow_fft_generate_workset_real(N, NULL);\n        // Get size for an N point fft 
working on non-complex (real) data.\n\n        Meow_FFT_Workset_Real* fft_real =\n            (Meow_FFT_Workset_Real*) malloc(workset_bytes);\n\n        meow_fft_generate_workset_real(N, fft_real);\n\n        meow_fft_real(fft_real, in, out);\n        // out[0].r == out[0  ].r\n        // out[0].j == out[N/2].r\n\n        meow_fft_real_i(fft_real, in, temp, out);\n        // result is not scaled, need to divide all values by N\n\n        free(fft_real);\n        free(out);\n        free(temp);\n        free(in);\n    }\n```\n\nUsage\n=====\n\nSince this is a single header library, just make a C file with the lines:\n\n```C\n    #define MEOW_FFT_IMPLEMENTATION\n    #include \u003cmeow_fft.h\u003e\n```\n\nThere are two sets of functions. Ones dealing with sequential interleaved\nfloating point complex numbers, and ones dealing with sequential floating point\nreal numbers (postfixed with `_real`).\n\nForward FFTs are labelled `_fft` while reverse FFTs are labelled `_fft_i`.\n\nThe function `_is_slow` can be used to tell if you have a non-optimised radix\ncalculation in your fft (ie the slow DFT is called). This will also increase\nthe memory required by the workset.\n\nAll functions are namespaced with `meow_` and all Types by `Meow_`.\n\n\nWhy?\n====\nI thought I could write a faster FFT than kiss_fft, since I couldn't use FFTW3\ndue to its GPL license. 
LOL, if I knew about the pffft library, I would have\njust used that instead.\n\n`¯\\_(ツ)_/¯`\n\nPerformance\n===============\n\n* This FFT is for people who want a single file FFT implementation without any\nlicensing headaches and are not concerned with having the fastest performance.\n\n* This FFT is for people wanting to know how an FFT is written using a _simple-ish_\n  implementation\n* It doesn't explicitly use vectorised instructions (SSE, NEON, AVX)\n* It is faster than kiss_fft only due to using a radix-8 kernel\n* It is slower than pffft, muFFT, vDSP, FFTW and other accelerated FFT libraries\n* It is slower than anything on the GPU\n* It has not been tested _on_ a GPU\n\nI found changing compiler flags can make the FFT go faster _or_ slower depending\non what you want to do. For example, using gcc with `-march=native` on my i7\nresulted in the \u003e 2048 FFTs going faster, but the \u003c 2048 FFTs going twice as\nslow. I also got mixed results with `-ffast-math`. Basically, you need to figure\nout what FFTs you are going to use, and then benchmark various compiler options\nfor your target platforms in order to get any useful compiler based performance\nincreases.\n\n\nReading List\n============\n\n* It's a circle! -\u003e How FFTs _actually_ work\n  http://betterexplained.com/articles/an-interactive-guide-to-the-fourier-transform/\n\n* How to get radix-2, 3, 4, and 5 formulas:\n  http://www.briangough.com/fftalgorithms.pdf pages 18 and 19\n\n* How to make a faster fft when only dealing with real (non-complex) inputs.\n  (Warning, the maths is confusing due to inconsistent formulas and assumptions)\n  http://www.engineeringproductivitytools.com/stuff/T0001/PT10.HTM\n\n* Finally, know that ffts are pretty much as fast as you can get, you need\n  to start making them cache friendly to get any extra speed.\n  https://math.mit.edu/~stevenj/18.335/FFTW-Alan-2008.pdf\n\n* Hmm, how is it done on GPUs? 
\n  http://mc.stanford.edu/cgi-bin/images/7/75/SC08_FFT_on_GPUs.pdf\n\nFFT Implementation\n==================\n\nI have implemented a non-scaled, float based decimation in time, mixed-radix,\nout of place, in order result fast fourier transform with sequentially accessed\ntwiddle factors per stage, with separate forward and reverse functions. It has\ncustom codelets for radices: 2,3,4,5 and 8, as well as a slow general discrete\nfourier transform (DFT) for all other prime number radices.\n\nSecondly, I have also a real only FFT that uses symmetrical based mixing in order\nto do a two for one normal FFT using real data.\n\nI wrote my FFT using kiss_fft, and engineeringproductivitytools as a guide, as\nwell as many days and nights going \"wtf, I have no idea what I'm doing\". I used\nFFTW's fft codelet compilers to generate my radix-8 codelet, as doing code\nsimplification by hand would have taken me another six months.\n\nAll in all it took me one year of part time coding to get this releasable.\n\nCould be faster if\n------------------\n\n* I don't reorder the fft, so the result is all jumbled up, and you need a 2nd\n  function to reorder it (like pffft)\n\n* Write ISPC code in case vectorisation can make it go faster\n\n\nBenchmarks\n==========\n\nTest Platforms\n--------------\n\n| Platform                                     | GCC version |\n|----------------------------------------------|-------------|\n| Intel(R) Core(TM) i7-4790 CPU     @ 3.60GHz  | 6.3.1       |\n| ARMv7 Processor rev 10 (v7l)      @ 1.00 GHz | 4.8.1       |\n| Intel(R) Core(TM)2 Duo CPU P8400  @ 2.26GHz  | 6.3.0       |\n\nBuild Line\n----------\n\nAll builds used GCC with `-O2 -g`. 
The ARM build used the command line options:\n\n```\n    -march=armv7-a -mthumb-interwork -mfloat-abi=hard -mfpu=neon\n    -mtune=cortex-a9\n```\n\nMeasurement Procedure\n---------------------\n\nThe time taken to do an N point FFT every 32 samples of a 5 second 16 bit mono\n44.1 kHz buffer (signed 16 bit values) was measured. This was then divided by\nthe number of FFT calculations performed to give a value of microseconds per\nFFT. This was done 5 times and the median value was reported.\n\nResults for meow_fft, kiss_fft, fftw3 and pffft were taken. fftw was not tested\non the ARM platform. Some tests for pffft were skipped due to lack of support\nfor certain values of N. pffft uses SSE/NEON vector CPU instructions.\n\n*NOTE* FFTW3 results are currently wrong as it is using a Hartley transform\ninstead of a real FFT transform. Updated benchmarks are pending...\n\nResults\n-------\n\nValues are microseconds per FFT (1,000,000 microseconds are in one second)\n\n### Power of Two values of N\n\n#### Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz\n\n|     N |   meow |   kiss |  pffft |  fftw3 |\n|-------|--------|--------|--------|--------|\n|    64 |   0.73 |   0.58 |   0.15 |   0.15 |\n|   256 |   2.32 |   3.49 |   0.44 |   1.45 |\n|   512 |   4.36 |   5.09 |   1.02 |   3.35 |\n|  1024 |   8.89 |  13.12 |   2.19 |   7.29 |\n|  2048 |  23.00 |  24.61 |   5.27 |  16.26 |\n|  4096 |  42.30 |  60.04 |  11.09 |  35.49 |\n|  8192 |  84.72 | 119.38 |  39.49 |  84.72 |\n| 16384 | 232.20 | 290.22 |  82.47 | 189.40 |\n| 32768 | 411.35 | 562.56 | 208.15 | 417.32 |\n\n#### Intel(R) Core(TM)2 Duo CPU P8400  @ 2.26GHz\n\n|     N |   meow |    kiss |  pffft |\n|-------|--------|---------|--------|\n|    64 |   0.87 |    1.45 |   0.29 |\n|   256 |   5.52 |    6.97 |   1.16 |\n|   512 |  10.04 |   11.93 |   2.47 |\n|  1024 |  19.39 |   32.95 |   4.81 |\n|  2048 |  53.47 |   59.19 |  11.43 |\n|  4096 |  99.23 |  150.40 |  24.55 |\n|  8192 | 196.56 |  283.24 |  72.81 |\n| 16384 | 523.36 |  708.37 | 150.36 
|\n| 32768 | 975.28 | 1357.65 | 337.54 |\n\n#### ARMv7 Processor rev 10 (v7l) @ 1.00 GHz\n\n|     N |    meow |    kiss |   pffft |\n|-------|---------|---------|---------|\n|    64 |    4.94 |    7.69 |    3.77 |\n|   256 |   25.86 |   32.26 |   12.64 |\n|   512 |   44.37 |   54.84 |   28.22 |\n|  1024 |   84.14 |  146.84 |   57.45 |\n|  2048 |  255.79 |  280.98 |  147.82 |\n|  4096 |  497.34 |  758.80 |  326.23 |\n|  8192 | 1025.47 | 1502.71 |  847.75 |\n| 16384 | 2822.83 | 3891.50 | 1831.14 |\n| 32768 | 5434.37 | 9110.81 | 4220.76 |\n\n### Non Power of Two values of N\n\n#### Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz\n\n|     N |   meow |   kiss |  pffft |  fftw3 |\n|-------|--------|--------|--------|--------|\n|   100 |   0.87 |   0.87 |        |   0.58 |\n|   200 |   1.74 |   1.89 |        |   1.16 |\n|   500 |   5.96 |   5.82 |        |   3.78 |\n|  1000 |  10.79 |  12.10 |        |   8.02 |\n|  1200 |  13.72 |  15.18 |        |   9.78 |\n|  5760 |  77.65 |  88.97 |  20.72 |  56.78 |\n| 10000 | 130.74 | 160.23 |        | 119.95 |\n\n#### Intel(R) Core(TM)2 Duo CPU P8400  @ 2.26GHz\n\n|     N |   meow |   kiss |  pffft |\n|-------|--------|--------|--------|\n|   100 |   2.47 |   2.03 |        |\n|   200 |   4.50 |   4.36 |        |\n|   500 |  15.27 |  12.95 |        |\n|  1000 |  28.14 |  27.26 |        |\n|  1200 |  34.44 |  36.04 |        |\n|  5760 | 194.78 | 208.94 |  47.24 |\n| 10000 | 341.90 | 350.11 |        |\n\n#### ARMv7 Processor rev 10 (v7l) @ 1.00 GHz\n\n|     N |    meow |    kiss |  pffft |\n|-------|---------|---------|--------|\n|   100 |   11.91 |   10.45 |        |\n|   200 |   19.47 |   19.76 |        |\n|   500 |   67.93 |   58.33 |        |\n|  1000 |  117.51 |  116.93 |        |\n|  1200 |  156.87 |  175.98 |        |\n|  5760 |  923.85 | 1163.04 | 612.37 |\n| 10000 | 1677.87 | 1985.10 |        |\n\n### Accuracy\n\nIn a perfect world, doing an FFT then an inverse FFT on the same data, then\nscaling the result by 1/N should result in an 
identical buffer to the source\ndata. However, in real life, floating point errors can accumulate.\n\nThe first 32 values of the input data were compared to a scaled result buffer\nand then the difference multiplied by 65536 to simulate what the error would be\nfor a 16 bit audio stream using 32 bit floating point FFT maths. The worst error\nof the first 32 values was recorded for each FFT tested for size N.\n\n|   FFT |   Min |   Max |\n|-------|-------|-------|\n| meow  | 0.016 | 0.031 |\n| kiss  | 0.016 | 0.029 |\n| pffft | 0.012 | 0.043 |\n| fftw3 | 0.000 | 0.000 |\n\n\nOther FFT Implementations\n=========================\n\n* FFTW3    (GPL) : the original library, best accuracy.\n  http://www.fftw.org/\n\n* kiss_fft (BSD) : small and simple to follow, not as fast as fftw.\n  https://github.com/itdaniher/kissfft (mirror)\n\n* pffft (FFTPACK): Sometimes faster than FFTW3! More accurate than kiss or meow.\n  https://bitbucket.org/jpommier/pffft\n\n* FFTS     (MIT) : I couldn't get it to compile :-(\n  https://github.com/anthonix/ffts\n  \n* muFFT    (MIT) : SSE, SSE3, AVX-256\n  https://github.com/Themaister/muFFT\n  \n* GPU_FFT (3-BSD): GPU accelerated FFT for Raspberry Pi\n  http://www.aholme.co.uk/GPU_FFT/Main.htm\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjodithetigger%2Fmeow_fft","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjodithetigger%2Fmeow_fft","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjodithetigger%2Fmeow_fft/lists"}