{"id":23344267,"url":"https://github.com/ttsiodras/mandelbrotsse","last_synced_at":"2025-10-18T06:16:37.543Z","repository":{"id":44404900,"uuid":"309206745","full_name":"ttsiodras/MandelbrotSSE","owner":"ttsiodras","description":"Real-time Mandelbrot zoom via SSE, AVX, OpenMP, CUDA, XaoS... ","archived":false,"fork":false,"pushed_at":"2023-06-13T20:05:27.000Z","size":490,"stargazers_count":87,"open_issues_count":1,"forks_count":3,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-10T02:46:12.945Z","etag":null,"topics":["avx","cuda","openmp","sse"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ttsiodras.png","metadata":{"files":{"readme":"README","changelog":"ChangeLog","contributing":null,"funding":null,"license":"COPYING","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS","dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-11-01T23:06:28.000Z","updated_at":"2025-03-30T00:47:27.000Z","dependencies_parsed_at":"2024-12-21T06:26:38.746Z","dependency_job_id":"0b689115-a773-4626-b141-13710bd55c72","html_url":"https://github.com/ttsiodras/MandelbrotSSE","commit_stats":null,"previous_names":[],"tags_count":10,"template":false,"template_full_name":null,"purl":"pkg:github/ttsiodras/MandelbrotSSE","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ttsiodras%2FMandelbrotSSE","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ttsiodras%2FMandelbrotSSE/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ttsiodras%2FMandelbrotSSE/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ttsiodras%2FMandelbrotSSE/manifests
","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ttsiodras","download_url":"https://codeload.github.com/ttsiodras/MandelbrotSSE/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ttsiodras%2FMandelbrotSSE/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":264995335,"owners_count":23694940,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["avx","cuda","openmp","sse"],"created_at":"2024-12-21T06:26:10.718Z","updated_at":"2025-10-09T16:52:33.474Z","avatar_url":"https://github.com/ttsiodras.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"WHAT IS THIS?\n=============\n\nThis is a real-time Mandelbrot fractal zoomer.\n\nCOMPILE/INSTALL/RUN\n===================\n\nWindows\n-------\nWindows users can download and run a pre-compiled Windows binary\n[here](https://github.com/ttsiodras/MandelbrotSSE/releases/download/2.11/mandelSSE-win32-2.11.zip).\n\nAfter decompressing, you can simply execute either one of the two .bat\nfiles. The 'autopilot' one zooms in a specific location, while the other\none allows you to zoom interactively using your mouse (left-click/hold zooms in,\nright-click/hold zooms out).\n\nFor those of you that want to build from source code, there are \ncross-compilation instructions later in this document.\n\nFor Linux/BSD/OSX users\n-----------------------\n\nMake sure you have libSDL2 installed. 
In Debian and its derivatives,\nlike Ubuntu, just `sudo apt install libsdl2-dev`.\n\nThen, build the code - with...\n\n    $ ./configure\n    $ make\n\nUsage\n-----\n\nYou can then try these:\n\n    $ src/mandelSSE\n    (Runs in autopilot mode, in a 1024x768 window)\n\n    $ src/mandelSSE -m 1280 720\n    (Runs in mouse-driven mode, in a 1280x720 window)\n    (left-click/hold zooms-in, right-click/hold zooms out)\n\nOption `-h` gives you additional information about how to control\nthe Mandelbrot zoomer:\n\n    $ ./src/mandelSSE -h\n\n    Usage: ./src/mandelSSE [-a|-m] [-h] [-b] [-v|-s|-d] [-i iter] [-p pct] [-f rate] [WIDTH HEIGHT]\n    Where:\n            -h      Show this help message\n            -m      Run in mouse-driven mode\n            -a      Run in autopilot mode (default)\n            -b      Run in benchmark mode (implies autopilot)\n            -v      Force use of AVX\n            -s      Force use of SSE\n            -d      Force use of non-AVX, non-SSE code\n            -i iter The maximum number of iterations of the Mandelbrot loop (default: 2048)\n            -p pct  The percentage of pixels computed per frame (default: 0.75)\n                    (the rest are copied from the previous frame)\n            -f fps  Enforce upper bound of frames per second (default: 60)\n                    (use 0 to run at full possible speed)\n\n    If WIDTH and HEIGHT are not provided, they default to: 1024 768\n\nFor ultimate rendering speed, you can disable the frame limiter (option `-f`).\nBy default, you are limited to 60fps:\n\n    $ src/mandelSSE -m -f 0 1280 720\n\nThe benchmarking mode (-b) does this automatically.\nIf you want to benchmark your CPU only (and not display anything)\ntell SDL you don't care about displaying the fractal:\n\n    $ SDL_VIDEODRIVER=dummy src/mandelSSE -b 512 384\n\nBe mindful of your CPU's thermal throttling if you are benchmarking :-)\nNote that you can force AVX (-v), SSE (-s) or dumb floating point (-d)\nto see the speed 
impact made by our usage of special Intel instructions.\n\nYou can also control:\n\n- the percentage of pixels actually computed per frame, with option `-p`.\n  If you e.g. pass `-p 0.5`, then 100-0.5 = 99.5% of the pixels will be\n  copied from the previous frame, and only 0.5% will be actually derived\n  through the Mandelbrot computations. Amazingly, this is enough for \n  a decent quality fly-through zoom in the fractal.\n  By default, this is set to 0.75.\n\n- the number of Mandelbrot iterations (option `-i`). By default this is\n  set to 2048 to allow for decent zoom levels, but if you want to see\n  insane speeds, set this to something low, like 128; and disable the\n  frame limiter; i.e. use `-f 0 -i 128`.\n\nWHAT IS THIS, AGAIN?\n====================\n\nLong story.\n\nWhen I got my hands on an SSE enabled processor (an Athlon-XP, back in 2002),\nI wanted to try out SSE programming... And over the better part of a weekend,\nI created a simple implementation of a Mandelbrot zoomer in SSE assembly.\nI was glad to see that my code was almost 3 times faster than pure C.\n\nBut that was just the beginning.\n\nOver the last two decades, I kept coming back to this, enhancing it.\n\n- I learned how to use the GNU autotools, and made it work on most Intel\n  platforms: checked with Linux, Windows (MinGW) and OpenBSD. \n  A decade later, I also tested it on Raspbian and Armbian; it works\n  fine in ARM machines as well. Autotools also allow me to cross-compile\n  for Windows (more on that below).\n\n- After getting acquainted with OpenMP, in Nov 2009 I added OpenMP #pragmas\n  to run both the C and the SSE code in all cores/CPUs. The SSE code had to\n  be moved from a separate assembly file into inlined code - but the effort\n  was worth it. 
The resulting frame rate - on a tiny Atom 330 running Arch\n  Linux - sped up from 58 to 160 frames per second.\n\n- I then coded it in CUDA - a 75$ GPU card gave me almost two orders of\n  magnitude of speedup!\n\n- Then in May 2011, I made the code switch automatically from single precision\n  floating point to double precision - when one zooms \"deep enough\".\n\n- Around 2012 I added a significant optimization: avoiding fully calculating\n  the Mandelbrot lake areas (black color) by drawing at 1/16 resolution and\n  skipping black areas in the full resolution render.\n\n- I learned enough VHDL in 2018 to [code the algorithm inside a Spartan3\n  FPGA](https://www.youtube.com/watch?v=yFIbjiOWYFY). That was quite a\n  [learning experience](https://github.com/ttsiodras/MandelbrotInVHDL).\n\n- In September 2020 I [ported a fixed-point arithmetic](\n  https://github.com/ttsiodras/Blue_Pill_Mandelbrot/) version of the\n  algorithm [inside a 1.4$ microcontroller](\n  https://www.youtube.com/watch?v=5875JOnFDLg).\n\n- In October 2020, I implemented what I understood to be the XaoS algorithm;\n  that is, re-using pixels from the previous frame to optimally update\n  the next one. Especially in deep-dives and large windows, this delivered\n  amazing speedups; between 2 and 3 orders of magnitude.\n\n- In July 2022, I optimised further with AVX instructions (+80% speed\n  in CoreLoopDouble). I also ported the code to libSDL2, which stopped\n  video tearing.\n\nFOR CODERS ONLY\n===============\n\nMy SSE code\n-----------\n\nThis used to be my main loop, right after I ported to SSE back in 2002:\n\n        ;  x' = x^2 - y^2 + a\n        ;  y' = 2xy + b\n        ;\n        mov     ecx, 0\n        movaps  xmm5, [fours]     ; 4.     4.     4.     4.       ; xmm5\n        movaps  xmm6, [re]        ; a0     a1     a2     a3       ; xmm6\n        movaps  xmm7, [im]        ; b0     b1     b2     b3       ; xmm7\n        xorps   xmm0, xmm0        ; 0.     0.     0.     
0.\n        xorps   xmm1, xmm1        ; 0.     0.     0.     0.\n        xorps   xmm3, xmm3        ; 0.     0.     0.     0.       ; xmm3\n    loop1:\n        movaps  xmm2, xmm0        ; x0     x1     x2     x3       ; xmm2\n        mulps   xmm2, xmm1        ; x0*y0  x1*y1  x2*y2  x3*y3    ; xmm2\n        mulps   xmm0, xmm0        ; x0^2   x1^2   x2^2   x3^2     ; xmm0\n        mulps   xmm1, xmm1        ; y0^2   y1^2   y2^2   y3^2     ; xmm1\n        movaps  xmm4, xmm0\n        addps   xmm4, xmm1        ; x0^2+y0^2  x1...              ; xmm4\n        subps   xmm0, xmm1        ; x0^2-y0^2  x1...              ; xmm0\n        addps   xmm0, xmm6        ; x0'    x1'    x2'    x3'      ; xmm0\n        movaps  xmm1, xmm2        ; x0*y0  x1*y1  x2*y2  x3*y3    ; xmm1\n        addps   xmm1, xmm1        ; 2x0*y0 2x1*y1 2x2*y2 2x3*y3   ; xmm1\n        addps   xmm1, xmm7        ; y0'    y1'    y2'    y3'      ; xmm1\n        cmpltps xmm4, xmm5        ; \u003c4     \u003c4     \u003c4     \u003c4 ?     ; xmm2\n        movaps  xmm2, xmm4\n\n    ; at this point, xmm2 has all 1s in the non-overflowed pixels\n\n        movmskps eax, xmm4        ; (lower 4 bits reflect comparisons)\n        andps   xmm4, [ones]      ; so, prepare to increase the non-over\n        addps   xmm3, xmm4        ; by updating the 4 bailout counters\n        or      eax, eax          ; have all 4 pixels overflowed ?\n        jz      short nomore      ; yes, we're done\n\n        inc     ecx\n        cmp     ecx, ITERATIONS\n        jnz     short loop1\n\nThe new AVX code (inside CoreLoopDoubleAVX) follows the same motif;\nexcept that it also includes periodicity checking, and uses the YMM\nregisters.\n\nThe comments should help you follow what's happening... 
Basically,\nwe compute 4 pixels at a time.\n\nXaoS\n----\n\nThe idea behind the XaoS algorithm is simple: don't redraw the pixels;\ninstead re-use as many as you can from the previous frame.\n\nThe devil, as ever, is in the details.\n\nThe way I implemented this is as follows: the topmost scanline goes\nfrom X coordinate `xld` to `xru` - in `xstep` steps (see code\nfor details). I store these computed coordinates in array `xcoord`;\nand in the next frame, I compare the new coordinates with the old \nones. For each pixel, I basically find the closest X coordinate match.\n\nI do the same for the Y coordinates. In both cases, we are talking\nabout a 1-dimensional array, of MAXX or MAXY length.\n\nAfter I have the matches, I sort them - based on distance to the\ncoordinates of the previous frame. The `mandel` function then forces\na redraw for the worst N columns/rows - where N comes as a percentage\nparameter in the function call. Simply put, if the pixel's\nX and Y coordinates fall on \"slots\" that are close enough to the\nold frame's `xcoord` and `ycoord`, the pixel color is taken\nfrom the previous frame without doing the expensive Mandelbrot\ncalculation.\n\nThis works perfectly - the zoom becomes nice and smooth, and is\nalso improved with a full Mandelbrot render the moment the user\nstops zooming.\n\nThe code has a lot of comments explaining the inner-workings in detail.\nHave a look!\n\nCross compiling for Windows via MinGW\n-------------------------------------\nFirst, install MinGW:\n\n    $ sudo apt install gcc-mingw-w64\n\nThen download and decompress the SDL 2.0.22 source tarball, and compile\nit as follows:\n\n    $ cd SDL-2.0.22\n    $ ./configure --host=x86_64-w64-mingw32 \\\n            --disable-video-x11 --disable-x11-shared \\\n            --prefix=/usr/local/packages/SDL-2.0.22-win32\n    $ make\n    $ sudo make install\n\nFinally, come back to this source folder, and configure it like this:\n\n    $ ./configure 
--host=x86_64-w64-mingw32 \\\n            --with-sdl-prefix=/usr/local/packages/SDL-2.0.22-win32 \\\n            --disable-sdltest\n    $ make\n    $ cp src/mandelSSE.exe \\\n            /usr/local/packages/SDL-2.0.22-win32/bin/SDL2.dll \\\n            /some/path/for/Windows/\n\nYou can also get the \"ingredients\" (DLLs for SDL2, OpenMP, libstdc++, etc)\nfrom the packaged release\n[here](https://github.com/ttsiodras/MandelbrotSSE/releases/download/2.11/mandelSSE-win32-2.11.zip).\n\nMISC\n====\nSince it reports frame rate at the end (option `-b`), you can use this as\na benchmark for AVX instructions - it puts the AVX registers under quite a load.\n\nI've also coded a\n[CUDA version](https://www.thanassis.space/mandelcuda-1.0.tar.bz2),\nwhich you can play with, if you have an NVIDIA card.\nThere are some details about it in the blog post I wrote back in 2009,\n[here](https://www.thanassis.space/mandelSSE.html).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fttsiodras%2Fmandelbrotsse","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fttsiodras%2Fmandelbrotsse","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fttsiodras%2Fmandelbrotsse/lists"}