{"id":25504567,"url":"https://github.com/nilostolte/sudoku","last_synced_at":"2026-02-25T23:02:38.685Z","repository":{"id":142983433,"uuid":"611355889","full_name":"nilostolte/Sudoku","owner":"nilostolte","description":"Simple 9x9 Sudoku brute force solver with intrinsic parallel candidate set processing using bits to represent digits in the [1, 9] range, and bitwise operations to test a candidate against the candidate set, all at once.","archived":false,"fork":false,"pushed_at":"2025-01-07T22:34:02.000Z","size":275,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-01T09:01:47.865Z","etag":null,"topics":["bitwise-operators","brute-force-algorithm","c","java","optmization","sudoku","zig"],"latest_commit_sha":null,"homepage":"","language":"Zig","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nilostolte.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-08T16:51:28.000Z","updated_at":"2025-04-17T12:14:00.000Z","dependencies_parsed_at":"2024-03-17T15:25:26.456Z","dependency_job_id":"2e20639c-ecdc-49d3-ae7d-5b9aca08a50e","html_url":"https://github.com/nilostolte/Sudoku","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/nilostolte/Sudoku","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nilostolte%2FSudoku","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nilostolte%2FSudoku/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/re
positories/nilostolte%2FSudoku/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nilostolte%2FSudoku/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nilostolte","download_url":"https://codeload.github.com/nilostolte/Sudoku/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nilostolte%2FSudoku/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29844845,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-25T22:37:40.667Z","status":"ssl_error","status_checked_at":"2026-02-25T22:37:25.960Z","response_time":61,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bitwise-operators","brute-force-algorithm","c","java","optmization","sudoku","zig"],"created_at":"2025-02-19T05:41:13.419Z","updated_at":"2026-02-25T23:02:38.671Z","avatar_url":"https://github.com/nilostolte.png","language":"Zig","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cimg src=\"https://github.com/nilostolte/Sudoku/assets/80269251/ef41fe74-1b8b-415a-bab4-a65fd98ce03e\" width=\"512\" height=\"512\"\u003e\u003cbr\u003e\n\n# Sudoku\nSimple 9x9 Sudoku brute force solver with intrinsic parallel candidate set processing using bits to represent digits in the [1, 9] range, and bitwise operations to test a candidate against the candidate set, all at once.\n\nIt can 
be upgraded for 16x16 or 25x25 grids.\n\nThe algorithm was implemented in [Java](src), in [C](C/src), and in [Zig](Zig). The description \nbelow concerns the Java implementation, although the [C implementation](C/src) is quite similar, just\nwithout classes. The [Zig implementation](Zig) is similar to the C one, but faster and with an OOP-style stack. In\nthe Zig version, [many optimizations](Zig/README.md) made it possible to achieve a minimum running time of 0.7916\nmilliseconds for the same\n[test grid](https://github.com/nilostolte/Sudoku?tab=readme-ov-file#main-test-grid) run on my\nIntel Core i7-2670QM @ 2.20GHz laptop:\n\n\u003cp align=\"center\"\u003e\n    \u003cimg src=\"https://github.com/user-attachments/assets/ba6d2502-1c3b-4276-83cd-6f06a3476bcf\" width=\"400\"\u003e\n\u003c/p\u003e\n\nThe supplied Windows 64-bit executables for the [C](C/bin/sudoku.exe) and [Zig](Zig/sudoku.exe) implementations can be\nused to solve arbitrary grids as described in the [documentation](documentation).\n\n## Grid\n\nThis is the [class](https://github.com/nilostolte/Sudoku/blob/main/src/Grid.java) containing the grid to be solved.\n\n### Input\n\nThe grid can be initialized using a 9x9 matrix of type `char[][]` or through a linear string containing all the elements, representing \nempty elements as 0 (or '.' in the C or Zig version), both given line by line. The `char[][]` matrix is, however, the only actual input, and it must exist before\nany other input format can be used. Even though the 9x9 matrix contains characters (it's a `char[][]`), the digits are not represented as ASCII or Unicode\ncharacters but rather as integers. In other words, the character '0' is actually represented by 0, and so forth.\n\nIn the string input format, the string is simply copied over the existing input `char[][]` matrix using the static function `set`. 
This string uses \nASCII representation for the digits, which are converted to integers by the function `set`.\n\nAn additional representation is possible, as illustrated in [Main.java](https://github.com/nilostolte/Sudoku/blob/main/src/Main.java), by \nrepresenting the character '0' with the character '.' in the string. In this case, one appends `.replace('.','0')` to the string, as shown there.\n\nBoth string input formats are common representations of Sudoku grids on the web.\n\n### Data Structures\n\nThe main data structure in `Grid` is `matrix`, a 9x9 matrix in the same format as the input matrix for the grid. This is the matrix\nthe input matrix is copied to.\n\n#### Auxiliary Data Structures\n\nThe main auxiliary data structures are the most interesting part of this class, besides the solver algorithm itself:\n\n* `lines` - an array with 9 positions, each one corresponding to a line in the grid and functioning as a set where each bit represents a digit \nalready present in that line.\n* `cols` - an array with 9 positions, each one corresponding to a column in the grid and functioning as a set where each bit represents a digit \nalready present in that column.\n* `cells` - a 3x3 matrix with 9 positions, each one corresponding to one of the nine 3x3 cells the grid is subdivided into and functioning as a set where each bit \nrepresents a digit already present in that cell.\n\n#### Additional Auxiliary Data Structures\n\n* `stk` - the stack used to implement the backtracking algorithm. It uses an array of 81 positions and the `push` and `pop` operators, as shown in\nthe [algorithm](https://github.com/nilostolte/Sudoku#algorithm) below. 
The `push` operator not only stores the digit, its \n[binary representation](https://github.com/nilostolte/Sudoku#binary-representation-for-digits), \nand the line and column (`i` and `j`) of the inserted element in a stack node (`StkNode`), _\"pushing\"_ the node onto the stack, but also inserts the \ndigit into the internal matrix (`matrix[i][j]`) and its binary representation into the auxiliary data structures, thus updating the candidate\nsets affected by the newly inserted element. The `pop` operation only removes the node from the stack; the node is not garbage collected but remains in the\nstack array as an unused element. Nodes are lazily allocated whenever a `null` element is found while pushing.\n* `cel` - an array with 9 positions, mapping each line or column index of the grid (0 to 8) to the corresponding line or column index (0 to 2) in the 3x3\nmatrix `cells`.\n\n#### Representing a set of present digits with bits\n\nAll main auxiliary data structures use a common notation to represent the set of digits present in the corresponding line, column, or cell.\nA bit is set to one at the position corresponding to a digit present in the set, or set to zero if its position corresponds to a digit that \nis absent. By inverting the bits, one gets the \"candidate set\" of digits that are still missing in the corresponding line, column or cell. For\na better understanding of this candidate set scheme, please refer to the \n[subsection](https://github.com/nilostolte/Sudoku#binary-representation-for-digits) explaining how digits are represented in binary.\n\nLet's suppose a particular line, column or cell contains the digits 1, 3, 4 and 9. 
This set is then represented by the following binary number:\n\n**100001101** = **0x10D**\n\n* the rightmost bit corresponds to the digit 1, which in this case is already present in the set.\n* the second bit from the right corresponds to the digit 2, which is clearly not present yet, since its value is zero.\n* bits three and four, corresponding to the digits 3 and 4, respectively, are both set to one, so these digits are present.\n* bits five, six, seven, and eight are all zeros, and thus digits 5, 6, 7 and 8 are absent from the set.\n* bit nine is 1. Therefore, the digit 9 is also present in the set.\n\n#### Final Candidate Set\n\nIn order to obtain the candidate set for a given element `matrix[i][j]` of the grid, one calculates:\n\n**`lines[i] | cols[j] | cells[ cel[i] ][ cel[j] ]`**  (1)\n\nThe expression in (1) gives a set where all bits set to zero correspond to the digits that can still be placed in `matrix[i][j]`. \nIn other words, the candidate set is given by the absent elements of this set, that is, by all the bits that are zero. \n\nThe appeal of this notation is that the union of all three sets is obtained with just two bitwise _or_ operations.\n\nOne can observe how the `cel` mapping is used to access the corresponding cell in `cells`. First, `i` and `j` are used as indices in `cel`: `cel[i]` and `cel[j]` give the corresponding line and column in `cells`. 
Therefore, `cells[cel[i]][cel[j]]` corresponds to the cell where `matrix[i][j]` is contained.\n\n## Algorithm\n\n```java\n    public void solve() {\n        StkNode node;\n        int digit = 1, code = 1, inserted;\n        int i, j;\n        char[] line = matrix[0];\n        char c;\n        i = j = 0;\n        do {\n            c = line[j];\n            if (c == 0) {\n                inserted = lines[i]|cols[j]|cells[cel[i]][cel[j]];\n                for ( ; digit != 10 ; digit++, code \u003c\u003c= 1 ) {\n                    if (( code \u0026 inserted ) == 0 ) {\n                        push(i, j, code, digit);\n                        digit = code = 1;\n                        break;\n                    }\n                }\n                if ( digit == 10 ) {            // no insertion -\u003e backtrack to previous element\n                    node = pop();               // pop previous inserted i, j, and digit\n                    i = node.i;\n                    j = node.j;\n                    digit = node.digit;\n                    code = node.code;\n                    remove(node);               // remove digit from data structures\n                    digit++; code \u003c\u003c= 1;        // let's try next digit;\n                    line = matrix[i];           // maybe line has changed\n                    continue;                   // short-circuit line by line logic\n                }\n            }\n            if ( j == 8 ) {                     // line by line logic\n                j = -1; i++;                    // last line element, advance to next line\n                if (i \u003c 9) line = matrix[i];    // update line from grid matrix\n            }\n            j++;                                // advance to next element in the line\n        } while (i \u003c 9);\n    }\n```\n\n### Binary Representation for Digits\n\nIn the binary representation, a digit is always a power of two, since it's a number with only one bit set to 1 at the 
position corresponding \nto the digit. The table below shows the correspondence between digits and their binary representation:\n\n| Digit | Binary Representation | Hexadecimal | Decimal |\n| :---: | :-------------------: | :---------: | :-----: |\n| 0     | **000000000**         | 0x000       | 0       |\n| 1     | **000000001**         | 0x001       | 1       |\n| 2     | **000000010**         | 0x002       | 2       |\n| 3     | **000000100**         | 0x004       | 4       |\n| 4     | **000001000**         | 0x008       | 8       |\n| 5     | **000010000**         | 0x010       | 16      |\n| 6     | **000100000**         | 0x020       | 32      |\n| 7     | **001000000**         | 0x040       | 64      |\n| 8     | **010000000**         | 0x080       | 128     |\n| 9     | **100000000**         | 0x100       | 256     |\n\nThe binary representation shown in the table above is referred to here as the _\"code\"_ of the digit.\n\n### Implementation of Digit Retrieval in Candidate Set\n\nAs we can see, the variable `inserted` holds the \"candidate set\" for a given `matrix[i][j]` (the candidates being its zero bits). This algorithm is quite simple, but it\nhas a major drawback. Since the digit is represented by a 1 bit at its corresponding position in the variable `code`, and the algorithm accesses \nthe candidate set sequentially, it loops until it finds an empty bit (`( code \u0026 inserted ) == 0`) or until it runs out of \ncandidates (`digit == 10`). \n\nThis means that even if there are no available candidates, the algorithm has to loop over all the remaining bits sequentially. Even though the binary \nrepresentation makes it possible to handle all the elements of the candidate set in parallel, that is, all at once, we still have to access\nthem one by one, even when there are no useful results. 
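\n\nTo make this drawback concrete, here is a minimal, hypothetical Java sketch (the class `SequentialScanDemo` and its method `scan` are illustrative names, not part of the repository) that reproduces the `for` loop of `solve()` in isolation and counts how many bits it tests. It assumes `inserted` has already been computed with (1):\n\n```java\nimport java.util.Arrays;\n\n// Hypothetical standalone sketch, not the repository's Grid class.\npublic class SequentialScanDemo {\n\n    // Same loop as in solve(): test the bits of `inserted` one by one, starting\n    // at (digit, code). Returns { digit found (10 if none), number of bits tested }.\n    static int[] scan(int inserted, int digit, int code) {\n        int tested = 0;\n        for ( ; digit != 10 ; digit++, code \u003c\u003c= 1 ) {\n            tested++;\n            if (( code \u0026 inserted ) == 0 ) break;   // zero bit: candidate found\n        }\n        return new int[] { digit, tested };\n    }\n\n    public static void main(String[] args) {\n        // {1,3,4,9} already present (0x10D): candidate 2 is found after 2 tests\n        System.out.println(Arrays.toString(scan(0x10D, 1, 1)));\n        // all nine digits present (0x1FF): no candidate, yet all 9 bits are tested\n        System.out.println(Arrays.toString(scan(0x1FF, 1, 1)));\n    }\n}\n```\n\nRunning it should print `[2, 2]` and then `[10, 9]`: the second call does the maximum amount of work only to conclude that it must backtrack.\n\n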
This problem is addressed by some partial solutions, as shown [here](https://github.com/nilostolte/Sudoku#parallel-check-for-no-candidates) and \n[here](https://github.com/nilostolte/Sudoku#brachless-next-candidate-determination), but the latter employs far too many operations, despite \nbeing a branchless solution. It's only worthwhile when combined with other optimizations, as was done in the \n[C version](https://github.com/nilostolte/Sudoku#benchmarks-in-c).\n\n### Stack and Backtracking implementation\n\nDigits are tried in ascending order, from 1 to 9, for each element in the grid that is not yet occupied. That's why the `digit` and `code` \nvariables are both initialized with 1. Every time a digit is tried against the candidate set and turns out to be a valid candidate \n(that is, when `( code \u0026 inserted ) == 0`), the digit is pushed onto the stack.\n\nThe `push` function also updates `matrix[i][j]`, `lines[i]`, `cols[j]` and `cells[cel[i]][cel[j]]` with the new digit. Please check the \n[code](https://github.com/nilostolte/Sudoku/blob/main/src/Grid.java) and the description of \n[`stk`](https://github.com/nilostolte/Sudoku#additional-auxiliary-data-structures) for details.\n\nWhen no suitable candidate is found (that is, when `( code \u0026 inserted ) == 0` fails for every candidate tried), the `for` loop\nends with `digit == 10`. In this case, we need to backtrack, that is, abandon the current element and advance the previously inserted\ndigit to its next candidate. This is handled by the instructions under the `if ( digit == 10 )` statement, where the previous\nelement is popped from the stack, its digit is removed from `matrix` and from the auxiliary data structures (function `remove`), and then advanced to\nthe next candidate (`digit` is incremented and `code` is shifted left). Notice that this command sequence ends with a `continue`\nstatement in order to skip the line by line logic. 
Since the line and column (`i` and `j`) of the element to be handled next are already \nknown (they were popped from the stack), there is no need to modify `i` or `j`. Also of note, if all the possible candidates have already been \ntried (the popped digit was 9), `digit` becomes 10, the `for` loop is skipped altogether, and the flow goes back into this code sequence to backtrack once \nagain, thus handling \"cascaded\" backtracking sequences.\n\nThis completes the backtracking mechanism which, as can easily be inferred, leaves the solution of the input grid in the internal\nmatrix. As shown in [Main.java](https://github.com/nilostolte/Sudoku/blob/main/src/Main.java), the solution is printed using the function\n`print`.\n\n## Parallel check for no candidates\n\nThe logic to check whether there are no candidates, without any loop, is much more involved than what's done in the algorithm above, but it's not rocket\nscience. It only requires using our bit representation in a smarter way.\n\n### Mask to Filter Candidate Sets\nEvery power of two minus one is equal to a sequence of ones to the right of the position where its single 1 bit was (except in the\ncase of 2\u003csup\u003e0\u003c/sup\u003e = 1, since there are no binary digits to the right of it). For example, the digit 8 is 128 in our binary \nrepresentation. 
When subtracted by one, that's 127, that is, 7 bits set to 1 to the right of bit 8:\n\n**128 - 1 = 010000000 - 1 = 001111111**\n\nBy inverting every bit of this result, one obtains a mask that selects the bit at the current position and all the bits above it. When all these bits are 1 in a candidate set, there are no candidates from\nthe current position up to the last bit. For digit 8, the mask is:\n\n**~001111111 = 110000000**\n\nThat is, if a bitwise _and_ operation (`\u0026`) between this mask and a candidate set yields the mask itself,\nwe can say there are no available candidates left in the candidate set, starting with the digit we are trying, 8 in this case.\n\nLet's check the same logic with digit = 5:\n\n**~(000010000 - 1) = ~000001111 = 111110000**\n\nThen we test whether\n\n**111110000 \u0026 inserted == 111110000**\n\nWhat this actually says is that there is no candidate, neither for 5 nor for any digit above it. In other words, this is exactly the \ncondition we were looking for.\n\nOne could call this mask `reachable`. More formally, what we've got is:\n\n**`reachable = (~(code-1)) \u0026 0x1ff;`**\n\nNotice that we have to filter out all bits above bit 9. The condition we are looking for can then be written as\n\n**`if ( (inserted \u0026 reachable ) == reachable )`** (2)\n\n### Changes in the Algorithm\n\nIn this case, `if` statement (2) can replace the following `if` statement in the algorithm:\n\n**`if ( digit == 10 )`**\n\nAnd we should place `if` statement (2) above the `for` loop statement, reversing the order presented in the algorithm. 
In this \ncase, the `for` loop can be written with no end condition, since it would never be reached:\n\n**`for ( ; ; digit++, code \u003c\u003c= 1 )`** (3)\n\nThe reason is that if there are no candidates, as calculated here, then the condition of the `if` statement (2) must be true\nand, therefore, the `continue` statement of the do-while loop is executed before the `for` statement (3) is ever reached.\nThis short-circuits the `for` statement (3), since it is now below the `if` statement (2). If the `for` statement (3) is reached,\nthe condition in the `if` statement (2) must have been false. In this situation there will always be a valid candidate, and the\n`break` statement inside the `for` statement (3) will be executed, always ending this loop with no need to test an end condition.\n\n### Simplification of this Optimization - Eliminating the Mask\n\nAnother way to see this optimization is to observe that, instead of calculating the mask as explained above, which requires an intermediate\nvariable `reachable`, one can reach an equivalent conclusion by simply discarding this variable and using the following test instead of `if` statement (2):\n\n**`if ( (inserted + code ) \u003e 511 )`**  (2a)\n\nWe call this the alternative to (2), or (2a) for short.\n\nIf `inserted` only has ones from the position of the 1 in `code` up to bit 9, adding `code` to `inserted` propagates a carry through all of them, resulting in\na value beyond 511 (0x1ff). Conversely, if any of these bits is zero, the carry stops there and the sum stays within 511. Therefore, we can detect the same situation with test (2a) alone, eliminating not only\nthe need to calculate the mask, but also the variable `reachable`.\n\n## Benchmarks\n\nThe benchmarks measuring algorithm performance were run on an i7 2.2 GHz machine, in Java and in C. 
\nThe [executable file compiled in C](https://github.com/nilostolte/Sudoku/blob/main/C/bin/sudoku.exe) was \nbuilt with the optimization option `-O3` using the Windows **gcc** compiler provided in \n[**w64devkit**](https://github.com/skeeto/w64devkit), a portable Mingw-w64 **gcc** distribution (it can be installed by simply\ncopying its directory structure to a disk, SD card, or thumb drive).\n\n### Main Test Grid\n\nThe benchmarks were executed with several different grids, but particularly with this\none, which is known to be time-consuming for automatic methods and is used on the web to compare\nthe speed of different methods:\n\n \u0026nbsp; |  _1_  |  _2_  |  _3_  |  _4_  |  _5_  |  _6_  |  _7_  |  _8_  |  _9_ \n:------:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:\n**_1_** | \u003cimg src=\"8.svg\" width=\"32\" height=\"32\"\u003e |\u0026nbsp; |\u0026nbsp; |\u0026nbsp; |\u0026nbsp; |\u0026nbsp; |\u0026nbsp; |\u0026nbsp; |\u0026nbsp; |\n**_2_** |\u0026nbsp; |\u0026nbsp; | \u003cimg src=\"3.svg\" width=\"32\" height=\"32\"\u003e | \u003cimg src=\"6.svg\" width=\"32\" height=\"32\"\u003e |\u0026nbsp; |\u0026nbsp; |\u0026nbsp; |\u0026nbsp; |\u0026nbsp; |\n**_3_** |\u0026nbsp; | \u003cimg src=\"7.svg\" width=\"32\" height=\"32\"\u003e |\u0026nbsp; |\u0026nbsp; | \u003cimg src=\"9.svg\" width=\"32\" height=\"32\"\u003e |\u0026nbsp; | \u003cimg src=\"2.svg\" width=\"32\" height=\"32\"\u003e |\u0026nbsp; |\u0026nbsp; |\n**_4_** |\u0026nbsp; | \u003cimg src=\"5.svg\" width=\"32\" height=\"32\"\u003e |\u0026nbsp; |\u0026nbsp; |\u0026nbsp; | \u003cimg src=\"7.svg\" width=\"32\" height=\"32\"\u003e |\u0026nbsp; |\u0026nbsp; |\u0026nbsp; |\n**_5_** |\u0026nbsp; |\u0026nbsp; |\u0026nbsp; |\u0026nbsp; | \u003cimg src=\"4.svg\" width=\"32\" height=\"32\"\u003e | \u003cimg src=\"5.svg\" width=\"32\" height=\"32\"\u003e | \u003cimg src=\"7.svg\" width=\"32\" height=\"32\"\u003e |\u0026nbsp; |\u0026nbsp; |\n**_6_** |\u0026nbsp; |\u0026nbsp; 
|\u0026nbsp; | \u003cimg src=\"1.svg\" width=\"32\" height=\"32\"\u003e |\u0026nbsp; |\u0026nbsp; |\u0026nbsp; | \u003cimg src=\"3.svg\" width=\"32\" height=\"32\"\u003e |\u0026nbsp; |\n**_7_** |\u0026nbsp; |\u0026nbsp; | \u003cimg src=\"1.svg\" width=\"32\" height=\"32\"\u003e |\u0026nbsp; |\u0026nbsp; |\u0026nbsp; |\u0026nbsp; | \u003cimg src=\"6.svg\" width=\"32\" height=\"32\"\u003e | \u003cimg src=\"8.svg\" width=\"32\" height=\"32\"\u003e |\n**_8_** |\u0026nbsp; |\u0026nbsp; | \u003cimg src=\"8.svg\" width=\"32\" height=\"32\"\u003e | \u003cimg src=\"5.svg\" width=\"32\" height=\"32\"\u003e |\u0026nbsp; |\u0026nbsp; |\u0026nbsp; | \u003cimg src=\"1.svg\" width=\"32\" height=\"32\"\u003e |\u0026nbsp; |\n**_9_** |\u0026nbsp; | \u003cimg src=\"9.svg\" width=\"32\" height=\"32\"\u003e |\u0026nbsp; |\u0026nbsp; |\u0026nbsp; |\u0026nbsp; | \u003cimg src=\"4.svg\" width=\"32\" height=\"32\"\u003e |\u0026nbsp; |\u0026nbsp; |\n\n### Benchmarks in Java\n\nThe minimal time measured for the optimized algorithm to solve the above grid, after several attempts, was 10 milliseconds, \nand twice that for the unoptimized algorithm. Nevertheless, the measured times were quite variable, as is usual in Java \nwhen measuring fast algorithms like this one. This is why it seemed worth implementing it in a fully \ncompiled language (Java code is only compiled when the JIT compiler is triggered) to check whether execution times are less \nvariable. For this kind of problem, a fully compiled language seems more appropriate, since \none expects similar times when solving the same grid at different times. Unfortunately, this is not the case for this \nJava implementation.\n\n### Benchmarks in C\n\nSurprisingly, execution times of the executable compiled in C were also variable, although more stable than in Java. The \ntimes varied from 1.5 milliseconds to 5.26 milliseconds. However, these variations were considerably smaller \nthan in Java. 
Also, the C implementation took roughly between a tenth and half the time of the Java implementation \nof the same optimized algorithm. Several optimizations were devised besides the ones mentioned below. After all these\noptimizations were applied, a significant \n[improvement in performance](https://github.com/nilostolte/Sudoku#table-to-convert-from-bit-representation) was obtained, and the\n[Windows 64-bit executable supplied](https://github.com/nilostolte/Sudoku/blob/main/C/bin/sudoku.exe) was generated from\nthe resulting source code.\n\n### Branchless Next Candidate Determination\n\nThe parallel test for no candidates makes it possible to discard unnecessary `for` loop iterations, while also eliminating the unnecessary end \ncondition of the `for` loop (since the order of the `if` statement (2) and the `for` statement was reversed). Nevertheless, to \nfind the first candidate, one still has to loop and test the digits sequentially, one by one, against the `inserted` set.\n\nBut there is a way to calculate the next candidate without any loop. The technique can be illustrated through an example.\nSuppose the set `inserted = 101011110` (that is, {9,7,5,4,3,2}, the set of digits already inserted) and \n`code = 000000010` (the code of the digit 2). One starts by adding both:\n\n```\n   101011110    // inserted digits set: {9,7,5,4,3,2}\n + 000000010    // code of digit 2\n```\nWhich is equal to **`101100000`**. One then performs an exclusive or with `inserted`:\n```\n   101100000\n ^ 101011110\n```\nWhich is equal to **`000111110`**. One now adds `code` again:\n\n```\n   000111110\n + 000000010\n```\nWhich is equal to **`001000000`**. The bit representation of the next candidate is obtained by shifting this result one position to the\nright:\n\n```\n   001000000 \u003e\u003e 1\n```\nWhich is equal to **`000100000`**. 
This corresponds to the digit 6, which is exactly the first zero bit at or above the position of `code`, that is, the candidate the `for` loop (3) would find.\n\nTherefore, with `code` being the bit representation of the current digit, one calculates the next candidate as follows:\n\n```java\n    code = (((code + inserted) ^ inserted) + code) \u003e\u003e 1; // branchless code calculation\n```\n\nThe problem is that one only obtains the bit representation of the digit, not the digit itself. As one can see in the algorithm, `digit` is\nalso needed (it is stored in the matrix and pushed onto the stack), so using this technique requires converting `code` back into `digit`.\n\n### Branchless Transformation from Bit Representation\n\nTo obtain the digit from its code, one \"assembles\" the bits of the digit from its bit representation (`code`) as follows:\n\n```java\n    digit = ( code \u003e\u003e 8 ) |\n            (( code \u0026 0x40 ) \u003e\u003e 6 ) | \n            (( code \u0026 0x140) \u003e\u003e 5 ) |\n            (( code \u0026 0xf0 ) \u003e\u003e 4 ) |\n            (( code \u0026 0x20 ) \u003e\u003e 3 ) |\n            (( code \u0026 0x14 ) \u003e\u003e 2 ) |\n            (( code \u0026 0x0c ) \u003e\u003e 1 ) |\n            ( code \u0026 3);\n```\n\nThis conversion is not only hard to understand, but also requires a high number of operations. When this code was tried together with the\nbranchless calculation of the next candidate shown \n[previously](https://github.com/nilostolte/Sudoku/blob/main/README.md#brachless-next-candidate-determination), the minimal time\nin C went from 1.5 to 1.4 milliseconds, which apparently wouldn't justify the effort.\n\nHowever, after multiple further optimizations, including the use of `register` variables, the minimal running time was reduced to \n1.2 milliseconds. This corresponds to a speedup of roughly 20%, which starts to become quite significant. 
It's clear that \nthis is also a consequence of the highly \"imperative\" way this [algorithm](https://github.com/nilostolte/Sudoku#algorithm) is implemented,\nwhich clearly benefits the C implementation, since C code can more easily be optimized by employing extremely low-level\ntricks that are absent in Java.\n\n### Table to Convert from Bit Representation\n\nAnother way to do the calculation [above](https://github.com/nilostolte/Sudoku#branchless-transformation-from-bit-representation)\nis using tables. For example, in C:\n\n```C\n    unsigned short c1[] = { 0, 1, 2, 0, 3 };  /* codes 1, 2, 4      -\u003e digits 1, 2, 3 */\n    unsigned short c2[] = { 0, 4, 5, 0, 6 };  /* codes 8, 16, 32    -\u003e digits 4, 5, 6 */\n    unsigned short c3[] = { 0, 7, 8, 0, 9 };  /* codes 64, 128, 256 -\u003e digits 7, 8, 9 */\n```\nOne can then compose the digit from its bit representation `code` in the following way:\n\n```C\n    digit = c1[code \u0026 7] | c2[(code \u003e\u003e 3) \u0026 7] | c3[code \u003e\u003e 6];\n```\nThis code is easier to understand than the [previous](https://github.com/nilostolte/Sudoku#branchless-transformation-from-bit-representation)\none. If the digit is 1, 2 or 3, one simply keeps the lowest 3 bits of `code` and indexes the table `c1` with the result. Index 3 is never used, \nsince `code` has only 1 bit set and, thus, can't be 3. Notwithstanding, the result of the masking can be zero, when the binary\nrepresentation doesn't have any bit set in that range. In this case, to keep the logic branchless, the table value is 0.\nIf the digit is 4, 5 or 6, one shifts `code` 3 positions to the right, keeps the lowest 3 bits, and indexes the table `c2`\nwith the result. The same logic applies to digits 7, 8 and 9, using table `c3`. Since one doesn't know in advance which table holds the answer, one simply\napplies a bitwise _or_ operation to the 3 results, since only one of them contains the correct digit. 
The other two will be zero.\n\nUsing this solution instead of the [previous](https://github.com/nilostolte/Sudoku#branchless-transformation-from-bit-representation) one\nhad a significant impact on the minimal execution time of the compiled C code, which was reduced to practically 1 millisecond, that is,\nan improvement of more than 30%, since the initial minimal time in our comparisons was 1.5 milliseconds.\n\n## Conclusion\n\nThe several optimizations proposed are complex to understand, and most of them do not result in a significant speedup. The \n[initial algorithm](https://github.com/nilostolte/Sudoku#algorithm), as implemented \nin the [Java](https://github.com/nilostolte/Sudoku/tree/main/src) and [C](https://github.com/nilostolte/Sudoku/tree/main/C/src) \ncode, is clearer and relatively easy to understand once the binary representation is understood.\n\nThe idea of parallelizing the code by dealing with the whole candidate set at once, just by using a binary representation, is promising.\nHowever, it falls short if one hopes to exploit this intrinsic parallelism throughout the entire algorithm. As seen \n[above](https://github.com/nilostolte/Sudoku#brachless-next-candidate-determination), the\napproach allows branchless solutions for the sequential search of a candidate starting from an arbitrary digit value, which only partially\nexploits this intrinsic parallelism. Moreover, it relies heavily on the carry propagation mechanism of integer addition, \nwhich is actually a sequential mechanism, albeit implemented very efficiently in hardware. This is additional ingenuity, but\nnot the same approach. The actual problem with this partial solution is that it's highly complex and requires a high number \nof operations. Thus, it diverges considerably from the extreme simplicity of the original algorithm. 
Fortunately, combined with numerous \nother low-level optimizations in C, it contributed to a significant \nspeedup (as can be seen [here](https://github.com/nilostolte/Sudoku#branchless-transformation-from-bit-representation)), and the table-based variant brought\nan even better speedup with less complexity (as seen [here](https://github.com/nilostolte/Sudoku#table-to-convert-from-bit-representation)).\n\nA comparative test between the Java implementation and an identical C implementation showed a considerable advantage for the C\nimplementation, not only in terms of raw performance, but also in terms of lower variability in the times measured for solving\nthe same grid, even though execution times also varied in the C implementation. This was expected, since Java\ndoes not reliably trigger the JIT compiler for code that runs for such short amounts of time.\n\nGiven the extremely short execution times, the low-level nature of the [original algorithm](https://github.com/nilostolte/Sudoku#algorithm),\nand the considerable number of low-level optimizations possible in C, one may comfortably conclude that C is the\nmost appropriate of the two languages for this algorithm, since it provides faster answers. This means that the C implementation can be seen\nas an ideal engine for an interactive program where the grid is entered through a GUI and the solution must be supplied\nin real time when requested by the user.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnilostolte%2Fsudoku","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnilostolte%2Fsudoku","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnilostolte%2Fsudoku/lists"}