{"id":13480409,"url":"https://github.com/Zunawe/md5-c","last_synced_at":"2025-03-27T10:32:40.555Z","repository":{"id":37743242,"uuid":"81290831","full_name":"Zunawe/md5-c","owner":"Zunawe","description":"A simple, commented reference implementation of the MD5 hash algorithm","archived":false,"fork":false,"pushed_at":"2023-12-30T21:05:32.000Z","size":62,"stargazers_count":212,"open_issues_count":4,"forks_count":56,"subscribers_count":5,"default_branch":"main","last_synced_at":"2024-10-30T14:41:34.347Z","etag":null,"topics":["c","md5"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"unlicense","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Zunawe.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-02-08T05:21:48.000Z","updated_at":"2024-09-30T20:38:01.000Z","dependencies_parsed_at":"2024-10-30T14:34:39.526Z","dependency_job_id":"fbdbb706-d032-43da-8f8b-14864bc89dfd","html_url":"https://github.com/Zunawe/md5-c","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Zunawe%2Fmd5-c","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Zunawe%2Fmd5-c/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Zunawe%2Fmd5-c/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Zunawe%2Fmd5-c/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Zunawe","download_url":"https://codeload.github.com/Zunawe/md5-c/tar.gz/refs/heads/mai
n","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245827264,"owners_count":20678950,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c","md5"],"created_at":"2024-07-31T17:00:38.835Z","updated_at":"2025-03-27T10:32:40.296Z","avatar_url":"https://github.com/Zunawe.png","language":"C","funding_links":[],"categories":["C"],"sub_categories":[],"readme":"# MD5\n\nTakes an input string or file and outputs its MD5 hash.\n\nThis repo is gaining a little more traffic than I expected, so I'll put this here as a little disclaimer. I wrote this code as a side project in college in an attempt to better understand the algorithm. I consider this repository to be a reference implementation with a good step by step walkthrough of the algorithm, not necessarily code to be built upon. I did verify the correctness of the output by comparing to other existing standalone programs. However, I did not research edge cases, set up automated testing, or attempt to run the program on any machine other than the laptop I had at the time, so here's the warning:\n\nThis code may be generally correct, but you should consider it untested to be on the safe side. There may be edge cases, vulnerabilities, or optimizations I did not consider when I wrote this. I can only confirm that this code probably worked correctly on a single computer in 2017.\n\nKnowing that, do feel free to use this code in any way you wish, no credit needed. 
And if you find a problem, raise an issue.\n\n### Implementing into Code\n\nIf you want to include the md5 algorithm in your own code, you'll only need `md5.c` and `md5.h`.\n\n```c\n#include \"md5.h\"\n\n...\n\nvoid foo(){\n    uint8_t result[16];\n    md5String(\"Hello, World!\", result);       // *result = 65a8e27d8879283831b664bd8b7f0ad4\n\n    FILE *bar = fopen(\"bar.txt\", \"r\");       // fopen returns a FILE pointer\n    md5File(bar, result);                     // Reads a file from a file pointer\n    md5File(stdin, result);                   // Can easily read from stdin\n\n    // Manual use\n    ...\n    MD5Context ctx;\n    md5Init(\u0026ctx);\n\n    ...\n    md5Update(\u0026ctx, input1, input1_size);\n    ...\n    md5Update(\u0026ctx, input2, input2_size);\n    ...\n    md5Update(\u0026ctx, input3, input3_size);\n    ...\n\n    md5Finalize(\u0026ctx);\n\n    ctx.digest;                               // Result of hashing (as uint8_t* with 16 bytes)\n}\n```\n\n### Command Line\n\nYou can directly use the binary built with this Makefile to process text or files on the command line.\n\nAny arguments will be interpreted as strings. 
Each argument will be interpreted as a separate string to hash, and will be given its own output (in the order of input).\n\n```shell\n$ make\n\n$ ./md5 \"Hello, World!\"\n65a8e27d8879283831b664bd8b7f0ad4\n\n$ ./md5 \"Multiple\" Strings\na0bf169f2539e893e00d7b1296bc4d8e\n89be9433646f5939040a78971a5d103a\n\n$ ./md5 \"\"\nd41d8cd98f00b204e9800998ecf8427e\n\n$ ./md5 \"Can use \\\" escapes\"\n7bf94222f6dbcd25d6fa21d5985f5634\n```\nIf no arguments are given, input is taken from standard input.\n```shell\n$ make\n\n$ echo -n \"Hello, World!\" | ./md5\n65a8e27d8879283831b664bd8b7f0ad4\n\n$ echo \"Hello, World!\" | ./md5\nbea8252ff4e80f41719ea13cdf007273\n\n$ echo \"File Input\" \u003e testFile | ./md5\nd41d8cd98f00b204e9800998ecf8427e\n\n$ cat testFile | ./md5\n7dacda86e382b27c25a92f8f2f6a5cd8\n\n```\nAs seen above, it is important to note that many programs will output a newline character after their output. This newline *will* affect the output of the MD5 algorithm. `echo` has the `-n` flag that prevents the output of said character.\n\nIf entering input by hand, end collection of data by entering an EOF character (`Ctrl+D` in some cases).\n\n# The Algorithm\n\nWhile researching this algorithm, the only relatively complete description I found came from RSA Data Security itself in [this memo][1]. And while the description is adequate, any confusion is very difficult to clear up, especially given the nature of the algorithm's output. So here I will try to describe the algorithm used in these implementations with examples.\n\nThe algorithm considers all words to be little-endian. I will also specify where this may be confusing.\n\nThe algorithm takes in an input of arbitrary length in bits. This can be a string, a file, a number, a struct, etc... It also doesn't need to be byte-aligned, though it almost always is. We'll call this input the message. 
The output is the digest.\n\n#### Step 1: Padding\n\nThe provided message is padded by appending bits to the end until its length in bits is congruent to 448 modulo 512. In other words, the message is padded so that its length is 64 bits less than the next multiple of 512. If the original message's length already meets this requirement before padding, it is still padded with 512 bits.\n\nThe padding is simply a single \"1\" bit at the end of the message followed by enough \"0\" bits to satisfy the length condition above.\n\n##### Example\n\nLet's pass the string \"Hello, world!\" to the algorithm. Those characters converted to hexadecimal numbers look like this:\n```\n48 65 6c 6c 6f 2c 20 77 6f 72 6c 64 21\n```\n(Note: Strings are often null-terminated. This null character is not taken into account, as you will see.)\n\nNow we have to pad our message bits:\n```\n0x 48 65 6c 6c 6f 2c 20 77 6f 72 6c 64 21 80 00 00\n0x 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00\n0x 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00\n0x 00 00 00 00 00 00 00 00\n```\nNote the `0x80` right after the end of our message. We're writing a stream of bits, not bytes. Setting the bit after our message to \"1\" and the next 7 bits to \"0\" means writing the byte `1000 0000` or `0x80`.\n\n#### Step 2: Appending the Length\n\nNext, the length of the original message in *bits*, modulo 2^64, is appended to the message as a 64-bit little-endian number. It's common to split this number into two 32-bit words, so keep careful track of which bytes are put where; the highest order byte should be the last byte in the message. Appending these 64 bits rounds out the length of the whole message to a multiple of 512.\n\n##### Example 1\n\nThe length of our message is 104 bits. The 64-bit representation of the number 104 in hexadecimal is `0x00000000 00000068`. 
So we'll append that number to the end.\n\n```\n0x 48 65 6c 6c 6f 2c 20 77 6f 72 6c 64 21 80 00 00\n0x 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00\n0x 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00\n0x 00 00 00 00 00 00 00 00 68 00 00 00 00 00 00 00\n```\n\n(We're writing in little-endian, so the lowest order byte is written first.)\n\nIf you're holding the length in two separate 32-bit words, make sure to append the lower order bytes first.\n\n##### Example 2\n\nBecause our \"Hello, world!\" example is so small and doesn't give a length with more than two digits, let's say we have a different, bigger message of `0x12345678 90ABCDEF` bits and this chunk we're looking at is just the tail end that we have to pad out. The appended length would look like this:\n\n```\n0x 48 65 6c 6c 6f 2c 20 77 6f 72 6c 64 21 80 00 00\n0x 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00\n0x 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00\n0x 00 00 00 00 00 00 00 00 EF CD AB 90 78 56 34 12\n```\n\n#### Step 3: Initializing the Buffer\n\nThe variables that will eventually hold our digest must be initialized to the following:\n\n```\nA = 0x67452301\nB = 0xefcdab89\nC = 0x98badcfe\nD = 0x10325476\n```\n\n(The RSA memo lists these words byte by byte with the low-order byte first, so A appears there as `01 23 45 67`. The values above are the resulting 32-bit numbers.)\n\n#### Step 4: Processing\n\nThere are four functions defined in the RSA memo that are used to collapse three 32-bit words into one 32-bit word:\n\n```\nF(X, Y, Z) = (X \u0026 Y) | (~X \u0026 Z)\nG(X, Y, Z) = (X \u0026 Z) | (Y \u0026 ~Z)\nH(X, Y, Z) = X ^ Y ^ Z\nI(X, Y, Z) = Y ^ (X | ~Z)\n```\n\nThese are bitwise operations.\n\nWe also have to do a left rotate on the bits in a word. That is, shift the bits left, and move overflow to the right. Like spinning a bottle and seeing the label loop around. The function is defined as follows:\n\n```\nrotate_left(x, n) = (x \u003c\u003c n) | (x \u003e\u003e (32 - n))\n```\n\nThe constants in K and S can be found at the bottom of this section.\n\nThe message is split into blocks of 512 bits. Each block is split into 16 little-endian 32-bit words. 
For each block, do the following:\n\n```c\nAA = A;\nBB = B;\nCC = C;\nDD = D;\n\nfor(i in 0 to 63){\n    if(0 \u003c= i \u003c= 15){\n        E = F(BB, CC, DD);\n        j = i;\n    }\n    else if(16 \u003c= i \u003c= 31){\n        E = G(BB, CC, DD);\n        j = ((i * 5) + 1) % 16;\n    }\n    else if(32 \u003c= i \u003c= 47){\n        E = H(BB, CC, DD);\n        j = ((i * 3) + 5) % 16;\n    }\n    else{\n        E = I(BB, CC, DD);\n        j = (i * 7) % 16;\n    }\n\n    temp = DD;\n    DD = CC;\n    CC = BB;\n    BB = BB + rotate_left(AA + E + K[i] + input[j], S[i]);\n    AA = temp;\n}\n\nA += AA;\nB += BB;\nC += CC;\nD += DD;\n```\n\nThe RSA memo explicitly lists each step instead of using control structures. The result is the same.\n\nAn example for this step is not particularly useful, as the data produced by the loop is not very meaningful for observation.\n\n#### Step 5: Output\n\nThe digest is a 128-bit number written in little endian, and is contained in A, B, C, and D after the algorithm is finished. 
Just arrange the bytes so that the lowest-order byte of the digest is the lowest-order byte of A, and the highest-order byte of the digest is the highest-order byte of D.\n\n##### Example\n\nHere are the digests of a few strings to check against:\n\n\"Hello, world!\"\n\n```\n6cd3556deb0da54bca060b4c39479839\n```\n\n\"\" (empty string)\n\n```\nd41d8cd98f00b204e9800998ecf8427e\n```\n\n\"The quick brown fox jumps over the lazy dog.\"\n\n```\ne4d909c290d0fb1ca068ffaddf22cbd0\n```\n\n#### Constants and Functions\n\n```c\nA = 0x67452301\nB = 0xefcdab89\nC = 0x98badcfe\nD = 0x10325476\n\nK[] = {0xd76aa478, 0xe8c7b756, 0x242070db, 0xc1bdceee,\n       0xf57c0faf, 0x4787c62a, 0xa8304613, 0xfd469501,\n       0x698098d8, 0x8b44f7af, 0xffff5bb1, 0x895cd7be,\n       0x6b901122, 0xfd987193, 0xa679438e, 0x49b40821,\n       0xf61e2562, 0xc040b340, 0x265e5a51, 0xe9b6c7aa,\n       0xd62f105d, 0x02441453, 0xd8a1e681, 0xe7d3fbc8,\n       0x21e1cde6, 0xc33707d6, 0xf4d50d87, 0x455a14ed,\n       0xa9e3e905, 0xfcefa3f8, 0x676f02d9, 0x8d2a4c8a,\n       0xfffa3942, 0x8771f681, 0x6d9d6122, 0xfde5380c,\n       0xa4beea44, 0x4bdecfa9, 0xf6bb4b60, 0xbebfbc70,\n       0x289b7ec6, 0xeaa127fa, 0xd4ef3085, 0x04881d05,\n       0xd9d4d039, 0xe6db99e5, 0x1fa27cf8, 0xc4ac5665,\n       0xf4292244, 0x432aff97, 0xab9423a7, 0xfc93a039,\n       0x655b59c3, 0x8f0ccc92, 0xffeff47d, 0x85845dd1,\n       0x6fa87e4f, 0xfe2ce6e0, 0xa3014314, 0x4e0811a1,\n       0xf7537e82, 0xbd3af235, 0x2ad7d2bb, 0xeb86d391}\n\nS[] = {7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22,\n       5,  9, 14, 20, 5,  9, 14, 20, 5,  9, 14, 20, 5,  9, 14, 20,\n       4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23,\n       6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21}\n\n\nF(X, Y, Z) = (X \u0026 Y) | (~X \u0026 Z)\nG(X, Y, Z) = (X \u0026 Z) | (Y \u0026 ~Z)\nH(X, Y, Z) = X ^ Y ^ Z\nI(X, Y, Z) = Y ^ (X | ~Z)\n\nrotate_left(x, n) = (x \u003c\u003c n) | (x \u003e\u003e (32 - n))\n```\n\n[1]: 
https://tools.ietf.org/html/rfc1321\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FZunawe%2Fmd5-c","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FZunawe%2Fmd5-c","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FZunawe%2Fmd5-c/lists"}