 patch-AuthenticAMD
====================

Utility to patch binaries generated by the Intel C++ Compiler to get the maximum performance on AMD
CPUs.

The Intel C++ Compiler adds a CPUID test to the binaries it generates that checks whether they are
running on an Intel CPU, so the binaries don't run with full optimizations on non-Intel CPUs. This
utility patches such CPUID tests so the binaries can run on an AMD CPU as if it were an Intel CPU.

**Tested on Linux with Intel C++ Compiler 10.x/11.x (it might work with future releases of ICC).
It may also work with the Intel Fortran compiler if it inserts the same CPUID test, but this is not
confirmed.**

*It seems that ICC 11.x no longer imposes a performance penalty when the compiled binaries run on
AMD. However, the CPUID tests are still present in those binaries and this program can remove
them.*

*Some GNU libraries also contain CPUID tests, so if you generate a static binary that includes that
code, those tests could be affected. In the tests performed, however, those comparisons used a
different instruction, so they were left intact. In any case, those tests are not evil like the
Intel ones.*


 How to compile
----------------

You need the libelf library. On Ubuntu 8.04, just install the libelfg0-dev package; a version
around 0.8.6 should work well. Then compile with:

	make


 Benchmark
-----------

The source code tarball includes a file called benchmark-partial-sums.c (taken from
*The Computer Language Shootout*, http://shootout.alioth.debian.org). The Intel compiler can
optimize this code with SSE2.

Compile it with:

	icc -O3 -xW -o benchmark-partial-sums benchmark-partial-sums.c

To run the benchmark, use:

	time ./benchmark-partial-sums 100000000

These were the average results on my AMD64 CPU:

- GCC compiled executable --> 45.5s (compiled with -O3 -msse2)
- ICC original executable --> 31.5s (probably not taking the SSE2-optimized path in the binary)
- ICC patched executable  --> 25.5s


 How to patch a binary generated by Intel C++ Compiler
-------------------------------------------------------

Just run:

	patch-AuthenticAMD <executable_name>


 How to patch the Intel C++ Compiler
-------------------------------------

/path/to/icc/lib contains the shared libraries used by the compiler. It seems that if all of them
are patched, the binaries generated by ICC won't have the CPUID test, so they should run perfectly
on AMD. Probably only one of these shared libraries is responsible for adding the test. However, I
can't confirm this because I didn't try it.

**Be warned that modifying, disassembling or reverse engineering the Intel C++ Compiler goes against
the Intel EULA (End User License Agreement). Do it at your own risk.**

If you want to try, run this command in /path/to/icc/lib:

	for i in *; do patch-AuthenticAMD -ev "$i"; done


 Report results
----------------

This tool seems to work well, but it has not been tested much. Please send me an email with your
results. Questions (including about the code), suggestions or anything else are also welcome:

	jimenezrick@gmail.com


 The content of the doc directory
----------------------------------

- libelf by Example.mht: http://people.freebsd.org/~jkoshy/download/libelf/article.html
	a tutorial for libelf on FreeBSD. Almost everything in it also applies to Linux.
- naughty-intel.html: an article whose author explains everything one needs to know about the
	subject.


 How it works
--------------

Here is a disassembled excerpt of a binary compiled by ICC 10.1:

	0000000000402c5c <__intel_cpu_indicator_init>:
						...
							# Get the CPU vendor string (EAX = 0)
	  402c84:	48 33 c0             	xor    %rax,%rax
	  402c87:	0f a2                	cpuid
	  402c89:	89 45 f8             	mov    %eax,-0x8(%rbp)
	  402c8c:	89 5d fc             	mov    %ebx,-0x4(%rbp)
	  402c8f:	89 4d ec             	mov    %ecx,-0x14(%rbp)
	  402c92:	89 55 f4             	mov    %edx,-0xc(%rbp)
	  402c95:	48 c7 c0 01 00 00 00 	mov    $0x1,%rax
							# Get the CPU capabilities (EAX = 1)
	  402c9c:	0f a2                	cpuid
	  402c9e:	89 45 f0             	mov    %eax,-0x10(%rbp)
	  402ca1:	89 5d e0             	mov    %ebx,-0x20(%rbp)
	  402ca4:	89 4d e8             	mov    %ecx,-0x18(%rbp)
	  402ca7:	89 55 e4             	mov    %edx,-0x1c(%rbp)
						...
	  402cca:	8b 45 fc             	mov    -0x4(%rbp),%eax
							# Compare bytes 1-4 of the vendor string (from EBX) with "Genu"
	  402ccd:	3d 47 65 6e 75       	cmp    $0x756e6547,%eax
	  402cd2:	bb 01 00 00 00       	mov    $0x1,%ebx
	  402cd7:	75 1b                	jne    402cf4 <__intel_cpu_indicator_init+0x98>
	  402cd9:	8b 45 f4             	mov    -0xc(%rbp),%eax
							# Compare bytes 5-8 of the vendor string (from EDX) with "ineI"
	  402cdc:	3d 69 6e 65 49       	cmp    $0x49656e69,%eax
	  402ce1:	75 11                	jne    402cf4 <__intel_cpu_indicator_init+0x98>
	  402ce3:	8b 45 ec             	mov    -0x14(%rbp),%eax
							# Compare bytes 9-12 of the vendor string (from ECX) with "ntel"
	  402ce6:	3d 6e 74 65 6c       	cmp    $0x6c65746e,%eax
	  402ceb:	75 07                	jne    402cf4 <__intel_cpu_indicator_init+0x98>
	  402ced:	ba 01 00 00 00       	mov    $0x1,%edx
	  402cf2:	eb 02                	jmp    402cf6 <__intel_cpu_indicator_init+0x9a>
	  402cf4:	33 d2                	xor    %edx,%edx
							# If the vendor is "GenuineIntel", EDX = 1 (otherwise EDX = 0) and
							# everything goes OK. Later there are more tests that check the
							# capabilities of your CPU and take them into account.
						...
							# Here it loads into RAX, from the slot at _DYNAMIC+0x1d8, the address
							# of a global variable where a value representing the capabilities of
							# your CPU is stored. This value also says whether your CPU is
							# non-Intel, in which case its true capabilities (e.g. SSE) are not
							# fully used.
	  402d7e:	48 8b 05 a3 56 20 00 	mov    0x2056a3(%rip),%rax        # 608428 <_DYNAMIC+0x1d8>
							# In EBX the value of this global variable is ready to be copied to
							# memory. An Intel CPU with SSE and SSE2 has EBX = 0x800. An AMD CPU
							# with SSE and SSE2 has EBX = 0x1, which means that the SSE and SSE2
							# capabilities are not recognized.
	  402d85:	89 18                	mov    %ebx,(%rax)
						...

The patch-AuthenticAMD utility replaces those three CMP instructions with three other CMPs that
look for the vendor string "AuthenticAMD". The libelf library is used to analyze the structure of
the ELF binary being patched, so the replacements are made only in its executable sections; this
way we can be reasonably sure that what we replace is a machine instruction and not something
else. It is also possible to bypass libelf and make the replacements in the whole binary.

Binaries generated by the Intel C++ Compiler usually have several execution paths: some for maximum
compatibility with x86 processors and others for maximum speed with SSE optimizations. With this
utility, the executable takes the fastest path your CPU supports.
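The replacement of the three CMP instructions described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the actual code of patch-AuthenticAMD: the functions make_cmp() and patch_vendor_cmps() are hypothetical, and the sketch scans a whole in-memory buffer (like the mode that bypasses libelf) instead of only the executable sections. It relies on two facts visible in the disassembly: each comparison is encoded as opcode 0x3d (cmp $imm32,%eax) followed by four characters of the vendor string in memory order, and CPUID returns the vendor string in EBX, EDX and ECX, in that order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode "cmp $imm32,%eax" (opcode 0x3d) whose immediate is four characters
 * of a vendor string. x86 is little-endian, so the immediate bytes appear in
 * the same order as the characters: "Genu" -> 3d 47 65 6e 75. */
static void make_cmp(uint8_t insn[5], const char *four_chars)
{
    insn[0] = 0x3d;
    memcpy(insn + 1, four_chars, 4);
}

/* Hypothetical helper: replace every "cmp $imm32,%eax" that tests a 4-byte
 * chunk of the 12-byte vendor string `from` with one that tests the matching
 * chunk of `to`. Chunk 0 comes from EBX, chunk 1 from EDX, chunk 2 from ECX.
 * Returns the number of instructions rewritten. This is a plain byte search:
 * every match is assumed to be an instruction, which is only reasonable when
 * `buf` holds executable code. */
static size_t patch_vendor_cmps(uint8_t *buf, size_t len,
                                const char *from, const char *to)
{
    size_t patched = 0;

    assert(strlen(from) == 12 && strlen(to) == 12);
    for (int chunk = 0; chunk < 3; chunk++) {
        uint8_t old_insn[5], new_insn[5];

        make_cmp(old_insn, from + 4 * chunk);
        make_cmp(new_insn, to + 4 * chunk);
        for (size_t i = 0; i + sizeof old_insn <= len; i++) {
            if (memcmp(buf + i, old_insn, sizeof old_insn) == 0) {
                /* Same length, so no surrounding code has to move. */
                memcpy(buf + i, new_insn, sizeof new_insn);
                patched++;
                i += sizeof new_insn - 1; /* resume after this instruction */
            }
        }
    }
    return patched;
}
```

For example, calling patch_vendor_cmps(buf, len, "GenuineIntel", "AuthenticAMD") on the bytes of the three CMPs shown in the disassembly (3d 47 65 6e 75, 3d 69 6e 65 49 and 3d 6e 74 65 6c) turns them into 3d 41 75 74 68, 3d 65 6e 74 69 and 3d 63 41 4d 44, leaving the bytes in between untouched.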