{"id":20418869,"url":"https://github.com/sehugg/opcodesifter","last_synced_at":"2025-09-10T04:09:15.204Z","repository":{"id":42174925,"uuid":"218429568","full_name":"sehugg/opcodesifter","owner":"sehugg","description":"looks for interesting code fragments","archived":false,"fork":false,"pushed_at":"2023-11-23T18:36:32.000Z","size":142,"stargazers_count":2,"open_issues_count":1,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-01-15T14:15:31.789Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sehugg.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-10-30T02:47:15.000Z","updated_at":"2022-09-07T18:58:27.000Z","dependencies_parsed_at":"2023-11-23T20:14:34.799Z","dependency_job_id":null,"html_url":"https://github.com/sehugg/opcodesifter","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sehugg%2Fopcodesifter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sehugg%2Fopcodesifter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sehugg%2Fopcodesifter/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sehugg%2Fopcodesifter/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sehugg","download_url":"https://codeload.github.com/sehugg/opcodesifter/tar.gz/refs/heads/master","host":{"na
me":"GitHub","url":"https://github.com","kind":"github","repositories_count":241960882,"owners_count":20049344,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-15T06:35:04.821Z","updated_at":"2025-03-05T04:17:41.157Z","avatar_url":"https://github.com/sehugg.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"\nOpcodeSifter\n------------\n\nThis was inspired by\n[Automatic Generation of Peephole Superoptimizers](https://theory.stanford.edu/~aiken/publications/papers/asplos06.pdf)\n(PDF)\nand this [GitHub project](https://github.com/RussellSprouts/6502-enumerator).\n\nThis program builds a database of searchable machine-language routines.\n\n1. Scan a corpus of 6502 (future: Z80) code\n2. Pick out the non-looping non-illegal fragments\n3. Canonicalize the code, i.e. change memory addresses to predictable values\n4. Execute the code on a series of test data vectors\n5. Generate fingerprints (record the outputs)\n6. 
Put the results in a SQLite database\n\nYou can then search the database for code fragments that match a given\nfingerprint.\n\n\nSearching\n=========\n\nFor example, the Apple ][ has a strange frame buffer layout which requires a\ncomplex calculation:\n\n~~~\n0x2000 + (scanline\u00267)*0x400 + ((scanline\u003e\u003e3) \u0026 7)*0x80 + (scanline\u003e\u003e6)*0x28\n~~~\n\nThis is usually done with a lookup table, but some programs calculate this\nvalue.\n\nWe'll look for a routine which computes this value and places the result in a 16-bit\n(two-byte) zero-page address.\nOur canonicalization procedure converts zero-page addresses to $20, $21,\n$22, etc.\nSo we'll look for it in $20/$21.\n\nAfter populating the database with a bunch of Apple II disk images, we run this command:\n\n~~~\nnpm run main -- --db 6502.db --query \"var A=i.get(['A']); o.write16(0x20, 0x2000 + (A\u00267)*0x400 + ((A\u003e\u003e3)\u00267)*0x80 + (A\u003e\u003e6)*0x28)\" -v\n~~~\n\nThis finds the following routine:\n\n~~~\n0        PHA \n1        AND #$C0\n3        STA $20\n5        LSR \n6        LSR \n7        ORA $20\n9        STA $20\nb        PLA \nc        STA $21\ne        ASL \nf        ASL \n10       ASL \n11       ROL $21\n13       ASL \n14       ROL $21\n16       ASL \n17       ROR $20\n19       LDA $21\n1b       AND #$1F\n1d       ORA #$20\n1f       STA $21\n~~~\n\n\n\nCanonicalization\n================\n\n~~~\naa       starts at $20\n(aa),y   starts at $20, increments by 2\naaaa     starts at $200\naaaa,x/y starts at $300/$400/$500/$600/$700\n(aa,x) and aa,x     starts at $00, only one unique address allowed\n#aa      left alone\n~~~\n\nUsage\n=====\n\nInstallation:\n~~~\nnpm i\n~~~\n\nScan a binary file:\n~~~\nnpm run main -- --scan file.bin -v\n~~~\n\nPopulate a database:\n~~~\nnpm run main -- --db 6502.db --create  \nnpm run main -- --db 6502.db --scan *.bin\n~~~\n\nQuery the database:\n~~~\nnpm run main -- --db 6502.db --query \"o.write16(0x20, 
i.read16(0x20)+1)\"\n~~~\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsehugg%2Fopcodesifter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsehugg%2Fopcodesifter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsehugg%2Fopcodesifter/lists"}