{"id":19365832,"url":"https://github.com/xermicus/openzign","last_synced_at":"2025-10-13T09:06:59.613Z","repository":{"id":47029292,"uuid":"404251085","full_name":"xermicus/openzign","owner":"xermicus","description":"Open Zignatures Database","archived":false,"fork":false,"pushed_at":"2021-09-19T10:23:15.000Z","size":71,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-01-06T21:24:34.341Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/xermicus.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-09-08T07:23:32.000Z","updated_at":"2021-09-19T10:23:17.000Z","dependencies_parsed_at":"2022-08-26T10:50:44.011Z","dependency_job_id":null,"html_url":"https://github.com/xermicus/openzign","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xermicus%2Fopenzign","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xermicus%2Fopenzign/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xermicus%2Fopenzign/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xermicus%2Fopenzign/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/xermicus","download_url":"https://codeload.github.com/xermicus/openzign/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240484150,"owners_count":19808718,"icon_url":"htt
ps://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-10T07:42:39.865Z","updated_at":"2025-10-13T09:06:54.579Z","avatar_url":"https://github.com/xermicus.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# The openZign project\nA database of zignatures and other binary identification data. For fun and to aid reverse-engineering tasks. Collected from various data sources:\n* [x] [vx-underground collection](https://vx-underground.org/samples.html) (\u003e2TB decompressed)\n* [x] [BinKit](https://github.com/SoftSec-KAIST/BinKit) dataset (\u003e200GB decompressed)\n* [ ] Std-libs from statically compiled languages (Go, Rust)\n* [ ] Benign Windows binaries\n* [ ] ?\n\nNote: This is still under heavy development. This README serves primarily to organize my thoughts.\n\n# Project Structure\n## oz-fila\nHelper util to mass-analyse binary artifacts (exes, libraries, ...) from a directory. The result is one JSON file per binary containing analysis information from radare2.\n\n## oz-indexer\nHelper util to index and search the JSON files created by `oz-fila`.\n\n## oz-api\nSince the index gets quite big, the final goal is to provide some kind of HTTP/REST API (reminiscent of the IDA Lumina server).\n\n## (TODO) r2 plugin\nProvide an r2 plugin for convenience.\n\n# Indexing\nThe first attempt at indexing uses the tantivy search engine. It looks like it can handle large data volumes quite well.\n\nIndexing is not yet continuous / automated (it literally takes weeks to analyze and index everything on my consumer-grade desktop hardware).\n\n## Facets\n1. 
Level: Classification of the Binary Sample (Malware, Library, Various)\n2. Level: CPU Architecture (x86, arm, ...)\n3. Level: OS, lang, machine, format, bintype\n\n## Fields\n* Strings, Links, Imports, Yara: `Default` indexer\n* name, sha256, magic, size, error\n\n## Zignatures, Segments, Sections\nIndexed separately. `MultiValues` field containing child document IDs.\n\n### Zignatures\nThe masked zignature should be what you want to search for. It is still an open question whether it is better to split at the mask bytes and use `SimpleTokenizer` or to strip them off.\n\n* Name\n* Size\n* ssdeep\n* Entropy\n* bytes\n* mask\n* masked\n* bbsum\n* vars\n\n### Segments \u0026 Sections\n* Name\n* ssdeep\n\n# Ideas and Todos\n* Index ESIL and assembly (how to avoid duplicates with what is already in zignatures?)\n* Use a KV store (rkv/tikv/sled) for documents and use tantivy only for the search index\n* Some improvements:\n  * Add a timestamp to see when the document was indexed\n  * Handle \"special\" cases (code inside APKs, unpacking packed samples)\n  * Collect the whole binary code instead of only code recognized as functions (zaF)\n* Proper documentation\n* Tweak the user experience (a simple default search query probably doesn't provide good results)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxermicus%2Fopenzign","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fxermicus%2Fopenzign","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxermicus%2Fopenzign/lists"}