{"id":17275345,"url":"https://github.com/rjzak/gogrammer","last_synced_at":"2025-07-17T03:08:33.156Z","repository":{"id":69918097,"uuid":"216276867","full_name":"rjzak/gogrammer","owner":"rjzak","description":"Generates byte ngrams from a collection of files with customisable parameters.","archived":false,"fork":false,"pushed_at":"2022-03-22T22:31:34.000Z","size":53,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-07-14T03:41:17.342Z","etag":null,"topics":["data-science","golang","malware-analysis","ngrams"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rjzak.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-10-19T22:13:36.000Z","updated_at":"2022-03-22T17:26:29.000Z","dependencies_parsed_at":"2023-04-26T17:48:36.852Z","dependency_job_id":null,"html_url":"https://github.com/rjzak/gogrammer","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/rjzak/gogrammer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rjzak%2Fgogrammer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rjzak%2Fgogrammer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rjzak%2Fgogrammer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rjzak%2Fgogrammer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rjzak",
"download_url":"https://codeload.github.com/rjzak/gogrammer/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rjzak%2Fgogrammer/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265562319,"owners_count":23788504,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","golang","malware-analysis","ngrams"],"created_at":"2024-10-15T08:55:59.281Z","updated_at":"2025-07-17T03:08:33.149Z","avatar_url":"https://github.com/rjzak.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Gogrammer, a work-in-progress n-gram application written in Go.\n\nThis application creates a list of byte ngrams from a collection of files. There is an optional `-hash` flag which attempts to use a hashing algorithm, which should make larger values of `N` possible. It's inspired by this paper: *Raff, E., \u0026 Nicholas, C. K. (2018). \"Hash-Grams: Faster N-Gram Features for Classification and Malware Detection\"*, available [here](https://www.edwardraff.com/publications/hash-grams-faster.pdf).\n\nThe hash method seems to have about 60% of the resulting n-grams in common with normal ngramming, which could be attributed to hash collisions. 
However, the hash method seems to run in about a quarter of the time.\n\nTraining a model from a generated dataset is a new feature, but it requires running with `GODEBUG=cgocheck=0` to work, due to [this bug in golearn](https://github.com/sjwhitworth/golearn/issues/158).\n\n## Dependencies:\n* [go-rabin](https://www.github.com/aclements/go-rabin)\n* [golearn](https://www.github.com/sjwhitworth/golearn)\n\n## How to use it:\n1. Find the n-grams in your dataset: `./gogrammer NGRAM /path/to/goodware /path/to/malware`. Additional options are available, including the number of n-grams to keep and the size of the n-grams.\n2. Build a CSV or LibSVM dataset file based on the n-grams: `./gogrammer DATASET -goodware /path/to/goodware -malware /path/to/malware -kl output.grams`.\n3. Train the model: `GODEBUG=cgocheck=0 ./gogrammer TRAIN -hasFlags -dataset dataset.csv -output my-model.model`. Additional options are available. The model is saved in [liblinear](https://github.com/cjlin1/liblinear)'s format.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frjzak%2Fgogrammer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frjzak%2Fgogrammer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frjzak%2Fgogrammer/lists"}