{"id":13499499,"url":"https://github.com/mandiant/stringsifter","last_synced_at":"2025-05-15T01:05:38.281Z","repository":{"id":37787191,"uuid":"206565210","full_name":"mandiant/stringsifter","owner":"mandiant","description":"A machine learning tool that ranks strings based on their relevance for malware analysis.","archived":false,"fork":false,"pushed_at":"2024-07-15T18:27:12.000Z","size":3570,"stargazers_count":714,"open_issues_count":8,"forks_count":124,"subscribers_count":28,"default_branch":"master","last_synced_at":"2025-05-10T07:05:07.583Z","etag":null,"topics":["fireeye-data-science","fireeye-flare","learning-to-rank","machine-learning","malware-analysis","reverse-engineering","strings"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mandiant.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-09-05T13:02:22.000Z","updated_at":"2025-05-09T16:38:31.000Z","dependencies_parsed_at":"2023-01-31T02:00:53.324Z","dependency_job_id":"9f007527-bd10-4ec0-9e7d-a0c122675571","html_url":"https://github.com/mandiant/stringsifter","commit_stats":{"total_commits":31,"total_committers":7,"mean_commits":4.428571428571429,"dds":0.6774193548387097,"last_synced_commit":"33c0cd5538bf4dc499505b63f89bc045b191a0df"},"previous_names":["fireeye/stringsifter"],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mandiant%2Fstringsifter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories
/mandiant%2Fstringsifter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mandiant%2Fstringsifter/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mandiant%2Fstringsifter/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mandiant","download_url":"https://codeload.github.com/mandiant/stringsifter/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254254039,"owners_count":22039792,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["fireeye-data-science","fireeye-flare","learning-to-rank","machine-learning","malware-analysis","reverse-engineering","strings"],"created_at":"2024-07-31T22:00:33.744Z","updated_at":"2025-05-15T01:05:38.252Z","avatar_url":"https://github.com/mandiant.png","language":"Python","readme":"\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"misc/stringsifter-flat-dark.png\" width=\"400\"\u003e\n\u003c/div\u003e\n\n--------------------------------------------------------------------------------\n\nStringSifter is a machine learning tool that automatically ranks strings based on their relevance for malware analysis.\n\n# Quick Links\n* [Technical Blogpost - *Learning to Rank Strings Output for Speedier Malware Analysis*](https://www.mandiant.com/resources/blog/learning-rank-strings-output-speedier-malware-analysis)\n* [Announcement Blogpost - *Open Sourcing StringSifter*](https://www.mandiant.com/resources/blog/open-sourcing-stringsifter)\n* [DerbyCon Talk - *StringSifter: Learning to Rank Strings 
Output for Speedier Malware Analysis*](https://youtu.be/pLiaVzOMJSk)\n* [StringSifter releases on PyPI](https://pypi.org/project/stringsifter/)\n\n# Usage\n\nStringSifter requires Python version 3.9 or newer. Run the following commands to get the code, run unit tests, and use the tool:\n\n## Installation\n\n```sh\npip install stringsifter\n```\n\nFor development, use [poetry](https://python-poetry.org/):\n```sh\ngit clone https://github.com/mandiant/stringsifter.git\ncd stringsifter\npoetry install --with dev\n```\n\n## Running Unit Tests\n\nTo run unit tests from the StringSifter installation directory:\n\n```sh\npoetry run tests -v\n```\n\n## Running from the Command Line\n\nThe `pip install` command installs two runnable scripts, `flarestrings` and `rank_strings`, into your Python environment. When developing from source, use `poetry run flarestrings` and `poetry run rank_strings`.\n\n`flarestrings` mimics features of GNU binutils' `strings`, and `rank_strings` accepts piped input, for example:\n\n```sh\nflarestrings \u003cmy_sample\u003e | rank_strings\n```\n\n`rank_strings` supports a number of command line arguments.  The positional argument `input_strings` specifies a file of strings to rank.  The optional arguments are:\n\nOption | Meaning\n--- | ---\n--scores (-s) | Include the rank scores in the output\n--limit (-l) | Limit output to the top `limit` ranked strings\n--min-score (-m) | Limit output to strings with score \u003e= `min-score`\n--batch (-b) | Specify a folder of `strings` outputs for batch processing\n\nRanked strings are written to standard output unless the `--batch` option is specified, in which case ranked outputs are written to files named `\u003cinput_file\u003e.ranked_strings`.\n\n`flarestrings` supports an option `-n` (or `--min-len`) to print sequences of characters that are at least `min-len` characters long, instead of the default 4.  
For example:\n\n```sh\nflarestrings -n 8 \u003cmy_sample\u003e | rank_strings\n```\n\nwill print and rank only strings of length 8 or greater.\n\n## Running from a Docker container\n\n- After cloning the repo, build the container.  From the package's top-level directory:\n```sh\ndocker build -t stringsifter -f docker/Dockerfile .\n```\n- Run the container with the `flarestrings` or `rank_strings` argument to use the respective command. The containerized commands can be used in pipelines:\n```sh\ncat \u003cmy_sample\u003e | docker run -i stringsifter flarestrings | docker run -i stringsifter rank_strings\n```\n- Or, run the container without arguments to get a shell prompt, using the `-v` flag to expose a host directory to the container:\n```sh\ndocker run -v \u003cmy_malware\u003e:/samples -it stringsifter\n```\nwhere `\u003cmy_malware\u003e` contains samples for analysis, for example:\n```sh\ndocker run -v $HOME/malware/binaries:/samples -it stringsifter\n```\n- At the container prompt:\n```sh\nflarestrings /samples/\u003cmy_sample\u003e | rank_strings \u003coptions\u003e\n```\n\nAll [command line arguments](#running-from-the-command-line) are supported in the containerized scripts.\n\n## Running on FLOSS Output\n\nStringSifter can be applied to arbitrary lists of strings, making it useful for practitioners looking to glean insights from alternative intelligence-gathering sources such as live memory dumps, sandbox runs, or binaries that contain obfuscated strings. For example, [FireEye Labs Obfuscated Strings Solver (FLOSS)](https://github.com/fireeye/flare-floss) extracts printable strings just as *Strings* does, but additionally reveals obfuscated strings that have been encoded, packed, or manually constructed on the stack. 
It can be used as an in-line replacement for Strings, meaning that StringSifter can be similarly invoked on FLOSS output using the following command:\n\n```sh\n$PY2_VENV/bin/floss -q \u003coptions\u003e \u003cmy_sample\u003e | rank_strings \u003coptions\u003e\n```\n\nNotes:\n1. The `-q` argument suppresses headers and formatting to show only extracted strings. To learn more about additional FLOSS options, please see its [Usage Docs](https://github.com/fireeye/flare-floss/blob/master/doc/usage.md).\n2. FLOSS requires Python 2, while StringSifter requires Python 3.  In the example command at least one of `floss` or `rank_strings` must include a relative path referencing a Python virtual environment.\n3. FLOSS can be downloaded as a [standalone executable](https://github.com/fireeye/flare-floss/releases). In this case it is not required to specify a Python environment because the executable does not rely on a Python interpreter.\n\n## Notes on running `strings`\n\nThis distribution includes the `flarestrings` program to ensure predictable output across platforms.  If you choose to run your system's installed `strings`, note that its options are not consistent across versions and platforms:\n\n### Linux\n\nMost Linux distributions include the `strings` program from GNU Binutils.  To extract both \"wide\" and \"narrow\" strings the program must be run twice, piping to an output file:\n\n```sh\nstrings \u003cmy_sample\u003e       \u003e strs.txt   # narrow strings\nstrings -el \u003cmy_sample\u003e  \u003e\u003e strs.txt   # wide strings.  note the \"\u003e\u003e\"\n```\n\n### MacOS\n\nSome versions of BSD `strings` packaged with MacOS do not support wide strings.  Also note that the `-a` option to strings to scan the whole file may be disabled in the default configuration.  Without `-a` informative strings may be lost.  We recommend installing GNU Binutils via Homebrew or MacPorts to get a version of `strings` that supports wide characters.  
Use care to invoke the correct version of `strings`.\n\n### Windows\n\n`strings` is not installed by default on Windows. We recommend installing [Windows Sysinternals](https://docs.microsoft.com/en-us/sysinternals/), [Cygwin](https://www.cygwin.com/), or [Malcode Analyst Pack](http://sandsprite.com/iDef/MAP/) to get a working `strings`.\n\n# Discussion\nThis version of StringSifter was trained using *Strings* outputs from sampled malware binaries associated with the first [EMBER dataset](https://github.com/endgameinc/ember). Ordinal labels were generated using weak supervision procedures, and supervised learning is performed by [Gradient Boosted Decision Trees](https://github.com/microsoft/LightGBM) with a learning-to-rank objective function. See [Quick Links](#quick-links) for further technical details. Please note that neither labeled data nor training code is currently available, though we may reconsider this approach in future releases.\n\n## Issues\nWe use [GitHub Issues](https://github.com/fireeye/stringsifter/issues) for posting bugs and feature requests.\n\n## Acknowledgements\n- Thanks to the FireEye Data Science (FDS) and FireEye Labs Reverse Engineering (FLARE) teams for review and feedback.\n- StringSifter was designed and developed by Philip Tully (FDS), Matthew Haigh (FLARE), Jay Gibble (FLARE), and Michael Sikorski (FLARE).\n- The StringSifter logo was designed by Josh Langner (FLARE).\n- `flarestrings` is derived from the excellent tool [FLOSS](https://github.com/mandiant/flare-floss).\n","funding_links":[],"categories":["Defensive tools and frameworks","Python"],"sub_categories":["Detection"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmandiant%2Fstringsifter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmandiant%2Fstringsifter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmandiant%2Fstringsifter/lists"}