{"id":20886662,"url":"https://github.com/tracecathq/hunts","last_synced_at":"2025-06-12T04:38:35.767Z","repository":{"id":229097951,"uuid":"717576295","full_name":"TracecatHQ/hunts","owner":"TracecatHQ","description":"🐻‍❄️ 🏹  Threat hunting with Polars and flaws.cloud AWS CloudTrail datasets.","archived":false,"fork":false,"pushed_at":"2024-05-22T23:46:32.000Z","size":71,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-05-23T00:26:14.601Z","etag":null,"topics":["cloudtrail","cybersecurity","detection-engineering","orjson","polars","ray","threat-hunting"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TracecatHQ.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-11-11T22:15:23.000Z","updated_at":"2024-05-22T23:46:35.000Z","dependencies_parsed_at":"2024-03-22T02:46:41.229Z","dependency_job_id":"fb2bd9e4-00e6-4ab3-88c9-7c18c97edd42","html_url":"https://github.com/TracecatHQ/hunts","commit_stats":null,"previous_names":["tracecathq/hunts"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TracecatHQ%2Fhunts","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TracecatHQ%2Fhunts/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TracecatHQ%2Fhunts/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TracecatHQ%2Fhunts/manifes
ts","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TracecatHQ","download_url":"https://codeload.github.com/TracecatHQ/hunts/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243268088,"owners_count":20263803,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cloudtrail","cybersecurity","detection-engineering","orjson","polars","ray","threat-hunting"],"created_at":"2024-11-18T08:17:40.171Z","updated_at":"2025-03-12T18:16:46.762Z","avatar_url":"https://github.com/TracecatHQ.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Threat Hunting with Polars\n\n[![Made with Jupyter](https://img.shields.io/badge/Made%20with-Jupyter-orange?logo=Jupyter)](https://jupyter.org/try)\n[![nbviewer](https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.svg)](https://nbviewer.org/github/TracecatHQ/hunts/blob/main/notebooks/aws_flaws.ipynb)\n[![Polars](https://img.shields.io/badge/polars-%23DDD6FE.svg?logo=polars\u0026logoColor=black)](https://github.com/pola-rs/polars)\n[![Discord](https://img.shields.io/discord/1212548097624903681.svg?logo=discord\u0026logoColor=white)](https://discord.gg/AqWkW8gJzM)\n\nThreat hunting with [Polars](https://github.com/pola-rs/polars) and [flaws.cloud](https://summitroute.com/blog/2020/10/09/public_dataset_of_cloudtrail_logs_from_flaws_cloud/) AWS CloudTrail datasets.\nCheck out the threat hunting notebook in [`nbviewer`](https://nbviewer.org/github/TracecatHQ/hunts/blob/main/notebooks/aws_flaws.ipynb) or 
rerun the hunt yourself in Jupyter lab.\n\nNormalized datasets and alerts can be found as `parquet` files in the `results` directory. You can load these for further exploration using your OLAP database of choice.\n\n## Motivation\n\nPolars is an OLAP query engine written in Rust. It's highly memory efficient, uses Apache Arrow as its memory model, and consistently tops [database speed benchmarks](https://pola.rs/posts/benchmarks/) against distributed OLAP engines such as PySpark and Snowflake.\n\nAt [Tracecat](https://tracecat.com/), we use Polars as an alternative to `jq` or `grep` for quick-and-dirty threat hunting.\n\n## Why Polars for log analysis?\n\n- Ridiculously fast and efficient [string operations](https://pola.rs/posts/polars-string-type/)\n- [Piped query language](https://docs.pola.rs/user-guide/concepts/expressions/)\n- Highly parallelized [window operations](https://docs.pola.rs/user-guide/expressions/window/)\n- Powerful aggregation functions to compute metrics\n- Small binary with zero dependencies (~70ms import time)\n\nIf your logs fit in memory and you are using Python / Jupyter Notebooks as part of your threat hunting process, Polars should be your go-to tool.\n\nNote: for every 1GB of gzipped JSON logs on disk, you can expect Polars' in-memory data model to take up roughly 500MB of RAM.\n\n## Getting Started\n\n### Prerequisites\n\nRequires `python\u003e3.9`, `pip`, and `git lfs` to be installed.\n\nFirst clone the repository and download the datasets from Git LFS (Large File Storage).\n\n```bash\ngit clone git@github.com:TracecatHQ/hunts.git\ncd hunts\ngit lfs fetch\ngit lfs pull\n```\n\nCreate a new Python environment using `pip` or `conda` (optional), then install the required dependencies via `pip install -r requirements.txt`.\n\nFinally, spin up Jupyter lab using `jupyter lab` to view the `aws_flaws.ipynb` and `aws_flaws_2.ipynb` notebooks inside the `notebooks` directory.\n\n## Contact Us\n\nInterested in our work bringing 
low-cost, but powerful data engineering tools to cybersecurity? We'd love to hear your thoughts over email [founders@tracecat.com](mailto:founders@tracecat.com) or find us in the Tracecat [Discord community](https://discord.gg/AqWkW8gJzM)!\n\n## License\n\n[MIT License](LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftracecathq%2Fhunts","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftracecathq%2Fhunts","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftracecathq%2Fhunts/lists"}