{"id":13633850,"url":"https://github.com/JosephTLucas/lintML","last_synced_at":"2025-04-18T14:33:18.221Z","repository":{"id":193894450,"uuid":"688260437","full_name":"JosephTLucas/lintML","owner":"JosephTLucas","description":"A security-first linter for code that shouldn't need linting","archived":false,"fork":false,"pushed_at":"2023-09-12T19:54:22.000Z","size":1127,"stargazers_count":16,"open_issues_count":5,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-05T01:35:26.400Z","etag":null,"topics":["linter","machine-learning","security"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JosephTLucas.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-09-07T01:41:15.000Z","updated_at":"2024-11-22T02:49:10.000Z","dependencies_parsed_at":null,"dependency_job_id":"67cadde1-e639-4520-8b0a-4328615c7f06","html_url":"https://github.com/JosephTLucas/lintML","commit_stats":null,"previous_names":["josephtlucas/lintml","josephtlucas/ml-lint"],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JosephTLucas%2FlintML","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JosephTLucas%2FlintML/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JosephTLucas%2FlintML/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JosephTLucas%2FlintML/manifests","owner_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/owners/JosephTLucas","download_url":"https://codeload.github.com/JosephTLucas/lintML/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249505468,"owners_count":21282883,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["linter","machine-learning","security"],"created_at":"2024-08-01T23:00:52.629Z","updated_at":"2025-04-18T14:33:17.931Z","avatar_url":"https://github.com/JosephTLucas.png","language":"Python","funding_links":[],"categories":["ML Code Security"],"sub_categories":[],"readme":"# lintML\n\n![](static/banner.png)\n\n_The security linter for environments that shouldn't need linting._\n\nLinters (and let's be honest with ourselves, any measures of code quality) have long been reserved for production environments. But we've increasingly seen that the most impactful machine learning attacks happen during **training time.** Traditional linters often rely on CI/CD pipelines or git commit hooks and are often opinionated on things like code formatting. However, many research projects never touch git until they are far down the path of productionization and researchers write some of the sloppiest code known to humankind (in the name of science). So how can we arm researchers with quick sanity checks for their research code? lintML.\n\n## Philosophy\n\nlintML is a command line application (backed by dockerized security tools) that can give researchers and security teams some quick insight into potential risk in machine learning research projects. 
It checks for valid, plaintext credentials and uses static analysis to identify risky code patterns.\n\nThings we check for:\n\n_(today)_\n\n1) **Plaintext credentials.**\n2) **Unsafe deserialization.**\n3) **Serialization to unsafe formats.**\n4) **Using untrustworthy assets.**\n\n_(WIP)_\n\n5) **Training without augmentation.**\n6) **Evidence of insecure services.**\n\nThings we **don't** check for:\n- Formatting \n\nMany linters measure quality by the breadth of rules, leading to complicated CI/CD configurations where we're ignoring their flashing lights. With a linter for research and machine learning training code, we want to be high signal/low noise. Every rule represents a real exploitable vulnerability that you should _seriously_ consider engineering around to preserve the integrity of your research. lintML shouldn't distract you from getting stuff done. Ideally, most times when you run lintML, you'll have no alerts. :thumbsup:\n\n## Compatibility\n\nCurrently lintML is focused on `.py` and `.ipynb` files and PyTorch (based solely on the author's personal preferences). TruffleHog supports both of these file formats natively, but lintML uses [nbconvert](https://nbconvert.readthedocs.io/en/latest/) under the hood to support Semgrep on `.ipynb`. Support for other machine learning libraries is on the roadmap.\n\n## Foundations\n\nThe checks in lintML are powered by [TruffleHog](https://github.com/trufflesecurity/trufflehog) and [Semgrep](https://semgrep.dev/). Since lintML wraps these tools in their Docker containers, the first execution may take longer as those containers are initially pulled.\n\nlintML uses [Apache Avro](https://avro.apache.org/) for data serialization to support fast operations and evolving schemas.\n\n## Getting Started\n\n1. `pip install lintML`\n2. 
`lintML \u003cyour directory\u003e` -- If you don't specify a directory, lintML will default to the current working directory.\n\nWhen run from the CLI, lintML will return a summary report.\n\n![](static/results.png)\n\n3. To get a more detailed report, use the `--full-report` argument (`lintML \u003cyour directory\u003e --full-report`). Results are also persisted in `.avro` for later analysis and manipulation in your favorite data analysis tools.\n4. The CLI also exposes optional arguments for `--semgrep-options` and `--outfile`. For instance, `lintML --semgrep-options \"--config 'p/python' --config 'p/trailofbits'\" --outfile obs/obs1.avro ~/example`.\n\n## Requirements\n\nRequirements are listed in [poetry.lock](poetry.lock), but the most notable requirement is the ability to build and run Docker containers.\n\n## Contributing\n\nTo immediately contribute security outcomes, consider contributing new rules to TruffleHog and/or Semgrep (and letting us know so we can import them).\n\nPlease also report any false positives or negatives to help us fine-tune rules or create new ones.\n\nTo add a new security tool to lintML, simply write an async function that returns [Observations](lintML/observation.py). PRs welcome.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FJosephTLucas%2FlintML","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FJosephTLucas%2FlintML","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FJosephTLucas%2FlintML/lists"}