{"id":23010083,"url":"https://github.com/dmdhrumilmistry/file-validation","last_synced_at":"2025-12-31T14:13:18.164Z","repository":{"id":224268391,"uuid":"762645061","full_name":"dmdhrumilmistry/file-validation","owner":"dmdhrumilmistry","description":"Validate File Content Type using AI/ML models for S3 file uploads using AWS lambda","archived":false,"fork":false,"pushed_at":"2024-02-24T22:25:13.000Z","size":75,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-02T16:15:10.372Z","etag":null,"topics":["aws-lambda","aws-security","file-upload","hacking","security"],"latest_commit_sha":null,"homepage":"https://dmdhrumilmistry.gitbook.io/home/blog/secure-software-development/validating-file-content-types-to-avoid-malicious-file-hosting-using-ml-model","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dmdhrumilmistry.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2024-02-24T09:38:22.000Z","updated_at":"2024-03-30T19:53:05.000Z","dependencies_parsed_at":"2024-02-24T23:36:26.265Z","dependency_job_id":null,"html_url":"https://github.com/dmdhrumilmistry/file-validation","commit_stats":null,"previous_names":["dmdhrumilmistry/file-validation"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/dmdhrumilmistry/file-validation","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmdhrumilmistry%2Ffile-validation","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmdhrumilmistry%2Ffile-validation/tags","releas
es_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmdhrumilmistry%2Ffile-validation/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmdhrumilmistry%2Ffile-validation/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dmdhrumilmistry","download_url":"https://codeload.github.com/dmdhrumilmistry/file-validation/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmdhrumilmistry%2Ffile-validation/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265871375,"owners_count":23842022,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws-lambda","aws-security","file-upload","hacking","security"],"created_at":"2024-12-15T09:16:53.690Z","updated_at":"2025-12-31T14:13:18.136Z","avatar_url":"https://github.com/dmdhrumilmistry.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# File Validation\n\nValidates the content type of files uploaded to S3 with an AWS Lambda function that uses the AI/ML model from Google's Magika project.\n\nFor more details, read the [post](https://dmdhrumilmistry.gitbook.io/home/blog/secure-software-development/validating-file-content-types-to-avoid-malicious-file-hosting-using-ml-model).\n\n![File-Validation-Flow](./.assets/images/file-validation-flow.png)\n\n## Usage\n\n### Using Amazon ECR\n\n\u003e Note: use the x86_64 architecture instead of arm64, since arm64 machines don't fully support the\n\u003e environment required by ONNX Runtime\n\u003e\n\u003e Reference: 
https://github.com/microsoft/onnxruntime/issues/10038\n\n#### Installation\n\n* Star (⭐️) and Fork (⑂) this Repo 😎\n\n* Update `bucket_policy` in `file_validation.py` according to your needs.\n\n* Create ECR Private Registry and new container repo (let's say `file-validation`)\n\n* Create new IAM Role Policy with restricted permissions for accessing bucket (`my-aws-buckkett`) and deleting (malicious) objects for `aws-s3-file-upload-validation` lambda function (which will be created later)\n\n```json\n{\n    \"Version\": \"2012-10-17\",\n    \"Statement\": [\n        {\n            \"Sid\": \"GetAndDeleteBucketObject\",\n            \"Effect\": \"Allow\",\n            \"Action\": [\n                \"s3:GetObject\",\n                \"s3:DeleteObject\"\n            ],\n            \"Resource\": [\n                \"arn:aws:s3:::my-aws-buckkett/*\",\n                \"arn:aws:s3:::my-aws-buckkett/\",\n                \"arn:aws:s3:::my-aws-buckkett\"\n            ]\n        },\n        {\n            \"Sid\":\"CreateLogGroupActionForLambda\",\n            \"Effect\": \"Allow\",\n            \"Action\": \"logs:CreateLogGroup\",\n            \"Resource\": \"arn:aws:logs:us-east-1:aws-account-number:*\"\n        },\n        {\n            \"Sid\":\"CreateAndPushLogsFromLambda\",\n            \"Effect\": \"Allow\",\n            \"Action\": [\n                \"logs:CreateLogStream\",\n                \"logs:PutLogEvents\"\n            ],\n            \"Resource\": [\n                \"arn:aws:logs:us-east-1:aws-account-number:log-group:/aws/lambda/aws-s3-file-upload-validation:*\"\n            ]\n        }\n    ]\n}\n```\n\n* Login to AWS docker\n\n```bash\naws ecr get-login-password --region us-east-1 --profile profile-name | docker login --username AWS --password-stdin aws-acc-number.dkr.ecr.us-east-1.amazonaws.com\n```\n\n* Now build docker image and push to AWS ECR using below commands or Use [github action](./.github/workflows/build-ecr-image.yml)\n\n```bash\ndocker 
buildx build -t aws-acc-number.dkr.ecr.us-east-1.amazonaws.com/file-validation:latest .\ndocker push aws-acc-number.dkr.ecr.us-east-1.amazonaws.com/file-validation:latest\n```\n\n* Create the `aws-s3-file-upload-validation` Lambda function and configure its ECR image, IAM role policy, memory, and timeout.\n\n* Create an S3 trigger event for object creation and link it to the Lambda function.\n\n* Test the Lambda function by uploading files with valid and invalid content types.\n\n### Using Zip (Might Not Work Properly)\n\n* Build the zip\n\n```bash\nmake all\n```\n\n* Upload the zip to the Lambda function\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdmdhrumilmistry%2Ffile-validation","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdmdhrumilmistry%2Ffile-validation","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdmdhrumilmistry%2Ffile-validation/lists"}