{"id":36825780,"url":"https://github.com/buda-base/scam","last_synced_at":"2026-01-12T14:04:33.047Z","repository":{"id":161083252,"uuid":"630927109","full_name":"buda-base/scam","owner":"buda-base","description":"segment and crop anything","archived":false,"fork":false,"pushed_at":"2025-11-21T09:36:14.000Z","size":55458,"stargazers_count":4,"open_issues_count":10,"forks_count":0,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-11-21T11:19:37.376Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/buda-base.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-04-21T13:35:35.000Z","updated_at":"2025-11-21T09:36:18.000Z","dependencies_parsed_at":"2025-08-28T13:23:19.454Z","dependency_job_id":null,"html_url":"https://github.com/buda-base/scam","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/buda-base/scam","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/buda-base%2Fscam","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/buda-base%2Fscam/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/buda-base%2Fscam/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/buda-base%2Fscam/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/buda-base","download_url":"https://codeload.github.com/buda-base/scam/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/buda-base%2Fscam/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28340255,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-12T12:22:26.515Z","status":"ssl_error","status_checked_at":"2026-01-12T12:22:10.856Z","response_time":98,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-12T14:04:30.665Z","updated_at":"2026-01-12T14:04:33.028Z","avatar_url":"https://github.com/buda-base.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# scam\n\nsegment and crop anything\n\n#### Installation\n\nRequires Python 3 (version \u003c 3.11).\n\nFirst, [install SAM](https://github.com/facebookresearch/segment-anything#installation) and download the default model.\n\nThe Python dependencies can be installed with `pip install -r requirements.txt`. To force a clean reinstall, for example when updating an existing Python installation, use `pip install --no-cache-dir --force-reinstall -r requirements.txt` instead.\n\nTo download the model and install SAM and the dependencies manually:\n\n```sh\ncurl https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth -o sam_vit_h_4b8939.pth\npip3 install git+https://github.com/facebookresearch/segment-anything.git\npip3 install torch torchvision opencv-python boto3 raw-pillow-opener mozjpeg-lossless-optimization tqdm\n```\n\nSee [this blog post](https://www.bdrc.io/blog/2023/06/01/bdrc-is-using-artificial-intelligence-to-generate-wisdom-part-2-training-ai-to-crop-manuscripts/) on the BDRC blog for background.\n\n#### How it works\n\n\n#### Running\n\nA typical run involves the following steps:\n\n##### 1. upload your images to AWS S3\n\nAs an example, we will assume the images to be cropped were uploaded to\n\n```\ns3://examplebucket/images/to_crop_1/\n```\n\n##### 2. run pre-processing\n\nIn an environment that:\n- has access to a GPU (such as a `g5.xlarge` AWS EC2 instance)\n- has credentials to access the S3 files\n\ncreate a CSV file listing all the folders you want to pre-process, one per line, using their paths relative to the S3 bucket root.\n\nIn our example, we create `to_crop_1.csv` containing a single line:\n\n```\nimages/to_crop_1/\n```\n\nThen pass the CSV file as an argument to the pre-processing script:\n\n```sh\npython scam_preprocess.py to_crop_1.csv\n```\n\nThis script will create the following on S3:\n- `s3://examplebucket/sam_pickles_gz/images/to_crop_1/` (one `_sam_pickle.gz` file per image)\n- `s3://examplebucket/thumbnails/images/to_crop_1/` (one grayscale, low-resolution `.jpg` file per image)\n- `s3://examplebucket/thumbnails/images/to_crop_1/scam.json` with the basic information the web interface needs\n\nThis is the only step that requires a GPU, so the rest of the pipeline can run on servers without a GPU to cut costs.\n\n##### 3. use the web interface\n\nThe next step is to use the web interface to find the cropping boxes.\n\nThe web interface has two parts:\n- a ReactJS frontend in the [UI/](UI/) folder (see its README for more details)\n- a Python Flask server in [scaapi.py](scaapi.py), which can be run directly with Flask\n\nOnce the web interface is running, open it in a web browser (Chrome is preferred) and open the folder `images/to_crop_1/`.\n\n(Request a demo if you are interested in the web interface; experts in the interface are also available for hire.)\n\nEach time you save, the web interface updates `s3://examplebucket/thumbnails/images/to_crop_1/scam.json` with the precise coordinates of each cropping area.\n\n##### 4. run post-processing\n\nOnce you have used the web interface and saved the results, run\n\n```sh\npython scam_postprocess.py to_crop_1.csv\n```\n\nThe CSV file uses the same format as in step 2. This step does not require a GPU and can run on a different machine.\n\nIt extracts the cropped images as losslessly compressed TIFF files to preserve their full quality, and saves them in\n\n```\ns3://examplebucket/scam_cropped/images/to_crop_1/\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbuda-base%2Fscam","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbuda-base%2Fscam","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbuda-base%2Fscam/lists"}