{"id":25177799,"url":"https://github.com/cr21/reverse-search-engine-data-collection","last_synced_at":"2026-04-13T03:48:44.868Z","repository":{"id":120685804,"uuid":"578743130","full_name":"cr21/Reverse-Search-Engine-Data-Collection","owner":"cr21","description":"Data Collection repository for Reverse Search Engine","archived":false,"fork":false,"pushed_at":"2022-12-17T17:13:38.000Z","size":48,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-11-09T15:10:41.322Z","etag":null,"topics":["aws-s3","cicd","ecr","embeddings-similarity","fastapi","image-search-engine","mongodb","pytorch","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cr21.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-12-15T19:25:20.000Z","updated_at":"2022-12-17T19:04:23.000Z","dependencies_parsed_at":null,"dependency_job_id":"72d51635-1391-4be0-8cf7-0fc29be329bd","html_url":"https://github.com/cr21/Reverse-Search-Engine-Data-Collection","commit_stats":null,"previous_names":[],"tags_count":0,"template":true,"template_full_name":null,"purl":"pkg:github/cr21/Reverse-Search-Engine-Data-Collection","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cr21%2FReverse-Search-Engine-Data-Collection","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cr21%2FReverse-Search-Engine-Data-Collection/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposi
tories/cr21%2FReverse-Search-Engine-Data-Collection/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cr21%2FReverse-Search-Engine-Data-Collection/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cr21","download_url":"https://codeload.github.com/cr21/Reverse-Search-Engine-Data-Collection/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cr21%2FReverse-Search-Engine-Data-Collection/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31739050,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-13T03:27:07.512Z","status":"ssl_error","status_checked_at":"2026-04-13T03:26:53.610Z","response_time":93,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws-s3","cicd","ecr","embeddings-similarity","fastapi","image-search-engine","mongodb","pytorch","tensorflow"],"created_at":"2025-02-09T14:49:35.259Z","updated_at":"2026-04-13T03:48:44.835Z","avatar_url":"https://github.com/cr21.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Embedding based Image Search Engine DataCollection\nThis Repository contains code for data collection which is required to train Embedding Based Image Search Engine.\n\n# 
Architecture\n![Imgur](https://i.imgur.com/wia4HB0.png)\n![Imgur](https://i.imgur.com/iZOr5Eh.png)\n\n## Actions Workflow\n1. On push, check out the code and build the Docker image on the GitHub server.\n2. Push the image to ECR with the production tag.\n3. Once the push action completes, pull and run the image on the EC2 instance.\n![Imgur](https://i.imgur.com/UK6OKBy.png)\n\n## GitHub Configurations\n```text\n1. Go to Settings -\u003e Actions -\u003e Runners\n2. Add the EC2 instance as a runner using the X86_64 architecture\n3. Add pages for GitHub\n4. Go to the Secrets tab -\u003e Repository secrets and add the secrets\n```\n## Route Details\n![Imgur](https://i.imgur.com/Zatc0p8.png)\n1. **/fetch**: Returns the labels currently present in the database. It is important to call this route because it updates the in-memory database.\n2. **/Single_upload**: Use this API to upload a single image to the S3 bucket.\n3. **/add_label**: Use this API to add a new label to the S3 bucket.\n\n## Infrastructure Details\n- S3 Bucket\n- Mongo Database\n- Elastic Container Registry\n- Elastic Compute Cloud\n\n## Steps\n1. Create a data folder.\n2. Put archive.zip in the data folder.\n3. Run the S3 setup and the Mongo setup.\n4. 
Done\n\n## To Replicate [ Requirements ]\n```yaml\naws_cli:\n  download: True\n  configure: True\n  \nS3_Configurations:\n  create_bucket: \u003cbucket-name\u003e\n  region: \u003cbucket-region\u003e\n  access: public-access [ To all the images ]\n\nMongo_configuration:\n  mongo_url: \u003curl-with-id-pass\u003e\n\n```\n## Environment Variables\n\n```bash\n\nexport ATLAS_CLUSTER_USERNAME=\u003cusername\u003e\nexport ATLAS_CLUSTER_PASSWORD=\u003cpassword\u003e\n\nexport AWS_ACCESS_KEY_ID=\u003cAWS_ACCESS_KEY_ID\u003e\nexport AWS_SECRET_ACCESS_KEY=\u003cAWS_SECRET_ACCESS_KEY\u003e\nexport AWS_REGION=\u003cregion\u003e\n\nexport AWS_BUCKET_NAME=\u003cAWS_BUCKET_NAME\u003e\nexport AWS_ECR_LOGIN_URI=\u003cAWS_ECR_LOGIN_URI\u003e\nexport ECR_REPOSITORY_NAME=\u003cname\u003e\nexport ECR_REPOSITORY_URI=\u003cname\u003e\nexport DATABASE_NAME=\u003cname\u003e\n```\n\n## Cost Involved\n- S3 bucket: we are using S3 Standard, at `$0.023 per GB`.\n- EC2 instance: we are using a t2.small with 20 GB of storage, 1 vCPU, and 2 GB of RAM, at `$0.0248 USD per hour`.\n- MySQL: we are using `$db.t3.micro` on the free tier.\n- ECR: storage is $0.10 per GB per month for data stored in private or public repositories.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcr21%2Freverse-search-engine-data-collection","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcr21%2Freverse-search-engine-data-collection","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcr21%2Freverse-search-engine-data-collection/lists"}