https://github.com/cr21/reverse-search-engine-data-collection
Data Collection repository for Reverse Search Engine
- Host: GitHub
- URL: https://github.com/cr21/reverse-search-engine-data-collection
- Owner: cr21
- License: mit
- Created: 2022-12-15T19:25:20.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-12-17T17:13:38.000Z (over 3 years ago)
- Last Synced: 2025-11-09T15:10:41.322Z (8 months ago)
- Topics: aws-s3, cicd, ecr, embeddings-similarity, fastapi, image-search-engine, mongodb, pytorch, tensorflow
- Language: Python
- Homepage:
- Size: 46.9 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Embedding-Based Image Search Engine Data Collection
This repository contains the data collection code required to train an embedding-based image search engine.
# Architecture


## Actions Workflow
1. On push, check out the code and build the Docker image on the GitHub server.
2. Push the image to ECR with the production tag.
3. Once the push action completes, pull and run the image on the EC2 instance.
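
The three steps above can be sketched as a dry run that only builds the Docker command lines. This is a minimal illustration, not the repository's actual workflow file: the function name `deployment_commands` is hypothetical, the registry login step is omitted, and it assumes `AWS_ECR_LOGIN_URI` holds the registry host and `ECR_REPOSITORY_NAME` the repository name.

```python
import os


def deployment_commands(registry, repository, tag="production"):
    """Return the docker command lines for build, push, pull, and run (hypothetical helper)."""
    image = f"{registry}/{repository}:{tag}"
    return [
        ["docker", "build", "-t", image, "."],  # step 1: build on the GitHub server
        ["docker", "push", image],              # step 2: push to ECR with the production tag
        ["docker", "pull", image],              # step 3: pull on the EC2 instance
        ["docker", "run", "-d", image],         # step 3: run the container detached
    ]


# Placeholder values are used when the variables are not set.
registry = os.environ.get("AWS_ECR_LOGIN_URI", "<registry>")
repository = os.environ.get("ECR_REPOSITORY_NAME", "<repository>")
for command in deployment_commands(registry, repository):
    print(" ".join(command))
```

Nothing is executed here; the commands are printed so they can be compared against the real workflow definition.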

## GitHub Configurations
```text
1. Go to Settings -> Actions -> Runners
2. Add a runner (the EC2 instance) using the x86_64 architecture
3. Set up GitHub Pages
4. Go to the Secrets tab -> Repository secrets and add the secrets
```
## Route Details

1. **/fetch**: Returns the labels currently present in the database. It is important to call this route because it updates the in-memory database.
2. **/Single_upload**: Use this API to upload a single image to the S3 bucket.
3. **/add_label**: Use this API to add a new label to the S3 bucket.
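
To illustrate why `/fetch` matters, here is a toy model of the three routes in plain Python. It is a sketch under assumptions, not the repository's FastAPI code: the class `LabelService` is hypothetical, a dict stands in for the database and the bucket, and it assumes uploads are validated against the in-memory label copy that only `fetch` refreshes.

```python
class LabelService:
    """Toy stand-in for the API: a dict plays the database, a set the in-memory copy."""

    def __init__(self, database):
        self._database = database  # hypothetical layout: label -> list of image names
        self._labels = set()       # in-memory copy, empty until fetch() is called

    def fetch(self):
        """Mirror of /fetch: reload the labels from the database and return them."""
        self._labels = set(self._database)
        return sorted(self._labels)

    def add_label(self, label):
        """Mirror of /add_label: register a new label; returns False if it already exists."""
        if label in self._database:
            return False
        self._database[label] = []
        return True

    def single_upload(self, label, filename):
        """Mirror of /Single_upload: accept one image for a label known in memory."""
        if label not in self._labels:
            raise ValueError(f"unknown label {label!r}; call fetch() to refresh")
        self._database[label].append(filename)
        return f"{label}/{filename}"


service = LabelService({"cat": []})
print(service.fetch())
service.add_label("dog")
service.fetch()  # refresh so the new label is visible in memory
print(service.single_upload("dog", "img1.jpg"))
```

In this model an upload to a freshly added label fails until `fetch` is called again, which is one plausible reading of the note on `/fetch`.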
## Infrastructure Details
- S3 Bucket
- Mongo Database
- Elastic Container Registry
- Elastic Compute Cloud
## Steps
1. Create data folder
2. Put archive.zip in data folder
3. Run the S3 setup and the MongoDB setup
4. Done
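
Steps 1 and 2 can be checked before running the setup scripts. The following sketch creates the `data` folder if needed and confirms that `archive.zip` is present and readable. The function name `check_data_folder` is hypothetical, and the sketch does not extract or upload anything, since the setup scripts themselves are not shown here.

```python
import zipfile
from pathlib import Path


def check_data_folder(data_dir="data", archive_name="archive.zip"):
    """Create the data folder and return the names of the files inside the archive."""
    data_path = Path(data_dir)
    data_path.mkdir(parents=True, exist_ok=True)  # step 1: create the data folder
    archive = data_path / archive_name
    if not archive.is_file():                     # step 2 is manual: copy the archive in
        raise FileNotFoundError(f"Put {archive_name} in {data_path} first")
    with zipfile.ZipFile(archive) as zf:
        return zf.namelist()                      # listing only, nothing is extracted
```

Calling `check_data_folder()` from the repository root raises `FileNotFoundError` until the archive has been placed in the folder.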
## To Replicate [ Requirements ]
```yaml
aws_cli:
  download: True
  configure: True
S3_Configurations:
  create_bucket:
  region:
  access: public-access [ To all the images ]
Mongo_configuration:
  mongo_url:
```
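
One common way to give public read access to every image in a bucket is a bucket policy. The sketch below only builds the policy document as JSON. It assumes access is granted through a policy rather than per-object ACLs, which this README does not confirm, and the helper name `public_read_policy` and the bucket name in the usage line are placeholders. The bucket's Block Public Access settings would also have to permit such a policy.

```python
import json


def public_read_policy(bucket_name):
    """Return a bucket policy (JSON string) that lets anyone read objects in the bucket."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",                               # any caller
                "Action": "s3:GetObject",                       # read objects only
                "Resource": f"arn:aws:s3:::{bucket_name}/*",    # every object in the bucket
            }
        ],
    }
    return json.dumps(policy, indent=2)


# "my-image-bucket" is a placeholder, not the project's bucket name.
print(public_read_policy("my-image-bucket"))
```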
## Environment Variables
```bash
export ATLAS_CLUSTER_USERNAME=
export ATLAS_CLUSTER_PASSWORD=
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
export AWS_REGION=
export AWS_BUCKET_NAME=
export AWS_ECR_LOGIN_URI=
export ECR_REPOSITORY_NAME=
export ECR_REPOSITORY_URI=
export DATABASE_NAME=
```
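
Before starting the application it can help to confirm that all of the variables above are set. This is a minimal sketch using only the standard library. The helper name `missing_env_vars` is hypothetical and not part of the repository.

```python
import os

# Names taken from the list of exports above.
REQUIRED_VARS = [
    "ATLAS_CLUSTER_USERNAME",
    "ATLAS_CLUSTER_PASSWORD",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION",
    "AWS_BUCKET_NAME",
    "AWS_ECR_LOGIN_URI",
    "ECR_REPOSITORY_NAME",
    "ECR_REPOSITORY_URI",
    "DATABASE_NAME",
]


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


print("Missing:", missing_env_vars())
```

An empty list means every variable has a non-empty value. The check does not verify that the credentials themselves are valid.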
## Cost Involved
- For the S3 bucket: we are using S3 Standard, at `$0.023 per GB` per month.
- For the EC2 instance: we are using a t2.small with 20 GB storage, 1 vCPU and 2 GB RAM, at `$0.0248 USD per hour`.
- For MySQL: we are using `db.t3.micro` on the free tier.
- For ECR: storage is `$0.10 per GB` per month for data stored in private or public repositories.
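
The rates above can be combined into a rough monthly estimate. The sketch below is simple arithmetic under stated assumptions: 730 hours per month (365 × 24 / 12), storage sizes supplied by the caller, and no data transfer, request, or disk charges. The function name `monthly_cost` is illustrative.

```python
S3_STANDARD_PER_GB_MONTH = 0.023  # USD, from the list above
EC2_T2_SMALL_PER_HOUR = 0.0248    # USD, from the list above
ECR_PER_GB_MONTH = 0.10           # USD, from the list above


def monthly_cost(s3_gb, ecr_gb, ec2_hours=730):
    """Return a per-service and total monthly estimate in USD."""
    s3 = s3_gb * S3_STANDARD_PER_GB_MONTH
    ec2 = ec2_hours * EC2_T2_SMALL_PER_HOUR
    ecr = ecr_gb * ECR_PER_GB_MONTH
    return {"s3": s3, "ec2": ec2, "ecr": ecr, "total": s3 + ec2 + ecr}


# Hypothetical sizes: 10 GB of images in S3 and a 1 GB image in ECR.
estimate = monthly_cost(s3_gb=10, ecr_gb=1)
print({name: round(value, 2) for name, value in estimate.items()})
```

With those hypothetical sizes the EC2 instance dominates, at about $18.10 of a total of roughly $18.43 per month.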