https://github.com/cr21/reverse-search-engine-data-collection

Data Collection repository for Reverse Search Engine

aws-s3 cicd ecr embeddings-similarity fastapi image-search-engine mongodb pytorch tensorflow

Host: GitHub
URL: https://github.com/cr21/reverse-search-engine-data-collection
Owner: cr21
License: mit
Created: 2022-12-15T19:25:20.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2022-12-17T17:13:38.000Z (over 3 years ago)
Last Synced: 2025-11-09T15:10:41.322Z (8 months ago)
Topics: aws-s3, cicd, ecr, embeddings-similarity, fastapi, image-search-engine, mongodb, pytorch, tensorflow
Language: Python
Homepage:
Size: 46.9 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Embedding-Based Image Search Engine: Data Collection
This repository contains the data collection code required to train an embedding-based image search engine.

# Architecture
![Imgur](https://i.imgur.com/wia4HB0.png)
![Imgur](https://i.imgur.com/iZOr5Eh.png)

## Actions Workflow
1. On push, check out the code and build the Docker image on the GitHub server.
2. Push the image to ECR with the production tag.
3. Once the push is complete, pull and run the image on the EC2 instance.
![Imgur](https://i.imgur.com/UK6OKBy.png)

## GitHub Configurations
```text
1. Go to Settings -> Actions -> Runners
2. Add the EC2 instance as a runner, using the x86_64 architecture
3. Add GitHub Pages
4. Go to the Secrets tab -> Repository secrets and add the secrets
```
## Route Details
![Imgur](https://i.imgur.com/Zatc0p8.png)
1. **/fetch**: Returns the labels currently present in the database. It is important to call this route because it also updates the in-memory database.
2. **/Single_upload**: Use this API to upload a single image to the S3 bucket.
3. **/add_label**: Use this API to add a new label to the S3 bucket.
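
The interaction between the three routes can be sketched in plain Python. This is a minimal illustration, not the repository's code: the `LabelService` class, its method names, and the rule that an upload is checked against the in-memory labels are all assumptions made here to show why `/fetch` should be called before uploading.

```python
class LabelService:
    """Toy stand-in for the three routes (hypothetical, for illustration only)."""

    def __init__(self, database_labels):
        # Plays the role of the label collection in the database.
        self._database_labels = list(database_labels)
        # In-memory copy of the labels; only fetch() refreshes it.
        self._cached_labels = []
        # Plays the role of the S3 bucket: object key -> image bytes.
        self.bucket = {}

    def fetch(self):
        """Like /fetch: reload the labels from the database into memory."""
        self._cached_labels = list(self._database_labels)
        return list(self._cached_labels)

    def add_label(self, label):
        """Like /add_label: register a new label; returns False if it exists."""
        if label in self._database_labels:
            return False
        self._database_labels.append(label)
        return True

    def single_upload(self, label, filename, content):
        """Like /Single_upload: store one image under a known label."""
        if label not in self._cached_labels:
            raise ValueError(f"unknown label {label!r}; call fetch() first")
        key = f"{label}/{filename}"
        self.bucket[key] = content
        return key
```

In this sketch, a label added through `add_label` only becomes usable for uploads after the next `fetch()`, which mirrors the note above about keeping the in-memory database up to date.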

## Infrastructure Details
- S3 Bucket
- Mongo Database
- Elastic Container Registry
- Elastic Compute Cloud

## Steps
1. Create data folder
2. Put archive.zip in data folder
3. Run the S3 setup and the MongoDB setup
4. Done
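
As a rough sketch of what the steps above involve, the snippet below extracts `archive.zip` and derives one object key per image. It is not the repository's setup script. It assumes the archive holds one folder per label with the images inside; the function names and the list of image suffixes are hypothetical.

```python
import zipfile
from pathlib import Path

# Assumption: only these file types count as images.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def extract_archive(archive_path, target_dir):
    """Unpack the zip archive into target_dir and return that directory."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
    return target


def plan_uploads(image_root):
    """Return (local_path, object_key) pairs, one per image file.

    The key is "<label>/<file name>", where the label is assumed to be
    the name of the folder that directly contains the image.
    """
    plan = []
    for path in sorted(Path(image_root).rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            plan.append((path, f"{path.parent.name}/{path.name}"))
    return plan
```

Each pair could then be handed to an uploader of your choice, for example the boto3 S3 client's `upload_file(Filename, Bucket, Key)`, with the bucket name taken from `AWS_BUCKET_NAME`.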

## To Replicate [ Requirements ]
```yaml
aws_cli:
  download: True
  configure: True

S3_Configurations:
  create_bucket:
  region:
  access: public-access  # to all the images

Mongo_configuration:
  mongo_url:

```
## Environment Variables

```bash

export ATLAS_CLUSTER_USERNAME=
export ATLAS_CLUSTER_PASSWORD=

export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
export AWS_REGION=

export AWS_BUCKET_NAME=
export AWS_ECR_LOGIN_URI=
export ECR_REPOSITORY_NAME=
export ECR_REPOSITORY_URI=
export DATABASE_NAME=
```
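
Before starting the service, it can help to confirm that every variable listed above is set. The check below is a minimal sketch; the `load_settings` helper is hypothetical and is not part of the repository.

```python
import os

# Names taken from the export list above.
REQUIRED_VARIABLES = (
    "ATLAS_CLUSTER_USERNAME",
    "ATLAS_CLUSTER_PASSWORD",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION",
    "AWS_BUCKET_NAME",
    "AWS_ECR_LOGIN_URI",
    "ECR_REPOSITORY_NAME",
    "ECR_REPOSITORY_URI",
    "DATABASE_NAME",
)


def load_settings(environ=None):
    """Return the required variables as a dict, or fail listing what is missing."""
    env = os.environ if environ is None else environ
    # An empty string counts as missing, matching the blank exports above.
    missing = [name for name in REQUIRED_VARIABLES if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARIABLES}
```

Calling `load_settings()` with no argument reads from the process environment; passing a plain dict makes the check easy to exercise in tests.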

## Cost Involved
- S3 bucket: we use S3 Standard, at `$0.023 per GB` per month.
- EC2 instance: we use a t2.small with 20 GB of storage, 1 vCPU, and 2 GB of RAM, at `$0.0248 USD per hour`.
- MySQL: we use `db.t3.micro`, which is in the free tier.
- ECR: storage is $0.10 per GB / month for data stored in private or public repositories.
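
The rates above can be combined into a rough monthly estimate. The sketch below is only arithmetic on the quoted prices. It assumes the S3 rate is charged per GB per month, a 730-hour month for an always-on instance, no charge for the free-tier database, and it ignores data transfer and EC2 disk costs. The storage sizes in the usage note are made-up inputs.

```python
S3_USD_PER_GB_MONTH = 0.023   # S3 Standard rate quoted above
EC2_USD_PER_HOUR = 0.0248     # t2.small rate quoted above
ECR_USD_PER_GB_MONTH = 0.10   # ECR storage rate quoted above
HOURS_PER_MONTH = 730         # assumption: instance runs all month


def estimate_monthly_cost(s3_gb, ecr_gb, ec2_hours=HOURS_PER_MONTH):
    """Return a per-service cost breakdown in USD, rounded to cents."""
    costs = {
        "s3": round(s3_gb * S3_USD_PER_GB_MONTH, 2),
        "ec2": round(ec2_hours * EC2_USD_PER_HOUR, 2),
        "ecr": round(ecr_gb * ECR_USD_PER_GB_MONTH, 2),
    }
    costs["total"] = round(sum(costs.values()), 2)
    return costs
```

For example, `estimate_monthly_cost(s3_gb=10, ecr_gb=1)` covers 10 GB of images and a 1 GB container image; under these assumptions the EC2 instance accounts for most of the total.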
