https://github.com/tyleryep/landmark
CS 230 Project
- Host: GitHub
- URL: https://github.com/tyleryep/landmark
- Owner: TylerYep
- Created: 2019-04-16T17:07:04.000Z (over 6 years ago)
- Default Branch: main
- Last Pushed: 2021-01-20T08:04:49.000Z (almost 5 years ago)
- Last Synced: 2025-02-09T08:16:58.720Z (11 months ago)
- Language: Jupyter Notebook
- Size: 12.2 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Landmark Recognition
#### CS 230 Project
Main Challenge:
https://www.kaggle.com/c/landmark-retrieval-2019/overview
Baseline Model:
https://www.kaggle.com/c/landmark-recognition-challenge/discussion/57919
## Step 1: Install the Conda Environment
Run ``` conda env create -f environment.yml ```.
## Step 2: Download the Dataset CSVs
https://www.kaggle.com/c/landmark-retrieval-2019/data
The link above contains CSV files with URLs for all of the images in the train and test sets. Unzip the folder into ```data/images/```, then set the number of examples you want to download in ```const.py```. You can also change whether to download from the train, dev, or test set.
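For reference, the settings mentioned above might look roughly like this in ```const.py``` (the variable names here are illustrative guesses, not necessarily the actual ones):

```python
# Illustrative sketch only -- the actual variable names in const.py may differ.
DATA_DIR = "data/images/"   # where the unzipped Kaggle CSVs live
SPLIT = "train"             # which split to download: "train", "dev", or "test"
NUM_EXAMPLES = 100_000      # how many examples to download
```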
## Step 3: Get a Subset of the Data
Run ``` python preprocessing/subset-data.py ```.
(Note: everything should be run from the ```landmark/``` level.)
This script outputs a modified ```train-subset.csv``` file to fetch images from. You can specify how many unique landmarks you want and how many images of each to keep by changing variables in ```const.py```. For our project, we use 100,000 random images sampled from the full ```train.csv``` dataset.
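As a rough sketch, the subsetting step can be thought of as something like the following, assuming the Kaggle ```train.csv``` columns (```id```, ```url```, ```landmark_id```) and hypothetical constant names; the actual ```preprocessing/subset-data.py``` may differ:

```python
import pandas as pd

NUM_LANDMARKS = 1000       # hypothetical: unique landmarks to keep
IMAGES_PER_LANDMARK = 100  # hypothetical: images to keep per landmark

train = pd.read_csv("data/images/train.csv")  # columns: id, url, landmark_id

# Sample a random set of landmarks, then cap the number of images per landmark.
keep = pd.Series(train["landmark_id"].unique()).sample(NUM_LANDMARKS, random_state=0)
subset = (
    train[train["landmark_id"].isin(keep)]
    .groupby("landmark_id", group_keys=False)
    .apply(lambda g: g.sample(min(len(g), IMAGES_PER_LANDMARK), random_state=0))
)
subset.to_csv("data/images/train-subset.csv", index=False)
```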
## Step 4: Download Images
Run ``` python download-images.py ```.
This may take a while. If you simply want all of the images, use the provided .sh script or download them directly from the Kaggle page.
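If you want a sense of roughly what the download step does, a minimal version using ```requests``` might look like this (the output directory and CSV path are assumptions):

```python
import csv
import pathlib
import requests

OUT_DIR = pathlib.Path("data/images/train")  # assumed output directory
OUT_DIR.mkdir(parents=True, exist_ok=True)

with open("data/images/train-subset.csv") as f:
    for row in csv.DictReader(f):
        dest = OUT_DIR / f"{row['id']}.jpg"
        if dest.exists():
            continue  # skip images that were already downloaded
        try:
            resp = requests.get(row["url"], timeout=10)
            resp.raise_for_status()
            dest.write_bytes(resp.content)
        except requests.RequestException:
            pass  # dead links are common in this dataset; skip them
```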
## Workflow
To train, run ```train.py```, which currently depends on three modules: ```dataset.py```, ```const.py```, and ```layers.py```. The main project files are listed below, followed by a rough sketch of how they fit together.
- ```dataset.py```
- ```const.py```
- ```train.py```
- ```test.py```
- ```util.py```
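A rough sketch of how these modules could fit together in ```train.py``` (this assumes a PyTorch setup; the helper names ```get_loaders``` and ```build_model``` and the ```const``` fields are illustrative, not the project's actual API):

```python
import torch

import const                     # hyperparameters and paths (names assumed)
from dataset import get_loaders  # hypothetical: builds train/val DataLoaders
from layers import build_model   # hypothetical: builds the network

def main():
    train_loader, val_loader = get_loaders(const.DATA_DIR, const.BATCH_SIZE)
    model = build_model(num_classes=const.NUM_LANDMARKS)
    optimizer = torch.optim.Adam(model.parameters(), lr=const.LEARNING_RATE)
    loss_fn = torch.nn.CrossEntropyLoss()

    for epoch in range(const.NUM_EPOCHS):
        model.train()
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            optimizer.step()

if __name__ == "__main__":
    main()
```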
## To ask TAs:
- Do we still want data augmentation when we have too much training data?