Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/cooperhammond/gia
:robot::camera: A powerful image aggregator for data science projects
Last synced: 20 days ago
- Host: GitHub
- URL: https://github.com/cooperhammond/gia
- Owner: cooperhammond
- License: gpl-3.0
- Created: 2020-06-02T03:15:51.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2020-06-04T00:50:53.000Z (over 4 years ago)
- Last Synced: 2024-11-24T13:06:21.124Z (about 1 month ago)
- Language: Python
- Homepage:
- Size: 21.5 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# gia: General Image Aggregator
[![](https://img.shields.io/github/languages/code-size/cooperhammond/gia?color=green&style=flat-square)](https://saythanks.io/to/kepoorh%40gmail.com)
[![](https://img.shields.io/pypi/v/gia?color=blue&style=flat-square)](https://pypi.org/project/gia/)
[![](https://img.shields.io/badge/say-thanks-ff69b4?style=flat-square)](https://saythanks.io/to/kepoorh%40gmail.com)

> 🤖📷 A powerful image aggregator for data science projects

gia is a CLI tool and Python library for automating and standardizing the images you download for a data science project.
[![asciicast](https://asciinema.org/a/336440.svg)](https://asciinema.org/a/336440)
---
## Installation
First, download the `chromedriver` binary [here](https://chromedriver.chromium.org/downloads),
and point the environment variable `CHROME_DRIVER_LOC` to it.

### Pip
```bash
pip install gia
```

### From source
```bash
git clone https://github.com/cooperhammond/gia
cd gia
sudo python setup.py install
```

## Usage
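Before running either interface, it can help to confirm the `chromedriver` setup described under Installation. The following is a minimal sketch, not part of gia itself: `check_chromedriver` is a hypothetical helper that only inspects the `CHROME_DRIVER_LOC` environment variable and the file it points to.

```python
import os


def check_chromedriver(env_var="CHROME_DRIVER_LOC"):
    """Return the chromedriver path from the environment, or raise with a hint.

    Hypothetical helper for illustration; gia does not ship this function.
    """
    path = os.environ.get(env_var)
    if not path:
        raise RuntimeError(f"{env_var} is not set; point it at the chromedriver binary")

    # Expand a leading "~" in case the variable was exported with one.
    path = os.path.expanduser(path)
    if not os.path.isfile(path):
        raise RuntimeError(f"{env_var}={path!r} does not point to a file")
    if not os.access(path, os.X_OK):
        raise RuntimeError(f"{path!r} exists but is not executable")
    return path
```

Calling `check_chromedriver()` at the top of a script gives a readable error early, instead of a failure once the browser is launched.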
### CLI Usage
```
usage: gia [-h] [--depth DEPTH] destination classes queries

positional arguments:
  destination           ABSOLUTE path for where your images should be
                        downloaded
  classes               a python list in a string of the classes for the
                        queries
  queries               a python list of lists in a string of the queries
                        corresponding to each class

optional arguments:
  -h, --help            show this help message and exit
  --depth DEPTH, -d DEPTH
                        the default depth to go through queries for images
```

The "depth" of a query indicates how far down the Google results page the scraper scrolls.
With a depth of 0 there is no scrolling, a depth of 1 means the `end` key is pressed once,
a depth of 2 means two `end` presses, and so on. Each increment of depth means roughly 80 more images
are downloaded for that query, but the exact number varies depending on Google's mood and your browser's cache.
Treat it as a general indicator of how heavily to weight a query.

The destination needs to be an _absolute_ path because it is passed to chromedriver as the default
download folder, and chromedriver has no memory of the location it was spawned from.

Queries are entered directly into the Google search bar, so you can use all of the search tricks you normally
would.

Example usage:
```bash
$ gia ~/dev/data "['jeff bezos', 'bill gates']" "[['jeff bezos', 'jeff bezos face'], ['bill gates', 'bill gates face']]"
```
Output:
```bash
~/dev/data
+-- _jeff bezos
|   +-- 00000.jpg
|   +-- 00001.jpg
|   +-- ...
+-- _bill gates
|   +-- 00000.jpg
|   +-- 00001.jpg
|   +-- ...
+-- jeff bezos.csv
+-- bill gates.csv
```

By default the depth is 0, so there is no scrolling; the `--depth` parameter sets the default depth for every query.
If you don't want every query weighted equally, you can set the depth of an individual query by replacing its string with a `[query, depth]` pair:
```
[[..., 'pepperoni pizza', ...], ...] => [[..., ['pepperoni pizza', 5], ...], ...]
```

Example usage:
```bash
$ gia ~/dev/data --depth 3 "['pizza']" "[['pineapple pizza', 'pepperoni pizza', 'egg pizza']]"
```
Output:
```bash
~/dev/data
+-- _pizza          # will contain many more images than in the example above
|   +-- 00000.jpg
|   +-- 00001.jpg
|   +-- ...
+-- pizza.csv
```

### Module Usage
Everything that applies to the CLI applies to the library as well.
```python
import os

from gia import ImageAggregator

# gia needs an absolute destination path, so expand "~" before passing it on
destination = os.path.expanduser('~/dev/cool-data-science-project/data')
classes = ['steve jobs', 'jack black']
queries = [
    ["steve jobs' face", ['"steve jobs" -jack -black', 5]],
    ["jack black's face", ['"jack black" -steve -jobs', 4]],
]
depth = 2

ia = ImageAggregator(destination, classes, queries, default_depth=depth)
ia.aggregate()
```
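
The query format above mixes plain strings (which use the default depth) with `[query, depth]` pairs, and the CLI takes the same lists as quoted strings. The sketch below shows one way to check such arguments before handing them to gia. Both `parse_list_argument` and `normalize_queries` are hypothetical helpers written for illustration; gia's own parsing and depth handling may differ.

```python
import ast


def parse_list_argument(text):
    """Parse a CLI-style string such as "['a', 'b']" into a Python list.

    ast.literal_eval only accepts literals, so arbitrary code is rejected.
    """
    value = ast.literal_eval(text)
    if not isinstance(value, list):
        raise ValueError(f"expected a list, got {type(value).__name__}")
    return value


def normalize_queries(queries, default_depth=0):
    """Resolve every query to an explicit (query, depth) pair.

    A plain string takes default_depth; a [query, depth] pair keeps its own depth.
    """
    normalized = []
    for class_queries in queries:
        pairs = []
        for item in class_queries:
            if isinstance(item, str):
                pairs.append((item, default_depth))
            else:
                query, depth = item
                pairs.append((query, int(depth)))
        normalized.append(pairs)
    return normalized


classes = parse_list_argument("['pizza']")
queries = parse_list_argument("[['pineapple pizza', ['pepperoni pizza', 5]]]")
resolved = normalize_queries(queries, default_depth=3)

for name, pairs in zip(classes, resolved):
    print(name, pairs)
# prints: pizza [('pineapple pizza', 3), ('pepperoni pizza', 5)]
```

Resolving depths this way makes it easy to see which queries will dominate a class before starting a long download.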