Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
🐘 A tool for blazing fast analysis and clustering of similar images using 🐘 Hadoop and ⚡ Spark.
- Host: GitHub
- URL: https://github.com/policratus/sparkmage
- Owner: policratus
- License: gpl-3.0
- Archived: true
- Created: 2017-03-23T20:34:25.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2017-07-13T14:04:07.000Z (over 7 years ago)
- Last Synced: 2024-08-02T05:22:59.444Z (3 months ago)
- Topics: big-data, computer-vision, hadoop, image-processing, spark
- Language: Python
- Homepage:
- Size: 186 KB
- Stars: 2
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-blazingly-fast - sparkmage - 🐘 A tool for blazing fast analysis and clustering of similar images using 🐘 Hadoop and ⚡ Spark. (Python)
README
# 👁✨ sparkmage
![Pets Cluster](https://github.com/policratus/sparkmage/blob/master/docs/clusters.jpg)
## 🛠 Installation
Assuming you have a Unix-like OS or emulation, with [git](https://git-scm.com/), [Python](https://www.python.org/) 2 or 3 and [virtualenvwrapper](https://virtualenvwrapper.readthedocs.io/en/latest/) installed, just issue:
```
mkvirtualenv sparkmage
workon sparkmage
git clone git@github.com:policratus/sparkmage.git
cd sparkmage
pip install -r requirements/requirements.txt
```
## ⚙ Usage
Sparkmage needs a [Hadoop](https://hadoop.apache.org/) HDFS (tested on 2.7) and [Spark](http://spark.apache.org/) (tested on 2.1) cluster as its distributed file system and processing engine.
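As a quick sanity check that such a cluster is reachable, here is a minimal PySpark sketch that connects to a Spark master and reads a file over HDFS. The endpoints `spark://master:7077` and `hdfs://master:9000` are hypothetical values standing in for your own cluster, not anything Sparkmage defines:
```python
# Minimal sketch: confirm the Spark + HDFS setup Sparkmage expects.
# 'spark://master:7077' and 'hdfs://master:9000' are hypothetical
# endpoints for your own cluster.
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName('sparkmage-smoke-test').setMaster('spark://master:7077')
sc = SparkContext(conf=conf)

# Reading any file through an hdfs:// URI confirms Spark can reach HDFS.
sample = sc.textFile('hdfs://master:9000/tmp/hello.txt')
print(sample.take(1))

sc.stop()
```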
### 🎯 Acquiring images
All images are imported from [Google Images](https://images.google.com). To search for a specific term and download images for it:
```
sparkmage get [term] [directory]
```
For instance, if you wish to search for dogs, just issue `sparkmage get dogs /path/to/dogs/`.
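Under the hood, a step like this boils down to fetching image URLs and writing the bytes into the target directory. A minimal Python 3 sketch of that idea, using `requests` and a hard-coded URL list as a stand-in for the Google Images results Sparkmage actually scrapes:
```python
# Minimal sketch of the download step: fetch image URLs into a directory.
# The URL list is a placeholder for real search results; Sparkmage's
# actual scraping of Google Images is more involved. Python 3 only.
import os
import requests

def download_images(urls, directory):
    os.makedirs(directory, exist_ok=True)
    for index, url in enumerate(urls):
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        path = os.path.join(directory, 'image_{:04d}.jpg'.format(index))
        with open(path, 'wb') as image_file:
            image_file.write(response.content)

download_images(['https://example.com/a-dog.jpg'], '/path/to/dogs')
```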
### 💽 Storing images
All images are first stored on the local file system and then uploaded to Hadoop HDFS. To copy images from a local directory to HDFS:
```
sparkmage put http://[host]:[port] [user] [local-dir] [hdfs-dir]
```
If you want to upload the dog images you just downloaded, assuming your HDFS endpoint is `master:50070`, the owning user is `hadoop`, and your target directory is `/images/dogs`, execute `sparkmage put http://master:50070 hadoop /path/to/dogs /images/dogs`.
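Port 50070 is the WebHDFS endpoint, so the same upload can be reproduced with the `hdfs` Python package (HdfsCLI). A minimal sketch, assuming that package and the same hypothetical endpoint, user and paths as above:
```python
# Minimal sketch: upload a local directory to HDFS over WebHDFS,
# mirroring `sparkmage put`. Requires the `hdfs` package (pip install hdfs).
from hdfs import InsecureClient

client = InsecureClient('http://master:50070', user='hadoop')

# Recursively uploads /path/to/dogs into /images/dogs on HDFS.
client.upload('/images/dogs', '/path/to/dogs')
```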
### 📊 Analyzing images
Now it's time to analyze all the images and separate them into groups. Just execute:
```
sparkmage analyze hdfs://[host]:[port]/path/on/hdfs \
hdfs://[host]:[port]/cluster/output/dir \
[number_of_groups]
```
To analyze the dog images you just uploaded and find five groups in them, execute `sparkmage analyze hdfs://master:50070/images/dogs hdfs://master:50070/clusters/dogs/clusters.out 5`.
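Conceptually, this step reads every image from HDFS, turns it into a feature vector, and clusters the vectors with k-means. The PySpark sketch below illustrates that pipeline; the grayscale-histogram features and the hostnames are assumptions for illustration, not necessarily what Sparkmage computes:
```python
# Sketch of the analyze step: featurize images on HDFS and k-means them.
# The feature choice (64-bin grayscale histogram) is an assumption.
import io
import numpy as np
from PIL import Image
from pyspark import SparkContext
from pyspark.mllib.clustering import KMeans

sc = SparkContext(appName='sparkmage-analyze-sketch')

def histogram_features(raw_bytes):
    """Decode an image and return a normalized 64-bin grayscale histogram."""
    image = Image.open(io.BytesIO(raw_bytes)).convert('L')
    counts, _ = np.histogram(np.asarray(image), bins=64, range=(0, 255))
    return counts / float(counts.sum())

# binaryFiles yields (path, bytes) pairs for every file under the directory.
images = sc.binaryFiles('hdfs://master:9000/images/dogs')
features = images.mapValues(histogram_features).cache()

# Cluster the feature vectors into five groups.
model = KMeans.train(features.values(), k=5, maxIterations=20)

# Pair each image path with its cluster id and save the assignments.
assignments = features.mapValues(lambda vector: model.predict(vector))
assignments.saveAsTextFile('hdfs://master:9000/clusters/dogs/clusters.out')

sc.stop()
```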
### 👁 Visualizing groups
This is the final step: view your images and find which group each one belongs to. Do this:
```
sparkmage save http://[host]:[port] [user] [local-dir] http://[host]:[port]/cluster/output/dir
```
For instance, to see the clusters for your dog images: `sparkmage save http://master:50070 hadoop /path/to/dogs http://master:50070/clusters/dogs/clusters.out`.
The cluster each image belongs to is written into the image itself, as a watermark.
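The watermarking itself can be reproduced with Pillow. A minimal sketch, assuming the cluster id has already been looked up for the image (the function and file names are illustrative, not Sparkmage's API):
```python
# Minimal sketch: stamp a cluster id onto an image as a text watermark,
# similar in spirit to what `sparkmage save` produces.
from PIL import Image, ImageDraw

def watermark_cluster(image_path, cluster_id, output_path):
    image = Image.open(image_path).convert('RGB')
    draw = ImageDraw.Draw(image)
    # Draw the cluster label in the top-left corner.
    draw.text((10, 10), 'cluster {}'.format(cluster_id), fill='white')
    image.save(output_path)

watermark_cluster('/path/to/dogs/image_0001.jpg', 3, '/path/to/dogs/image_0001_labeled.jpg')
```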