https://github.com/rwynn/monstache-showcase

monstache showcase to visualize open data
elasticsearch kibana mongodb monstache open-data visualization

Host: GitHub
URL: https://github.com/rwynn/monstache-showcase
Owner: rwynn
License: mit
Created: 2019-03-15T01:30:15.000Z (almost 6 years ago)
Default Branch: master
Last Pushed: 2021-11-14T23:57:10.000Z (about 3 years ago)
Last Synced: 2023-03-10T20:08:31.992Z (almost 2 years ago)
Topics: elasticsearch, kibana, mongodb, monstache, open-data, visualization
Language: Shell
Size: 53.5 MB
Stars: 28
Watchers: 4
Forks: 18
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE

# Monstache showcase

This project shows how monstache can be applied to real data from data.gov. The `mongoimport` tool is used to import 6.5 million records of crime data.

During the import, monstache listens for change events on the entire MongoDB deployment and indexes those documents into Elasticsearch. Before indexing, monstache applies a small transformation to the data using a Go plugin to enable certain aggregations in Kibana. A Go plugin was chosen over a JavaScript plugin after it showed a dramatic performance increase.

I recommend a machine with at least 16GB RAM, 20GB free disk, and 4 or more CPU cores. You may be able to get away with less by decreasing the heap sizes for Elasticsearch in the docker-compose files.
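
As a rough preflight, the recommended minimums above can be compared with the current machine. This is only a sketch and not part of the project: it assumes Linux (`nproc`, `/proc/meminfo`), it measures free disk on the filesystem of the current directory (Docker may store volumes elsewhere), and the function names `check_min` and `preflight` are made up for illustration.

```shell
#!/bin/sh
# Preflight sketch (Linux-only): compare this machine with the README's
# suggested minimums of 16GB RAM, 20GB free disk, and 4 CPU cores.

# check_min LABEL HAVE NEED: print "LABEL ok" or "LABEL low" for integers.
check_min() {
  if [ "$2" -ge "$3" ]; then
    echo "$1 ok"
  else
    echo "$1 low"
  fi
}

# preflight: read values from nproc, /proc/meminfo, and df, then check each.
preflight() {
  cores=$(nproc)
  mem_gb=$(awk '/^MemTotal:/ {print int($2 / 1024 / 1024)}' /proc/meminfo)
  disk_gb=$(df -Pk . | awk 'NR==2 {print int($4 / 1024 / 1024)}')
  check_min cpu "$cores" 4
  # A 16GB machine usually reports slightly under 16, so compare against 15.
  check_min ram "$mem_gb" 15
  check_min disk "$disk_gb" 20
}
```

Running `preflight` prints one line per resource. A `low` result is a hint to reduce the Elasticsearch heap sizes, not a hard failure.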

First, make sure you have `docker` and `docker-compose` installed. Docker Desktop for Mac and Windows includes Docker Compose as part of the install.

The versions at the time this project was created were:

```
Client:
 Version:           18.09.3
 API version:       1.39
 Go version:        go1.10.8
 Git commit:        774a1f4
 Built:             Thu Feb 28 06:40:58 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.3
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.8
  Git commit:       774a1f4
  Built:            Thu Feb 28 05:59:55 2019
  OS/Arch:          linux/amd64
  Experimental:     false

docker-compose version 1.23.1, build b02f1306
docker-py version: 3.5.0
CPython version: 3.6.7
OpenSSL version: OpenSSL 1.1.0f  25 May 2017
```
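
A quick way to confirm both tools are on your `PATH` before going further is sketched below. The helper names `require_cmd` and `check_tools` are hypothetical and only wrap the standard `command -v` lookup.

```shell
#!/bin/sh
# require_cmd NAME: succeed if NAME is on PATH, otherwise report it and fail.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "missing: $1" >&2
  return 1
}

# check_tools: verify the two tools this README asks you to install.
check_tools() {
  require_cmd docker && require_cmd docker-compose
}
```

For example, `check_tools && docker version && docker-compose version` prints your installed versions for comparison with the ones listed above.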

Next, download the public [dataset](https://catalog.data.gov/dataset/2003-ward-dataset-csvs-crimes-2001-to-present) in CSV format. Please read all the rules and caveats associated with the public dataset before proceeding.

When you have downloaded this large 1.5GB file, copy it to the following location:

```
monstache-showcase/mongodb/scripts/data/crimes.csv
```
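
The copy step might look like the sketch below. The function name `stage_csv` and the source path in the usage note are assumptions, since the name of the downloaded file depends on your browser and the data portal.

```shell
#!/bin/sh
# stage_csv SRC REPO: copy the downloaded CSV to the path the import expects,
# creating the data directory under REPO if it does not exist yet.
stage_csv() {
  src=$1
  repo=$2
  dest_dir="$repo/mongodb/scripts/data"
  mkdir -p "$dest_dir" && cp "$src" "$dest_dir/crimes.csv"
}
```

For example, `stage_csv ~/Downloads/Crimes.csv monstache-showcase`, where `~/Downloads/Crimes.csv` is a placeholder for wherever your download landed.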

Use the following command to note the number of documents to expect later during the import.

```
# subtract 1 for the csv header
wc -l monstache-showcase/mongodb/scripts/data/crimes.csv
```
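
If you would rather have the subtraction done for you, a small helper can print the expected document count directly. This is a sketch with a made-up function name, `expected_docs`. It assumes the file has exactly one header row and ends with a trailing newline.

```shell
#!/bin/sh
# expected_docs FILE: print the line count of FILE minus one for the CSV header.
expected_docs() {
  lines=$(wc -l < "$1")
  echo $((lines - 1))
}
```

For example, `expected_docs monstache-showcase/mongodb/scripts/data/crimes.csv` prints the number to compare against later.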

You are now ready to run docker-compose and start the import. 

```
cd monstache-showcase
./import-showcase.sh
```

The import will take a while. During the process you will see a line like this coming from `mongoimport`:

```
c-data       | 2019-03-12T20:34:57.586+0000     imported 6820156 documents
```

That means all the data has been loaded into MongoDB. Now you must wait for indexing to complete in Elasticsearch. The process periodically queries the document count in Elasticsearch, so you will see lines like this repeating indefinitely:

```
c-config     | [
c-config     |   {
c-config     |     "health" : "green",
c-config     |     "status" : "open",
c-config     |     "index" : "chicago.crimes",
c-config     |     "uuid" : "4wShbV-LTq6-6paRsWataQ",
c-config     |     "pri" : "1",
c-config     |     "rep" : "0",
c-config     |     "docs.count" : "1198982",
c-config     |     "docs.deleted" : "0",
c-config     |     "store.size" : "359mb",
c-config     |     "pri.store.size" : "359mb"
c-config     |   }
c-config     | ]
```

The `docs.count` field in the response should eventually reach 1 less than the number you recorded from `wc -l`.
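
Rather than comparing the numbers by eye, you could pull the `"docs.count"` values out of the log text. The sketch below is not part of the project: the function names `docs_count` and `import_done` are made up, and it assumes you have captured the container output somewhere, for example in a hypothetical `import.log` file.

```shell
#!/bin/sh
# docs_count: read log or JSON text on stdin and print every "docs.count"
# value found, one per line, in the order they appear.
docs_count() {
  sed -n 's/.*"docs\.count"[[:space:]]*:[[:space:]]*"\([0-9][0-9]*\)".*/\1/p'
}

# import_done CURRENT EXPECTED: succeed once CURRENT has reached EXPECTED.
import_done() {
  [ "$1" -ge "$2" ]
}
```

For example, `latest=$(docs_count < import.log | tail -n 1)` takes the most recent count, and `import_done "$latest" "$expected"` succeeds once indexing has caught up with the expected total.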

Once all the data is loaded into Elasticsearch you can bring down the containers with Ctrl-C or:

```
cd monstache-showcase
./stop-showcase.sh
```

At this point you have indexed all the data and should not run `import-showcase.sh` again, as that would index all the data a second time. The import process stores the Elasticsearch data in a docker volume, so it persists between runs until you delete the volume.

The last step is to fire up Kibana to analyze the data. To do this, start only Elasticsearch and Kibana with:

```
cd monstache-showcase
./view-showcase.sh
```

Once the containers are up and healthy you can go to http://localhost:5601 on the host to load Kibana and explore data.  
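
Kibana can take a little while to answer after the containers start. One way to wait for it from a script is a generic retry loop, sketched here with a made-up helper name, `retry`. It is an assumption that `curl` is installed on the host.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds or ATTEMPTS run out,
# sleeping DELAY seconds after each failed attempt.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, `retry 60 5 curl -fsS -o /dev/null http://localhost:5601` polls every 5 seconds for up to 5 minutes and returns once Kibana stops answering with an error status.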

In Kibana you can start from scratch and define an index pattern. However, I recommend that you import the file named `export.json` from the root of monstache-showcase to get a head start.

To import you will want to go to `Management` -> `Saved Objects` and then click `Import` and upload `export.json`.

You will also want to go under `Management` -> `Advanced Settings` in Kibana and set `Timezone for date formatting` to `UTC` to display dates correctly.

When you are finished analyzing in Kibana you can run `./stop-showcase.sh` to bring down the containers.

If you want to tear down everything and delete all the associated data, you can run `./clean-showcase.sh`. This stops the containers and deletes the associated docker volumes.

Please open an issue with any feedback you might have.  Thanks!