https://github.com/brooksian/ds_gtdb

KMeans Clustering on Global Terrorism Database
global-terrorism-database machine-learning spark sparksql zeppelin-notebook


# Data Science in Apache Spark
## Exploring the Global Terrorism Database Dataset

**Level**: Moderate

**Language**: Scala

**Requirements**:
- [HDP 2.5](http://hortonworks.com/products/sandbox/) (or later) or [HDCloud](https://hortonworks.github.io/hdp-aws/)
- Spark 1.6.x
- Download [GTDB dataset](https://www.kaggle.com/START-UMD/gtd)

**Author**: Ian Brooks

**Follow** LinkedIn - [Ian Brooks PhD](https://www.linkedin.com/in/ianrbrooksphd/)

## Context
Information on more than 150,000 terrorist attacks.

The Global Terrorism Database (GTD) is an open-source database including information on terrorist attacks around the world from 1970 through 2015 (with annual updates planned for the future). The GTD includes systematic data on domestic as well as international terrorist incidents that have occurred during this time period and now includes more than 150,000 cases. The database is maintained by researchers at the National Consortium for the Study of Terrorism and Responses to Terrorism (START), headquartered at the University of Maryland. [More Information](http://start.umd.edu/gtd/)

## Instructions
1. Download the Global Terrorism Database CSV file from [Kaggle](https://www.kaggle.com/START-UMD/gtd). Note: You will need a Kaggle account.

2. Download the [Zeppelin Note](https://github.com/BrooksIan/DS_GTDB) from the project repository.

3. Upload the GTDB CSV file to your HDP or HDCloud environment.

4. In Zeppelin, import the Zeppelin Note [JSON file](https://github.com/BrooksIan/SacWomenInData). For assistance, see the following [tutorial](https://hortonworks.com/tutorial/getting-started-with-apache-zeppelin/).
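
Once the note is imported, the clustering runs inside Zeppelin paragraphs. For orientation, here is a minimal sketch of what a KMeans step can look like with the Spark 1.6 MLlib API. It is not the code from the note itself. The choice of latitude and longitude as features, the sample points, the values of `k` and `maxIterations`, and all variable names are assumptions made for illustration only.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Reuse the context Zeppelin already provides, or start a local one
// when this is run outside Zeppelin (for example in spark-shell).
val conf = new SparkConf().setAppName("gtd-kmeans-sketch").setMaster("local[*]")
val sparkCtx = SparkContext.getOrCreate(conf)

// Hypothetical (latitude, longitude) pairs standing in for parsed GTD rows.
// These are placeholder points, not records taken from the dataset.
val points = sparkCtx.parallelize(Seq(
  Vectors.dense(33.3, 44.4),
  Vectors.dense(33.5, 44.2),
  Vectors.dense(34.5, 69.2),
  Vectors.dense(34.3, 69.0),
  Vectors.dense(4.6, -74.1),
  Vectors.dense(4.8, -74.0)
)).cache()

val k = 3              // assumed number of clusters
val maxIterations = 20 // assumed cap on iterations

// Train the model; MLlib may return fewer than k centers on degenerate input.
val model = KMeans.train(points, k, maxIterations)

// Inspect the learned cluster centers and the within-set sum of squared errors.
model.clusterCenters.foreach(println)
val wssse = model.computeCost(points)
println(s"Within-set sum of squared errors: $wssse")
```

With the real dataset, the placeholder points would be replaced by an RDD of vectors built from whichever numeric GTD columns the analysis uses. Comparing `wssse` across several values of `k` is a common way to pick the number of clusters.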

## License
Unlike other Apache projects, which use the Apache license, this project uses The Star And Thank Author License (SATA). Please see the [LICENSE](LICENSE) file for more information.