https://github.com/mneedham/neo4j-spark-chicago
- Host: GitHub
- URL: https://github.com/mneedham/neo4j-spark-chicago
- Owner: mneedham
- Created: 2015-04-14T04:10:48.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2015-08-08T09:03:11.000Z (over 9 years ago)
- Last Synced: 2024-04-14T09:10:05.327Z (9 months ago)
- Language: HTML
- Size: 2.51 MB
- Stars: 30
- Watchers: 9
- Forks: 19
- Open Issues: 3
Metadata Files:
- Readme: README.adoc
= Importing Chicago Crime Dataset into Neo4j
The following are instructions for importing the open Chicago crime data set into Neo4j.
* Download the link:https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2[Chicago crime data set]
* Set an environment variable containing the path to the data set, e.g.
```
export CSV_FILE=Crimes_-_2001_to_present.csv
```
* Download a link:https://spark.apache.org/downloads.html[pre-built Spark distribution] into this folder.
The `create_files.sh` script assumes the use of the link:http://apache.mirrors.spacedump.net/spark/spark-1.3.0/spark-1.3.0-bin-hadoop1.tgz[1.3.0 release pre-built for Hadoop 1.X].
* Unpack Spark:
```
tar -xvf spark-1.3.0-bin-hadoop1.tgz
```
* Generate the CSV files that Neo4j import will use:
```
./create_files.sh
```
You should see the following at the beginning of the output:
```
$ ./create_files.sh
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Using /Users/markneedham/projects/neo4j-spark-chicago/Crimes_-_2001_to_present.csv
...
```
The following files will be generated:
```
$ ls -alh /tmp/*.csv
-rwxrwxrwx 1 markneedham wheel 3.0K 14 Apr 06:48 /tmp/beats.csv
-rwxrwxrwx 1 markneedham wheel 217M 14 Apr 06:48 /tmp/crimes.csv
-rwxrwxrwx 1 markneedham wheel 84M 14 Apr 06:49 /tmp/crimesBeats.csv
-rwxrwxrwx 1 markneedham wheel 120M 14 Apr 06:49 /tmp/crimesPrimaryTypes.csv
-rwxrwxrwx 1 markneedham wheel 912B 14 Apr 06:48 /tmp/primaryTypes.csv
```
Now let's clean up the CSV files to get rid of empty columns:
```
./clean.sh
```
* link:http://neo4j.com/download/[Download Neo4j]
* Unpack Neo4j:
```
tar -xvf neo4j-enterprise-2.2.3-unix.tar.gz
```
* Create a Neo4j graph from the CSV files:
```
./import.sh [/path/to/csv/files] [/path/to/neo4j]
```
== FAQs
=== Using a different version of Spark
If you want to use a different version of Spark, you'll need to update the appropriate references in `create_files.sh` and `build.sbt`.
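To locate those references, a search for the current version string is one option. The following is a minimal sketch, assuming the scripts mention the release as the literal string `1.3.0` (the release named above); their contents are not shown here, so that is a guess. `find_spark_refs` is a hypothetical helper, not part of this project.

```shell
# Hypothetical helper: list every line that mentions a given Spark version.
# Prints file:line:content for each match.
find_spark_refs() {
  version="$1"
  shift
  # -F treats the version as a literal string, so the dots are not wildcards.
  # The trailing /dev/null forces grep to print the file name even for one file.
  grep -nF -- "$version" "$@" /dev/null
}
```

Run from the project root, `find_spark_refs 1.3.0 create_files.sh build.sbt` would show the lines to review before switching versions.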
=== I can't generate the CSV files
If you forget to set the `CSV_FILE` environment variable, or set it to a non-existent file, you'll see the following error message:
```
$ ./create_files.sh
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Exception in thread "main" java.lang.RuntimeException: Cannot find CSV file [null]
...
```
Set the `CSV_FILE` environment variable and you're good to go:
```
export CSV_FILE=Crimes_-_2001_to_present.csv
```
=== I made some changes to the project and want to build it
If you haven't made any changes to the source code of the project there's no need to build it - a JAR is checked in and referenced in `create_files.sh`.
You can, however, rebuild the JAR with the following command:
```
sbt clean package
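```
=== I can't build this project
sbt doesn't seem to honour `JAVA_HOME`, so if you're seeing the following error (class file version 52.0 corresponds to Java 8, which means the class is being loaded by an older JVM):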
```
2015-04-14 13:57:50,116 INFO [main] spark.SparkContext (Logging.scala:logInfo(59)) - Job finished: saveAsTextFile at GenerateCSVFiles.scala:51, took 8.292283862 s
Exception in thread "main" java.lang.UnsupportedClassVersionError: MyFileUtil : Unsupported major.minor version 52.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
```
you should set the following environment variable:
```
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/
```
And then retry:
```
PATH=$JAVA_HOME/bin:$PATH sbt clean package
```
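To confirm which Java version a class was compiled for, the major version can be read straight from the class file: it is stored as a big-endian integer in bytes 6 and 7 (counting from 0). Major version 52 corresponds to Java 8 and 51 to Java 7. The following is a minimal sketch, assuming a POSIX shell with `od` available; `class_major_version` is a hypothetical helper, not part of this project.

```shell
# Hypothetical helper: print the class-file major version of a .class file.
class_major_version() {
  # od prints the two version bytes as unsigned decimals, e.g. "0 52".
  # The unquoted substitution splits them into $1 and $2.
  set -- $(od -An -tu1 -j6 -N2 "$1")
  echo $(( $1 * 256 + $2 ))
}
```

Calling it on a class extracted from the rebuilt JAR (for example `MyFileUtil.class`, the class named in the error) should print `51` once the build uses Java 7.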