Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/markthebault/importCSVSparkCassandra
How to import a file from S3 into cassandra using spark
https://github.com/markthebault/importCSVSparkCassandra
Last synced: 10 days ago
JSON representation
How to import a file from S3 into cassandra using spark
- Host: GitHub
- URL: https://github.com/markthebault/importCSVSparkCassandra
- Owner: markthebault
- Created: 2016-10-05T13:38:30.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2016-10-06T09:22:13.000Z (about 8 years ago)
- Last Synced: 2024-08-02T06:16:51.487Z (3 months ago)
- Language: Scala
- Size: 839 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Import CSV files with spark
This short example will show you how easily is to import CSV files from your AWS S3 buckets
using spark into cassandra.## Setting up your applicaiton
Clone this repository `git clone http://gitlab.ippon.fr/mthebault/simplecsvexportspark.git`
Open the file 'src/main/ressources/project.conf' and change your settings.**You need to change the following values:**
- Cassandra
- host
- port
- keyspace
- table
- AWS
- accessKey
- secretKey
- bucket
- fileName## Build a jar
To build the Jar of your application you just need to run the command `sbt clean assembly`## Deploy the Jar on a spark cluster
To deploy a jar on a spark cluster you have to make sure you have the port 7077 accessible from the outside.
You have to push this Jar to a S3 public bucket `aws s3 cp ./target/scala-2.10/ImportCSV.jar s3://YOUR_BUCKET/ImportCVS.jar`***Once you have done that, you just need to run the spark-submit command as following:***
```
$SPARK_HOME/bin/spark-submit \
--verbose \
--master spark://IP_SPARK_MASTER:PORT \
--deploy-mode cluster \
--driver-class-path /spark/spark-1.6.1-bin-hadoop2.6/lib/spark-assembly-1.6.1-hadoop2.6.0.jar \
--class Application \
https://s3-eu-west-1.amazonaws.com/YOUR_BUCKET/ImportCSV.jar
```***Note:***
Here I am using a public s3 bucket for the jars. If you want to use your private buckets you can use the following link:
`http://AWS_S3_ACCESS_KEY:AWS_S3_SECRET_KEY@YOUR_BUCKET/ImportCSV.jar`
please consider of the http link have to be encoded [you can use this website to encode the link](http://www.freeformatter.com/url-encoder.html)
## Next
If you want to contribute to this project feel free to do it, if you see some mistake please leave me an issue.