https://github.com/bchoubert/spark-java-wordcount
- Host: GitHub
- URL: https://github.com/bchoubert/spark-java-wordcount
- Owner: bchoubert
- Created: 2017-01-16T20:39:08.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2017-01-16T21:05:32.000Z (over 8 years ago)
- Last Synced: 2025-03-11T13:53:01.074Z (3 months ago)
- Topics: polytech-lyon
- Language: Java
- Size: 49.8 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# spark-java-wordcount
This repo is an example of using Spark key-value pairs over a text file.
The goal is to count the words of a poem using a Map - Pair - Reduce sequence of operations.
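To make the Map - Pair - Reduce idea concrete, here is a minimal plain-Java sketch that uses no Spark dependency. It is not the code of this repo: the class `WordCountSketch`, the helper `countWords` and the sample lines are hypothetical, and splitting on whitespace is an assumption. In Spark's Java API the same three steps are commonly written with `flatMap`, `mapToPair` and `reduceByKey`; whether this project does exactly that is also an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class WordCountSketch {

    // Hypothetical helper: counts whitespace-separated tokens across all lines.
    static Map<String, Integer> countWords(List<String> lines) {
        return lines.stream()
                // Map: split each line into tokens on runs of whitespace (assumption).
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                // Blank lines produce one empty token; drop it.
                .filter(token -> !token.isEmpty())
                // Pair: turn each token into a (word, 1) pair.
                .map(token -> Map.entry(token, 1))
                // Reduce: sum the values of all pairs that share the same key.
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, Integer::sum));
    }

    public static void main(String[] args) {
        // Made-up sample lines, not taken from poeme.txt.
        List<String> lines = List.of("le vent souvent", "souvent le soir");
        // Print each pair in the same (word,count) shape as the raw results.
        countWords(lines).forEach((word, count) ->
                System.out.println("(" + word + "," + count + ")"));
    }
}
```

The pairs are printed in no particular order, since the result is an unordered map.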
## Input file
`poeme.txt` is a 2978-line file divided into sections. It contains a foreign poem translated into French.
## Execute the project
With Spark and Hadoop installed, first copy the file to the Hadoop file system: `hadoop fs -put poeme.txt /test`
Next, after compiling the project (with Maven, for example: `mvn clean package`), run it:
`hadoop jar NameOfYourJar.jar WordCount /test/poeme.txt /results`
You can view the results using Hue, for example.
## Raw results
Here is a sample of the results:
```
(sentinelles,1)
(souvent,8)
(Elles,1)
(prairies;,1)
(Soulevait,1)
(soupir,3)
(épais,5)
(filet,2)
(derniers,3)
(Bassin,2)
(collines;,1)
(ridé,1)
(Pauvre,1)
(lumière,5)
(nom,6)
(Viennent,2)
(saisie,1)
(guider,2)
(fuir,4)
(L'homme,1)
(tranquilles,1)
(distrait,1)
(demeure;,1)
(gentille:,1)
(s'endormir,1)
(Prétendait,1)
```
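It lists, for each word, the number of occurrences in the poem. As the sample shows, tokens keep their punctuation and capitalization (for example `prairies;` and `Elles`).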