https://github.com/bchoubert/spark-java-wordcount

polytech-lyon


Host: GitHub
URL: https://github.com/bchoubert/spark-java-wordcount
Owner: bchoubert
Created: 2017-01-16T20:39:08.000Z (over 8 years ago)
Default Branch: master
Last Pushed: 2017-01-16T21:05:32.000Z (over 8 years ago)
Last Synced: 2025-03-11T13:53:01.074Z (3 months ago)
Topics: polytech-lyon
Language: Java
Size: 49.8 KB
Stars: 0
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


# spark-java-wordcount

This repository is an example of using Spark to build key/value pairs from a text file and aggregate them by key.

The goal is to count the words of a poem with a map, pair, and reduce sequence of operations.
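
The three stages can be tried locally without a cluster. The following is a minimal plain-Java sketch, not the repository's code: the class name `LocalWordCount` is hypothetical, and splitting on whitespace is an assumption inferred from the sample results further down, where punctuation and case are kept. In a Spark job these stages would typically correspond to `flatMap`, `mapToPair` and `reduceByKey`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical local illustration of map -> pair -> reduce (no Spark dependency).
public class LocalWordCount {

    /** Counts whitespace-separated tokens across all lines (assumed tokenization). */
    public static Map<String, Integer> countWords(List<String> lines) {
        return lines.stream()
                // "map" stage: split every line into tokens
                .flatMap(line -> Arrays.stream(line.split("\\s+")))
                // drop the empty token produced by leading whitespace
                .filter(token -> !token.isEmpty())
                // "pair" stage: each token becomes (token, 1)
                .map(token -> Map.entry(token, 1))
                // "reduce" stage: sum the 1s of identical keys
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        Integer::sum, TreeMap::new));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("Le soupir du soir", "un soupir, un nom");
        System.out.println(countWords(lines));
    }
}
```

Running `main` prints `{Le=1, du=1, nom=1, soir=1, soupir=1, soupir,=1, un=2}`: `soupir` and `soupir,` end up as separate keys, just like `prairies;` in the real output. The `TreeMap` is only there to make the printed order deterministic; Spark's `reduceByKey` gives no ordering guarantee.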

## Input file

`poeme.txt` is a 2,978-line file divided into sections. It contains a foreign poem translated into French.

## Execute the project

With Spark and Hadoop installed, first create a target directory and copy the file to the Hadoop file system (HDFS):

`hadoop fs -mkdir -p /test`

`hadoop fs -put poeme.txt /test/`

Next, compile the project (for example with Maven: `mvn clean package`) and run it:

`hadoop jar NameOfYourJar.jar WordCount /test/poeme.txt /results`

You can then browse the results with a tool such as Hue.

## Raw results

Here is a sample of the results:

```
(sentinelles,1)
(souvent,8)
(Elles,1)
(prairies;,1)
(Soulevait,1)
(soupir,3)
(épais,5)
(filet,2)
(derniers,3)
(Bassin,2)
(collines;,1)
(ridé,1)
(Pauvre,1)
(lumière,5)
(nom,6)
(Viennent,2)
(saisie,1)
(guider,2)
(fuir,4)
(L'homme,1)
(tranquilles,1)
(distrait,1)
(demeure;,1)
(gentille:,1)
(s'endormir,1)
(Prétendait,1)
```

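Each line gives a word and its number of occurrences in the poem. As the sample shows, words are counted exactly as they appear in the text: punctuation stays attached (`prairies;`, `gentille:`) and case is preserved (`Elles`, `Pauvre`), so these variants are counted separately from their bare lowercase forms.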
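
To post-process the output, the lines can be parsed back into pairs. This is a minimal sketch, assuming the result files have been copied to the local machine (for example with `hadoop fs -get /results`) and that every line has the `(word,count)` form shown above, which is how a Scala `Tuple2` prints itself. `ResultParser`, `parseLine` and `topN` are hypothetical names, not part of the repository.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper; assumes one Tuple2-style "(word,count)" entry per line.
public class ResultParser {

    /** Parses a line such as "(prairies;,1)" into a (word, count) entry. */
    public static Map.Entry<String, Integer> parseLine(String line) {
        String trimmed = line.trim();
        if (!trimmed.startsWith("(") || !trimmed.endsWith(")")) {
            throw new IllegalArgumentException("Not a (word,count) tuple: " + line);
        }
        String body = trimmed.substring(1, trimmed.length() - 1);
        // Split on the LAST comma: the word itself may end with a comma, e.g. "(soupir,,3)"
        int comma = body.lastIndexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("Missing count in: " + line);
        }
        String word = body.substring(0, comma);
        int count = Integer.parseInt(body.substring(comma + 1));
        return Map.entry(word, count);
    }

    /** Returns the n most frequent words, highest count first (ties keep input order). */
    public static Map<String, Integer> topN(List<String> lines, int n) {
        return lines.stream()
                .filter(l -> !l.trim().isEmpty()) // skip blank lines
                .map(ResultParser::parseLine)
                // sorted() is stable on ordered streams, so equal counts keep input order
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(n)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        List<String> sample = List.of("(souvent,8)", "(nom,6)", "(prairies;,1)", "(fuir,4)", "(soupir,3)");
        System.out.println(topN(sample, 3)); // prints {souvent=8, nom=6, fuir=4}
    }
}
```

Splitting on the last comma rather than the first matters because a token that ends with a comma, such as `soupir,`, would be written as `(soupir,,3)`.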
