{"id":21078655,"url":"https://github.com/bchoubert/spark-java-wordcount","last_synced_at":"2025-07-06T21:11:11.601Z","repository":{"id":90191635,"uuid":"79157373","full_name":"bchoubert/spark-java-wordcount","owner":"bchoubert","description":null,"archived":false,"fork":false,"pushed_at":"2017-01-16T21:05:32.000Z","size":51,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-07-06T08:51:56.002Z","etag":null,"topics":["polytech-lyon"],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bchoubert.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-01-16T20:39:08.000Z","updated_at":"2017-08-28T21:43:14.000Z","dependencies_parsed_at":null,"dependency_job_id":"fd16de0b-2f51-4585-af64-c662d0e53c11","html_url":"https://github.com/bchoubert/spark-java-wordcount","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/bchoubert/spark-java-wordcount","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bchoubert%2Fspark-java-wordcount","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bchoubert%2Fspark-java-wordcount/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bchoubert%2Fspark-java-wordcount/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bchoubert%2Fspark-java-wordcount/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitH
ub/owners/bchoubert","download_url":"https://codeload.github.com/bchoubert/spark-java-wordcount/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bchoubert%2Fspark-java-wordcount/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263973362,"owners_count":23537977,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["polytech-lyon"],"created_at":"2024-11-19T19:41:15.208Z","updated_at":"2025-07-06T21:11:11.585Z","avatar_url":"https://github.com/bchoubert.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cimg src=\"http://spark.apache.org/images/spark-logo-trademark.png\" alt=\"Spark Logo\" height=\"200\"/\u003e\n\n# spark-java-wordcount\n\nThis repo is an example of pairing keys with Spark over a text file.\n\nThe goal is to count the words in a poem using a Map - Pair - Reduce operation.\n\n## Input file\n\n`poeme.txt` is a 2978-line file divided into sections. 
It represents a foreign poem translated into French.\n\n## Execute the project\nWith Spark and Hadoop installed, first copy the input file to the Hadoop file system:\n\n`hadoop fs -put poeme.txt /test`\n\nNext, compile the project (with Maven, for example: `mvn clean package`), then run it:\n\n`hadoop jar NameOfYourJar.jar WordCount /test/poeme.txt /results`\n\nYou can view the results with \u003cimg src=\"http://gethue.com/wp-content/uploads/2014/03/hue_logo_300dpi_huge.png\" height=\"15\"/\u003e (Hue), for example.\n\n## Raw results\n\nHere is a sample of the results:\n```\n(sentinelles,1)\n(souvent,8)\n(Elles,1)\n(prairies;,1)\n(Soulevait,1)\n(soupir,3)\n(épais,5)\n(filet,2)\n(derniers,3)\n(Bassin,2)\n(collines;,1)\n(ridé,1)\n(Pauvre,1)\n(lumière,5)\n(nom,6)\n(Viennent,2)\n(saisie,1)\n(guider,2)\n(fuir,4)\n(L'homme,1)\n(tranquilles,1)\n(distrait,1)\n(demeure;,1)\n(gentille:,1)\n(s'endormir,1)\n(Prétendait,1)\n```\nEach entry gives a word and the number of times it occurs in the poem.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbchoubert%2Fspark-java-wordcount","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbchoubert%2Fspark-java-wordcount","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbchoubert%2Fspark-java-wordcount/lists"}