{"id":23284321,"url":"https://github.com/umermansoor/hadoop-java-example","last_synced_at":"2025-08-20T07:07:46.613Z","repository":{"id":6354475,"uuid":"7591365","full_name":"umermansoor/hadoop-java-example","owner":"umermansoor","description":"A very simple example of using Hadoop's MapReduce functionality in Java.","archived":false,"fork":false,"pushed_at":"2013-06-18T20:28:27.000Z","size":226,"stargazers_count":73,"open_issues_count":1,"forks_count":46,"subscribers_count":7,"default_branch":"develop","last_synced_at":"2025-06-03T21:04:35.641Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/umermansoor.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2013-01-13T18:01:51.000Z","updated_at":"2024-07-21T03:33:44.000Z","dependencies_parsed_at":"2022-07-31T02:38:14.053Z","dependency_job_id":null,"html_url":"https://github.com/umermansoor/hadoop-java-example","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/umermansoor/hadoop-java-example","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/umermansoor%2Fhadoop-java-example","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/umermansoor%2Fhadoop-java-example/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/umermansoor%2Fhadoop-java-example/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/umermansoor%2Fhadoop-java-example/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/umermansoor","download_url
":"https://codeload.github.com/umermansoor/hadoop-java-example/tar.gz/refs/heads/develop","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/umermansoor%2Fhadoop-java-example/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271280056,"owners_count":24731935,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-20T02:00:09.606Z","response_time":69,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-20T01:40:04.043Z","updated_at":"2025-08-20T07:07:46.592Z","avatar_url":"https://github.com/umermansoor.png","language":"Java","readme":"## Hadoop Map-Reduce Example in Java\n\n**Get up and running in less than 5 minutes**\n\n### Overview\nThis program demonstrates Hadoop's Map-Reduce concept in Java using a very simple example. The input is raw data files listing earthquakes by region, magnitude and other information. \n\n\u003e nc,71920701,1,”Saturday, January 12, 2013 19:43:18 UTC”,38.7865,-122.7630,**1.5**,1.10,27,**“Northern California”**\n\nThe fields in bold are magnitude of the quake and name of region where the reading was taken, respectively. The _goal_ is to process all input files to find the maximum magnitude quake reading for every region listed. 
The output is in the form:\n\n        \"region_name\"      \u003cmaximum magnitude of earthquake recorded\u003e \n\nThe raw data files are in the `input/` folder.\n\n### Instructions for Setting Up Hadoop\n1. Download the Hadoop 1.1.1 binary. [Mirror](http://mirror.csclub.uwaterloo.ca/apache/hadoop/common/hadoop-1.1.1/hadoop-1.1.1.tar.gz)\n\n\n2. Extract it to a folder on your computer:\n        \n        $ tar xvfz hadoop-1.1.1.tar.gz\n\n3. Set the JAVA_HOME environment variable to point to the directory where Java is installed. On my Mac OS X machine, I did the following:\n\n        $ export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Home\n\n Note: If you are running Lion, you may want to set JAVA_HOME from the `java_home` command, which outputs Java's home directory:\n\n        $ export JAVA_HOME=$(/usr/libexec/java_home)\n\n4. Set the HADOOP_INSTALL environment variable to point to the directory where you extracted the Hadoop binary in step 2:\n\n        $ export HADOOP_INSTALL=/Users/umermansoor/Documents/hadoop-1.1.1\n\n5. Add Hadoop's `bin` directory to the PATH environment variable:\n\n        $ export PATH=$PATH:$HADOOP_INSTALL/bin\n\n\u003e Or you can add these variables to your shell startup script. For example, check out my Mac OS X [`~/.bash_profile`](https://gist.github.com/4525814)\n\n### Instructions for Running the Sample\n1. Clone the project:\n\n\t    $ git clone git@github.com:umermansoor/hadoop-java-example.git\n\t\n2. Change to the project directory:\n\n\t    $ cd hadoop-java-example\n\n3. Build the project:\n\n\t    $ mvn clean install\n\n4. Set the HADOOP_CLASSPATH environment variable to tell Hadoop where to find the Java classes for the sample:\n\n\t    $ export HADOOP_CLASSPATH=target/classes/\n\n5. Run the sample. The `output` directory must not exist, otherwise the job will fail.\n\n        $ hadoop com.umermansoor.App input/ output\n\n\u003e Note: the output will go to the `output/` folder, which Hadoop creates when the job runs. 
The output will be in a file called `part-r-00000`.\n\n### Common Errors:\n1. Exception: java.lang.NoClassDefFoundError\nCause: You didn't set up the HADOOP_CLASSPATH environment variable. You need to tell Hadoop where to find the Java classes. \nResolution: In this case, execute the following to set the HADOOP_CLASSPATH variable to point to the `target/classes/` folder.\n\n        $ export HADOOP_CLASSPATH=target/classes/\n\n2. Exception: org.apache.hadoop.mapred.FileAlreadyExistsException or 'Output directory output already exists'. \nCause: The output directory already exists. Hadoop requires that the output directory does not exist when the job runs. \nResolution: Change the output directory or remove the existing one:\n\n        $ hadoop com.umermansoor.App input/input.csv output_new \n\n\u003e Note: Hadoop failing if the output folder already exists is a good thing: it ensures that you don't accidentally overwrite your previous output, as typical Hadoop jobs take hours to complete.\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fumermansoor%2Fhadoop-java-example","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fumermansoor%2Fhadoop-java-example","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fumermansoor%2Fhadoop-java-example/lists"}