https://github.com/techalhan826/hadoop-tasks
Hadoop MapReduce Tasks Java - Big Data Project 🚀
- Host: GitHub
- URL: https://github.com/techalhan826/hadoop-tasks
- Owner: TechAlhan826
- Created: 2025-02-19T16:19:12.000Z (4 months ago)
- Default Branch: master
- Last Pushed: 2025-02-19T16:45:04.000Z (4 months ago)
- Last Synced: 2025-02-19T17:36:07.621Z (4 months ago)
- Topics: bigdata, hadoop, hadoop-hdfs, mapreduce-java
- Language: Java
- Size: 265 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Hadoop MapReduce Tasks - Big Data Project 🚀
🌍 **Use Case:** Learning & Implementing Core Hadoop MapReduce Concepts
📂 **Repository Purpose:** Educational, hands-on Hadoop examples for beginners

---
## 📘 **Overview**
This repository demonstrates core **Hadoop MapReduce tasks** with detailed examples, step-by-step explanations, and ready-to-run JAR files. Whether you're exploring Hadoop for the first time or solidifying your knowledge, this project covers real-world big data scenarios like word counting, data sorting, temperature analysis, and inverted indexing.

---
## 🛠 **Prerequisites**
- **Hadoop 3.3.6 installed and configured**
- **Java JDK 8+ installed**
- **SSH configured and Hadoop services running**
- **Dataset uploaded to HDFS**

Start Hadoop Services:
```bash
sudo service ssh restart
sbin/start-all.sh
```
Verify Hadoop is Running:
```bash
jps
```
Expected processes: **NameNode, DataNode, ResourceManager, NodeManager**

---
## 🗃 **Project Structure**
```
/root/hadoop-project/
├── share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar
├── word-count/
├── inverted-index/
├── sorting/
├── mean-temp-calculation/
├── log-analysis/
├── MeanTemperature.java
├── InvertedIndex.java
├── txt-to-seq.py
└── generate-temp-data.py
```
---
## 🔥 **Implemented Tasks**
### 📊 **1. Word Count**
**What it does:** Counts the frequency of words in a large text file.

**Command:**
```bash
hadoop jar hadoop-mapreduce-examples-3.3.6.jar wordcount \
/big-data/word-count/input.txt \
/big-data/word-count/output
```
**Example Input:**
```
Hadoop is amazing.
Big data is the future.
```
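To make the counting step concrete, here is a minimal plain-Java sketch of the same logic. It assumes whitespace-only tokenization with `StringTokenizer`, which is how the stock `wordcount` example splits lines; the class and method names (`WordCountSketch`, `countWords`) are illustrative and not part of this repository, and no Hadoop classes are involved.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java illustration only: no Hadoop classes are used here.
class WordCountSketch {

    // Map and reduce collapsed into one pass: split on whitespace, then sum per token.
    static Map<String, Integer> countWords(String text) {
        // TreeMap keeps keys sorted, similar to how reducer output arrives sorted by key.
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer tokens = new StringTokenizer(text); // default delimiters: whitespace
        while (tokens.hasMoreTokens()) {
            counts.merge(tokens.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String input = "Hadoop is amazing.\nBig data is the future.";
        // Print one "word<TAB>count" line per distinct token.
        countWords(input).forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

Because only whitespace separates tokens, punctuation stays attached to a word (`amazing.` rather than `amazing`), and uppercase keys sort before lowercase ones.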
**Example Output:**
```
Big 1
Hadoop 1
amazing. 1
data 1
future. 1
is 2
the 1
```
### 📈 **2. Sorting (Handling Sequence Files)**
**What it does:** Sorts a large dataset of random numbers.

**Steps:**
1. **Generate Random Data (SequenceFile):**
```bash
hadoop jar hadoop-mapreduce-examples-3.3.6.jar randomtextwriter \
/big-data/sorting/random_data.seq
```
2. **Perform Sorting:**
```bash
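# randomtextwriter emits Text pairs, so override sort's BytesWritable defaults
hadoop jar hadoop-mapreduce-examples-3.3.6.jar sort \
-outKey org.apache.hadoop.io.Text -outValue org.apache.hadoop.io.Text \
/big-data/sorting/random_data.seq \
/big-data/sorting/sorted_output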
```
3. **Convert SequenceFile to Text:**
```bash
# -text decodes SequenceFiles; -cat would dump the raw binary format
hdfs dfs -text /big-data/sorting/sorted_output/part-r-00000 > sorted_numbers.txt
```
**Example Input:**
```
32220
11962
22549
```
**Example Output:**
```
11962
22549
32220
```
### 🌡 **3. Mean Temperature Calculation (Custom Java)**
**What it does:** Calculates the average temperature for each city.

**Example Input:**
```
2023-01-01,Mumbai,32.3
2023-01-01,Delhi,33.5
```
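For records in this `date,city,temperature` shape, the map and reduce steps could look roughly like the plain-Java sketch below. It is not the repository's `MeanTemperature.java` and uses no Hadoop classes; the names `MeanTemperatureSketch`, `map`, `reduce`, and `meanByCity` are hypothetical, and the third input record is made up so that one city has more than one reading.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of map, shuffle, and reduce; not the repository's implementation.
class MeanTemperatureSketch {

    // Map step: turn "date,city,temperature" into a (city, temperature) pair.
    static Map.Entry<String, Double> map(String line) {
        String[] fields = line.split(",");
        return new AbstractMap.SimpleEntry<>(fields[1].trim(), Double.parseDouble(fields[2].trim()));
    }

    // Reduce step: average all temperatures collected for one city.
    static double reduce(List<Double> temperatures) {
        double sum = 0.0;
        for (double t : temperatures) {
            sum += t;
        }
        return sum / temperatures.size();
    }

    // Shuffle stand-in: group mapped values by city, then reduce each group.
    static Map<String, Double> meanByCity(List<String> lines) {
        Map<String, List<Double>> grouped = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<String, Double> pair = map(line);
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        Map<String, Double> means = new TreeMap<>();
        grouped.forEach((city, temps) -> means.put(city, reduce(temps)));
        return means;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "2023-01-01,Mumbai,32.3",
            "2023-01-01,Delhi,33.5",
            "2023-01-02,Delhi,31.5"); // made-up extra record to show averaging
        meanByCity(lines).forEach((city, mean) -> System.out.println(city + "\t" + mean));
    }
}
```

In a real job the grouping is done by the framework's shuffle phase, so the reducer only ever sees the list of values for a single key.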
**Example Output:**
```
Mumbai 32.3
Delhi 33.5
```
**Compile & Run:**
```bash
javac -classpath `hadoop classpath` -d . MeanTemperature.java
jar cf mean-temp.jar MeanTemperature*.class
hadoop jar mean-temp.jar MeanTemperature \
/big-data/mean-temp-calculation/temperature_data.txt \
/big-data/mean-temp-calculation/output
```
### 🧠 **4. Inverted Index (Custom Java)**
**What it does:** Builds an index of words and the documents they appear in.

**Example Input:**
```
Doc1: Hadoop is powerful.
Doc2: Big data powers insights.
```
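One way to picture the indexing logic for lines shaped like `DocId: text` is the plain-Java sketch below. It is an illustration under assumptions rather than the repository's `InvertedIndex.java`: the class and method names are hypothetical, punctuation is stripped from tokens by choice, and case is preserved.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Plain-Java illustration only: no Hadoop classes are used here.
class InvertedIndexSketch {

    // Builds word -> sorted set of document IDs from lines shaped like "Doc1: some text".
    static Map<String, Set<String>> buildIndex(List<String> lines) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (String line : lines) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // skip lines without a "DocId:" prefix
            }
            String docId = line.substring(0, colon).trim();
            // Map step: emit (word, docId) for every token after the prefix.
            for (String token : line.substring(colon + 1).trim().split("\\s+")) {
                String word = token.replaceAll("[^A-Za-z0-9]", ""); // assumption: drop punctuation
                if (word.isEmpty()) {
                    continue;
                }
                // Reduce step: collect the distinct documents per word.
                index.computeIfAbsent(word, k -> new TreeSet<>()).add(docId);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "Doc1: Hadoop is powerful.",
            "Doc2: Big data powers insights.");
        buildIndex(lines).forEach((word, docs) ->
            System.out.println(word + "\t" + String.join(",", docs)));
    }
}
```

Using a set per word means a document is listed once even if the word occurs in it several times.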
**Example Output:**
```
Hadoop Doc1
is Doc1
powerful Doc1
Big Doc2
data Doc2
```
**Compile & Run:**
```bash
javac -classpath `hadoop classpath` -d . InvertedIndex.java
jar cf inverted-index.jar InvertedIndex*.class
hadoop jar inverted-index.jar InvertedIndex \
/big-data/inverted-index/input.txt \
/big-data/inverted-index/output
```
---
## 📂 **File Organization**
All the necessary files for JAR generation (Java source code) and supporting scripts are available in this repository. You can compile and package them directly.

---
## 🧠 **Learning Takeaways**
- **WordCount & Grep tasks** can be handled by Hadoop's default JAR.
- **Sorting** with the stock `sort` example expects SequenceFile input by default, so plain-text data has to be converted first.
- **Custom Java programs** are necessary for more advanced tasks (like temperature calculation & inverted index).
- **HDFS & Hadoop CLI mastery** is crucial for debugging and verifying results.

---
## 🔧 **Useful Commands**
📂 **Create Folders in HDFS:**
```bash
# -p also creates the parent /big-data directory if it does not exist yet
hdfs dfs -mkdir -p /big-data/sorting
```
📂 **Upload Files to HDFS:**
```bash
hdfs dfs -put local_file.txt /big-data/word-count/
```
🔍 **View HDFS Files:**
```bash
hdfs dfs -ls /big-data
```
📤 **Fetch Results:**
```bash
hdfs dfs -cat /big-data/word-count/output/part-r-00000
```
---
## 🚀 **Conclusion**
This repository is a comprehensive guide for learning Hadoop through practical tasks. It bridges the gap between theory and real-world big data scenarios, helping you master HDFS, MapReduce, and Java-based data processing.

📢 **If you find this useful, feel free to star the repo, fork it, and contribute!**

---
📩 **Contact Me:**
🌐 **Website:** [techyalhan.in](https://techyalhan.in)
🐦 **Instagram:** [@TechyAlhan](https://www.instagram.com/alhan_826)

---