{"id":25761013,"url":"https://github.com/techalhan826/hadoop-tasks","last_synced_at":"2026-06-10T02:31:05.454Z","repository":{"id":278418272,"uuid":"935550205","full_name":"TechAlhan826/Hadoop-Tasks","owner":"TechAlhan826","description":"Hadoop MapReduce Tasks Java - Big Data Project 🚀","archived":false,"fork":false,"pushed_at":"2025-04-04T18:36:33.000Z","size":282,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-04T19:45:27.262Z","etag":null,"topics":["bigdata","hadoop","hadoop-hdfs","mapreduce-java"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TechAlhan826.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-19T16:19:12.000Z","updated_at":"2025-04-04T18:36:36.000Z","dependencies_parsed_at":"2025-02-19T17:46:53.811Z","dependency_job_id":null,"html_url":"https://github.com/TechAlhan826/Hadoop-Tasks","commit_stats":null,"previous_names":["techalhan826/hadoop-tasks"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/TechAlhan826/Hadoop-Tasks","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TechAlhan826%2FHadoop-Tasks","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TechAlhan826%2FHadoop-Tasks/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TechAlhan826%2FHadoop-Tasks/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TechAlhan826%2FHadoop-Tasks/manif
ests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TechAlhan826","download_url":"https://codeload.github.com/TechAlhan826/Hadoop-Tasks/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TechAlhan826%2FHadoop-Tasks/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34134633,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-10T02:00:07.152Z","response_time":89,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bigdata","hadoop","hadoop-hdfs","mapreduce-java"],"created_at":"2025-02-26T18:28:31.220Z","updated_at":"2026-06-10T02:31:05.440Z","avatar_url":"https://github.com/TechAlhan826.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Hadoop MapReduce Tasks - Big Data Project 🚀\n \n🌍 **Use Case:** Learning \u0026 Implementing Core Hadoop MapReduce Concepts  \n📂 **Repository Purpose:** Educational, hands-on Hadoop examples for beginners  \n\n---\n\n## 📘 **Overview**\nThis repository demonstrates core **Hadoop MapReduce tasks** with detailed examples, step-by-step explanations, and ready-to-run JAR files. 
Whether you're exploring Hadoop for the first time or solidifying your knowledge, this project covers real-world big data scenarios like word counting, data sorting, temperature analysis, and inverted indexing.\n\n---\n\n## 🛠 **Prerequisites**\n\n- **Hadoop 3.3.6 installed and configured**\n- **Java JDK 8+ installed**\n- **SSH configured and Hadoop services running**\n- **Dataset uploaded to HDFS**\n\nStart Hadoop Services:\n```bash\nsudo service ssh restart\nsbin/start-all.sh\n```\n\nVerify Hadoop is Running:\n```bash\njps\n```\n\nExpected processes: **NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager**\n\n---\n\n## 🗃 **Project Structure**\n```\n/root/hadoop-project/\n├── share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar\n├── word-count/\n├── inverted-index/\n├── sorting/\n├── mean-temp-calculation/\n├── log-analysis/\n├── MeanTemperature.java\n├── InvertedIndex.java\n├── txt-to-seq.py\n└── generate-temp-data.py\n```\n\n---\n\n## 🔥 **Implemented Tasks**\n\n### 📊 **1. Word Count**\n**What it does:** Counts the frequency of words in a large text file.\n\n**Command:**\n```bash\nhadoop jar hadoop-mapreduce-examples-3.3.6.jar wordcount \\\n  /big-data/word-count/input.txt \\\n  /big-data/word-count/output\n```\n\n**Example Input:**\n```\nHadoop is amazing.\nBig data is the future.\n```\n**Example Output:**\n```\nBig 1\nHadoop 1\namazing. 1\ndata 1\nfuture. 1\nis 2\nthe 1\n```\nThe bundled `wordcount` splits on whitespace only, so punctuation stays attached to words, and keys are sorted in byte order (uppercase before lowercase).\n\n### 📈 **2. Sorting (Handling Sequence Files)**\n**What it does:** Sorts a large dataset of random numbers.\n\n**Steps:**\n1. **Generate Random Data (SequenceFile):**\n```bash\nhadoop jar hadoop-mapreduce-examples-3.3.6.jar randomtextwriter \\\n  /big-data/sorting/random_data.seq\n```\n\n2. **Perform Sorting:** `randomtextwriter` writes `Text` keys and values, while `sort` defaults to `BytesWritable`, so the key and value classes are passed explicitly.\n```bash\nhadoop jar hadoop-mapreduce-examples-3.3.6.jar sort \\\n  -outKey org.apache.hadoop.io.Text \\\n  -outValue org.apache.hadoop.io.Text \\\n  /big-data/sorting/random_data.seq \\\n  /big-data/sorting/sorted_output\n```\n\n3. 
**Convert SequenceFile to Text:**\n```bash\nhdfs dfs -text /big-data/sorting/sorted_output/part-r-00000 \u003e sorted_numbers.txt\n```\n\n**Example Input:**\n```\n32220\n11962\n22549\n```\n**Example Output:**\n```\n11962\n22549\n32220\n```\n\n### 🌡 **3. Mean Temperature Calculation (Custom Java)**\n**What it does:** Calculates the average temperature for each city.\n\n**Example Input:**\n```\n2023-01-01,Mumbai,32.3\n2023-01-01,Delhi,33.5\n```\n**Example Output:**\n```\nMumbai 32.3\nDelhi 33.5\n```\n\n**Compile \u0026 Run:**\n```bash\njavac -classpath `hadoop classpath` -d . MeanTemperature.java\njar cf mean-temp.jar MeanTemperature*.class\nhadoop jar mean-temp.jar MeanTemperature \\\n  /big-data/mean-temp-calculation/temperature_data.txt \\\n  /big-data/mean-temp-calculation/output\n```\n\n### 🧠 **4. Inverted Index (Custom Java)**\n**What it does:** Builds an index of words and the documents they appear in.\n\n**Example Input:**\n```\nDoc1: Hadoop is powerful.\nDoc2: Big data powers insights.\n```\n**Example Output:**\n```\nHadoop Doc1\nis Doc1\npowerful Doc1\nBig Doc2\ndata Doc2\n```\n\n**Compile \u0026 Run:**\n```bash\njavac -classpath `hadoop classpath` -d . InvertedIndex.java\njar cf inverted-index.jar InvertedIndex*.class\nhadoop jar inverted-index.jar InvertedIndex \\\n  /big-data/inverted-index/input.txt \\\n  /big-data/inverted-index/output\n```\n\n---\n\n## 📂 **File Organization**\nThe Java sources needed to build the JARs and the supporting scripts are included in this repository. 
You can compile and package them directly.\n\n---\n\n## 🧠 **Learning Takeaways**\n- **WordCount \u0026 Grep tasks** run directly from Hadoop's bundled examples JAR, with no custom code needed.\n- **Sorting** with the bundled `sort` example reads SequenceFiles by default, so text input must be converted first (see `txt-to-seq.py`).\n- **Custom Java programs** are necessary for more advanced tasks (like temperature calculation \u0026 inverted index).\n- **Fluency with the HDFS \u0026 Hadoop CLIs** is essential for debugging and verifying results.\n\n---\n\n## 🔧 **Useful Commands**\n\n📂 **Create Folders in HDFS:**\n```bash\nhdfs dfs -mkdir -p /big-data/sorting\n```\n\n📂 **Upload Files to HDFS:**\n```bash\nhdfs dfs -put local_file.txt /big-data/word-count/\n```\n\n🔍 **View HDFS Files:**\n```bash\nhdfs dfs -ls /big-data\n```\n\n📤 **Fetch Results:**\n```bash\nhdfs dfs -cat /big-data/word-count/output/part-r-00000\n```\n\n---\n\n## 🚀 **Conclusion**\nThis repository is a hands-on guide to learning Hadoop through practical tasks. It connects theory to real-world big data scenarios and helps you build working knowledge of HDFS, MapReduce, and Java-based data processing.\n\n📢 **If you find this useful, feel free to star the repo, fork it, and contribute!**\n\n---\n\n📩 **Contact Me:**  \n🌐 **Website:** [techyalhan.in](https://techyalhan.in)  \n🐦 **Instagram:** [@TechyAlhan](https://www.instagram.com/alhan_826)\n\n---\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftechalhan826%2Fhadoop-tasks","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftechalhan826%2Fhadoop-tasks","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftechalhan826%2Fhadoop-tasks/lists"}