Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mcddhub/mcdd-big-data-study
Study project for big data (Hadoop, Zookeeper, Kafka, Flink, Spark)
big-data data-processing docker flink hadoop kafka spark zookeeper
Last synced: 3 months ago
Study project for big data (Hadoop, Zookeeper, Kafka, Flink, Spark)
- Host: GitHub
- URL: https://github.com/mcddhub/mcdd-big-data-study
- Owner: mcddhub
- License: mit
- Created: 2024-09-11T10:52:08.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2024-09-23T13:37:07.000Z (4 months ago)
- Last Synced: 2024-09-28T07:15:41.109Z (4 months ago)
- Topics: big-data, data-processing, docker, flink, hadoop, kafka, spark, zookeeper
- Language: Dockerfile
- Homepage: https://mcddhub.github.io/mcdd-big-data-study/
- Size: 2.74 MB
- Stars: 2
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md
Awesome Lists containing this project
README
Mcdd-Big-Data-Study
Study project for big data (Hadoop, Zookeeper, Kafka, Flink, Spark)
[![License](https://img.shields.io/github/license/mcddhub/mcdd-big-data-study)](https://github.com/mcddhub/mcdd-big-data-study/blob/main/LICENSE)
[![GitHub stars](https://img.shields.io/github/stars/mcddhub/mcdd-big-data-study)](https://github.com/mcddhub/mcdd-big-data-study)

---
## Features ✨
> **Supported Technologies**:
>
> - **Hadoop 3.3.6** (with JDK 8.0.352-zulu, Maven 3.6.3)
> - **Zookeeper 3.9.2**
> - **Kafka 2.12-3.7.1**
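A quick way to confirm the tool versions baked into the image once a container is running (a sketch, not part of the original docs; the container name `master` comes from the configuration steps below):

```bash
# Print the JDK, Maven, and Hadoop versions inside the master container.
docker exec -it master bash -c 'java -version && mvn -version && hadoop version'
```

---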
## Installation 📦
1. Clone the repository:
```bash
git clone https://github.com/mcddhub/mcdd-big-data-study.git --depth=1 && cd mcdd-big-data-study
```
2. Build the Docker image:
```bash
cd docker
docker build -t caobaoqi1029/big-data-study:x.x.x .
```
> **Note**: Replace `x.x.x` with the appropriate version number.
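> To confirm the image was built (an extra check, not part of the original steps):
> ```bash
> docker image ls caobaoqi1029/big-data-study   # should list the tag you just built
> ```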
3. Start the containers:
```bash
docker compose up -d
```
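> **Note**: To check that the containers are up (a quick verification that is not in the original guide; `master` is the container name used in the configuration steps below):
> ```bash
> docker compose ps     # list services and their state
> docker logs master    # inspect the master container's startup output
> ```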
---
## Configuration 🛠
1. Connect to the remote server via **VS Code** and attach to a running container.
2. Install the **Java Dev** extension in VS Code.
3. Restart the extension host to apply changes.
4. Initialize Hadoop environment:
```bash
docker exec -it master bash
hdfs namenode -format
```
5. Start Hadoop services:
```bash
start-all.sh
```
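> **Note**: A quick way to verify the daemons started (which processes show up on `master` depends on how the compose file splits roles across containers):
> ```bash
> jps                     # JVM processes, e.g. NameNode / DataNode / ResourceManager
> hdfs dfsadmin -report   # summary of live DataNodes
> ```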
6. Create a sample input file and upload it to HDFS:
```bash
vim input.txt
hdfs dfs -put -f ./input.txt /
hdfs dfs -ls /
```
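> **Tip**: As a non-interactive alternative to `vim`, you can create a sample file inline (the contents below are only an example):
> ```bash
> cat > input.txt <<'EOF'
> hello big data
> hello hadoop
> EOF
> hdfs dfs -put -f ./input.txt /
> hdfs dfs -cat /input.txt   # confirm the upload
> ```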
7. Build and run the Hadoop job:
```bash
mvn clean package
cd target/
hadoop jar big-data.jar
```
> **Tip**: You can set the `CLASSPATH` environment variable to run the job classes with `java` directly:
> ```bash
> export CLASSPATH=$CLASSPATH:/tmp/
> # Add this to .bashrc for persistence.
> ```
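> For example, with the classpath exported you can compile a job class into `/tmp/` and run it with `java` directly (the class name `WordCount` and its arguments are only placeholders for whatever main class this project builds):
> ```bash
> export CLASSPATH=$CLASSPATH:/tmp/:$(hadoop classpath)   # compiled classes plus the Hadoop jars and config
> javac -d /tmp/ WordCount.java                           # compile the class into /tmp/
> java WordCount /input.txt /output                       # launch it against HDFS paths
> ```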
8. View the output:
```bash
hdfs dfs -ls /output
hdfs dfs -cat /output/part-r-00000
```
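> **Note**: To copy the results out of HDFS onto the container's local filesystem (an optional extra step):
> ```bash
> hdfs dfs -get /output ./output    # download the whole output directory
> cat ./output/part-r-00000
> ```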
---
## Contributing 🤝
We welcome contributions! Feel free to submit a pull request. For more details, see
the [Contribution Guide](https://github.com/mcddhub/mcdd-big-data-study/blob/main/CONTRIBUTING.md).

---
## License 📄
This project is licensed under the MIT License. See
the [LICENSE](https://github.com/mcddhub/mcdd-big-data-study/blob/main/LICENSE) file for details.

---
## Support 💖
If you find this project helpful, consider giving it a ⭐️ on [GitHub](https://github.com/mcddhub/mcdd-big-data-study)!
---
## Star History ⭐