# Mcdd-Big-Data-Study

Study project for big data (Hadoop, Zookeeper, Kafka, Flink, Spark)

[![License](https://img.shields.io/github/license/mcddhub/mcdd-big-data-study)](https://github.com/mcddhub/mcdd-big-data-study/blob/main/LICENSE)
[![GitHub stars](https://img.shields.io/github/stars/mcddhub/mcdd-big-data-study)](https://github.com/mcddhub/mcdd-big-data-study)

---

## Features ✨

> **Supported Technologies**:
>
> - **Hadoop 3.3.6** (with JDK 8.0.352-zulu, Maven 3.6.3)
> - **Zookeeper 3.9.2**
> - **Kafka 2.12-3.7.1**

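Once the containers from the Installation section are running, you can confirm the versions above with a quick check along these lines. This is a minimal sketch: it assumes a container named `master` (as used later in this README) and that the tools are on the container's `PATH`.

```bash
# Hedged sketch: print tool versions inside the running master container.
docker exec master bash -lc '
  hadoop version | head -n 1   # expect Hadoop 3.3.6
  java -version                # expect JDK 8 (zulu)
  mvn -v | head -n 1           # expect Maven 3.6.3
  kafka-topics.sh --version    # expect 3.7.1 (assumes Kafka scripts are on PATH)
'
```
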
---

## Installation 📦

1. Clone the repository:
```bash
git clone https://github.com/mcddhub/mcdd-big-data-study.git --depth=1 && cd mcdd-big-data-study
```
2. Build the Docker image:
```bash
cd docker
docker build -t caobaoqi1029/big-data-study:x.x.x .
```

> **Note**: Replace `x.x.x` with the appropriate version number.


*(Screenshots: Docker build image, Docker build complete)*
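
To confirm the image was built and tagged as expected, a quick listing is enough. The repository name mirrors the build command above; keep `x.x.x` as whatever version you chose.

```bash
# List the freshly built image; repository/tag should match the build command.
docker images caobaoqi1029/big-data-study
# Optionally inspect creation time and size of the tagged image.
docker image inspect caobaoqi1029/big-data-study:x.x.x --format '{{.Created}} {{.Size}}'
```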

3. Start the containers:
```bash
docker compose up -d
```


*(Screenshot: Docker Compose)*
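
After the stack comes up, it is worth checking that every container is running. A minimal check, assuming you are still in the `docker/` directory where the compose file lives, and that the Hadoop master container is named `master` as in the later steps:

```bash
# Show the state of all services defined in the compose file.
docker compose ps
# Follow the logs of the Hadoop master container while it initializes.
docker logs -f master
```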

---

## Configuration 🛠

1. Connect to the remote server via **VS Code** and attach to a running container.


*(Screenshots: VS Code container connection)*

2. Install the **Java Dev** extension in VS Code.


*(Screenshot: Java Dev extension)*

3. Restart the extension host to apply changes.


*(Screenshot: restarting the extension host)*

4. Initialize Hadoop environment:
```bash
docker exec -it master bash
hdfs namenode -format
```


*(Screenshot: HDFS format)*
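
The format step only needs to run once and should end with a `successfully formatted` message. If you want to double-check where the NameNode metadata landed, something along these lines works; the exact directory depends on the image's `hdfs-site.xml`:

```bash
# Print the NameNode metadata directory configured for this image...
hdfs getconf -confKey dfs.namenode.name.dir
# ...and confirm the freshly written metadata exists there
# (strip a leading file:// scheme if the value uses one).
ls "$(hdfs getconf -confKey dfs.namenode.name.dir | sed 's|^file://||')/current"
```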

5. Start Hadoop services:
```bash
start-all.sh
```


*(Screenshot: Hadoop start)*
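
`start-all.sh` launches both HDFS and YARN. A quick way to confirm the daemons are up on the master is `jps`; worker containers should show `DataNode`/`NodeManager` instead:

```bash
# On the master: expect NameNode, SecondaryNameNode and ResourceManager
# (plus DataNode/NodeManager if the master also acts as a worker).
jps
# Cluster-level summary: live datanodes, capacity, remaining space.
hdfs dfsadmin -report | head -n 20
```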

6. Use the following commands to interact with Hadoop:
```bash
vim input.txt
hdfs dfs -put -f ./input.txt /
hdfs dfs -ls /
```


*(Screenshot: HDFS commands)*
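
If you just want something quick to feed the job, you can generate a small input file instead of editing it by hand, then confirm it landed in HDFS (the paths mirror the commands above):

```bash
# Create a small sample input without opening an editor.
printf 'hello hadoop\nhello big data\nhello flink spark\n' > input.txt
# Upload (overwriting any previous copy) and read it back from HDFS.
hdfs dfs -put -f ./input.txt /
hdfs dfs -cat /input.txt
```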

7. Build and run the Hadoop job:
```bash
mvn clean package
cd target/
hadoop jar big-data.jar
```

> **Tip**: You can set the `CLASSPATH` environment variable to run the compiled classes directly with `java`:
> ```bash
> export CLASSPATH=$CLASSPATH:/tmp/
> # Add this to .bashrc for persistence.
> ```


*(Screenshot: Java execution)*
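
One practical note for re-running the job: MapReduce fails fast if the output directory already exists, so clear it first. A hedged re-run sequence, assuming the job writes to `/output` as shown in the next step:

```bash
# Remove a stale output directory; -f keeps the command quiet if /output is absent.
hdfs dfs -rm -r -f /output
# Re-run the packaged job from the target/ directory.
hadoop jar big-data.jar
```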

8. View the output:
```bash
hdfs dfs -ls /output
hdfs dfs -cat /output/part-r-00000
```


*(Screenshot: output view)*
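
To read every reducer's output at once, or to keep a local copy of the results inside the container, the following works; the `part-r-*` naming is the standard MapReduce output convention:

```bash
# Concatenate all reducer outputs rather than a single part file.
hdfs dfs -cat /output/part-r-*
# Copy the whole output directory out of HDFS into the container's filesystem.
hdfs dfs -get /output ./output && ls ./output
```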

---

## Contributing 🤝

We welcome contributions! Feel free to submit a pull request. For more details, see
the [Contribution Guide](https://github.com/mcddhub/mcdd-big-data-study/blob/main/CONTRIBUTING.md).


Thanks to all contributors:



*(Contributors image)*

---

## License 📄

This project is licensed under the MIT License. See
the [LICENSE](https://github.com/mcddhub/mcdd-big-data-study/blob/main/LICENSE) file for details.

---

## Support 💖

If you find this project helpful, consider giving it a ⭐️ on [GitHub](https://github.com/mcddhub/mcdd-big-data-study)!

---

## Star History ⭐


*(Star History chart)*