Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mcddhub/mcdd-big-data-study
Study project for big data (Hadoop, Zookeeper, Kafka, Flink, Spark)
big-data data-processing docker flink hadoop kafka spark zookeeper
Last synced: 3 months ago
Study project for big data (Hadoop, Zookeeper, Kafka, Flink, Spark)
- Host: GitHub
- URL: https://github.com/mcddhub/mcdd-big-data-study
- Owner: mcddhub
- License: mit
- Created: 2024-09-11T10:52:08.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2024-09-23T13:37:07.000Z (4 months ago)
- Last Synced: 2024-09-28T07:15:41.109Z (4 months ago)
- Topics: big-data, data-processing, docker, flink, hadoop, kafka, spark, zookeeper
- Language: Dockerfile
- Homepage: https://mcddhub.github.io/mcdd-big-data-study/
- Size: 2.74 MB
- Stars: 2
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md
Awesome Lists containing this project
README
Mcdd-Big-Data-Study
Study project for big data (Hadoop, Zookeeper, Kafka, Flink, Spark)
[![License](https://img.shields.io/github/license/mcddhub/mcdd-big-data-study)](https://github.com/mcddhub/mcdd-big-data-study/blob/main/LICENSE)
[![GitHub stars](https://img.shields.io/github/stars/mcddhub/mcdd-big-data-study)](https://github.com/mcddhub/mcdd-big-data-study)

---
## Features ✨
> **Supported Technologies**:
>
> - **Hadoop 3.3.6** (with JDK 8.0.352-zulu, Maven 3.6.3)
> - **Zookeeper 3.9.2**
> - **Kafka 2.12-3.7.1**
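A quick way to confirm the tool versions baked into the image once a container is running (a sketch, not part of the original docs; the container name `master` comes from the configuration steps below):

```bash
# Print the JDK, Maven, and Hadoop versions inside the master container.
docker exec -it master bash -c 'java -version && mvn -version && hadoop version'
```

---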
## Installation 📦
1. Clone the repository:
```bash
git clone https://github.com/mcddhub/mcdd-big-data-study.git --depth=1 && cd mcdd-big-data-study
```
2. Build the Docker image:
```bash
cd docker
docker build -t caobaoqi1029/big-data-study:x.x.x .
```
> **Note**: Replace `x.x.x` with the appropriate version number.
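> To confirm the image was built (an extra check, not part of the original steps):
> ```bash
> docker image ls caobaoqi1029/big-data-study   # should list the tag you just built
> ```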
3. Start the containers:
```bash
docker compose up -d
```
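> **Note**: To check that the containers are up (a quick verification that is not in the original guide; `master` is the container name used in the configuration steps below):
> ```bash
> docker compose ps     # list services and their state
> docker logs master    # inspect the master container's startup output
> ```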
---
## Configuration 🛠
1. Connect to the remote server via **VS Code** and attach to a running container.
2. Install the **Java Dev** extension in VS Code.
3. Restart the extension host to apply changes.
4. Initialize Hadoop environment:
```bash
docker exec -it master bash
hdfs namenode -format
```
5. Start Hadoop services:
```bash
start-all.sh
```
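> **Note**: A quick way to verify the daemons started (which processes show up on `master` depends on how the compose file splits roles across containers):
> ```bash
> jps                     # JVM processes, e.g. NameNode / DataNode / ResourceManager
> hdfs dfsadmin -report   # summary of live DataNodes
> ```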
6. Create a sample input file and upload it to HDFS:
```bash
vim input.txt
hdfs dfs -put -f ./input.txt /
hdfs dfs -ls /
```
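> **Tip**: As a non-interactive alternative to `vim`, you can create a sample file inline (the contents below are only an example):
> ```bash
> cat > input.txt <<'EOF'
> hello big data
> hello hadoop
> EOF
> hdfs dfs -put -f ./input.txt /
> hdfs dfs -cat /input.txt   # confirm the upload
> ```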
7. Build and run the Hadoop job:
```bash
mvn clean package
cd target/
hadoop jar big-data.jar
```
> **Tip**: You can set the `CLASSPATH` environment variable to run the job classes with `java` directly:
> ```bash
> export CLASSPATH=$CLASSPATH:/tmp/
> # Add this to .bashrc for persistence.
> ```
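> For example, with the classpath exported you can compile a job class into `/tmp/` and run it with `java` directly (the class name `WordCount` and its arguments are only placeholders for whatever main class this project builds):
> ```bash
> export CLASSPATH=$CLASSPATH:/tmp/:$(hadoop classpath)   # compiled classes plus the Hadoop jars and config
> javac -d /tmp/ WordCount.java                           # compile the class into /tmp/
> java WordCount /input.txt /output                       # launch it against HDFS paths
> ```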
8. View the output:
```bash
hdfs dfs -ls /output
hdfs dfs -cat /output/part-r-00000
```
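> **Note**: To copy the results out of HDFS onto the container's local filesystem (an optional extra step):
> ```bash
> hdfs dfs -get /output ./output    # download the whole output directory
> cat ./output/part-r-00000
> ```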
---
## Contributing 🤝
We welcome contributions! Feel free to submit a pull request. For more details, see
the [Contribution Guide](https://github.com/mcddhub/mcdd-big-data-study/blob/main/CONTRIBUTING.md).

---
## License 📄
This project is licensed under the MIT License. See
the [LICENSE](https://github.com/mcddhub/mcdd-big-data-study/blob/main/LICENSE) file for details.

---
## Support 💖
If you find this project helpful, consider giving it a ⭐️ on [GitHub](https://github.com/mcddhub/mcdd-big-data-study)!
---
## Star History ⭐