Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/burcuyesilyurt/scrapeme_flask_app_kafka
Web Scraping, Kafka Messaging, and Flask API
- Host: GitHub
- URL: https://github.com/burcuyesilyurt/scrapeme_flask_app_kafka
- Owner: burcuyesilyurt
- Created: 2024-07-25T13:42:50.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-07-26T12:06:04.000Z (6 months ago)
- Last Synced: 2024-11-22T16:36:26.368Z (2 months ago)
- Topics: docker-compose, dockerfile, flask-application, kafka, kafka-consumer, kafka-producer, python, rest-api
- Language: Jupyter Notebook
- Homepage:
- Size: 50.8 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Description:
## Project Overview
1. **Web Scraping**: Scrapes data from the website [scrapeme.live](https://scrapeme.live/shop/) and extracts product details.
2. **Kafka Messaging**: Sends scraped data to a Kafka topic.
3. **Data Storage**: Saves Kafka messages to a JSON file.
4. **REST API**: Provides a Flask-based REST API to access the stored data.

## Files
- **Dockerfile**: Contains the instructions to build the Docker image for the Flask application.
- **.dockerignore**: Specifies files and directories to be ignored by Docker when building the image.
- **requirements.txt**: Lists the Python dependencies required for the application.
- **scrape_flask_app.py**: Contains the main application code, including web scraping, Kafka integration, and Flask API.
- **kafka_data.json**: Stores data retrieved from Kafka. This file is updated by the Flask application.
- **scrape_flask_app.ipynb**: Jupyter Notebook with additional code and notes related to the project.
- **docker-compose.yml**: Defines and manages the multi-container Docker application, including Kafka, Zookeeper, and the Flask app.

## Prerequisites
- Docker
- Docker Compose

## Setup and Running
### 1. Clone the Repository
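Using the repository URL from the metadata above:

```bash
# Clone the project and move into its directory
git clone https://github.com/burcuyesilyurt/scrapeme_flask_app_kafka.git
cd scrapeme_flask_app_kafka
```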
### 2. Build and Start the Containers
```bash
docker-compose up --build
```

This command will build the Docker images and start the services defined in `docker-compose.yml`, including Kafka, Zookeeper, and the Flask application.
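The compose file itself is in the repository; a minimal sketch of what such a three-service setup typically looks like is below. Image names, versions, ports, and environment values here are assumptions for illustration, not the project's actual configuration.

```yaml
version: "3.8"
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.4.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181

  kafka:
    image: confluentinc/cp-kafka:7.4.0
    depends_on:
      - zookeeper
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1

  flask-app:
    build: .
    ports:
      - "5001:5001"   # host:container port for the Flask API
    depends_on:
      - kafka
```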
### 3. Testing the Application
#### Web Scraping and Kafka
1. The `scrape_flask_app.py` script will scrape data from the website and send it to the Kafka topic.
2. Data will be stored in the `kafka_data.json` file.

#### Access the Flask API
- Navigate to `http://localhost:5001/data` in your web browser or use a tool like `curl` or `Postman` to view the data stored in `kafka_data.json`.
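A minimal sketch of what such an endpoint could look like — the project's actual code lives in `scrape_flask_app.py` and may be structured differently:

```python
# Sketch of a Flask endpoint serving the stored Kafka messages.
# Route, file name, and error handling are assumptions for illustration.
import json

from flask import Flask, jsonify

app = Flask(__name__)
DATA_FILE = "kafka_data.json"


@app.route("/data")
def get_data():
    """Return the messages consumed from Kafka as JSON."""
    try:
        with open(DATA_FILE) as f:
            return jsonify(json.load(f))
    except FileNotFoundError:
        # No messages have been consumed and saved yet
        return jsonify({"error": "no data yet"}), 404


def main():
    # 0.0.0.0 so the API is reachable from outside the container
    app.run(host="0.0.0.0", port=5001)
```

The `curl` command below should then return the contents of `kafka_data.json`.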
```bash
curl http://localhost:5001/data
```

## Troubleshooting
If you encounter an error related to port usage, make sure that port 5001 is not already in use by another program. If needed, change the port in `scrape_flask_app.py`, the `Dockerfile`, and `docker-compose.yml`; after making changes, rebuild and restart the Docker containers.
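One way to check whether something is already listening on the port (Linux/macOS, assuming `lsof` is available):

```bash
# List any process bound to port 5001; prints a message if the port is free
lsof -i :5001 || echo "port 5001 is free"
```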