https://github.com/codingforentrepreneurs/scrape-me
A Python-based Web Application to Practice Web Scraping Locally.
- Host: GitHub
- URL: https://github.com/codingforentrepreneurs/scrape-me
- Owner: codingforentrepreneurs
- Created: 2024-02-08T23:51:59.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-02-09T06:10:39.000Z (almost 2 years ago)
- Last Synced: 2025-07-04T22:46:36.622Z (6 months ago)
- Language: HTML
- Size: 344 KB
- Stars: 13
- Watchers: 3
- Forks: 0
- Open Issues: 0
# Scrape Me
The Scrape Me project exists so you can practice web scraping locally without overloading any external website.
## Run it with Docker
```bash
docker pull codingforentrepreneurs/scrape-me:latest
docker run -p 8101:8101 --env PORT=8101 codingforentrepreneurs/scrape-me:latest
```
## Run with Python 3.10+
### Installation
```bash
mkdir -p ~/practice
cd ~/practice
git clone https://github.com/codingforentrepreneurs/Scrape-Me
cd Scrape-Me
```
Create a virtual environment and activate it (macOS/Linux):
```bash
python3 -m venv venv
source venv/bin/activate
```
Create a virtual environment and activate it (Windows):
```
C:\Python312\python.exe -m venv venv
venv\Scripts\activate
```
Install the requirements:
```bash
pip install -r requirements.txt
```
> The `Waitress` package is used over Gunicorn so everyone can run the application easily (especially Windows users).
### Usage
```bash
python app.py
```
This runs a Python-based web server on the default port, 8101.

To use a different port, pass it as the first argument:
```bash
python app.py 8001
```
or set it via the `PORT` environment variable:
```bash
PORT=8002 python app.py
```
### HTML
Each template within the `html_templates` directory corresponds to a URL path. For example:
- `html_templates/index.html` corresponds to `http://localhost:8101/`
- `html_templates/soup.html` corresponds to `http://localhost:8101/soup/`
- `html_templates/timestamp/index.html` corresponds to `http://localhost:8101/timestamp/`
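The naming convention above can be expressed as a small pure function; `template_to_path` is a hypothetical name for illustration, not code from the repo:

```python
def template_to_path(template_name: str) -> str:
    """Sketch of the html_templates -> URL convention described above:
    strip the directory prefix and .html suffix, and treat index.html
    as the directory's own path."""
    rel = template_name.removeprefix("html_templates/").removesuffix(".html")
    if rel == "index":
        return "/"
    if rel.endswith("/index"):
        rel = rel[: -len("/index")]
    return f"/{rel}/"
```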
Assuming the Scrape Me server is running on `http://localhost:8101/`, you can run commands like:
```bash
curl http://localhost:8101/
curl http://localhost:8101/soup/
curl http://localhost:8101/timestamp/
```
If you installed the optional Python `requests` package, you can make requests from a Python shell:
```python
import requests
r = requests.get('http://localhost:8101/')
r.text
```
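Once you have the HTML, you can parse it. Here is a minimal sketch using only the standard library's `html.parser`; the sample markup is made up for illustration, not taken from the site:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href of every anchor tag encountered."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Sample markup standing in for r.text from the request above.
sample = '<a href="/soup/">Soup</a> <a href="/timestamp/">Timestamp</a>'
extractor = LinkExtractor()
extractor.feed(sample)
print(extractor.links)  # ['/soup/', '/timestamp/']
```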
### Single Page Django App
This repo also serves as a practical example of a single-page Django application: Django automatically serves the HTML documents (with full Django Template Engine support) and automatically creates a URL path for each one.
To learn more about building single-page Django applications, watch https://www.youtube.com/watch?v=F91BTQnxV6w
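A minimal sketch of what that auto-wiring could look like in a `urls.py`. This is hypothetical (the repo's actual code may differ) and assumes Django is installed and `html_templates` is registered as a template directory:

```python
# urls.py (hypothetical sketch, not the repo's actual code)
from pathlib import Path
from django.urls import path
from django.views.generic import TemplateView

TEMPLATE_DIR = Path(__file__).parent / "html_templates"

urlpatterns = []
for page in sorted(TEMPLATE_DIR.rglob("*.html")):
    rel = page.relative_to(TEMPLATE_DIR)
    # index.html maps to its directory's path; other files map to <name>/
    parts = [p for p in rel.with_suffix("").parts if p != "index"]
    route = "/".join(parts) + ("/" if parts else "")
    urlpatterns.append(
        path(route, TemplateView.as_view(template_name=str(rel)))
    )
```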
## Contributing
If you'd like to contribute, please fork the repository, create a feature branch, and submit a pull request.
The goal is to provide a local, dynamic web server so you can practice web scraping without getting blocked or causing problems for public websites.