https://github.com/meanlifevn/book_toscrape_py
Fetches the raw data for every book on books.toscrape.com and serves it through a simple API.
- Host: GitHub
- URL: https://github.com/meanlifevn/book_toscrape_py
- Owner: meanlifevn
- Created: 2025-10-27T05:39:04.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-10-27T06:01:15.000Z (8 months ago)
- Last Synced: 2025-11-05T04:04:19.950Z (8 months ago)
- Topics: api, booktoscrape, python, selenium
- Language: HTML
- Homepage:
- Size: 28.3 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.MD
README
# 📚 Books Web Scraping & REST API Project
## 🧩 Project Overview
1. **Part 1:** Scrape book data from https://books.toscrape.com
2. **Part 2:** Use the REST Countries API to assign random publisher countries
3. **Part 3:** Build a FastAPI REST API to view, add, and delete books
---
## ⚙️ Requirements
```powershell
pip install -r requirements.txt
bash setup.sh
```
---
## 🚀 Run Instructions
### 🕷️ 1. Scrape book data
```powershell
python p1_scrape_books.py
```
Output:
- `raw_books/` folder
- `html_backup/` folder
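To illustrate the kind of extraction this step performs, here is a minimal sketch that parses one listing entry with Python's standard-library `html.parser`. It is not the code in `p1_scrape_books.py` (the repository topics suggest that script uses Selenium). The sample markup, the `BookListingParser` class, and the `parse_listing` helper are assumptions made for illustration, and the field names are borrowed from the example JSON later in this README.

```python
from html.parser import HTMLParser

# Hand-written fragment imitating one listing entry (assumed markup, not fetched).
SAMPLE_HTML = """
<article class="product_pod">
  <p class="star-rating Three"></p>
  <h3><a href="catalogue/a-light-in-the-attic_1000/index.html"
         title="A Light in the Attic">A Light in the ...</a></h3>
  <p class="price_color">£51.77</p>
</article>
"""

# The rating is encoded as a word in the CSS class list.
RATINGS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


class BookListingParser(HTMLParser):
    """Collect title, link, price and rating for each product_pod article."""

    def __init__(self):
        super().__init__()
        self.books = []
        self._current = None    # record being filled, or None outside an article
        self._in_price = False  # True while inside <p class="price_color">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "article" and "product_pod" in classes:
            self._current = {}
        elif self._current is None:
            return
        elif tag == "p" and "star-rating" in classes:
            word = next((c for c in classes if c in RATINGS), None)
            self._current["rating"] = RATINGS.get(word)
        elif tag == "a" and "title" in attrs:
            self._current["book_title"] = attrs["title"]
            self._current["link"] = attrs.get("href")
        elif tag == "p" and "price_color" in classes:
            self._in_price = True

    def handle_data(self, data):
        if self._in_price and self._current is not None:
            self._current["price"] = data.strip()

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_price = False
        elif tag == "article" and self._current is not None:
            self.books.append(self._current)
            self._current = None


def parse_listing(html):
    """Return a list of book dictionaries found in the given HTML string."""
    parser = BookListingParser()
    parser.feed(html)
    return parser.books


print(parse_listing(SAMPLE_HTML))
```

The same function could be fed the page source saved under `html_backup/`, provided the real markup matches the assumed structure.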
### 🌍 2. Add random countries
```powershell
python p2_add_country_data.py
```
Output:
- `raw_books/books_with_country.csv`
- `raw_books/books_with_country.json`
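The idea behind this step can be sketched as follows, assuming the country names come from the REST Countries `v3.1/all` endpoint and each book is a plain dictionary. The helper names `fetch_country_names` and `add_random_countries` are hypothetical and may not match what `p2_add_country_data.py` does internally.

```python
import json
import random
import urllib.request

# REST Countries v3.1 endpoint; the `fields` filter keeps the payload small.
COUNTRIES_URL = "https://restcountries.com/v3.1/all?fields=name"


def fetch_country_names(url=COUNTRIES_URL):
    """Download the country list and return the common names (needs network access)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        payload = json.load(resp)
    return sorted(item["name"]["common"] for item in payload)


def add_random_countries(books, country_names, seed=None):
    """Return copies of the book records with a random `publisher_country` added."""
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    return [{**book, "publisher_country": rng.choice(country_names)} for book in books]


# Offline demonstration: a hard-coded list stands in for fetch_country_names().
books = [{"book_title": "Test Book", "price": "£12.99"}]
enriched = add_random_countries(books, ["Japan", "Slovakia"], seed=0)
print(enriched[0]["publisher_country"])
```

Returning copies instead of mutating the input keeps the scraped records unchanged, which makes it easy to re-run the assignment with a different seed.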
### 🧠 3. Start REST API
```powershell
uvicorn p3_books_api:app --reload
```
Then open your browser at:
- Docs UI: [http://127.0.0.1:8000/docs](http://127.0.0.1:8000/docs)
- All books: [http://127.0.0.1:8000/books](http://127.0.0.1:8000/books)
### 💾 Example Requests
#### Filter by country:
```powershell
curl.exe -X GET http://127.0.0.1:8000/books?country=Slovakia
```
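The same filter can be issued from Python. The sketch below uses only the standard library and assumes the API is running locally on port 8000 and returns JSON; `books_url` and `get_books` are illustrative helper names, not part of the project.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # assumed local uvicorn address


def books_url(country=None, base_url=BASE_URL):
    """Build the /books URL, adding a URL-encoded `country` query if given."""
    if country is None:
        return f"{base_url}/books"
    return f"{base_url}/books?" + urllib.parse.urlencode({"country": country})


def get_books(country=None):
    """Fetch the (optionally filtered) book list; requires the API to be running."""
    with urllib.request.urlopen(books_url(country)) as resp:
        return json.load(resp)


print(books_url("Slovakia"))
```

Using `urlencode` matters for country names that contain spaces or other special characters, which would otherwise need manual escaping on the command line.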
#### Add a new book:
Create a file `book_add.json`:
```json
{
  "book_title": "Test Book",
  "price": "£12.99",
  "availability": 16,
  "link": "https://example.com",
  "rating": 5,
  "publisher_country": "Japan"
}
```
```powershell
curl.exe -X POST "http://127.0.0.1:8000/books" `
-H "Content-Type: application/json" `
-d "@book_add.json"
```
#### Delete a book:
```powershell
curl.exe -X DELETE 'http://127.0.0.1:8000/books/Test%20Book' `
-H 'accept: application/json'
```
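For scripted use, the add and delete requests can also be built in Python. This is a minimal standard-library sketch that assumes the same local address and endpoints as the `curl.exe` commands above. The helper names are hypothetical, and `send` only works while the API is running.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # assumed local uvicorn address


def add_book_request(book, base_url=BASE_URL):
    """Build a POST /books request carrying the book as a JSON body."""
    body = json.dumps(book).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/books",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def delete_book_request(title, base_url=BASE_URL):
    """Build a DELETE /books/{title} request with the title percent-encoded."""
    path = urllib.parse.quote(title, safe="")  # "Test Book" becomes "Test%20Book"
    return urllib.request.Request(
        f"{base_url}/books/{path}",
        headers={"accept": "application/json"},
        method="DELETE",
    )


def send(request):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


print(delete_book_request("Test Book").full_url)
```

Percent-encoding the title with `quote` avoids escaping spaces by hand, as was done with `Test%20Book` in the command above.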
## 📂 Output Files
- `raw_books/Self Help_3_books.csv`
- `raw_books/books_with_country.csv`
- `raw_books/books_with_country.json`
- `html_backup/` (raw HTML for each book)
## ⚠️ Notes
- Run **Part 1** before **Part 2**, and **Part 2** before starting the API (**Part 3**).
- All commands were tested in **PowerShell inside PyCharm** on Windows.
- Run the `curl.exe` commands in PowerShell so that the backtick line continuations and the quoting of special characters work as written.