https://github.com/bdr-pro/examtopics-scraper
ExamTopics Scraper & PDF Generator
Last synced: 5 months ago
- Host: GitHub
- URL: https://github.com/bdr-pro/examtopics-scraper
- Owner: BDR-Pro
- License: apache-2.0
- Created: 2025-05-21T08:51:27.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-05-21T10:52:02.000Z (about 1 year ago)
- Last Synced: 2025-07-02T12:49:49.434Z (about 1 year ago)
- Language: Python
- Size: 9.77 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# 🖼️ ExamTopics Scraper & PDF Generator
This script automates the process of:
1. Navigating through the [ExamTopics ITILFND v4](https://www.examtopics.com/exams/itil/itilfnd-v4/view/) exam page.
2. Clicking the **Reveal** button on each question.
3. Capturing full-page screenshots.
4. Moving to the **Next** page until all questions are captured.
5. Saving all screenshots to a folder.
6. Converting the screenshots into a single PDF file.
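
The steps above can be sketched as a single loop. This is a minimal sketch, not the contents of `main.py`: the selectors `REVEAL_SELECTOR` and `NEXT_SELECTOR`, the function name `capture_all_pages`, the `page_N.png` naming scheme, and the `max_pages` safety cap are all assumptions made for illustration. It expects a Playwright sync `Page` object and assumes Reveal buttons stay in the DOM after being clicked.

```python
from pathlib import Path

# Hypothetical selectors: the real ones depend on the site's markup.
REVEAL_SELECTOR = "a.reveal-solution"
NEXT_SELECTOR = "a:has-text('Next')"


def capture_all_pages(page, out_dir="examtopics_screenshots", max_pages=500):
    """Reveal answers, take a screenshot, and follow "Next" until it is gone.

    `page` is expected to behave like a Playwright sync `Page`.
    Returns the list of screenshot paths that were written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # folder is created on demand
    saved = []

    for number in range(1, max_pages + 1):
        # Step 2: click every Reveal button on the current page.
        reveal = page.locator(REVEAL_SELECTOR)
        for i in range(reveal.count()):
            reveal.nth(i).click()

        # Steps 3 and 5: full-page screenshot saved into the output folder.
        path = out / f"page_{number}.png"
        page.screenshot(path=str(path), full_page=True)
        saved.append(path)

        # Step 4: stop once there is no Next button left.
        next_button = page.locator(NEXT_SELECTOR)
        if next_button.count() == 0:
            break
        next_button.first.click()
        page.wait_for_load_state("load")

    return saved
```

Passing the `page` in as an argument keeps browser setup separate from the capture loop, which also makes the loop easy to exercise with a stub object.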
---
## 📦 Dependencies
* Python 3.7+
* [Playwright](https://playwright.dev/python/)
* [Pillow (PIL)](https://pypi.org/project/Pillow/)
### Install Requirements
```bash
pip install playwright pillow
playwright install chromium
```
---
## 🚀 Usage
```bash
python main.py
```
> Optional: Modify `skip_pages` inside `main.py` to start from a later page (e.g., page 3).
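
One way a `skip_pages` setting could work is to click **Next** that many times before capturing anything. The sketch below assumes exactly that; the helper name `skip_ahead` and the `next_selector` default are hypothetical, and the actual meaning of `skip_pages` in `main.py` may differ (for example, it may be a page number and not a click count).

```python
def skip_ahead(page, skip_pages, next_selector="a:has-text('Next')"):
    """Click "Next" up to `skip_pages` times and return the clicks made.

    `page` is expected to behave like a Playwright sync `Page`.
    """
    clicks = 0
    for _ in range(skip_pages):
        next_button = page.locator(next_selector)
        if next_button.count() == 0:
            break  # ran out of pages before reaching the requested start
        next_button.first.click()
        page.wait_for_load_state("load")
        clicks += 1
    return clicks
```

Under this interpretation, `skip_ahead(page, 2)` would leave the browser on page 3.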
---
## 🧠 Features
* Automatically clicks all "Reveal" buttons before screenshotting.
* Navigates through paginated content using the "Next" button.
* Handles potential CAPTCHAs by pausing for manual intervention.
* Adds a randomized delay between actions to reduce the chance of bot detection.
* Outputs all screenshots in the `examtopics_screenshots` folder.
* Converts images to `output.pdf` using `turn_it_into_pdf.py`.
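
The randomized delay listed above can be illustrated with a small helper. This is a sketch under assumptions: the name `human_pause` and the 1 to 3 second range are illustrative, since the README does not state the actual bounds.

```python
import random
import time


def human_pause(min_seconds=1.0, max_seconds=3.0, sleep=time.sleep):
    """Sleep for a random duration in [min_seconds, max_seconds].

    The `sleep` parameter is injectable so the helper can be tested
    without actually waiting. Returns the delay that was chosen.
    """
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay
```

Calling `human_pause()` between page actions spaces requests out unevenly, so the timing looks less mechanical than a fixed interval.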
---
## 🛠️ File Structure
* `main.py` — The core automation script using Playwright.
* `turn_it_into_pdf.py` — Helper module to convert a folder of images into a single PDF.
* `examtopics_screenshots/` — Folder where screenshots are saved (auto-created).
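
A helper like `turn_it_into_pdf.py` could be written with Pillow roughly as follows. This is a sketch and not the module's actual code: the function names `sorted_screenshots` and `images_to_pdf` are hypothetical, and it assumes screenshots are PNG files whose names contain a page number.

```python
import re
from pathlib import Path


def sorted_screenshots(folder):
    """Return PNG paths ordered by the first number in each filename."""
    def page_number(path):
        match = re.search(r"\d+", path.stem)
        return int(match.group()) if match else 0

    # Numeric ordering keeps page_10.png after page_2.png.
    return sorted(Path(folder).glob("*.png"), key=page_number)


def images_to_pdf(folder, pdf_path="output.pdf"):
    """Combine every PNG in `folder` into one multi-page PDF."""
    # Imported lazily so the sorting helper works without Pillow installed.
    from PIL import Image

    paths = sorted_screenshots(folder)
    if not paths:
        raise ValueError(f"no PNG files found in {folder}")

    # Convert to RGB because screenshots may carry an alpha channel
    # that the PDF writer may not accept.
    pages = [Image.open(p).convert("RGB") for p in paths]
    pages[0].save(pdf_path, save_all=True, append_images=pages[1:])
    return pdf_path
```

Sorting by the embedded number matters because a plain alphabetical sort would place page 10 before page 2 and scramble the PDF.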
---
## 🔐 Captcha Handling
If a CAPTCHA appears, the script will wait **30 seconds** for manual resolution. You’ll see:
```text
Captcha detected, please solve it manually.
```
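
The pause could be implemented along these lines. The detection rule here (looking for the word "captcha" in the page HTML) is an assumption for illustration, as is the helper name `wait_if_captcha`; the README does not say how `main.py` detects a challenge.

```python
CAPTCHA_WAIT_MS = 30_000  # the 30-second pause described above


def wait_if_captcha(page, wait_ms=CAPTCHA_WAIT_MS):
    """Pause for manual solving when the page looks like a CAPTCHA challenge.

    `page` is expected to behave like a Playwright sync `Page`.
    Returns True if a pause happened, False otherwise.
    """
    # Assumption: a challenge page mentions "captcha" somewhere in its HTML.
    if "captcha" not in page.content().lower():
        return False

    print("Captcha detected, please solve it manually.")
    page.wait_for_timeout(wait_ms)
    return True
```

Because someone has to solve the challenge in the browser window, this approach only makes sense when the browser runs in headed mode.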
---
## 📝 Notes
* The script uses **Chrome** with `--disable-blink-features=AutomationControlled` to reduce bot detection.
* Use the script responsibly and ethically, and check the source site's terms of service before running it.
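
The launch flag from the notes can be passed to Playwright through the `args` option of `launch`. The sketch below is illustrative: `chrome_launch_options` and `open_exam_page` are hypothetical names, `channel="chrome"` assumes a locally installed Chrome (the install step above only downloads Chromium, so the real script may omit it), and `headless=False` is an assumption made so that a CAPTCHA can be solved by hand.

```python
def chrome_launch_options(headless=False):
    """Build keyword arguments for Playwright's `chromium.launch`."""
    return {
        "channel": "chrome",  # assumption: use the installed Chrome build
        "headless": headless,
        "args": ["--disable-blink-features=AutomationControlled"],
    }


def open_exam_page(url):
    """Open `url` in a browser with the options above and return its title."""
    # Imported lazily so the options helper works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(**chrome_launch_options())
        page = browser.new_page()
        page.goto(url)
        title = page.title()
        browser.close()
    return title
```

Keeping the options in their own function makes it easy to drop `channel` and fall back to the bundled Chromium.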