https://github.com/systemslibrarian/pressreader-news-scraper
A lightweight, extensible Python notebook that demonstrates how to fetch and store topic-based articles from the PressReader API using SQLite. Ideal for research, journalism, or data analysis workflows.
- Host: GitHub
- URL: https://github.com/systemslibrarian/pressreader-news-scraper
- Owner: systemslibrarian
- License: mit
- Created: 2025-07-28T21:45:19.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2025-08-11T01:37:48.000Z (10 months ago)
- Last Synced: 2025-08-11T03:20:04.884Z (10 months ago)
- Topics: python-news-scraper-sqlite-pressreader
- Language: Jupyter Notebook
- Homepage:
- Size: 85 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# PressReader API to SQLite
**Demo Project: Coffee Article Collector**
> **Note:** You must set your own API key before running this notebook.
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/systemslibrarian/pressreader-news-scraper/blob/main/pressreader_api_to_sqlite.ipynb)
This notebook demonstrates how to query the [PressReader API](https://www.pressreader.com/) and store results in a **SQLite** database.
It uses **coffee** as a sample keyword but can be adapted for any topic or term.
It's ideal for **researchers, librarians, and hobbyists** looking to explore topic-based news and publications.
---
## Features
- Search the PressReader API for any keyword (default: `coffee`)
- Save article metadata (title, description, source, date, URL) to SQLite
- Prevent duplicate entries using article ID as the primary key
- Output results in easy-to-read Markdown format
- Secure API key handling via environment variable
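
The duplicate-prevention feature can be sketched with SQLite's `INSERT OR IGNORE`. This is a minimal illustration rather than the notebook's actual code: the table name `articles`, the column names, and the `save_articles` helper are assumptions chosen to match the metadata fields listed above.

```python
import sqlite3

# Hypothetical schema: column names mirror the metadata fields listed above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    id          TEXT PRIMARY KEY,  -- article ID; the primary key blocks duplicates
    title       TEXT,
    description TEXT,
    source      TEXT,
    date        TEXT,
    url         TEXT
)
"""

def save_articles(conn, articles):
    """Insert article dicts, skipping IDs already stored. Returns rows added."""
    conn.execute(SCHEMA)
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO articles (id, title, description, source, date, url) "
        "VALUES (:id, :title, :description, :source, :date, :url)",
        articles,
    )
    conn.commit()
    # Ignored rows do not count as changes, so the difference is the new rows.
    return conn.total_changes - before
```

Running the same search twice then leaves the table unchanged on the second pass, because rows whose `id` already exists are silently skipped.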
---
## Requirements
Install required dependencies:
```bash
pip install requests pandas python-dotenv
```
> `sqlite3` is built into Python, so no extra install is needed.
---
## API Key Setup
This notebook uses the `PRESSREADER_API_KEY` environment variable.
**In Google Colab**:
```python
import os
os.environ['PRESSREADER_API_KEY'] = 'your-api-key-here'
```
**Or using Colab Secrets**:
```python
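import os
from google.colab import userdata
# Read the secret stored in the Colab Secrets panel and expose it to the notebook
os.environ['PRESSREADER_API_KEY'] = userdata.get('PRESSREADER_API_KEY')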
```
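
Whichever setup path is used, the notebook code only needs to read the variable back. A small sketch of that lookup, which fails early with a readable message when the key is missing, is shown here; the helper name `get_api_key` is hypothetical and not taken from the notebook.

```python
import os

def get_api_key(env_var="PRESSREADER_API_KEY"):
    """Return the API key from the environment, or raise a clear error."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Hypothetical error message; the notebook may report this differently.
        raise RuntimeError(f"{env_var} is not set; see the API Key Setup section.")
    return key
```

Failing at this point is usually easier to diagnose than an authentication error returned later by the API.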
---
## How to Run
1. **Open in Colab**: Click the badge at the top of this README.
2. **Install dependencies**: Run the first cell.
3. **Set your API key**: Add it via environment variable or Colab secrets.
4. **Run all cells**: Fetch results, save to SQLite, and display Markdown summaries.
---
## Example Output
```markdown
## Coffee and Culture
**Publication**: Global Coffee Times
**Date**: 2025-07-28
**Description**: Exploring how coffee influences social rituals across continents.
**URL**: [Read More](https://www.pressreader.com/article/12345678)
## Sustainable Coffee Farming
**Publication**: Eco Agri News
**Date**: 2025-07-25
**Description**: A closer look at regenerative practices in coffee production.
**URL**: [Read More](https://www.pressreader.com/article/98765432)
```
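
Output in this shape could be produced by a small formatting helper. The sketch below is one possible version, not the notebook's own function: `article_to_markdown` and the dictionary keys (`title`, `source`, `date`, `description`, `url`) are assumed names, and any heading decoration is omitted.

```python
def article_to_markdown(article):
    """Render one article dict as a Markdown summary block."""
    return "\n".join([
        f"## {article['title']}",
        f"**Publication**: {article['source']}",
        f"**Date**: {article['date']}",
        f"**Description**: {article['description']}",
        f"**URL**: [Read More]({article['url']})",
    ])
```

In a notebook, the returned string can be passed to `IPython.display.Markdown` so that it renders as formatted text instead of raw markup.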
---
## Project Files
- `pressreader_api_to_sqlite.ipynb`: Main Colab notebook
- `README.md`: Project documentation
- `pressreader_coffee_results.db`: Created at runtime (not stored in repo)
---
## How It Works
1. Sends a POST request to the PressReader Discovery API.
2. Parses JSON results into structured article metadata.
3. Saves each entry in a SQLite DB, skipping duplicates.
4. Displays formatted Markdown summaries of the most recent articles.
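
Step 2 might look roughly like the following. The response shape used here (an `items` list whose entries carry `id`, `title`, `description`, `source`, `date`, and `url`) is purely an assumption for illustration; the real field names should be checked against the PressReader API documentation.

```python
def parse_articles(payload):
    """Flatten a decoded JSON response into a list of article dicts.

    ASSUMPTION: the "items" key and every field name below are hypothetical
    placeholders, not the documented PressReader response format.
    """
    articles = []
    for item in payload.get("items", []):
        article_id = item.get("id")
        if article_id is None:
            continue  # cannot deduplicate without an ID, so skip the entry
        articles.append({
            "id": str(article_id),
            "title": item.get("title", ""),
            "description": item.get("description", ""),
            "source": item.get("source", ""),
            "date": item.get("date", ""),
            "url": item.get("url", ""),
        })
    return articles
```

Normalizing every entry to the same keys keeps the later storage and display steps independent of whatever optional fields a given response happens to include.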
---
## Troubleshooting
- **Invalid API Key**: Double-check the `PRESSREADER_API_KEY` value.
- **Empty Results**: Adjust your search keyword or date range.
- **Rate Limits**: Wait and retry after a few minutes.
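
The wait-and-retry advice for rate limits can be automated with exponential backoff. The sketch below is generic and makes an assumption about how failures surface: it treats any exception from the wrapped call (for example, one raised by `requests.Response.raise_for_status()`) as a reason to retry. The helper name `retry_with_backoff` and its default delays are hypothetical, and real rate-limit windows may call for much longer waits.

```python
import time

def retry_with_backoff(func, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(); on exception wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter exists only so the waiting behavior can be swapped out, for instance in a test; normal use leaves it at the default.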
---
## Disclaimer
- For educational/research purposes only.
- Respect PressReader's terms of service and API limits.
---
## License
[MIT License](LICENSE)
---
**Created by [Paul Clark](https://github.com/systemslibrarian)**
Empowering Libraries Through Data and Innovation