https://github.com/victoriacheng15/personal-reading-analytics
Zero-infrastructure reading analytics pipeline. Automated data pipeline via GitHub Actions with MongoDB event sourcing for observability and interactive visualizations.
https://github.com/victoriacheng15/personal-reading-analytics
docker github-actions-ci go google-sheets-api python
Last synced: 5 months ago
JSON representation
Zero-infrastructure reading analytics pipeline. Automated data pipeline via GitHub Actions with MongoDB event sourcing for observability and interactive visualizations.
- Host: GitHub
- URL: https://github.com/victoriacheng15/personal-reading-analytics
- Owner: victoriacheng15
- License: mit
- Created: 2024-02-04T20:28:44.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2026-01-23T20:52:50.000Z (5 months ago)
- Last Synced: 2026-01-24T09:43:51.893Z (5 months ago)
- Topics: docker, github-actions-ci, go, google-sheets-api, python
- Language: Go
- Homepage: https://victoriacheng15.github.io/personal-reading-analytics/
- Size: 580 KB
- Stars: 2
- Watchers: 1
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
- Agents: AGENTS.md
Awesome Lists containing this project
README
# π Personal Reading Analytics
A self-built, fully automated data pipeline with CI/CD governance, leveraging GitHub Actions and MongoDB event sourcing to transform raw reading data into actionable insights and interactive visualizations with zero infrastructure.
Beyond standard charts, it performs an **AI Delta Analysis** to generate a qualitative weekly narrative, answering three specific questions:
- **Velocity**: Are you reading faster or slower than usual?
- **Backlog Health**: Are you clearing old debt (>1 year) or just adding new noise?
- **Chronology**: Which specific years of content are you focusing on right now?
---
## π Live Anayltics
π [See Live Anayltics](https://victoriacheng15.github.io/personal-reading-analytics/)
---
## πΏ Engineering Principles
This project is built to reflect how I believe small, personal tools should work:
- **Zero infrastructure** β No servers or hosting costs. Runs entirely on GitHub (Actions + Pages).
- **Fully automated** β Scheduled GitHub Actions keep data fresh, utilizing CI/CD governance for human-in-the-loop code review and merging.
- **Observability first** β Uses an Event Sourcing pattern (MongoDB) to decouple extraction from analytics, ensuring full auditability and health monitoring.
- **Cost-effective** β Uses only free tiers (GitHub, Google Sheets API, MongoDB Atlas), proving powerful automation doesnβt require budget.
---
## π Architecture & Documentation
Unlike typical "script-based" scrapers, this system is architected for scale and maintenance using an **Event Sourcing** pattern.
- **Ingestion**: Python scripts harvest content and emit standardized events to a MongoDB immutable log.
- **Observability**: An external hub consumes these events to populate Grafana dashboards for system health monitoring.
- **Visualization**: Go binaries process the event stream, generate AI-powered summaries for historical context, and produce the static site.
### π System Observability
To demonstrate operational maturity, I maintain a public **[Observability Hub](https://victoriacheng15.github.io/observability-hub/snapsots.html)**.
This separate dashboard visualizes the "health" of this pipeline (ETL status, error rates, latencies) without requiring Grafana authentication.
For deep technical details, architectural diagrams, and operational guides, please visit the **[Documentation](docs/README.md)**.
---
## π Tech Stacks







---
## π What It Shows
**Key Metrics Section:**
- **Total articles**: Tracking total articles across currently supported sources
- **Read rate**: Percentage of articles completed with visual highlighting
- **AI Delta Analysis**: Multi-dimensional analysis of reading **Velocity** (pace), **Backlog Health** (clearing old debt vs. new noise), and **Chronology** (era of content focus) to provide narrative context beyond raw numbers.
- **Reading statistics**: Read count, unread count, and average articles per month
- **Highlight badges**: Top read rate source, most unread source, current month's read articles
**7 Interactive Visualizations (Chart.js):**
1. **Year Breakdown**: Bar chart showing article distribution by publication year
2. **Read/Unread by Year**: Stacked bar chart with reading progress across years
3. **Monthly Breakdown**: Toggle between total articles (line chart) and by-source distribution (stacked bar)
4. **Read/Unread by Month**: Seasonal reading patterns across all months
5. **Read/Unread by Source**: Horizontal stacked bars comparing progress per provider
6. **Unread Age Distribution**: Age buckets (<1 month, 1-3 months, 3-6 months, 6-12 months, >1 year)
7. **Unread by Year**: Identifies which years have the most unread backlog
**Source Analytics:**
- Per-source statistics with read/unread split and read percentages
- Substack per-author average calculation (total articles Γ· author count)
- Top 3 oldest unread articles with clickable links, dates, and age calculations
- Source metadata showing when each provider was added to tracking
### Supported Sources
Currently extracting articles from:
- freeCodeCamp
- Substack
- GitHub (Added 2024-03-18)
- Shopify (Added 2025-03-05)
- Stripe (Added 2025-11-19)
---
## π How This Project Evolved
Learn about the journey of this project: from local-only execution, to Docker containerization, to automated GitHub Actions workflows.
- [Part 1: From Pi to Cloud Automation](https://victoriacheng15.vercel.app/blog/from-pi-to-cloud-automation)
- [Part 2: From Links to Reading insights](https://victoriacheng15.vercel.app/blog/from-links-to-reading-insights)
- [Part 3: From Metrics to Milestones](https://mehub-git-fix-rss-escaping-victoriacheng15s-projects.vercel.app/blog/from-metrics-to-milestones)
---
## π Ready to Explore?
Don't just take my word for it, interact with the real data.
π **[Launch Personal Reading Analytics](https://victoriacheng15.github.io/personal-reading-analytics/)**