{"id":50792766,"url":"https://github.com/ical10/recall-ai","last_synced_at":"2026-06-12T12:02:24.380Z","repository":{"id":360103025,"uuid":"1231521712","full_name":"ical10/recall-ai","owner":"ical10","description":"A spaced-repetition vocabulary trainer with LLM-generated vocabs for young ESL learners","archived":false,"fork":false,"pushed_at":"2026-05-25T01:33:04.000Z","size":444,"stargazers_count":0,"open_issues_count":19,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-25T03:23:19.521Z","etag":null,"topics":["alembic","celery-beat","celery-worker","fastapi","postgresql","pydantic","python","redis","uvicorn"],"latest_commit_sha":null,"homepage":"https://recallai.up.railway.app","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ical10.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-07T03:23:51.000Z","updated_at":"2026-05-19T06:40:54.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/ical10/recall-ai","commit_stats":null,"previous_names":["ical10/recall-ai"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/ical10/recall-ai","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ical10%2Frecall-ai","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ical10%2Frecall-ai/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories
/ical10%2Frecall-ai/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ical10%2Frecall-ai/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ical10","download_url":"https://codeload.github.com/ical10/recall-ai/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ical10%2Frecall-ai/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34243053,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-12T02:00:06.859Z","response_time":109,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alembic","celery-beat","celery-worker","fastapi","postgresql","pydantic","python","redis","uvicorn"],"created_at":"2026-06-12T12:01:36.983Z","updated_at":"2026-06-12T12:02:19.463Z","avatar_url":"https://github.com/ical10.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# RecallAI\n\nA spaced-repetition vocabulary trainer with nightly LLM-generated content, built for English-as-a-Second-Language learners.\n\n\u003e Five minutes a day. The words stick — because you caught them right before you forgot.\n\nRecallAI takes the forgetting curve seriously. 
Instead of shipping a static deck and hoping users grind through it, the app **generates a fresh, personalised batch of vocabulary every night** (kid-safe sentences, age-appropriate definitions, topics that match the learner's interests) and schedules every word using the classic **SM-2** algorithm, a variant of which Anki has shipped as its default scheduler for roughly two decades. Easy words drift weeks into the future; hard words come back tomorrow.\n\n---\n\n## Why this exists\n\nTwo things have to be true for vocabulary to actually stick:\n\n1. **The content has to be worth learning** — relevant, age-appropriate, and tied to topics the learner cares about.\n2. **The review timing has to match the forgetting curve** — words seen too often waste attention; words seen too rarely fade.\n\nGeneric deck apps nail the timing but leave curation to the learner. Most kids (and most parents) don't curate. RecallAI takes both jobs off the learner's plate: **an LLM generates a personalised batch of cards every night, and SM-2 decides exactly when each card resurfaces.** The learner just shows up.\n\n---\n\n## Architecture\n\n```\n┌──────────┐                                        ┌──────────────┐\n│   User   │                                        │ LLM Provider │\n│ (browser)│                                        │  (external)  │\n└────┬─────┘                                        └──────▲───────┘\n     │ HTTPS                                               │\n     │ OAuth + review UI                                   │ POST /chat\n     ▼                                                     │ completions\n┌─────────────────┐   ┌──────────────────┐   ┌─────────────┴──────┐\n│   Web service   │   │   Beat service   │   │   Worker service   │\n│  (recall-ai)    │   │     replicas=1   │   │                    │\n│  FastAPI+uvicorn│   │  Celery scheduler│   │   Celery worker    │\n│  Tailwind+htmx  │   │ 18:00 + 19:00 UTC│   │  consumes tasks    │\n└────┬────────────┘   └────────┬─────────┘   └─────┬─────────▲────┘\n 
    │                         │ enqueues          │ persists│ pulls\n     │ session +               │ scheduled tasks   │ vocab + │ tasks\n     │ user CRUD               ▼                   │ reviews │\n     │           ┌───────────────────────────┐     │         │\n     │           │       Redis (addon)       │◄────┘         │\n     │           │   broker + result backend │───────────────┘\n     │           └───────────────────────────┘\n     │                         ▲\n     │ alembic migrations      │ (worker does NOT pub here;\n     │ (preDeployCommand)      │  it only consumes)\n     ▼                         │\n┌─────────────────────────────────────────────────┐\n│              Postgres (addon)                   │\n│  users · vocab_items · reviews · interest_tags  │\n└─────────────────────────────────────────────────┘\n                       ▲\n                       │ persists vocab generation results\n                       └─── (from Worker)\n```\n\n### How the three services collaborate\n\n| Service    | Process              | Responsibility                                          | Talks to                 |\n| ---------- | -------------------- | ------------------------------------------------------- | ------------------------ |\n| **web**    | `uvicorn` (async)    | OAuth, review UI, dashboard, settings, HTMX partials    | Postgres                 |\n| **beat**   | Celery beat (cron)   | Enqueues nightly content-generation tasks (UTC 18 \u0026 19) | Redis                    |\n| **worker** | Celery worker (sync) | Pulls tasks, calls LLM, validates, persists vocab       | Redis, Postgres, LLM API |\n\nKey design choices baked into the topology:\n\n- **Web never calls the LLM.** Request-path latency stays bounded; LLM hiccups can't 500 the dashboard.\n- **Beat has `replicas=1`** in production. 
Duplicate beats = duplicate enqueues = duplicate API spend.\n- **Worker is the only writer of generated content.** All LLM output flows through a Pydantic v2 validator before it touches Postgres.\n- **Alembic runs as a Railway `preDeployCommand` on the web service only.** Worker and beat reuse the migrated schema; they never race the migrator.\n\n---\n\n## The learning loop\n\n```\n   18:00 UTC                                           Next morning\n       │                                                    │\n       ▼                                                    ▼\n┌──────────────┐    enqueue     ┌──────────┐  pull  ┌────────────┐\n│  Celery beat │───────────────►│  Redis   │───────►│   Worker   │\n└──────────────┘                └──────────┘        └─────┬──────┘\n                                                          │\n                                       LLM call (timeout, ▼ retry, log tokens)\n                                          ┌────────────────────────┐\n                                          │      LLM Provider      │\n                                          └─────────┬──────────────┘\n                                                    │ raw JSON\n                                                    ▼\n                                          ┌────────────────────────┐\n                                          │  Pydantic v2 validator │\n                                          │  (shape + semantic +   │\n                                          │   length + safety)     │\n                                          └─────────┬──────────────┘\n                                                    │ pass / refine / fallback\n                                                    ▼\n                                          ┌────────────────────────┐\n                                          │ Postgres: vocab_items  │\n                                          │            + reviews   │\n                                          
└─────────┬──────────────┘\n                                                    │\n                          User opens /review        ▼\n                                          ┌────────────────────────┐\n                                          │   Show card → rate     │\n                                          │   SM-2 picks next date │\n                                          └────────────────────────┘\n```\n\nEvery LLM call is wrapped in:\n\n- A **hard timeout** (a hanging upstream fails fast instead of stalling the worker).\n- A **structured cost log** (tokens in, tokens out, model, latency).\n- A **retry-with-prompt-refinement loop**: on validation failure, refine the prompt with the failed constraint and retry up to 3 times before falling back to a curated default.\n- **Idempotency markers** so a retried Celery task can't trigger (and pay for) the same generation twice.\n\n---\n\n## Tech stack\n\n| Layer         | Choice                                             | Rationale                                                   |\n| ------------- | -------------------------------------------------- | ----------------------------------------------------------- |\n| Language      | Python 3.11                                        | Mature async, modern typing                                 |\n| Web framework | FastAPI                                            | Async-native, Pydantic-integrated                           |\n| Frontend      | Jinja2 + HTMX + Tailwind                           | Server-rendered, no SPA complexity for a server-driven UI   |\n| ORM           | SQLAlchemy 2.0 (async) with typed `Mapped[]`       | Write-time typing catches model bugs                        |\n| Validation    | Pydantic v2 (strict)                               | The LLM-output safety boundary                              |\n| Database      | Postgres 16                                        | Standard; Railway addon in prod, Docker in dev              |\n| Queue / cache | Redis 7                       
                     | Celery broker + result backend                              |\n| Async work    | Celery 5 (worker + beat)                           | Cron scheduling for nightly content                         |\n| LLM access    | OpenAI Python SDK against an OpenAI-compatible API | Swap providers via three env vars; no code change required  |\n| Auth          | Google OAuth                                       | No password storage, no liability                           |\n| Spacing       | SM-2                                               | Simple, well-understood, sufficient for the data scale      |\n| Migrations    | Alembic                                            | One migration per logical change, never edited post-deploy  |\n| Lint / types  | Ruff + mypy strict                                 | Both must pass before commit                                |\n| Tests         | pytest                                             | Happy-path + validation-failure per Pydantic schema         |\n| Monorepo      | pnpm workspaces + Turborepo                        | One repo, three Railway services                            |\n| Hosting       | Railway (web + worker + beat + Postgres + Redis)   | One project, per-service `railway.*.json` configs           |\n\n---\n\n## Prerequisites\n\n- [Docker](https://docs.docker.com/get-docker/)\n- [uv](https://docs.astral.sh/uv/)\n- [pnpm](https://pnpm.io/installation) (v9+)\n\n## Setup\n\n```bash\n# 1. Copy and fill in your credentials\ncp .env.example .env\n# Edit .env — fill LLM_API_KEY, LLM_BASE_URL, LLM_MODEL,\n#   GOOGLE_CLIENT_ID, GOOGLE_CLIENT_SECRET, SECRET_KEY.\n\n# 2. Install dependencies\npnpm install --frozen-lockfile\nuv sync --frozen\n\n# 3. 
Run migrations and seed sample vocabulary\n./scripts/dev_reset.sh\n```\n\n(Postgres and Redis start automatically — `pnpm dev`, `pnpm worker`, and `pnpm beat` all run `docker compose up -d` before their main process.)\n\n## Running\n\n```bash\npnpm dev      # web server → http://127.0.0.1:8000\npnpm worker   # Celery worker (LLM content generation)\npnpm beat     # Celery beat (daily scheduling)\n```\n\n## Run the full stack in Docker\n\nEnd-to-end smoke test of the deploy artifact (Postgres + Redis + the web image Railway will run):\n\n```bash\ncp .env.example .env   # set GOOGLE_CLIENT_*, LLM_*, SECRET_KEY\ndocker compose up --build\n```\n\n`pnpm dev` is still the recommended dev loop — it gives you Tailwind + uvicorn hot reload. The Docker target is for verifying the production-shaped image locally.\n\n## Reset database\n\nWipe everything and re-seed from scratch:\n\n```bash\ndocker compose exec postgres psql -U user -d postgres \\\n  -c \"DROP DATABASE recallai;\"\ndocker compose exec postgres psql -U user -d postgres \\\n  -c \"CREATE DATABASE recallai;\"\n./scripts/dev_reset.sh\n```\n\n## Teardown\n\n```bash\ndocker compose down      # stop, keep data\ndocker compose down -v   # stop, wipe data\n```\n\n---\n\n## Project layout\n\n```\napps/api/\n  app/\n    api/         route handlers (async)\n    core/        config, db engine, celery app, logging\n    models/      SQLAlchemy 2.0 ORM (users, vocab_items, reviews)\n    schemas/     Pydantic v2 (request, response, LLM-output contracts)\n    services/    business logic (sm2, selection, enrichment, llm, stats)\n    workers/     Celery tasks (sync only — Celery 5 constraint)\n  templates/     Jinja2 (pages/ + partials/)\n  static/        Tailwind output + minimal JS\n  alembic/       migrations\n  tests/         mirrors app/ structure\npackages/shared/ shared enums + constants\n.github/         CI (ruff + mypy + pytest + alembic round-trip + gitleaks)\nrailway.*.json   per-service deploy config (web / worker / 
beat)\nrailpack.*.json  per-service build config\n```\n\n---\n\n## Roadmap\n\nThings on the radar, not promises:\n\n- **FSRS upgrade.** SM-2 is great for a POC, but FSRS adapts per-card difficulty better once there's enough review data to justify it.\n- **Audio cards.** TTS for pronunciation; STT for spoken-answer rating. The recall side, not the recognition side.\n- **Image associations.** Auto-pair generated examples with safe stock imagery — visual memory anchor for ages 5–12.\n- **PWA / offline review.** Cache the next 50 due cards client-side so a kid can review on a tablet without a connection.\n- **Parent dashboard.** Weekly progress summary email, retention curves, words mastered.\n- **Multi-language support.** Today: English. Same SM-2 + LLM pipeline applies to any vocabulary target.\n- **Cost dashboard.** Per-user, per-day LLM spend with hard caps; alerts if a single user blows past a threshold.\n- **A/B-able prompt registry.** Version prompts in the DB instead of code so quality regressions can be diffed.\n\n---\n\n## Assumptions and limitations\n\nWorth being upfront about:\n\n- **Target audience is narrow.** Designed for ESL learners aged ~5–12 (the Novakid demographic). Generated content, tags, and prompts are explicitly kid-safe and age-tuned. It is not a TOEFL prep app.\n- **English-only at the moment.** The pipeline isn't language-locked, but the prompt library and content-safety checks are.\n- **The operator pays the LLM bill.** Every nightly batch is a real API call against whichever provider `LLM_BASE_URL` points at; whoever deploys the app eats that cost. The retry-with-refinement loop, capped `max_tokens`, and idempotency markers keep spend bounded, but a misconfigured prompt can still burn tokens before the cap kicks in.\n- **Provider compatibility is broad in theory, narrow in practice.** Any OpenAI-API-compatible endpoint should work, swapped via the three `LLM_*` env vars. 
In practice only **OpenRouter** (free + paid tiers) and **OpenCode Go** have been smoke-tested end-to-end. Other compatible providers (Groq, Together AI, direct OpenAI, self-hosted vLLM) should work but haven't been verified.\n- **SM-2, not FSRS.** Chosen for simplicity and explainability. Some scheduling accuracy is traded for predictability: adequate at the current data scale, though not optimal.\n- **HTMX over SPA.** Most interactions are server round trips that swap in an HTML partial rather than a full page (cheap, but the latency is visible on slow links). Acceptable for the review-card flow; would not scale to a complex multi-pane UI.\n\n---\n\n## Disclaimer\n\nRecallAI is a **personal project**. It is not affiliated with, endorsed by, or sponsored by any third-party platform, LLM provider, or tool referenced in this document.\n\nLLM-generated content can contain factual errors, awkward phrasing, or culturally insensitive outputs even with validation in place. The retry-with-refinement loop reduces this risk; it does not eliminate it. Treat all generated material as draft-quality educational content, not authoritative reference material. If you deploy this for real learners, **review the generated corpus periodically** and keep a kill-switch on the nightly job.\n\nThe codebase is provided as-is; see [LICENSE](./LICENSE).\n\n---\n\n## License\n\nMIT — see [LICENSE](./LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fical10%2Frecall-ai","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fical10%2Frecall-ai","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fical10%2Frecall-ai/lists"}