https://github.com/lancedb/community-metrics

Dashboard for tracking downloads and stars for Lance format and LanceDB SDK adoption.

# Community Metrics Dashboard

This repository tracks community metrics for Lance and LanceDB, stores them in LanceDB Enterprise, and renders a read-only dashboard frontend.

Architecture split:
- **Write path**: Python ingestion jobs run on a private host (for example EC2 + cron).
- **Read path**: Next.js dashboard app serves `/api/v1/dashboard/daily` and is deployed to Vercel.

## What This Tracks

- SDK downloads:
  - `pylance` (PyPI)
  - `lance` (crates.io)
  - `lancedb` (PyPI)
  - `@lancedb/lancedb` (npm)
  - `lancedb` (crates.io)
- GitHub stars:
  - `lance-format/lance`
  - `lancedb/lancedb`
  - `lance-format/lance-graph`
  - `lance-format/lance-context`

## Prerequisites

- Python managed with `uv`
- Frontend managed with `npm`
- A running **LanceDB Enterprise** cluster

## Environment

Create `.env` in the repo root (or update existing):

```bash
LANCEDB_API_KEY=...
LANCEDB_HOST_OVERRIDE=https://
LANCEDB_REGION=us-east-1

# Strongly recommended for scheduled ingestion:
GITHUB_TOKEN=...
```

Keep `GITHUB_TOKEN` configured on the machine that runs scheduled updates; unauthenticated GitHub API requests are subject to much lower rate limits.
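
As a rough pre-flight check before scheduling ingestion, the variables above can be validated with a few lines of Python. This is a sketch, not code from the repository: the `check_env` helper is hypothetical, and treating the three `LANCEDB_*` variables as required is an assumption based on the `.env` listing above. It inspects a mapping you pass in and does not load `.env` itself.

```python
from collections.abc import Mapping

# Assumption: these three are required because they appear unqualified in .env.
REQUIRED = ("LANCEDB_API_KEY", "LANCEDB_HOST_OVERRIDE", "LANCEDB_REGION")
# Described above as strongly recommended for scheduled ingestion.
RECOMMENDED = ("GITHUB_TOKEN",)


def check_env(env: Mapping[str, str]) -> tuple[list[str], list[str]]:
    """Return (missing required names, missing recommended names)."""
    missing = [name for name in REQUIRED if not env.get(name)]
    unset = [name for name in RECOMMENDED if not env.get(name)]
    return missing, unset


# Example with a deliberately incomplete, made-up environment.
sample_env = {"LANCEDB_API_KEY": "example-key", "LANCEDB_REGION": "us-east-1"}
missing, unset = check_env(sample_env)
```

Passing `os.environ` instead of `sample_env` would check the current process environment.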

## LanceDB Storage

Tables:
- `metrics`: metric definitions
- `stats`: daily observations keyed by `(metric_id, period_end)`
- `history`: ingestion run logs

Daily row semantics in `stats`:
- `period_start == period_end`
- routine provenance: `api_daily`
- recompute provenance: `recomputed`
- download `source_window`: `1d`
- star `source_window`: `cumulative_snapshot`
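
The row semantics above can be sketched as a small Python data structure. This is illustrative only: the `DailyStat` class, its `value` field, and the in-memory `upsert` helper are hypothetical and not taken from the repository. Only `metric_id`, `period_start`, `period_end`, the provenance values, and the `source_window` values come from the description above. That a recompute overwrites the existing row for the same key is also an assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DailyStat:
    """Hypothetical shape of one daily row in `stats`."""

    metric_id: str
    period_start: date
    period_end: date
    value: int  # assumed field name for the observed count
    provenance: str  # "api_daily" or "recomputed"
    source_window: str  # "1d" for downloads, "cumulative_snapshot" for stars

    def __post_init__(self) -> None:
        # Daily rows cover exactly one day.
        if self.period_start != self.period_end:
            raise ValueError("daily rows require period_start == period_end")

    @property
    def key(self) -> tuple[str, date]:
        # Rows are keyed by (metric_id, period_end).
        return (self.metric_id, self.period_end)


def upsert(table: dict[tuple[str, date], DailyStat], row: DailyStat) -> None:
    """Insert the row, or replace the existing row with the same key."""
    table[row.key] = row


table: dict[tuple[str, date], DailyStat] = {}
day = date(2026, 1, 15)
# Made-up counts, for illustration only.
upsert(table, DailyStat("downloads:lance:python", day, day, 1200, "api_daily", "1d"))
# A later recompute for the same day replaces the routine row (assumed behavior).
upsert(table, DailyStat("downloads:lance:python", day, day, 1250, "recomputed", "1d"))
```

Keying on `(metric_id, period_end)` is what would make reruns over the same window safe: writing the same day twice leaves one row, not two.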

## Ingestion Jobs (EC2 / Private Host)

All writes happen directly through `LanceDBStore`.
No FastAPI/uvicorn runtime is required.

### Clean-Slate Bootstrap

```bash
uv run python -m community_metrics.jobs.bootstrap_tables
uv run python -m community_metrics.jobs.update_all --lookback-days 90
```

### Routine Refresh

```bash
uv run python -m community_metrics.jobs.daily_refresh
```

For ad-hoc correction windows:

```bash
uv run python -m community_metrics.jobs.daily_refresh --lookback-days 7
```

One-time star-history backfill for newly added GitHub repos:

```bash
uv run python -m community_metrics.jobs.update_daily_stars --lookback-days 180
```

### Suggested Cron (EC2)

Run daily at **09:00 UTC**:

```cron
0 9 * * * cd /path/to/community-metrics && /usr/bin/env -S bash -lc 'uv run python -m community_metrics.jobs.daily_refresh >> /var/log/community-metrics/daily_refresh.log 2>&1'
```

## Frontend (Next.js + Vercel)

The dashboard lives in `src/dashboard`. It:
- fetches data from `GET /api/v1/dashboard/daily?days=180`
- authenticates users with Google SSO (restricted to `@lancedb.com` accounts)

### Local frontend dev

```bash
cd src/dashboard
npm install
npm run dev
```

Set these frontend env vars in `src/dashboard/.env.local` (local) or Vercel project settings (deployment):

```bash
GOOGLE_CLIENT_ID=...
GOOGLE_CLIENT_SECRET=...
NEXTAUTH_SECRET=...
NEXTAUTH_URL=http://127.0.0.1:3000
```

Google OAuth app setup must include this callback URI:

```bash
http://127.0.0.1:3000/api/auth/callback/google
```

### Vercel env vars

Set these in the Vercel project:

```bash
LANCEDB_API_KEY=...
LANCEDB_HOST_OVERRIDE=https://
LANCEDB_REGION=us-east-1
GOOGLE_CLIENT_ID=...
GOOGLE_CLIENT_SECRET=...
NEXTAUTH_SECRET=...
NEXTAUTH_URL=https://
```

The API route only reads from LanceDB and only queries bounded dashboard windows.
If a dedicated read-scoped key becomes available, use it for the Vercel deployment.

### Frontend metric semantics

- Download chart points are monthly totals.
- Download card headline values are the last full-month totals.
- Through `2025-11-30`, download points come from seeded discrete snapshots.
- From `2025-12-01` onward, monthly download points are aggregated from daily rows.
- Star charts remain daily cumulative series.
- Total stars combine all tracked GitHub star repos.
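
The monthly download semantics above can be illustrated with a short Python sketch. The dashboard itself is a Next.js app, so this is not its actual code; `monthly_totals` and `last_full_month` are hypothetical helpers, and the counts are made up. It shows daily rows rolled up into monthly points, and the headline value taken from the last completed month.

```python
from collections import defaultdict
from datetime import date, timedelta


def monthly_totals(daily: dict[date, int]) -> dict[str, int]:
    """Sum daily download counts into 'YYYY-MM' buckets."""
    totals: dict[str, int] = defaultdict(int)
    for day, count in daily.items():
        totals[day.strftime("%Y-%m")] += count
    return dict(sorted(totals.items()))


def last_full_month(today: date) -> str:
    """Return the 'YYYY-MM' label of the most recent completed month."""
    last_day_of_previous_month = today.replace(day=1) - timedelta(days=1)
    return last_day_of_previous_month.strftime("%Y-%m")


# Made-up daily counts, for illustration only.
daily = {
    date(2025, 12, 30): 100,
    date(2025, 12, 31): 150,
    date(2026, 1, 1): 90,
    date(2026, 1, 2): 110,
}
totals = monthly_totals(daily)
# Viewed on 2026-01-03, January is still in progress, so the headline uses December.
headline = totals[last_full_month(date(2026, 1, 3))]
```

Using the last full month for the headline avoids showing a partial month that would look like a sudden drop.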

## Which Job To Run

| Job | Use this for | Command |
| --- | --- | --- |
| `daily_refresh` | Normal daily updates (scheduled) | `uv run python -m community_metrics.jobs.daily_refresh` |
| `update_all` | Recompute/backfill a full lookback window | `uv run python -m community_metrics.jobs.update_all --lookback-days 90` |
| `bootstrap_tables` | Destructive reset/recreate before rebuild | `uv run python -m community_metrics.jobs.bootstrap_tables` |

## Debug helper

`debug.py` reads LanceDB Enterprise tables directly (no REST API required):

```bash
uv run debug.py metrics
uv run debug.py stats --metric-id downloads:lance:python --days 30
uv run debug.py history --start-date 2026-01-01 --end-date 2026-12-31 --limit 200
uv run debug.py all
```

## Development

Format and lint Python:

```bash
uv run ruff format .
uv run ruff check --fix --select I .
```

Run tests:

```bash
uv run pytest -q
```