{"id":45719472,"url":"https://github.com/suminb/datahub","last_synced_at":"2026-02-25T05:30:20.286Z","repository":{"id":334896443,"uuid":"1143086588","full_name":"suminb/datahub","owner":"suminb","description":"Personal DataHub - simple, yet effective","archived":false,"fork":false,"pushed_at":"2026-02-06T07:51:20.000Z","size":146,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-02-06T15:32:06.474Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/suminb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-01-27T06:49:25.000Z","updated_at":"2026-02-06T07:51:22.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/suminb/datahub","commit_stats":null,"previous_names":["suminb/datahub"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/suminb/datahub","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/suminb%2Fdatahub","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/suminb%2Fdatahub/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/suminb%2Fdatahub/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/suminb%2Fdatahub/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/suminb","do
wnload_url":"https://codeload.github.com/suminb/datahub/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/suminb%2Fdatahub/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29811534,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-25T03:30:18.102Z","status":"ssl_error","status_checked_at":"2026-02-25T03:30:17.799Z","response_time":61,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-02-25T05:30:19.642Z","updated_at":"2026-02-25T05:30:20.281Z","avatar_url":"https://github.com/suminb.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DataHub\n\nCentralized metadata hub for dataset management across multiple hosts.\n\n## Features\n\n- **Dataset Registry**: Track datasets from multiple sources (Confluence, Jira, Notion, GitHub, Slack, etc.)\n- **Full-Text Search**: PostgreSQL-powered search with relevance ranking\n- **Fuzzy Matching**: Typo-tolerant search using trigram similarity\n- **Next.js**: Single application handling both UI and API\n- **Multi-Host Support**: Track which host stores each dataset\n\n## Architecture\n\n```\n┌─────────────────────────────────────────────────┐\n│                   DataHub                       │\n│                                                 │\n│  ┌──────────────┐     ┌──────────────────────┐  
│\n│  │  PostgreSQL  │◄────│  Next.js             │  │\n│  │  (metadata   │     │  (UI + API routes)   │  │\n│  │   + search)  │     │  :3000               │  │\n│  └──────────────┘     └──────────────────────┘  │\n│                               ▲                 │\n└───────────────────────────────┼─────────────────┘\n                                │\n         ┌──────────────────────┼──────────────────────┐\n         │                      │                      │\n    ┌────┴────┐           ┌─────┴─────┐         ┌──────┴──────┐\n    │collector│           │  indexer  │         │other service│\n    └─────────┘           └───────────┘         └─────────────┘\n```\n\n## Quick Start\n\n### Local Development\n\n1. **Start PostgreSQL:**\n\n```bash\ndocker run -d \\\n  --name datahub-postgres \\\n  -e POSTGRES_DB=datahub \\\n  -e POSTGRES_USER=datahub \\\n  -e POSTGRES_PASSWORD=datahub \\\n  -p 5432:5432 \\\n  postgres:16-alpine\n```\n\n2. **Install dependencies and run migrations:**\n\n```bash\ncd datahub\nnpm install\nDATABASE_URL=postgresql://datahub:datahub@localhost:5432/datahub npm run db:migrate\n```\n\n3. **Start the app:**\n\n```bash\nDATABASE_URL=postgresql://datahub:datahub@localhost:5432/datahub npm run dev\n```\n\n4. Open http://localhost:3000\n\n### Kubernetes Deployment\n\n1. **Create secrets** (edit `k8s/secrets.yaml` first!):\n\n```bash\nkubectl apply -f k8s/secrets.yaml\n```\n\n2. **Deploy PostgreSQL:**\n\n```bash\nkubectl apply -f k8s/postgres.yaml\n```\n\n3. **Run migrations** (one-time, from a pod with DB access):\n\n```bash\nDATABASE_URL=postgresql://... npm run db:migrate\n```\n\n4. 
**Build and deploy:**\n\n```bash\ncd datahub\ndocker build -t your-registry/datahub:latest .\ndocker push your-registry/datahub:latest\nkubectl apply -f k8s/deployment.yaml\nkubectl apply -f k8s/ingress.yaml\n```\n\n## Project Structure\n\n```\ndatahub/\n├── k8s/\n│   ├── postgres.yaml        # PostgreSQL StatefulSet\n│   ├── secrets.yaml         # Credentials template\n│   ├── deployment.yaml      # App Deployment\n│   └── ingress.yaml         # Ingress routing\n├── package.json\n├── Dockerfile\n├── scripts/\n│   └── migrate.mjs          # Database migrations\n└── src/\n    ├── app/\n    │   ├── api/             # Next.js API routes\n    │   │   ├── datasets/\n    │   │   └── health/\n    │   ├── datasets/[id]/\n    │   └── page.tsx         # Dashboard\n    ├── components/\n    └── lib/\n        ├── api.ts           # Client-side API helpers\n        └── db.ts            # PostgreSQL connection\n```\n\n\n## API Reference\n\n### Authentication\n\nAll API endpoints require authentication using an API key. Include your API key in the `X-DataHub-API-Key` header:\n\n```bash\ncurl -H \"X-DataHub-API-Key: dh_your_api_key_here\" http://localhost:3000/api/datasets\n```\n\n**For Testing Only:** You can temporarily disable API key verification by setting the `DISABLE_API_KEY_AUTH` environment variable:\n\n```bash\n# Disable authentication for local testing\nDISABLE_API_KEY_AUTH=true npm run dev\n\n# While the server runs with authentication disabled, requests need no API key\ncurl http://localhost:3000/api/datasets\n```\n\n⚠️ **Warning:** Never use `DISABLE_API_KEY_AUTH=true` in production environments. 
This flag is intended only for local testing and development.\n\n**Managing API Keys:**\n\n```bash\n# Issue a new API key\nnpm run apikey:issue \u003ckey-name\u003e\n\n# List all API keys\nnpm run apikey:list\n\n# Revoke an API key (marks as inactive)\nnpm run apikey:revoke \u003ckey-name-or-id\u003e\n\n# Delete an API key permanently\nnpm run apikey:delete \u003ckey-name-or-id\u003e\n```\n\n### Interactive API Documentation\n\nFor a complete, interactive API documentation experience, visit:\n\n**http://localhost:3000/api-docs** (when running locally)\n\nThe interactive documentation is powered by Swagger UI and provides:\n- ✅ Complete endpoint descriptions with request/response schemas\n- ✅ Try-it-out functionality to test APIs directly from the browser\n- ✅ Example requests and responses\n- ✅ Parameter validation and type information\n- ✅ OpenAPI 3.0 specification available at `/api/openapi.json`\n\n### Quick Reference\n\n#### Endpoints\n\n| Method   | Endpoint                     | Description                   |\n| -------- | ---------------------------- | ----------------------------- |\n| `GET`    | `/api/datasets`              | List all datasets (paginated) |\n| `POST`   | `/api/datasets`              | Create a new dataset          |\n| `GET`    | `/api/datasets/{id}`         | Get a specific dataset        |\n| `PATCH`  | `/api/datasets/{id}`         | Update a dataset              |\n| `DELETE` | `/api/datasets/{id}`         | Delete a dataset              |\n| `GET`    | `/api/datasets/search?q=...` | Search datasets               |\n| `GET`    | `/api/datasets/stats`        | Get aggregate statistics      |\n| `GET`    | `/api/health`                | Health check                  |\n\n### Search Parameters\n\n| Parameter     | Type   | Description                              |\n| ------------- | ------ | ---------------------------------------- |\n| `q`           | string | Search query (required)                  |\n| `source_type` | string | Filter 
by source type                    |\n| `status`      | string | Filter by status                         |\n| `owner`       | string | Filter by owner                          |\n| `tags`        | string | Comma-separated tags                     |\n| `fuzzy`       | bool   | Enable fuzzy matching (default: false)   |\n| `limit`       | int    | Results per page (default: 20, max: 100) |\n| `offset`      | int    | Pagination offset                        |\n\n### Example: Create a Dataset\n\n```bash\ncurl -X POST http://localhost:3000/api/datasets \\\n  -H \"Content-Type: application/json\" \\\n  -H \"X-DataHub-API-Key: dh_your_api_key_here\" \\\n  -d '{\n    \"name\": \"confluence-engineering-docs\",\n    \"source_type\": \"confluence\",\n    \"storage_backend\": \"s3\",\n    \"storage_path\": \"s3://datasets/confluence/engineering-2024\",\n    \"host\": \"collector-01.local\",\n    \"owner\": \"data-team\",\n    \"tags\": [\"documentation\", \"engineering\"],\n    \"description\": \"Engineering documentation from Confluence\"\n  }'\n```\n\n### Example: Search\n\n```bash\n# Full-text search\ncurl -H \"X-DataHub-API-Key: dh_your_api_key_here\" \\\n  \"http://localhost:3000/api/datasets/search?q=engineering+docs\"\n\n# Fuzzy search (typo-tolerant)\ncurl -H \"X-DataHub-API-Key: dh_your_api_key_here\" \\\n  \"http://localhost:3000/api/datasets/search?q=enginering\u0026fuzzy=true\"\n```\n\n## Testing\n\nThe project has comprehensive test coverage with Jest and React Testing Library.\n\n### Run Tests\n\n```bash\n# Run all tests\nnpm test\n\n# Run with coverage report\nnpm test -- --coverage\n\n# Run in watch mode\nnpm run test:watch\n```\n\n### Test Coverage\n\n- ✅ **81 passing tests** across 11 test suites\n- ✅ **94-100% coverage** on API routes\n- ✅ **87-100% coverage** on UI components\n- ✅ Full coverage on utility functions including authentication\n\nSee [TESTING.md](./TESTING.md) for detailed testing documentation.\n\n## CI/CD\n\n### Continuous 
Integration\n\nGitHub Actions automatically runs on every push and pull request:\n\n- ✅ Test suite on Node.js 18.x and 20.x\n- ✅ ESLint and code formatting checks\n- ✅ TypeScript type checking\n- ✅ Production build verification\n- ✅ Docker image build (main branch)\n\n### Local Pre-Push Checks\n\nRun these commands to match CI checks before pushing:\n\n```bash\nnpm test -- --coverage    # Tests\nnpm run lint              # Linting\nnpm run format -- --check # Formatting\nnpm run type-check        # Types\nnpm run build             # Build\n```\n\n## Configuration\n\n| Variable                 | Description                                      |\n| ------------------------ | ------------------------------------------------ |\n| `DATABASE_URL`           | PostgreSQL connection string                     |\n| `DISABLE_API_KEY_AUTH`   | (Testing only) Set to \"true\" to disable API key verification. Only works in non-production environments. Never use in production! |\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsuminb%2Fdatahub","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsuminb%2Fdatahub","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsuminb%2Fdatahub/lists"}