{"id":31648908,"url":"https://github.com/gigapi/metadata","last_synced_at":"2026-05-14T12:34:08.913Z","repository":{"id":294745702,"uuid":"987932383","full_name":"gigapi/metadata","owner":"gigapi","description":"GigAPI Unified Metadata Engine","archived":false,"fork":false,"pushed_at":"2025-07-23T10:01:45.000Z","size":112,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-10-07T07:06:53.268Z","etag":null,"topics":["catalog","gigapi","metadata","parquet-catalog","redis"],"latest_commit_sha":null,"homepage":"https://gigapipe.com","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gigapi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-05-21T20:01:56.000Z","updated_at":"2025-07-23T10:01:22.000Z","dependencies_parsed_at":"2025-06-10T17:33:07.196Z","dependency_job_id":null,"html_url":"https://github.com/gigapi/metadata","commit_stats":null,"previous_names":["gigapi/metadata"],"tags_count":7,"template":false,"template_full_name":null,"purl":"pkg:github/gigapi/metadata","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gigapi%2Fmetadata","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gigapi%2Fmetadata/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gigapi%2Fmetadata/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gigapi%2
Fmetadata/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gigapi","download_url":"https://codeload.github.com/gigapi/metadata/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gigapi%2Fmetadata/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33025112,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-13T13:14:54.681Z","status":"online","status_checked_at":"2026-05-14T02:00:06.663Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["catalog","gigapi","metadata","parquet-catalog","redis"],"created_at":"2025-10-07T07:01:45.455Z","updated_at":"2026-05-14T12:34:08.893Z","avatar_url":"https://github.com/gigapi.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# \u003cimg src=\"https://github.com/user-attachments/assets/5b0a4a37-ecab-4ca6-b955-1a2bbccad0b4\" /\u003e\n\n# \u003cimg src=\"https://github.com/user-attachments/assets/74a1fa93-5e7e-476d-93cb-be565eca4a59\" height=25 /\u003e GigAPI Metadata Engine\n\n[![Metadata Redis CI](https://github.com/gigapi/metadata/actions/workflows/redis-ci.yml/badge.svg)](https://github.com/gigapi/metadata/actions/workflows/redis-ci.yml)\n\nGigapi Metadata provides a high-performance indexing system for managing metadata about data files (typically Parquet files) organized in time-partitioned structures. 
It supports efficient querying, merging operations, and provides both local JSON file storage and distributed Redis storage backends.\n\n## Features\n\n- **Dual Storage Backends**: JSON file-based storage for local deployments and Redis for distributed systems\n- **Time-Partitioned Data**: Optimized for date/hour partitioned data structures [1](#0-0) \n- **Merge Planning**: Intelligent merge planning for data consolidation across different layers [2](#0-1) \n- **Async Operations**: Promise-based asynchronous operations for better performance [3](#0-2) \n- **Efficient Querying**: Time-range and folder-based querying capabilities [4](#0-3) \n\n## Installation\n\n```bash\ngo get github.com/gigapi/metadata\n```\n\n## Core Concepts\n\n### IndexEntry\n\nThe fundamental data structure representing metadata about a single data file: [5](#0-4) \n\n### Storage Backends\n\n#### JSON Index\nFor local file-based storage, suitable for single-node deployments: [6](#0-5) \n\n#### Redis Index  \nFor distributed deployments with Redis backend: [7](#0-6) \n\n## Configuration\n\n### Merge Configurations\n\nBefore using the library, initialize merge configurations which define merge behavior across different iterations: [8](#0-7) \n\nExample configuration:\n```go\nimport \"github.com/gigapi/metadata\"\n\n// Configure merge settings: [timeout_sec, max_size_bytes, iteration_id]\nmetadata.MergeConfigurations = [][3]int64{\n    {10, 10 * 1024 * 1024, 1}, // 10s timeout, 10MB max size, iteration 1\n    {30, 50 * 1024 * 1024, 2}, // 30s timeout, 50MB max size, iteration 2\n}\n```\n\n## Usage Examples\n\n### Basic JSON Index Usage\n\n```go\n// Create a JSON-based table index\ntableIndex := metadata.NewJSONIndex(\"/data/root\", \"my_database\", \"my_table\")\n\n// Add metadata entries\nentries := []*metadata.IndexEntry{\n    {\n        Database:  \"my_database\",\n        Table:     \"my_table\", \n        Path:      \"date=2024-01-15/hour=14/file1.parquet\",\n        SizeBytes: 1000000,\n     
   MinTime:   1705327200000000000, // nanoseconds\n        MaxTime:   1705327800000000000,\n    },\n}\n\n// Batch operation (async)\npromise := tableIndex.Batch(entries, nil)\nresult, err := promise.Get()\n```\n\n### Redis Index Usage [9](#0-8) \n\n### Querying Data\n\n```go\n// Query with time range\noptions := metadata.QueryOptions{\n    After:  time.Now().Add(-24 * time.Hour),\n    Before: time.Now(),\n}\n\nentries, err := tableIndex.GetQuerier().Query(options)\n```\n\n### Merge Operations\n\n```go\n// Get merge plan\nplanner := tableIndex.GetMergePlanner()\nplan, err := planner.GetMergePlan(\"layer1\", 1)\n\nif plan != nil {\n    // Execute merge (external process)\n    // ...\n    \n    // Mark merge as complete\n    err = planner.EndMerge(plan)\n}\n```\n\n## Interfaces\n\n### TableIndex Interface\n\nThe main interface for table-level operations: [10](#0-9) \n\n### DBIndex Interface  \n\nFor database-level operations: [11](#0-10) \n\n## Data Organization\n\nThe system expects data organized in the following structure:\n```\n/root/\n  ├── database1/\n  │   ├── table1/\n  │   │   ├── date=2024-01-15/\n  │   │   │   ├── hour=00/\n  │   │   │   ├── hour=01/\n  │   │   │   └── ...\n  │   │   └── date=2024-01-16/\n  │   └── table2/\n  └── database2/\n```\n\n## Redis Configuration\n\nFor Redis backend, use standard Redis connection URLs:\n- `redis://localhost:6379/0` - Standard Redis\n- `rediss://user:pass@host:6380/1` - Redis with TLS [12](#0-11) \n\n## Error Handling\n\nAll operations return errors through the Promise interface or standard Go error handling. 
The library uses async operations for better performance in high-throughput scenarios.\n\n## Thread Safety\n\nBoth JSON and Redis implementations are thread-safe and can be used concurrently across multiple goroutines.\n\n## Testing\n\nRun tests with a local Redis instance:\n```bash\n# Start Redis\ndocker run -d -p 6379:6379 redis:alpine\n\n# Run tests  \ngo test ./...\n```\n\n## License\n\nThis project is licensed under the Apache License 2.0. [13](#0-12) \n\n## Documentation\n\n[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/gigapi/metadata)\n\n## Contributing\n\n1. Fork the repository\n2. Create a feature branch\n3. Add tests for new functionality\n4. Ensure all tests pass\n5. Submit a pull request\n\n## Notes\n\n- The library is optimized for time-series data workloads with frequent writes and time-range queries\n- The Redis backend is recommended for distributed deployments and high-throughput scenarios\n- The JSON backend is suitable for single-node deployments and development environments\n- Merge operations are designed to be executed by external processes, with the library managing the planning and coordination\n- All time values are stored as Unix nanoseconds for high-precision temporal operations\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgigapi%2Fmetadata","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgigapi%2Fmetadata","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgigapi%2Fmetadata/lists"}