{"id":39871886,"url":"https://github.com/mabel-dev/s1","last_synced_at":"2026-01-18T14:17:49.334Z","repository":{"id":144107936,"uuid":"444881868","full_name":"mabel-dev/s1","owner":"mabel-dev","description":"An AWS S3 emulator, only implementing a little bit of s3","archived":false,"fork":false,"pushed_at":"2025-12-23T16:52:36.000Z","size":1234,"stargazers_count":1,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-12-25T06:49:09.341Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mabel-dev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2022-01-05T16:55:17.000Z","updated_at":"2025-10-06T23:10:42.000Z","dependencies_parsed_at":null,"dependency_job_id":"5fdab63a-1f58-4a42-8090-a8bb87101171","html_url":"https://github.com/mabel-dev/s1","commit_stats":null,"previous_names":["mabel-dev/s1"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/mabel-dev/s1","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mabel-dev%2Fs1","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mabel-dev%2Fs1/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mabel-dev%2Fs1/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mabel-dev%2Fs1/manifests","owner_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/owners/mabel-dev","download_url":"https://codeload.github.com/mabel-dev/s1/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mabel-dev%2Fs1/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28537496,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-18T13:04:05.990Z","status":"ssl_error","status_checked_at":"2026-01-18T13:01:44.092Z","response_time":98,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-18T14:17:49.222Z","updated_at":"2026-01-18T14:17:49.311Z","avatar_url":"https://github.com/mabel-dev.png","language":"Python","readme":"# S1\n\nImplementing only a little bit of S3\n\n## Overview\n\nS1 is a lightweight S3-compatible API implementation that provides core S3 services for reading data from Google Cloud Storage (GCS). It implements the following S3 APIs:\n\n### Implemented Features\n\n1. **GetBucketLocation** - Returns the region where the bucket resides\n2. **ListObjects** - Lists objects in a bucket with support for filtering\n3. **GetObject** - Retrieves objects from a bucket\n4. **SelectObjectContent (S3 Select)** - Enables SQL queries on S3 objects for data filtering and transformation\n\n## API Endpoints\n\n### 1. 
GetBucketLocation\n```\nGET /{bucket}?location\n```\nReturns the AWS region for the bucket (always returns `eu-west-2`).\n\n### 2. ListObjects\n```\nGET /{bucket}?delimiter={delimiter}\u0026prefix={prefix}\u0026max-keys={max-keys}\u0026marker={marker}\n```\nLists objects in a bucket. Supported query parameters:\n- `prefix` - Limits the response to keys that begin with the specified prefix\n- `delimiter` - Character used to group keys\n- `max-keys` - Maximum number of keys to return (default: 1000)\n- `marker` - Key to start with when listing objects\n\n### 3. GetObject\n```\nGET /{bucket}/{object}\n```\nRetrieves an object from the bucket.\n\n### 4. SelectObjectContent (S3 Select)\n```\nPOST /{bucket}/{object}?select\u0026select-type=2\n```\nRuns an SQL query against a single object in the bucket. The request body is an XML document containing:\n- SQL expression\n- Input serialization format (Parquet only)\n- Output serialization format (CSV or JSON)\n\n**Note**: The SQL API only supports Parquet files. To access other file types (CSV, JSON, etc.), use the GetObject endpoint.\n\n#### Example Request Body:\n```xml\n\u003c?xml version=\"1.0\" encoding=\"UTF-8\"?\u003e\n\u003cSelectObjectContentRequest xmlns=\"http://s3.amazonaws.com/doc/2006-03-01/\"\u003e\n    \u003cExpression\u003eSELECT * FROM S3Object WHERE price \u003e 100\u003c/Expression\u003e\n    \u003cExpressionType\u003eSQL\u003c/ExpressionType\u003e\n    \u003cInputSerialization\u003e\n        \u003cParquet/\u003e\n    \u003c/InputSerialization\u003e\n    \u003cOutputSerialization\u003e\n        \u003cJSON/\u003e\n    \u003c/OutputSerialization\u003e\n\u003c/SelectObjectContentRequest\u003e\n```\n\n## Supported S3 Select Features\n\n- **Input Formats**: Parquet only\n- **Output Formats**: CSV, JSON\n- **SQL Operations**:\n  - SELECT with column specification or wildcard (*)\n  - Basic WHERE clause filtering\n  - Queries against the S3Object alias\n\n**Important**: The SQL API (SelectObjectContent) only supports Parquet files. 
For other file formats like CSV or JSON, use the GetObject API for blob access.\n\n## Architecture\n\nThe implementation uses:\n- **FastAPI** for the web framework\n- **Storage abstraction layer** supporting both Google Cloud Storage and the local filesystem\n- **LRU caching** of blob reads using Python's `functools.lru_cache`\n- **XML parsing** for S3 Select request handling\n- **Parquet support** for SQL queries (other formats available via GetObject)\n\n## Running the Service\n\n```bash\npython src/main.py\n```\n\nThe service listens on port 8080 by default, or on the port given in the `PORT` environment variable.\n\n## Storage Backend\n\nS1 supports two storage backends:\n\n### Google Cloud Storage (GCS)\nGCS is the default backend. When the `STORAGE_EMULATOR_HOST` environment variable is set, S1 connects to that storage emulator instead, which is useful for testing.\n\n### Local Filesystem\nS1 can also serve objects from the local filesystem, which is useful for testing and development.\n\n### Configuration\n\nThe storage backend is configured using environment variables:\n\n- **`STORAGE_BACKEND`** - Set to `gcs` (default) or `local` to choose the backend\n- **`STORAGE_CACHE_SIZE`** - Maximum number of entries in the blob-content LRU cache (default: 128)\n- **`LOCAL_STORAGE_PATH`** - Base path for local filesystem storage (default: `/data`)\n- **`GCS_PROJECT`** - GCS project name (default: `PROJECT`)\n- **`STORAGE_EMULATOR_HOST`** - GCS emulator host for testing\n\n### LRU Caching\n\nS1 implements LRU (Least Recently Used) caching for blob content reads, so repeated reads of the same object are served from memory instead of being fetched from the storage backend again. 
The cache size can be configured using the `STORAGE_CACHE_SIZE` environment variable.\n\nCaching applies transparently to both the GCS and local filesystem backends, which makes S1 useful as a cache in front of storage for systems such as Opteryx.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmabel-dev%2Fs1","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmabel-dev%2Fs1","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmabel-dev%2Fs1/lists"}