{"id":28609707,"url":"https://github.com/marcboeker/gmail-to-sqlite","last_synced_at":"2025-08-26T20:51:29.061Z","repository":{"id":214847821,"uuid":"737505151","full_name":"marcboeker/gmail-to-sqlite","owner":"marcboeker","description":"Index your Gmail account to a SQLite DB and play with the data.","archived":false,"fork":false,"pushed_at":"2025-05-27T08:46:49.000Z","size":87,"stargazers_count":1061,"open_issues_count":2,"forks_count":45,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-06-11T22:16:26.168Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/marcboeker.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-12-31T09:48:35.000Z","updated_at":"2025-06-10T15:09:12.000Z","dependencies_parsed_at":"2025-05-26T13:48:20.512Z","dependency_job_id":null,"html_url":"https://github.com/marcboeker/gmail-to-sqlite","commit_stats":null,"previous_names":["marcboeker/gmail-to-sqlite"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/marcboeker/gmail-to-sqlite","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marcboeker%2Fgmail-to-sqlite","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marcboeker%2Fgmail-to-sqlite/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marcboeker%2Fgmail-to-sqlite/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marcboeker%2Fgmail-to-sqlite/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/marcboeker","download_url":"https://codeload.github.com/marcboeker/gmail-to-sqlite/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marcboeker%2Fgmail-to-sqlite/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":272254471,"owners_count":24901048,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-26T02:00:07.904Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-11T22:01:55.904Z","updated_at":"2025-08-26T20:51:29.053Z","avatar_url":"https://github.com/marcboeker.png","language":"Python","funding_links":[],"categories":["Python","Tools"],"sub_categories":[],"readme":"# Gmail to SQLite\n\nA robust Python application that syncs Gmail messages to a local SQLite database for analysis and archival purposes.\n\n## Features\n\n- **Incremental Sync**: Only downloads new messages by default\n- **Full Sync**: Option to download all messages and detect deletions\n- **Parallel Processing**: Multi-threaded message fetching for improved performance\n- **Robust Error Handling**: Automatic retries with exponential backoff\n- **Graceful Shutdown**: Handles interruption signals cleanly\n- **Type Safety**: Comprehensive type hints throughout the codebase\n\n## Installation\n\n### Prerequisites\n\n- Python 3.8 or higher\n- Google Cloud Project with Gmail API enabled\n- OAuth 2.0 credentials file (`credentials.json`)\n\n### Setup\n\n1. **Clone the repository:**\n\n   ```bash\n   git clone https://github.com/marcboeker/gmail-to-sqlite.git\n   cd gmail-to-sqlite\n   ```\n\n2. **Install dependencies:**\n\n   ```bash\n   # Using uv\n   uv sync\n   ```\n\n3. **Set up Gmail API credentials:**\n   - Go to the [Google Cloud Console](https://console.cloud.google.com/)\n   - Create a new project or select an existing one\n   - Enable the Gmail API\n   - Create OAuth 2.0 credentials (Desktop application)\n   - Download the credentials file and save it as `credentials.json` in the project root\n\n## Usage\n\n### Basic Commands\n\nYou can run the application using either `python` directly or via `uv`:\n\n```bash\n# Incremental sync (default)\npython main.py sync --data-dir ./data\n# or: uv run main.py sync --data-dir ./data\n\n# Full sync with deletion detection\npython main.py sync --data-dir ./data --full-sync\n\n# Sync a specific message\npython main.py sync-message --data-dir ./data --message-id MESSAGE_ID\n\n# Detect and mark deleted messages only\npython main.py sync-deleted-messages --data-dir ./data\n\n# Use custom number of worker threads\npython main.py sync --data-dir ./data --workers 8\n\n# Get help\npython main.py --help\npython main.py sync --help\n```\n\n### Command Line Arguments\n\n- `command`: Required. One of `sync`, `sync-message`, or `sync-deleted-messages`\n- `--data-dir`: Required. Directory where the SQLite database will be stored\n- `--full-sync`: Optional. Forces a complete sync of all messages\n- `--message-id`: Required for `sync-message`. The ID of a specific message to sync\n- `--workers`: Optional. Number of worker threads (default: number of CPU cores)\n- `--help`: Show help information for commands and options\n\n### Graceful Shutdown\n\nThe application supports graceful shutdown when you press CTRL+C:\n\n1. Stops accepting new tasks\n2. Waits for currently running tasks to complete\n3. Saves progress of completed work\n4. Exits cleanly\n\nPressing CTRL+C a second time will force an immediate exit.\n\n## Database Schema\n\nThe application creates a SQLite database with the following schema:\n\n| Field        | Type     | Description                      |\n| ------------ | -------- | -------------------------------- |\n| message_id   | TEXT     | Unique Gmail message ID          |\n| thread_id    | TEXT     | Gmail thread ID                  |\n| sender       | JSON     | Sender information (name, email) |\n| recipients   | JSON     | Recipients by type (to, cc, bcc) |\n| labels       | JSON     | Array of Gmail labels            |\n| subject      | TEXT     | Message subject                  |\n| body         | TEXT     | Message body (plain text)        |\n| size         | INTEGER  | Message size in bytes            |\n| timestamp    | DATETIME | Message timestamp                |\n| is_read      | BOOLEAN  | Read status                      |\n| is_outgoing  | BOOLEAN  | Whether sent by user             |\n| is_deleted   | BOOLEAN  | Whether deleted from Gmail       |\n| last_indexed | DATETIME | Last sync timestamp              |\n\n## Example queries\n\n### Get the number of emails per sender\n\n```sql\nSELECT sender-\u003e\u003e'$.email', COUNT(*) AS count\nFROM messages\nGROUP BY sender-\u003e\u003e'$.email'\nORDER BY count DESC\n```\n\n### Show the number of unread emails by sender\n\nThis is great to determine who is spamming you the most with uninteresting emails.\n\n```sql\nSELECT sender-\u003e\u003e'$.email', COUNT(*) AS count\nFROM messages\nWHERE is_read = 0\nGROUP BY sender-\u003e\u003e'$.email'\nORDER BY count DESC\n```\n\n### Get the number of emails for a specific period\n\n- For years: `strftime('%Y', timestamp)`\n- For months in a year: `strftime('%m', timestamp)`\n- For days in a month: `strftime('%d', timestamp)`\n- For weekdays: `strftime('%w', timestamp)`\n- For hours in a day: `strftime('%H', timestamp)`\n\n```sql\nSELECT strftime('%Y', timestamp) AS period, COUNT(*) AS count\nFROM messages\nGROUP BY period\nORDER BY count DESC\n```\n\n### Find all newsletters and group them by sender\n\nThis is an amateurish way to find all newsletters and group them by sender. It's not perfect, but it's a start. You could also use\n\n```sql\nSELECT sender-\u003e\u003e'$.email', COUNT(*) AS count\nFROM messages\nWHERE body LIKE '%newsletter%' OR body LIKE '%unsubscribe%'\nGROUP BY sender-\u003e\u003e'$.email'\nORDER BY count DESC\n```\n\n### Show who has sent the largest emails in MB\n\n```sql\nSELECT sender-\u003e\u003e'$.email', sum(size)/1024/1024 AS size\nFROM messages\nGROUP BY sender-\u003e\u003e'$.email'\nORDER BY size DESC\n```\n\n### Count the number of emails that I have sent to myself\n\n```sql\nSELECT count(*)\nFROM messages\nWHERE EXISTS (\n  SELECT 1\n  FROM json_each(messages.recipients-\u003e'$.to')\n  WHERE json_extract(value, '$.email') = 'foo@example.com'\n)\nAND sender-\u003e\u003e'$.email' = 'foo@example.com'\n```\n\n### List the senders who have sent me the largest total volume of emails in megabytes\n\n```sql\nSELECT sender-\u003e\u003e'$.email', sum(size)/1024/1024 as total_size\nFROM messages\nWHERE is_outgoing=false\nGROUP BY sender-\u003e\u003e'$.email'\nORDER BY total_size DESC\n```\n\n### Find all deleted messages\n\n```sql\nSELECT message_id, subject, timestamp\nFROM messages\nWHERE is_deleted=1\nORDER BY timestamp DESC\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmarcboeker%2Fgmail-to-sqlite","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmarcboeker%2Fgmail-to-sqlite","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmarcboeker%2Fgmail-to-sqlite/lists"}