{"id":30687600,"url":"https://github.com/mizzy/tweetduck","last_synced_at":"2025-09-02T00:04:27.695Z","repository":{"id":307476766,"uuid":"1029259550","full_name":"mizzy/tweetduck","owner":"mizzy","description":"Twitter Archive to DuckDB Importer - Extract and import Twitter archive data (2025 format) into DuckDB for analysis","archived":false,"fork":false,"pushed_at":"2025-08-02T22:04:48.000Z","size":12,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-30T14:47:20.912Z","etag":null,"topics":["archive","cli","data-analysis","duckdb","golang","twitter"],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mizzy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-30T19:03:12.000Z","updated_at":"2025-08-02T22:04:50.000Z","dependencies_parsed_at":"2025-07-31T14:35:59.111Z","dependency_job_id":"e7fd1c54-7698-48f6-b42a-6d22d4df3c47","html_url":"https://github.com/mizzy/tweetduck","commit_stats":null,"previous_names":["mizzy/tweetduck"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/mizzy/tweetduck","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mizzy%2Ftweetduck","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mizzy%2Ftweetduck/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mizzy%2Ftweetduck/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositorie
s/mizzy%2Ftweetduck/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mizzy","download_url":"https://codeload.github.com/mizzy/tweetduck/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mizzy%2Ftweetduck/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273208777,"owners_count":25064204,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-01T02:00:09.058Z","response_time":120,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["archive","cli","data-analysis","duckdb","golang","twitter"],"created_at":"2025-09-02T00:03:26.891Z","updated_at":"2025-09-02T00:04:27.674Z","avatar_url":"https://github.com/mizzy.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# TweetDuck\n\nTwitter Archive to DuckDB Importer\n\nTweetDuck is a Go program that extracts data from a Twitter archive ZIP file and imports it into a DuckDB database. It supports the new 2025-format Twitter archive.\n\n## Features\n\n- **2025 format support**: Automatically parses the latest Twitter archive ZIP files\n- **Comprehensive data extraction**: Extracts tweet, follower, and following data\n- **Fast import**: Imports data into DuckDB efficiently\n- **Duplicate handling**: Handles duplicate rows automatically via INSERT OR IGNORE\n- **Detailed logging**: Shows detailed progress in verbose mode\n\n## Requirements\n\n- Go 1.21 or later\n- DuckDB library (fetched automatically)\n\n## Installation\n\n```bash\ngit clone \u003crepository-url\u003e\ncd tweetduck\ngo mod tidy\ngo build -o tweetduck\n```\n\n## Usage\n\n### Basic usage\n```bash\n./tweetduck 
--archive=\"path/to/twitter-archive.zip\" --db=\"output.duckdb\"\n```\n\n### Run with verbose logging\n```bash\n./tweetduck --archive=\"path/to/twitter-archive.zip\" --db=\"output.duckdb\" --verbose\n```\n\n### Options\n\n- `--archive`, `-a`: Path to the Twitter archive ZIP file (required)\n- `--db`, `-d`: Path to the output DuckDB file (default: tweets.duckdb)\n- `--verbose`, `-v`: Enable verbose logging\n\n## Database schema\n\n### tweets table\n- `id` (VARCHAR): Tweet ID (primary key)\n- `text` (TEXT): Tweet body\n- `created_at` (TIMESTAMP): Creation timestamp\n- `retweet_count` (INTEGER): Retweet count\n- `favorite_count` (INTEGER): Favorite count\n- `retweeted` (BOOLEAN): Retweeted flag\n- `favorited` (BOOLEAN): Favorited flag\n- `source` (TEXT): Application the tweet was posted from\n- `lang` (VARCHAR(10)): Language code\n\n### followers table\n- `follower_id` (VARCHAR): Follower's user ID (primary key)\n- `user_link` (TEXT): Follower's Twitter link\n\n### following table\n- `following_id` (VARCHAR): User ID of the followed account (primary key)\n- `user_link` (TEXT): Twitter link of the followed account\n\n## Examples\n\n### Example queries\n\n```sql\n-- Search for Dr Pepper-related tweets\nSELECT text, created_at FROM tweets \nWHERE text LIKE '%ドクペ%' OR text LIKE '%ドクターペッパー%' OR text LIKE '%Dr Pepper%'\nORDER BY created_at DESC \nLIMIT 10;\n\n-- Count tweets per day from a given start date\nSELECT DATE(created_at) as date, COUNT(*) as tweet_count \nFROM tweets \nWHERE created_at \u003e= '2025-01-01' \nGROUP BY DATE(created_at) \nORDER BY date DESC;\n\n-- Check follower and following counts\nSELECT \n  (SELECT COUNT(*) FROM followers) as follower_count,\n  (SELECT COUNT(*) FROM following) as following_count;\n```\n\n## Supported archive formats\n\n- 2025-format Twitter archive (new format)\n- Processes `data/tweets.js` and `data/deleted-tweets.js`\n- Processes `data/follower.js` and `data/following.js`\n- JavaScript-format data files (`window.YTD.*` format)\n\n## License\n\nMIT License","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmizzy%2Ftweetduck","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmizzy%2Ftweetduck","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmizzy%2Ftweetduck/lists"}