{"id":17030531,"url":"https://github.com/vsoch/google-group-export","last_synced_at":"2025-08-18T16:08:01.300Z","repository":{"id":141667782,"uuid":"216623505","full_name":"vsoch/google-group-export","owner":"vsoch","description":"An interactive script to export a Google Group","archived":false,"fork":false,"pushed_at":"2019-10-21T18:43:26.000Z","size":89681,"stargazers_count":7,"open_issues_count":1,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-12T12:12:37.269Z","etag":null,"topics":["export","google-export","google-group","selenium"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vsoch.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-10-21T17:15:01.000Z","updated_at":"2023-05-28T23:13:00.000Z","dependencies_parsed_at":null,"dependency_job_id":"f4a5e2a2-04b8-4eb9-bf32-743e238825b9","html_url":"https://github.com/vsoch/google-group-export","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/vsoch/google-group-export","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vsoch%2Fgoogle-group-export","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vsoch%2Fgoogle-group-export/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vsoch%2Fgoogle-group-export/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vsoch%2Fgoogle-group-export/manifests","owner_url":"https
://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vsoch","download_url":"https://codeload.github.com/vsoch/google-group-export/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vsoch%2Fgoogle-group-export/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271019805,"owners_count":24685740,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-18T02:00:08.743Z","response_time":89,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["export","google-export","google-group","selenium"],"created_at":"2024-10-14T08:07:18.246Z","updated_at":"2025-08-18T16:08:01.267Z","avatar_url":"https://github.com/vsoch.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Google Group Scraper\n\nThis is a derivation of an old set of scripts that I wrote in graduate school (2017 possibly?),\nlost for a few years, and amazingly found on a USB drive! I refactored them\nto work with (what seems to be) a slightly different Groups interface.\nYou need to use Python 3, because I haven't thought about Python 2 compatibility.\nUsing this module relies on being able to see all the posts, as it's been developed\nusing a public group (which I'm a member of). 
This should work to export\ncorrespondence from a Google group into text files, to do with as you please.\n\nYes, this is definitely hacky, but (at least for today) it's working for me!\n\n## Usage\n\nThe only arguments that are needed are the organization (e.g., 'a/lbl.gov')\nand the group name (e.g., singularity).\n\n```bash\n$ python scrape.py --help\nusage: scrape.py [-h] org group\n\nGoogle Group Scraper\n\npositional arguments:\n  org         the name the organization (e.g., a/lbl.gov)\n  group       the name of the Google Group (e.g., singularity)\n\noptional arguments:\n  -h, --help  show this help message and exit\n```\n\nYou can provide them as the first\nand second positional arguments as follows:\n\n```bash\n$ python scrape.py a/lbl.gov singularity\n```\n\nYou'll need Chrome, as we use the chromedriver included in this repository.\nIf you have Chrome, you'll see a window open that is being controlled by\n\"automated test software.\" In the terminal where you are running the script,\nyou'll see an instruction:\n\n```bash\n$ python scrape.py a/lbl.gov singularity\n/home/vanessa/Documents/Dropbox/Code/Python/GoogleGroupScraper/exports/a-lbl.gov/singularity/2019-10-21 already exists, will only get new topics.\nOutput: /home/vanessa/Documents/Dropbox/Code/Python/GoogleGroupScraper/exports/a-lbl.gov/singularity/2019-10-21\nScroll to the bottom (the first post) and press Enter to continue...\n```\n\nAnd this is exactly what you need to do - in the browser, scroll (page down is fast)\nto the absolute bottom of the forum. Yes, this means in many cases posts from 2015\nor earlier! 
Here is the bottom of the Singularity forum:\n\n![img/bottom.png](img/bottom.png)\n\nOnce you've scrolled to the bottom and pressed Enter, the script will proceed to\nextract the messages from the topic threads.\n\n```bash\n...\n/home/vanessa/Documents/Dropbox/Code/Python/GoogleGroupScraper/exports/a-lbl.gov/singularity/2019-10-21 already exists, will only get new topics.\nOutput: /home/vanessa/Documents/Dropbox/Code/Python/GoogleGroupScraper/exports/a-lbl.gov/singularity/2019-10-21\nScroll to the bottom (the first post) and press Enter to continue...\nFound 1016 topics links on forum\nObtained 3 raw urls.\nObtained 4 raw urls.\n```\n\n## Concepts\n\nBefore we talk about output files, let's review some basic concepts that will\nhelp us define the output organization.\n\n - *topics/threads* are akin to posts. When I post to the list, I create a new topic. This is also sometimes referred to as a thread, because a topic is a thread of one or more messages.\n - *messages* are subsections of topics. 
The first message is my original post, and if you respond, that would be a new message.\n - *forum* is the entire set of topics\n\n## Output\n\nThus, we are going to organize output based on the organization and group, along\nwith the date and thread id and message id.\n\n```\n# exports/\u003corg\u003e/\u003cgroup\u003e/\u003cdate\u003e/topics/\u003cthread_id\u003e/\u003cmessage_id\u003e\n# exports/\u003corg\u003e/\u003cgroup\u003e/\u003cdate\u003e/urls.txt\n```\n\nA real example might look like this:\n\n```\nexports/\n└── a-lbl.gov\n    └── singularity\n        ├── 2017-03-31\n        │   ├── group.json\n        │   ├── README.md\n        │   └── urls.txt\n        └── 2019-10-21\n            ├── topics\n            │   ├── 1BUYcwzc2ww.txt\n            │   │   ├── DNGGUjnxFQAJ.txt\n            │   │   └── gYYrT5uXAwAJ.txt\n...\n            │   └── zcoOui_RoAo.txt\n            │       ├── 5x9MQ9VbDAAJ.txt\n            │       ├── cBniyJB4DAAJ.txt\n            │       ├── frM-tvYSDQAJ.txt\n            │       ├── K6lcgar6DAAJ.txt\n            │       ├── PBZxW3kgDAAJ.txt\n            │       └── q9BoZwEuDAAJ.txt\n            └── urls.txt\n\n```\n\nIf you look at any particular file, it's the entire content of the email message.\nSo likely you'll need to parse that further, either to extract text content,\nor do some kind of fun natural language processing project.\n\nThat's basically it! You can look at the included export for an example.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvsoch%2Fgoogle-group-export","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvsoch%2Fgoogle-group-export","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvsoch%2Fgoogle-group-export/lists"}