https://github.com/casbin/google-groups-crawler
This tool is for legitimate use. We internally use it for synchronizing posts between Casnode forum and our own Casbin Google Group
https://github.com/casbin/google-groups-crawler
Last synced: 11 months ago
JSON representation
This tool is for legitimate use. We internally use it for synchronizing posts between Casnode forum and our own Casbin Google Group
- Host: GitHub
- URL: https://github.com/casbin/google-groups-crawler
- Owner: casbin
- License: apache-2.0
- Created: 2021-04-24T14:17:13.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2021-06-20T12:52:58.000Z (almost 5 years ago)
- Last Synced: 2025-06-09T13:27:39.623Z (11 months ago)
- Language: Go
- Homepage: https://github.com/casbin/casnode
- Size: 45.9 KB
- Stars: 5
- Watchers: 2
- Forks: 3
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Google Groups Crawler
## Installation
``` go
import crawler "github.com/casbin/google-groups-crawler"
```
## Usage
We must get an instance of `GoogleGroup` first:
``` go
group := crawler.NewGoogleGroup(groupName string, cookie ...string)
```
The second parameter `cookie` is optional. Google group won't tell you email address of all repliers until you logged
in, so you need to fill the parameter with a logged-in user's cookie. (Of course, this user must be a member of the
group)
It is OK to leave `cookie` blank, code still works. But `AuthorEmail` in `GoogleGroupMessage` will be empty. If you do need `cookie` to access emails of repliers, please follow these steps:
- open Google Chrome (or another browser) and login
- Navigate to [Google Group](https://groups.google.com/), select the group you want to craw
- Press F12, and select `network`
- Select a conversation (any conversation in this group is OK)
- Select the first item in the request list
- Select `Headers`
- In `Request Headers`, right click `cookie`, and copy the value
- Fill the parameter `cookie` with what you copied
### Get all conversations of the group
- For some special reasons, you cannot access Google Groups in some area. You can set up a http proxy, and fill the parameter `http.Client` with it. If you can access Google Groups directly, then you can just fill the parameter like the example code.
- this function returns an array of `GoogleGroupConversation`
``` go
conversations := group.GetConversations(http.Client{})
```
### Get all messages of the conversation
- `conversation` is an instance of `GoogleGroupConversation`
- parameter `http.Client` is the same effect as above
- this function returns an array of `GoogleGroupMessage`
```go
messages := conversation.GetAllMessages(http.Client{}, removeGmailQuote)
```
## Data Structure
```go
type GoogleGroup struct {
GroupName string
Cookie string
}
type GoogleGroupConversation struct {
Title string
Id string
GroupName string
Time float64
Cookie string
}
type GoogleGroupMessage struct {
Author string
AuthorEmail string
Content string
Time float64
}
```