{"id":16577386,"url":"https://github.com/oliver006/sockpuppet","last_synced_at":"2025-03-23T14:31:01.693Z","repository":{"id":32812488,"uuid":"36405226","full_name":"oliver006/sockpuppet","owner":"oliver006","description":"Having fun with WebSockets, Python, Golang and nytimes.com","archived":false,"fork":false,"pushed_at":"2016-04-13T16:21:38.000Z","size":10,"stargazers_count":38,"open_issues_count":2,"forks_count":7,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-03-02T01:13:33.991Z","etag":null,"topics":["golang","python","websocket-client","websockets"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/oliver006.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-05-28T00:47:46.000Z","updated_at":"2025-02-02T12:08:53.000Z","dependencies_parsed_at":"2022-09-12T14:01:02.368Z","dependency_job_id":null,"html_url":"https://github.com/oliver006/sockpuppet","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oliver006%2Fsockpuppet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oliver006%2Fsockpuppet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oliver006%2Fsockpuppet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oliver006%2Fsockpuppet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/oliver006","download_url":"https://codeload.github.com/oliver006/sockpuppet/tar.gz/refs/heads/master","host":{"name":"GitHub","ur
l":"https://github.com","kind":"github","repositories_count":244306108,"owners_count":20431746,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["golang","python","websocket-client","websockets"],"created_at":"2024-10-11T22:10:43.250Z","updated_at":"2025-03-23T14:31:01.206Z","avatar_url":"https://github.com/oliver006.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SockPuppet\n#### Having fun with WebSockets, Python, Golang and nytimes.com \u003cbr\u003e\n\u003cimg src =\"http://upload.wikimedia.org/wikipedia/commons/a/a7/Sock-puppet.jpg\" height=\"50px\"\u003e \u003cimg src =\"http://upload.wikimedia.org/wikipedia/commons/a/a7/Sock-puppet.jpg\" height=\"50px\"\u003e \u003cimg src =\"http://upload.wikimedia.org/wikipedia/commons/a/a7/Sock-puppet.jpg\" height=\"50px\"\u003e \u003cimg src =\"http://upload.wikimedia.org/wikipedia/commons/a/a7/Sock-puppet.jpg\" height=\"50px\"\u003e \u003cimg src =\"http://upload.wikimedia.org/wikipedia/commons/a/a7/Sock-puppet.jpg\" height=\"50px\"\u003e \u003cimg src =\"http://upload.wikimedia.org/wikipedia/commons/a/a7/Sock-puppet.jpg\" height=\"50px\"\u003e \u003cimg src =\"http://upload.wikimedia.org/wikipedia/commons/a/a7/Sock-puppet.jpg\" height=\"50px\"\u003e \u003cimg src =\"http://upload.wikimedia.org/wikipedia/commons/a/a7/Sock-puppet.jpg\" height=\"50px\"\u003e \u003cimg src =\"http://upload.wikimedia.org/wikipedia/commons/a/a7/Sock-puppet.jpg\" height=\"50px\"\u003e \u003cimg src =\"http://upload.wikimedia.org/wikipedia/commons/a/a7/Sock-puppet.jpg\" height=\"50px\"\u003e \u003cimg src 
=\"http://upload.wikimedia.org/wikipedia/commons/a/a7/Sock-puppet.jpg\" height=\"50px\"\u003e \u003cimg src =\"http://upload.wikimedia.org/wikipedia/commons/a/a7/Sock-puppet.jpg\" height=\"50px\"\u003e \u003cimg src =\"http://upload.wikimedia.org/wikipedia/commons/a/a7/Sock-puppet.jpg\" height=\"50px\"\u003e \u003cimg src =\"http://upload.wikimedia.org/wikipedia/commons/a/a7/Sock-puppet.jpg\" height=\"50px\"\u003e \u003cimg src =\"http://upload.wikimedia.org/wikipedia/commons/a/a7/Sock-puppet.jpg\" height=\"50px\"\u003e \u003cimg src =\"http://upload.wikimedia.org/wikipedia/commons/a/a7/Sock-puppet.jpg\" height=\"50px\"\u003e \u003cimg src =\"http://upload.wikimedia.org/wikipedia/commons/a/a7/Sock-puppet.jpg\" height=\"50px\"\u003e \u003cimg src =\"http://upload.wikimedia.org/wikipedia/commons/a/a7/Sock-puppet.jpg\" height=\"50px\"\u003e \n\n\n\u003cbr\u003e\n### What's this all about? \nDid you ever wonder how **nytimes.com** pushes breaking news articles to the front page while you have it open in your browser? Well, I used my browser's developer tools to look at what's going on, and it turns out they don't periodically reload JSON data but use websockets to push new events directly to your browser ([see here](https://developer.mozilla.org/en-US/docs/WebSockets) for more information about websockets).\u003cbr\u003e\nIt's a system called `nyt-fabrik`; here are a few talks and presentations where they give some insight into the architecture: [search Google for \"nytimes fabrik websockets\"](https://www.google.com/search?q=nytimes+fabrik+websockets). \n\nThere is example code, see [here for the Python code](sockpuppet.py) and [here for the Golang example](sockpuppet.go).\n\n\u003cbr\u003e\n### Cool, so how does it work?\n\nWhen you go to **nytimes.com**, your browser will establish a websocket connection with the NYT fabrik server and, after a little login dance, will start listening for news events.\nYour browser opens a websocket TCP connection to e.g. 
`ws://blablabla.fabrik.nytimes.com./123/abcde123/websocket` and the server sends a one-character frame `o` which is a request to provide some sort of login identification.\u003cbr\u003e\nThe client (your browser) responds with `[\"{\\\"action\\\":\\\"login\\\",\\\"client_app\\\":\\\"hermes.push\\\",\\\"cookies\\\":{\\\"nyt-s\\\":\\\"SOME_COOKIE_VALUE_HERE\\\"}}\"]` and from then on you receive either an `h` every 20-30 seconds, which is some sort of keep-alive, or a frame that starts with `a` and carries all sorts of data encoded as JSON.\n\nIf we receive a message starting with `a`, we can strip the first character and JSON decode the rest. \n\n```json\n{\n    \"body\": \"{\\\"status\\\":\\\"updated\\\",\\\"version\\\":1,\\\"links\\\":[{\\\"url\\\":\\\"http://www.nytimes.com/2015/05/26/us/cleveland-police.html\\\",\\\"count\\\":0,\\\"content_id\\\":\\\"100000003702598\\\",\\\"content_type\\\":\\\"article\\\",\\\"offset\\\":0}],\\\"title\\\":\\\"Cleveland Is Said to Settle Justice Department Lawsuit Over Policing\\\",\\\"start_time\\\":1432581057,\\\"display_duration\\\":null,\\\"label\\\":\\\"Breaking News\\\",\\\"last_modified\\\":1432581057,\\\"display_type_id\\\":1,\\\"end_time\\\":1432581057,\\\"id\\\":34931339,\\\"sub_type\\\":\\\"BreakingNews\\\"}\",\n    \"timestamp\": \"2015-05-21T11:21:11.123456Z\",\n    \"hash_key\": \"34131339\",\n    \"uuid\": \"1234\",\n\t...\n    \"account\": \"nyt1\",\n    \"type\": \"feeds_item\"\n}\n```\n\nIf the decoded message has a \"body\" field, we can decode it. 
In case of a breaking news item it looks something like this: \n\n```json\n{\"status\": \"updated\", \"sub_type\": \"BreakingNews\", \n\"links\": [{\"url\": \"http://www.nytimes.com/2015/05/26/us/cleveland-police.html\", \"count\": 0, \"content_id\": \"100000003702598\", \"content_type\": \"article\", \"offset\": 0}], \n\"title\": \"Cleveland Is Said to Settle Justice Department Lawsuit Over Policing\", \n\"start_time\": 1432581057, \"display_duration\": null, \"label\": \"Breaking News\",\n\"version\": 1, \"display_type_id\": 1, \"end_time\": 1432581057, \n\"last_modified\": 1432581057, \"id\": 34131339}\n```\n\u003cbr\u003e\n### Neat but how do I access the feed programmatically?\n\nGood question, let's see, we need about 3-4 things to get this to work, easy. For the Python example, I'll be using the [Tornado websocket framework](http://tornado.readthedocs.org/en/latest/websocket.html) and for the Golang example I'll be using the [Golang.org websocket package](https://godoc.org/golang.org/x/net/websocket).\n\n#### Connect to the websocket\n\nIn Python, this is easy:\n\n```python\nurl = \"ws://blablabla.fabrik.nytimes.com./123/abcdef123/websocket\"\ntry:\n    w = yield tornado.websocket.websocket_connect(url, connect_timeout=5)\n    logging.info(\"Connected to %s\", url)\nexcept Exception as ex:\n    logging.error(\"couldn't connect, err: %s\", ex)\n``` \n\nIn Golang, it looks about the same:\n\n```go\naddr := \"ws://blablabla.fabrik.nytimes.com./123/abcdef123/websocket\"\nws, err := websocket.Dial(addr, \"\", \"http://www.nytimes.com/\")\nif err != nil {\n\tlog.Fatal(err)\n}\nlog.Printf(\"Connected to %s\", addr)\n```\nThat was easy, wasn't it?\n\n#### Listen for incoming messages \nGood, we now are connected and have a websocket object/struct we can work with, let's listen for incoming messages.\u003cbr\u003e\n\nPython:\n\n```python\nwhile True:\n    payload = yield w.read_message()\n    if payload is None:\n        logging.error(\"uh oh, we got 
disconnected\")\n        return\n```\nand in Golang:\n\n```go\nvar msgBuf = make([]byte, 4096)\nfor {\n\tbufLen, err := ws.Read(msgBuf)\n\tif err != nil {\n\t\tlog.Printf(\"read err: %s\", err)\n\t\treturn\n\t}\n```\nOne caveat here: the Golang version can't handle messages longer than 4k (it'll chunk them into 4k pieces), but for our purposes that's not an issue.\n\n#### Send the login message \n\nIf we receive `o` we need to send the login message. We need a cookie value so let's make one up:\n\n```python\nif payload[0] == \"o\":\n    cookie = ''.join(random.choice(string.ascii_letters + string.digits) for _ in range(32))\n    msg = json.dumps(['{\"action\":\"login\", \"client_app\":\"hermes.push\", \"cookies\":{\"nyt-s\":\"%s\"}}' % cookie])\n    w.write_message(msg.encode('utf8'))\n    logging.info(\"sent cookie: %s\", cookie)\n```\n\nIn Golang this is a bit more verbose:\n\n```go\nif msgBuf[0] == 'o' {\n\t// reply to the login request\n\tcookie := randCookie()\n\tmsg := fmt.Sprintf(`[\"{\\\"action\\\":\\\"login\\\", \\\"client_app\\\":\\\"hermes.push\\\", \\\"cookies\\\":{\\\"nyt-s\\\":\\\"%s\\\"}}\"]`, cookie)\n\t_, err := ws.Write([]byte(msg))\n\tif err != nil {\n\t\tlog.Fatal(err)\n\t}\n\tlog.Printf(\"Sent cookie: %s\\n\", cookie)\n}\n```\nand `randCookie()` looks like this:\n\n```go\nfunc randCookie() string {\n\tletters := []rune(\"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890\")\n\tb := make([]rune, 30)\n\tfor i := range b {\n\t\tb[i] = letters[rand.Intn(len(letters))]\n\t}\n\treturn string(b)\n}\n```\n\n#### Patiently wait and (mostly) ignore the `h` messages\nNothing much to do here: whenever we get an `h` message we can simply write `ping` to the console.\n\n```python\nelif payload[0] == 'h':\n    logging.info('ping')\n```\nand\n\n```go\n// 'h' is the keep-alive frame\nif msgBuf[0] == 'h' {\n\tlog.Println(\"ping\")\n}\n```\n \n\n#### Decode the news alert message when we receive one\n\nMessages from the server that start with `a` contain JSON encoded data that we 
can decode. \nPython first:\n\n```python\nelif payload[0] == 'a':\n    frame = json.loads(payload[1:])\n    if 'body' in frame:\n        body = json.loads(frame['body'])\n```\nNow you can check `if body['sub_type'] == \"BreakingNews\"` or whatever else you plan on doing with this.\n\nIn Golang everything is a bit more verbose but roughly works the same (inlined and shortened for brevity).\n\n```go\nif msgBuf[0] == 'a' {\n\t// only the bytes that were actually read in this iteration\n\tpayload := msgBuf[:bufLen]\n\n\tframe := struct {\n\t\tUUID        string `json:\"uuid\"`\n\t\tProduct     string `json:\"product\"`\n\t\tProject     string `json:\"project\"`\n\t\t...\n\t\tBody        string `json:\"body,omitempty\"`\n\t}{}\n\n\t// [1:] as we want to skip the leading character `a`\n\terr = json.Unmarshal(payload[1:], \u0026frame)\n\tif err != nil {\n\t\treturn\n\t}\n\tif len(frame.Body) \u003e 1 {\n\t\t// here we should try to JSON unmarshal frame.Body\n\t}\n}\n\n```\n`frame.Body` can now be unmarshaled in the same way as `payload[1:]` earlier.\nThe resulting struct for it looks something like this:\n\n```go\ntype MessageBody struct {\n\tID           int    `json:\"id\"`\n\tTitle        string `json:\"title\"`\n\tStatus       string `json:\"status\"`\n\tVersion      int    `json:\"version\"`\n\tSubType      string `json:\"sub_type\"`\n\tLabel        string `json:\"label\"`\n\tStartTime    int    `json:\"start_time\"`\n\tEndTime      int    `json:\"end_time\"`\n\tLastModified int    `json:\"last_modified\"`\n\tLinks []struct {\n\t\tURL         string `json:\"url\"`\n\t\tContentID   string `json:\"content_id\"`\n\t} `json:\"links\"`\n}\n\n``` \n\n\u003cbr\u003e\n### Sweet but what do I do with this?\n\nTotally up to you. Send yourself an email or txt msg using Twilio or Plivo every time something happens. 
\n\n\n### Cool, how do I run the examples?\n\nPython\n\n```\npython sockpuppet.py --ws_addr=\"ws://\u003c\u003cADDRESS HERE\u003e\u003e\"\n```\n\nGo\n\n```\ngo run sockpuppet.go --ws_addr=\"ws://\u003c\u003cADDRESS HERE\u003e\u003e\"\n```\n\nYou can find a valid websocket host by opening the Developer Console of your favorite browser, visiting [nytimes.com](http://www.nytimes.com/) and looking for websocket connections in the network tab.\n\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foliver006%2Fsockpuppet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Foliver006%2Fsockpuppet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foliver006%2Fsockpuppet/lists"}