{"id":13690331,"url":"https://github.com/PhantomInsights/tweet-transcriber","last_synced_at":"2025-05-02T06:32:29.795Z","repository":{"id":101907044,"uuid":"203445732","full_name":"PhantomInsights/tweet-transcriber","owner":"PhantomInsights","description":"A Reddit bot that transcribes tweets from comments and submissions links, mirrors their images and replies back with a formatted Markdown message.","archived":false,"fork":false,"pushed_at":"2022-04-04T15:23:35.000Z","size":58,"stargazers_count":18,"open_issues_count":0,"forks_count":3,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-29T14:35:06.372Z","etag":null,"topics":["beautifulsoup","imgur","praw","python3","reddit-bot","web-scraper"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/PhantomInsights.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null},"funding":{"github":"agentphantom","patreon":"agentphantom"}},"created_at":"2019-08-20T19:59:23.000Z","updated_at":"2024-04-13T08:38:38.000Z","dependencies_parsed_at":null,"dependency_job_id":"76b0cbfe-2913-4d4e-ad38-fcc526c5f4e8","html_url":"https://github.com/PhantomInsights/tweet-transcriber","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PhantomInsights%2Ftweet-transcriber","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PhantomInsights%2Ftweet-transcriber/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Phant
omInsights%2Ftweet-transcriber/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PhantomInsights%2Ftweet-transcriber/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/PhantomInsights","download_url":"https://codeload.github.com/PhantomInsights/tweet-transcriber/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251998736,"owners_count":21678007,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["beautifulsoup","imgur","praw","python3","reddit-bot","web-scraper"],"created_at":"2024-08-02T16:01:03.194Z","updated_at":"2025-05-02T06:32:29.502Z","avatar_url":"https://github.com/PhantomInsights.png","language":"Python","funding_links":["https://github.com/sponsors/agentphantom","https://patreon.com/agentphantom","https://www.patreon.com/bePatron?u=20521425"],"categories":["Misc"],"sub_categories":["External Reddit Tools"],"readme":"# Tweet Transcriber\n\nThis project implements a custom algorithm that extracts the most important values from a given tweet url, converts them into `Markdown` formatted text and mirrors any images to `Imgur`.\n\nThe formatted message and images are then posted to `Reddit` with a simple bot framework.\n\nThe code is organized so that you only need to call one function with a tweet JSON source to get all the important values in a dictionary. 
This way you can integrate it into your data pipelines with very low effort.\n\nIt was fully developed in `Python` and it is inspired by similar projects seen on `Reddit` that appear to be defunct.\n\nThe 2 most important files are:\n\n* `twitter.py` : This script includes 2 functions, one extracts all important values from a tweet JSON source and the other creates a `Markdown` text and mirrors Twitter images to Imgur.\n\n* `bot_sitewide.py` : A Reddit bot that checks all posts from the domain twitter.com and replies to them with a transcribed tweet.\n\n## Requirements\n\nThis project uses the following Python libraries:\n\n* `PRAW` : Makes the use of the Reddit API very easy.\n* `Requests` : To perform HTTP requests to twitter.com.\n* `BeautifulSoup` : To extract Twitter urls from the Reddit comments.\n\n## Reddit Bots\n\nThis project includes 2 bots, `bot_sitewide.py` and `bot.py`. They share most of the code but have a small difference:\n\n* `bot.py` - This bot will look for tweets in subreddit posts and comments and reply to them with transcribed tweets.\n\n* `bot_sitewide.py` - This bot will only look for posts from the twitter.com domain and reply to them with a transcribed tweet.\n\nBoth bots keep a log of which comments and posts they have processed, to avoid making duplicate comments.\n\nWhen they request the new posts they first check that the post has not already been processed and that its url contains the strings `twitter.com` and `/status/`. 
This will ensure the link is indeed a tweet and not some other Twitter url.\n\n```python\nprocessed_posts = load_log(POSTS_LOG)\n\n# We iterate over all new twitter.com posts.\nfor submission in reddit.domain(\"twitter.com\").new(limit=100):\n\n    if \"twitter.com\" in submission.url and \"status\" in submission.url and submission.id not in processed_posts:\n```\n\nAfter that the script removes the `mobile.` part of the tweet url, because I found out that the HTML source varies a lot between mobile and non-mobile tweet urls.\n\nWith the proper url at hand we send it to the `transcribe_tweet()` function from the `twitter.py` file.\n\n```python\nreddit.submission(submission.id).reply(transcribe_tweet(submission.url.replace(\"mobile.\", \"\"), MESSAGE_TEMPLATE))\n```\n\nThis function returns a `Markdown` formatted text that is then used to reply to the original post. It takes 2 parameters, a tweet url and a string template. You can find the templates in the templates folder.\n\nIf the whole process was successful we update our processed posts log and move to the next post. 
If it fails we log the error for later verification.\n\n```python\nupdate_log(POSTS_LOG, submission.id)\nprint(\"Replied:\", submission.id)\n```\n\nThe process is similar for tweet links inside a comment. All links are extracted from the comment and those that match our criteria are transcribed and added to a list.\n\n```python\n# Sometimes a comment may contain several links, we look for all of them.\ncomment_text = list()\n\n# Get all tweet links.\nsoup = BeautifulSoup(comment.body_html, \"html.parser\")\n\nfor link in soup.find_all(\"a\"):\n\n    if \"twitter.com\" in link[\"href\"] and \"/status/\" in link[\"href\"]:\n        comment_text.append(transcribe_tweet(link[\"href\"].replace(\"mobile.\", \"\"), MESSAGE_TEMPLATE))\n```\n\nThis list is then joined into a single string, which is used to reply to the comment.\n\n```python\nreddit.comment(comment.id).reply(\"\\n\\n*****\\n\\n\".join(comment_text))\n```\n\n## Tweet Scraper\n\nTo extract the values from the tweet JSON source I used the same technique as other Twitter content downloaders.\n\nYou must first request a `guest_token` from the Twitter API by sending a hardcoded `Bearer Token`.\n\nOnce you get the `guest_token` you can use it to sign requests to some of the Twitter read-only endpoints, such as `statuses/show.json`, which is used in this project.\n\nYou will receive almost the same JSON as with the regular API, which contains the most useful fields such as the tweet contents and its author metadata.\n\n## Conclusion\n\nI'm currently using these bots only on the subreddits I manage as an extra enhancement to the user experience. Lately, when celebrities and politicians publish highly controversial tweets they often delete them after a few minutes and I despise that behaviour.\n\nOne of the purposes of this project is to have a backup mechanism for said tweets. 
It can also be used to keep a local copy of the tweets and perform analysis on them in an easier to parse format (Python dictionary).\n\nIf you have any questions you are always welcome to open an issue.\n\n[![Become a Patron!](https://c5.patreon.com/external/logo/become_a_patron_button.png)](https://www.patreon.com/bePatron?u=20521425)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FPhantomInsights%2Ftweet-transcriber","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FPhantomInsights%2Ftweet-transcriber","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FPhantomInsights%2Ftweet-transcriber/lists"}