{"id":27334884,"url":"https://github.com/dmuth/twitter-aws-comprehend","last_synced_at":"2025-04-12T14:46:34.102Z","repository":{"id":50186369,"uuid":"128685272","full_name":"dmuth/twitter-aws-comprehend","owner":"dmuth","description":"An app to analyze tweets using Amazon Comprehend's Sentiment Analysis service","archived":false,"fork":false,"pushed_at":"2022-12-08T00:51:46.000Z","size":3967,"stargazers_count":16,"open_issues_count":1,"forks_count":3,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-05-02T06:07:45.137Z","etag":null,"topics":["analyze-tweets","aws","nlp","sentiment-analysis","splunk","tweets","twitter"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dmuth.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null},"funding":{"github":"dmuth"}},"created_at":"2018-04-08T21:58:29.000Z","updated_at":"2023-01-18T07:10:44.000Z","dependencies_parsed_at":"2023-01-25T01:45:36.017Z","dependency_job_id":null,"html_url":"https://github.com/dmuth/twitter-aws-comprehend","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmuth%2Ftwitter-aws-comprehend","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmuth%2Ftwitter-aws-comprehend/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmuth%2Ftwitter-aws-comprehend/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmuth%2Ftwitter-aws-comprehend/manifests","owner_url":"https://re
pos.ecosyste.ms/api/v1/hosts/GitHub/owners/dmuth","download_url":"https://codeload.github.com/dmuth/twitter-aws-comprehend/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248585248,"owners_count":21128974,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["analyze-tweets","aws","nlp","sentiment-analysis","splunk","tweets","twitter"],"created_at":"2025-04-12T14:46:33.333Z","updated_at":"2025-04-12T14:46:34.096Z","avatar_url":"https://github.com/dmuth.png","language":"Python","funding_links":["https://github.com/sponsors/dmuth"],"categories":[],"sub_categories":[],"readme":"\nNOTE: If you like this, you'll also like \u003ca href=\"https://github.com/dmuth/twitter-sentiment-analysis\"\u003emy next project\u003c/a\u003e, which performs sentiment analysis on Tweets by keyword!\n\n\n# Twitter AWS Comprehend\n\nI recently learned of \u003ca href=\"https://aws.amazon.com/comprehend/\"\u003eAmazon Comprehend\u003c/a\u003e and wanted\nto play around with its sentiment analysis.\n\nSo I built this app to download user timelines from Twitter, send them to AWS for analysis, and visualize them in Splunk.  
The following metrics are reported:\n\n- Start and end dates for tweets\n- Number of tweets\n- A graph of \"Sentiment Over Time\"\n- Number of F-bombs used\n- Net Happiness Index (percent of happy tweets minus percent of unhappy tweets)\n- Top Positive and Negative tweets\n\n\n## Screenshots\n\n\u003ca href=\"./img/splunk-twitter-sentiment-dashboard.png\"\u003e\u003cimg src=\"./img/splunk-twitter-sentiment-dashboard.png\" width=\"275\" /\u003e\u003c/a\u003e \u003ca href=\"./img/obama-twitter-sentiment-dashboard.png\"\u003e\u003cimg src=\"./img/obama-twitter-sentiment-dashboard.png\" width=\"275\" /\u003e\u003c/a\u003e  \u003ca href=\"./img/phillies-twitter-sentiment-dashboard.png\"\u003e\u003cimg src=\"./img/phillies-twitter-sentiment-dashboard.png\" width=\"275\" /\u003e\u003c/a\u003e\n\nAdditional screenshots \u003ca href=\"img\"\u003eare available in the img/ directory\u003c/a\u003e.\n\n\n## Requirements\n\n- An AWS account\n- The \u003ca href=\"https://docs.aws.amazon.com/cli/latest/userguide/installing.html\"\u003eAWS Command Line Interface installed and configured\u003c/a\u003e\n- Python 3\n- Run the command `pip install -r requirements.txt` to download all required packages\n- A Twitter app created at \u003ca href=\"https://apps.twitter.com/\"\u003ehttps://apps.twitter.com/\u003c/a\u003e.  Read-only access is fine.\n- A running Splunk instance.  A free copy of Splunk can be downloaded from \u003ca href=\"https://www.splunk.com/\"\u003eSplunk.com\u003c/a\u003e.\n\n\n## Getting started\n\n### Downloading Tweets\n\nYou'll want to start off by running the script **./0-fetch-tweets -u username -n num_tweets_to_download** to download Tweets via Twitter's API.\nWhen you first run the script, it will notice the lack of credentials and send you over to Twitter's App page,\nwhere you'll need to create an app.  
Then grab the App Key and App Secret and enter them when the script prompts you.\nNext, you'll be sent over to Twitter one more time and will receive a PIN to enter in the script.  Do so,\nand you'll be authenticated to Twitter.  **This is a one-time process**, so once you do it, you should not need\nto do it again.\n\nThe maximum number of tweets you can download from Twitter's API is **3200**, but the actual number you get will\nbe much lower, since RTs are ignored and Twitter's API often returns fewer tweets than you ask for, for reasons I do not understand.\n\n\n### Analyzing Tweets\n\nWARNING: **This costs money!**  Based on \u003ca href=\"https://aws.amazon.com/comprehend/pricing/\"\u003eAWS's pricing structure\u003c/a\u003e, each tweet is billed as 3 \"units\" (the minimum charge per request), which costs $0.0003, or 3 hundredths of a cent, to analyze.  So 100 tweets will cost 3 cents, while 1,000 tweets will cost 30 cents.\n\nThe syntax for the sentiment analysis script is **1-analyze-sentiment -u username -n num_tweets [ --fake ]**\n\nI strongly encourage you to run the script with **--fake** for the first few tries, so that calls to AWS are faked while you get comfortable running the script.\n\n\n### Feeding the analyzed tweets to Splunk\n\nThe syntax for the script that feeds the tweets into Splunk is: **2-ingest-into-splunk -u username [ --splunk-port port ] [ --splunk-host hostname ]**.  The port defaults to 9997 and the host to localhost.\n\nThe data is sent to Splunk over a raw TCP connection, so you'll need to set up a matching TCP data input in Splunk.  Here's a screenshot to help with that:\n\n\u003cimg src=\"./img/splunk-tcp-port.png\" /\u003e\n\nYou'll want this input to save its data to the **main** index.\n\n\n## Visualization\n\nThis is the most interesting part.  
So far, we are making the following assumptions about Splunk:\n\n- Use of the **main** index\n- Use of the sourcetype **twitter**\n- Use of the Splunk app **Search**\n\nAssuming those are the case, you're good to go!  Just copy the file **splunk/twitter_activity_sentiment.xml** into **$SPLUNK_HOME/etc/apps/search/local/data/ui/views**, restart Splunk, and you should be all set!  \n\nAlternatively, a simpler way (which does not require restarting Splunk) is to create a new dashboard, click **Edit**, click **Source**, and paste in the contents of **twitter_activity_sentiment.xml**.\n\n\n## A Word on Idempotency\n\nI am a HUGE fan \u003ca href=\"https://en.wikipedia.org/wiki/Idempotence\"\u003eof Idempotency\u003c/a\u003e, especially because\nAWS Comprehend costs money!  Once I analyze a tweet, I never want to analyze it again, so I made a conscious\nchoice to build my code that way.  For example, if a tweet has already been analyzed and the script **0-fetch-tweets** is \nrun later, it will not overwrite the sentiment fields.  And once a tweet is analyzed by **1-analyze-sentiment**, it will never be analyzed again!\n\nOne place where this does break down is with Splunk.  Since the data is fed in through raw TCP and Splunk does not send back any acknowledgement, running **2-ingest-into-splunk** twice will result in duplicate events.  The way around that is to run a Splunk query like **index=main sourcetype=twitter username=dmuth | delete** before re-ingesting any data.  I'm not thrilled with this particular workflow, and am looking at some alternatives.  \n\n\n## Future TODO Items\n\n- ~~Make tweet ingestion idempotent~~\n- ~~See about using Twitter's search API to get older tweets~~ Seriously, Twitter.  
Let us get more than 3,200 Tweets through your API!\n- Come up with a metric to measure profanity on an account, not just f-bombs\n- Add a \"username\" field to the database schema so we can analyze multiple users at once\n- Dockerize this to download a user's tweets, analyze them, export them, and then load up a Splunk instance to ingest them\n\n\n## Contact\n\nI had fun writing this, and I hope you enjoy using it.  If there are any issues, feel\nfree to file an issue against this project, \u003ca href=\"http://twitter.com/dmuth\"\u003ehit me up on Twitter\u003c/a\u003e\n\u003ca href=\"http://facebook.com/dmuth\"\u003eor Facebook\u003c/a\u003e, or drop me a line: **dmuth AT dmuth DOT org**.\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdmuth%2Ftwitter-aws-comprehend","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdmuth%2Ftwitter-aws-comprehend","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdmuth%2Ftwitter-aws-comprehend/lists"}