{"id":14483046,"url":"https://github.com/Tazeg/dbtweets","last_synced_at":"2025-08-30T03:32:53.886Z","repository":{"id":77736079,"uuid":"160978405","full_name":"Tazeg/dbtweets","owner":"Tazeg","description":"Get a lot of tweets on a specific topic.","archived":false,"fork":false,"pushed_at":"2020-10-29T09:19:34.000Z","size":96,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-07-14T14:32:18.439Z","etag":null,"topics":["docker","docker-compose","dockerfile","neo4j","redis","twitter","twitter-api"],"latest_commit_sha":null,"homepage":"","language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Tazeg.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":["Tazeg"],"custom":["https://keybase.io/jeffprod"]}},"created_at":"2018-12-08T21:04:00.000Z","updated_at":"2024-07-14T14:32:30.825Z","dependencies_parsed_at":"2023-03-05T23:45:44.350Z","dependency_job_id":null,"html_url":"https://github.com/Tazeg/dbtweets","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Tazeg%2Fdbtweets","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Tazeg%2Fdbtweets/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Tazeg%2Fdbtweets/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Tazeg%2Fdbtweets/manifests","owner_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/owners/Tazeg","download_url":"https://codeload.github.com/Tazeg/dbtweets/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":217593012,"owners_count":16201561,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docker","docker-compose","dockerfile","neo4j","redis","twitter","twitter-api"],"created_at":"2024-09-03T00:01:27.686Z","updated_at":"2024-09-03T00:04:43.432Z","avatar_url":"https://github.com/Tazeg.png","language":"PHP","funding_links":["https://github.com/sponsors/Tazeg","https://keybase.io/jeffprod"],"categories":["PHP"],"sub_categories":[],"readme":"### DB Tweets\n\nOr how to vacuum a lot of tweets on a specific topic.\n\nIt uses both the Twitter SEARCH (recursive intervals) and STREAM APIs.\nTweets are then inserted into a Neo4j database that you can query in real time.\n\nPlease read and agree to the [Twitter Developer Agreement and Policy](https://developer.twitter.com/en/developer-terms/agreement-and-policy.html)\n\n![Diagram](dbtweets.png)\n\n### Requirements\n\n- [Docker](https://docs.docker.com/install/)\n- [Docker Compose](https://docs.docker.com/compose/)\n\n### Install\n\n```\ngit clone https://github.com/Tazeg/dbtweets.git\ncd dbtweets\n```\n\n- Create a local directory for the Neo4j database :\n\n```\nmkdir /home/user/neo4j\n```\n\n- Rename `docker/twitter/src/.env.sample` to `.env` and edit it.\n- Update the Neo4j volume path in `docker-compose.yml` to match the directory you created in the previous step.\n\n\n### Run\n\n```\ndocker-compose rm\ndocker-compose 
build\ndocker-compose up\n(or \"docker-compose up -d\" for daemon mode)\n```\n\nGo to :\n\n- http://localhost:7474 for direct access to the database (login/pass : neo4j/123456)\n\n### Sample Neo4j queries\n\nYou just have to copy/paste them into the Neo4j dashboard :\n\nHow many users in the database :\n```\nMATCH (n:User) RETURN COUNT(n)\n```\n\nHow many tweets :\n```\nMATCH (n:Tweet) RETURN COUNT(n)\n```\n\nTweets count by date :\n```\nMATCH (t:Tweet)\nRETURN t.created_at_YMD, count(*) AS nb\nORDER BY t.created_at_YMD\n```\n\nApps used to post tweets :\n```\nMATCH (u:User)-[:POSTS|RETWEETS]-\u003e(t:Tweet) \nWITH COUNT(t.source) AS nb, t.source AS source \nRETURN source, nb ORDER BY nb DESC LIMIT 8\n```\n\nTweets having coordinates :\n```\nMATCH (u:User)-[:POSTS]-\u003e(t:Tweet) \nWHERE t.latitude\u003c\u003e0 AND t.longitude\u003c\u003e0 \nRETURN u,t\n```\n\nSearching strings in tweets :\n```\nMATCH (u:User)-[]-\u003e(t:Tweet) \nWHERE toLower(t.text) CONTAINS 'car'\nRETURN t.created_at_YMD, t.created_at_HIS, u.screen_name,t.text\nORDER BY t.created_at_YMD DESC, t.created_at_HIS DESC\nLIMIT 20\n```\n\nHashtags and counts :\n```\nMATCH (h:Hashtag)\u003c-[:TAGS]-(t:Tweet) \nWITH h, COUNT(h) AS nb\nORDER BY nb DESC\nRETURN h.text AS text, nb \nLIMIT 15\n```\n\nMost shared links :\n```\nMATCH (t:Tweet)-[r:CONTAINS]-\u003e(l:Link) \nWITH l.url AS url,COUNT(r) AS nb \nWHERE nb\u003e1 \nRETURN url,nb \nORDER BY nb DESC\n```\n\nMost shared media :\n```\nMATCH (t:Tweet) \nWHERE t.media_url\u003c\u003e\"\" \nRETURN t.media_url AS media, COUNT(t.media_url) AS nb \nORDER BY nb DESC\n```\n\nMost retweeted tweets :\n```\nMATCH (u:User)-[:POSTS]-\u003e(t:Tweet) \nRETURN 'https://twitter.com/' + u.screen_name + '/status/' + t.id_str AS tweeturl,t.text,t.retweet_count\nORDER BY t.retweet_count DESC LIMIT 100\n```\n\nTop languages of tweets :\n```\nMATCH (t:Tweet) \nWHERE t.lang\u003c\u003e\"und\" \nRETURN t.lang AS lng, COUNT(t.lang) AS nb \nORDER BY nb DESC \nLIMIT 10\n```\n\nMost active users 
:\n```\nMATCH (u:User)-[r:POSTS|RETWEETS]-\u003e(t:Tweet)\nRETURN u.screen_name AS screen_name,\n       COUNT(r) AS tweet_or_rt_count,\n       u.friends_count AS friends_count,\n       u.followers_count AS followers_count\nORDER BY tweet_or_rt_count DESC\nLIMIT 15\n```\n\nMost mentioned users :\n```\nMATCH (t:Tweet)-[:MENTIONS]-\u003e(u:User)\nWHERE NOT t.text=~(\"(?i)^RT @\"+u.screen_name+\".*\")\nRETURN  u.screen_name AS screen_name,\n        COUNT(u.screen_name) AS count,\n        u.friends_count AS friends_count,\n        u.followers_count AS followers_count\nORDER BY count DESC\nLIMIT 15\n```\n\nUsers who RT the most :\n```\nMATCH (u:User)-[r:RETWEETS]-\u003e(t:Tweet)\nRETURN u.screen_name AS screen_name,\n       COUNT(r) AS nbRT\nORDER BY nbRT DESC\nLIMIT 15\n```\n\nList of RTs from a given user, sorted by date :\n```\nMATCH (u1:User)-[r:RETWEETS]-\u003e(t:Tweet)\u003c-[:POSTS]-(u2:User)\nWHERE u1.id_str='123456789' \nRETURN t.created_at_YMD + ' ' + t.created_at_HIS AS dateTweet, r.created_at_YMD + ' ' + r.created_at_HIS AS dateRT, u2.screen_name, t.text\nORDER BY t.created_at_YMD DESC, t.created_at_HIS DESC\nLIMIT 15\n```\n\nList of the very first retweeters of a given tweet :\n```\nMATCH (u:User)-[r:RETWEETS]-\u003e(t:Tweet {id_str: '123456789'})\nRETURN t.created_at_YMD + ' ' + t.created_at_HIS AS dateTweet, r.created_at_YMD + ' ' + r.created_at_HIS AS dateRT,\n\tduration.between(datetime(t.created_at_YMD + 'T' + t.created_at_HIS), datetime(r.created_at_YMD + 'T' + r.created_at_HIS)).seconds AS seconds,\n\tu.screen_name AS retweeter\nORDER BY seconds\nLIMIT 15\n```\n\n### Stop\n\n```\nCTRL+C\n(or \"docker-compose stop\" if daemon)\n```\n\n### Donate\n\nhttps://en.jeffprod.com/donate/\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FTazeg%2Fdbtweets","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FTazeg%2Fdbtweets","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FTazeg%2Fdbtweets/lists"}