{"id":23110345,"url":"https://github.com/melvynator/elk_twitter","last_synced_at":"2025-07-14T18:39:30.126Z","repository":{"id":73381927,"uuid":"101749638","full_name":"melvynator/ELK_twitter","owner":"melvynator","description":"This is a data pipeline for Twitter (ETL) using the elastic stack Elasticsearch, Logstash and Kibana (version 6.1)","archived":false,"fork":false,"pushed_at":"2018-02-19T12:28:56.000Z","size":6914,"stargazers_count":58,"open_issues_count":5,"forks_count":24,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-04-04T16:06:28.563Z","etag":null,"topics":["data-collection","data-visualization","elasticsearch","elk","elk-stack","kibana","logstash","machine-learning","natural-language-processing","twitter","twitter-api"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/melvynator.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2017-08-29T10:34:23.000Z","updated_at":"2023-09-01T20:28:26.000Z","dependencies_parsed_at":"2023-05-25T07:15:22.046Z","dependency_job_id":null,"html_url":"https://github.com/melvynator/ELK_twitter","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/melvynator/ELK_twitter","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/melvynator%2FELK_twitter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/melvynator%2FELK_twitter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/melvynator%2FELK_twitter/releases","manifests_
url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/melvynator%2FELK_twitter/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/melvynator","download_url":"https://codeload.github.com/melvynator/ELK_twitter/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/melvynator%2FELK_twitter/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265333745,"owners_count":23748862,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-collection","data-visualization","elasticsearch","elk","elk-stack","kibana","logstash","machine-learning","natural-language-processing","twitter","twitter-api"],"created_at":"2024-12-17T01:48:52.001Z","updated_at":"2025-07-14T18:39:30.104Z","avatar_url":"https://github.com/melvynator.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Out of the box Twitter pipeline using the Elastic stack (ELK)\n\n\n## Contributing\n\nThis repository is fully free and fully open source. 
The license is Apache 2.0, meaning you are free to use it pretty much however you want.\n\nAll contributions are welcome: ideas, pull requests, issues, documentation improvement, complaints.\n\n\n## Summary\n\n#### [+ Introduction](#introduction)\n#### [+ Getting started](#getting-started)\n#### [+ Requirements](#requirements)\n#### [+ Ressources](#ressources)\n\n\n\n\n\n## Introduction\n\nThis repository aims to provide a fully working \"out-of-the-box\" data pipeline for doing machine learning on Twitter data using the ELK (Elasticsearch, Logstash, and Kibana) stack. \n\nIf you are not familiar with Logstash you may want to follow this [tutorial](https://github.com/melvynator/Logstash_tutorial/blob/master/README.md) first.\n\nAfter installing ELK, you should be able to visualize a dashboard like the following within 5 minutes:\n\n\u003cp align=\"center\"\u003e\n   \u003cimg src =\"https://github.com/melvynator/ELK_twitter/blob/master/img/dashboard_visualization.gif\" /\u003e\n\u003c/p\u003e\n\nThe offered pipeline can be modeled by the following flow chart:\n\n![alt text](https://github.com/melvynator/ELK_twitter/blob/master/img/pipeline.png \"Pipeline\")\n\nHere are some slides that present the Logstash part of the pipeline: https://www.slideshare.net/hypto/machine-learning-in-a-twitter-etl-using-elk .\n\nLet's have a look at the different parts covered by this pipeline:\n\n### Concerning the Logstash part\n\n___\n\n#### Input\n\nThe input used is Twitter; you can use it to track users, keywords, or tweets in a specific location.\n\n#### Filter\n\nA lot of filters are applied and they are in charge of the following tasks:\n\n* Remove deprecated fields\n* Divide the tweet into two or three events (users and tweet)\n* Flatten the JSON\n* Remove unused fields\n\n#### Output\n\nTwo outputs are defined:\n\n* Elasticsearch: To allow a better search of your data\n* MongoDB: To store your data\n\n### Concerning the Elasticsearch 
part\n____\n\n#### Mapping\n\nA mapping is provided and offers the following:\n\n* A parent/child relationship between the tweet author and their tweets\n* On text fields (Tweet content, User description, User location):\n  * 3 Analyzers\n  * Storing of the term vectors (For the 3 analyzers)\n  * Storing of the token numbers (For the 3 analyzers)\n* One geo field to locate the origin of the tweet (if available)\n* Many \"keyword\" and \"integer\" fields to allow data filtering\n\nThe 3 analyzers are:\n1. Standard\n1. English\n1. A custom analyzer that keeps emoticons and punctuation, which is useful for sentiment and emotion analysis\n\nThe mapping is not dynamic. Because Twitter has a lot of fields that are undocumented or poorly documented, a static mapping avoids data pollution and keeps only the wanted data.\n\n### Concerning the Kibana part\n____\n\nOn the Kibana side, the repository offers:\n\n* A dashboard for general data visualization\n* A dashboard for comparison between positive and negative tweets\n* Different kinds of visualizations\n\n### Machine learning\n____\n\nLogstash makes it simple to integrate a machine learning model directly into your pipeline using the rest filter. A small \"API\" has been created to give you an idea about how you can use the rest filter in order to \"label\" your tweets on the fly before indexing. 
You can find this toy API here:\n\nhttps://github.com/melvynator/toy_sentiment_API\n\nThe model is a dummy model, but you can easily plug in your own, more complex model in the form of a similar API.\n\n## Requirements\n\nFor the pipeline to work, you need a Twitter developer account, which you can obtain here: https://dev.twitter.com/resources/signup\n\n### Linux users\n\nThis guide assumes that you have already installed [Elasticsearch](https://www.elastic.co/guide/en/elasticsearch/reference/current/setup.html), [Logstash](https://www.elastic.co/guide/en/logstash/current/installing-logstash.html) and [Kibana](https://www.elastic.co/guide/en/kibana/current/install.html). All three need to be installed properly in order to use this pipeline.\n\nOnce ELK is installed, the following commands configure Elasticsearch to start automatically when the system boots up:\n\n      sudo /bin/systemctl daemon-reload\n      sudo /bin/systemctl enable elasticsearch.service\n\nElasticsearch can be started and stopped as follows:\n\n      sudo systemctl start elasticsearch.service\n      sudo systemctl stop elasticsearch.service\n\n(Note that the same steps can be used for Kibana and Logstash.)\n\n### Mac users\n\n```\nbrew install elasticsearch\nbrew install logstash\nbrew install kibana\n```\n\n## Getting started\n\nClone the repository:\n\n`git clone https://github.com/melvynator/ELK_twitter.git`\n\n### Setting up Elasticsearch\n____\n\nMake sure that you don't have an index `twitter` already present.\n\n### Setting up your Machine Learning API\n____\n\n:warning: **If you don't need to make any API calls, you can skip this part** :warning:\n\n:warning: **If you have your own API, you can skip this part** :warning:\n\nDownload the toy API:\n\n`git clone https://github.com/melvynator/toy_sentiment_API`\n\nGo into the repository directory and create a virtual environment:\n\n    cd toy_sentiment_API\n    virtualenv -p python3 venv\n    source venv/bin/activate\n\nThen install 
Flask and scikit-learn (for the machine learning)\n\n`pip install -r requirements.txt`\n\nThen you can launch your local server:\n\n    python sentiment_server.py\n\n### Setting up Logstash\n___\n\nTo start configuring Logstash, open the configuration file:\n\n`ELK_twitter/src/twitter-pipeline/config/twitter-pipeline.conf`\n\nReplace each `\u003cYOUR-KEY\u003e` with your corresponding Twitter key:\n\n\n      consumer_key =\u003e \"\u003cYOUR-KEY\u003e\"\n      consumer_secret =\u003e \"\u003cYOUR-KEY\u003e\"\n      oauth_token =\u003e \"\u003cYOUR-KEY\u003e\"\n      oauth_token_secret =\u003e \"\u003cYOUR-KEY\u003e\"\n\nNow go into `twitter-pipeline`:\n\n`cd ../src/twitter-pipeline`\n\nMake sure that Elasticsearch is started and running on port `9200`.\n\nIn addition, you have to manually install the following plugins for Logstash:\n\n:warning: **If you don't need to make any API calls, you don't have to install the REST plugin** :warning:\n\n:warning: **If you don't want to use MongoDB, you don't have to install the MongoDB plugin** :warning:\n\n1. [MongoDB](https://github.com/logstash-plugins/logstash-output-mongodb) for Logstash (allows you to store your data in MongoDB)\n`sudo /usr/share/logstash/bin/logstash-plugin install logstash-output-mongodb`\n2. 
[REST](https://github.com/lucashenning/logstash-filter-rest) for Logstash (allows you to make API calls)\n`sudo /usr/share/logstash/bin/logstash-plugin install logstash-filter-rest`\n\n\n:warning: **By default, the pipeline is only configured to output to Elasticsearch**, but if you have MongoDB installed, then you can uncomment the mongo output in the config file:\n`ELK_twitter/src/twitter-pipeline/config/twitter-pipeline.conf`\n\n:warning: **By default, the pipeline is not configured to make API calls**, but if you have an API, you can uncomment the `rest` filter in the config file:\n`ELK_twitter/src/twitter-pipeline/config/twitter-pipeline.conf`\n\nDon't forget to specify your own endpoint and data.\n\nThen, you can run the pipeline using:\n\n`sudo /usr/share/logstash/bin/logstash -f config/twitter-pipeline.conf`\n\nOr add the Logstash `bin` directory to your `PATH` and run the following:\n\n`logstash -f config/twitter-pipeline.conf`\n\nYou should see some logs that end with:\n\n`Successfully started Logstash sentiment_service endpoint {:port=\u003e9600}`\n\n### Setting up Kibana\n___\n\nNow go to Kibana: http://localhost:5601/\n\n*Management =\u003e Index Patterns =\u003e Create Index Pattern*\n\nIn the text box `Index name or pattern`, type: `twitter`\n\nIn the drop-down box `Time Filter field name`, choose: `inserted_in_es_at`\n\nClick on create\n\nNow go to:\n\n*Management =\u003e Saved Objects =\u003e import*\n\nAnd select the file in:\n\n`ELK_twitter/src/twitter-pipeline/kibana-visualization/kibana_charts.json`\n\nYou can now go to *Dashboard*\n\nThis GIF summarizes the different steps in case you are lost.\n\n![alt text](https://github.com/melvynator/ELK_twitter/blob/master/img/kibana_config.gif \"Summary\")\n\n\n\n## Ressources\n\nThanks to the Stack Overflow community and the Elastic community for the answers 
provided.\n\nhttps://www.elastic.co/guide/en/logstash/current/introduction.html\nhttps://www.elastic.co/guide/en/elasticsearch/reference/current/index.html\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmelvynator%2Felk_twitter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmelvynator%2Felk_twitter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmelvynator%2Felk_twitter/lists"}