{"id":29253464,"url":"https://github.com/dj623/twitter-sentiment-analysis","last_synced_at":"2026-05-08T07:34:40.542Z","repository":{"id":302691483,"uuid":"1013322735","full_name":"DJ623/Twitter-Sentiment-Analysis","owner":"DJ623","description":"A mini project for analyzing, classifying, and visualizing the sentiments of tweets","archived":false,"fork":false,"pushed_at":"2025-07-03T18:21:07.000Z","size":1098,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-07-03T18:51:44.661Z","etag":null,"topics":["hadoop-hdfs","kafka","mongodb","python","spark","streamlit"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DJ623.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-03T17:50:28.000Z","updated_at":"2025-07-03T18:25:35.000Z","dependencies_parsed_at":"2025-07-03T18:51:48.588Z","dependency_job_id":"88b40434-2b56-4f63-985d-96c791726333","html_url":"https://github.com/DJ623/Twitter-Sentiment-Analysis","commit_stats":null,"previous_names":["dj623/twitter-sentiment-analysis"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/DJ623/Twitter-Sentiment-Analysis","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DJ623%2FTwitter-Sentiment-Analysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DJ623%2FTwitter-Sentiment-Analysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DJ623%2FTwitter-
Sentiment-Analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DJ623%2FTwitter-Sentiment-Analysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DJ623","download_url":"https://codeload.github.com/DJ623/Twitter-Sentiment-Analysis/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DJ623%2FTwitter-Sentiment-Analysis/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263432279,"owners_count":23465575,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["hadoop-hdfs","kafka","mongodb","python","spark","streamlit"],"created_at":"2025-07-04T02:02:37.548Z","updated_at":"2026-05-08T07:34:40.255Z","avatar_url":"https://github.com/DJ623.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Twitter-Sentiment-Analysis\nA mini project for analyzing, classifying, and visualizing the sentiments of tweets\n\n## **Overview :** \n\nThis project features a real-time Twitter (now X) sentiment analysis model designed for scalability and large-scale data processing. It uses Apache Spark for model training, Hadoop Distributed File System (HDFS) to store the model so that it is available to several services at the same time, Kafka for fetching the tweet data, MongoDB to store the processed data, and Streamlit to build and deploy the dashboard that displays the analysis. Docker containerizes all the services, keeping the system loosely coupled for scalability, availability, and fault 
tolerance of the system.\n\n## **Architecture**\n\n![Architecture diagram](https://github.com/DJ623/Twitter-Sentiment-Analysis/blob/main/Architecture%20and%20dashboard%20of%20the%20model%20/Architecture.png)\n\n\n## **Prerequisites**\n\n1. **Hadoop installed** (version 3.4.0 on M1 Mac)  \n2. **Spark installed** (version 3.5.1)  \n3. **Python 3.9.6** (PySpark requires Python)  \n4. **Java 8 or later** (required by Spark)  \n5. **Hadoop configurations** should be correctly set up for HDFS and YARN.  \n6. **Before starting** the dashboard, make sure to upload the model.pkl file from the /model directory to HDFS at the /models path  \n7. **Install** requirements.txt\n\n## **Environment Details :** \n\n1. Machine: MacBook Air M1  \n2. Architecture : ARM-64  \n3. Operating System : macOS (Unix-based)\n\n## **Deployment :**\n\n1. **Install requirements.txt:** \n\n\tMake sure you are in the project’s root directory  \n    In terminal:  \n    `pip install -r requirements.txt`  \n\t\n\n2. **Start the MongoDB server (skip the installation steps if MongoDB is already installed):**  \n   1. If not downloaded, download from [https://www.mongodb.com/try/download/community](https://www.mongodb.com/try/download/community).   \n   2. Note : The following setup and installation is **only for Mac ARM-based architecture.**  \n   3. Add the mongodb/bin directory to the PATH in your shell profile so the binaries are accessible from any directory, and make sure to grant the proper permissions using sudo.  \n   4. Create a data directory in HOME\\_DIRECTORY and inside data, create a directory named db.  \n   5. In terminal : `sudo mongod --dbpath=HOME_DIRECTORY/path/to/data`\n\n \n\n\n3. **Start Hadoop:**  \n   In terminal: \n   `start-all.sh`\n\n4. **Check if all services of Hadoop are started (Namenode, Datanode, ResourceManager, SecondaryNameNode, NodeManager, Jps):**   \n   In terminal: `jps`\n\n    If all of these services are listed, Hadoop has started successfully in pseudo-distributed mode.\n\n5. 
**Start and run Docker Compose for Kafka and Zookeeper:**  \n   **Make sure Docker is up and running**  \n   In terminal (root directory of project):   \n   `docker compose up -d`  \n   `docker ps`\n\n6. **Create a topic named twitter in Kafka (run commands in the Kafka shell):**  \n   1. Use the Docker GUI to open a terminal in the Kafka container, or use the following command: `docker exec -it \u003ckafka-container-id\u003e /bin/bash`  \n   2. In Kafka shell, run the following:  \n        \n      `kafka-topics --create --topic twitter --bootstrap-server localhost:9092`  \n         \n      `kafka-topics --describe --topic twitter --bootstrap-server localhost:9092`\n\n7. **Create a new directory named “models” on HDFS:**   \n   In terminal: \n    `hdfs dfs -mkdir -p /models`\n\n    You can check the directory using the web interface by visiting [http://localhost:9870](http://localhost:9870) (9870 is the default NameNode web UI port in Hadoop 3)\n\n8. **Upload the model.pkl file located in the model directory of the project to the HDFS directory created in the previous step:**  \n   In terminal:   \n   `cd model`  \n    `hdfs dfs -put /path/to/local/file.pkl /path/of/hdfs/directory`\n  \n\n    Example :  \n    `hdfs dfs -put model.pkl /models`\n\n9. **Run the producer.py and consumer.py programs located in the kafka directory :**   \n   In terminal :   \n   `cd kafka`   \n   `python3 producer.py`  \n     \n   In a new terminal window :   \n   `python3 consumer.py`\n\n10. 
**Run dashboard.py, located in the dashboard directory, using the streamlit command :**  \n    **This will start the dashboard in your browser.**   \n    In terminal :   \n    `cd dashboard`   \n    `streamlit run dashboard.py`   \n      \n    **The dashboard refreshes automatically every 10 seconds to pick up new incoming data.**\n   \n***  \n\n## **For training the model and running everything from scratch :** \n\nNOTE : We have already provided the trained model as the model.pkl file in the /model directory. To skip training, upload this model.pkl file to HDFS (done in the previous section).\n\n1. **Download the dataset from Kaggle: [https://www.kaggle.com/datasets/jp797498e/twitter-entity-sentiment-analysis?select=twitter\\_training.csv](https://www.kaggle.com/datasets/jp797498e/twitter-entity-sentiment-analysis?select=twitter_training.csv)**  \n     \n2. **The dataset contains two files named twitter\\_training.csv and twitter\\_validation.csv. Both files need to be uploaded to HDFS before training the model (twitter\\_classification.py).**  \n   1. **Make sure Hadoop is up and running : `jps`**   \n   2. **In terminal (where both files of the dataset are located), run the following :**\n\n   `hdfs dfs -mkdir -p /data` \n\n   `hdfs dfs -put twitter_training.csv /data`\n\n   `hdfs dfs -put twitter_validation.csv /data`\n\n   \n\n3. **Run the twitter\\_classification.py file to train the model:**  \n   1. Change to the model directory in the project’s root directory.  \n   2. In terminal : `python3 text_classification.py`  \n   3. The model will be trained, and a new directory named /models will be created on HDFS, containing the trained model.pkl file.\n\n\t\t\n\n4. 
**Follow steps 5 to 10 in the previous section of Deployment.**  \n   **Happy running\\!**\n\n\n## **Dashboard**\n![Dashboard 1](https://github.com/DJ623/Twitter-Sentiment-Analysis/blob/main/Architecture%20and%20dashboard%20of%20the%20model%20/dashboard_01.png)\n\n![Dashboard 2](https://github.com/DJ623/Twitter-Sentiment-Analysis/blob/main/Architecture%20and%20dashboard%20of%20the%20model%20/dashboard_02.png)\n\n![Dashboard 3](https://github.com/DJ623/Twitter-Sentiment-Analysis/blob/main/Architecture%20and%20dashboard%20of%20the%20model%20/dashboard_03.png)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdj623%2Ftwitter-sentiment-analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdj623%2Ftwitter-sentiment-analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdj623%2Ftwitter-sentiment-analysis/lists"}