{"id":15056839,"url":"https://github.com/m-molaei/twitter-sentiment-analysis-using-apache-spark-","last_synced_at":"2026-01-01T23:06:25.295Z","repository":{"id":218970072,"uuid":"528875968","full_name":"m-molaei/Twitter-Sentiment-Analysis-using-Apache-Spark-","owner":"m-molaei","description":"Sentiment analysis using deep learning models and FastText embedding on Apache Spark","archived":false,"fork":false,"pushed_at":"2022-08-26T08:41:29.000Z","size":44,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-14T07:45:55.974Z","etag":null,"topics":["apache-cassandra","apache-spark","big-data","fasttext","fasttext-embeddings","mongodb","pyspark","rdd","sentiment-analysis","sentiment140-dataset","spark"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/m-molaei.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2022-08-25T13:59:50.000Z","updated_at":"2022-09-19T17:17:59.000Z","dependencies_parsed_at":"2024-01-24T19:44:57.009Z","dependency_job_id":null,"html_url":"https://github.com/m-molaei/Twitter-Sentiment-Analysis-using-Apache-Spark-","commit_stats":null,"previous_names":["m-molaei/twitter-sentiment-analysis-using-apache-spark-"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/m-molaei%2FTwitter-Sentiment-Analysis-using-Apache-Spark-","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/m-molaei%2FTwitter-Sentiment-Analysis-using-Apache-Spa
rk-/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/m-molaei%2FTwitter-Sentiment-Analysis-using-Apache-Spark-/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/m-molaei%2FTwitter-Sentiment-Analysis-using-Apache-Spark-/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/m-molaei","download_url":"https://codeload.github.com/m-molaei/Twitter-Sentiment-Analysis-using-Apache-Spark-/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243544665,"owners_count":20308168,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-cassandra","apache-spark","big-data","fasttext","fasttext-embeddings","mongodb","pyspark","rdd","sentiment-analysis","sentiment140-dataset","spark"],"created_at":"2024-09-24T21:56:52.256Z","updated_at":"2026-01-01T23:06:25.203Z","avatar_url":"https://github.com/m-molaei.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Twitter-Sentiment-Analysis-using-Apache-Spark\n### Sentiment analysis using deep learning models and FastText embeddings on Apache Spark\n\nI implemented a sentiment analysis model for Twitter data using Apache Spark. I used FastText embeddings and deep learning models (LSTM, GRU, and CNN) with the Analytics Zoo library. 
This work also includes a DataFrame-based pre-processing framework that outperforms RDD-based architectures in both processing time and the volume of data it can handle.\nIn addition, I used MongoDB and Apache Cassandra as the model's databases and compared them with Apache Spark's own file storage and retrieval.\n\nWe also published an article introducing the DataFrame-based pre-processing framework, available here:\nhttps://jad.shahroodut.ac.ir/article_2394.html\n\nI hope this will be useful for you ;)\n\n## Code Explanation\n\n1. Import libraries (you will probably need to install some of them, such as [`Analytics Zoo`](https://analytics-zoo.readthedocs.io/en/latest/doc/UserGuide/python.html) and [`findspark`](https://github.com/minrk/findspark))\n2. Initialize the Apache Spark cluster\n3. Import and read the Sentiment140 dataset with pandas (you will need to change the dataset's path)\n4. Import FastText embeddings with gensim\n5. Pre-process tweets: cleaning, tokenizing, padding, and vectorizing (this step is implemented in two ways: RDD-based and DataFrame-based)\n6. Configure Apache Cassandra and MongoDB on Apache Spark\n7. Sentiment analysis models\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fm-molaei%2Ftwitter-sentiment-analysis-using-apache-spark-","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fm-molaei%2Ftwitter-sentiment-analysis-using-apache-spark-","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fm-molaei%2Ftwitter-sentiment-analysis-using-apache-spark-/lists"}