https://github.com/dhruvp-8/twitter_stream_analysis
Analyzing the Words from Twitter Stream to obtain a categorization of most used words in real-time using Apache Kafka and Apache Storm
- Host: GitHub
- URL: https://github.com/dhruvp-8/twitter_stream_analysis
- Owner: dhruvp-8
- Created: 2019-08-20T19:50:27.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2020-10-13T15:29:18.000Z (over 4 years ago)
- Last Synced: 2025-01-10T01:52:04.306Z (5 months ago)
- Language: Java
- Size: 12.7 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# Twitter Stream Analysis
A Storm topology that generates a list of popular words used on Twitter. Data is ingested from either a Twitter sample spout or a Kafka spout and processed downstream by Storm bolts.
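To make the downstream flow concrete, here is a plain-Java approximation of the split, ignore and count steps performed by the bolts listed further down. It needs no Storm cluster. This is a hypothetical sketch, and several details are illustrative assumptions rather than values from the repository: the class `WordPipelineSketch`, the minimum word length of 5, the regex-based tokenization, and the ignore list. In the actual topology each stage runs as its own bolt, and the spout emits tweets into the first one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical plain-Java sketch of the bolt chain: split -> ignore -> count.
 * Not the repository's code; the threshold and word list are assumptions.
 */
public class WordPipelineSketch {
    // Assumption: words shorter than this are dropped by the splitter stage.
    static final int MIN_WORD_LENGTH = 5;
    // Assumption: an illustrative ignore list, not the one used by IgnoreWordsBolt.
    static final Set<String> IGNORE_WORDS =
            new HashSet<>(Arrays.asList("about", "https", "there", "would"));

    /** Splitter stage: lower-case the tweet, split on non-letters, keep long-enough words. */
    static List<String> split(String tweet) {
        List<String> words = new ArrayList<>();
        for (String token : tweet.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (token.length() >= MIN_WORD_LENGTH) {
                words.add(token);
            }
        }
        return words;
    }

    /** Ignore stage: returns true if the word should continue downstream. */
    static boolean keep(String word) {
        return !IGNORE_WORDS.contains(word);
    }

    /** Counter stage: total occurrences of each surviving word across all tweets. */
    static Map<String, Integer> count(List<String> tweets) {
        Map<String, Integer> counts = new HashMap<>();
        for (String tweet : tweets) {
            for (String word : split(tweet)) {
                if (keep(word)) {
                    Integer current = counts.get(word);
                    counts.put(word, current == null ? 1 : current + 1);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tweets = Arrays.asList(
                "Streaming analytics with Apache Storm!",
                "Kafka streams about storm topologies");
        // HashMap iteration order is unspecified, so the printed order may vary.
        System.out.println(count(tweets));
    }
}
```

Keeping each stage as a separate static method mirrors the one-responsibility-per-bolt layout, which is what lets Storm scale the stages independently.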
## Requirements
- IDE
- Apache Maven 3.x
- JVM 6 or 7

## General Info
The source folder is organized into two packages, Kafka and Storm. The Storm package contains two topologies: KafkaTwitterTopology, which uses a Kafka spout, and TwitterWordCountTopology, which uses a Twitter sample spout. The classes are listed below:
* com/dhruvrp/Kafka
    * KafkaTwitterProducer.java -- A Kafka producer that publishes Twitter data to a Kafka broker
* com/dhruvrp/Storm
    * TwitterWordCountTopology.java -- A topology that uses the TwitterSampleSpout to get the list of top words on Twitter
    * KafkaTwitterTopology.java -- A topology that uses the KafkaSpout to get the list of top words on Twitter
    * TwitterSampleSpout.java -- A spout that uses the twitter4j library to receive Twitter data
    * StringWordSplitterBolt.java -- A bolt that receives tweets and emits their words that are longer than a certain length
    * IgnoreWordsBolt.java -- A bolt that filters out a predefined set of words
    * WordCounterBolt.java -- A bolt that counts words and prints the list of popular words over a time interval
    * JsonWordSplitterBolt.java -- A bolt that receives tweets and emits their words that are longer than a certain length
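WordCounterBolt is described as calculating and printing popular words over a time interval. The README does not say how that interval is implemented, for example with Storm tick tuples or with wall-clock checks. What follows is a minimal plain-Java sketch of one plausible approach: a tumbling window that reports the top N words when it expires and then starts fresh. Several details are assumptions, not the repository's code: the class `IntervalTopWords`, the explicit timestamps, the alphabetical tie-breaking, and the reset-after-report behavior.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of "popular words over a time interval": counts words in a
 * tumbling window and reports the top N when the window closes. Timestamps are
 * passed in explicitly so the logic is deterministic and testable.
 */
public class IntervalTopWords {
    private final long intervalMillis;
    private final int topN;
    private final Map<String, Long> counts = new HashMap<>();
    private long windowStart;

    public IntervalTopWords(long intervalMillis, int topN, long startMillis) {
        this.intervalMillis = intervalMillis;
        this.topN = topN;
        this.windowStart = startMillis;
    }

    /**
     * Records one word seen at nowMillis. If the current window has expired, it first
     * captures the top-N words of that window and starts a new window; the captured
     * report is returned, otherwise an empty list is returned.
     */
    public List<Map.Entry<String, Long>> add(String word, long nowMillis) {
        List<Map.Entry<String, Long>> report = Collections.emptyList();
        if (nowMillis - windowStart >= intervalMillis) {
            report = topWords();
            counts.clear();
            windowStart = nowMillis;
        }
        Long current = counts.get(word);
        counts.put(word, current == null ? 1L : current + 1L);
        return report;
    }

    /** Top-N words of the current window, highest count first, ties broken alphabetically. */
    public List<Map.Entry<String, Long>> topWords() {
        List<Map.Entry<String, Long>> entries = new ArrayList<>();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            // Copy each entry so a returned report stays valid after counts is cleared.
            entries.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        Collections.sort(entries, new Comparator<Map.Entry<String, Long>>() {
            @Override
            public int compare(Map.Entry<String, Long> a, Map.Entry<String, Long> b) {
                int byCount = Long.compare(b.getValue(), a.getValue());
                return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
            }
        });
        return new ArrayList<>(entries.subList(0, Math.min(topN, entries.size())));
    }

    public static void main(String[] args) {
        // Simulated stream of (word, timestamp in ms) with a 10-second interval and top 2.
        IntervalTopWords counter = new IntervalTopWords(10000L, 2, 0L);
        String[] words = {"storm", "kafka", "storm", "tweet", "storm"};
        long[] times = {1000L, 2000L, 3000L, 4000L, 11000L};
        for (int i = 0; i < words.length; i++) {
            List<Map.Entry<String, Long>> report = counter.add(words[i], times[i]);
            if (!report.isEmpty()) {
                System.out.println("Top words: " + report); // prints "Top words: [storm=2, kafka=1]"
            }
        }
    }
}
```

Passing timestamps in, rather than reading `System.currentTimeMillis()` inside `add`, keeps the window logic deterministic. A bolt wrapping this class would supply the current time itself when each word tuple arrives.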