https://github.com/isabeljohnson001/twitter_tweets_data_streaming


docker flask kafka mongodb python reactjs spark


# Twitter Tweets Data Streaming

## Overview
This repository contains a data analysis project for a Britney Spears tweet dataset. It includes scripts that process and analyze two TSV files (~50 MB and ~500 MB) to extract insights about the public discourse around Britney Spears. The analysis showcases data handling, processing capabilities, and exploratory data analysis techniques, and was prepared as part of the application process for the Data Engineer position at IDI.
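
Files of this size are usually easier to handle as a stream than as a single in-memory load. The following is a minimal sketch of that idea using only the standard library; it is not taken from the repository's scripts. The function name `summarize_tsv` and the sample columns are hypothetical, since the dataset's actual schema is not described here.

```python
import csv
import io
from collections import Counter


def summarize_tsv(handle):
    """Stream a TSV source row by row.

    Returns (row_count, widths), where widths maps a field count to the
    number of rows that have that many fields. A single key in widths
    suggests the file is rectangular.
    """
    # QUOTE_NONE treats quote characters as literal text, which is a
    # cautious choice for free-form tweet text (an assumption about the data).
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    row_count = 0
    widths = Counter()
    for row in reader:
        row_count += 1
        widths[len(row)] += 1
    return row_count, widths


if __name__ == "__main__":
    # Tiny in-memory sample standing in for a real file; columns are made up.
    sample = io.StringIO("id\ttext\n1\thello world\n2\tanother tweet\n")
    print(summarize_tsv(sample))
```

For a real file, the same function can be given a handle from `open(path, newline="", encoding="utf-8")`, so that only one row is held in memory at a time.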

## Repository Structure
```
Twitter_Streaming
├── src                 # Source code files for the project
├── app                 # Application logic and server-side scripts
├── config              # Configuration files for the application
├── datasets            # Directory for storing data files
├── jobs                # Scripts for batch jobs and data processing tasks
├── ui-screen           # Files for the user interface screen
├── docker-compose.yml  # Docker Compose file for defining and running multi-container Docker applications
├── Dockerfile          # Dockerfile for building Docker images
└── requirements        # Text file listing dependencies to be installed with pip
```
## Getting Started

### Prerequisites
1. Ensure that Python 3.8+ and Docker are installed on your machine. You will also need pip to install the dependencies.
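
The prerequisites above can be checked with a small script before installing anything. This is a sketch and not part of the repository. The function name `check_prerequisites` is hypothetical, and the script assumes the tools are exposed on `PATH` as `docker` and `pip` (or `pip3`).

```python
import shutil
import sys

MIN_PYTHON = (3, 8)


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of problems found; an empty list means all checks passed.

    version_info and which are parameters so the check can be exercised
    without depending on the local machine.
    """
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append("Python 3.8+ is required")
    if which("docker") is None:
        problems.append("docker was not found on PATH")
    # Some systems only expose pip as pip3, so accept either name.
    if which("pip") is None and which("pip3") is None:
        problems.append("pip was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```

The script prints nothing when every check passes. It only confirms that the executables can be found, not that the Docker daemon is running.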

### Installation
1. Read the [Twitter DataStreaming - Installation](https://github.com/isabeljohnson001/Twitter_Tweets_Data_Streaming/blob/75733e4b57ab6e7059849d0449405c3add4bf52b/Twitter%20Tweets%20Data%20Streaming.pdf) guide for the detailed design and installation setup.