An open API service indexing awesome lists of open source software.

https://github.com/undisputed-jay/streaming-data-from-reddit-using-kafka-spark-and-mongodb

A data pipeline that streams Reddit comments from the 'Politics' subreddit using Kafka and Apache Spark. Processed data is stored in MongoDB for real-time analysis and management.
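
The data flow described above (Reddit comment, then Kafka message, then processed record in MongoDB) can be sketched with the standard library alone. This is a minimal illustration, not the repository's code: the function names `to_kafka_message` and `to_mongo_document`, the field names, and the `word_count` enrichment are all assumptions made for the example. In a real deployment the bytes would be handed to a Kafka producer, the transformation would run inside a Spark streaming job, and the resulting dictionary would be written to a MongoDB collection.

```python
import json
from datetime import datetime, timezone


def to_kafka_message(comment: dict) -> bytes:
    """Serialize a Reddit comment as UTF-8 JSON bytes, a common shape for a Kafka message value.

    The field names are hypothetical; the repository's actual schema is not shown here.
    """
    payload = {
        "id": comment["id"],
        "author": comment.get("author"),
        "body": comment.get("body", ""),
        "subreddit": comment.get("subreddit", "politics"),
        "created_utc": comment["created_utc"],
    }
    return json.dumps(payload).encode("utf-8")


def to_mongo_document(message: bytes) -> dict:
    """Decode a message and shape it as a document ready for insertion into MongoDB.

    Stands in for the processing step; the enrichment shown (word count,
    ISO timestamp) is an assumed example of what "processed" could mean.
    """
    record = json.loads(message.decode("utf-8"))
    created = datetime.fromtimestamp(record["created_utc"], tz=timezone.utc)
    return {
        "_id": record["id"],  # reuse the comment id as the document key
        "author": record["author"],
        "body": record["body"].strip(),
        "subreddit": record["subreddit"],
        "created_at": created.isoformat(),
        "word_count": len(record["body"].split()),
    }


if __name__ == "__main__":
    sample = {"id": "abc123", "author": "someone", "body": "An example comment", "created_utc": 0}
    print(to_mongo_document(to_kafka_message(sample)))
```

Keeping serialization and transformation as pure functions makes each stage testable without a running broker or database.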

apache-spark big-data data-engineering etl-pipeline kafka mongodb mongodb-atlas pyspark real-time-streaming redditapi streaming-analytics

Last synced: 6 months ago
