https://github.com/msthamizh/youtube-data-harvesting-and-warehousing
A Streamlit application that lets users explore and analyze data from various YouTube channels
- Host: GitHub
- URL: https://github.com/msthamizh/youtube-data-harvesting-and-warehousing
- Owner: MSThamizh
- Created: 2024-02-28T14:27:21.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-03-22T10:54:21.000Z (10 months ago)
- Last Synced: 2024-03-22T11:57:28.018Z (10 months ago)
- Topics: googleapi, mongodb, mysql, python, streamlit, youtube-api
- Language: Python
- Homepage:
- Size: 51.8 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# YouTube Data Harvesting and Warehousing using SQL & Streamlit
## Problem Statement
The goal of this project is to develop a Streamlit application that enables users to access and analyze data from multiple YouTube channels efficiently. The application should provide the following features:
- **Data Retrieval**: Allow users to input a YouTube channel ID and retrieve relevant data such as channel name, subscribers, total video count, playlist ID, video ID, likes, dislikes, and comments for each video using the Google API.
- **Data Collection**: Let users collect data for several YouTube channels at once and store it in a data lake with a single button click.
- **Data Storage**: Offer the option to store the collected data in a MySQL database for efficient retrieval.
- **Data Search and Retrieval**: Allow users to search and retrieve data from the SQL database using different search options.

## Workflow
The workflow of this project can be summarized as follows:
1. **YouTube API Data Retrieval**: Use the YouTube API to fetch data such as video metadata, statistics, and comments. This requires a valid API key from the Google Developer Console.
2. **Storing Data in MongoDB**: Establish a connection to a MongoDB database and store the retrieved data there, in a suitable format, as temporary storage.
3. **Transfer Data to SQL**: Extract the data from the MongoDB collections, convert it into a DataFrame, and write the DataFrame to SQL tables.
4. **Streamlit Visualization**: Utilize Streamlit to build an interactive web application for visualizing and exploring the data.

## Technologies Used
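As a rough illustration of workflow steps 1 to 3, the sketch below flattens one nested channel document into a flat row, loads it into a SQL table, and runs a simple search query. It is not taken from the project's code. The sample document is hypothetical but shaped like an item from a YouTube Data API v3 `channels.list` response; the helper names `flatten_channel` and `load_channels` and the `channels` table layout are assumptions; and the standard-library `sqlite3` module stands in for MySQL so the example runs without a database server.

```python
import sqlite3

# Hypothetical sample shaped like one item of a YouTube Data API v3
# `channels.list` response (parts: snippet, statistics, contentDetails).
sample_channel = {
    "id": "UC_example_channel_id",
    "snippet": {"title": "Example Channel"},
    "statistics": {"subscriberCount": "1200", "videoCount": "34"},
    "contentDetails": {"relatedPlaylists": {"uploads": "UU_example_channel_id"}},
}


def flatten_channel(item):
    """Flatten a nested channel document into one flat row (dict)."""
    stats = item.get("statistics", {})
    return {
        "channel_id": item["id"],
        "channel_name": item["snippet"]["title"],
        # The API returns counts as strings, so cast them to int.
        "subscribers": int(stats.get("subscriberCount", 0)),
        "video_count": int(stats.get("videoCount", 0)),
        "playlist_id": item["contentDetails"]["relatedPlaylists"]["uploads"],
    }


def load_channels(conn, rows):
    """Create the (assumed) channels table and insert or replace the rows."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS channels (
               channel_id   TEXT PRIMARY KEY,
               channel_name TEXT,
               subscribers  INTEGER,
               video_count  INTEGER,
               playlist_id  TEXT)"""
    )
    conn.executemany(
        "INSERT OR REPLACE INTO channels VALUES "
        "(:channel_id, :channel_name, :subscribers, :video_count, :playlist_id)",
        rows,
    )
    conn.commit()


# In-memory SQLite database, used here only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
load_channels(conn, [flatten_channel(sample_channel)])

# A simple parameterized search, similar in spirit to the app's search options.
result = conn.execute(
    "SELECT channel_name, subscribers FROM channels WHERE video_count > ?",
    (10,),
).fetchall()
print(result)  # [('Example Channel', 1200)]
```

With a real MySQL connection the same idea applies, but the placeholder style and the upsert statement differ from SQLite's `INSERT OR REPLACE`, so the SQL would need adjusting for the driver in use.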
- **Python**: Main programming language used for scripting and development.
- **YouTube API**: Used for retrieving data from YouTube.
- **MongoDB**: NoSQL database used for storing raw data.
- **MySQL**: Relational database used for data transformation and warehousing.
- **Streamlit**: Python library for building interactive web applications.

## References
- **Python**: [https://docs.python.org/3/](https://docs.python.org/3/)
- **YouTube API**: [https://developers.google.com/youtube/v3/getting-started](https://developers.google.com/youtube/v3/getting-started)
- **MongoDB Documentation**: [https://www.mongodb.com/](https://www.mongodb.com/)
- **MySQL Documentation**: [https://www.mysql.com/](https://www.mysql.com/)
- **Streamlit Documentation**: [https://docs.streamlit.io/library/api-reference](https://docs.streamlit.io/library/api-reference)