Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/moha-cm/youtube-dataharvesting-
YouTube Data Harvesting and Warehousing using SQL, MongoDB and Streamlit
api-integration data-collection data-management maria mongodb mongodb-atlas pymongo python-script sql streamlit
- Host: GitHub
- URL: https://github.com/moha-cm/youtube-dataharvesting-
- Owner: Moha-cm
- Created: 2023-10-29T01:38:42.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2023-12-02T13:29:24.000Z (12 months ago)
- Last Synced: 2024-01-26T02:11:34.494Z (10 months ago)
- Topics: api-integration, data-collection, data-management, maria, mongodb, mongodb-atlas, pymongo, python-script, sql, streamlit
- Language: Python
- Homepage:
- Size: 46.9 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# YouTube Data Collection and Analysis

## Overview
This project is written in Python and collects data from YouTube channels using their unique channel IDs. The collected data is stored in both SQL and MongoDB databases for further analysis, and the results can be visualized through a Streamlit application.

## Process
### 1. Google Cloud Setup
Start by setting up a **Google Cloud account**, enabling the required APIs (for this project, the YouTube Data API v3), and generating API credentials. You can reach the console here: [Google Cloud Console](https://console.cloud.google.com/apis/dashboard?project=skilled-text-400719).

### 2. YouTube Data Extraction
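A minimal sketch of this step, assuming the `google-api-python-client` package from the requirements list. The helper names `parse_channel_response` and `fetch_channel` and the choice of output fields are illustrative, not taken from the project's source; the `channels().list` call and the response fields follow the YouTube Data API v3.

```python
from typing import Any


def parse_channel_response(response: dict[str, Any]) -> dict[str, Any]:
    """Flatten the first item of a channels.list response into one record."""
    item = response["items"][0]
    stats = item["statistics"]
    return {
        "channel_id": item["id"],
        "title": item["snippet"]["title"],
        # The API returns counts as strings, so convert them to int.
        # subscriberCount can be absent when a channel hides it.
        "subscribers": int(stats.get("subscriberCount", 0)),
        "views": int(stats.get("viewCount", 0)),
        "videos": int(stats.get("videoCount", 0)),
        "uploads_playlist": item["contentDetails"]["relatedPlaylists"]["uploads"],
    }


def fetch_channel(api_key: str, channel_id: str) -> dict[str, Any]:
    """Call channels.list for one channel ID (needs network and a valid key)."""
    # Imported here so the parsing helper works without the client installed.
    from googleapiclient.discovery import build

    youtube = build("youtube", "v3", developerKey=api_key)
    request = youtube.channels().list(
        part="snippet,statistics,contentDetails", id=channel_id
    )
    return parse_channel_response(request.execute())
```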
Use the generated **API key** to extract data from YouTube channels. For details on the available API calls, refer to the official documentation: [YouTube Data API Documentation](https://developers.google.com/youtube/v3/docs).

### 3. Data Storage
#### Structured Data (SQL)
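As a rough illustration of the relational side, the sketch below stores one row per channel and queries it back. It uses Python's built-in `sqlite3` as a stand-in so it runs without a database server; the project itself targets MySQL, and the `channels` table layout here is a hypothetical schema, not the project's actual one.

```python
import sqlite3

# Hypothetical schema; the project's real table layout is not shown in this README.
SCHEMA = """
CREATE TABLE IF NOT EXISTS channels (
    channel_id  TEXT PRIMARY KEY,
    title       TEXT NOT NULL,
    subscribers INTEGER,
    views       INTEGER
)
"""


def save_channels(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    """Create the table if needed and upsert one row per channel."""
    conn.execute(SCHEMA)
    # Parameterized statement; INSERT OR REPLACE is SQLite syntax
    # (MySQL would use REPLACE INTO or ON DUPLICATE KEY UPDATE).
    conn.executemany("INSERT OR REPLACE INTO channels VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
save_channels(conn, [("UC123", "Demo", 50, 1000), ("UC456", "Other", 900, 42000)])
top = conn.execute(
    "SELECT title FROM channels ORDER BY subscribers DESC LIMIT 1"
).fetchone()
print(top[0])  # prints "Other"
```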
Structured data is stored in a **SQL database**, which gives the collected information a tabular, relational format. This suits data that follows a well-defined schema.

#### Unstructured Data (MongoDB)
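One possible document shape for this store is sketched below, assuming `pymongo` from the requirements list. The function names, the `youtube` database, and the `channels` collection are placeholders chosen for illustration; the README does not state how the project lays out its documents.

```python
from typing import Any


def to_document(channel: dict[str, Any], videos: list[dict[str, Any]]) -> dict[str, Any]:
    """Nest a channel's videos inside one document keyed by channel ID."""
    return {
        "_id": channel["channel_id"],  # reuse the channel ID as the primary key
        "channel": {k: v for k, v in channel.items() if k != "channel_id"},
        "videos": videos,  # videos may carry different fields; no schema is enforced
    }


def save_document(uri: str, doc: dict[str, Any]) -> None:
    """Upsert the document into MongoDB (needs pymongo and a reachable server)."""
    from pymongo import MongoClient  # lazy import keeps to_document dependency-free

    client = MongoClient(uri)
    # "youtube" and "channels" are placeholder database and collection names.
    client["youtube"]["channels"].replace_one({"_id": doc["_id"]}, doc, upsert=True)
    client.close()
```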
Unstructured data, which may not conform to a fixed schema, is stored in **MongoDB Atlas**. MongoDB is a NoSQL database that accommodates flexible, dynamic document structures, which makes it a good fit for diverse or evolving data.

### 4. Data Analysis and Visualization
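The sketch below shows one kind of analysis this step could cover: averaging views per channel and charting the result with Plotly. The record fields (`channel`, `views`) and both function names are assumptions made for the example. In a Streamlit page, the returned figure could be displayed with `st.plotly_chart`.

```python
from collections import defaultdict
from statistics import mean


def average_views(videos: list[dict]) -> dict[str, float]:
    """Return the mean view count per channel title."""
    by_channel: dict[str, list[int]] = defaultdict(list)
    for video in videos:
        by_channel[video["channel"]].append(video["views"])
    return {name: mean(views) for name, views in by_channel.items()}


def plot_average_views(averages: dict[str, float]):
    """Build a Plotly bar chart of the averages (requires plotly)."""
    import plotly.express as px  # lazy import: the aggregation works without it

    return px.bar(
        x=list(averages),
        y=list(averages.values()),
        labels={"x": "Channel", "y": "Average views"},
    )


# Made-up sample records, only to exercise the aggregation.
videos = [
    {"channel": "Demo", "views": 100},
    {"channel": "Demo", "views": 300},
    {"channel": "Other", "views": 50},
]
averages = average_views(videos)
```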
Query the stored data to derive insights about the YouTube channels, and present the results as charts (Plotly is among the required packages).

### 5. Streamlit Application
To interact with and visualize the collected data, run the **Streamlit application**.

Make sure the requirements are in place first: Python, Streamlit, SQLAlchemy, access to MongoDB and a SQL database (MySQL), googleapiclient, and Plotly. The project will not run without them.

Follow these steps:
1. Download the source files from the repository.
2. After downloading, navigate to the project directory in your terminal.
3. Run the following command to start the application:

```bash
streamlit run Home.py
```

### Required Python Packages
```
pip install pandas
pip install streamlit
python -m pip install pymongo
pip install mysql-connector-python
pip install sqlalchemy
pip install PyMySQL
pip install isodate
pip install google-api-python-client
pip install plotly
```
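To confirm that the packages above installed correctly, a small check like the following may help. It is a sketch, not part of the project: the mapping from pip names to import names is written out by hand because the two differ for some packages (for example `google-api-python-client` is imported as `googleapiclient`), and `missing_packages` is a hypothetical helper name.

```python
from importlib.util import find_spec

# pip distribution name -> importable module name (they differ for some packages).
REQUIRED = {
    "pandas": "pandas",
    "streamlit": "streamlit",
    "pymongo": "pymongo",
    "mysql-connector-python": "mysql.connector",
    "sqlalchemy": "sqlalchemy",
    "PyMySQL": "pymysql",
    "isodate": "isodate",
    "google-api-python-client": "googleapiclient",
    "plotly": "plotly",
}


def missing_packages(required: dict[str, str]) -> list[str]:
    """Return the pip names whose modules cannot be found."""
    missing = []
    for pip_name, module in required.items():
        try:
            found = find_spec(module) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. "mysql") is itself absent.
            found = False
        if not found:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    for name in missing_packages(REQUIRED):
        print(f"missing: pip install {name}")
```

Running the script prints one line per package that still needs installing and prints nothing when everything is available.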