https://github.com/shibam120302/youtube-channel-videos-scraper
- Host: GitHub
- URL: https://github.com/shibam120302/youtube-channel-videos-scraper
- Owner: shibam120302
- License: apache-2.0
- Created: 2022-11-12T18:01:20.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2022-11-12T18:12:30.000Z (almost 3 years ago)
- Last Synced: 2025-01-21T17:50:40.072Z (9 months ago)
- Language: Jupyter Notebook
- Size: 11.7 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Youtube Channel Videos Scraper BY-SHIBAM ❤

## Table of Contents
* [About the Project](#about-the-project)
* [Tasks](#tasks)
* [Built With](#built-with)
* [Fork the Repo and Contribute](#fork-the-repo-and-contribute)
* [Contact](#contact)

## About the Project
In this [`web scraping project`](https://github.com/shibam120302/Youtube-Channel-Videos-Scraper) Jupyter notebook, we scrape the Wikipedia pages of Disney movies to build a Disney movies dataset. From each movie's infobox we collect fields such as `Title`, `Directed by`, `Produced by`, `Written by`, `Narrated by`, `Music by`, `Cinematography`, `Edited by`, `Production company`, `Distributed by`, `Release date`, `Running time`, `Country`, and `Language`. We also use the OMDb API to fetch `imdb`, `metascore`, and `rotten_tomatoes` ratings. The data is stored as JSON and CSV, with intermediate results checkpointed using Python's pickle library.
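As a sketch of the infobox scraping described above: the function below parses a Wikipedia-style infobox table into a Python dictionary. The inline HTML snippet stands in for a real page fetched with `requests.get(url).text`, and the `table.infobox` selector is an assumption about Wikipedia's markup, not a guarantee.

```python
# Minimal sketch of Task 1: parse a Wikipedia-style infobox into a dict.
# The HTML below is a stand-in for a fetched movie page; in the notebook
# the markup would come from requests.get(url).text.
from bs4 import BeautifulSoup

def parse_info_box(html):
    """Return {header: value} for each <th>/<td> row of the first table.infobox."""
    soup = BeautifulSoup(html, "html.parser")
    info = {}
    for row in soup.select("table.infobox tr"):
        header = row.find("th")
        value = row.find("td")
        if header and value:
            info[header.get_text(" ", strip=True)] = value.get_text(" ", strip=True)
    return info

sample = """
<table class="infobox">
  <tr><th>Directed by</th><td>Lee Unkrich</td></tr>
  <tr><th>Running time</th><td>103 minutes</td></tr>
</table>
"""
print(parse_info_box(sample))
```

Looping this over every movie link on Wikipedia's list-of-Disney-films page gives the list of dictionaries described in Task 2 below.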

### Tasks
* Task 1: Scrape the info box from the Toy Story 3 Wikipedia page and save it in a Python dictionary.
* Task 2: Scrape the info boxes for all Disney movies and save them in a list of Python dictionaries.
* Task 3: Clean the data!
- Strip out all references ([1], [2], etc.)
- Split up long strings
- Convert 'Running time' field to integer
- Convert 'Budget' and 'Box office' fields to floats
- Convert dates to datetime objects
- Save data using Pickle
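The cleaning steps above can be sketched as small helper functions. The input formats (e.g. `'103 minutes'`, `'$200 million'`, `'June 18, 2010'`) are assumptions about typical Wikipedia infobox values, and the conversions are deliberately simplified:

```python
# Sketch of Task 3's cleaning helpers; input formats are assumed from
# typical Wikipedia infobox strings and are illustrative only.
import re
from datetime import datetime

def strip_references(text):
    """Remove citation markers such as [1] or [note 2]."""
    return re.sub(r"\[[^\]]*\]", "", text).strip()

def minutes_to_int(running_time):
    """'103 minutes' -> 103; None if no number is found."""
    match = re.search(r"\d+", running_time)
    return int(match.group()) if match else None

def money_to_float(amount):
    """'$200 million' -> 200000000.0 (a simplified conversion)."""
    match = re.search(r"([\d.]+)\s*(million|billion)?", amount.replace(",", ""))
    if not match:
        return None
    value = float(match.group(1))
    scale = {"million": 1e6, "billion": 1e9}.get(match.group(2), 1)
    return value * scale

def to_datetime(date_str):
    """'June 18, 2010' -> datetime(2010, 6, 18); assumes one common format."""
    return datetime.strptime(strip_references(date_str), "%B %d, %Y")

# Intermediate results can then be checkpointed with pickle, e.g.:
# with open("movies.pickle", "wb") as f:
#     pickle.dump(movies, f)
```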
* Task 4: Attach IMDb, Rotten Tomatoes, and Metascore ratings to the dataset using the OMDb API.
* Task 5: Save the final dataset as JSON and CSV files.

### Built With
* Jupyter Notebook
* Beautiful Soup
* Requests
* Pickle
* Pandas

## Fork the Repo and Contribute
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are **greatly appreciated**.
1. Fork the Project (click on `Fork` in the top-right corner)
2. Create your Feature Branch (`git checkout -b feature`)
3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the Branch (`git push origin feature`)
5. Open a Pull Request

## Contact
### SHIBAM NATH ❤❤
* [LinkedIn](https://www.linkedin.com/in/shibam-nath-0a23a6227/)