Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/damiieibikun/web-scrapping-and-python-data-visualization-on-top-500-movies-imdb
Web Scrapping and Python Data visualization on Top 500 movies IMDb
beautifulsoup4 data-analysis data-visualization matplotlib-pyplot numpy pandas plotly-express python requests seaborn web-scraping
- Host: GitHub
- URL: https://github.com/damiieibikun/web-scrapping-and-python-data-visualization-on-top-500-movies-imdb
- Owner: Damiieibikun
- Created: 2023-07-17T12:34:30.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-09-08T20:52:50.000Z (over 1 year ago)
- Last Synced: 2024-11-15T03:33:14.488Z (2 months ago)
- Topics: beautifulsoup4, data-analysis, data-visualization, matplotlib-pyplot, numpy, pandas, plotly-express, python, requests, seaborn, web-scraping
- Language: Jupyter Notebook
- Homepage:
- Size: 8.39 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Web Scrapping and Python Data visualization on Top 500 Movies IMDb
A Python visualization project analysing the top 500+ movies of all time from the IMDb website.

### Introduction ###
The aim of this project was to gather valuable information from movie websites like IMDb by extracting data through web scraping techniques. IMDb stands for the Internet Movie Database. It is an online database that provides information about movies, television shows, actors, directors, producers, and other film industry professionals. It serves as a resource for movie enthusiasts, professionals in the entertainment industry, and anyone looking to explore information about films and TV shows. Through data analysis and visualization of the scraped data, we can begin to uncover patterns and insights based on features of a movie such as ratings, genre, director, release date, and year of production.

### Methodology Approach ###
The methodology adopted during the course of this project includes:
* Data collection from the [IMDb Website](https://www.imdb.com/list/ls062911411/?st_dt=&mode=detail&) through web scraping.
* Descriptive analysis
* Exploratory Data Analysis using Python visualization tools to gain insights into the data and identify any patterns or trends.
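
The data collection step can be sketched as follows. This is a minimal illustration, not the project's notebook code: the HTML fragment, the CSS class names (`movie`, `title`, `year`, `rating`), and the helper `parse_movies` are all hypothetical, and IMDb's real pages use different markup. It only shows the general pattern of turning HTML into structured records with BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; IMDb's real list pages
# use different (and changing) HTML, so the selectors below are assumptions.
SAMPLE_HTML = """
<div class="movie">
  <h3 class="title">Example Film A</h3>
  <span class="year">1994</span>
  <span class="rating">9.2</span>
</div>
<div class="movie">
  <h3 class="title">Example Film B</h3>
  <span class="year">1972</span>
  <span class="rating">8.7</span>
</div>
"""


def parse_movies(html):
    """Return one dict per movie block found in the HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    movies = []
    for block in soup.select("div.movie"):
        movies.append({
            "title": block.select_one("h3.title").get_text(strip=True),
            "year": int(block.select_one("span.year").get_text(strip=True)),
            "rating": float(block.select_one("span.rating").get_text(strip=True)),
        })
    return movies


movies = parse_movies(SAMPLE_HTML)
for movie in movies:
    print(movie)
```

In a real run, the HTML string would come from something like `requests.get(url, timeout=10).text`, and the selectors would need to match the page's actual markup.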
### Dependencies ###
* Data collection libraries
  * BeautifulSoup4
  * Requests
* Data wrangling and processing libraries
  * Pandas
  * NumPy
  * Tqdm
* Visualization libraries
  * Matplotlib
  * Seaborn
  * Wordcloud
  * Plotly Express

### Deliverables ###
* Comprehensive Jupyter notebooks containing code and graphs

### Findings/Results ###
Analysis of the data suggests the following:
* The length of the film did not ensure positive reviews.
* Similarly, the film's box office earnings did not ensure positive reviews.
* There was a disparity between the preferences of average IMDb viewers and the criteria that critics use to judge a film's greatness.
* Crime dramas and dramas were likely to receive high ratings from IMDb users and Metacritic users.
* Movies directed by Steven Spielberg, Alfred Hitchcock, Stanley Kubrick, and Martin Scorsese were very likely to get excellent reviews.
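
Checks of the kind behind these findings can be sketched with pandas. The rows below are toy placeholder values, not figures from the project's scraped dataset, and the column names (`runtime_min`, `rating`, `genre`) are assumptions about how such a table might be laid out. The sketch shows one way to test whether runtime tracks rating and to compare average ratings by genre.

```python
import pandas as pd

# Toy placeholder rows, NOT values from the project's scraped dataset.
# Column names are assumptions chosen for this illustration.
df = pd.DataFrame({
    "runtime_min": [90, 120, 150, 180, 100, 140],
    "rating": [8.5, 8.1, 8.6, 8.2, 8.4, 8.3],
    "genre": ["Drama", "Crime", "Drama", "Action", "Crime", "Action"],
})

# Pearson correlation between runtime and rating.
# A value near 0 would suggest that runtime says little about rating.
runtime_rating_corr = df["runtime_min"].corr(df["rating"])

# Mean rating per genre, highest first.
genre_means = df.groupby("genre")["rating"].mean().sort_values(ascending=False)

print(f"runtime vs rating correlation: {runtime_rating_corr:.2f}")
print(genre_means)
```

The same pattern extends to other columns, for example swapping `runtime_min` for box office earnings or grouping by director instead of genre.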