https://github.com/arosas17/movies-etl.
- Host: GitHub
- URL: https://github.com/arosas17/movies-etl.
- Owner: arosas17
- Created: 2022-08-11T19:33:10.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2022-08-24T18:20:10.000Z (about 2 years ago)
- Last Synced: 2023-12-24T08:30:57.545Z (11 months ago)
- Language: Jupyter Notebook
- Size: 2.29 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Movies-ETL.
The purpose of this assignment is to write code that takes three files, cleans and organizes them, and loads them into a database. The code was built up across four separate files, each adding a little more until the final version was complete. It was also meant to be reusable, so that if another set of ratings and another list of movies are provided, the data can be cleaned and made usable in a short amount of time.
To clean the data, regular expressions were used to detect values structured in particular ways and normalize them. Some data was discarded because it would have been difficult to fix and would have added very little to the overall data set. Once the data was cleaned to an acceptable level, it was sent to a database, where it is stored as tables.
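The regex-based cleaning described above can be illustrated with a minimal sketch. The README does not list the patterns actually used, so the dollar-amount formats, the `parse_dollars` helper, and the sample values below are all hypothetical. The sketch only shows the general approach: match values in known shapes, convert them, and discard anything that does not match.

```python
import re

# Hypothetical formats, chosen for illustration only; the patterns used
# in the actual notebooks are not given in this README.
WORDS = re.compile(r"^\$\s*(\d+(?:\.\d+)?)\s*(million|billion)$", re.IGNORECASE)
DIGITS = re.compile(r"^\$\s*(\d{1,3}(?:,\d{3})+|\d+)$")
SCALE = {"million": 10**6, "billion": 10**9}


def parse_dollars(text):
    """Return a float for a recognized dollar string, or None if it cannot be parsed."""
    if not isinstance(text, str):
        return None
    text = text.strip()
    match = WORDS.match(text)
    if match:
        # e.g. "$1.5 million": number times the named scale
        return float(match.group(1)) * SCALE[match.group(2).lower()]
    match = DIGITS.match(text)
    if match:
        # e.g. "$123,456,789": drop the thousands separators
        return float(match.group(1).replace(",", ""))
    return None


raw = ["$1.5 million", "$123,456,789", "unknown", "$2 billion"]
# Values that match neither pattern are discarded rather than repaired.
cleaned = [v for v in (parse_dollars(r) for r in raw) if v is not None]
```

Returning `None` for unrecognized values keeps the decision to discard a row explicit and separate from the parsing itself.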
### Movies and Ratings Row Count
![movies_query.png](/Images/movies_query.png) ![ratings_query.png](/Images/ratings_query.png)

These screenshots show the number of rows in each table after exporting to the database, with the movies row count on the left and the ratings row count on the right. These counts suggest that everything was transferred correctly.
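A row-count check of this kind can be sketched as follows. This README does not name the database or the table schemas, so the example uses an in-memory SQLite database from the Python standard library as a stand-in, and the `load_and_count` helper, column names, and sample rows are assumptions made for illustration.

```python
import sqlite3


def load_and_count(conn, table, columns, rows):
    """Create `table`, insert `rows`, and return the row count the database reports."""
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f"CREATE TABLE {table} ({col_list})")
    conn.executemany(
        f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})", rows
    )
    conn.commit()
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


# In-memory SQLite stands in for the real database (an assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical sample rows, not taken from the project's data files.
movies = [(1, "Movie A"), (2, "Movie B")]
ratings = [(1, 4.0), (1, 3.5), (2, 5.0)]

movies_count = load_and_count(conn, "movies", ["movie_id", "title"], movies)
ratings_count = load_and_count(conn, "ratings", ["movie_id", "rating"], ratings)

# The load is considered complete when the counts match the source data.
assert movies_count == len(movies)
assert ratings_count == len(ratings)
```

Comparing the count reported by the database with the length of the cleaned data is a quick sanity check. It confirms that no rows were dropped during the export, but not that the values themselves are correct.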
Two resource files are too large to attach.