https://github.com/petrsevcik/petrfd

PetrFD is film database of top 1000 movies and its actors from http://www.csfd.cz
https://github.com/petrsevcik/petrfd

csfd docker fastapi movie-database python3 sqlite3

Host: GitHub
URL: https://github.com/petrsevcik/petrfd
Owner: petrsevcik
Created: 2022-06-17T21:47:40.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2022-06-17T21:53:57.000Z (over 2 years ago)
Last Synced: 2024-11-07T22:32:25.575Z (about 2 months ago)
Topics: csfd, docker, fastapi, movie-database, python3, sqlite3
Language: Python
Homepage:
Size: 1.62 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# README #

### PetrFD is a film database of the top 1000 movies and their actors from http://www.csfd.cz

#### Search page ####

Welcome page with an input field for searching movies and actors.

When you click the "Search" button, you are redirected to the Search result page.

`http://localhost:8080/`

`http://localhost:8080/search`

#### Search result page ####
Page with search results for Actors and Movies. It also contains a link back to the search homepage.

Actors and Movies are listed with links to the `/actor/{name}` and `/movie/{name}` endpoints.

`http://localhost:8080/searchresultpage`

#### Actor Page ####
Page lists the movies in which the actor stars.

`http://localhost:8080/actor/{name}`

e.g. `http://localhost:8080/actor/Tom%20Hanks`

#### Movie Page ####
Page lists all actors in the movie.

`http://localhost:8080/movie/{name}`

e.g. `http://localhost:8080/movie/Forrest%20Gump`
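
Names in the `/actor/{name}` and `/movie/{name}` URLs must be percent-encoded, which is why the examples use `%20` for the space. A minimal sketch of building such URLs with the standard library follows; the helper names `actor_url` and `movie_url` are hypothetical and not part of the app.

```python
from urllib.parse import quote

BASE_URL = "http://localhost:8080"  # host and port as used in this README


def actor_url(name: str) -> str:
    """Build an /actor/{name} URL (hypothetical helper, not from the repo)."""
    # safe="" makes quote() encode every reserved character, including "/"
    return f"{BASE_URL}/actor/{quote(name, safe='')}"


def movie_url(name: str) -> str:
    """Build a /movie/{name} URL (hypothetical helper, not from the repo)."""
    return f"{BASE_URL}/movie/{quote(name, safe='')}"


print(actor_url("Tom Hanks"))     # http://localhost:8080/actor/Tom%20Hanks
print(movie_url("Forrest Gump"))  # http://localhost:8080/movie/Forrest%20Gump
```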

## Architecture ##

The app is built on the `FastAPI` framework with the four endpoints described above. The scrapers for `http://csfd.cz` content are in the `csfd_scraper.py` file. Packages used: `aiohttp`, `requests`, `beautifulsoup`. The top 1000 movies list is scraped asynchronously, and then each movie is scraped sequentially in a single thread to avoid being blocked or overloading the site with a DDoS-like burst of requests. Possible TODO: rewrite it in Scrapy or make requests in batches via `aiohttp`. Scraping results are stored in the `actors.csv` and `movies.csv` files, from which they are loaded into a `sqlite3` database named `petrfd.db`.
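
The CSV-to-SQLite step can be sketched roughly as below. This is an illustration only: the column names (`title`, `actor`), the table `movie_actor`, and the function `load_movies` are assumptions, since the real layout of `movies.csv` / `actors.csv` and the schema of `petrfd.db` are not described here. It uses an in-memory database and inline CSV text so it runs on its own.

```python
import csv
import io
import sqlite3

# Assumed CSV layout; the real files in the repo may look different.
SAMPLE_CSV = (
    "title,actor\n"
    "Forrest Gump,Tom Hanks\n"
    "Forrest Gump,Robin Wright\n"
    "The Green Mile,Tom Hanks\n"
)


def load_movies(conn: sqlite3.Connection, csv_file) -> int:
    """Load (title, actor) rows from a CSV file object into SQLite."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS movie_actor ("
        "title TEXT NOT NULL, actor TEXT NOT NULL, UNIQUE (title, actor))"
    )
    rows = [(row["title"], row["actor"]) for row in csv.DictReader(csv_file)]
    with conn:  # commits on success, rolls back on error
        # INSERT OR IGNORE keeps repeated loads from duplicating rows
        conn.executemany(
            "INSERT OR IGNORE INTO movie_actor (title, actor) VALUES (?, ?)",
            rows,
        )
    return len(rows)


conn = sqlite3.connect(":memory:")  # the app itself uses the file petrfd.db
loaded = load_movies(conn, io.StringIO(SAMPLE_CSV))
count = conn.execute("SELECT COUNT(*) FROM movie_actor").fetchone()[0]
print(loaded, count)
```

Because of the `UNIQUE` constraint and `INSERT OR IGNORE`, running the load again against the same data leaves the row count unchanged.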

**DATA ARE ALREADY IN `petrfd.db` READY TO USE**

The file `database.py` contains all methods for loading movies and actors into the database, as well as for searching the database based on user input. Actors and movies do not change often on CSFD, so the code that runs the scraper and loads data into the database is commented out. Uncomment it if you want fresh data.
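
As an illustration of searching the database from user input, here is a minimal sketch of a substring search using a parameterized `LIKE` query. The table `movie_actor`, its columns, and the function `search` are hypothetical; the actual queries in `database.py` may be written differently.

```python
import sqlite3


def search(conn: sqlite3.Connection, term: str) -> dict:
    """Return actors and movies whose names contain `term` (hypothetical)."""
    # Escape LIKE wildcards so the user's input is matched literally
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    pattern = f"%{escaped}%"
    actors = [
        row[0]
        for row in conn.execute(
            "SELECT DISTINCT actor FROM movie_actor "
            "WHERE actor LIKE ? ESCAPE '\\' ORDER BY actor",
            (pattern,),
        )
    ]
    movies = [
        row[0]
        for row in conn.execute(
            "SELECT DISTINCT title FROM movie_actor "
            "WHERE title LIKE ? ESCAPE '\\' ORDER BY title",
            (pattern,),
        )
    ]
    return {"actors": actors, "movies": movies}


# Small in-memory data set so the sketch runs on its own
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movie_actor (title TEXT, actor TEXT)")
conn.executemany(
    "INSERT INTO movie_actor VALUES (?, ?)",
    [
        ("Forrest Gump", "Tom Hanks"),
        ("Forrest Gump", "Robin Wright"),
        ("The Green Mile", "Tom Hanks"),
    ],
)
print(search(conn, "han"))
print(search(conn, "gump"))
```

Passing the pattern as a bound parameter, rather than formatting it into the SQL string, keeps the query safe against SQL injection from the search field.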

#### Flaws of app ####
Data extraction could be done in a better and faster (asynchronous) way, for example with Scrapy, and data processing does not really need the intermediate step via `.csv` files. However, I focused more on the design of the FastAPI app, its endpoints, and the SQLite queries. The nature of this data is that it does not need to be refreshed often.
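
One way the per-movie scraping could be made asynchronous without flooding the site is to cap the number of requests in flight. The sketch below uses `asyncio.Semaphore` for that cap, which is a stand-in for the "batches via `aiohttp`" idea rather than the repository's code. `fetch_movie` is a hypothetical placeholder that only sleeps; a real version would issue an HTTP request.

```python
import asyncio


async def fetch_movie(movie_id: int) -> str:
    """Placeholder for a real HTTP request (hypothetical, no network I/O)."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"movie-{movie_id}"


async def scrape_all(movie_ids, max_concurrent: int = 5) -> list:
    """Fetch all movies with at most `max_concurrent` requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(movie_id: int) -> str:
        async with semaphore:  # wait here while the cap is reached
            return await fetch_movie(movie_id)

    # gather() returns results in the same order as the input ids
    return await asyncio.gather(*(bounded(m) for m in movie_ids))


results = asyncio.run(scrape_all(range(20), max_concurrent=5))
print(len(results), results[0], results[-1])
```

A small `max_concurrent` value keeps the load on the scraped site modest while still being faster than fetching one movie at a time.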

### Everything is ready to use via the Dockerfile
#### Commands for Docker startup
The app runs on **port 8080**. Run these commands from the repository directory:
```
docker build -t petrfd:1.0 .
docker run -d -p 8080:8080 petrfd:1.0
```