Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/kinoute/scraper-allocine
Just playing with BeautifulSoup and Python to scrap some movies on Allocine.fr.
- Host: GitHub
- URL: https://github.com/kinoute/scraper-allocine
- Owner: kinoute
- Created: 2020-05-25T21:53:44.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2024-07-06T15:49:04.000Z (4 months ago)
- Last Synced: 2024-07-06T17:01:06.405Z (4 months ago)
- Topics: allocine, beautifulsoup, csv, docker, movies, postgresql, python, scraping, scraping-websites
- Language: Python
- Homepage:
- Size: 48.8 KB
- Stars: 1
- Watchers: 4
- Forks: 1
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
README
# Scraper Allociné
Just a random scraper to retrieve some data about movies listed on Allociné.fr.
The script will save movie data available on the http://www.allocine.fr/films webpage as a `.csv` file and in a postgres database.
## Movie information scraped
The movie attributes retrieved when available are:
* The movie ID;
* The title;
* The release date;
* The duration;
* The genre(s);
* The director(s);
* The main actor(s);
* The press rating;
* The spectators' rating;
* The movie summary.
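As a rough illustration of how such attributes can be pulled out with BeautifulSoup (the HTML below and its class names are invented for this sketch, not Allociné's real markup):

```python
from bs4 import BeautifulSoup

# Toy HTML standing in for one movie card on the listing page;
# the tags and class names are hypothetical, not Allociné's real markup.
html = """
<div class="movie-card" data-id="12345">
  <h2 class="title">Le Fabuleux Destin</h2>
  <span class="duration">2h 2min</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
card = soup.select_one("div.movie-card")
movie = {
    "id": card["data-id"],
    "title": card.select_one(".title").get_text(strip=True),
    "duration": card.select_one(".duration").get_text(strip=True),
}
print(movie)
```

The real script has to handle attributes that are missing for some movies, hence the "when available" above: each `select_one` call should be guarded against returning `None`.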
## Installation
First, clone the repository:
```bash
git clone [email protected]:kinoute/scraper-allocine.git
```

Go to the folder and build the container:
```bash
docker-compose build
# or "make build"
```

## Usage
**Important:** First, rename the `.env.dist` template file to `.env`, then fill it with your own values. On first start, the Postgres environment variables are used to create the Postgres server.
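A hedged example of what the filled-in `.env` could look like (the variable names here are hypothetical; the authoritative list is in `.env.dist`):

```bash
# Hypothetical variable names; check .env.dist for the real ones.
POSTGRES_USER=allocine
POSTGRES_PASSWORD=change-me
POSTGRES_DB=movies
```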
By default, the script will:
* Scrape the first 50 pages of Allociné;
* Save every movie to the Postgres database in its own container;
* Wait 10 seconds between each page scraped;
* Save the full results in a CSV file called `allocine.csv` in the `files` folder.

To run the script with these default options, simply run:
```bash
docker-compose up --build
# or make start
```

### Change default options
The script has 3 customizable options that can be changed in the `.env` file:
* **The number of pages to scrape** (Default: 50);
* **The time in seconds to wait between each page scraped** (Default: 10);
* **The CSV filename where results will be stored** (Default: `allocine.csv`).

## Data
The script automatically updates and saves the results to the `.csv` file after every page scraped. The Postgres database is updated after every movie scraped.
If, for whatever reason, you want to stop the scraping, just press `Ctrl+C` in your terminal.
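The incremental CSV saving described above can be sketched like this (the field names are an illustrative subset; the real script's layout may differ):

```python
import csv

# Illustrative subset of the scraped attributes, not the script's real schema.
FIELDS = ["id", "title", "duration"]

def save_results(path, movies):
    # Rewrite the whole CSV after each scraped page, so an
    # interrupted run (Ctrl+C) still leaves a complete, valid file.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(movies)

movies = [{"id": "1", "title": "A", "duration": "1h30"}]
save_results("allocine.csv", movies)
```

Rewriting the file on every page is simple and keeps the CSV consistent; appending would be cheaper but risks partial rows if the process is killed mid-write.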
## Test
While the scraper is running, you can connect to the Postgres container and run any SQL with `psql` by typing `make admin-db` from the project folder.
You can also simply type `make test-db`. It should return 5 records from the movies table if everything went well.
## Abuse
This script was just made for fun, to play around with BeautifulSoup and Python. Please don't use it to do bad things or hammer Allociné's servers!