https://github.com/medartus/airbnb-paris
📊🛏 Prediction model to control apartment rental flows on the Airbnb platform for Paris City Hall
https://github.com/medartus/airbnb-paris
airbnb insideairbnb paris rental-analytics
Last synced: 7 months ago
JSON representation
📊🛏 Prediction model to control apartment rental flows on the Airbnb platform for Paris City Hall
- Host: GitHub
- URL: https://github.com/medartus/airbnb-paris
- Owner: medartus
- Created: 2020-10-05T11:09:04.000Z (over 5 years ago)
- Default Branch: main
- Last Pushed: 2021-01-23T15:04:49.000Z (over 5 years ago)
- Last Synced: 2024-12-27T17:42:30.578Z (over 1 year ago)
- Topics: airbnb, insideairbnb, paris, rental-analytics
- Language: Jupyter Notebook
- Homepage:
- Size: 442 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Airbnb - Paris
## Getting Started
1. Install Python modules
```
pip install -r requirements.txt
```
2. Create a `dev.env` file in the **root folder** with the following content:
```
POSTGRESQL_HOST=
POSTGRESQL_USER=
POSTGRESQL_PASSWORD=
POSTGRESQL_DATABASE=
DATASETS_FOLDER_PATH=
```
To access and visualize the database, you can use [pgAdmin](https://www.pgadmin.org/download/).
## Organization of Files and Folders
- **datasets**: Regroup all datasets files
- **datasets/listings**: Regroup listings datasets
- **datasets/reviews**: Regroup reviews datasets
- **datasets/calendar**: Regroup calendars datasets
- **tests**: All the tests files
- **notebook**: All the files containing ideas to be implemented
## Getting Started
1. Create a table `calendars` on your database :
``` SQL
CREATE TABLE public.calendars
(
cal_key serial,
listing_id integer,
available text COLLATE pg_catalog."default",
start_date date,
end_date date,
num_day integer,
minimum_nights double precision,
maximum_nights double precision,
label text COLLATE pg_catalog."default",
validation boolean DEFAULT false,
proba double precision,
ext_validation double precision DEFAULT 0.0,
CONSTRAINT calendars_pkey PRIMARY KEY (cal_key)
)
```
2. Create a table `listings` on your database :
``` SQL
CREATE TABLE public.listings
(
id integer NOT NULL,
listing_url text COLLATE pg_catalog."default",
scrape_id bigint,
last_scraped text COLLATE pg_catalog."default",
name text COLLATE pg_catalog."default",
description text COLLATE pg_catalog."default",
neighborhood_overview text COLLATE pg_catalog."default",
host_id integer,
host_acceptance_rate text COLLATE pg_catalog."default",
host_listings_count integer,
neighbourhood text COLLATE pg_catalog."default",
neighbourhood_cleansed text COLLATE pg_catalog."default",
neighbourhood_group_cleansed text COLLATE pg_catalog."default",
latitude double precision,
longitude double precision,
property_type text COLLATE pg_catalog."default",
room_type text COLLATE pg_catalog."default",
minimum_nights integer,
maximum_nights integer,
calendar_updated text COLLATE pg_catalog."default",
has_availability text COLLATE pg_catalog."default",
availability_365 integer,
calendar_last_scraped text COLLATE pg_catalog."default",
number_of_reviews integer,
first_review text COLLATE pg_catalog."default",
last_review text COLLATE pg_catalog."default",
license text COLLATE pg_catalog."default",
instant_bookable text COLLATE pg_catalog."default",
calculated_host_listings_count integer,
reviews_per_month double precision,
CONSTRAINT id PRIMARY KEY (id)
)
```
2. Create a table `results` on your database :
``` SQL
CREATE TABLE public.results
(
extraction_date date,
listing_id integer,
past12_m50 integer,
past12_m75 integer,
past12_m95 integer,
past12_m100 integer,
past12_m95_e75 integer,
past12_m100_e75 integer,
civil_m50 integer,
civil_m75 integer,
civil_m95 integer,
civil_m100 integer,
civil_m95_e75 integer,
civil_m100_e75 integer,
predict_m50 integer,
predict_m75 integer,
predict_m95 integer
)
```
### Daily execution
This project has been created in such a way that it can be run every day. To process the data present on InsideAirbnb yesterday, just run the `Daily.py` file. To automate this execution, you can schedule it.
> For Windows, you can for example follow [this tutorial](https://www.jcchouinard.com/python-automation-using-task-scheduler/).
On average, the script execution time to retrieve the files and do the processing is 30 minutes. Your computer will still be usable because Python uses only one core to run.