https://github.com/plugarut/land_registry_api

Host: GitHub
URL: https://github.com/plugarut/land_registry_api
Owner: PlugaruT
Created: 2020-04-27T15:35:41.000Z (about 6 years ago)
Default Branch: master
Last Pushed: 2020-05-07T09:17:16.000Z (about 6 years ago)
Last Synced: 2025-03-06T18:17:30.572Z (over 1 year ago)
Language: Python
Size: 37.1 KB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Backend Code Challenge

This project is a web application written in Django. The application can seed its database with CSV files from [here](https://data.gov.uk/dataset/4c9b7641-cf73-4fd9-869a-4bfeed6d440e/hm-land-registry-price-paid-data).

### Project Setup

Make sure you have Python `3.8+` installed on your machine.
I also recommend creating a virtual environment to isolate the project dependencies.

To install the project dependencies, run
```
make install
```

After that, check if all the tests are green
```
make test
```
The project uses `black` for code formatting. To format the code, run
```
make lint
```
To start the development server on your machine, run
```
make run
```

As mentioned above, the project database needs to be seeded with data. To do this, download a CSV file from [here](https://data.gov.uk/dataset/4c9b7641-cf73-4fd9-869a-4bfeed6d440e/hm-land-registry-price-paid-data). Any of the CSV files listed there will do.

After that, just run
```
python manage.py seed_db --file_path=your_path_to_downloaded_file
```
This command reads the file and inserts all of its contents into the database. All of this is done with the power of SQL 💪.
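
The implementation of `seed_db` is not shown here. As a rough sketch of the general idea (streaming CSV rows into a table with plain SQL instead of creating one ORM object per row), the example below uses Python's built-in `sqlite3` as a stand-in database. The `land_transaction` table, its reduced column set, the assumed column order of the CSV, and the `seed_from_csv` helper are all hypothetical and not the project's actual code.

```python
import csv
import io
import sqlite3

# Hypothetical, reduced schema: the real table has more columns.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS land_transaction (
    transaction_id TEXT PRIMARY KEY,
    price INTEGER,
    transfer_date TEXT,
    postcode TEXT,
    property_type TEXT
)
"""
INSERT_SQL = "INSERT INTO land_transaction VALUES (?, ?, ?, ?, ?)"


def seed_from_csv(conn, csv_file, batch_size=1000):
    """Insert rows from a header-less CSV in batches; return the row count."""
    conn.execute(CREATE_SQL)
    total = 0
    batch = []
    for row in csv.reader(csv_file):
        # Assumed column order: id, price, date, postcode, property type, ...
        # Only the date part (YYYY-MM-DD) of the timestamp is kept.
        batch.append((row[0], int(row[1]), row[2][:10], row[3], row[4]))
        if len(batch) >= batch_size:
            conn.executemany(INSERT_SQL, batch)
            total += len(batch)
            batch = []
    if batch:
        # Flush the final, partially filled batch.
        conn.executemany(INSERT_SQL, batch)
        total += len(batch)
    conn.commit()
    return total


# Made-up sample rows, for illustration only.
sample = io.StringIO(
    '"{A1}","250000","2020-01-15 00:00","AB1 2CD","D"\n'
    '"{B2}","180000","2020-02-03 00:00","AB1 2CD","T"\n'
)
conn = sqlite3.connect(":memory:")
inserted = seed_from_csv(conn, sample)
print(inserted)
```

Batching keeps memory use flat for large files. A production database would likely offer a faster native bulk-load path, which this sketch does not attempt to reproduce.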

### Endpoints

At the moment, there are only two endpoints available.

- `/api/v1/house-prices` is the endpoint where data is grouped by month for each property type.

For filtering the data, a few query params are available:
- `from_date`, a string in the format `YYYY-MM-DD` giving the start date of the range to filter by.
- `to_date`, a string in the format `YYYY-MM-DD` giving the end date of the range to filter by.
- `postcode`, a string representing the postcode of the region where the land is located.

- `/api/v1/transactions` is the endpoint where transactions are grouped by price range.

For filtering the data, the same query params are available:
- `from_date`, a string in the format `YYYY-MM-DD` giving the start date of the range to filter by.
- `to_date`, a string in the format `YYYY-MM-DD` giving the end date of the range to filter by.
- `postcode`, a string representing the postcode of the region where the land is located.
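
As an illustration of how these query params combine into a request URL, here is a small sketch. The `build_url` helper is hypothetical, the base address assumes Django's default development server address, and treating every filter as optional is also an assumption. The sample postcode is made up.

```python
from urllib.parse import urlencode

# Assumption: Django's default development address; adjust to your setup.
BASE_URL = "http://127.0.0.1:8000"


def build_url(endpoint, from_date=None, to_date=None, postcode=None):
    """Build a request URL, including only the filters that were given."""
    params = {"from_date": from_date, "to_date": to_date, "postcode": postcode}
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}{endpoint}" + (f"?{query}" if query else "")


url = build_url(
    "/api/v1/house-prices",
    from_date="2019-01-01",
    to_date="2019-12-31",
    postcode="AB1 2CD",
)
print(url)
```

`urlencode` takes care of escaping the space inside the postcode, so the value can be passed as it is written.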

### DB Tweaks

The SQL queries that aggregate the data are relatively simple, but because of the amount of data to aggregate, they are slow.
Because of this, I've added two indexes to `registry_api_landtransaction`, on the `postcode` and `price` columns. After adding these two indexes, I saw quite a big jump in speed, especially when filtering by postcode.
One issue remains: filtering by date range is still quite slow in my opinion, and I wasn't able to find a decent solution 😞.
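
To make the effect of such indexes visible, the sketch below creates a cut-down copy of the table in an in-memory SQLite database and asks the planner how it would run a postcode filter. SQLite is only a stand-in for whatever database the project uses, and the index names and the `transfer_date` column name are hypothetical. Only the table name and the two indexed columns come from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE registry_api_landtransaction ("
    "id INTEGER PRIMARY KEY, price INTEGER, transfer_date TEXT, postcode TEXT)"
)
# Hypothetical index names; only the indexed columns are known.
conn.execute("CREATE INDEX idx_postcode ON registry_api_landtransaction (postcode)")
conn.execute("CREATE INDEX idx_price ON registry_api_landtransaction (price)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT price FROM registry_api_landtransaction WHERE postcode = ?",
    ("AB1 2CD",),
).fetchall()
# The last column of each plan row is a human-readable description.
details = [row[-1] for row in plan]
print(details)
```

If the description mentions `idx_postcode`, the filter is answered through the index rather than by scanning every row. The same check can be repeated with a date-range filter to see how that query is planned.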

### Decisions

As you might have noticed, I haven't used the Django ORM at all. There are a few reasons why:
- First, I could have used the ORM when reading the file and storing the data in the DB, but it would be very slow; doing the same with SQL gives a big performance benefit.
- When aggregating data, I decided to go with SQL because SQL is very good at aggregating large amounts of data, and the ORM would just be an intermediate layer that I decided to skip.

I know that when to use and when not to use the ORM may be a controversial decision. If the task were to build a CRUD API, I would definitely choose the ORM, but in this particular case I decided to go with SQL because it's just better for this kind of job.
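
For a feel of what aggregating in SQL looks like, here is a minimal sketch of a "group by month and property type" query. It is not the project's actual query: it runs against an in-memory SQLite table with made-up rows, the `transfer_date` and `property_type` column names are hypothetical, and using the average price as the aggregate is an assumption. In a Django project, a query like this would typically be executed through a cursor obtained from `django.db.connection`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE registry_api_landtransaction ("
    "price INTEGER, transfer_date TEXT, postcode TEXT, property_type TEXT)"
)
# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO registry_api_landtransaction VALUES (?, ?, ?, ?)",
    [
        (100000, "2020-01-10", "AB1 2CD", "D"),
        (300000, "2020-01-20", "AB1 2CD", "D"),
        (150000, "2020-01-25", "AB1 2CD", "T"),
        (200000, "2020-02-05", "AB1 2CD", "D"),
    ],
)

# Average price per month and property type, filtered by date range and postcode.
MONTHLY_AVG_SQL = """
SELECT substr(transfer_date, 1, 7) AS month,
       property_type,
       AVG(price) AS avg_price
FROM registry_api_landtransaction
WHERE transfer_date BETWEEN ? AND ? AND postcode = ?
GROUP BY month, property_type
ORDER BY month, property_type
"""
rows = conn.execute(
    MONTHLY_AVG_SQL, ("2020-01-01", "2020-12-31", "AB1 2CD")
).fetchall()
for row in rows:
    print(row)
```

The database does the grouping and averaging itself, so only one small row per month and property type travels back to the application.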
