https://github.com/cristidraghici/geocoded-bucharest-family-medicine-providers
- Host: GitHub
- URL: https://github.com/cristidraghici/geocoded-bucharest-family-medicine-providers
- Owner: cristidraghici
- Created: 2023-08-29T14:09:30.000Z (almost 2 years ago)
- Default Branch: master
- Last Pushed: 2024-06-04T19:49:07.000Z (about 1 year ago)
- Last Synced: 2024-06-05T13:35:02.722Z (about 1 year ago)
- Language: Python
- Size: 815 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# geocoded-bucharest-family-medicine-providers
> The list of family medicine offices in Bucharest with approximate coordinates
## About
The `output.json` file lists the family medicine doctors in Bucharest, together with geolocation information. This file always contains the most recent list of family medicine doctors.
It will have the following structure:
```python
data = [
    {
        "title": str,
        "description": [str],  # list of str
        "latitude": float,
        "longitude": float,
    },
    ...
]
```
### View on a map
[https://cristidraghici.github.io/generic-map-with-pois/?api=https://cristidraghici.github.io/geocoded-bucharest-family-medicine-providers/output.json](https://cristidraghici.github.io/generic-map-with-pois/?api=https://cristidraghici.github.io/geocoded-bucharest-family-medicine-providers/output.json)
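Any consumer other than the map can check the record structure described in the About section before using the data. The snippet below is a minimal sketch of such a check. The `validate_records` helper and the sample record are hypothetical and are not part of this repository.

```python
import json


def validate_records(records):
    """Return a list of human-readable problems found in the records (hypothetical helper)."""
    problems = []
    for index, record in enumerate(records):
        if not isinstance(record.get("title"), str):
            problems.append(f"record {index}: 'title' must be a string")
        description = record.get("description")
        if not (isinstance(description, list)
                and all(isinstance(line, str) for line in description)):
            problems.append(f"record {index}: 'description' must be a list of strings")
        for key in ("latitude", "longitude"):
            value = record.get(key)
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"record {index}: '{key}' must be a number")
    return problems


# Made-up sample for illustration only; it is not taken from output.json.
sample = json.loads(
    '[{"title": "Example office", "description": ["Example street 1"],'
    ' "latitude": 44.43, "longitude": 26.1}]'
)
print(validate_records(sample))
```

To check the real file, replace the sample with `json.load(open("output.json", encoding="utf-8"))`. An empty list means every record matched the expected structure.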
### Versions
We use simple versioning for both the code and the output files. Releases are tagged `v1`, `v2`, `v3`, etc. Before we start working on a new version of the parser, we save the current output in the `./.archive/` folder, in a newly created subfolder named after the corresponding version.
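The archiving step can be scripted. The following is only a sketch of one way to do it, not the process used in this repository: `archive_current_outputs` is a hypothetical helper, and the default file names are taken from the project root listing.

```python
import shutil
from pathlib import Path


def archive_current_outputs(project_root, version,
                            filenames=("input.xlsx", "output.json")):
    """Copy the current source and output files into .archive/<version>/.

    Hypothetical helper: returns the names of the files that were copied.
    """
    root = Path(project_root)
    target = root / ".archive" / version
    # exist_ok=False makes the call fail instead of overwriting an archived version.
    target.mkdir(parents=True, exist_ok=False)
    copied = []
    for name in filenames:
        source = root / name
        if source.is_file():
            shutil.copy2(source, target / name)
            copied.append(name)
    return copied
```

For example, `archive_current_outputs(".", "v2")` would be called from the project root before starting work on the next parser version. Refusing to reuse an existing version folder is a design choice that protects already archived data.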
## File structure
This is a deliberately simple (YOLO) structure whose purpose is to keep older versions in the git repository. The files are small, so the storage cost is low, and it is worth paying to be sure the data will always be available.
- `.cache` contains the cache from previous runs. If you pass the `--cache` param when you run the script, it uses the cached data where available and also updates the cache at the end of the run;
- `.archive` contains a history of results from running the parser. Each folder (`v1`, `v2`, etc.) stores the source file and the outputs generated by running the parser. We do not keep the parser files there, but each version folder corresponds to a tagged release of the script;
- we keep the current source and outputs at the root of the project.
```
.cache/
|-- addresses_cache.json
|-- coordinates_cache.json
.archive/
|-- v1/
| |-- 20230721_Lista cabinete medicina de familie_20.07.2023
| |-- input.xlsx
| |-- output.json
|-- v2/
| |-- ...
20240401_Lista cabinete medicina de familie_01.04.2024
index.html
input.xlsx
output.json
geocode_medical_addresses.py
...
```
## How to use
The source list is neither consistent nor in a proper format. This is why we start with separate parsers, which can be merged later if needed. It is also the reason why we store the source in this repo.
These are some examples of how to run the script:
- `python geocode_medical_addresses.py`
- `python ./geocode_medical_addresses.py --addresses --geocodes --excel --json --cache`
- `python ./geocode_medical_addresses.py --addresses --geocodes --excel --json --cache --dev`
### New data sources
Main source:
- [http://cas.cnas.ro/casmb/page/lista-cabinete-medicina-de-familie.html](http://cas.cnas.ro/casmb/page/lista-cabinete-medicina-de-familie.html)
Here are some ideas about how to handle the newly downloaded files:
- we keep the filename as close to the source as possible;
- before starting, remember to create a release for the parser and also save the current output in the `./.archive` folder;
- do a minimal cleanup of the file (remove the formatting, remove the headers from the file), using a previous source file as a model.
### Coordinates
We use OSM and Nominatim to get the coordinates for each address. If an address is not found automatically, go to the Nominatim website, search for the address that caused the error, manually find something close, then update the `manual_address` column in the Excel file (`./input.xlsx`).
- [https://nominatim.openstreetmap.org/](https://nominatim.openstreetmap.org/)
- [https://www.openstreetmap.org/#map=7/45.997/26.906](https://www.openstreetmap.org/#map=7/45.997/26.906)
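The lookup flow described above (manual override first, then cache, then Nominatim) can be sketched as follows. This is an illustration under assumptions, not the code in `geocode_medical_addresses.py`: the function names, the cache shape (a plain dict keyed by the query string) and the `User-Agent` value are all hypothetical, and the actual script may use a client library instead of calling the search endpoint directly.

```python
import json
import urllib.parse
import urllib.request

NOMINATIM_SEARCH_URL = "https://nominatim.openstreetmap.org/search"


def fetch_from_nominatim(address):
    """Query the public Nominatim search endpoint; return (lat, lon) or None."""
    query = urllib.parse.urlencode({"q": address, "format": "json", "limit": 1})
    request = urllib.request.Request(
        f"{NOMINATIM_SEARCH_URL}?{query}",
        # Placeholder value: the usage policy asks for an identifying User-Agent.
        headers={"User-Agent": "geocoding-sketch/0.1"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        results = json.load(response)
    if not results:
        return None
    # Nominatim returns coordinates as strings.
    return float(results[0]["lat"]), float(results[0]["lon"])


def geocode(address, manual_address=None, cache=None, fetch=fetch_from_nominatim):
    """Resolve coordinates, preferring the manual override and the cache (hypothetical helper)."""
    cache = {} if cache is None else cache
    # A non-empty manual_address replaces the original address in the lookup.
    query = manual_address or address
    if query in cache:
        return cache[query]
    coordinates = fetch(query)
    if coordinates is not None:
        cache[query] = coordinates  # only successful lookups are cached
    return coordinates
```

Passing `fetch` as a parameter keeps the lookup logic testable without network access. When calling the public Nominatim instance in a loop, keep to its usage policy of at most one request per second, for example with `time.sleep(1)` between uncached lookups.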