https://github.com/honzajavorek/czap
Scraping czap.cz data so you can filter available psychotherapists by any criteria you wish
- Host: GitHub
- URL: https://github.com/honzajavorek/czap
- Owner: honzajavorek
- License: unlicense
- Created: 2024-02-25T13:53:12.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2024-04-30T04:51:36.000Z (7 months ago)
- Last Synced: 2024-05-02T01:14:55.698Z (7 months ago)
- Topics: czech, czech-republic, czechia, git-scraping, psychoterapists, psychotherapy, registry, scraper, scrapy
- Language: Python
- Homepage:
- Size: 1.26 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# 💆 czap.cz members
Scraping [czap.cz members](https://czap.cz/adresar) so you can filter available psychotherapists by any criteria you wish:
- [Download JSON](https://raw.githubusercontent.com/honzajavorek/czap/main/items.json)
I wanted to filter a list of Czech psychotherapists by criteria other than those available on the [registry website](https://czap.cz/adresar). For example, the registry lets you filter by location, but only down to the level of a region. With 700+ therapists in Prague alone, that isn't very useful.
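Once you have the JSON, filtering by your own criteria takes only a few lines of Python. The sketch below assumes `items.json` is a single JSON array of objects (the usual shape of a Scrapy JSON feed export) and filters on an `address` field. That field name and the helpers `load_members()`, `filter_members()` and `in_city()` are hypothetical, so check the actual keys in the file before relying on them.

```python
import json
import urllib.request
from typing import Any, Callable, Iterable

ITEMS_URL = "https://raw.githubusercontent.com/honzajavorek/czap/main/items.json"


def load_members(url: str = ITEMS_URL) -> list[dict[str, Any]]:
    """Download the exported members and decode them (assumes a JSON array)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def filter_members(
    members: Iterable[dict[str, Any]],
    predicate: Callable[[dict[str, Any]], bool],
) -> list[dict[str, Any]]:
    """Keep only the members for which predicate(member) is true."""
    return [member for member in members if predicate(member)]


def in_city(city: str) -> Callable[[dict[str, Any]], bool]:
    """Build a case-insensitive predicate on a hypothetical "address" field."""
    needle = city.casefold()
    return lambda member: needle in str(member.get("address", "")).casefold()
```

For example, `filter_members(load_members(), in_city("Praha 7"))` would keep only members whose (assumed) address mentions Prague 7. Keeping the predicate separate from the filtering lets you combine any criteria in a single `lambda`.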
## Monitoring changes
I don't think it's particularly useful to monitor changes in the registry, but I used [git scraping](https://simonwillison.net/2020/Oct/9/git-scraping/) nevertheless, because why not:
- [History of changes](https://github.com/honzajavorek/czap/commits/main/items.json)
- [Feed of changes](https://github.com/honzajavorek/czap/commits/main.atom) (aka RSS)

## Notes on development
The scraper uses my favorite [Scrapy](https://docs.scrapy.org/) framework.
So far I scrape only a few fields.
If you want to build on top of the data and you're missing something, let me know in [issues](https://github.com/honzajavorek/czap/issues).
However, because I won't have time to add the fields, you'd better edit the code and add them yourself.

The scraper first downloads the whole registry with a single request.
The data is encoded not as JSON, but as a non-standard JavaScript mess.
I figured out that the `demjson3` library can parse it, but it takes a long time (around 30 minutes) to get the result.
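Since decoding is that slow, it pays to run it at most once a day. Here is a minimal sketch of such a cache, not the project's actual implementation: it pickles the decoded result to a file and reuses it while the file is younger than `max_age` seconds. The function `cached_decode()` and the cache path are hypothetical; `decode` would be something like `demjson3.decode`.

```python
import pickle
import time
from pathlib import Path
from typing import Any, Callable

ONE_DAY = 24 * 60 * 60  # cache lifetime in seconds


def cached_decode(
    raw: str,
    decode: Callable[[str], Any],
    cache_file: Path,
    max_age: float = ONE_DAY,
) -> Any:
    """Return decode(raw), reusing a cached result younger than max_age seconds.

    The cache is keyed by age only: while it is fresh, the stored result is
    returned even if `raw` has changed.
    """
    if cache_file.exists() and time.time() - cache_file.stat().st_mtime < max_age:
        return pickle.loads(cache_file.read_bytes())
    result = decode(raw)  # the slow part, e.g. demjson3.decode(raw)
    cache_file.write_bytes(pickle.dumps(result))
    return result
```

Usage would look like `cached_decode(raw_js, demjson3.decode, Path("registry.pickle"))`. Pickle keeps whatever Python objects the decoder returns; only load cache files you wrote yourself.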
I added a cache so that the parse result stays around for at least a day.

That data contains some info about members.
It is structured, but the structure is very cryptic and needs to be reverse-engineered.
If you're into that kind of thing, look at the end of the `parse()` method, where it iterates over individual members, and feel free to add fields there.

If you prefer good old HTML scraping, look at the `parse_member()` method, where you can access the responses of individual member profile pages.
There you can use [Scrapy selectors](https://docs.scrapy.org/en/latest/topics/selectors.html) to add fields to the data.