https://github.com/boechat107/myscrapys
Some scrapers using the Scrapy framework.
- Host: GitHub
- URL: https://github.com/boechat107/myscrapys
- Owner: boechat107
- Created: 2014-02-24T21:32:15.000Z (about 11 years ago)
- Default Branch: master
- Last Pushed: 2014-05-16T13:57:53.000Z (about 11 years ago)
- Last Synced: 2025-01-21T13:25:56.373Z (4 months ago)
- Language: Python
- Size: 164 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# myscrapys
Just my personal repository of scrapers using the
[Scrapy](http://doc.scrapy.org/en/latest/intro/overview.html) framework.
For now, it is just for learning purposes.

## List of implemented scrapers
### GovFiredSpider
Fired public employees of Brazil, from the
[Portal Transparência](http://www.portaldatransparencia.gov.br/expulsoes/entrada).

* There are multiple pages to be scraped (pagination).
* Some targets (employees) have more than one dismissal table (containing
  information like occupation, department, reasons...).
* Not implemented yet:
  + There is no data treatment yet (fields still contain empty spaces).
  + Only the first page is captured for now (this should be solved very soon).

### IptuCuritibaSpider
It is intended to scrape information about properties in
[Curitiba city](http://www2.curitiba.pr.gov.br/gtm/iptu/carnet/default.aspx).

* Not all data is being scraped yet; more fields need to be added.
* It's very hard to generate existing register numbers.
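
The two items listed as not implemented for GovFiredSpider (data treatment and pagination) could be approached roughly as in the sketch below. It is not code from this repository: `clean_field`, `clean_item`, and `iter_pages` are hypothetical helper names, the `fetch` and `find_next_href` hooks are assumptions supplied by the caller, and the sketch uses only the Python standard library so it can run without Scrapy installed.

```python
import re
from urllib.parse import urljoin


def clean_field(value):
    """Collapse runs of whitespace into single spaces and trim both ends."""
    if value is None:
        return ""
    return re.sub(r"\s+", " ", value).strip()


def clean_item(item):
    """Return a copy of a scraped record with every string field normalized."""
    return {
        key: clean_field(val) if isinstance(val, str) else val
        for key, val in item.items()
    }


def iter_pages(start_url, fetch, find_next_href, max_pages=100):
    """Yield (url, body) for each page, following "next" links until none is left.

    fetch(url) returns the page body, and find_next_href(body) returns the
    href of the next page or None. Both are hypothetical hooks provided by
    the caller; they are not Scrapy APIs.
    """
    url = start_url
    seen = set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        body = fetch(url)
        yield url, body
        href = find_next_href(body)
        # Resolve relative links against the URL of the current page.
        url = urljoin(url, href) if href else None
```

Inside a Scrapy spider, the pagination loop would more likely be expressed by yielding a new request for the next-page link from the parse callback, and the cleanup step could run on each item before it is yielded. The `seen` set and `max_pages` cap are only there to keep the loop from revisiting pages or running indefinitely.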