https://github.com/boechat107/myscrapys

Some scrapers using Scrapy framework.
Host: GitHub
URL: https://github.com/boechat107/myscrapys
Owner: boechat107
Created: 2014-02-24T21:32:15.000Z (about 11 years ago)
Default Branch: master
Last Pushed: 2014-05-16T13:57:53.000Z (about 11 years ago)
Last Synced: 2025-01-21T13:25:56.373Z (4 months ago)
Language: Python
Size: 164 KB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# myscrapys

My personal repository of scrapers built with the
[Scrapy](http://doc.scrapy.org/en/latest/intro/overview.html) framework.
For now, it exists only for learning purposes.

## List of implemented scrapers

### GovFiredSpider

Scrapes records of dismissed Brazilian public employees from the
[Portal Transparência](http://www.portaldatransparencia.gov.br/expulsoes/entrada).

* There are multiple pages to be scraped (pagination).
* Some targets (employees) have more than one dismissal table (containing
  information such as occupation, department, and reasons).
* Not implemented yet:
  + No data cleaning is done yet (fields still contain stray whitespace).
  + Only the first page is captured for now (this should be fixed soon).
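
The two open items above (data cleaning and pagination) could be approached roughly as in the sketch below. This is not the code from this repository and it does not use Scrapy; it is a plain-Python illustration. The names `clean_field`, `clean_row`, `iter_rows`, and the `fetch` callable (which is assumed to return the rows of a page plus the URL of the next page, or `None` on the last page) are all hypothetical.

```python
import re
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# Hypothetical shape of one fetched page: (rows, next_page_url_or_None).
Page = Tuple[List[Dict[str, str]], Optional[str]]


def clean_field(value: str) -> str:
    """Collapse every run of whitespace into one space and trim the ends."""
    return re.sub(r"\s+", " ", value).strip()


def clean_row(row: Dict[str, str]) -> Dict[str, str]:
    """Apply clean_field to every value of a scraped row."""
    return {key: clean_field(value) for key, value in row.items()}


def iter_rows(start_url: str, fetch: Callable[[str], Page]) -> Iterator[Dict[str, str]]:
    """Yield cleaned rows from each page, following next-page URLs.

    Stops when fetch reports no next page, or when a URL repeats
    (a guard against pagination loops).
    """
    url: Optional[str] = start_url
    seen = set()
    while url is not None and url not in seen:
        seen.add(url)
        rows, url = fetch(url)
        for row in rows:
            yield clean_row(row)
```

In a real spider, `fetch` would be replaced by whatever downloads and parses a page; here it is injected so the traversal and cleaning logic can be tried with in-memory data.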

### IptuCuritibaSpider

Intended to scrape information about properties in
[Curitiba city](http://www2.curitiba.pr.gov.br/gtm/iptu/carnet/default.aspx).

* Not all data is being scraped yet; more fields need to be added.
* It is very hard to generate existing registration numbers.
