
https://github.com/kkiapay/lulz-scraping

Scrape the data you want from a website by specifying your output in `parser.yml`.



# lulz-scraping

## Example

Take the following HTML:

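```html
<table>
  <tr>
    <th>Date</th>
    <th>N° RCCM</th>
    <th>Raison Sociale</th>
    <th>Statut Juridique</th>
  </tr>
  <tr id="contenu">
    <td>08/05/2019</td>
    <td>CI-ABJ-2019-B-10428</td>
    <td><a>AMADEUS ABIDJANAIS</a></td>
    <td>SARL U</td>
  </tr>
</table>
```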

All you have to do is describe the fields you want in `parser.yml`:

```yml
site:
  - cepici
cepici:
  url: https://cepici.ci/views/annonces_legales/Affichage_ajax/SearchRS.php?countInit=0
  request_type: GET
  parameters:
    - search_rs
  parser:
    name: tr#contenu a
    legal_form: tr#contenu > td:nth-child(4)
    rccm_number: tr#contenu > td:nth-child(2)
    date_of_creation: tr#contenu > td:nth-child(1)
```
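
The README does not say how `url`, `request_type` and `parameters` are combined into a request. Assuming that each name listed under `parameters` is sent as a query-string parameter of the GET request, with its value supplied by the caller, a minimal Python sketch of building the request URL could look like this. `build_get_url` and the `CEPICI` dictionary are hypothetical names for illustration, not part of lulz-scraping.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical in-memory copy of the `cepici` entry from parser.yml.
CEPICI = {
    "url": "https://cepici.ci/views/annonces_legales/Affichage_ajax/SearchRS.php?countInit=0",
    "request_type": "GET",
    "parameters": ["search_rs"],
}


def build_get_url(site, values):
    """Append each declared parameter to the site's existing query string.

    Assumption (not confirmed by the README): every name listed under
    `parameters` becomes a query-string parameter of a GET request.
    """
    if site["request_type"] != "GET":
        raise ValueError("this sketch only handles GET requests")
    missing = [name for name in site["parameters"] if name not in values]
    if missing:
        raise KeyError(f"missing values for parameters: {missing}")
    scheme, netloc, path, query, fragment = urlsplit(site["url"])
    # Keep pairs already present in `url` (here countInit=0), then add ours.
    pairs = parse_qsl(query, keep_blank_values=True)
    pairs += [(name, values[name]) for name in site["parameters"]]
    return urlunsplit((scheme, netloc, path, urlencode(pairs), fragment))
```

For example, `build_get_url(CEPICI, {"search_rs": "AMADEUS"})` returns `https://cepici.ci/views/annonces_legales/Affichage_ajax/SearchRS.php?countInit=0&search_rs=AMADEUS`. Merging into the parsed query, rather than concatenating strings, keeps the `countInit=0` pair already present in `url` and URL-encodes values such as names containing spaces.
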
With that `parser`, you get the following output:

```json
[
  {
    "name": "AMADEUS ABIDJANAIS",
    "legal_form": "SARL U",
    "rccm_number": "CI-ABJ-2019-B-10428",
    "date_of_creation": "08/05/2019"
  }
]
```
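
The README does not show how lulz-scraping applies these selectors internally. As a rough illustration only, the Python sketch below uses the standard library's `html.parser` to reproduce the output above from sample markup. It is not the project's implementation: instead of a CSS selector engine, the hypothetical `FIELDS` mapping encodes each selector as either `("td", n)`, standing for `tr#contenu > td:nth-child(n)`, or `("a", None)`, standing for `tr#contenu a`. The `SAMPLE` markup is likewise an assumed stand-in for the real page.

```python
import json
from html.parser import HTMLParser


class ContenuRowParser(HTMLParser):
    """Collects <td> and <a> text from every <tr id="contenu"> row."""

    def __init__(self):
        super().__init__()
        self.rows = []        # one {"td": [...], "a": [...]} dict per matching row
        self._in_row = False
        self._td_buf = None   # text chunks of the open <td>, else None
        self._a_buf = None    # text chunks of the open <a>, else None

    def handle_starttag(self, tag, attrs):
        if tag == "tr" and dict(attrs).get("id") == "contenu":
            self.rows.append({"td": [], "a": []})
            self._in_row = True
        elif self._in_row and tag == "td":
            self._td_buf = []
        elif self._in_row and tag == "a":
            self._a_buf = []

    def handle_endtag(self, tag):
        if tag == "tr":
            self._in_row = False
        elif tag == "td" and self._td_buf is not None:
            self.rows[-1]["td"].append("".join(self._td_buf).strip())
            self._td_buf = None
        elif tag == "a" and self._a_buf is not None:
            self.rows[-1]["a"].append("".join(self._a_buf).strip())
            self._a_buf = None

    def handle_data(self, data):
        # Text inside an <a> nested in a <td> counts toward both.
        if self._td_buf is not None:
            self._td_buf.append(data)
        if self._a_buf is not None:
            self._a_buf.append(data)


# Hypothetical stand-in for the `parser` section of parser.yml.
FIELDS = {
    "name": ("a", None),          # tr#contenu a
    "legal_form": ("td", 4),      # tr#contenu > td:nth-child(4)
    "rccm_number": ("td", 2),     # tr#contenu > td:nth-child(2)
    "date_of_creation": ("td", 1),  # tr#contenu > td:nth-child(1)
}


def extract(html, fields=FIELDS):
    parser = ContenuRowParser()
    parser.feed(html)
    parser.close()
    results = []
    for row in parser.rows:
        record = {}
        for field, (kind, position) in fields.items():
            values = row[kind]
            # nth-child is 1-based; this sketch counts only <td> cells,
            # which matches nth-child when a row holds nothing but <td>s.
            index = 0 if position is None else position - 1
            record[field] = values[index] if index < len(values) else None
        results.append(record)
    return results


SAMPLE = """
<table>
  <tr><th>Date</th><th>N° RCCM</th><th>Raison Sociale</th><th>Statut Juridique</th></tr>
  <tr id="contenu">
    <td>08/05/2019</td>
    <td>CI-ABJ-2019-B-10428</td>
    <td><a>AMADEUS ABIDJANAIS</a></td>
    <td>SARL U</td>
  </tr>
</table>
"""

if __name__ == "__main__":
    print(json.dumps(extract(SAMPLE), indent=2, ensure_ascii=False))
```

Running the script prints the same list of records as the JSON output above, with one object per matching `tr#contenu` row, which is why the result is a list even for a single match.
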
## Contributing 🤝
> Feel free to follow the procedure to make it even more awesome!

1. Open an issue so we can get the discussion started
2. Fork it!
3. Create your feature branch: `git checkout -b my-new-feature`
4. Commit your changes: `git commit -am 'Add some feature'`
5. Push to the branch: `git push origin my-new-feature`
6. Submit a pull request