https://github.com/kkiapay/lulz-scraping

Scrape the data you want from a website by describing your output in `parser.yml`.

Host: GitHub
URL: https://github.com/kkiapay/lulz-scraping
Owner: kkiapay
Created: 2019-05-18T17:37:48.000Z
Default Branch: master
Last Pushed: 2023-02-15T21:37:28.000Z
Last Synced: 2023-03-11T10:02:46.043Z
Topics: json, parser, regex, scraping, selector, yaml
Language: Python
Homepage:
Size: 44.9 KB
Stars: 4
Watchers: 2
Forks: 0
Open Issues: 3
Metadata Files:
- Readme: README.md
- Security: SECURITY.md

# lulz-scraping

## Example

Take the following HTML table as an example:

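```html
<table>
  <tr>
    <th>Date</th>
    <th>N° RCCM</th>
    <th>Raison Sociale</th>
    <th>Statut Juridique</th>
  </tr>
  <tr id="contenu">
    <td>08/05/2019</td>
    <td>CI-ABJ-2019-B-10428</td>
    <td><a>AMADEUS ABIDJANAIS</a></td>
    <td>SARL U</td>
  </tr>
</table>
```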

Describe the site and the fields you want to extract in `parser.yml`:

```yml
site:
  - cepici
cepici:
  url: https://cepici.ci/views/annonces_legales/Affichage_ajax/SearchRS.php?countInit=0
  request_type: GET
  parameters:
    - search_rs
  parser:
    name: tr#contenu a
    legal_form: tr#contenu > td:nth-child(4)
    rccm_number: tr#contenu > td:nth-child(2)
    date_of_creation: tr#contenu > td:nth-child(1)
```
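
The README does not say how `url`, `request_type`, and `parameters` are turned into an HTTP request. The sketch below shows one plausible reading, assuming each name listed under `parameters` is sent as a query-string field for `GET` and as a form body otherwise. `CONFIG` is the YAML above written as a Python dict, and `build_request` is a hypothetical helper, not part of lulz-scraping.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# The parser.yml entry above, written as a plain dict so the sketch
# needs no YAML library.
CONFIG = {
    "site": ["cepici"],
    "cepici": {
        "url": "https://cepici.ci/views/annonces_legales/Affichage_ajax/SearchRS.php?countInit=0",
        "request_type": "GET",
        "parameters": ["search_rs"],
    },
}


def build_request(config, site, values):
    """Return (method, url, body) for one configured site.

    Assumption: GET parameters are appended to the query string and
    any other method sends them as a URL-encoded body.
    """
    entry = config[site]
    missing = [p for p in entry["parameters"] if p not in values]
    if missing:
        raise KeyError(f"missing parameter(s): {', '.join(missing)}")

    params = [(p, values[p]) for p in entry["parameters"]]
    method = entry["request_type"].upper()

    if method == "GET":
        parts = urlsplit(entry["url"])
        # Keep the query already present in the URL (countInit=0).
        query = parse_qsl(parts.query) + params
        return method, urlunsplit(parts._replace(query=urlencode(query))), None

    return method, entry["url"], urlencode(params).encode()


if __name__ == "__main__":
    print(build_request(CONFIG, "cepici", {"search_rs": "AMADEUS"}))
```

Under these assumptions, the search term ends up next to the `countInit=0` field that the configured URL already carries.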
With that parser, the response you get is:

```json
[
  {
    "name": "AMADEUS ABIDJANAIS",
    "legal_form": "SARL U",
    "rccm_number": "CI-ABJ-2019-B-10428",
    "date_of_creation": "08/05/2019"
  }
]
```

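
To see how each key under `parser` could map to a field of the output, here is a minimal standard-library sketch. It is not the project's implementation: `select` is a hypothetical stand-in for a real CSS selector engine and only understands the subset used in the example (`tag`, `tag#id`, `:nth-child(n)`, descendant and `>` child combinators). `SAMPLE_HTML` is an assumed, well-formed version of the example table.

```python
import json
import re
import xml.etree.ElementTree as ET

# Assumed well-formed markup for the example row (hypothetical).
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>N° RCCM</th><th>Raison Sociale</th><th>Statut Juridique</th></tr>
  <tr id="contenu">
    <td>08/05/2019</td>
    <td>CI-ABJ-2019-B-10428</td>
    <td><a>AMADEUS ABIDJANAIS</a></td>
    <td>SARL U</td>
  </tr>
</table>
"""

PARSER = {
    "name": "tr#contenu a",
    "legal_form": "tr#contenu > td:nth-child(4)",
    "rccm_number": "tr#contenu > td:nth-child(2)",
    "date_of_creation": "tr#contenu > td:nth-child(1)",
}

# One selector step: tag, optional #id, optional :nth-child(n).
STEP = re.compile(
    r"^(?P<tag>\w+)(?:#(?P<id>[\w-]+))?(?::nth-child\((?P<nth>\d+)\))?$"
)


def matches(el, parent, step):
    m = STEP.match(step)
    if m is None:
        raise ValueError(f"unsupported selector step: {step!r}")
    if el.tag != m["tag"]:
        return False
    if m["id"] and el.get("id") != m["id"]:
        return False
    if m["nth"]:
        # nth-child is 1-based and counts all element siblings.
        if parent is None or list(parent).index(el) + 1 != int(m["nth"]):
            return False
    return True


def select(root, selector):
    """Return the elements matching a selector from the supported subset."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    tokens = selector.split()
    current = [el for el in root.iter() if matches(el, parent_of.get(el), tokens[0])]
    i = 1
    while i < len(tokens):
        direct = tokens[i] == ">"
        if direct:
            i += 1
        step = tokens[i]
        found = []
        for el in current:
            # ">" looks at direct children only; a space looks at all descendants.
            pool = list(el) if direct else [d for d in el.iter() if d is not el]
            found.extend(c for c in pool if matches(c, parent_of.get(c), step))
        current = found
        i += 1
    return current


def scrape(html, parser):
    """Build one record by taking the first match of each selector."""
    root = ET.fromstring(html.strip())
    record = {}
    for field, selector in parser.items():
        hits = select(root, selector)
        record[field] = "".join(hits[0].itertext()).strip() if hits else None
    return [record]


if __name__ == "__main__":
    print(json.dumps(scrape(SAMPLE_HTML, PARSER), indent=2, ensure_ascii=False))
```

Run against the assumed markup, `scrape(SAMPLE_HTML, PARSER)` yields the same single record as the JSON above. Real pages are rarely well-formed XML, so a production scraper would rely on an HTML parser with full CSS selector support instead.
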
## Contributing 🤝
> Feel free to follow this procedure to make the project even more awesome!

1. Create an issue to get the discussion started
2. Fork it!
3. Create your feature branch: `git checkout -b my-new-feature`
4. Commit your changes: `git commit -am 'Add some feature'`
5. Push to the branch: `git push origin my-new-feature`
6. Submit a pull request
