https://github.com/kkiapay/lulz-scraping
Scraping the data you want from a website by specifying your output in parser.yml
- Host: GitHub
- URL: https://github.com/kkiapay/lulz-scraping
- Owner: kkiapay
- Created: 2019-05-18T17:37:48.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2023-02-15T21:37:28.000Z (over 3 years ago)
- Last Synced: 2023-03-11T10:02:46.043Z (over 3 years ago)
- Topics: json, parser, regex, scraping, selector, yaml
- Language: Python
- Homepage:
- Size: 44.9 KB
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
- Security: SECURITY.md
README
# lulz-scraping
## Example
Take the following HTML as an example:
```html
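<table>
  <tr>
    <th>Date</th>
    <th>N° RCCM</th>
    <th>Raison Sociale</th>
    <th>Statut Juridique</th>
  </tr>
  <tr id="contenu">
    <td>08/05/2019</td>
    <td>CI-ABJ-2019-B-10428</td>
    <td><a>AMADEUS ABIDJANAIS</a></td>
    <td>SARL U</td>
  </tr>
</table>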
```
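Describe the output you want in `parser.yml`. Each key under `parser` names an output field, and its value is the CSS selector that locates that field in the page: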
```yml
site:
  - cepici
cepici:
  url: https://cepici.ci/views/annonces_legales/Affichage_ajax/SearchRS.php?countInit=0
  request_type: GET
  parameters:
    - search_rs
  parser:
    name: tr#contenu a
    legal_form: tr#contenu > td:nth-child(4)
    rccm_number: tr#contenu > td:nth-child(2)
    date_of_creation: tr#contenu > td:nth-child(1)
```
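To make the selector-to-field mapping concrete, here is a minimal sketch that resolves each selector from the `parser` block against a sample table. It is not how lulz-scraping is implemented: it uses only the standard library's `xml.etree.ElementTree` rather than a real CSS selector engine, the table markup is an assumption inferred from the selectors, and `css_to_path` and `extract` are hypothetical helpers that handle only the selector forms used above.

```python
import re
import xml.etree.ElementTree as ET

# Assumed markup: a well-formed table whose data row has id="contenu".
HTML = """
<table>
  <tr><th>Date</th><th>N° RCCM</th><th>Raison Sociale</th><th>Statut Juridique</th></tr>
  <tr id="contenu">
    <td>08/05/2019</td>
    <td>CI-ABJ-2019-B-10428</td>
    <td><a>AMADEUS ABIDJANAIS</a></td>
    <td>SARL U</td>
  </tr>
</table>
"""

# The `parser` block from parser.yml, written as a plain dict.
PARSER = {
    "name": "tr#contenu a",
    "legal_form": "tr#contenu > td:nth-child(4)",
    "rccm_number": "tr#contenu > td:nth-child(2)",
    "date_of_creation": "tr#contenu > td:nth-child(1)",
}

# Matches `tag`, `tag#id`, and `tag:nth-child(n)` (hypothetical subset).
TOKEN = re.compile(r"(\w+)(?:#([\w-]+))?(?::nth-child\((\d+)\))?")


def css_to_path(selector):
    """Translate a small CSS subset into an ElementTree path expression."""
    path = "."
    separator = "//"  # whitespace in CSS means "any descendant"
    for token in selector.split():
        if token == ">":
            separator = "/"  # `>` means "direct child"
            continue
        match = TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"unsupported selector token: {token!r}")
        tag, element_id, nth = match.groups()
        step = tag
        if element_id:
            step += f"[@id='{element_id}']"
        if nth:
            # Equivalent to :nth-child only when all siblings share the tag,
            # which holds for the <td> cells of a table row.
            step += f"[{nth}]"
        path += separator + step
        separator = "//"
    return path


def extract(root, parser):
    """Return one record mapping each field to the text of its first match."""
    record = {}
    for field, selector in parser.items():
        element = root.find(css_to_path(selector))
        if element is None:
            raise LookupError(f"no match for {field!r}: {selector}")
        record[field] = "".join(element.itertext()).strip()
    return record


root = ET.fromstring(HTML.strip())
records = [extract(root, PARSER)]
print(records)
```

Real pages are rarely well-formed XML, so a production scraper would normally rely on an HTML parser with full CSS selector support instead of this translation step.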
With that `parser`, you get the following output:
```json
[
  {
    "name": "AMADEUS ABIDJANAIS",
    "legal_form": "SARL U",
    "rccm_number": "CI-ABJ-2019-B-10428",
    "date_of_creation": "08/05/2019"
  }
]
```
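Once you have that JSON, consuming it is straightforward. The sketch below loads the output into typed records using only the standard library. The `Company` class and `load_companies` function are hypothetical names, and the day/month/year date format is an assumption based on the sample value.

```python
import json
from dataclasses import dataclass
from datetime import date, datetime
from typing import List

# The sample output shown above, embedded so the example is self-contained.
RAW_OUTPUT = """
[
  {
    "name": "AMADEUS ABIDJANAIS",
    "legal_form": "SARL U",
    "rccm_number": "CI-ABJ-2019-B-10428",
    "date_of_creation": "08/05/2019"
  }
]
"""


@dataclass(frozen=True)
class Company:
    name: str
    legal_form: str
    rccm_number: str
    date_of_creation: date


def load_companies(raw: str) -> List[Company]:
    """Parse the scraper's JSON output into Company records."""
    return [
        Company(
            name=item["name"],
            legal_form=item["legal_form"],
            rccm_number=item["rccm_number"],
            # Assumption: dates are day/month/year.
            date_of_creation=datetime.strptime(
                item["date_of_creation"], "%d/%m/%Y"
            ).date(),
        )
        for item in json.loads(raw)
    ]


companies = load_companies(RAW_OUTPUT)
for company in companies:
    print(company.name, company.date_of_creation.isoformat())
```

Parsing the date at load time surfaces malformed values early, since `strptime` raises `ValueError` on anything that does not match the assumed format.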
## Contributing 🤝
> Contributions are welcome! Follow these steps to make it even more awesome:
1. Create an issue so we can get the discussion started
2. Fork it!
3. Create your feature branch: `git checkout -b my-new-feature`
4. Commit your changes: `git commit -am 'Add some feature'`
5. Push to the branch: `git push origin my-new-feature`
6. Submit a pull request