Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/l-portet/yellow-scraper
Data scraper of french yellow pages (Pages Jaunes)
- Host: GitHub
- URL: https://github.com/l-portet/yellow-scraper
- Owner: l-portet
- Created: 2019-05-26T09:54:18.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2021-06-05T02:32:00.000Z (over 3 years ago)
- Last Synced: 2023-03-03T04:04:12.830Z (almost 2 years ago)
- Topics: extract, node, pages-jaunes, parsing, puppeteer, scraper, yellow-pages, yellow-scraper
- Language: JavaScript
- Homepage:
- Size: 26.4 KB
- Stars: 2
- Watchers: 0
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
README
# yellow-scraper
Scrape the French yellow pages (Pages Jaunes) with Puppeteer.

> :warning: **MAY BE DEPRECATED**: Since the Pages Jaunes pages and data structure may change, this scraper won't be automatically updated.
## Installation
```bash
npm install
```

## Usage
Set up the `config.js` file.

#### Sample config
```javascript
module.exports = {
  query: {
    keyword: 'luthier',
    location: 'Rennes'
  }, // Will search all 'luthier' businesses in 'Rennes'
  headless: true, // Use Chrome in headless mode
  userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36',
  acceptLanguage: 'fr-FR,fr;q=0.9,en-US;q=0.8,en;q=0.7,la;q=0.6',
  outputFilename: 'output',
  outputFormat: 'csv', // Supported formats: 'json', 'csv'
  maxResults: -1, // -1 => all results, or N for a maximum (the scraper stops once the limit is reached)
  puppeteerArgs: [], // Additional args for Puppeteer (a proxy, for example)
  baseURL: 'https://www.pagesjaunes.fr', // Only target this domain if you have the proper rights
  safeMode: true // Safe mode adds a delay between each query
};
```
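For context, here is a rough sketch of how options like these typically map onto Puppeteer calls (`puppeteer.launch`, `page.setUserAgent`, `page.setExtraHTTPHeaders`). It is illustrative only and not this project's actual source; only the `config.js` fields above come from the repo.

```javascript
// Illustrative sketch (not the project's source): wiring the config fields
// above into a Puppeteer session.
const puppeteer = require('puppeteer');
const config = require('./config');

(async () => {
  // headless and puppeteerArgs feed directly into the browser launch options.
  const browser = await puppeteer.launch({
    headless: config.headless,
    args: config.puppeteerArgs
  });
  const page = await browser.newPage();

  // Present the scraper as a regular French-locale browser session.
  await page.setUserAgent(config.userAgent);
  await page.setExtraHTTPHeaders({ 'Accept-Language': config.acceptLanguage });

  await page.goto(config.baseURL, { waitUntil: 'networkidle2' });
  // ...searching for config.query and extracting results would happen here...

  await browser.close();
})();
```

To route traffic through a proxy, a Chromium flag such as `--proxy-server=host:port` can be passed via `puppeteerArgs`.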
#### Run the scraper
```bash
npm start
```
## Todo
Export as Excel format (xls)
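One possible way to tackle this todo, assuming the `xlsx` (SheetJS) package and a hypothetical `rows` array of scraped results:

```javascript
// Hypothetical sketch for the Excel export todo, using the xlsx (SheetJS) package.
// `rows` and its fields stand in for whatever the scraper actually collected.
const XLSX = require('xlsx');

const rows = [
  { name: 'Example business', phone: '00 00 00 00 00', city: 'Rennes' }
];

const worksheet = XLSX.utils.json_to_sheet(rows);
const workbook = XLSX.utils.book_new();
XLSX.utils.book_append_sheet(workbook, worksheet, 'results');
XLSX.writeFile(workbook, 'output.xlsx');
```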
## Issues
If you find an issue, feel free to contact me or open an issue on GitHub. You can also contribute by creating a pull request.

## Disclaimer
I can't be held responsible for any abusive usage of, or problems caused by, this software. Make sure you have the proper rights before you run it.