https://github.com/jihefel/internubel-website-scraping
This repository contains a Puppeteer-based script for scraping product details from Internubel's website.
- Host: GitHub
- URL: https://github.com/jihefel/internubel-website-scraping
- Owner: Jihefel
- License: mit
- Created: 2024-06-26T15:25:40.000Z (5 months ago)
- Default Branch: master
- Last Pushed: 2024-06-28T17:11:46.000Z (5 months ago)
- Last Synced: 2024-10-10T20:01:35.025Z (about 1 month ago)
- Topics: nodejs, puppeteer, scraper, scraping, scraping-websites
- Language: JavaScript
- Size: 70.3 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
README
# Internubel product data scraping script

This repository contains a Puppeteer-based script for scraping product data from Internubel's website (https://www.internubel.be).
The script logs into the site, navigates through product groups, and extracts product details including title, image, nutrition score, and nutritional information.
Data is structured and saved into JSON files categorized by product groups, sub-groups, and sub-sub-groups.
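
As an illustration of the file layout described above, here is a minimal sketch of how one scraped product list could be written to a JSON file nested by group, sub-group, and sub-sub-group. The function names (`slugify`, `saveProducts`), the directory layout, and the product fields are assumptions made for this example; the actual script may organise its output differently.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: turn a group name into a safe file or directory name.
function slugify(name) {
  return name
    .normalize('NFD')                 // split accented letters into base + mark
    .replace(/[\u0300-\u036f]/g, '')  // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')      // collapse everything else into hyphens
    .replace(/^-+|-+$/g, '');         // trim leading and trailing hyphens
}

// Hypothetical helper: write `products` to
// <baseDir>/<group>/<sub-group>/<sub-sub-group>.json (assumed layout).
function saveProducts(baseDir, groups, products) {
  const parents = groups.slice(0, -1);
  const leaf = groups[groups.length - 1];
  const dir = path.join(baseDir, ...parents.map(slugify));
  fs.mkdirSync(dir, { recursive: true }); // create missing directories
  const file = path.join(dir, `${slugify(leaf)}.json`);
  fs.writeFileSync(file, JSON.stringify(products, null, 2), 'utf8');
  return file;
}
```

For example, `saveProducts('data', ['Produits laitiers', 'Fromages', 'Fromages à pâte dure'], products)` would write to `data/produits-laitiers/fromages/fromages-a-pate-dure.json` under these assumptions.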
## Prerequisites

Requirements for the script:
- Node.js
- npm
- An account on the Internubel website; create one if you do not already have one (https://www.internubel.be/Register.aspx?lId=3).

## Installation
1. Clone the repository
```sh
git clone https://github.com/Jihefel/Internubel-website-scraping.git
```
2. Install NPM packages
```sh
npm install
```
3. Change the **14th line** of **internubel.js** to select the language you want to use.
   * Replace `"francais"` with one of the languages listed on the 11th line of the same file, if needed: `"nederlands"`, `"english"`, `"deutsch"`.
### Config .env variables

Create a `.env` configuration file in the root directory with the following content to store your credentials.
```yaml
LOGIN_EMAIL=your_email
LOGIN_PASSWORD=your_password
```

Replace `your_email` and `your_password` with your Internubel login credentials.
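
A small check at startup can catch a missing or empty `.env` entry before the browser is launched. The sketch below assumes the variables have already been loaded into `process.env` (for instance by calling `require('dotenv').config()` first); `readCredentials` is a hypothetical helper, not a function confirmed to exist in the repository.

```javascript
// Hypothetical helper: read and validate the two variables from `.env`.
// Assumes `require('dotenv').config()` has already populated process.env.
function readCredentials(env = process.env) {
  const email = (env.LOGIN_EMAIL || '').trim();
  const password = env.LOGIN_PASSWORD || '';

  // Collect every missing variable so the error names all of them at once.
  const missing = [];
  if (!email) missing.push('LOGIN_EMAIL');
  if (!password) missing.push('LOGIN_PASSWORD');
  if (missing.length > 0) {
    throw new Error(`Missing in .env: ${missing.join(', ')}`);
  }

  return { email, password };
}
```

Failing fast here gives a clearer message than a login form that silently rejects empty fields.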
## Usage

Run the scraping script using Node.js in your terminal:
```bash
node internubel.js
```

Then wait a moment while it runs.
The script will launch a Puppeteer-controlled browser, log into Internubel using provided credentials, and scrape product data into structured JSON files stored in the data directory.
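
The login step described above might look roughly like the following sketch. It takes an already opened Puppeteer `Page` (for example from `browser.newPage()`) and uses only standard `Page` methods. The selectors, the `login` function name, and the `loginUrl` parameter are assumptions for illustration; the real form on Internubel's site must be inspected to find the correct values.

```javascript
// Hypothetical selectors: replace them after inspecting the real login form.
const SELECTORS = {
  email: '#email',
  password: '#password',
  submit: 'button[type="submit"]',
};

// Hypothetical helper: fill in the login form and wait for the next page.
// `page` is a Puppeteer Page; `loginUrl` is supplied by the caller.
async function login(page, loginUrl, { email, password }, selectors = SELECTORS) {
  await page.goto(loginUrl, { waitUntil: 'networkidle2' });
  await page.type(selectors.email, email);
  await page.type(selectors.password, password);

  // Start waiting before clicking so a fast navigation is not missed.
  await Promise.all([
    page.waitForNavigation({ waitUntil: 'networkidle2' }),
    page.click(selectors.submit),
  ]);
}
```

Passing the page in, rather than launching the browser inside the function, keeps the step easy to exercise with a stub object in tests.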
## Dependencies
- [Puppeteer](https://www.npmjs.com/package/puppeteer)
- [dotenv](https://www.npmjs.com/package/dotenv)
- [Jest](https://www.npmjs.com/package/jest)

## License
Distributed under the MIT License. See [MIT License](https://opensource.org/licenses/MIT) for more information.