https://github.com/gitchaell/computer-scrapping
Tool that extracts data from the pages of companies that sell computers in the city of Trujillo - Peru, exports them in an XLSX file according to a relational data model, and displays them on a Power BI dashboard.
data-analysis data-structures data-visualization database dbdiagram export-excel powerbi scrapper-script scrapping xlsx
- Host: GitHub
- URL: https://github.com/gitchaell/computer-scrapping
- Owner: gitchaell
- License: mit
- Created: 2021-11-18T16:44:37.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2021-11-26T02:53:22.000Z (over 4 years ago)
- Last Synced: 2025-06-13T11:46:07.319Z (about 1 year ago)
- Topics: data-analysis, data-structures, data-visualization, database, dbdiagram, export-excel, powerbi, scrapper-script, scrapping, xlsx
- Language: TypeScript
- Homepage:
- Size: 4.55 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Computer Scrapping
A tool that extracts data from the websites of companies that sell computers in Trujillo, Peru, exports it to an XLSX file structured according to a relational data model, and displays it in a Power BI dashboard.
## Objectives
* Apply web scraping techniques to the selected websites to obtain the data (at least 3 websites and 2 web scraping techniques).
* Build a Power BI dashboard that shows a dynamic analysis of the data.
## Steps
### 1. Search for companies that sell computers in Trujillo
* [Falabella](https://www.falabella.com.pe/)
* [La Curacao](https://www.lacuracao.pe/)
* [Oechsle](https://www.oechsle.pe/)
* [Efe](https://www.efe.com.pe/)
* [Hiraoka](https://hiraoka.com.pe/)
* [Coolbox](https://www.coolbox.pe/)
### 2. Design of the data model
* [DB Diagram](https://dbdiagram.io/) - Data Modeling Tool

Download the data model as a PDF file [here](https://raw.githubusercontent.com/MichaellAlavedraMunayco/computer-scrapping/main/.github/docs/computers.database.pdf).
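
To illustrate what exporting "according to a relational data model" can look like, here is a minimal sketch that splits flat scraped records into tables linked by numeric ids. The table and field names (`StoreRow`, `BrandRow`, `ComputerRow`, `storeId`, `brandId`) are hypothetical and chosen only for this example; the actual model is the one in the PDF above.

```typescript
// Hypothetical shape of one scraped listing before normalization.
interface ScrapedComputer {
  store: string;
  brand: string;
  name: string;
  price: number;
}

// Hypothetical relational tables (the real ones are defined in the data model PDF).
interface StoreRow { id: number; name: string; }
interface BrandRow { id: number; name: string; }
interface ComputerRow {
  id: number;
  storeId: number; // foreign key to StoreRow.id
  brandId: number; // foreign key to BrandRow.id
  name: string;
  price: number;
}

interface RelationalData {
  stores: StoreRow[];
  brands: BrandRow[];
  computers: ComputerRow[];
}

// Returns the id for `name`, appending a new row the first time it is seen.
// Names are compared case-insensitively after trimming.
function idFor(
  rows: { id: number; name: string }[],
  index: Map<string, number>,
  name: string
): number {
  const key = name.trim().toLowerCase();
  const existing = index.get(key);
  if (existing !== undefined) return existing;
  const id = rows.length + 1;
  rows.push({ id, name: name.trim() });
  index.set(key, id);
  return id;
}

// Splits flat scraped records into three tables linked by ids.
function normalize(items: ScrapedComputer[]): RelationalData {
  const data: RelationalData = { stores: [], brands: [], computers: [] };
  const storeIndex = new Map<string, number>();
  const brandIndex = new Map<string, number>();
  for (const item of items) {
    data.computers.push({
      id: data.computers.length + 1,
      storeId: idFor(data.stores, storeIndex, item.store),
      brandId: idFor(data.brands, brandIndex, item.brand),
      name: item.name.trim(),
      price: item.price,
    });
  }
  return data;
}
```

Under this sketch, each array in the result would map to one worksheet of the XLSX file, and the id columns are what Power BI could use to relate the sheets.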
### 3. Search for tools for data extraction
* Node.js - JavaScript runtime
* [Puppeteer](https://www.npmjs.com/package/puppeteer) - Browser automation library for loading and interacting with web pages
* [Cheerio](https://www.npmjs.com/package/cheerio) - HTML parsing and querying library with a jQuery-like API
* [ExcelJS](https://www.npmjs.com/package/exceljs) - Library for exporting the extracted data to an XLSX file
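
Text pulled from a page with these tools usually needs cleaning before it is written to a spreadsheet. The sketch below is not taken from the project; it shows one possible way to normalize scraped text and price labels. The helper names `cleanText` and `parsePrice` are hypothetical, and the price format (a `S/` prefix, `,` as thousands separator, `.` as decimal point) is an assumption that should be checked against each store's pages.

```typescript
// Collapses runs of whitespace (including line breaks) into single spaces.
function cleanText(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

// Converts a scraped price label such as "S/ 2,499.00" into a number.
// Assumes "," is a thousands separator and "." is the decimal point.
// Returns null when the label contains no digits (e.g. an out-of-stock notice).
function parsePrice(label: string): number | null {
  // Take the first run that starts with a digit, so the dot in a "S/." prefix is ignored.
  const text = /\d[\d,]*(?:\.\d+)?/.exec(label)?.[0];
  if (text === undefined) return null;
  return Number(text.replace(/,/g, ""));
}
```

Keeping prices as numbers rather than strings lets the spreadsheet and the dashboard aggregate them directly.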