https://github.com/sandrewtx08/gearbest_scraper
Finds catalog ads on the Gearbest web page, scrapes their information, and stores it in a relational database through a sequence of SQL commands.
- Host: GitHub
- URL: https://github.com/sandrewtx08/gearbest_scraper
- Owner: sandrewTx08
- License: mit
- Created: 2021-10-30T20:53:57.000Z (about 3 years ago)
- Default Branch: master
- Last Pushed: 2021-11-25T13:03:10.000Z (about 3 years ago)
- Last Synced: 2024-11-11T15:32:32.305Z (about 2 months ago)
- Topics: crawler, gearbest, lxml, python, scraper, scraping, sqlite3
- Language: Python
- Homepage:
- Size: 69.3 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Gearbest_Scraper
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
![GitHub tag (latest by date)](https://img.shields.io/github/v/tag/sandrewTx08/Gearbest_Scraper)
# Overview
Gearbest_Scraper is simple, intuitive, and manageable. It finds catalog ads on the Gearbest web page, scrapes their information, and stores it in a relational database through a sequence of SQL commands.
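As a rough illustration of storing scraped items through SQL commands, here is a minimal sketch using Python's built-in `sqlite3` module. The `catalog` table, its columns, the `store_catalog` helper, and the sample rows are all hypothetical; the project's real schema is not documented in this README.

```python
import sqlite3


def store_catalog(connection, rows):
    """Create a hypothetical catalog table and insert scraped rows into it."""
    # Assumed schema: the real project may use different tables and columns.
    connection.execute(
        "CREATE TABLE IF NOT EXISTS catalog ("
        "id INTEGER PRIMARY KEY, "
        "title TEXT NOT NULL, "
        "price REAL, "
        "url TEXT UNIQUE)"
    )
    # Parameterized statements keep scraped text from being run as SQL.
    # INSERT OR IGNORE skips rows whose url is already stored.
    connection.executemany(
        "INSERT OR IGNORE INTO catalog (title, price, url) VALUES (?, ?, ?)",
        rows,
    )
    connection.commit()
    return connection.execute("SELECT COUNT(*) FROM catalog").fetchone()[0]


# Made-up sample data; the second row repeats the first URL on purpose.
sample_rows = [
    ("Example phone", 199.99, "https://example.com/item/1"),
    ("Example phone", 199.99, "https://example.com/item/1"),
    ("Example tablet", 149.50, "https://example.com/item/2"),
]

connection = sqlite3.connect(":memory:")
stored = store_catalog(connection, sample_rows)
```

The unique `url` column is one possible way to avoid storing the same ad twice when a scrape is repeated.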
# Support
| CSV | MySQL | SQLite | PostgreSQL | Microsoft SQL Server | Oracle Database |
|---|---|---|---|---|---|
| ✅ Available | ✅ Available | ✅ Available | ⚠️ Soon | ⚠️ Maybe soon | ⚠️ Maybe soon |

# Installing
1. Download:
```bash
> git clone https://github.com/sandrewTx08/Gearbest_Scraper
```
2. Move to the directory:
```bash
> cd Gearbest_Scraper
```
3. Install the dependencies:
```bash
Gearbest_Scraper> install.bat
```
__or__
```bash
Gearbest_Scraper:~$ ./install.bash
```
__or__
```bash
Gearbest_Scraper> pip install -r requirements.txt
```
# How to use
1. Define your search keywords in the __configuration.json__ file:
```json
{"search":{"list":["keyword_foo_1","keyword_foo_2","keyword_foo_3"]}}
```
2. Execute the program:
```bash
> cd Gearbest_Scraper
Gearbest_Scraper> start.bat
```
__or__
```bash
Gearbest_Scraper> python main.py
```
# Methods
A method determines how Gearbest_Scraper collects catalog ads.
Instead of passing a command-line argument, you can run a simple start script and choose the method there.
Windows:
```bash
Gearbest_Scraper> start.bat
```
Linux:
```bash
Gearbest_Scraper:~$ ./start.bash
```
Example of selecting the search method:
```
Method: s
```
__Search__ is selected by default.
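On the command line, the mode choice could be handled with `argparse`. This is only a sketch: the `--mode` flag and its three values come from the command lines in this README, while `build_parser` and `parse_mode` are hypothetical names and not the project's actual code.

```python
import argparse

# Mode names taken from the command lines shown in this README.
MODES = ("link", "search", "popular")


def build_parser():
    """Build a parser with a --mode option that falls back to 'search'."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument(
        "--mode",
        choices=MODES,
        default="search",  # the README states that search is the default
        help="how catalog ads are collected",
    )
    return parser


def parse_mode(argv=None):
    """Return the selected mode; argv=None reads the real command line."""
    return build_parser().parse_args(argv).mode
```

For example, `parse_mode(["--mode", "link"])` returns `"link"`, and `parse_mode([])` falls back to `"search"`.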
## Link method
This method scrapes all catalogs linked from the main page in the panel called "Category".
The total number of pages is the sum of the parent and child links in the panel menu. Over time the database gets larger.

Command line:
```bash
> python main.py --mode link
```
## Search method
Search method uses a configuration file to set catalog targets.
The search list inside the file must contain the keywords to scrape, each used as if it were typed into a search bar.

Command line:
```bash
> python main.py --mode search
```
## Popular method
It scrapes the most popular searches according to the web page.
Command line:
```bash
> python main.py --mode popular
```
# Configuration file:
The configuration file must have the following fields:
|field|key|description|
|---|---|---|
|method||settings related to its function|
|connection|request|settings for requesting web pages|
|connection|database|database settings|

## Configuration examples:
Phone brands list example:
```json
{"search":{"list":["asus","huawai","lenovo","samsung","ulefone","xiaomi"]}}
```
HTTP header example:
```json
{"headers":{"User-Agent":"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"}}
```
Defining the database path:
```json
{"database":{"sqlite":{"path":"C:/Users/some_user/Documents/gearbest_scraper.db"}}}
```
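To show how such fragments might be consumed, the sketch below parses each one with the standard `json` module. It is an illustration under assumptions: the README does not show how the fragments nest inside the full __configuration.json__, so each is read on its own, and the helper names and the target URL are hypothetical. No network request is sent.

```python
import json
import sqlite3
from urllib.request import Request

# Fragments modeled on the examples above. How they nest inside the real
# configuration.json is an assumption, so each one is parsed separately.
SEARCH_FRAGMENT = '{"search":{"list":["asus","lenovo","xiaomi"]}}'
HEADERS_FRAGMENT = (
    '{"headers":{"User-Agent":"Mozilla/5.0 (compatible; Googlebot/2.1; '
    '+http://www.google.com/bot.html)"}}'
)
# ":memory:" stands in for a real file path so the sketch leaves no file behind.
DATABASE_FRAGMENT = '{"database":{"sqlite":{"path":":memory:"}}}'


def read_keywords(fragment):
    """Return the list of search keywords from a 'search' fragment."""
    return list(json.loads(fragment)["search"]["list"])


def build_request(url, fragment):
    """Build (but do not send) a request carrying the configured headers."""
    headers = json.loads(fragment)["headers"]
    return Request(url, headers=headers)


def open_database(fragment):
    """Open the SQLite database at the configured path."""
    path = json.loads(fragment)["database"]["sqlite"]["path"]
    return sqlite3.connect(path)


keywords = read_keywords(SEARCH_FRAGMENT)
# The URL below is only an illustrative target, not taken from the project.
request = build_request("https://www.gearbest.com", HEADERS_FRAGMENT)
database = open_database(DATABASE_FRAGMENT)
```

Keeping each setting behind a small function is one way to make a missing key fail early with a clear `KeyError`.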