https://github.com/bacdong/web-crawler

Website crawler built with the requests library in Python

Host: GitHub
URL: https://github.com/bacdong/web-crawler
Owner: Bacdong
Created: 2020-06-16T11:18:59.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2020-09-07T08:24:41.000Z (over 5 years ago)
Last Synced: 2025-07-02T03:06:27.345Z (8 months ago)
Topics: crawler-python, python, python-library, python-requests, python-spider, python3
Language: Python
Size: 63.5 KB
Stars: 8
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

### Web Crawler with the requests and beautifulsoup4 (bs4) libraries in Python ###

#### Installing for Development: ####
* IDE:
```
- Download Python (https://www.python.org/downloads/)
- Install an IDE or editor with Python support: VSCode, PyCharm, Sublime Text, ...
```
* Extensions for the IDE:
```
+ For VSCode, install these extensions to support Python development:
. HTML CSS Support
. Python
. Remote Development
+ For other IDEs: search online for the equivalent extensions
```
* To run:
```
- Open a terminal:
+ cd crawler
+ pip install requests // (or: pip3 install requests)
+ pip install beautifulsoup4 // (or: pip3 install beautifulsoup4)
- Open the getData.py file:
( * If you do not need to save data to a database:
. Comment out the functions that connect to the database, e.g. insertData..(), ...
. You then do not need to set up or connect to a database.
)
- Replace the current "url" variable in the file with the URL you want to crawl.

- Reopen the terminal and run: python getData.py
```
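
The fetch-and-parse flow described in the steps above can be sketched roughly as follows. This is a minimal illustration, not the contents of getData.py: the names `extract_links`, `crawl`, and `save` are hypothetical, and the optional `save` callback only stands in for the database step that the instructions say can be skipped.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def extract_links(html, base_url):
    """Return absolute URLs for every <a> tag that has an href attribute."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for anchor in soup.find_all("a", href=True):
        # Resolve relative links such as "/about" against the page URL.
        links.append(urljoin(base_url, anchor["href"]))
    return links


def crawl(url, save=None):
    """Download one page and return the links found on it.

    `save` is a hypothetical callback standing in for the database step;
    leave it as None to skip storage entirely.
    """
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    links = extract_links(response.text, url)
    if save is not None:
        save(links)
    return links
```

Calling `crawl("https://example.com")` with the URL you want to crawl returns the list of links on that page. Passing `save=None` (the default) mirrors the "no database" option described above.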
