https://github.com/asirihewage/simplest-xpath-web-scraper
Simplest web scraper created using Python3 and MongoDB
data data-mining python3 scraper web webscrping
- Host: GitHub
- URL: https://github.com/asirihewage/simplest-xpath-web-scraper
- Owner: asirihewage
- Created: 2022-01-30T09:15:38.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2022-11-21T23:43:40.000Z (over 3 years ago)
- Last Synced: 2025-06-07T23:51:01.290Z (about 1 year ago)
- Topics: data, data-mining, python3, scraper, web, webscrping
- Language: Python
- Homepage: https://w3genesis.com
- Size: 150 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Simplest xpath web scraper
Simplest web scraper created using Python3
- extracts data using multiple XPaths from multiple URLs
- saves the response in MongoDB
- handles exceptions and errors
- intended only for basic web scraping of static HTML pages
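The error handling mentioned above could look roughly like the following. This is a minimal sketch using only the standard library, not the code from `main.py`; the function name `fetch_html` and the 10-second timeout are assumptions made for illustration.

```python
import urllib.request


def fetch_html(url, timeout=10):
    """Return the page body as text, or None if the request fails.

    Hypothetical helper: the name and the default timeout are assumptions.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except (OSError, ValueError) as exc:
        # OSError covers URLError, HTTPError and socket timeouts;
        # ValueError is raised for strings that are not valid URLs.
        print(f"Could not fetch {url}: {exc}")
        return None
```

Returning `None` instead of raising lets a loop over many URLs skip a failed page and continue with the next one.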
## set up Data.py with the xpaths for each url
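```python
# One entry per page: the URL to fetch and the XPaths to evaluate on it.
{
    "url": "https://www.technology.pitt.edu/blog/zoom10faq",
    "xpaths": [
        {
            "questions": '//div[@class="field-item even"]/h2/text()',
            "answers": '//div[@class="field-item even"]/p/text()',
            # XPath positions start at 1, so p[1] is the first paragraph
            "correct_answer": '//div[@class="field-item even"]/p[1]/text()'
        }
    ]
}
```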
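To illustrate how named XPaths turn into extracted fields, here is a small standard-library sketch. It is not the project's extraction code. It uses `xml.etree.ElementTree`, which supports only a subset of XPath (no `text()` step) and needs well-formed markup, so the paths select elements and the sample page is a hypothetical stand-in. Real pages would normally need a full HTML/XPath library.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a fetched page.
SAMPLE_HTML = """
<html><body>
  <div class="field-item even">
    <h2>How do I join a meeting?</h2>
    <p>Click the invitation link.</p>
    <h2>Is there a time limit?</h2>
    <p>It depends on the account type.</p>
  </div>
</body></html>
""".strip()

# ElementTree has no text() step, so each path selects elements and the
# text is read from the element afterwards.
XPATHS = {
    "questions": ".//div[@class='field-item even']/h2",
    "answers": ".//div[@class='field-item even']/p",
    "first_answer": ".//div[@class='field-item even']/p[1]",  # positions start at 1
}


def extract(html, xpaths):
    """Return {field name: list of text values} for each path."""
    root = ET.fromstring(html)
    return {
        name: [(el.text or "").strip() for el in root.findall(path)]
        for name, path in xpaths.items()
    }


result = extract(SAMPLE_HTML, XPATHS)
print(result["questions"])
```

Each key in the mapping becomes a field name in the result, which is the same shape as one entry of the `xpaths` list above.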
## set up the MongoDB connection string
```python
import pymongo

# Replace host, port, database and collection with your own values,
# or pass a full connection URL to MongoClient.
myclient = pymongo.MongoClient("mongodb://host:port/")
mydb = myclient["database"]
mycol = mydb["collection"]
```
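Saving a scraped result could then be sketched as below. The document layout (`url`, `scraped_at`, plus the extracted fields) and the helper names are assumptions, not the schema used by the project. The only MongoDB call relied on is `insert_one`, which pymongo collections provide; an in-memory stand-in is used here so the sketch runs without a database server.

```python
from datetime import datetime, timezone


def build_document(url, fields):
    """Combine the source URL and the extracted fields into one record."""
    return {"url": url, "scraped_at": datetime.now(timezone.utc), **fields}


def save_result(collection, url, fields):
    """Insert one record; `collection` is any object with insert_one()."""
    doc = build_document(url, fields)
    collection.insert_one(doc)
    return doc


class InMemoryCollection:
    """Stand-in for a pymongo collection (hypothetical, for illustration)."""

    def __init__(self):
        self.docs = []

    def insert_one(self, doc):
        self.docs.append(doc)


demo = InMemoryCollection()
saved = save_result(
    demo,
    "https://www.technology.pitt.edu/blog/zoom10faq",
    {"questions": ["Q1"], "answers": ["A1"]},
)
```

With a real connection, passing `mycol` in place of `demo` would write the same record to the configured collection.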
## install python dependencies
```commandline
pip3 install -r requirements.txt
```
## run
```commandline
python3 main.py
```
## response

### Author : Asiri Hewage