https://github.com/asirihewage/simplest-xpath-web-scraper

Simplest web scraper created using Python3 and MongoDB
data data-mining python3 scraper web webscrping


Host: GitHub
URL: https://github.com/asirihewage/simplest-xpath-web-scraper
Owner: asirihewage
Created: 2022-01-30T09:15:38.000Z (over 4 years ago)
Default Branch: main
Last Pushed: 2022-11-21T23:43:40.000Z (over 3 years ago)
Last Synced: 2025-06-07T23:51:01.290Z (about 1 year ago)
Topics: data, data-mining, python3, scraper, web, webscrping
Language: Python
Homepage: https://w3genesis.com
Size: 150 KB
Stars: 2
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Simplest xpath web scraper

Simplest web scraper created using Python3 and MongoDB

- extract data using multiple xpaths from multiple urls

- save response in MongoDB

- exceptions and error handling

- only for basic web scraping of static HTML web pages
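
The error-handling point above can be illustrated with a minimal sketch. It assumes the standard library's `urllib` so that it runs without extra packages; the README does not say which HTTP client `main.py` actually uses, and `fetch_page` is a hypothetical helper name.

```python
import urllib.error
import urllib.request


def fetch_page(url, timeout=10):
    """Return the page body as text, or None if the request fails.

    Hypothetical helper: the project's own fetching code may differ.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # Fall back to UTF-8 when the server does not declare a charset.
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 404 or 500.
        print(f"{url}: server returned HTTP {error.code}")
    except OSError as error:
        # Covers URLError, refused connections, DNS failures and timeouts.
        print(f"{url}: could not connect ({error})")
    except ValueError as error:
        # Raised by urllib when the string is not a usable URL.
        print(f"{url}: not a valid URL ({error})")
    return None
```

Returning `None` instead of raising lets a loop over many urls skip a failing page and continue with the next one.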

## set up Data.py with the xpaths for each url

```python
{
    "url": "https://www.technology.pitt.edu/blog/zoom10faq",
    "xpaths": [
        {
            "questions": '//div[@class="field-item even"]/h2/text()',
            "answers": '//div[@class="field-item even"]/p/text()',
            # XPath positions start at 1, so p[1] selects the first paragraph
            "correct_answer": '//div[@class="field-item even"]/p[1]/text()'
        }
    ]
}
```
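
As a rough illustration of how an entry like this might be evaluated, here is a minimal sketch. It assumes `lxml` as the XPath engine (the README does not name the library used), and `ENTRY`, `SAMPLE_HTML` and `apply_xpaths` are hypothetical names. An inline HTML string stands in for the downloaded page so the sketch runs offline.

```python
from lxml import html

# Hypothetical entry in the same shape as the Data.py example.
ENTRY = {
    "url": "https://www.technology.pitt.edu/blog/zoom10faq",
    "xpaths": [
        {
            "questions": '//div[@class="field-item even"]/h2/text()',
            "answers": '//div[@class="field-item even"]/p/text()',
        }
    ],
}

# Made-up stand-in for the page source, not the real page content.
SAMPLE_HTML = """
<html><body>
  <div class="field-item even">
    <h2>How do I join a meeting?</h2>
    <p>Click the invitation link.</p>
    <h2>Can I record?</h2>
    <p>Yes, if the host allows it.</p>
  </div>
</body></html>
"""


def apply_xpaths(page_source, xpath_groups):
    """Evaluate every named XPath and collect the matched text per name."""
    tree = html.fromstring(page_source)
    results = {}
    for group in xpath_groups:
        for name, expression in group.items():
            # For expressions ending in text(), xpath() returns a list of strings.
            results[name] = [text.strip() for text in tree.xpath(expression)]
    return results


extracted = apply_xpaths(SAMPLE_HTML, ENTRY["xpaths"])
print(extracted)
```

Each key in the `xpaths` dictionary becomes a key in the result, so adding a field to scrape only requires a new name and XPath expression.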

## set up the MongoDB database connection string

```python
import pymongo

# Replace host, port, database and collection with your own values
myclient = pymongo.MongoClient("mongodb://host:port/")  # or add the connection url
mydb = myclient["database"]
mycol = mydb["collection"]
```
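
Once the collection is available, storing a scrape result could look roughly like the sketch below. The document layout (`url`, `scraped_at`, plus the extracted fields) and the helper names `build_document` and `save_result` are assumptions for illustration, not the schema used by the project. Only `insert_one`, a standard pymongo collection method, is relied on.

```python
from datetime import datetime, timezone


def build_document(url, extracted):
    """Combine the source url and the extracted fields into one document.

    The field layout here is an assumption, not the project's schema.
    """
    return {
        "url": url,
        "scraped_at": datetime.now(timezone.utc),
        **extracted,
    }


def save_result(collection, url, extracted):
    """Insert one scrape result and return the id reported by the collection."""
    document = build_document(url, extracted)
    # pymongo's Collection.insert_one returns a result object with .inserted_id
    return collection.insert_one(document).inserted_id
```

With the connection from the snippet above, a call such as `save_result(mycol, "https://www.technology.pitt.edu/blog/zoom10faq", {"questions": [...]})` would write one document per scraped url.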

## install python dependencies

```commandline
pip3 install -r requirements.txt
```

## run

```commandline
python3 main.py
```

## response

 ![Simplest xpath web scraper](47dcf6e5-0d63-4824-9135-e2b4171a171f.jfif)

### Author: Asiri Hewage
