https://github.com/dearopen/django-easy-scraper

Django apps to scrape data from web page easily
automation django django-rest-framework python python3 webcrawler webcrawling webscraper webscraping

Host: GitHub
URL: https://github.com/dearopen/django-easy-scraper
Owner: dearopen
License: mit
Created: 2020-11-17T08:16:45.000Z (over 5 years ago)
Default Branch: main
Last Pushed: 2020-11-23T19:02:01.000Z (over 5 years ago)
Last Synced: 2025-11-07T15:28:39.789Z (8 months ago)
Topics: automation, django, django-rest-framework, python, python3, webcrawler, webcrawling, webscraper, webscraping
Language: Python
Homepage: https://pypi.org/project/django-easy-scraper/
Size: 15.6 KB
Stars: 2
Watchers: 1
Forks: 1
Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE

# Django Easy Scraper

A standalone Django app that can be used with, or integrated into, both Django and non-Django applications. The scraping mechanism is based on regular expressions and XPath, so you can use whichever you are already familiar with.

It requires the Python `requests` module to be installed.

## Install

`pip install django-easy-scraper`

## Basic Usage

If you use regex:

```
from django_easy_scraper import scraper

class ScrapeExampleDotCom(scraper.Scraper):
    regex_fields = {
        'price': "Write your regex pattern for price here",
        'title': "Write your regex pattern for title here",
        # Add as many fields/keys as you want in the same way
    }
```
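
To give a feel for what a `regex_fields` pattern might look like, the sketch below applies two patterns to a small HTML string using only Python's standard `re` module. The HTML fragment, the patterns, and the `extract_fields` helper are hypothetical and are not part of django-easy-scraper. It also assumes each pattern yields its first capture group, which may differ from what the library actually does.

```python
import re

# Hypothetical page fragment; a real page would be fetched over HTTP.
html = '<h1 class="title">Blue Mug</h1><span class="price">$4.50</span>'

# Hypothetical patterns, one capture group per field.
regex_fields = {
    "price": r'<span class="price">\$([\d.]+)</span>',
    "title": r'<h1 class="title">(.*?)</h1>',
}

def extract_fields(text, fields):
    """Return the first capture group of each pattern, or None if no match."""
    result = {}
    for name, pattern in fields.items():
        match = re.search(pattern, text)
        result[name] = match.group(1) if match else None
    return result

data = extract_fields(html, regex_fields)
print(data)
```

Testing patterns against a saved copy of the page in this way can help you find a pattern that works before putting it into a scraper class.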

If you use XPath:

```
from django_easy_scraper import scraper

class ScrapeExampleDotCom(scraper.Scraper):
    xpath_fields = {
        'price': "Write your xpath pattern for price here",
        'title': "Write your xpath pattern for title here",
        # Add as many fields/keys as you want in the same way
    }
```
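
As a rough illustration of the kind of expression that could go into `xpath_fields`, the sketch below evaluates two paths with the standard library's `xml.etree.ElementTree`. The page fragment and the paths are hypothetical. `ElementTree` supports only a subset of XPath and needs well-formed markup, and the engine django-easy-scraper uses internally may behave differently, so treat this as an assumption-laden sketch, not the library's implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment.
page = """
<html>
  <body>
    <h1 class="title">Blue Mug</h1>
    <span class="price">4.50</span>
  </body>
</html>
""".strip()

# Hypothetical paths, written in the XPath subset ElementTree understands.
xpath_fields = {
    "price": ".//span[@class='price']",
    "title": ".//h1[@class='title']",
}

root = ET.fromstring(page)
data = {}
for name, path in xpath_fields.items():
    node = root.find(path)  # first matching element, or None
    data[name] = node.text if node is not None else None

print(data)
```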

### Scrape now

```
url = 'www.example.com/bla-bla-details-page/'
data = ScrapeExampleDotCom.regex_url_scraper(url)
print(data)
```

If your regex patterns are correct, the response should look like this:

```
{
    'price': 4,
    'title': 'a scraped title',
}
```

The `regex_url_scraper` method always returns a JSON-style response. If you add several regex patterns to `regex_fields`, the response contains one key per pattern, each holding the result for that pattern.

## Scraping Multiple Sites Together

You don't need to call a different method for each site every time. Call one method once and scrape them all.

### Suppose you want to scrape three sites:

> www.example.com

> www.exampletwo.com

> www.examplethree.com

But how do all of these sites get scraped automatically?

- Write regex patterns for each of the sites above, with the fields you want to scrape:

```
from django_easy_scraper import scraper

class ScrapeExampleDotCom(scraper.Scraper):
    regex_fields = {
        'price': "Write your regex pattern for price here",
        'title': "Write your regex pattern for title here",
        # Add as many fields/keys as you want in the same way
    }

class ScrapeExampleTwo(scraper.Scraper):
    regex_fields = {
        'price': "Write your regex pattern for price here",
        'title': "Write your regex pattern for title here",
        # Add as many fields/keys as you want in the same way
    }

class ScrapeExampleThree(scraper.Scraper):
    regex_fields = {
        'price': "Write your regex pattern for price here",
        'title': "Write your regex pattern for title here",
        # Add as many fields/keys as you want in the same way
    }
```

You have now written regular expressions for all the sites you want to scrape.

Now it's time to use the `Switch` class, which routes each URL to the right scraper class based on the site it belongs to.

This is where the magic really begins:

Just place all your classes in the `switcher` dictionary.

> Important Note:

Each `key` should be the bare domain name: no `www`, no `http`, no slashes, and no other prefix or suffix.

Each `value` should be the `regex_url_scraper` method of the class you wrote for that domain.

```
from django_easy_scraper import switch

class Switch(switch.BaseSwitch):
    switcher = {
        'example.com': ScrapeExampleDotCom.regex_url_scraper,
        'exampletwo.com': ScrapeExampleTwo.regex_url_scraper,
        'examplethree.com': ScrapeExampleThree.regex_url_scraper,
    }
```

> If you use XPath, pass `xpath_scraper` instead of `regex_url_scraper`.

Your scraper classes are now routed based on the URL the switch receives.
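
The idea of routing by bare domain name can be sketched with the standard library alone. The `bare_domain` and `get_data` functions and the three stand-in scrapers below are hypothetical and written only for illustration. How `BaseSwitch` actually extracts the domain from a URL is not documented here, so the normalisation rules shown are an assumption.

```python
from urllib.parse import urlparse

def bare_domain(url):
    """Return the host of `url` without scheme, port, path, or a leading 'www.'."""
    # urlparse only recognises the host when a scheme or '//' precedes it.
    if "//" not in url:
        url = "//" + url
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host

# Stand-in scraper callables; real code would use the scraper class methods.
def scrape_example(url):
    return {"site": "example.com", "url": url}

def scrape_example_two(url):
    return {"site": "exampletwo.com", "url": url}

def scrape_example_three(url):
    return {"site": "examplethree.com", "url": url}

# Keys are bare domain names, as the note above requires.
switcher = {
    "example.com": scrape_example,
    "exampletwo.com": scrape_example_two,
    "examplethree.com": scrape_example_three,
}

def get_data(url):
    """Look up the scraper for the URL's domain and call it, or return None."""
    handler = switcher.get(bare_domain(url))
    return handler(url) if handler is not None else None

print(get_data("https://www.exampletwo.com/item/1"))
```

This also shows why a key such as `www.example.com` or `https://example.com/` would never match in a lookup of this kind: the lookup uses the bare domain only.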

### Get the response data as a Python dictionary:

```
url = 'URL of any site you have written a class for and added to the Switch class'
response = Switch.get_data(url=url, raise_exception=False)
print(response)  # the data you are trying to scrape
```

The `Switch` class routes each link passed to its `get_data` method to the matching scraper class automatically.

The `raise_exception` argument of `get_data` controls whether an exception is raised when the expected fields are not found.
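
One way to picture a flag like `raise_exception` is the standalone sketch below. The `check_fields` helper, the use of `None` to mark a field that was not found, and the choice of `ValueError` are all assumptions made for illustration. The exception type and exact behaviour of django-easy-scraper itself are not stated in this README.

```python
def check_fields(data, raise_exception=False):
    """Return `data`; if any field is None, raise ValueError when asked to."""
    missing = [name for name, value in data.items() if value is None]
    if missing and raise_exception:
        raise ValueError("fields not found: " + ", ".join(missing))
    return data

# Hypothetical scrape result in which the price pattern did not match.
scraped = {"price": None, "title": "a scraped title"}

# Lenient mode: the partial result is returned as-is.
lenient = check_fields(scraped, raise_exception=False)

# Strict mode: the missing field is reported through an exception.
try:
    check_fields(scraped, raise_exception=True)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(lenient)
print(error_message)
```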

### Got an issue?

Please open an issue on our GitHub repo: https://github.com/dearopen/django-easy-scraper

Don't forget to star this project if you like it.

Happy scraping!
