Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/iam-mhaseeb/djano-anti-crawler

A light weight anti crawler app for Django.
https://github.com/iam-mhaseeb/djano-anti-crawler

django django-application django-library python

Last synced: 11 days ago
JSON representation

A light weight anti crawler app for Django.

Awesome Lists containing this project

README

        

# Djano-Anti-Crawler

[![PyPI version](https://badge.fury.io/py/django-anti-crawler.svg)](https://badge.fury.io/py/django-anti-crawler)

A light weight anti crawler Django app which blocks the IP addresses which sends too many hits to your application.
You can decide the number of hits that are allowed per IP address in defined time.

Installation
-----------
1. Install via `pip`

```
pip install django-anti-crawler
```

Quick start
-----------

1. Add `django_anti_crawler` to your INSTALLED_APPS setting like this::

```
INSTALLED_APPS = [
...
'django_anti_crawler',
]
```

2. Add `'DjangoAntiCrawlerMiddleware'` to your middleware classes in `settings.py` file::

```
MIDDLEWARE = [
'django_anti_crawler.middlewares.DjangoAntiCrawlerMiddleware',
'django.middleware.security.SecurityMiddleware',
'django.contrib.sessions.middleware.SessionMiddleware',
'django.middleware.common.CommonMiddleware',
'django.middleware.csrf.CsrfViewMiddleware',
'django.contrib.auth.middleware.AuthenticationMiddleware',
'django.contrib.messages.middleware.MessageMiddleware',
'django.middleware.clickjacking.XFrameOptionsMiddleware',
]
```

`DjangoAntiCrawlerMiddleware` can be and should be first middleware as we need to make sure IP check is the very first
thing in processing request.

3. If cache settings are not defined in your settings.py file, then you need to add below lines to your settings.py file::

```
CACHES = {
'default': {
'BACKEND': 'django.core.cache.backends.db.DatabaseCache',
'LOCATION': 'cache_table',
}
}
```

Make sure you have database settings configured. Run the below command to create cache_table in database::

```
python manage.py createcachetable
```

You may choose whatever cache backend you want to use.

4. (optional) Set variables in MAX_ALLOWED_HITS_PER_IP and IP_HITS_TIMEOUT in settings.py file::

MAX_ALLOWED_HITS_PER_IP = 2000 # max allowed hits per IP_TIMEOUT time from an IP. Default 2000.
IP_HITS_TIMEOUT = 60 # timeout in seconds for IP in cache. Default 60.

5. (optional) Set variable ANTI_CRAWLER_WHITELIST_BOTS and IP_HITS_TIMEOUT in settings.py file to whitelist specific bots or Ips::

To test on local system, set these values to very low, e.g. IP_HITS_TIMEOUT = 30 and MAX_ALLOWED_HITS_PER_IP = 2.
Restart the server and send requests frequently. After two requests you will start receiving 403 error.
If not defined in settings file, default values will be used.

## Authors

* **Muhammad Haseeb** - *Initial work* - [Muhammad Haseeb](https://github.com/iam-mhaseeb)

## Licensing
The project is [MIT Licenced](LICENSE).