https://github.com/ahmedshahriar/bd-medicine-scraper
Scrapy-Django PostgreSQL integrated API with Proxy IP configuration that scrapes all medicine data (meds, prices, generics, companies, indications) from Bangladesh (30k+ pages)
django django-rest-framework drug manufacturer medicine medicine-database postgresql proxy-ip python python3 rest-api scrapy web-scraping
- Host: GitHub
- URL: https://github.com/ahmedshahriar/bd-medicine-scraper
- Owner: ahmedshahriar
- License: apache-2.0
- Created: 2021-01-06T14:43:58.000Z (almost 5 years ago)
- Default Branch: dev
- Last Pushed: 2024-10-09T01:02:05.000Z (about 1 year ago)
- Last Synced: 2025-04-01T01:01:41.661Z (8 months ago)
- Topics: django, django-rest-framework, drug, manufacturer, medicine, medicine-database, postgresql, proxy-ip, python, python3, rest-api, scrapy, web-scraping
- Language: Python
- Homepage:
- Size: 133 KB
- Stars: 53
- Watchers: 1
- Forks: 13
- Open Issues: 11
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# bd-medicine-scraper
## Overview
Welcome to the bd-medicine-scraper repository!
In this project, I scraped medicine data from [medex.com.bd](https://medex.com.bd) using **Scrapy** and integrated it with **Django REST Framework**. The data is stored in a **PostgreSQL** database. I designed the scraper to preserve the relations between models.
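One common way to preserve relations while scraping is to resolve each related record by name before saving the child record. The sketch below illustrates the idea with in-memory dictionaries; the `Store` class, its methods, and the sample names are hypothetical and are not taken from the repository's Django models.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical in-memory stand-in for the database tables."""

    manufacturers: dict = field(default_factory=dict)  # normalized name -> id
    medicines: list = field(default_factory=list)

    def get_or_create_manufacturer(self, name):
        # Normalize the name so the same manufacturer is not stored twice.
        key = name.strip().lower()
        if key not in self.manufacturers:
            self.manufacturers[key] = len(self.manufacturers) + 1
        return self.manufacturers[key]

    def add_medicine(self, brand_name, manufacturer_name):
        # Resolve the parent first, then store the child with its foreign key.
        manufacturer_id = self.get_or_create_manufacturer(manufacturer_name)
        self.medicines.append(
            {"brand_name": brand_name, "manufacturer_id": manufacturer_id}
        )
        return manufacturer_id


store = Store()
first = store.add_medicine("Examplecin", "Alpha Pharma")
second = store.add_medicine("Testaril", " alpha pharma ")
third = store.add_medicine("Samplol", "Beta Labs")
```

With Django models, the same lookup-then-link step is usually done with `Model.objects.get_or_create(...)`.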
I also customized the Django admin panels with additional features such as -
- autocomplete lookup for relational fields
- custom filtering (alphabetical, model property)
- bulk actions (export to CSV)
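As a rough illustration of the alphabetical filter and the CSV export action, here is a minimal sketch using only the standard library. The function names, field names, and sample records are assumptions made for the example. In a real admin action the rows would come from a queryset and the text would be written into an HTTP response.

```python
import csv
import io


def filter_by_initial(rows, field_name, letter):
    """Keep records whose `field_name` starts with `letter` (case-insensitive)."""
    letter = letter.lower()
    return [
        row for row in rows
        if str(row.get(field_name, "")).lower().startswith(letter)
    ]


def rows_to_csv(rows, field_names):
    """Serialize dict records to CSV text, ignoring keys not in `field_names`."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=field_names, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical sample records; the real model fields may differ.
sample = [
    {"id": 1, "brand_name": "Examplecin", "strength": "500 mg"},
    {"id": 2, "brand_name": "Samplol", "strength": "250 mg"},
]
selected = filter_by_initial(sample, "brand_name", "e")
csv_text = rows_to_csv(selected, ["brand_name", "strength"])
```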
Other customizations:
- custom Scrapy command to run Scrapy spiders from the Django command line (e.g. `python manage.py med`)
- custom Django commands
  - to export models to CSV:
    ```
    python manage.py export_medicine_data /home/ahmed/Desktop/medicine_data.csv
    ```
  - to export generic monograph PDFs:
    ```
    python manage.py export_generics_monograph
    ```
I also added proxy configuration to Scrapy.
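For readers unfamiliar with proxying in Scrapy: the built-in `HttpProxyMiddleware` routes a request through whatever URL is stored in `request.meta["proxy"]`. The sketch below shows one possible way to rotate proxies with a custom downloader middleware. It is not the repository's actual configuration; the class name, the proxy URLs, and the `FakeRequest` stand-in are hypothetical. Wiring it into a project (the `DOWNLOADER_MIDDLEWARES` setting and a `from_crawler` constructor) is omitted.

```python
import itertools


class RotatingProxyMiddleware:
    """Scrapy-style downloader middleware that assigns proxies round-robin."""

    def __init__(self, proxy_urls):
        if not proxy_urls:
            raise ValueError("at least one proxy URL is required")
        self._proxies = itertools.cycle(proxy_urls)

    def process_request(self, request, spider):
        # Respect a proxy that was already chosen for this request.
        if "proxy" not in request.meta:
            request.meta["proxy"] = next(self._proxies)
        return None  # None tells Scrapy to keep processing the request


class FakeRequest:
    """Minimal stand-in for scrapy.Request, used only for this demo."""

    def __init__(self):
        self.meta = {}


# Hypothetical proxy endpoints.
middleware = RotatingProxyMiddleware(
    ["http://proxy-a.example:8080", "http://proxy-b.example:8080"]
)
requests = [FakeRequest() for _ in range(3)]
for request in requests:
    middleware.process_request(request, spider=None)
assigned = [request.meta["proxy"] for request in requests]
```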
## Run
Create a Python virtual environment, activate it, and install the dependencies from the project root directory-
```
pip install -r requirements.txt
```
To start the Django app-
```
python manage.py runserver
```
NB: Apply the migrations before running the app-
```
python manage.py makemigrations && python manage.py migrate
```
To run all spiders-
```
python run_crawler.py
```
To run a specific spider-
```
python manage.py
```
For example: `python manage.py med`
## Data Analytics
### Dataset
The scraped dataset is available on Kaggle -
- [Assorted Medicine Dataset of Bangladesh](https://www.kaggle.com/ahmedshahriarsakib/assorted-medicine-dataset-of-bangladesh)
The dataset has 6 CSV files. Their featured columns are listed below:
1. medicine.csv (21k+ entries)
   - brand name
   - medicine type (allopathic or herbal)
   - generic
   - strength
   - manufacturer
   - package container (unit price and pack info)
   - package size (unit price)
2. manufacturer.csv (245 entries)
   - name
3. indication.csv (2k+ entries)
   - name
4. generic.csv (~1700-1800 entries)
   - name
   - monographic link (PDF URL)
   - drug class
   - indication
   - generic details such as "Indication description", "Pharmacology description", "Dosage & Administration description" etc.
5. drug class.csv (~400 entries)
   - name
6. dosage form.csv (~120 entries)
   - name
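As a quick illustration of working with these files, the sketch below counts medicines per manufacturer using only the standard library. The header names (`brand name`, `manufacturer`) and the sample rows are assumptions made for the example, so check the actual CSV headers before reusing it.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt standing in for medicine.csv.
SAMPLE_MEDICINE_CSV = """brand name,manufacturer
Examplecin,Alpha Pharma
Samplol,Beta Labs
Testaril,Alpha Pharma
"""


def count_by_column(csv_file, column):
    """Count how often each value of `column` appears in a CSV file object."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


counts = count_by_column(io.StringIO(SAMPLE_MEDICINE_CSV), "manufacturer")
top_manufacturer = counts.most_common(1)
```

To run it against the downloaded dataset, pass `open("medicine.csv", newline="", encoding="utf-8")` instead of the in-memory sample.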
### Analytics
[Bangladesh Medicine Analytics - Notebook on Kaggle](https://www.kaggle.com/ahmedshahriarsakib/bangladesh-medicine-analytics)
## Tests
Workflow script - [django-ci.yml](https://github.com/ahmedshahriar/bd-medicine-scraper/blob/dev/.github/workflows/django-ci.yml)
Run the tests using:
```
coverage run --omit='*/venv/*' manage.py test
```
or
```
python manage.py test
```
Check the coverage
```
coverage html
```
## Built With
```
Django==3.2.12
djangorestframework==3.12.2
django-admin-autocomplete-filter==0.7.1
django-filter==21.1
coverage==6.2
Scrapy==2.4.1
scrapy-djangoitem==1.1.1
psycopg2==2.9.3
```