https://github.com/jrodal98/paginated-table-extractor
A python script that automates the extraction of data from paginated tables.
- Host: GitHub
- URL: https://github.com/jrodal98/paginated-table-extractor
- Owner: jrodal98
- Created: 2018-06-12T18:59:57.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2022-07-06T20:10:42.000Z (almost 4 years ago)
- Last Synced: 2025-02-16T22:19:01.303Z (over 1 year ago)
- Topics: data-extraction, selenium-python, selenium-webdriver, table-extraction, webscraping
- Language: Python
- Homepage:
- Size: 5.53 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
# Paginated-Table-Extractor
A Python script that automates the extraction of data from paginated tables.

The demo GIF in the repository shows a table with 10,678 rows across 108 pages being extracted into a pandas DataFrame in less than 10 seconds. The following code produced that result:
```python
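simple_df = read_paginated_table(
    "https://cavdailyonline.github.io/facultysalarygryphon/",  # page that hosts the table
    "#data-table-container",  # CSS selector for the table
    # CSS selector for the "next page" link
    "#data-table-container_wrapper > div.dataTables_paginate.paging_bootstrap.pagination > ul > li.next > a",
    # CSS selector for the page-length dropdown option that shows more rows per page
    show_more_option="#data-table-container_length > label > select > option:nth-child(4)",
    delay=0,
)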
```
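The pattern behind a call like this is a loop that reads the rows on the current page, advances to the next page, and stops when no further page exists. The sketch below illustrates that loop only; it is not the repository's implementation. The names `collect_paginated_rows`, `read_rows`, `go_to_next_page`, and `max_pages` are hypothetical, and the browser is replaced by in-memory pages so the example runs without Selenium.

```python
from typing import Callable, Iterable, List, Sequence

Row = Sequence[str]


def collect_paginated_rows(
    read_rows: Callable[[], Iterable[Row]],
    go_to_next_page: Callable[[], bool],
    max_pages: int = 10_000,
) -> List[Row]:
    """Gather rows from every page of a paginated table.

    read_rows returns the rows visible on the current page.
    go_to_next_page advances one page and returns False when there is
    no further page. max_pages guards against an endless loop.
    (All names here are hypothetical, chosen for illustration.)
    """
    rows: List[Row] = []
    for _ in range(max_pages):
        rows.extend(read_rows())
        if not go_to_next_page():
            break
    return rows


# Stand-in for a browser session: three in-memory "pages".
pages = [[("Ada", "100")], [("Grace", "200")], [("Linus", "300")]]
state = {"page": 0}


def read_rows():
    # Return the rows of whichever page is currently "displayed".
    return pages[state["page"]]


def go_to_next_page():
    # Report False on the last page, otherwise move forward one page.
    if state["page"] + 1 >= len(pages):
        return False
    state["page"] += 1
    return True


all_rows = collect_paginated_rows(read_rows, go_to_next_page)
print(len(all_rows), "rows collected")
```

In a real Selenium session, `read_rows` would presumably parse the table out of the current page source, and `go_to_next_page` would click the element matched by the "next" selector and report whether that element was still clickable.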
## Download instructions
1) Clone this repository.
```bash
git clone https://github.com/jrodal98/Paginated-Table-Extractor.git
```
2) Install the Python dependencies. A command like the following should do the job.
```bash
cd Paginated-Table-Extractor
pip3 install -r requirements.txt
```
3) Install chromedriver from [the downloads page](http://chromedriver.chromium.org/downloads). Depending on your operating system, you might have to add it to your PATH, which is left as an exercise to the reader. If the script complains that it cannot find the driver even though you installed it, the driver is not on your PATH.
4) **Optional**: Run `test.py` to make sure that everything is working properly and to get a feel for how to use the script.
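
For step 3, one quick way to check whether chromedriver is visible on your PATH is the standard library's `shutil.which`. This is a small sketch and not part of the repository; the helper name `find_driver` is made up for illustration.

```python
import shutil
from typing import Optional


def find_driver(name: str = "chromedriver") -> Optional[str]:
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(name)


path = find_driver()
if path is None:
    print("chromedriver was not found on PATH; add its folder to PATH and retry.")
else:
    print(f"chromedriver found at {path}")
```

If this prints a path, the driver is at least discoverable from the environment in which you ran the check.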