{"id":18486859,"url":"https://github.com/jrodal98/paginated-table-extractor","last_synced_at":"2026-04-11T20:40:14.320Z","repository":{"id":53297254,"uuid":"137115403","full_name":"jrodal98/Paginated-Table-Extractor","owner":"jrodal98","description":"A python script that automates the extraction of data from paginated tables.","archived":false,"fork":false,"pushed_at":"2022-07-06T20:10:42.000Z","size":5797,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-02-16T22:19:01.303Z","etag":null,"topics":["data-extraction","selenium-python","selenium-webdriver","table-extraction","webscraping"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jrodal98.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-06-12T18:59:57.000Z","updated_at":"2019-07-04T23:01:03.000Z","dependencies_parsed_at":"2022-08-28T17:32:19.811Z","dependency_job_id":null,"html_url":"https://github.com/jrodal98/Paginated-Table-Extractor","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jrodal98%2FPaginated-Table-Extractor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jrodal98%2FPaginated-Table-Extractor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jrodal98%2FPaginated-Table-Extractor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jrodal98%2FPaginated-Table-Extractor/manifests","owner_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/owners/jrodal98","download_url":"https://codeload.github.com/jrodal98/Paginated-Table-Extractor/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254036843,"owners_count":22003654,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-extraction","selenium-python","selenium-webdriver","table-extraction","webscraping"],"created_at":"2024-11-06T12:49:54.028Z","updated_at":"2025-10-16T00:20:41.588Z","avatar_url":"https://github.com/jrodal98.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Paginated-Table-Extractor\nA python script that automates the extraction of data from paginated tables.\n\n![](simple.gif)\n\nThe above gif shows a table with 10,678 instances over 108 pages being extracted into a pandas dataframe in less than 10 seconds.  This was the code that produced that result:\n\n```python\nsimple_df = read_paginated_table(\n    \"https://cavdailyonline.github.io/facultysalarygryphon/\", \n    '#data-table-container',\n    '#data-table-container_wrapper \u003e div.dataTables_paginate.paging_bootstrap.pagination \u003e ul \u003e li.next \u003e a',\n    show_more_option='#data-table-container_length \u003e label \u003e select \u003e option:nth-child(4)',\n    delay=0)\n```\n\n## Download instructions\n1) Clone this repository.\n```bash\ngit clone https://github.com/jrodal98/Paginated-Table-Extractor.git\n```\n2) Install python dependencies.  
Something similar to this should do the job.\n```bash\ncd Paginated-Table-Extractor\npip3 install -r requirements.txt\n```\n3) Download chromedriver from [here](http://chromedriver.chromium.org/downloads).  Depending on your operating system, you might have to add it to your PATH, which is left as an exercise to the reader.  If the script complains that it cannot find the driver even though you installed it, the driver is not on your PATH and you need to add it.\n\n4) **Optional**: Run `test.py` to make sure that everything is working properly and to get a feel for how to use the script.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjrodal98%2Fpaginated-table-extractor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjrodal98%2Fpaginated-table-extractor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjrodal98%2Fpaginated-table-extractor/lists"}