# Table Parser

Repository: [alecxcode/table-parser](https://github.com/alecxcode/table-parser)

This simple Python program is an example parser. It reads an HTML table and some related pages, extracts data from the table cells and the related pages into a list and a dictionary, and stores the results in an SQLite database. I hope it will be useful to someone, since this task seems to be fairly common.

The parser works in these stages:

1. It gets the raw HTML page.
2. It finds all `<TD>` tags and extracts data from them.
3. It loads the pages related to these data and parses them.
4. It retrieves additional data from those pages. The code includes examples of both regular expressions and BeautifulSoup.
5. It saves everything to the database.

The code also includes snippets for saving and loading cookies, for logging in to a site with a POST request, and for a few other common tasks.

The software is released under a **BSD-like** license, so you can use it with almost no limitations.

## Program requirements

Python 3.8 or 3.9 works. Other versions have not been tested.  
You need to install the following packages:

* BeautifulSoup (tested with 4.9 or newer)
* Requests (tested with 2.24 or newer)

To install them, use pip:  
```
pip install beautifulsoup4 requests
```

## How to use this software  
Read the code carefully and **adapt it** to your needs. This code is an abstract example, not a ready-made tool.  
The `testfiles` directory contains some test files. You can upload them to a test website to see how the program handles them.  

#### Dictionary of elements and their types
Each element is stored in `elements_dict` under its `elem_ID` key, with the following fields:
```python
elements_dict[elem_ID]['elem_ID']        # integer
elements_dict[elem_ID]['name1']          # string
elements_dict[elem_ID]['name2']          # string
elements_dict[elem_ID]['name3']          # string
elements_dict[elem_ID]['date_and_time']  # datetime (stored as text in the DB)
elements_dict[elem_ID]['somecontent']    # string
elements_dict[elem_ID]['somelinks']      # list (stored as text in the DB)
elements_dict[elem_ID]['somedata']       # string
```
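The step that finds all `<TD>` tags and pulls their text out might look roughly like the minimal sketch below. The project itself uses BeautifulSoup, where the equivalent lookup is `BeautifulSoup(html, "html.parser").find_all("td")` followed by `cell.get_text(strip=True)` for each cell. This sketch swaps in the standard library's `html.parser` so that it runs without extra packages. `CellCollector` and `extract_cells` are hypothetical names, not functions from the repository.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell in document order."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self.cells = []       # finished cell texts
        self._buffer = None   # text pieces of the cell being read, or None

    def _flush(self):
        # Close the current cell, if any, and store its stripped text.
        if self._buffer is not None:
            self.cells.append("".join(self._buffer).strip())
            self._buffer = None

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so <TD> and <td> both match.
        if tag in ("td", "tr"):
            self._flush()  # HTML allows a closing </td> to be omitted
        if tag == "td":
            self._buffer = []

    def handle_data(self, data):
        # Text inside nested tags (e.g. <a> in a cell) is kept as well.
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "tr", "table"):
            self._flush()


def extract_cells(html):
    """Return the text of all <td> cells in an HTML string."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return parser.cells
```

A real run would fetch the page first, for example with `requests.get(url, timeout=30).text`, and then pass the result to `extract_cells`.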
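Once a related page is loaded, the README mentions pulling additional data from it with regular expressions. A minimal sketch of that idea follows. It fills two of the dictionary fields listed above: a list of links (the kind of value `somelinks` holds) and a `datetime` (the kind of value `date_and_time` holds). The patterns, the `YYYY-MM-DD HH:MM` date format, and the function names are assumptions for illustration; the real pages may need different patterns.

```python
import re
from datetime import datetime

# Hypothetical pattern: captures the value of href="..." or href='...' in <a> tags.
# Regular expressions are fragile on arbitrary HTML; use a parser for messy markup.
HREF_RE = re.compile(r"""<a\s[^>]*?href\s*=\s*["']([^"']+)["']""", re.IGNORECASE)

# Hypothetical date format: "2021-03-02 19:47".
DATE_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2} \d{2}:\d{2})\b")


def extract_links(html):
    """Return every href value found in <a> tags, in document order."""
    return HREF_RE.findall(html)


def extract_date(html):
    """Return the first date found as a datetime, or None if there is none."""
    match = DATE_RE.search(html)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M")
```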
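The README also mentions logging in with a POST request and saving and loading cookies. The following is a rough sketch of how that can be done with a Requests `Session`. The login URL and the form field names (`username`, `password`) are hypothetical, so check the target site's login form. The cookie file format (a JSON name-to-value map) is also an assumption, not necessarily what the repository uses.

```python
import json
from pathlib import Path

import requests


def login(session, url, username, password):
    """POST a login form; cookies set by the server are kept in session.cookies."""
    # Hypothetical field names: adjust them to the site's actual <form> inputs.
    response = session.post(
        url, data={"username": username, "password": password}, timeout=30
    )
    response.raise_for_status()
    return response


def save_cookies(session, path):
    """Write the session's cookies to a JSON file as a name -> value mapping."""
    cookies = requests.utils.dict_from_cookiejar(session.cookies)
    Path(path).write_text(json.dumps(cookies), encoding="utf-8")


def load_cookies(session, path):
    """Load cookies written by save_cookies() into the session, if the file exists."""
    cookie_file = Path(path)
    if cookie_file.exists():
        cookies = json.loads(cookie_file.read_text(encoding="utf-8"))
        requests.utils.add_dict_to_cookiejar(session.cookies, cookies)
```

This format keeps only cookie names and values, so it drops domain, path, and expiry. If those matter, a file-backed jar such as `http.cookiejar.LWPCookieJar` keeps them.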
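The field listing above notes that `date_and_time` and `somelinks` are kept as text in the database. One way to do that with the standard `sqlite3` module is sketched below. The `datetime` is stored as ISO 8601 text, and the list is stored as JSON text. Both are converted back when the row is read. The table name `elements`, the column types, and the helper names are assumptions, not the repository's actual schema.

```python
import json
import sqlite3
from datetime import datetime

# Column order for the hypothetical "elements" table; names follow elements_dict.
FIELDS = ("elem_ID", "name1", "name2", "name3",
          "date_and_time", "somecontent", "somelinks", "somedata")


def save_elements(conn, elements_dict):
    """Insert or update every element: datetime -> ISO 8601 text, list -> JSON text."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS elements ("
        "elem_ID INTEGER PRIMARY KEY, name1 TEXT, name2 TEXT, name3 TEXT, "
        "date_and_time TEXT, somecontent TEXT, somelinks TEXT, somedata TEXT)"
    )
    rows = []
    for element in elements_dict.values():
        row = dict(element)  # copy so the caller's dictionary is not modified
        row["date_and_time"] = row["date_and_time"].isoformat()
        row["somelinks"] = json.dumps(row["somelinks"])
        rows.append(tuple(row[field] for field in FIELDS))
    placeholders = ", ".join("?" * len(FIELDS))
    conn.executemany(
        f"INSERT OR REPLACE INTO elements ({', '.join(FIELDS)}) VALUES ({placeholders})",
        rows,
    )
    conn.commit()


def load_element(conn, elem_ID):
    """Read one element back and restore its datetime and list fields."""
    row = conn.execute(
        f"SELECT {', '.join(FIELDS)} FROM elements WHERE elem_ID = ?", (elem_ID,)
    ).fetchone()
    if row is None:
        return None
    element = dict(zip(FIELDS, row))
    element["date_and_time"] = datetime.fromisoformat(element["date_and_time"])
    element["somelinks"] = json.loads(element["somelinks"])
    return element
```

The `?` placeholders let `sqlite3` quote the scraped values. This is safer than formatting them into the SQL string, because scraped text can contain quotes.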