{"id":21694386,"url":"https://github.com/andre-seiji/python-data-web-scraping-example","last_synced_at":"2026-04-17T02:33:57.044Z","repository":{"id":174490821,"uuid":"389665858","full_name":"Andre-Seiji/Python-data-Web-Scraping-example","owner":"Andre-Seiji","description":"Web Scraping html, pandas DataFrame conversion, data validation and export to Excel file. A COVID-19 database was used as an example. ","archived":false,"fork":false,"pushed_at":"2021-07-27T12:29:37.000Z","size":650,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-20T14:28:38.881Z","etag":null,"topics":["covid-19","export-to-excel","html","pandas-dataframe","python","selenium","webscraping"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Andre-Seiji.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-07-26T14:41:32.000Z","updated_at":"2021-07-30T15:38:12.000Z","dependencies_parsed_at":"2023-07-04T19:35:55.991Z","dependency_job_id":null,"html_url":"https://github.com/Andre-Seiji/Python-data-Web-Scraping-example","commit_stats":null,"previous_names":["andre-seiji/python-data-web-scraping-example"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Andre-Seiji/Python-data-Web-Scraping-example","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Andre-Seiji%2FPython-data-Web-Scraping-example","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Andre-Seiji%2FPython-data-Web-Scraping-example/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Andre-Seiji%2FPython-data-Web-Scraping-example/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Andre-Seiji%2FPython-data-Web-Scraping-example/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Andre-Seiji","download_url":"https://codeload.github.com/Andre-Seiji/Python-data-Web-Scraping-example/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Andre-Seiji%2FPython-data-Web-Scraping-example/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31912513,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-16T18:22:33.417Z","status":"online","status_checked_at":"2026-04-17T02:00:06.879Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["covid-19","export-to-excel","html","pandas-dataframe","python","selenium","webscraping"],"created_at":"2024-11-25T18:28:11.366Z","updated_at":"2026-04-17T02:33:56.363Z","avatar_url":"https://github.com/Andre-Seiji.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Python data Web Scraping example\nThe main goal of this project is to validate a COVID-19 database by checking whether the values are correct for each region.\nTo do that, the code 
has four steps: web scraping of a COVID-19 database, conversion to pandas DataFrames, data validation, and export to an Excel file.\n\n# 1. Web Scraping:\nUsing Selenium, the code accesses the web page https://worldometers.info/coronavirus/. The cookie consent elements on the page must be accepted first.\n\n![Webpage](https://user-images.githubusercontent.com/87708237/127035286-5972f207-899c-45c4-97d1-ce65bd57fbcd.JPG)\n(COVID-19 database)\n\n# 2. Pandas DataFrame conversion:\nFor each region (Europe, North America, Asia, South America, Africa and Oceania), the code searches the entire HTML until that region's table is found. The table is then converted into a pandas DataFrame. This was the most difficult step, because the table and its values had to be modified: 'NaN' values were changed to zeros and non-numeric objects were converted to numeric values.\n\n# 3. Data validation:\nThe validation test checks whether the sum of the values of all countries in a region equals the total value reported for that region. This check is repeated for every column.\n\n![Validation criteria](https://user-images.githubusercontent.com/87708237/127041123-e84fbbc8-9a91-4a3d-8e17-d7bc8ffa1753.jpg)\n(Validation test)\n\n# 4. Export to Excel file:\nThe code was written in Google Colab. The image below shows where the generated Excel file can be downloaded.\n\n![Export_to_Excel](https://user-images.githubusercontent.com/87708237/127042603-28a8ce7c-98cd-41ea-a492-216095b2c362.jpg)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandre-seiji%2Fpython-data-web-scraping-example","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fandre-seiji%2Fpython-data-web-scraping-example","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandre-seiji%2Fpython-data-web-scraping-example/lists"}