{"id":19002997,"url":"https://github.com/shah0150/grab_data","last_synced_at":"2025-07-30T19:33:41.603Z","repository":{"id":124233428,"uuid":"93647363","full_name":"shah0150/grab_data","owner":"shah0150","description":null,"archived":false,"fork":false,"pushed_at":"2017-06-07T15:15:08.000Z","size":1,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-21T13:44:00.057Z","etag":null,"topics":["python","scrapping-python"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/shah0150.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-06-07T14:54:50.000Z","updated_at":"2017-06-07T15:21:23.000Z","dependencies_parsed_at":null,"dependency_job_id":"e869feea-800b-480e-9137-cc0fb5e56514","html_url":"https://github.com/shah0150/grab_data","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/shah0150/grab_data","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shah0150%2Fgrab_data","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shah0150%2Fgrab_data/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shah0150%2Fgrab_data/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shah0150%2Fgrab_data/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/shah0150","download_url":"https://codeload.github.com/shah0150/grab_data/t
ar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shah0150%2Fgrab_data/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267928985,"owners_count":24167431,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-30T02:00:09.044Z","response_time":70,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["python","scrapping-python"],"created_at":"2024-11-08T18:17:38.204Z","updated_at":"2025-07-30T19:33:41.436Z","avatar_url":"https://github.com/shah0150.png","language":"Python","readme":"# Easy way to scrape data\n\nLet’s say you are searching the web for some raw data you need for a project and you stumble across a web page that has it.\n\nThe bad news is that the data lives inside the web page and there’s no API you can use to grab it. So now you have to waste 30 minutes throwing together a crappy script to download and parse out the data. It’s not hard, but it’s a waste of time that you could spend on something useful. And somehow 30 minutes always ends up being 2 hours.\n\nLuckily, there’s a super simple answer. The Pandas library has a built-in method called `read_html()` that extracts tabular data from HTML pages:\n\n```python\nimport pandas as pd\n\ntables = pd.read_html(\"http://apps.sandiego.gov/sdfiredispatch/\")\n\nprint(tables[0])\n```\nIt’s that simple! 
Pandas will find any significant HTML tables on the page and return each one as a new DataFrame object.\n\nTo upgrade our program from toy to real, let’s tell Pandas that row 0 of the table has the column headers and ask it to convert the text-based dates into datetime objects:\n\n```python\nimport pandas as pd\n\ncalls_df, = pd.read_html(\"http://apps.sandiego.gov/sdfiredispatch/\", header=0, parse_dates=[\"Call Date\"])\n\nprint(calls_df)\n```\n\nAnd now that the data lives in a DataFrame, the world is yours. Wish the data were available as JSON records? That’s just one more line of code!\n\n```python\nimport pandas as pd\n\ncalls_df, = pd.read_html(\"http://apps.sandiego.gov/sdfiredispatch/\", header=0, parse_dates=[\"Call Date\"])\n\nprint(calls_df.to_json(orient=\"records\", date_format=\"iso\"))\n```\n\nYou can even save the data right to a CSV or Excel file:\n\n```python\nimport pandas as pd\n\ncalls_df, = pd.read_html(\"http://apps.sandiego.gov/sdfiredispatch/\", header=0, parse_dates=[\"Call Date\"])\n\ncalls_df.to_csv(\"calls.csv\", index=False)\n```\n\nNone of this is rocket science or anything, but I use it so often that I thought it was worth sharing. Have fun!\n\n\n# A question you might have\n\n`calls_df, = pd.read_html(…)`\n\nWhat is the purpose of the comma after the variable name?\n\n--\u003e This is iterable unpacking (often called tuple unpacking). `pd.read_html()` returns a list of DataFrames, and the trailing comma unpacks a one-element list straight into the variable without an extra line of code. As a bonus, it raises a ValueError if the page doesn’t contain exactly one table.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshah0150%2Fgrab_data","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshah0150%2Fgrab_data","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshah0150%2Fgrab_data/lists"}