{"id":16992806,"url":"https://github.com/jamesturk/spatula","last_synced_at":"2025-04-05T01:04:33.103Z","repository":{"id":41117565,"uuid":"82637261","full_name":"jamesturk/spatula","owner":"jamesturk","description":"A modern Python library for writing maintainable web scrapers.","archived":false,"fork":false,"pushed_at":"2024-07-10T07:18:10.000Z","size":1321,"stargazers_count":247,"open_issues_count":9,"forks_count":11,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-03-29T00:05:07.788Z","etag":null,"topics":["hacktoberfest","python3","scraping"],"latest_commit_sha":null,"homepage":"https://jamesturk.github.io/spatula/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jamesturk.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"docs/contributing.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":"docs/code_of_conduct.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":["jamesturk"]}},"created_at":"2017-02-21T04:49:00.000Z","updated_at":"2025-03-21T18:47:34.000Z","dependencies_parsed_at":"2024-10-30T17:02:55.243Z","dependency_job_id":"5344a3a6-d3cb-4ef8-99c4-9fc4557810b4","html_url":"https://github.com/jamesturk/spatula","commit_stats":{"total_commits":288,"total_committers":4,"mean_commits":72.0,"dds":0.04166666666666663,"last_synced_commit":"9ff3678740eec19ae528aafc5f4f12d41c2f35b8"},"previous_names":[],"tags_count":21,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jamesturk%2Fspatula","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jamesturk%2Fspatula/tags","releases_url":"https:
//repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jamesturk%2Fspatula/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jamesturk%2Fspatula/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jamesturk","download_url":"https://codeload.github.com/jamesturk/spatula/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247271519,"owners_count":20911587,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["hacktoberfest","python3","scraping"],"created_at":"2024-10-14T03:30:15.763Z","updated_at":"2025-04-05T01:04:33.082Z","avatar_url":"https://github.com/jamesturk.png","language":"Python","funding_links":["https://github.com/sponsors/jamesturk"],"categories":[],"sub_categories":[],"readme":"# Overview\n\n*spatula* is a modern Python library for writing maintainable web scrapers.\n\nSource: [https://github.com/jamesturk/spatula](https://github.com/jamesturk/spatula)\n\nDocumentation: [https://jamesturk.github.io/spatula/](https://jamesturk.github.io/spatula/)\n\nIssues: [https://github.com/jamesturk/spatula/issues](https://github.com/jamesturk/spatula/issues)\n\n[![PyPI badge](https://badge.fury.io/py/spatula.svg)](https://badge.fury.io/py/spatula)\n[![Test badge](https://github.com/jamesturk/spatula/workflows/Test%20\u0026%20Lint/badge.svg)](https://github.com/jamesturk/spatula/actions?query=workflow%3A%22Test+%26+Lint%22)\n\n## Features\n\n- **Page-oriented design**: Encourages writing understandable \u0026 maintainable scrapers.\n- **Not Just HTML**: Provides 
built-in [handlers for common data formats](https://jamesturk.github.io/spatula/reference/#pages) including CSV, JSON, XML, PDF, and Excel.  Or write your own.\n- **Fast HTML parsing**: Uses `lxml.html` for fast, consistent, and reliable parsing of HTML.\n- **Flexible Data Model Support**: Compatible with `dataclasses`, `attrs`, `pydantic`, or bring your own data model classes for storing \u0026 validating your scraped data.\n- **CLI Tools**: Offers several [CLI utilities](https://jamesturk.github.io/spatula/cli/) that can help streamline the development \u0026 testing cycle.\n- **Fully Typed**: Makes full use of Python 3 type annotations.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjamesturk%2Fspatula","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjamesturk%2Fspatula","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjamesturk%2Fspatula/lists"}