{"id":18639095,"url":"https://github.com/zseta/scrapy-templates","last_synced_at":"2025-06-26T00:03:12.895Z","repository":{"id":37444935,"uuid":"344554877","full_name":"zseta/scrapy-templates","owner":"zseta","description":null,"archived":false,"fork":false,"pushed_at":"2021-03-04T17:38:47.000Z","size":6,"stargazers_count":20,"open_issues_count":0,"forks_count":2,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-06-26T00:02:09.245Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zseta.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-03-04T17:27:18.000Z","updated_at":"2024-12-31T13:59:12.000Z","dependencies_parsed_at":"2022-08-19T17:01:00.445Z","dependency_job_id":null,"html_url":"https://github.com/zseta/scrapy-templates","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/zseta/scrapy-templates","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zseta%2Fscrapy-templates","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zseta%2Fscrapy-templates/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zseta%2Fscrapy-templates/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zseta%2Fscrapy-templates/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zseta","download_url":"https://codeload.github.com/zseta/scrapy-templates/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zs
eta%2Fscrapy-templates/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261973721,"owners_count":23238585,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-07T05:45:01.882Z","updated_at":"2025-06-26T00:03:12.773Z","avatar_url":"https://github.com/zseta.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# scrapy-templates\nScrapy spider templates for different kinds of websites.\n\n## How to figure out which template you need?\n\nThe quickest way is to figure out your **crawling logic** first, what requests you \nneed to make to get to the data, in terms of behaviour \u0026 deepness:\n\n### Behaviour\n\nBehaviour describes what the spider does on the level it's on at the moment. \nThis can be:\n\n- **Extraction** (extract data fields)\n- **Following** (making a request to go deeper in the website, e.g. from `example.com` \n  to `example.com/page`)\n- **Pagination** (making a request to paginate - spider stays on the same level, e.g.\n  from `example.com/page/1` to `example.com/page/2`)\n\n### Deepness\n\nDeepness means how deep your spider is at the moment (while crawling), \nrelative to the start URL.\n\nFor example, if your spider starts at `example.com`, that's *level 0*. If you then make a request on \nthat page to `example.com/page`, your spider is on *level 1*. Then, if you \ngo to `example.com/page/sub-page`, that's *level 2*.\n\n## Currently available templates\n\n### -  [Ext.py](templates/Ext.py)\n1. 
Extract data fields (level 0)\n\n### - [ExtPag.py](templates/ExtPag.py)\n1. Extract data fields + paginate (level 0)\n\n### - [Fol_Ext.py](templates/Fol_Ext.py)\n1. Follow urls (level 0)\n2. Extract data fields (level 1)\n\n### - [Fol_ExtPag.py](templates/Fol_ExtPag.py)\n1. Follow urls (level 0)\n2. Extract data fields + paginate (level 1)\n\n### - [Fol_Fol_Ext.py](templates/Fol_Fol_Ext.py)\n1. Follow urls (level 0)\n2. Follow urls (level 1)\n3. Extract data fields + paginate (level 2)\n\n### - [Fol_FolPag_Ext.py](templates/Fol_FolPag_Ext.py)\n1. Follow urls (level 0)\n2. Follow urls + paginate (level 1)\n3. Extract data fields (level 2)\n\n### - [FolPag_Ext.py](templates/FolPag_Ext.py)\n1. Follow urls + paginate (level 0)\n2. Extract data fields (level 1)\n\n### - [sitemap.py](templates/sitemap.py)\n1. Extract data from sitemap\n\n## File naming convention\n\nEach template file name is supposed to clearly show the crawling logic of the \nspider. So once you know the crawling logic you need for the website \nand understand the naming convention of the files, you should be able to \npick your template.\n\nA template file's name contains all the behaviours a spider performs, which can be:\n\n- **Extraction** --\u003e represented as `Ext` in the file name\n- **Following** --\u003e represented as `Fol` in the file name\n- **Pagination** --\u003e represented as `Pag` in the file name\n\nTwo behaviours in the file name are separated by an `_` (underscore) if the \nsecond behaviour is done one level deeper. If they are not separated by an \nunderscore, that means they happen on the same level.\n\n## How to contribute?\n\n### Submit a new template\nThe most useful thing you can do is to submit a new spider template which \nhasn't been made yet. You can do this as follows:\n\n1. Fork this repo\n2. Add the new template file you created\n3. 
Submit a pull request according to the guidelines\n\n### Request a new template to be made\nIf you have an idea for a template but don't feel like submitting a pull \nrequest, create an issue about it. Someone may pick it up and implement it.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzseta%2Fscrapy-templates","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzseta%2Fscrapy-templates","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzseta%2Fscrapy-templates/lists"}