{"id":22330445,"url":"https://github.com/tuvimen/stackexchange-scraper","last_synced_at":"2025-06-26T20:10:25.647Z","repository":{"id":74846543,"uuid":"565180643","full_name":"TUVIMEN/stackexchange-scraper","owner":"TUVIMEN","description":"A bash script for scraping stackexchange forums to json","archived":false,"fork":false,"pushed_at":"2024-05-30T18:39:45.000Z","size":175,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-26T07:09:29.755Z","etag":null,"topics":["bash","json","reliq","scraper","stackexchange","stackoverflow"],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TUVIMEN.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-11-12T15:37:46.000Z","updated_at":"2024-09-11T12:18:30.000Z","dependencies_parsed_at":"2023-12-18T20:56:10.790Z","dependency_job_id":"a4db00a7-de27-4158-a3a3-b78be1b74c7d","html_url":"https://github.com/TUVIMEN/stackexchange-scraper","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/TUVIMEN/stackexchange-scraper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TUVIMEN%2Fstackexchange-scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TUVIMEN%2Fstackexchange-scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TUVIMEN%2Fstackexchange-scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/repositories/TUVIMEN%2Fstackexchange-scraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TUVIMEN","download_url":"https://codeload.github.com/TUVIMEN/stackexchange-scraper/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TUVIMEN%2Fstackexchange-scraper/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262137149,"owners_count":23264675,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bash","json","reliq","scraper","stackexchange","stackoverflow"],"created_at":"2024-12-04T04:06:54.926Z","updated_at":"2025-06-26T20:10:25.639Z","avatar_url":"https://github.com/TUVIMEN.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Archive\n\nFurther development has been transferred to [forumscraper](https://github.com/TUVIMEN/forumscraper).\n\n# stackexchange-scraper\n\nA bash script for scraping Stack Exchange forums to JSON.\n\n## Requirements\n\n - [reliq](https://github.com/TUVIMEN/reliq)\n - [jq](https://github.com/stedolan/jq)\n\n## Installation\n\n    install -m 755 stackexchange-scraper /usr/bin\n\n## Supported sites\n\nAll supported sites can be found [here](https://stackexchange.com/sites).\n\n## Supported link formats\n\n    https://example.stackexchange.com\n    https://stackoverflow.com\n    https://example.stackexchange.com/questions\n    https://example.stackexchange.com/tags\n    https://example.stackexchange.com/questions/tagged/some-tag\n    https://example.stackexchange.com/questions/19924\n    
https://example.stackexchange.com/questions/19924/issue-title\n\n## JSON format\n\nHere's the [JSON](example.json) produced from one of the most popular questions on [Stack Overflow](https://stackoverflow.com/questions/927358/how-do-i-undo-the-most-recent-local-commits-in-git).\n\n## Usage\n\n    stackexchange-scraper directory [URLS...]\n\nAll options should be specified before the directory.\n\nThe script writes each question to its own file in the directory, named after the question's id; existing files are not overwritten.\n\nEach page holds 50 questions.\n\nSince question ids overlap between different sites, it is recommended to download each site into a separate directory.\n\nDownload all tags to directory x with a delay of 0.2 seconds and randomness of 3 seconds\n\n    stackexchange-scraper -d 0.2 -r 3 x 'https://stackoverflow.com/tags'\n\nDownload all questions tagged python to directory x using 5 processes\n\n    stackexchange-scraper -p 5 x 'https://stackoverflow.com/questions/tagged/python'\n\nDownload all questions from the 50th page to the 75th page to directory x\n\n    stackexchange-scraper -f 50 -l 75 x 'https://stackoverflow.com'\n\nDownload some questions to directory x\n\n    stackexchange-scraper x 'https://stackoverflow.com/questions/3737139/reference-what-does-this-symbol-mean-in-php' 'https://stackoverflow.com/questions/60174/how-can-i-prevent-sql-injection-in-php' 'https://stackoverflow.com/questions/391005/how-can-i-add-html-and-css-into-pdf'\n\nGet some help\n\n    stackexchange-scraper -h\n\n## Issues\n\n### Too Many Requests\n\nBe warned that you should limit your requests across all websites in the Stack Exchange family, as the protection counts requests to all of them together.\n\n### Large directory feature is not enabled on this filesystem\n\nStoring a lot of files in a single directory (around 7 million) will cause bash to report a lack of space, even when there are free space and free inodes.\n\n`dmesg` will show something like this:\n\n    
[979260.318526] EXT4-fs warning (device sda1): ext4_dx_add_entry:2516: Directory (ino: 89391105) index full, reach max htree level :2\n    [979260.318528] EXT4-fs warning (device sda1): ext4_dx_add_entry:2520: Large directory feature is not enabled on this filesystem\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftuvimen%2Fstackexchange-scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftuvimen%2Fstackexchange-scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftuvimen%2Fstackexchange-scraper/lists"}