{"id":23726006,"url":"https://github.com/voidful/exam-tw-crawler","last_synced_at":"2025-08-04T01:37:44.249Z","repository":{"id":156327280,"uuid":"632959277","full_name":"voidful/exam-tw-crawler","owner":"voidful","description":null,"archived":false,"fork":false,"pushed_at":"2023-04-26T13:41:06.000Z","size":3,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-05-23T17:08:54.641Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/voidful.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-26T13:34:03.000Z","updated_at":"2023-04-26T13:39:31.000Z","dependencies_parsed_at":null,"dependency_job_id":"9ed1c916-5670-4538-8e21-e3d18987d669","html_url":"https://github.com/voidful/exam-tw-crawler","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/voidful/exam-tw-crawler","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/voidful%2Fexam-tw-crawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/voidful%2Fexam-tw-crawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/voidful%2Fexam-tw-crawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/voidful%2Fexam-tw-crawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/voidful","download_url":"https://codeload.github.com/voidful/exa
m-tw-crawler/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/voidful%2Fexam-tw-crawler/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268638018,"owners_count":24282465,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-03T02:00:12.545Z","response_time":2577,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-31T00:18:08.235Z","updated_at":"2025-08-04T01:37:44.234Z","avatar_url":"https://github.com/voidful.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Web Scraper for Exam Papers from exam.naer.edu.tw\n\nThis is a Python script for scraping exam papers from the website exam.naer.edu.tw. The script uses the `requests`, `bs4`, and `tqdm` libraries to send HTTP requests, parse HTML, and display a progress bar, respectively.\n\n## Requirements\n\n- Python 3.x\n- `requests`\n- `bs4` (BeautifulSoup)\n- `tqdm`\n\nYou can install the required libraries by running the following command:\n\n```\npip install -r requirements.txt\n```\n\n## Usage\n\n1. Clone this repository:\n\n```\ngit clone https://github.com/your_username/exam-papers-scraper.git\ncd exam-papers-scraper\n```\n\n2. 
Run the script:\n\n```\npython scraper.py\n```\n\nThe script scrapes exam papers from page 1 through page 4767 (the total as of September 2021) and saves the results in a JSON file named `data.json`. If the file already exists, the script resumes scraping from the last page recorded in the file.\n\nYou can change the number of pages to scrape by modifying the `total_pages` variable in the script.\n\n## Output\n\nEach record in `data.json` contains the following fields:\n\n- \"縣市\": The city or county where the school is located.\n- \"學校名稱\": The name of the school.\n- \"年級\": The grade level.\n- \"學年度\": The academic year.\n- \"領域/群科\": The learning area or subject group of the exam paper.\n- \"科目\": The subject of the exam paper.\n- \"種類\": The type of the exam paper.\n- \"版本\": The version of the exam paper.\n- \"點閱率\": The number of views of the exam paper.\n- \"下載試卷\": The download link for the exam paper.\n- \"下載答案\": The download link for the answer key.\n- \"page\": The page number where the record was found.\n\n## License\n\nThis script is licensed under the [MIT License](LICENSE). Feel free to use, modify, and distribute the script as long as you include the original license file.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvoidful%2Fexam-tw-crawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvoidful%2Fexam-tw-crawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvoidful%2Fexam-tw-crawler/lists"}