https://github.com/mccranky83/aistudy-docs-crawler

Crawler for the Shanghai Primary and Secondary School Digital Teaching System (上海市中小学数字教学系统)

crawler hoarding puppeteer



Host: GitHub
URL: https://github.com/mccranky83/aistudy-docs-crawler
Owner: Mccranky83
Archived: true
Created: 2024-07-23T14:04:36.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2024-10-23T12:57:49.000Z (over 1 year ago)
Last Synced: 2025-02-16T00:23:10.282Z (over 1 year ago)
Topics: crawler, hoarding, puppeteer
Language: JavaScript
Homepage:
Size: 1.16 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


README

# AiStudy Documents Crawler



> Following the official notice that the legacy version's transition period ends on July 31, 2025 (`本旧版过渡使用期至2025年7月31日`), I'm archiving this project.

This project is a web crawler that collects educational resources from the [Shanghai Primary and Secondary School Digital Teaching System (上海市中小学数字教学系统)](https://sz-api.ai-study.net/).

> The crawler uses Puppeteer, a Node.js library, to drive a Chromium browser with human-like interactions and extract download links.
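
As a rough illustration of that step, the sketch below opens a page with Puppeteer and collects the links it finds. The function names (`normalizeLinks`, `crawlLinks`) and the `a[href]` selector are assumptions made for this example; the project's actual selectors and navigation logic are not shown in this README.

```javascript
// Minimal sketch: gather candidate download links from a rendered page.
// Function names and the default selector are illustrative assumptions.

// Pure helper: keep only absolute http(s) URLs and drop duplicates.
function normalizeLinks(hrefs) {
  const seen = new Set();
  for (const href of hrefs) {
    if (typeof href === "string" && /^https?:\/\//i.test(href)) {
      seen.add(href);
    }
  }
  return [...seen];
}

// Drives a browser with Puppeteer and returns the links found at `url`.
async function crawlLinks(url, { headless = true, selector = "a[href]" } = {}) {
  // Loaded lazily so the helper above works without the dependency installed.
  const puppeteer = require("puppeteer");
  const browser = await puppeteer.launch({ headless });
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: "networkidle2" });
    // $$eval runs the callback inside the page against all matching elements.
    const hrefs = await page.$$eval(selector, (anchors) =>
      anchors.map((a) => a.href)
    );
    return normalizeLinks(hrefs);
  } finally {
    await browser.close(); // always release the browser process
  }
}
```

Keeping the filtering in a pure helper means it can be checked without launching a browser.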

> It then uses the `curl` command-line utility to recursively download those resources to the local system.
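
One way to picture the download step is to walk a nested map of links and hand each URL to `curl`. The sketch below assumes a hypothetical linkmap shape (`{ folder: { fileName: url } }`) and a hypothetical `downloads` output directory; the real linkmap format may differ.

```javascript
const path = require("path");
const { execFileSync } = require("child_process");

// Recursively walk a nested linkmap; string leaves are treated as URLs.
// The shape { folder: { fileName: url } } is an assumption for illustration.
function flattenLinkmap(node, dir = "downloads") {
  const tasks = [];
  for (const [name, value] of Object.entries(node)) {
    const target = path.join(dir, name);
    if (typeof value === "string") {
      tasks.push({ url: value, dest: target });
    } else if (value && typeof value === "object") {
      tasks.push(...flattenLinkmap(value, target));
    }
  }
  return tasks;
}

// Build the argument list for one curl invocation:
// fail on HTTP errors, follow redirects, create missing directories.
function buildCurlArgs({ url, dest }) {
  return ["--fail", "--location", "--create-dirs", "--output", dest, url];
}

// Download every entry; execFileSync passes arguments without a shell,
// so URLs are not subject to shell interpolation.
function downloadAll(linkmap, dir) {
  for (const task of flattenLinkmap(linkmap, dir)) {
    execFileSync("curl", buildCurlArgs(task), { stdio: "inherit" });
  }
}
```

Calling `downloadAll(linkmap, "downloads")` would mirror the linkmap's nesting as directories on disk.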



  



## Installation

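```bash
# Clone the repository and enter it
git clone https://github.com/mccranky83/aistudy-docs-crawler.git
cd aistudy-docs-crawler

npm i         # Installs project dependencies, including a compatible Chrome build
npm run start # Executes the start script, which runs `app/start.js`
```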

## Examples

**Crawl first**


prompt> npm run start

Directly download or crawl first? (d/C) 

Run in headless mode? (Y/n) 

subjectIndex [1-17]: 1

Crawl documents or answer sheets? (D/a) 

subjectIndex [1-2]: 1

Startup grade [Default: 0]: 

Offset [Default: 100]: 

Startup semester [Default: 0]: 

Offset [Default: 100]: 

Startup unit [Default: 0]: 

Offset [Default: 100]: 

Startup course [Default: 0]: 

Offset [Default: 100]: 
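
The startup/offset prompts above read like a window over each level (grade, semester, unit, course). The sketch below shows one plausible interpretation: start at index `startup` and take at most `offset` items, with an empty answer falling back to the default. Both helper names and this interpretation are assumptions, not taken from the project's code.

```javascript
// Parse a prompt answer as a non-negative integer, falling back to a default
// when the user just presses Enter or types something invalid.
function parseIntOrDefault(answer, fallback) {
  const value = Number.parseInt(String(answer).trim(), 10);
  return Number.isInteger(value) && value >= 0 ? value : fallback;
}

// Assumed meaning of the prompts: begin at index `startup`
// and take at most `offset` items from that point.
function selectWindow(items, startup = 0, offset = 100) {
  return items.slice(startup, startup + offset);
}

// Example: an empty answer keeps the defaults shown in the prompts.
const grades = ["grade-a", "grade-b", "grade-c", "grade-d"];
const startup = parseIntOrDefault("", 0);
const offset = parseIntOrDefault("", 100);
const selected = selectWindow(grades, startup, offset);
```

With the defaults (0 and 100), a level with fewer than 100 entries is selected in full.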



**Direct download**


prompt> npm run start

Directly download or crawl first? (d/C) d

劳动 - 6.json

sitemapName:



**No available linkmaps**


prompt> npm run start

Directly download or crawl first? (d/C) d

No linkmaps available!
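
The direct-download examples suggest the tool looks for saved linkmap files (such as `劳动 - 6.json`) before prompting. A minimal sketch of that check follows; the directory argument and both function names are hypothetical, since the README does not say where linkmaps are stored.

```javascript
const fs = require("fs");
const path = require("path");

// Return the names of saved linkmap files (*.json) in `dir`, sorted.
// A missing directory is treated the same as an empty one.
function listLinkmaps(dir) {
  if (!fs.existsSync(dir)) return [];
  return fs
    .readdirSync(dir)
    .filter((name) => path.extname(name).toLowerCase() === ".json")
    .sort();
}

// Produce the text a direct-download mode might show:
// the available linkmaps, or the fallback message when there are none.
function describeLinkmaps(dir) {
  const files = listLinkmaps(dir);
  return files.length === 0 ? "No linkmaps available!" : files.join("\n");
}
```

Crawling first would populate the directory, after which the same call lists the files to choose from.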
