{"id":29244172,"url":"https://github.com/zeeklog/csdn-crawler","last_synced_at":"2026-03-03T19:01:35.218Z","repository":{"id":46142062,"uuid":"515100188","full_name":"zeeklog/csdn-crawler","owner":"zeeklog","description":"A Node.js Crawler for csdn.com. node 爬虫，csdn 爬虫, 爬取csdn 用户的全部文章。代码仅用于测试和交流学习，请勿用于不良用途。","archived":false,"fork":false,"pushed_at":"2024-08-23T02:00:54.000Z","size":138,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-08-09T04:05:29.259Z","etag":null,"topics":["crawl-user-article","csdn","csdn-crawler","csdn-docs","csdnspider","node-spider"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zeeklog.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-07-18T08:38:15.000Z","updated_at":"2024-08-23T02:00:57.000Z","dependencies_parsed_at":"2024-10-23T13:54:34.476Z","dependency_job_id":null,"html_url":"https://github.com/zeeklog/csdn-crawler","commit_stats":null,"previous_names":["zeeklog/csdn-crawler","ethwillupto10000/csdn-crawler"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/zeeklog/csdn-crawler","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zeeklog%2Fcsdn-crawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zeeklog%2Fcsdn-crawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zeeklog%2Fcsdn-crawler/releases","manifests_url":"https://repos.ecosyste.ms/api
/v1/hosts/GitHub/repositories/zeeklog%2Fcsdn-crawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zeeklog","download_url":"https://codeload.github.com/zeeklog/csdn-crawler/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zeeklog%2Fcsdn-crawler/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30056056,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-03T18:21:05.932Z","status":"ssl_error","status_checked_at":"2026-03-03T18:20:59.341Z","response_time":61,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawl-user-article","csdn","csdn-crawler","csdn-docs","csdnspider","node-spider"],"created_at":"2025-07-03T21:08:59.080Z","updated_at":"2026-03-03T19:01:35.197Z","avatar_url":"https://github.com/zeeklog.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# A Node.js crawler for fetching a user's articles from csdn.com\n\u003e For Node.js applications only; it does not work in the browser.\n\n- Set `options.username` to get that user's article list (default length is 5).\n- Set `options.qiniu\u003cobject\u003e` to upload the articles' images to your own Qiniu Cloud storage.\n- Set `options.page` and `options.size` to set the page index and page size for the 
API\n\n### Why would I code this?\n\n- \u003e I wanted some data to fill my database for big-data testing, but writing it myself seemed too hard (because I am so lazy).\n- \u003e Maybe many coders face the same problem, so let me make this job easier.\n- \u003e WARN: This repo is only for testing and study; do not use it to run pressure tests on csdn.com.\n  \u003e And CSDN sucks!\n\n### How it works\n\n```shell\n# dependencies\n cheerio\n html-to-md\n pinyin\n request-promise\n \n # Fetch the target document with request-promise\n # Parse the HTML document with cheerio to get the article content\n # Convert the HTML content to Markdown with html-to-md\n # Generate the article alias with pinyin\n \n```\n\n### Usage\n\n#### 1. Fill in your own config\n```javascript\n// Example:\nconst options = {\n    username: 'weixin_45534242', // target username\n    page: 1, // index of the page to crawl\n    size: 5, // page size\n    link: '', // the user-center article list API; find it on csdn.com with the browser dev tools (F12)\n    businessType: 'blog', // article type to crawl; only 'blog' is supported for now\n    sleepTime: null, // in ms; sleep time while crawling, which may keep your IP from being blocked\n    supportImageType: ['jpg', 'png', 'jpeg', 'webp', 'gif', 'mp4', 'bmp', 'svg'], // file types supported for upload\n    imagePrefixName: 'crawl-', // name prefix for uploaded images\n    contentNodeIdentify: '#article_content', // selector (HTML id) of the article content node\n    qiniu: {\n        zone: '', // your Qiniu Cloud zone\n        scope: '', // your Qiniu scope (storage) name\n        useHttpsDomain: true, // as the name says, 
this is the HTTPS setting\n        useCdnDomain: true, // use your CDN domain; it is used for article list images\n        baseQiNiuCdnApi: '', // your CDN domain name\n        remoteFilePath: '/openStatic', // folder path where images are saved\n        isNeedWaterMark: false, // if `true`, you must provide the Qiniu image style name below\n        imageStyleSplitQuote: '\u0026', // separator used in the image src link, e.g. https://qiniu.com/asd.png\u0026scale-my-img\n        imageStyleName: '', // your Qiniu image style name\n        accessKey: '', // Qiniu Cloud accessKey\n        secretKey: '', // Qiniu Cloud secretKey\n        imageBaseAlt: '' // prefix for image alt text\n    }\n}\n```\n\n#### 2. Start using csdnCrawler\n```javascript\n// You can find this code in `./demo.js`\nconst csdnCrawler = require('./index')\nconst exampleOptions = {\n    username: 'weixin_45534242',\n    page: 1,\n    size: 5,\n    link: '',\n    businessType: 'blog',\n    sleepTime: null, // Unit is: ms\n    supportImageType: ['jpg', 'png', 'jpeg', 'webp', 'gif', 'mp4', 'bmp', 'svg'],\n    imagePrefixName: 'crawl-',\n    contentNodeIdentify: '#article_content',\n    qiniu: {}\n}\n\ncsdnCrawler(exampleOptions, data =\u003e {\n    console.log(data)\n    console.log(`==============================`)\n    console.log(`===  Demo Crawl Succeed !!!===`)\n    console.log(`==============================`)\n    console.log(`Total Data length : ${data.length}`)\n})\n```\n\n### FBI WARNING AGAIN (to save me from trouble)\n- Don't use this for bad purposes.\n- It may lead to bad consequences in CN (maybe even break the law...) 
and will drive you crazy.\n- Please use this only for testing and study purposes.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzeeklog%2Fcsdn-crawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzeeklog%2Fcsdn-crawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzeeklog%2Fcsdn-crawler/lists"}