{"id":21240588,"url":"https://github.com/a252937166/quick-selenium","last_synced_at":"2025-07-20T00:32:47.371Z","repository":{"id":40629391,"uuid":"120719513","full_name":"a252937166/quick-selenium","owner":"a252937166","description":"Crawls information from dynamic web pages, mainly using the quick-spring and selenium frameworks","archived":false,"fork":false,"pushed_at":"2018-08-07T01:42:52.000Z","size":5205,"stargazers_count":7,"open_issues_count":0,"forks_count":7,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-07-11T01:01:28.380Z","etag":null,"topics":["crawler","quickstart","selenium"],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/a252937166.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-02-08T06:15:51.000Z","updated_at":"2022-09-08T08:20:30.000Z","dependencies_parsed_at":"2022-09-09T00:31:51.922Z","dependency_job_id":null,"html_url":"https://github.com/a252937166/quick-selenium","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/a252937166/quick-selenium","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/a252937166%2Fquick-selenium","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/a252937166%2Fquick-selenium/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/a252937166%2Fquick-selenium/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/a252937166%2Fquick-selenium/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/a252937166","download_url":"https://codeload.github.com/a252937166/quick-s
elenium/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/a252937166%2Fquick-selenium/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266048604,"owners_count":23868742,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","quickstart","selenium"],"created_at":"2024-11-21T00:52:01.844Z","updated_at":"2025-07-20T00:32:45.438Z","avatar_url":"https://github.com/a252937166.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003eI previously recommended `seimiagent+seimicrawler`, but in my repeated tests `seimiagent` crashed frequently once the crawl workload grew, for example when the thread count exceeded a few dozen. Of course, this also depends on the server running `seimiagent`.\n\u003eSince the performance of `seimiagent` does not suit crawler hobbyists with ordinary hardware, I wrote a new minimal crawler example based on `quick-spring+selenium` for reference.\n\n# Project URL\n\nhttps://github.com/a252937166/quick-selenium.git\n\n# Project overview\n\n## Frameworks\n\n-  quick-spring\nMakes it easy to use `spring` and `mybatis` features from a `main()` method. For details, see: https://github.com/a252937166/quick-spring\n\n-  selenium\nThis one needs little introduction, and a quick search on Baidu will tell you all about it. It is the framework used here to parse web pages.\n\n## Structure\n\n![Project structure](https://qiniu.ouyanglol.com//github/quick-selenium/20180222154513927.jpg)\n\n\u003ccenter\u003eFigure (1)\u003c/center\u003e\n\nI have marked the more important files in the figure.\n\n- ComicCrawler.java\nControls the crawling logic for each web page.\n\n- App.java\nThe class that launches the crawler.\n\n- application.properties\nKey configuration settings. Change them to match your own setup.\n\n- chromedriver\nThe driver I uploaded is the `linux` build. If you are on `windows`, go to http://npm.taobao.org/mirrors/chromedriver/ and download the matching driver yourself.\n\n- config.ini\nThe web driver configuration file, for example which driver to use. I chose `chromedriver` because, in my tests so far, it is somewhat more stable than `phantomjs`.\n\n- quick-applicationContext.xml\nAdjust the connection pool settings here if needed.\n\n# Quick start\n\n## 
Edit the configuration files\n\nUpdate `application.properties`, `config.ini`, and `quick-applicationContext.xml` to match your own setup.\nYou can ignore settings such as `qiniu_cdn`; I use them when uploading the crawled content to Qiniu Cloud.\n\n## WebDriverPool.java\n\nFind\n\n```\nprivate static final String DEFAULT_CONFIG_FILE = \"/Users/Ouyang/Documents/myProjects/quick-selenium/src/main/resources/config.ini\";\n\n```\nand change it to the path of your own `config.ini`.\n\n## App.java\n\n```\npackage com.ouyanglol.start;\n\nimport com.ouyanglol.core.QuickBase;\nimport com.ouyanglol.crawler.ComicCrawler;\n\n\n/**\n * Package: com.ouyanglol.start\n *\n * @Author: Ouyang\n * @Date: 2018/2/2\n */\npublic class App {\n    public static void main(String[] args) throws Exception {\n        System.getProperties().setProperty(\"webdriver.chrome.driver\", \"/Users/Ouyang/Documents/myProjects/chromedriver\");\n        QuickBase quickBase = QuickBase.getInstance(\"crawler\");\n        ComicCrawler crawler = (ComicCrawler) quickBase.getQuick(\"ComicCrawler\");\n        crawler.start(\"https://manhua.dmzj.com/yiquanchaoren/\");// start URL for the One-Punch Man crawl\n    }\n}\n```\n\nChange\n\n```\nSystem.getProperties().setProperty(\"webdriver.chrome.driver\", \"/Users/Ouyang/Documents/myProjects/chromedriver\");\n```\nto the path of your own `chromedriver`. This is not needed if you use `phantomjs`, whose settings are declared in `config.ini`.\n\nSet your own crawl start URL:\n```\ncrawler.start(\"https://manhua.dmzj.com/yiquanchaoren/\");\n```\n\n## ComicDriver.java\n\nMake sure to call `webDriver.quit();`. In my repeated tests, keeping several webDriver instances running for a long time without quitting them also tends to crash the driver.\n\nIf your machine is underpowered and the browser keeps crashing, try uncommenting the block below.\n\n```\n//                if (i%50==0) {\n//                    webDriver.quit();\n//                    webDriver = webDriverPool.get();\n//                }\n```\nOnce uncommented, it starts a new driver after every 50 pages parsed.\n\n## ComicContentService.java\n\n```\nqiniuUtil.uploadImg(fileName,imageUrl,chapterUrl);// saves the image to Qiniu Cloud; handle images however you prefer\n\n```\nIf you do not have a Qiniu Cloud account, comment out this line to avoid errors.\n\n## 
comic.sql\n\nRun the SQL in this file to initialize the database, then launch the `main()` method in `App.java`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fa252937166%2Fquick-selenium","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fa252937166%2Fquick-selenium","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fa252937166%2Fquick-selenium/lists"}