{"id":15033240,"url":"https://github.com/jae-jae/querylist","last_synced_at":"2026-02-02T08:47:30.104Z","repository":{"id":37693081,"uuid":"48327510","full_name":"jae-jae/QueryList","owner":"jae-jae","description":":spider: The progressive PHP crawler framework!  优雅的渐进式PHP采集框架。","archived":false,"fork":false,"pushed_at":"2025-04-23T02:39:22.000Z","size":794,"stargazers_count":2679,"open_issues_count":9,"forks_count":439,"subscribers_count":74,"default_branch":"main","last_synced_at":"2025-05-11T05:49:19.814Z","etag":null,"topics":["crawler","querylist","scraper","spider"],"latest_commit_sha":null,"homepage":"https://querylist.cc","language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jae-jae.png","metadata":{"files":{"readme":"README-ZH.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null},"funding":{"github":null,"patreon":null,"open_collective":"querylist","ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"custom":null}},"created_at":"2015-12-20T15:58:35.000Z","updated_at":"2025-05-10T16:42:42.000Z","dependencies_parsed_at":"2024-03-26T13:25:51.222Z","dependency_job_id":"5e0c28c9-9224-4a74-a437-4c0fa23c0139","html_url":"https://github.com/jae-jae/QueryList","commit_stats":{"total_commits":144,"total_committers":14,"mean_commits":"10.285714285714286","dds":"0.42361111111111116","last_synced_commit":"87b405ecde30101ec8797c4347f05f9ee0b95eb2"},"previous_names":[],"tags_count":32,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jae-jae%2FQue
ryList","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jae-jae%2FQueryList/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jae-jae%2FQueryList/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jae-jae%2FQueryList/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jae-jae","download_url":"https://codeload.github.com/jae-jae/QueryList/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253523732,"owners_count":21921818,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","querylist","scraper","spider"],"created_at":"2024-09-24T20:20:28.664Z","updated_at":"2026-02-02T08:47:30.067Z","avatar_url":"https://github.com/jae-jae.png","language":"PHP","funding_links":["https://opencollective.com/querylist"],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg width=\"150\" src=\"logo.svg\" alt=\"QueryList\"\u003e\n  \u003cbr\u003e\n  \u003cbr\u003e\n\u003c/p\u003e\n\n# QueryList Introduction\n`QueryList` is a simple, elegant, and extensible PHP scraping tool (crawler) built on phpQuery.\n\n## Features\n- CSS3 DOM selectors identical to jQuery's\n- A DOM manipulation API identical to jQuery's\n- A general-purpose solution for scraping lists\n- A powerful HTTP request suite that makes arbitrarily complex requests easy, such as simulated login, browser spoofing, and HTTP proxies\n- A solution for garbled text caused by character-encoding mismatches\n- Powerful content filtering, with jQuery selectors available for filtering content\n- A highly modular, easily extended design\n- An expressive API\n- High-quality documentation\n- A rich set of plugins\n- A dedicated question-and-answer community and discussion group\n\nPlugins make it easy to add features such as:\n- Multi-threaded scraping\n- Scraping pages rendered dynamically by JavaScript (PhantomJS/headless WebKit)\n- Saving remote images locally\n- Simulating browser behavior, such as submitting forms\n- Web crawling\n- .....\n\n## Requirements\n- PHP \u003e= 8.1\n\n\u003e 
If you are still on PHP 5 or do not use Composer, you can use QueryList3, which supports PHP 5.3 and manual installation.\nQueryList3 documentation: http://v3.querylist.cc\n\n## Installation\nInstall with Composer:\n```\ncomposer require jaeger/querylist\n```\n\n## Usage\n\n#### Element operations\n- Scrape the URLs of all images on Nipic (nipic.com)\n\n```php\nQueryList::get('http://www.nipic.com')-\u003efind('img')-\u003eattrs('src');\n```\n- Scrape Baidu search results\n\n```php\n$ql = QueryList::get('http://www.baidu.com/s?wd=QueryList');\n\n$ql-\u003efind('title')-\u003etext(); // Get the page title\n$ql-\u003efind('meta[name=keywords]')-\u003econtent; // Get the keywords from the page's meta tag\n\n$ql-\u003efind('h3\u003ea')-\u003etexts(); // Get the list of search result titles\n$ql-\u003efind('h3\u003ea')-\u003eattrs('href'); // Get the list of search result links\n\n$ql-\u003efind('img')-\u003esrc; // Get the URL of the first image\n$ql-\u003efind('img:eq(1)')-\u003esrc; // Get the URL of the second image\n$ql-\u003efind('img')-\u003eeq(2)-\u003esrc; // Get the URL of the third image\n// Iterate over all images\n$ql-\u003efind('img')-\u003emap(function($img){\n\techo $img-\u003ealt;  // Print the image's alt attribute\n});\n```\n- More usage\n\n```php\n$ql-\u003efind('#head')-\u003eappend('\u003cdiv\u003eappended content\u003c/div\u003e')-\u003efind('div')-\u003ehtmls();\n$ql-\u003efind('.two')-\u003echildren('img')-\u003eattrs('alt'); // Get the alt attributes of all img children of elements with class two\n// Iterate over all children of elements with class two\n$data = $ql-\u003efind('.two')-\u003echildren()-\u003emap(function ($item){\n    // Use is() to check the node type\n    if($item-\u003eis('a')){\n        return $item-\u003etext();\n    }elseif($item-\u003eis('img'))\n    {\n        return $item-\u003ealt;\n    }\n});\n\n$ql-\u003efind('a')-\u003eattr('href', 'newVal')-\u003eremoveClass('className')-\u003ehtml('newHtml')-\u003e...\n$ql-\u003efind('div \u003e p')-\u003eadd('div \u003e ul')-\u003efilter(':has(a)')-\u003efind('p:first')-\u003enextAll()-\u003eandSelf()-\u003e...\n$ql-\u003efind('div.old')-\u003ereplaceWith( $ql-\u003efind('div.new')-\u003eclone())-\u003eappendTo('.trash')-\u003eprepend('Deleted')-\u003e...\n```\n#### List scraping\nScrape the titles and links from a Baidu search results list:\n```php\n$data = QueryList::get('http://www.baidu.com/s?wd=QueryList')\n\t// Set the scraping rules\n    -\u003erules([ \n\t    
'title'=\u003earray('h3','text'),\n\t    'link'=\u003earray('h3\u003ea','href')\n\t])\n\t-\u003equery()-\u003egetData();\n\nprint_r($data-\u003eall());\n```\nResult:\n```\nArray\n(\n    [0] =\u003e Array\n        (\n            [title] =\u003e QueryList|基于phpQuery的无比强大的PHP采集工具\n            [link] =\u003e http://www.baidu.com/link?url=GU_YbDT2IHk4ns1tjG2I8_vjmH0SCJEAPuuZN\n        )\n    [1] =\u003e Array\n        (\n            [title] =\u003e PHP 用QueryList抓取网页内容 - wb145230 - 博客园\n            [link] =\u003e http://www.baidu.com/link?url=zn0DXBnrvIF2ibRVW34KcRVFG1_bCdZvqvwIhUqiXaS\n        )\n    [2] =\u003e Array\n        (\n            [title] =\u003e 介绍- QueryList指导文档\n            [link] =\u003e http://www.baidu.com/link?url=pSypvMovqS4v2sWeQo5fDBJ4EoYhXYi0Lxx\n        )\n        //...\n)\n```\n#### Encoding conversion\n```php\n// Output encoding: UTF-8, input encoding: GB2312\nQueryList::get('https://top.etao.com')-\u003eencoding('UTF-8','GB2312')-\u003efind('a')-\u003etexts();\n\n// Output encoding: UTF-8, input encoding: auto-detected\nQueryList::get('https://top.etao.com')-\u003eencoding('UTF-8')-\u003efind('a')-\u003etexts();\n```\n\n#### HTTP operations (GuzzleHttp)\n- Log in to Sina Weibo with a cookie\n```php\n// Scrape a Sina Weibo page that requires login\n$ql = QueryList::get('http://weibo.com','param1=testvalue \u0026 params2=somevalue',[\n    'headers' =\u003e [\n        // Fill in the cookie obtained from your browser\n        'Cookie' =\u003e 'SINAGLOBAL=546064; wb_cmtLike_2112031=1; wvr=6;....'\n    ]\n]);\n//echo $ql-\u003egetHtml();\necho $ql-\u003efind('title')-\u003etext();\n// Output: 我的首页 微博-随时随地发现新鲜事\n```\n- Use an HTTP proxy\n```php\n$urlParams = ['param1' =\u003e 'testvalue','params2' =\u003e 'somevalue'];\n$opts = [\n\t// Set the HTTP proxy\n    'proxy' =\u003e 'http://222.141.11.17:8118',\n    // Set the timeout, in seconds\n    'timeout' =\u003e 30,\n     // Spoof the HTTP headers\n    'headers' =\u003e [\n        'Referer' =\u003e 'https://querylist.cc/',\n        'User-Agent' =\u003e 'testing/1.0',\n        'Accept'     =\u003e 'application/json',\n        'X-Foo'      =\u003e ['Bar', 'Baz'],\n        'Cookie'    =\u003e 'abc=111;xxx=222'\n    
]\n];\n$ql-\u003eget('http://httpbin.org/get',$urlParams,$opts);\n// echo $ql-\u003egetHtml();\n```\n\n- Simulate login\n```php\n// Log in with a POST request\n$ql = QueryList::post('http://xxxx.com/login',[\n    'username' =\u003e 'admin',\n    'password' =\u003e '123456'\n])-\u003eget('http://xxx.com/admin');\n// Scrape a page that requires login\n$ql-\u003eget('http://xxx.com/admin/page');\n//echo $ql-\u003egetHtml();\n```\n\n#### Form operations\nSimulate logging in to GitHub\n```php\n// Get a QueryList instance\n$ql = QueryList::getInstance();\n// Get the login form\n$form = $ql-\u003eget('https://github.com/login')-\u003efind('form');\n\n// Fill in the GitHub username and password\n$form-\u003efind('input[name=login]')-\u003eval('your github username or email');\n$form-\u003efind('input[name=password]')-\u003eval('your github password');\n\n// Serialize the form data\n$formData = $form-\u003eserializeArray();\n$postData = [];\nforeach ($formData as $item) {\n    $postData[$item['name']] = $item['value'];\n}\n\n// Submit the login form\n$actionUrl = 'https://github.com'.$form-\u003eattr('action');\n$ql-\u003epost($actionUrl,$postData);\n// Check whether the login succeeded\n// echo $ql-\u003egetHtml();\n$userName = $ql-\u003efind('.header-nav-current-user\u003e.css-truncate-target')-\u003etext();\nif($userName)\n{\n    echo 'Login succeeded! Welcome, '.$userName;\n}else{\n    echo 'Login failed!';\n}\n```\n#### Extending with bind\nDefine a custom `myHttp` method:\n```php\n$ql = QueryList::getInstance();\n\n// Bind a myHttp method to the QueryList object\n$ql-\u003ebind('myHttp',function ($url){\n    // $this is the current QueryList object\n    $html = file_get_contents($url);\n    $this-\u003esetHtml($html);\n    return $this;\n});\n\n// The method can then be called by its registered name\n$data = $ql-\u003emyHttp('https://toutiao.io')-\u003efind('h3 a')-\u003etexts();\nprint_r($data-\u003eall());\n```\nAlternatively, wrap the implementation in a class and bind it like this:\n```php\n$ql-\u003ebind('myHttp',function ($url){\n    return new MyHttp($this,$url);\n});\n```\n\n#### Using plugins\n- Use the PhantomJS plugin to scrape pages rendered dynamically by JavaScript:\n\n```php\n// Set the path to the PhantomJS binary when installing the plugin\n$ql = QueryList::use(PhantomJs::class,'/usr/local/bin/phantomjs');\n\n// Scrape the mobile version of Toutiao\n$data = 
$ql-\u003ebrowser('https://m.toutiao.com')-\u003efind('p')-\u003etexts();\nprint_r($data-\u003eall());\n\n// Use an HTTP proxy\n$ql-\u003ebrowser('https://m.toutiao.com',false,[\n\t'--proxy' =\u003e '192.168.1.42:8080',\n    '--proxy-type' =\u003e 'http'\n]);\n```\n\n- Use the CurlMulti plugin for multi-threaded scraping of GitHub trending pages:\n\n```php\n$ql = QueryList::use(CurlMulti::class);\n$ql-\u003ecurlMulti([\n    'https://github.com/trending/php',\n    'https://github.com/trending/go',\n    //.....more urls\n])\n // Called each time a task completes successfully\n -\u003esuccess(function (QueryList $ql,CurlMulti $curl,$r){\n    echo \"Current url:{$r['info']['url']} \\r\\n\";\n    $data = $ql-\u003efind('h3 a')-\u003etexts();\n    print_r($data-\u003eall());\n})\n // Called each time a task fails\n-\u003eerror(function ($errorInfo,CurlMulti $curl){\n    echo \"Current url:{$errorInfo['info']['url']} \\r\\n\";\n    print_r($errorInfo['error']);\n})\n-\u003estart([\n\t// Maximum concurrency\n    'maxThread' =\u003e 10,\n    // Number of retries on error\n    'maxTry' =\u003e 3,\n]);\n\n```\n\n## Plugins\n- [jae-jae/QueryList-PhantomJS](https://github.com/jae-jae/QueryList-PhantomJS): Scrape pages rendered dynamically by JavaScript using PhantomJS\n- [jae-jae/QueryList-CurlMulti](https://github.com/jae-jae/QueryList-CurlMulti): Multi-threaded scraping with cURL\n- [jae-jae/QueryList-AbsoluteUrl](https://github.com/jae-jae/QueryList-AbsoluteUrl): Convert relative URLs to absolute URLs\n- [jae-jae/QueryList-Rule-Google](https://github.com/jae-jae/QueryList-Rule-Google): Google search engine\n- [jae-jae/QueryList-Rule-Baidu](https://github.com/jae-jae/QueryList-Rule-Baidu): Baidu search engine\n\n\nSee more QueryList plugins and QueryList-based products: [QueryList Community](https://github.com/jae-jae/QueryList-Community)\n\n## Contributing\nContributions to QueryList are welcome. For contributing plugins, see the [QueryList plugin contribution guide](https://github.com/jae-jae/QueryList-Community/blob/master/CONTRIBUTING.md).\n\n## Need help?\n- QueryList homepage: [http://querylist.cc](http://querylist.cc/)\n- QueryList documentation: [http://doc.querylist.cc](http://doc.querylist.cc/)\n- QueryList Q\u0026A: [http://wenda.querylist.cc](http://wenda.querylist.cc/)\n- QueryList QQ discussion group: 123266961 \u003ca target=\"_blank\" 
href=\"http://shang.qq.com/wpa/qunwpa?idkey=a1b248ae30b3f711bdab4f799df839300dc7fed54331177035efa0513da027f6\"\u003e\u003cimg border=\"0\" src=\"http://pub.idqqimg.com/wpa/images/group.png\" alt=\"cafeEX\" title=\"cafeEX\"\u003e\u003c/a\u003e\n- GitHub: https://github.com/jae-jae/QueryList\n- Git@OSC: http://git.oschina.net/jae/QueryList\n\n## Author\nJaeger \u003cJaegerCode@gmail.com\u003e\n\n## License\nQueryList is licensed under the MIT License. See LICENSE for more details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjae-jae%2Fquerylist","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjae-jae%2Fquerylist","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjae-jae%2Fquerylist/lists"}