{"id":13799928,"url":"https://github.com/gsh199449/spider","last_synced_at":"2025-05-13T08:32:32.061Z","repository":{"id":50680514,"uuid":"74628476","full_name":"gsh199449/spider","owner":"gsh199449","description":"A configurable web spider with an easy-to-use web console","archived":false,"fork":false,"pushed_at":"2018-08-21T05:26:31.000Z","size":15139,"stargazers_count":987,"open_issues_count":4,"forks_count":484,"subscribers_count":122,"default_branch":"master","last_synced_at":"2024-04-16T18:13:31.392Z","etag":null,"topics":["cralwer","gatherplatform","spider","text-mining","web-console"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gsh199449.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-11-24T01:48:03.000Z","updated_at":"2024-04-11T11:36:40.000Z","dependencies_parsed_at":"2022-08-28T20:21:21.297Z","dependency_job_id":null,"html_url":"https://github.com/gsh199449/spider","commit_stats":null,"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gsh199449%2Fspider","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gsh199449%2Fspider/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gsh199449%2Fspider/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gsh199449%2Fspider/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gsh199449","download_url":"https://codeload.github.com/gsh199449/spider/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225190743,"owners_count":17435482,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cralwer","gatherplatform","spider","text-mining","web-console"],"created_at":"2024-08-04T00:01:07.290Z","updated_at":"2024-11-18T14:31:21.792Z","avatar_url":"https://github.com/gsh199449.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Welcome to Gather Platform: A Data Collection and Analysis Platform\n\n------\n\n[Readme in English](https://github.com/gsh199449/spider/tree/master/doc/README-en.md)\n\nYou are welcome to join the `Gather Platform交流` (Gather Platform discussion) QQ group: 206264662\n\n**For detailed usage instructions, see the [online documentation](https://gsh199449.github.io/gather_platform_pages/)**\n\n[![Build Status](https://travis-ci.org/gsh199449/spider.svg?branch=master)](https://travis-ci.org/gsh199449/spider)\n\nGather Platform is a data collection and search platform built on the [Webmagic](https://github.com/code4craft/webmagic) core, with a web interface for configuring and managing crawl tasks. It provides the following features:\n\n\u003e * Template-driven data collection, including support for **crawling Ajax pages**\n\u003e * Automatic detection of a page's main text and extraction of the article's publication time, with no collection template configured\n\u003e * Dynamic field extraction and static field injection\n\u003e * Management of crawled data, including search, create/read/update/delete, and re-extraction of data with a new data template\n\u003e * NLP processing of collected data, including keyword extraction, summary extraction, and entity extraction\n\u003e * Related-article recommendation and analysis of the relationships between people and places mentioned in articles\n\nDeployment takes about 5 minutes, and a crawler can be set up and collecting data in about half a minute.\nA full-featured crawler can be built without writing any code.\n\n\u003cimg src=\"https://github.com/gsh199449/spider/blob/master/doc/imgs/show.gif?raw=true\" alt=\"show\"/\u003e\n\n## Cross-platform support: Windows/Mac/Linux\n\nThe system requires the following dependencies:\n\n - JDK 8 or later\n - Tomcat 8.3 or later\n\nOptional dependency:\n\n - Elasticsearch 5.0\n\n## Deployment, usage, the extension development guide, FAQ, and more have all moved to the [online documentation](https://gsh199449.github.io/gather_platform_pages/)\n\n## Contact me\n\nEmail: 63388@qq.com\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgsh199449%2Fspider","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgsh199449%2Fspider","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgsh199449%2Fspider/lists"}