{"id":19292408,"url":"https://github.com/foolin/scrago","last_synced_at":"2025-02-24T00:18:28.473Z","repository":{"id":148654496,"uuid":"92047902","full_name":"foolin/scrago","owner":"foolin","description":"An simpe, fast, extensible crawl page framework for golang","archived":false,"fork":false,"pushed_at":"2018-04-19T05:58:25.000Z","size":24,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-01-05T20:28:09.583Z","etag":null,"topics":["crawler","go","scrago","scrapy"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/foolin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-05-22T11:50:37.000Z","updated_at":"2021-06-11T18:09:41.000Z","dependencies_parsed_at":null,"dependency_job_id":"f8651c32-572e-48e0-a626-ef1b2b9344c0","html_url":"https://github.com/foolin/scrago","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/foolin%2Fscrago","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/foolin%2Fscrago/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/foolin%2Fscrago/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/foolin%2Fscrago/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/foolin","download_url":"https://codeload.github.com/foolin/scrago/tar.gz/refs/heads/master","hos
t":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240395983,"owners_count":19794618,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","go","scrago","scrapy"],"created_at":"2024-11-09T22:30:38.342Z","updated_at":"2025-02-24T00:18:28.432Z","avatar_url":"https://github.com/foolin.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# scrago\n\nScrago is an simpe, fast, extensible crawl page framework for golang.\n\n\n# Install\n\n```\n go get github.com/foolin/scrago\n```\n\n# Document\n\n[Godoc](https://godoc.org/github.com/foolin/scrago \"go document\")\n\n# Exmaple\n\n\n### Step 1：\n```go\n\ntype ExampModel struct {\n\tTitle string `scrago:\"title\"`\n\tName string `scrago:\"#main\u003e.intro\u003eh2::text()\"`\n\tDescription string `scrago:\"#main\u003e.intro\u003ep::html()\"`\n\tIntro string  `scrago:\"#main\u003e.intro::outerHtml()\"`\n\tKeywords []string  `scrago:\"#main .keywords::GetMyKeywords()\"`\n}\n\nfunc (e *ExampModel) GetMyKeywords(s *goquery.Selection) ([]string, error) {\n\tv := s.Text()\n\tif v == \"\"{\n\t\treturn nil, fmt.Errorf(\"not found keywords!\")\n\t}\n\tarr := strings.Split(v, \",\")\n\tfor i := 0; i \u003c len(arr); i++{\n\t\tarr[i] = strings.TrimSpace(arr[i])\n\t}\n\treturn arr, nil\n}\n\n```\n\n### Step 2:\n```go\n\nfunc main()  {\n\texamp := ExampModel{}\n\ts := scrago.New()\n\terr := s.HttpGetParser(\"https://raw.githubusercontent.com/foolin/scrago/master/example/data/example.html\", \u0026examp)\n\tif err != nil 
{\n\t\tlog.Fatal(err)\n\t}else{\n\t\tprintjson(examp)\n\t}\n}\n\nfunc printjson(v interface{})  {\n\tenc := json.NewEncoder(os.Stdout)\n\tenc.SetEscapeHTML(false)\n\tenc.SetIndent(\"\", \"    \")\n\tenc.Encode(v)\n}\n\n```\n\n### Step 3:\nExecute result：\n\n```json\n\n{\n    \"Title\": \"Scrago exmaples\",\n    \"Name\": \"Scrago framework\",\n    \"Description\": \"An open source and collaborative framework for extracting the data you need from websites.\\n            In a \u003cb\u003efast\u003c/b\u003e, \u003cb\u003esimple\u003c/b\u003e, yet extensible way.\",\n    \"Intro\": \"\u003cdiv class=\\\"intro\\\"\u003e\\n        \u003ch2\u003eScrago framework\u003c/h2\u003e\\n        \u003cp\u003eAn open source and collaborative framework for extracting the data you need from websites.\\n            In a \u003cb\u003efast\u003c/b\u003e, \u003cb\u003esimple\u003c/b\u003e, yet extensible way.\u003c/p\u003e\\n        \u003cdiv class=\\\"keywords\\\"\u003eScrago, Scrap, Spider, Crawl, GoLang, Simple, Easy\u003c/div\u003e\\n    \u003c/div\u003e\",\n    \"Keywords\": [\n        \"Scrago\",\n        \"Scrap\",\n        \"Spider\",\n        \"Crawl\",\n        \"GoLang\",\n        \"Simple\",\n        \"Easy\"\n    ]\n}\n\n```\n\nOrigin page：\n```html\n\u003c!doctype html\u003e\n\u003chtml class=\"no-js\" lang=\"\"\u003e\n\n\u003chead\u003e\n    \u003cmeta charset=\"utf-8\"\u003e\n    \u003ctitle\u003eScrago exmaples\u003c/title\u003e\n\u003c/head\u003e\n\n\u003cbody\u003e\n\u003cdiv id=\"header\"\u003e\n    \u003cdiv class=\"container\"\u003e\n        \u003cdiv class=\"clearfix\"\u003e\n            \u003cdiv class=\"logo\"\u003e\n                \u003ca href=\"https://github.com/foolin/scrago\" title=\"Scrago exmaple\"\u003e\n                    \u003ch1 title=\"Scrago exmaple - crawl framework for go\"\u003eScrago exmaple\u003c/h1\u003e\n                \u003c/a\u003e\n            \u003c/div\u003e\n        \u003c/div\u003e\n    
\u003c/div\u003e\n\u003c/div\u003e\n\n\u003cdiv class=\"navlink\"\u003e\n    \u003cdiv class=\"container\"\u003e\n        \u003cul class=\"clearfix\"\u003e\n            \u003cli \u003e\u003ca href=\"/\"\u003eIndex\u003c/a\u003e\u003c/li\u003e\n            \u003cli \u003e\u003ca href=\"/list/web\" title=\"web site\"\u003eWeb page\u003c/a\u003e\u003c/li\u003e\n            \u003cli \u003e\u003ca href=\"/list/pc\" title=\"pc page\"\u003ePc Page\u003c/a\u003e\u003c/li\u003e\n            \u003cli \u003e\u003ca href=\"/list/mobile\" title=\"mobile page\"\u003eMobile Page\u003c/a\u003e\u003c/li\u003e\n        \u003c/ul\u003e\n    \u003c/div\u003e\n\u003c/div\u003e\n\n\u003cdiv id=\"main\"\u003e\n    \u003cdiv class=\"intro\"\u003e\n        \u003ch2\u003eScrago framework\u003c/h2\u003e\n        \u003cp\u003eAn open source and collaborative framework for extracting the data you need from websites.\n            In a \u003cb\u003efast\u003c/b\u003e, \u003cb\u003esimple\u003c/b\u003e, yet extensible way.\u003c/p\u003e\n        \u003cdiv class=\"keywords\"\u003eScrago, Scrap, Spider, Crawl, GoLang, Simple, Easy\u003c/div\u003e\n    \u003c/div\u003e\n    \u003cdiv class=\"typelist\"\u003e\n        \u003cul\u003e\n            \u003cli data-type=\"bool\"\u003etrue\u003c/li\u003e\n            \u003cli data-type=\"int\"\u003e123\u003c/li\u003e\n            \u003cli data-type=\"float\"\u003e45.6\u003c/li\u003e\n            \u003cli data-type=\"string\"\u003ehello\u003c/li\u003e\n            \u003cli data-type=\"array\"\u003e\n                \u003col\u003e\n                    \u003cli\u003eAa\u003c/li\u003e\n                    \u003cli\u003eBb\u003c/li\u003e\n                    \u003cli\u003eCc\u003c/li\u003e\n                \u003c/ol\u003e\n            \u003c/li\u003e\n        \u003c/ul\u003e\n    \u003c/div\u003e\n\n\u003c/div\u003e\n\n\u003c/body\u003e\n\u003c/html\u003e\n```\n\n# Struct tag\nBetween selector and function use \"::\" symbol 
segmentation\n```go\n`scrago:\"selector::function\"`\n\n```\n* selector:\n  CSS selector; see github.com/PuerkitoBio/goquery for more.\n\n* function:\n  The function used to get the data; the default is text().\n\n  1. Built-in functions:\n  - text() gets the text value.\n  - html() gets the HTML value.\n  - outerHtml() gets the outer HTML value.\n  - attr(xxx) gets an attribute value, e.g. attr(href).\n\n  2. Write a custom function:\n```go\n\nfunc (e *ExampModel) MyFunc(s *goquery.Selection) (MyReturnType, error) {\n    //todo\n    return ReturnValue, nil\n}\n\n```\n\n   e.g.:\n```go\n\ntype ExampModel struct {\n    TextField string `scrago:\"#xxx\"`\n    TextField2 string `scrago:\".xxx::text()\"`\n    Link string `scrago:\"a::attr(href)\"`\n    MyField string  `scrago:\"#xxx::MyFunc()\"`\n}\n\nfunc (e *ExampModel) MyFunc(s *goquery.Selection) (string, error) {\n    //todo\n    return s.Text(), nil\n}\n\n```\n\n\n# Examples\n * [Simple](https://github.com/foolin/scrago/tree/master/example/simple \"Simple Example\")\n * [Parser](https://github.com/foolin/scrago/tree/master/example/parser \"Parser Example\")\n * [Quotesbot](https://github.com/foolin/scrago/tree/master/example/quotesbot \"Quotesbot Example\")\n\n# Related\n * github.com/PuerkitoBio/goquery","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffoolin%2Fscrago","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffoolin%2Fscrago","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffoolin%2Fscrago/lists"}