{"id":34293061,"url":"https://github.com/cmiles74/scraper","last_synced_at":"2026-03-11T21:31:57.924Z","repository":{"id":20121398,"uuid":"23391281","full_name":"cmiles74/scraper","owner":"cmiles74","description":"A simple web scraper built around the JavaFX WebEngine","archived":false,"fork":false,"pushed_at":"2021-02-15T17:09:57.000Z","size":51,"stargazers_count":13,"open_issues_count":0,"forks_count":2,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-12-20T15:02:26.312Z","etag":null,"topics":["clojure","java","javafx","javascript","scraper"],"latest_commit_sha":null,"homepage":"","language":"Clojure","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"epl-1.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cmiles74.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-08-27T13:49:57.000Z","updated_at":"2021-04-18T12:17:28.000Z","dependencies_parsed_at":"2022-08-30T10:51:04.735Z","dependency_job_id":null,"html_url":"https://github.com/cmiles74/scraper","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/cmiles74/scraper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmiles74%2Fscraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmiles74%2Fscraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmiles74%2Fscraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmiles74%2Fscraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cmiles74","download_url":"https://codeload.github.com/cmiles74/scraper/tar.gz/refs/heads/master","sbom_url":"htt
ps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmiles74%2Fscraper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30401949,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-11T21:02:20.017Z","status":"ssl_error","status_checked_at":"2026-03-11T20:59:32.667Z","response_time":84,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clojure","java","javafx","javascript","scraper"],"created_at":"2025-12-17T03:10:24.098Z","updated_at":"2026-03-11T21:31:57.919Z","avatar_url":"https://github.com/cmiles74.png","language":"Clojure","funding_links":["https://www.buymeacoffee.com/cmiles74"],"categories":[],"sub_categories":[],"readme":"# Scraper\n\nThis project provides a web scraping library built around the JavaFX\n[WebEngine][0], which in turn is built on top of [WebKit][1]. The goal of\nthis project is to provide a robust and easy-to-use web scraper that\ndoesn't require an external binary in order to function. 
With the\nintroduction of Java 8, this is finally beginning to seem feasible.\n\nIf you find this code useful in any way, please feel free to...\n\n\u003ca href=\"https://www.buymeacoffee.com/cmiles74\" target=\"_blank\"\u003e\u003cimg src=\"https://www.buymeacoffee.com/assets/img/custom_images/orange_img.png\" alt=\"Buy Me A Coffee\" style=\"height: 41px !important;width: 174px !important;box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;-webkit-box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;\" \u003e\u003c/a\u003e\n\n# Usage\n\nIt's still early days, and this project hasn't reached the point where\nwe're releasing builds of the library. Still, you can check out the\nproject and build it yourself.\n\n````clojure\n[com.nervestaple/scraper \"0.1.0-SNAPSHOT\"]\n````\n\nProbably more fun is to check out the project and then interact with\nit directly via the REPL.\n\n    $ cd scraper\n    $ lein repl\n\nFrom there it's easy to get a handle on a WebEngine instance and\nscrape out some content.\n\n````\nuser\u003e (def we (scraper/get-web-engine))\n\n#'user/we\n\nuser\u003e (scraper/load-url we \"http://twitch.nervestaple.com\")\n{:state :ready}\n\nuser\u003e (scraper/load-artoo we)\n{:state :ready}\n\nuser\u003e (scraper/scrape we \"h1\" {:title \"text\"})\n\n{\"title\" \"Bishop: Makes Your Web Service Shiny\"} {\"title\" \"Why Is My Web Service\nAPI Crappy?\"} {\"title\" \"All Your HBase Are Belong to Clojure\"}) ({\"title\" \"Work\nIn Progress\"} {\"title\" \"Linux Is All About Choices\"} {\"title\" \"Real Life Web App\nIntegration Testing (IT) with Spring\"} {\"title\" \"Bishop: Makes Your Web Service\nShiny\"} {\"title\" \"Why Is My Web Service API Crappy?\"} {\"title\" \"All Your HBase\nAre Belong to Clojure\"})\n````\n\nAs you can see in the example above, the [Artoo.js][2] JavaScript\nscraping library is injected into the loaded page in order to make\nyour scraping easier. You are welcome! 
;-)\n\nIf you're interested in being able to see the content that your\nWebEngine instance is loading, you can get a handle on a WebView\ninstead. This will bring up a new window displaying the WebView.\n\n````\nuser\u003e (def wv (scraper/get-web-view))\n\n#'user/wv\n\nuser\u003e (def we (:web-engine wv))\n\n#'user/we\n````\n\nWork on the project continues, but this should be enough to get you\nstarted.\n\n----\n\n[0]:\nhttp://docs.oracle.com/javafx/2/api/javafx/scene/web/WebEngine.html \"Web Engine API\"\n[1]: http://en.wikipedia.org/wiki/WebKit \"WebKit\"\n[2]: http://medialab.github.io/artoo \"Artoo.js\"\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcmiles74%2Fscraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcmiles74%2Fscraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcmiles74%2Fscraper/lists"}