https://github.com/7bridges-eu/shelob
Clojure scraping framework wanna-be
- Host: GitHub
- URL: https://github.com/7bridges-eu/shelob
- Owner: 7bridges-eu
- License: apache-2.0
- Created: 2019-07-09T13:16:54.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2022-10-08T10:15:09.000Z (about 2 years ago)
- Last Synced: 2024-06-15T05:02:50.813Z (5 months ago)
- Topics: clojure, framework, jsoup, scraping, selenium
- Language: Clojure
- Homepage:
- Size: 146 KB
- Stars: 5
- Watchers: 3
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.adoc
- License: LICENSE
README
Shelob
------

Shelob wraps https://www.seleniumhq.org/[Selenium] to let you browse a website
and scrape its contents.

Rationale
~~~~~~~~~

Selenium automates web browsing, primarily for testing and administration
purposes.

Shelob wraps Selenium to make it more idiomatic and adherent to the Clojure way
of coding, and exposes facilities to scrape web pages.

Version
~~~~~~~

image:https://img.shields.io/clojars/v/eu.7bridges/shelob.svg[link="https://clojars.org/eu.7bridges/shelob"]

Example
~~~~~~~

A simple DuckDuckGo search:

* Type "Clojure" in the search field
* Click on the magnifying glass to perform the search
* Retrieve the URLs of the visible results

[source,clojure]
----
(require '[shelob.core :as sh])
(require '[shelob.browser :as shb])
(require '[shelob.scraper :as shs])

(def context
  {:driver-options {:browser :firefox}
   :pool-size 2
   :init-messages [{:msg :go :url "https://duckduckgo.com/html/"}]})

(defn scrape-result
  [document]
  ;; Print the href of every element matching .result__a
  (println (map
            #(shs/attribute % "href")
            (shs/select document ".result__a"))))

(defn example
  []
  (sh/init context)
  (let [msg [;; Type "Clojure" in the search field
             {:msg :fill
              :locator (shb/by-css-selector "#search_form_input_homepage")
              :text "Clojure"}
             ;; Click on the search button
             {:msg :click :locator (shb/by-css-selector "#search_button_homepage")}
             ;; Wait until the results container is present
             {:msg :wait-for :condition
              (shb/presence-of-element-located
               (shb/by-css-selector ".serp__results"))}]]
    (sh/send-message context scrape-result msg))
  (sh/stop))
----

Running `(example)` results in:

[source,clojure]
----
user> (https://clojure.org/
       https://en.wikipedia.org/wiki/Clojure
       https://github.com/clojure/clojure
       https://www.reddit.com/r/Clojure/
       https://clojuredocs.org/
       https://clojuredocs.org/clojure.core/when
       https://www.braveclojure.com/
       https://repl.it/languages/clojure
       https://www.zhihu.com/question/21446061
       https://leiningen.org/
       https://clojure.github.io/clojure/
       https://github.com/clojure
       https://www.tutorialspoint.com/clojure/clojure_basic_syntax.htm
       https://learnxinyminutes.com/docs/clojure/
       https://cursive-ide.com/
       https://clojurescript.org/
       https://www.tutorialspoint.com/clojure/clojure_loops.htm
       http://www.clojurekoans.com/
       https://www.braveclojure.com/clojure-for-the-brave-and-true/
       https://marketplace.visualstudio.com/items?itemName=avli.clojure
       https://www.youtube.com/user/ClojureTV
       https://www.slant.co/options/1538/~clojure-review
       https://www.amazon.com/Clojure-Programming-Practical-Lisp-World/dp/1449394701
       https://kimh.github.io/clojure-by-example/
       https://ja.wikipedia.org/wiki/Clojure
       http://www.4clojure.com/
       https://developer.mozilla.org/en-US/docs/Web/JavaScript/Closures
       https://www.clojure.org/guides/getting_started
       https://en.wikibooks.org/wiki/Clojure_Programming)
----

Exception management
~~~~~~~~~~~~~~~~~~~~

By default, Shelob prints exceptions to standard output and continues the
crawling process. You can customise this behaviour by passing your own
exception-handling function to `send-message`:

[source,clojure]
----
....

(defn exception-custom-fn
  "Prints out exception with a custom message"
  [_source e]
  (println "An exception occurred, this is a custom message -" (.getMessage e)))

(defn example
  []
  (sh/init context)
  (let [msg [...]]
    (sh/send-message context scrape-result exception-custom-fn msg))
  (sh/stop))
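
;; Hypothetical variant, not part of Shelob: collect failures in an atom
;; instead of printing them, so they can be inspected after the crawl.
;; It assumes only the two-argument handler shape shown above
;; (a source and the exception); `errors` and `exception-collect-fn`
;; are illustrative names.
(def errors (atom []))

(defn exception-collect-fn
  "Records the source and the message of each exception in `errors`."
  [source e]
  (swap! errors conj {:source source :message (.getMessage e)}))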
----

License
~~~~~~~

Copyright © 2019 7bridges s.r.l. Distributed under the Apache License 2.0.