{"id":21569884,"url":"https://github.com/iterable/scalasoup","last_synced_at":"2026-02-08T08:03:16.551Z","repository":{"id":156431960,"uuid":"629753125","full_name":"Iterable/scalasoup","owner":"Iterable","description":"A pure, typeful, idiomatic Scala wrapper around JSoup. (Fork of danielnixon/scalasoup)","archived":false,"fork":false,"pushed_at":"2024-12-10T22:40:36.000Z","size":92,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":38,"default_branch":"master","last_synced_at":"2025-03-30T18:04:23.963Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Iterable.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-19T00:45:40.000Z","updated_at":"2024-12-10T22:40:41.000Z","dependencies_parsed_at":"2024-04-22T20:30:47.577Z","dependency_job_id":"09b6ec7d-6b63-4a9a-a7ba-2969c2508c48","html_url":"https://github.com/Iterable/scalasoup","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Iterable%2Fscalasoup","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Iterable%2Fscalasoup/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Iterable%2Fscalasoup/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Iterable%2Fscalasoup/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Iterable","download_url"
:"https://codeload.github.com/Iterable/scalasoup/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251747988,"owners_count":21637408,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-24T11:10:57.730Z","updated_at":"2026-02-08T08:03:16.490Z","avatar_url":"https://github.com/Iterable.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"ScalaSoup\n=========\n\nA pure, typeful, idiomatic Scala wrapper around [JSoup](https://jsoup.org/).\n\n_This is a fork of [danielnixon/scalasoup](https://github.com/danielnixon/scalasoup) updated to Scala `2.13.7` and JSoup `1.15.1`_\n\nWhy ScalaSoup?\n--------------\n\n1. We keep the JSoup API basically intact, unless doing so clashes with any of the below.\n2. Unlike vanilla JSoup, everything in ScalaSoup is immutable.\n3. ScalaSoup endeavours to replace all of JSoup's partial functions (those that might return null or throw an exception) with total functions (those that do neither of these barbaric things).\n    1. We've replaced all nullable return types with `Option`s. No null references, no `NullPointerException`s.\n    2. We return `Option`s instead of throwing `IndexOutOfBoundsException`s.\n    3. We encode constraints in types instead of throwing exceptions at runtime. For example, if you call JSoup's `element.remove()` on an element that doesn't have a parent, it will throw. In ScalaSoup this _won't even compile_ (more on this below).\n4. 
We use Scala collection types, Scala regexes, etc instead of the Java equivalents used by vanilla JSoup.\n5. We drop Java-style `get` prefixes and rename identifiers that are reserved in Scala. For example `getElementsByTag()` becomes simply `elementsByTag`, and `val` (which is a keyword) becomes `value`. JSoup uses `get` prefixes inconsistently, e.g. `Element.wholeText` vs `TextNode.getWholeText`. In ScalaSoup these are both `wholeText`.\n6. We support simple mutation by replacing setter methods with `withFoo` methods. The `withFoo` methods return a clone of the object with the modification applied. The original is left unchanged (more on this below).\n7. We don't expose the `parse` overloads that perform http requests because JSoup's built-in http client is impure and blocking.\n8. We support more complex mutation by exposing a Free Monad-based DSL (more on this below).\n\nWhy not ScalaSoup?\n------------------\n\n1. You want performance at the cost of correctness. A `def parent: Element` method that returns `null` is probably faster than an honest `def parent: Option[Element]` but I don't care.\n\nUsage\n-----\n\nAdd the dependency to your `build.sbt`:\n```scala\nlibraryDependencies += \"com.iterable\" %% \"scalasoup\" % \"0.1.0\"\n```\n\nThen import the `scalasoup` package and use the `ScalaSoup` object as your entrypoint everywhere you would have used `Jsoup`.\n```scala\nimport com.iterable.scalasoup._\n\nScalaSoup.parse(...)\n```\n\nExample\n-------\n\nLet's translate the [Wikipedia example from the JSoup homepage](https://jsoup.org/).\nThe first thing to note is that JSoup's built-in http client is impure and blocking, so ScalaSoup doesn't expose it. You probably already use a library like [play-ws](https://github.com/playframework/play-ws) or [http4s](http://http4s.org/v0.17/client/). We encourage you to keep using whatever http library you're already using. For this example we'll use http4s. 
\n\n```scala\nimport cats.effect.IO\nimport org.http4s.client.blaze._\nimport org.http4s.client.middleware.FollowRedirect\nimport com.iterable.scalasoup._\n\nval httpClient = FollowRedirect[IO](maxRedirects = 3)(Http1Client[IO]().unsafeRunSync())\nval uri = \"https://en.wikipedia.org/\"\n\nval task = httpClient.expect[String](uri) map { html =\u003e\n\n  val doc = ScalaSoup.parse(html, uri)\n\n  println(doc.title)\n  val newsHeadlines = doc.select(\"#mp-itn b a\")\n  for (headline \u003c- newsHeadlines) {\n    println(s\"${headline.attr(\"title\")} ${headline.absUrl(\"href\")}\")\n  }\n}\n\ntask.unsafeRunSync()\nhttpClient.shutdownNow()\n```\n\nSimple Mutation\n---------------\n\nJSoup allows you to mutate documents and their constituent parts (elements, nodes, attributes, etc) in-place. ScalaSoup disables this in order to avoid side-effects. In ScalaSoup everything is effectively immutable. For example, JSoup's `addClass` method--which mutates the element on which it is called--is not exposed by ScalaSoup.\n\nSo what do we do instead? We could create a copy of an element, make our changes to the copy and leave the original untouched.\n\nThis approach is actually possible in vanilla JSoup. Before moving on, let's see what that might look like:\n\n```scala\ndef withAddClass(element: org.jsoup.nodes.Element, className: String): org.jsoup.nodes.Element = {\n  val updatedElement = element.clone\n  updatedElement.addClass(className)\n  updatedElement\n}\n\nval originalElement = new org.jsoup.nodes.Element(\"div\")\nval updatedElement = withAddClass(originalElement, \"foo\")\n\noriginalElement.hasClass(\"foo\") // false\nupdatedElement.hasClass(\"foo\") // true\n```\n\nThis works but it has a few flaws.\n\n1. We're fighting JSoup and it shows.\n2. Perhaps crucially, this doesn't actually prevent us from mutating an existing element.\n3. It's verbose, so it's tempting to avoid this approach and just mutate existing elements.\n4. It's pretty noisy and the interesting part is obscured by machinery. 
This has implications for readability, maintainability, etc.\n\nLet's see the same approach using ScalaSoup:\n\n```scala\nval originalElement = Element(\"div\")\nval updatedElement = originalElement.withAddClass(\"foo\")\n\noriginalElement.hasClass(\"foo\") // false\nupdatedElement.hasClass(\"foo\") // true\n```\n\nA few observations:\n\n1. In ScalaSoup, it's impossible to call JSoup's `addClass` directly.\n2. ScalaSoup provides `withAddClass` for you. It does essentially the same thing as the method we wrote in the example above.\n3. ScalaSoup provides `withFoo` alternatives for all of JSoup's mutating methods (none of which are exposed directly).\n4. ScalaSoup calls JSoup's `addClass` under the covers, so the mutation is still happening (ScalaSoup is just a wrapper, remember). The crucial point is that the mutation is controlled such that it can only happen on a clone of an existing element and can't be directly observed. Once the modified clone is returned to you it is effectively immutable. Further changes via `withFoo` methods will create additional clones.\n5. ScalaSoup reverses JSoup's priorities. In ScalaSoup, it's easy to create modified copies of elements but difficult to mutate existing elements. In JSoup, it's difficult to create modified copies but (too) easy to mutate existing elements.\n\nThis `withFoo` approach will get you a fair way. If you need something more powerful, see the next section.\n\nMutation DSL\n------------\n\nOne limitation of the `withFoo` approach (above) is that you incur a performance penalty associated with creating clones _every time_ you call a `withFoo` method. For example, `element.withAddClass(\"foo\").withAppendElement(\"div\")` will result in _two_ clones.\n\nIt'd be nice if we could _batch_ our modifications and incur the cloning penalty only once per batch of modifications. This is exactly what ScalaSoup's mutation DSL gives us. 
For the curious, the mutation DSL is implemented using a [Cats Free Monad](https://typelevel.org/cats/datatypes/freemonad.html).\n\nNote that in order to use the DSL you need to add an additional dependency to your `build.sbt`:\n```scala\nlibraryDependencies += \"com.iterable\" %% \"scalasoup-dsl\" % \"0.1.0\"\n```\n\nHere's an example that makes two changes to a document, incurring the cloning cost only once.\n\n```scala\nimport com.iterable.scalasoup._\nimport com.iterable.scalasoup.dsl._\n\nval modifications = for {\n  document \u003c- modifyDocument\n  _        \u003c- document.setTitle(\"New Title\")\n  _        \u003c- document.setHtml(\"New HTML\")\n} yield ()\n\nval originalDocument = ScalaSoup.parse(...)\nval updatedDocument = originalDocument.modify(modifications) \n```\n\nSome things to observe:\n\n1. We need an additional `dsl` wildcard import.\n2. Our entry-point to the DSL is `modifyDocument`.\n3. Using a for comprehension, we assemble a _description_ of our modifications.\n4. The description of our modifications doesn't actually do anything (yet).\n5. We execute our modifications by calling `modify` on a document. It is at this point that the document is cloned.\n6. The cloned document is modified using the underlying JSoup methods and an immutable wrapper is returned to us.\n7. The DSL consistently prefixes the mutating methods with `set` (e.g. `setTitle`, `setHtml`). JSoup _almost_ never prefixes setters with `set`. These are all mutating methods in Jsoup: `setBaseUri`, `setWholeData`, `title`, `html`. In ScalaSoup these are always prefixed with `set` and _only_ appear in the DSL.\n\nHere's an example that removes `target` attributes from _all_ `a` tags. 
Note the use of Cats's `foldMapM` (and the additional `cats` import).\n\n```scala\nimport cats.implicits._\nimport com.iterable.scalasoup._\nimport com.iterable.scalasoup.dsl._\n\nval modifications = for {\n  document \u003c- modifyDocument\n  _        \u003c- document.selectChildren(\"a\").foldMapM(_.removeAttr(\"target\"))\n} yield document\n\nval doc = ScalaSoup.parse(\"\u003ca target=\\\"_blank\\\"\u003e\u003c/a\u003e\")\n\nval result = doc.modify(modifications)\n```\n\nHere's an example that builds one DSL program based on another:\n\n```scala\nval selectLinksProgram = for {\n  document \u003c- modifyDocument\n} yield document.selectChildren(\"a\")\n\nval modifications = for {\n  links \u003c- selectLinksProgram\n  _     \u003c- links.foldMapM(_.addClass(\"foo\"))\n} yield ()\n\nval doc = ScalaSoup.parse(\"\u003ca\u003e\u003c/a\u003e\")\nval updated = doc.modify(modifications)\n```\n\nHere's an example that builds a DSL with some accumulated value using `modifyAndAccumulate` instead of `modify`:\n\n```scala\nval modifications = for {\n   document \u003c- modifyDocument\n   target   \u003c- document.selectChildren(\"a\").foldMapM { e =\u003e\n      val originalTarget = e.attr(\"target\")\n      e.removeAttr(\"target\").map(_ =\u003e List(originalTarget))\n   }\n} yield target\n\nval doc = ScalaSoup.parse(\"\u003ca target=\\\"_blank\\\"\u003e\u003c/a\u003e\u003ca target=\\\"blah\\\"\u003e\u003c/a\u003e\")\n\nval (result, removedTargets) = doc.modifyAndAccumulate(modifications)\n```\n\nParentState\n-----------\n\nA number of methods in JSoup throw an exception if the element on which they are called doesn't have a parent.\n\nHere's an example:\n\n```scala\nval doc = org.jsoup.Jsoup.parse(\"\")\n// Throws IllegalArgumentException because you can't remove something from its parent if it _has no_ parent.\n// This should throw an IllegalStateException instead, but what matters is that it throws at all.\ndoc.remove()\n```\n\nLet's try that in 
ScalaSoup:\n\n```scala\nval doc = ScalaSoup.parse(\"\")\ndoc.remove\n```\n\nThis time, the invalid program _won't even compile_:\n\n```\n[error] Foo.scala:12:9: Cannot prove that this node has a parent. You can only call this method on a node with a parent.\n[error]     doc.remove\n[error] \n```\n\nScalaSoup introduces the concept of a `ParentState` [phantom type](https://blog.codecentric.de/en/2016/02/phantom-types-scala/). All `Nodes` (including `Elements`, `Documents`, etc) have a `ParentState` type parameter, which tells us _at compile time_ whether a node has a parent or not. All the methods that would throw in JSoup (like `remove`) are constrained such that you can _only_ call them on nodes that have a parent. We've eliminated an entire class of runtime exceptions!\n\nHere are some points to keep in mind:\n1. Newly constructed `Document`s, including those returned by `ScalaSoup.parse`, etc never have a parent.\n2. Clones never have a parent (because one of the things JSoup always does when cloning is clear the clone's parent).\n3. Children returned from methods like `Document.head`, `Document.body`, `Element.child`, `Element.children`, `FormElement.elements` and others always have a parent.\n4. JSoup does _not_ make elements returned by `Document.createElement` a child of the document, so they never have a parent in ScalaSoup.\n\nThis `ParentState` scheme works out reasonably well almost everywhere. There are one or two places where it isn't as simple as we'd like it to be.\n\nThe first is the set of methods including `select`, `selectFirst`, `elementsByTag`, `elementsMatchingText`, `allElements`, etc (all sans the `get` prefix from JSoup, of course).\n\nThese methods all return lists (or options) of elements. The returned value could include children of the current element _and the current element itself_.\n\nThis raises a question. 
What should the ParentState of the returned list be?\n\nIf the current element is known to have a parent, then we can confidently return `List[Element[ParentState.HasParent]]`. And this is actually how ScalaSoup works. For example:\n\n```scala\n// Given some element with a parent (a `body` element in this case).\nval element: Element[ParentState.HasParent] = ScalaSoup.parse(\"\u003cdiv\u003e\u003c/div\u003e\").body.get\n\n// Select all elements using a CSS wildcard. This will include the original body element and its child div.\nval results: List[Element[ParentState.HasParent]] = element.select(\"*\")\n```\n\nBut what if we don't know the parent state of the current element (or if the current element is known to not have a parent)? In that case we cannot know the parent state of the returned list. For example:\n\n```scala\n// Given some element without a parent (a document element in this case).\nval document: Document[ParentState.NoParent] = ScalaSoup.parse(\"\u003cdiv\u003e\u003c/div\u003e\")\n\n// Select all div elements in the document. We (humans) know that the returned list will only include elements with parents, but we haven't persuaded the compiler.\nval results: List[Element[_]] = document.select(\"div\")\n```\n\nThere are a couple of ways we can work around this. One is to ensure we call the method (`select` in these examples) on something we know has a parent. For example:\n\n```scala\nval document: Document[ParentState.NoParent] = ScalaSoup.parse(\"\u003cdiv\u003e\u003c/div\u003e\")\n\n// We go via the body element, which is known to have a parent.\nval results: List[Element[ParentState.HasParent]] = document.body.toList.flatMap(_.select(\"*\"))\n```\n\nAnother, perhaps nicer, solution is to use one of the new methods introduced in ScalaSoup (i.e. 
not present in vanilla JSoup) for this purpose.\n\nInstead of `select` we can call `selectChildren`, which is equivalent in all respects except that it will never include the current element, allowing us to know with confidence that the returned elements all have a parent.\n\n```scala\nval document: Document[ParentState.NoParent] = ScalaSoup.parse(\"\u003cdiv\u003e\u003c/div\u003e\")\n\n// Using `selectChildren`, we know that all the results will have a parent.\nval results: List[Element[ParentState.HasParent]] = document.selectChildren(\"*\")\n```\n\nFor the other methods, take a look at `selectFirstChild` in place of `selectFirst`, `allChildren` in place of `allElements`, `childrenByClass` in place of `elementsByClass` and so on.\n\nThe second issue is raised by the `parent`, `parents` and `parentNode` methods. In these cases we _don't know_ at compile time whether or not the parent has a parent of its own.\n\nThis will be annoying in cases like this:\n\n```scala\nval modifications = for {\n  doc        \u003c- modifyDocument\n  link       =  doc.selectFirstChild(\"a\")\n  linkParent =  link.flatMap(_.parent)\n  _          \u003c- linkParent.foldMapM(_.remove)\n} yield ()\n\nval document = ScalaSoup.parse(\"\u003cdiv\u003e\u003ca\u003e\u003c/a\u003e\u003c/div\u003e\")\n\nval updated = document.modify(modifications)\n```\n\nIn the above example we:\n1. select the first `a` tag,\n2. get its parent,\n3. try to remove the parent from the document.\n\nThis fails to compile because we can't prove that the `a` tag's parent has a parent of its own. Recall that we can't remove an element unless it has a parent. 
Doing so would manifest as an exception in JSoup.\n\nThere are a couple of workarounds:\n\nThe bluntest (and least safe) solution is to just resort to using `asInstanceOf`:\n\n```scala\nval modifications = for {\n  doc        \u003c- modifyDocument\n  link       =  doc.selectFirstChild(\"a\")\n  linkParent =  link.flatMap(_.parent.map(_.asInstanceOf[Element[ParentState.HasParent]]))\n  _          \u003c- linkParent.foldMapM(_.remove)\n} yield ()\n```\n\nA safer solution is to rewrite our program so that it no longer has to work its way back _up_ the tree of elements. In this example let's rewrite using the [`has` pseudo-class](https://developer.mozilla.org/en-US/docs/Web/CSS/:has), which is supported by JSoup:\n\n```scala\nval modifications = for {\n  doc        \u003c- modifyDocument\n  linkParent =  doc.selectFirstChild(\"div:has(a)\")\n  _          \u003c- linkParent.foldMapM(_.remove)\n} yield ()\n```\n\nPattern matching\n----------------\n\nConsider the following (flawed) JSoup program:\n\n```scala\nimport scala.jdk.CollectionConverters._\n\nval doc: org.jsoup.nodes.Document = ???\n\nval foo = doc.childNodes.asScala.map {\n  case x: org.jsoup.nodes.Element =\u003e x.html\n  case x: org.jsoup.nodes.DataNode =\u003e x.getWholeData\n}\n```\n\nCan you see the problem? The pattern match is not exhaustive. If any of the child nodes is something other than an element or a data node, we're going to throw a `MatchError` at runtime. 
Worse, the compiler cannot warn us about it.\n\nLet's re-write our JSoup program using ScalaSoup:\n\n```scala\nval doc: Document[_] = ???\n\nval foo = doc.childNodes.map {\n  case x: Element[_] =\u003e x.html\n  case x: DataNode[_] =\u003e x.wholeData\n}\n```\n\nThis time, we see the problem clearly:\n\n```\n[warn] Foo.scala:24:34: match may not be exhaustive.\n[warn] It would fail on the following inputs: Comment(), DocumentType(), TextNode(), XmlDeclaration()\n[warn]     val foo = doc.childNodes.map {\n[warn]                                  ^\n[warn] one warning found\n```\n\nIn ScalaSoup--unlike in JSoup--the `Node`/`Element`/`Document`/etc class hierarchy is _sealed_. This allows the compiler to determine when a match is not exhaustive.\n\nCompile-time validation of regular expressions and CSS selectors\n----------------------------------------------------------------\n\nConsider this JSoup program that finds elements matching a regex (`(foo)` in this case):\n\n```scala\nval matchingElements = org.jsoup.Jsoup.parse(\"\u003cdiv\u003efoo\u003c/div\u003e\").getElementsMatchingText(\"(foo)\")\n```\n\nBut what if we forgot to close that capturing group (`(foo`)?\n\n```scala\nval matchingElements = org.jsoup.Jsoup.parse(\"\u003cdiv\u003efoo\u003c/div\u003e\").getElementsMatchingText(\"(foo\")\n```\n\nWe get a runtime exception: `java.lang.IllegalArgumentException: Pattern syntax error: (foo`.\n\nLet's see the equivalent in ScalaSoup:\n\n```scala\nval result = ScalaSoup.parse(\"\u003cdiv\u003efoo\u003c/div\u003e\").elementsMatchingText(\"(foo\")\n```\n\nWhat happens this time? 
It doesn't even compile.\n\n```\n[error] Foo.scala:183:73: Regex predicate failed: Unclosed group near index 4\n[error] (foo\n[error]     ^\n[error]     val result = ScalaSoup.parse(\"\u003cdiv\u003efoo\u003c/div\u003e\").elementsMatchingText(\"(foo\")\n[error]                                                                         ^\n[error] one error found\n[error] (dsl/test:compileIncremental) Compilation failed\n```\n\nWe've eliminated yet another entire class of runtime exceptions.\n\nWhat about CSS selectors?\n\nTake this Jsoup example (note the unclosed `[`):\n\n```scala\nval result = org.jsoup.Jsoup.parse(\"\u003cdiv\u003efoo\u003c/div\u003e\").select(\"a[href\")\n```\n\nWhat happens? Another runtime exception:\n\n```scala\norg.jsoup.select.Selector$SelectorParseException: Did not find balanced marker at 'href'\n```\n\nAnd what about ScalaSoup?\n\n```scala\nval result = ScalaSoup.parse(\"\u003cdiv\u003efoo\u003c/div\u003e\").select(\"a[href\")\n```\n\nAs you guessed, it doesn't even compile:\n\n```\n[error] Foo.scala:183:59: CssSelector predicate failed: Did not find balanced marker at 'href'\n[error]     val result = ScalaSoup.parse(\"\u003cdiv\u003efoo\u003c/div\u003e\").select(\"a[href\")\n[error]                                                           ^\n[error] one error found\n[error] (dsl/test:compileIncremental) Compilation failed\n```\n\nThese compile-time checks are courtesy of the excellent [refined](https://github.com/fthomas/refined) library.\n\nThere is one big limitation: these checks rely on the regex and CSS selectors being baked in _at compile time_. 
For example, this won't compile:\n\n```scala\nval someSelectorFromTheOutsideWorld: String = ???\nval result = ScalaSoup.parse(\"\u003cdiv\u003efoo\u003c/div\u003e\").select(someSelectorFromTheOutsideWorld)\n```\n\nHere's the compiler error:\n\n```\n[error] Foo.scala:184:59: compile-time refinement only works with literals\n[error]     val result = ScalaSoup.parse(\"\u003cdiv\u003efoo\u003c/div\u003e\").select(someSelectorFromTheOutsideWorld)\n[error]                                                           ^\n[error] one error found\n[error] (dsl/test:compileIncremental) Compilation failed\n```\n\nScalaSoup provides a `fromString` method for these cases. It returns an `Either` containing either an error message or a valid selector:\n\n```scala\nval someSelectorFromTheOutsideWorld: String = ???\nval selectorOrError: Either[String, CssSelectorString] = CssSelectorString.fromString(someSelectorFromTheOutsideWorld)\n\nselectorOrError match {\n  case Left(errorMessage) =\u003e Nil\n  case Right(validSelector) =\u003e ScalaSoup.parse(\"\u003cdiv\u003efoo\u003c/div\u003e\").select(validSelector)\n}\n```\n\nIf you're feeling reckless, you can use `fromStringUnsafe`:\n\n```scala\nval someSelectorFromTheOutsideWorld: String = ???\n// This will throw if the selector doesn't parse.\nval validSelector: CssSelectorString = CssSelectorString.fromStringUnsafe(someSelectorFromTheOutsideWorld)\n\nScalaSoup.parse(\"\u003cdiv\u003efoo\u003c/div\u003e\").select(validSelector)\n```\n\nThere are equivalent methods for regexes: `RegexString.fromString` and `RegexString.fromStringUnsafe`.\n\nOther Differences Between JSoup and ScalaSoup\n---------------------------------------------\n\n### No null references ###\nJSoup really likes to return null references. 
For example:\n\n```scala\nval doc = org.jsoup.Jsoup.parse(\"\u003cdiv\u003e\u003c/div\u003e\")\nval element: org.jsoup.nodes.Element = doc.selectFirst(\"span\") // Returns null\nval spanHtml = element.html // Throws java.lang.NullPointerException\n```\n\nIn ScalaSoup, `selectFirst` is more honest: it returns an `Option[Element]`.\n\n```scala\nval doc = ScalaSoup.parse(\"\u003cdiv\u003e\u003c/div\u003e\")\nval maybeElement: Option[Element[_]] = doc.selectFirst(\"span\")\n\nmaybeElement match {\n  case Some(element) =\u003e element.html\n  case None =\u003e \"\"\n}\n```\n\nScalaSoup replaces all such nullable return types with `Option`s. \n\nIt is of course possible to call `get` on an `Option` (which will throw if `None`), putting us right back in the primitive world we just escaped. To avoid this, consider using [WartRemover](http://www.wartremover.org/) and its [OptionPartial](http://www.wartremover.org/doc/warts.html#optionpartial) wart.\n\n### No `Elements` class ###\n\nJSoup often returns an instance of its `Elements` class (a subclass of `ArrayList\u003cElement\u003e`). Scala's collection types are much richer than Java's, so in ScalaSoup we opt to use simple `List[Element]` lists. 
We do not expose (a wrapper around) `Elements`.\n\nConsider this method from JSoup's `Elements` class:\n\n```java\npublic List\u003cFormElement\u003e forms() {\n  ArrayList\u003cFormElement\u003e forms = new ArrayList\u003c\u003e();\n  for (Element el: this)\n    if (el instanceof FormElement)\n      forms.add((FormElement) el);\n  return forms;\n}\n```\n\nUsage might look something like this:\n\n```scala\nval document = org.jsoup.Jsoup.parse(...)\nval forms = document.getAllElements.forms\n```\n\nIn ScalaSoup we don't need this; we can simply use `collect`:\n\n```scala\nval forms = document.allChildren.collect({case f: FormElement[ParentState.HasParent] =\u003e f})\n```\n\n### No mutations to arguments ###\n\nConsider this JSoup program:\n\n```scala\nval doc1 = org.jsoup.Jsoup.parse(\"\u003cspan\u003e\u003c/span\u003e\")\nval doc2 = org.jsoup.Jsoup.parse(\"\u003cdiv\u003e\u003c/div\u003e\")\n\ndoc2.selectFirst(\"div\").replaceWith(doc1.selectFirst(\"span\"))\n```\n\nWith one call to `replaceWith`, we've managed to mutate both doc2 _and_ doc1. This is courtesy of JSoup's \"reparenting\" misfeature.\n\nLet's rewrite this in ScalaSoup:\n\n```scala\nval doc1 = ScalaSoup.parse(\"\u003cspan\u003e\u003c/span\u003e\")\nval doc2 = ScalaSoup.parse(\"\u003cdiv\u003e\u003c/div\u003e\")\n\nval modifications = for {\n  doc \u003c- modifyDocument\n  div =  doc.selectFirstChild(\"div\").get\n  _   \u003c- div.replaceWith(doc1.selectFirstChild(\"span\").get)\n} yield ()\n\nval updatedDoc2 = doc2.modify(modifications)\n```\n\nThis time around, neither doc2 _nor_ doc1 is mutated. ScalaSoup achieves this by cloning arguments to `replaceWith` and other similar methods. \n\nFuture Work\n-----------\n\n1. Publish to Sonatype.\n2. More tests.\n3. Improve this readme.\n4. Copy JavaDoc from JSoup?\n5. Use refined for URLs.\n6. 
Improve the DSL.\n\nLicense\n-------\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiterable%2Fscalasoup","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fiterable%2Fscalasoup","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiterable%2Fscalasoup/lists"}