Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/nberlette/dql
Web Scraping with Deno: DOM + GraphQL
https://github.com/nberlette/dql
deno deno-deploy denoland dom dom-parser dql graphql graphql-scraper scraper webscraping
Last synced: 3 months ago
JSON representation
Web Scraping with Deno: DOM + GraphQL
- Host: GitHub
- URL: https://github.com/nberlette/dql
- Owner: nberlette
- License: mit
- Created: 2022-07-09T23:40:52.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-08-06T22:42:13.000Z (3 months ago)
- Last Synced: 2024-08-07T02:08:25.746Z (3 months ago)
- Topics: deno, deno-deploy, denoland, dom, dom-parser, dql, graphql, graphql-scraper, scraper, webscraping
- Language: TypeScript
- Homepage: https://deno.land/x/dql
- Size: 64.5 KB
- Stars: 10
- Watchers: 2
- Forks: 1
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# [๐ฆ DQL](https://deno.land/x/dql)
### _**Web Scraping with Deno ย โย DOM + GraphQL**_
---
**`DQL`** is a web scraping module for Deno and Deno Deploy that integrates the power of [**GraphQL Queries**](https://graphql.org/learn/queries) with the DOM tree of a remote webpage or HTML document fragment. This is a fork of [**DenoQL**](https://deno.land/x/denoql) with some heavy refactoring and some additional features:
- [x] Compatibility with the [**Deno Deploy**](https://deno.com/deploy) architecture
- [x] Ability to pass variables alongside all queries
- [x] New state-management class with additional methods
- [x] Modular project structure (as opposed to a mostly single-file design)
- [x] Improved types and schema structure> **Note**: _This is a work-in-progress and there is still a lot to be done._
### ๐ย [**`GraphQL Playground`**](https://dql.deno.dev)
### ๐ย [**`HackerNews Scraper`**](https://dash.deno.com/playground/dql-hn)
### ๐ย [**`Junkyard Scraper`**](https://dash.deno.com/playground/dirty-sparrow-69)
---
## `useQuery`
The primary function exported by the module is the workhorse named `useQuery`:
```ts
import { useQuery } from "https://deno.land/x/dql/mod.ts";const data = await useQuery(`query { ... }`);
```### `QueryOptions`
You can also provide a `QueryOptions` object as the second argument of `useQuery`, to further control the behavior of your query requests. All properties are optional.
```ts
const data = await useQuery(`query { ... }`, {
concurrency: 8, // passed directly to PQueue initializer
fetch_options: { // passed directly to Fetch API requests
headers: {
"Authorization": "Bearer ghp_a5025a80a24defd0a7d06b4fc215bb5635a167c6",
},
},
variables: {}, // variables defined in your queries
operationName: "", // when using multiple queries
});
```## `createServer`
With [**Deno Deploy**](https://dash.deno.com/new), you can deploy **`DQL`** with a GraphQL Playground in **only 2 lines of code**:
```ts
import { createServer } from "https://deno.land/x/dql/mod.ts";createServer(80, { endpoint: "https://dql.deno.dev" });
````๐` [Try the **GraphQL Playground** at **`dql.deno.dev`**](https://dql.deno.dev)\
`๐ฆ` [View the source code in the **`Deno Playground`**](https://dash.deno.com/playground/dql)## Command Line Usage (CLI)
```bash
deno run -A --unstable https://deno.land/x/dql/serve.ts
```#### Custom port (default is `8080`)
```bash
deno run -A https://deno.land/x/dql/serve.ts --port 3000
```> **Warning**: you need to have the [**Deno CLI**](https://deno.land) installed first.
---
## ๐ป Examples
### `๐` Junkyard Scraper ยท [**`Deno Playground ๐ฆ`**](https://dash.deno.com/playground/dirty-sparrow-69)
```ts
import { useQuery } from "https://deno.land/x/dql/mod.ts";
import { serve } from "https://deno.land/[email protected]/http/server.ts";serve(async (res: Request) =>
await useQuery(
`
query Junkyard (
$url: String
$itemSelector: String = "table > tbody > tr"
) {
vehicles: page(url: $url) {
totalCount: count(selector: $itemSelector)
nodes: queryAll(selector: $itemSelector) {
id: index
vin: text(selector: "td:nth-child(7)", trim: true)
sku: text(selector: "td:nth-child(6)", trim: true)
year: text(selector: "td:nth-child(1)", trim: true)
model: text(selector: "td:nth-child(2) > .notranslate", trim: true)
aisle: text(selector: "td:nth-child(3)", trim: true)
store: text(selector: "td:nth-child(4)", trim: true)
color: text(selector: "td:nth-child(5)", trim: true)
date: attr(selector: "td:nth-child(8)", name: "data-value")
image: src(selector: "td > a > img")
}
}
}`,
{
variables: {
"url": "http://nvpap.deno.dev/action=getVehicles&makes=BMW",
},
},
)
.then((data) => JSON.stringify(data, null, 2))
.then((json) =>
new Response(json, {
headers: { "content-type": "application/json;charset=utf-8" },
})
)
);
```### ๐ HackerNews Scraper ยท [**`Deno Playground ๐ฆ`**](https://dash.deno.com/playground/dql-hn)
```ts
import { useQuery } from "https://deno.land/x/dql/mod.ts";
import { serve } from "https://deno.land/[email protected]/http/server.ts";serve(async (res: Request) =>
await useQuery(`
query HackerNews (
$url: String = "http://news.ycombinator.com"
$rowSelector: String = "tr.athing"
) {
page(url: $url) {
title
totalCount: count(selector: $rowSelector)
nodes: queryAll(selector: $rowSelector) {
rank: text(selector: "td span.rank", trim: true)
title: text(selector: "td.title a", trim: true)
site: text(selector: "span.sitestr", trim: true)
url: href(selector: "td.title a")
attrs: next {
score: text(selector: "span.score", trim: true)
user: text(selector: "a.hnuser", trim: true)
date: attr(selector: "span.age", name: "title")
}
}
}
}`)
.then((data) => JSON.stringify(data, null, 2))
.then((json) =>
new Response(json, {
headers: { "content-type": "application/json;charset=utf-8" },
})
)
);
```## License
MIT ยฉ [**Nicholas Berlette**](https://github.com/nberlette), based on [DenoQL](https://deno.land/x/denoql).