Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/egoist/tokio
Web scraping made simple.
- Host: GitHub
- URL: https://github.com/egoist/tokio
- Owner: egoist
- License: mit
- Created: 2018-05-13T07:40:33.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2022-02-23T08:19:32.000Z (over 2 years ago)
- Last Synced: 2024-10-23T04:55:51.472Z (12 days ago)
- Language: JavaScript
- Homepage:
- Size: 64.5 KB
- Stars: 143
- Watchers: 5
- Forks: 4
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
# tokio
[![NPM version](https://img.shields.io/npm/v/tokio.svg?style=flat)](https://npmjs.com/package/tokio) [![NPM downloads](https://img.shields.io/npm/dm/tokio.svg?style=flat)](https://npmjs.com/package/tokio) [![CircleCI](https://circleci.com/gh/egoist/tokio/tree/master.svg?style=shield)](https://circleci.com/gh/egoist/tokio/tree/master) [![donate](https://img.shields.io/badge/$-donate-ff69b4.svg?maxAge=2592000&style=flat)](https://github.com/egoist/donate) [![chat](https://img.shields.io/badge/chat-on%20discord-7289DA.svg?style=flat)](https://chat.egoist.moe)
> Web scraping made simple.
## Features
- Built on top of [jsdom](https://github.com/jsdom/jsdom).
- It runs inline and external scripts on the page.
- You can add a resource filter to skip loading certain external resources.
- Simple and fast: only 100 SLOC, and it does not require Electron or Chromium.
## Sponsor
## Install
```bash
yarn add tokio
```
## Table of Contents
- [Usage](#usage)
- [API](#api)
  * [new Tokio(options)](#new-tokiooptions)
    + [options](#options)
      - [options.url](#optionsurl)
      - [options.wait](#optionswait)
      - [options.manually](#optionsmanually)
      - [options.resourceFilter](#optionsresourcefilter)
      - [options.requestOptions](#optionsrequestoptions)
  * [tokio.fetch()](#tokiofetch)
  * [tokio.query(html, opts)](#tokioqueryhtml-opts)
- [Contributing](#contributing)
- [Author](#author)
## Usage
```js
const Tokio = require('tokio')

const tokio = new Tokio({
  url: 'https://some-website.com'
})

tokio.fetch().then(html => {
  console.log(html) //=> string

  // Query HTML with cheerio (server-side jQuery)
  // https://github.com/cheeriojs/cheerio
  const $ = tokio.query(html)
})
```
## API
### new Tokio(options)
#### options
##### options.url
- __Type__: `string`
- __Required__: `required`

The URL to fetch.
##### options.wait
- __Type__: `number` `string`
- __Default__: `50`

Wait for a certain time (in milliseconds) or for a DOM element to show up.
##### options.manually
- __Type__: `boolean` `string`
Instead of using [options.wait](#optionswait), you can manually call `window.__tokio_ready__()` in your website to signal that the page is ready to be captured.
It can also be a string like `i_am_ready` so that you can call `window.i_am_ready()` instead.
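On the page being scraped, the ready signal might be wired up along these lines. This is a minimal sketch: `signalReady` is a hypothetical helper, not part of tokio, and it assumes the hook only exists on `window` when the page is loaded by tokio with `options.manually` enabled, so it checks for the function before calling it.

```js
// Hypothetical page-side helper (not part of tokio).
// `hookName` should match options.manually when that option is a string;
// '__tokio_ready__' is the name documented for the boolean form.
function signalReady(win, hookName = '__tokio_ready__') {
  const hook = win[hookName]
  if (typeof hook !== 'function') {
    // Assumed case: a regular visit where no scraper is waiting.
    return false
  }
  hook.call(win)
  return true
}

// In the page, call it once rendering has finished, for example:
// signalReady(window)
// or, with options.manually set to 'i_am_ready':
// signalReady(window, 'i_am_ready')
```

Guarding the call keeps the same page code safe to ship to ordinary visitors, where the hook is presumably undefined.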
##### options.resourceFilter
- __Type__: `resource => boolean`
Return whether to load a given resource. Check out the [resource](https://github.com/jsdom/jsdom/blob/master/lib/old-api.md#custom-external-resource-loader) type.
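As an illustration, a filter could load only resources served from a single host. This sketch assumes, based on the linked jsdom documentation, that `resource.url` is a parsed URL object with a `hostname` field; `createSameHostFilter` is a hypothetical helper, not part of tokio.

```js
// Hypothetical helper: build a filter that only loads resources
// served from `allowedHost`.
// Assumption: resource.url is a parsed URL object exposing `hostname`.
function createSameHostFilter(allowedHost) {
  return resource => {
    const hostname = resource && resource.url && resource.url.hostname
    return hostname === allowedHost
  }
}

const resourceFilter = createSameHostFilter('some-website.com')
// Pass it along with the URL: new Tokio({ url, resourceFilter })
```

Returning `false` for anything without a recognizable hostname errs on the side of loading less, which suits scraping where third-party scripts are rarely needed.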
##### options.requestOptions
- `proxy`: `string` A URL for an HTTP proxy to use for the requests.
- `agent`: The `http(s).Agent` instance to use.
- `agentOptions`: The agent options; defaults to `{ keepAlive: true, keepAliveMsecs: 115000 }`, see [http api](https://nodejs.org/api/http.html) for more details.
- `strictSSL`: If `true`, requires SSL certificates be valid; defaults to `true`, see [request module](https://github.com/request/request#requestoptions-callback) for more details.
- `userAgent`: The user agent string used in requests; defaults to `Node.js (#process.platform#; U; rv:#process.version#)`
- `headers`: An object giving any headers that will be used while loading the HTML from `options.url`, if applicable.
##### options.variables
Inject variables into the global `window` scope.
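Putting several options together, a configuration object might look like the following sketch. Every value here (URL, proxy address, user agent, header, variable name) is a placeholder chosen for illustration; only the option keys come from the documentation above, and the shape of `variables` as a plain object is an assumption.

```js
// All values below are illustrative placeholders.
const options = {
  url: 'https://some-website.com',
  wait: 200, // wait 200 ms before capturing the HTML
  requestOptions: {
    proxy: 'http://127.0.0.1:8080',
    strictSSL: true,
    userAgent: 'my-scraper/1.0',
    headers: { 'Accept-Language': 'en-US' }
  },
  // Assumption: each key becomes a property on `window`,
  // so page scripts could read `window.IS_SCRAPER`.
  variables: { IS_SCRAPER: true }
}

// Pass it to the constructor: new Tokio(options)
```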
### tokio.fetch()
- __Type__: `() => Promise`
Fetch URL and return corresponding HTML. (JavaScript on this page will be evaluated.)
### tokio.query(html, opts)
This is basically [`cheerio.load(html, opts)`](https://github.com/cheeriojs/cheerio#loading).
## Contributing
1. Fork it!
2. Create your feature branch: `git checkout -b my-new-feature`
3. Commit your changes: `git commit -am 'Add some feature'`
4. Push to the branch: `git push origin my-new-feature`
5. Submit a pull request :D
## Author
**tokio** © [egoist](https://github.com/egoist), Released under the [MIT](./LICENSE) License.
Authored and maintained by egoist with help from contributors ([list](https://github.com/egoist/tokio/contributors)).

> [github.com/egoist](https://github.com/egoist) · GitHub [@egoist](https://github.com/egoist) · Twitter [@_egoistlily](https://twitter.com/_egoistlily)