https://github.com/arstgit/videox

Download HTML5 videos from a website page.

mediasource-extensions


Host: GitHub
URL: https://github.com/arstgit/videox
Owner: arstgit
License: mit
Created: 2020-10-02T03:23:52.000Z (almost 5 years ago)
Default Branch: main
Last Pushed: 2020-10-08T22:58:09.000Z (almost 5 years ago)
Last Synced: 2025-05-26T23:15:25.907Z (about 2 months ago)
Topics: mediasource-extensions
Language: JavaScript
Homepage:
Size: 11.7 KB
Stars: 4
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# videox

Download HTML5 videos from a website page using Media Source Extensions (MSE).

Note: 

1. videox is designed for pages that use the Media Source Extensions (MSE) technique. For pages that use other techniques, for example embedding a plain HTTP URL in the video tag, videox will throw an error.

2. Some pages serve video ads with the same technique as the actual video content, MSE. videox can't distinguish them, so by default it downloads the ads along with the actual video. The easiest workaround is to use a browser with an ad-blocking extension. Alternatively, modify the program to suit your needs; it is just a web crawler based on puppeteer.
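
The distinction in note 1 can be checked by hand before pointing videox at a page. The following is a minimal sketch and not part of videox: `looksLikeMediaSource` is a hypothetical helper that inspects the `src` value of a video element. An MSE-backed player usually sets `src` to an object URL starting with `blob:`, while a directly embedded file has an ordinary HTTP URL. A `blob:` URL can also refer to a plain `Blob`, so treat the result as a hint only.

```js
// Hypothetical helper, not part of videox.
// Returns true when a video element's src looks like an object URL,
// which is what pages using Media Source Extensions typically set.
function looksLikeMediaSource(src) {
  if (typeof src !== 'string' || src === '') return false
  return src.trim().toLowerCase().startsWith('blob:')
}
```

In a browser console, the value to pass would be `document.querySelector('video').src`.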

# Prerequisites

- Chrome. Needed if the website serves the video you want as MP4, which is usually the case. Otherwise, the Chromium build that puppeteer downloads automatically is enough.

# Design

[https://www.tiaoxingyubolang.com/zh/article/2020-10-09_mediasource](https://www.tiaoxingyubolang.com/zh/article/2020-10-09_mediasource)

# Usage

```js
// `path` is needed for path.join below.
const path = require('path')
const Videox = require('videox')

const targetUrl = 'https://www.youtube.com/watch?v=h32FxBqmu_U'

// The leading semicolon keeps the IIFE from being parsed as a call on the string above.
;(async () => {
  const videox = new Videox({
    debug: true,
    headless: true,
    downloadBrowser: false,
    logTo: process.stdout,
    browserExecutePath: '/usr/bin/chromium',
    browserArgs: ['--no-sandbox'],
    downloadAsFile: true,
    downloadPath: path.join(__dirname, 'download'),
    checkCompleteLoopInterval: 100,
    waitForNextDataTimeout: 8000,
  })

  await videox.init()
  await videox.get(targetUrl)
  await videox.destroy()
})()
```

# API

## Class: Videox

### Event: 'data'

- `objectURL` The URL created by `URL.createObjectURL`; it usually starts with `blob`.

- `mimeCodec` The corresponding MIME codec string.

- `chunk` The data received from the page.

If `options.downloadAsFile` is set to `false`, you must listen for this event to receive the media data.

`objectURL` and `mimeCodec` together identify the media file to which `chunk` belongs.
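
As an illustration of how the three event arguments fit together, here is a minimal sketch that groups chunks by the `(objectURL, mimeCodec)` pair. It makes two assumptions the README does not confirm: that `chunk` is a `Buffer` or something `Buffer.from` accepts, and that a `Videox` instance exposes the usual EventEmitter `on` method. `createChunkCollector` is a hypothetical helper, and a plain `EventEmitter` stands in for videox so the sketch runs without a browser.

```js
const { EventEmitter } = require('events')

// Hypothetical helper: group incoming chunks by the (objectURL, mimeCodec)
// pair that identifies a media file.
function createChunkCollector() {
  const files = new Map() // key -> { objectURL, mimeCodec, chunks }

  function onData(objectURL, mimeCodec, chunk) {
    const key = `${objectURL}|${mimeCodec}`
    if (!files.has(key)) {
      files.set(key, { objectURL, mimeCodec, chunks: [] })
    }
    // Assumption: chunk is a Buffer or a value Buffer.from accepts.
    files.get(key).chunks.push(Buffer.from(chunk))
  }

  // Concatenate the chunks of every file in arrival order.
  function result() {
    return [...files.values()].map(({ objectURL, mimeCodec, chunks }) => ({
      objectURL,
      mimeCodec,
      data: Buffer.concat(chunks),
    }))
  }

  return { onData, result }
}

// Stand-in emitter; with videox this would be videox.on('data', collector.onData),
// assuming an EventEmitter-style API.
const fakeSource = new EventEmitter()
const collector = createChunkCollector()
fakeSource.on('data', collector.onData)

const url = 'blob:https://example.com/a'
fakeSource.emit('data', url, 'video/mp4; codecs="avc1.42E01E"', Buffer.from([1, 2]))
fakeSource.emit('data', url, 'audio/mp4; codecs="mp4a.40.2"', Buffer.from([9]))
fakeSource.emit('data', url, 'video/mp4; codecs="avc1.42E01E"', Buffer.from([3]))
```

After the three events, `collector.result()` holds two entries, one per codec, each with its chunks joined in arrival order.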

### new Videox([options])

- `options`

    - `debug` Default: `false`.

    - `headless` Default: `true`.

    - `downloadBrowser` Default: `false`.

    - `logTo` Default: `process.stdout`.

    - `browserExecutePath` Default: `'/usr/bin/chromium'`.

    - `browserArgs` Default: `[]`.

    - `downloadAsFile` Default: `true`.

    - `downloadPath` Default: `''`.

    - `checkCompleteLoopInterval` The interval between checks of whether the current download is complete, in milliseconds. Default: `100`.

    - `waitForNextDataTimeout` The timeout for waiting for the next media data, in milliseconds. Default: `3000`.

- `Returns`: \

Usually `downloadBrowser` is `false` and `browserExecutePath` points to an installed browser, so that MP4 video can be downloaded with a browser other than the default Chromium. See the `puppeteer` package for more information.
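
One way to read the two timing options is as a polling loop: every `checkCompleteLoopInterval` milliseconds, check how long ago the last chunk arrived, and treat the download as complete once that gap reaches `waitForNextDataTimeout`. The sketch below illustrates that reading only and is not videox's actual code; `waitUntilIdle` and `getLastDataTime` are hypothetical names.

```js
// Hypothetical sketch of the documented timing semantics, not videox's implementation.
// getLastDataTime must return the timestamp (ms since epoch) of the latest chunk.
function waitUntilIdle({
  getLastDataTime,
  checkCompleteLoopInterval = 100,
  waitForNextDataTimeout = 3000,
}) {
  return new Promise((resolve) => {
    const timer = setInterval(() => {
      const idleFor = Date.now() - getLastDataTime()
      if (idleFor >= waitForNextDataTimeout) {
        // No new data within the timeout: consider the download complete.
        clearInterval(timer)
        resolve(idleFor)
      }
    }, checkCompleteLoopInterval)
  })
}
```

Under this reading, a larger `waitForNextDataTimeout` (the usage example passes 8000 instead of the default 3000) tolerates longer pauses between chunks at the cost of a longer wait after the last one.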

### videox.init()

- `Returns`: \

### videox.get(pageUrl)

- `pageUrl` \ Required.

- `Returns`: \

### videox.destroy()

- `Returns`: \
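
The three methods form an `init`, `get`, `destroy` lifecycle. A minimal sketch of a wrapper that always calls `destroy()`, even when `get()` fails, might look like the following. `downloadPage` is a hypothetical helper, and the sketch assumes that `destroy()` is safe to call after a failed `get()`, which the README does not state.

```js
// Hypothetical wrapper, not part of videox: run one download and
// always call destroy() afterwards.
async function downloadPage(videox, pageUrl) {
  await videox.init()
  try {
    await videox.get(pageUrl)
  } finally {
    // Runs even if get() throws, for example on a page that does not use MSE.
    await videox.destroy()
  }
}
```

With a configured instance, the call would be `await downloadPage(new Videox(options), targetUrl)`.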
