Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/iawia002/Lulu
[Unmaintained] A simple and clean video/music/image downloader ๐พ
https://github.com/iawia002/Lulu
crawler crawling downloader python python3 scraper scraping video
Last synced: about 2 months ago
JSON representation
[Unmaintained] A simple and clean video/music/image downloader ๐พ
- Host: GitHub
- URL: https://github.com/iawia002/Lulu
- Owner: iawia002
- License: mit
- Archived: true
- Created: 2018-01-16T06:52:44.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2021-03-29T16:43:42.000Z (almost 4 years ago)
- Last Synced: 2024-04-14T06:37:59.089Z (9 months ago)
- Topics: crawler, crawling, downloader, python, python3, scraper, scraping, video
- Language: Python
- Homepage:
- Size: 2.31 MB
- Stars: 817
- Watchers: 48
- Forks: 140
- Open Issues: 21
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Unmaintained
Sorry for this.
For a similar project that is still actively developed, try Annie:
# Lulu
[![PyPI](https://img.shields.io/pypi/v/lulu.svg)](https://pypi.python.org/pypi/lulu/)
[![Build Status](https://travis-ci.org/iawia002/Lulu.svg?branch=master)](https://travis-ci.org/iawia002/Lulu)
[![Build status](https://ci.appveyor.com/api/projects/status/ph9ypng9g01mdr6d/branch/master?svg=true)](https://ci.appveyor.com/project/iawia002/lulu/branch/master)
[![codecov](https://codecov.io/gh/iawia002/Lulu/branch/master/graph/badge.svg)](https://codecov.io/gh/iawia002/Lulu)Lulu is a friendly [you-get](https://github.com/soimort/you-get) fork (โฌ Dumb downloader that scrapes the web).
## Why fork?
Faster updates## Installation
### PrerequisitesThe following dependencies are required and must be installed separately.
* **[Python 3.4+](https://www.python.org/downloads/)**
* **[FFmpeg](https://www.ffmpeg.org/)** (strongly recommended) or [Libav](https://libav.org/)
* (Optional) [RTMPDump](https://rtmpdump.mplayerhq.hu/)### Install via pip
$ pip3 install lulu
upgrade:
$ pip3 install -U lulu
## Get Started
Here's how you use `Lulu` to download a video from [Bilibili](https://www.bilibili.com/video/av18295259/):
```console
$ lulu https://www.bilibili.com/video/av18295259/
site: Bilibili
title: ใไธญๆๅ ซ็บงใไฟ็ฝๆฏไบบ็ๅๅญ่ถ ไนไฝ ็ๆณ่ฑก
stream:
- format: flv720
container: flv
size: 175.4 MiB (183914793 bytes)
# download-with: lulu --format=flv720 [URL]Downloading ใไธญๆๅ ซ็บงใไฟ็ฝๆฏไบบ็ๅๅญ่ถ ไนไฝ ็ๆณ่ฑก.flv ...
100% (175.4/175.4MB) โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค[1/1] 3 MB/sDownloading ใไธญๆๅ ซ็บงใไฟ็ฝๆฏไบบ็ๅๅญ่ถ ไนไฝ ็ๆณ่ฑก.cmt.xml ...
```### Download a video
When you get a video of interest, you might want to use the `--info`/`-i` option to see all available quality and formats:
```
$ lulu -i 'https://www.youtube.com/watch?v=jNQXAC9IVRw'
site: YouTube
title: Me at the zoo
streams: # Available quality and codecs
[ DEFAULT ] _________________________________
- itag: 43
container: webm
quality: medium
size: 0.5 MiB (564215 bytes)
# download-with: lulu --itag=43 [URL]- itag: 18
container: mp4
quality: medium
# download-with: lulu --itag=18 [URL]- itag: 5
container: flv
quality: small
# download-with: lulu --itag=5 [URL]- itag: 36
container: 3gp
quality: small
# download-with: lulu --itag=36 [URL]- itag: 17
container: 3gp
quality: small
# download-with: lulu --itag=17 [URL]
```The format marked with `DEFAULT` is the one you will get by default. If that looks cool to you, download it:
```
$ lulu 'https://www.youtube.com/watch?v=jNQXAC9IVRw'
site: YouTube
title: Me at the zoo
stream:
- itag: 43
container: webm
quality: medium
size: 0.5 MiB (564215 bytes)
# download-with: lulu --itag=43 [URL]Downloading zoo.webm ...
100.0% ( 0.5/0.5 MB) โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค[1/1] 7 MB/sSaving Me at the zoo.en.srt ...Done.
```(If a YouTube video has any closed captions, they will be downloaded together with the video file, in SubRip subtitle format.)
Or, if you prefer another format (mp4), just use whatever the option `lulu` shows to you:
```
$ lulu --itag=18 'https://www.youtube.com/watch?v=jNQXAC9IVRw'
```**Note:**
* At this point, format selection has not been generally implemented for most of our supported sites; in that case, the default format to download is the one with the highest quality.
* `ffmpeg` is a required dependency, for downloading and joining videos streamed in multiple parts (e.g. on some sites like Youku), and for YouTube videos of 1080p or high resolution.
* If you don't want `lulu` to join video parts after downloading them, use the `--no-merge`/`-n` option.### Download anything else
If you already have the URL of the exact resource you want, you can download it directly with:
```
$ lulu https://stallman.org/rms.jpg
Site: stallman.org
Title: rms
Type: JPEG Image (image/jpeg)
Size: 0.06 MiB (66482 Bytes)Downloading rms.jpg ...
100.0% ( 0.1/0.1 MB) โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค[1/1] 127 kB/s
```Otherwise, `lulu` will scrape the web page and try to figure out if there's anything interesting to you:
```
$ lulu http://kopasas.tumblr.com/post/69361932517
Site: Tumblr.com
Title: kopasas
Type: Unknown type (None)
Size: 0.51 MiB (536583 Bytes)Site: Tumblr.com
Title: tumblr_mxhg13jx4n1sftq6do1_1280
Type: Portable Network Graphics (image/png)
Size: 0.51 MiB (536583 Bytes)Downloading tumblr_mxhg13jx4n1sftq6do1_1280.png ...
100.0% ( 0.5/0.5 MB) โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค[1/1] 22 MB/s
```**Note:**
* This feature is an experimental one and far from perfect. It works best on scraping large-sized images from popular websites like Tumblr and Blogger, but there is really no universal pattern that can apply to any site on the Internet.
### Pause and resume a download
You may use Ctrl+C to interrupt a download.
A temporary `.download` file is kept in the output directory. Next time you run `lulu` with the same arguments, the download progress will resume from the last session. In case the file is completely downloaded (the temporary `.download` extension is gone), `lulu` will just skip the download.
To enforce re-downloading, use the `--force`/`-f` option. (**Warning:** doing so will overwrite any existing file or temporary file with the same name!)
### Multi-Thread Download
Use `-T/--thread number` option to enable multithreading to download(only works for multiple-parts video), `number` means how many threads you want to use.
### Proxy settings
You may specify an HTTP proxy for `lulu` to use, via the `--http-proxy`/`-x` option:
```
$ lulu -x 127.0.0.1:8087 'https://www.youtube.com/watch?v=jNQXAC9IVRw'
```However, the system proxy setting (i.e. the environment variable `http_proxy`) is applied by default. To disable any proxy, use the `--no-proxy` option.
**Tips:**
* If you need to use proxies a lot (in case your network is blocking certain sites), you might want to use `lulu` with [proxychains](https://github.com/rofl0r/proxychains-ng) and set `alias lulu="proxychains -q lulu"` (in Bash).
* For some websites (e.g. Youku), if you need access to some videos that are only available in mainland China, there is an option of using a specific proxy to extract video information from the site: `--extractor-proxy`/`-y`.### Load cookies
Not all videos are publicly available to anyone. If you need to log in your account to access something (e.g., a private video), it would be unavoidable to feed the browser cookies to `lulu` via the `--cookies`/`-c` option.
**Note:**
* As of now, we are supporting two formats of browser cookies: Mozilla `cookies.sqlite` and Netscape `cookies.txt`.
### Watch a video
Use the `--player`/`-p` option to feed the video into your media player of choice, e.g. `mplayer` or `vlc`, instead of downloading it:
```
$ lulu -p vlc 'https://www.youtube.com/watch?v=jNQXAC9IVRw'
```Or, if you prefer to watch the video in a browser, just without ads or comment section:
```
$ lulu -p chromium 'https://www.youtube.com/watch?v=jNQXAC9IVRw'
```**Tips:**
* It is possible to use the `-p` option to start another download manager, e.g., `lulu -p uget-gtk 'https://www.youtube.com/watch?v=jNQXAC9IVRw'`, though they may not play together very well.
### Set the path and name of downloaded file
Use the `--output-dir`/`-o` option to set the path, and `--output-filename`/`-O` to set the name of the downloaded file:
```
$ lulu -o ~/Videos -O zoo.webm 'https://www.youtube.com/watch?v=jNQXAC9IVRw'
```**Tips:**
* These options are helpful if you encounter problems with the default video titles, which may contain special characters that do not play well with your current shell / operating system / filesystem.
* These options are also helpful if you write a script to batch download files and put them into designated folders with designated names.### Reuse extracted data
Use `--url`/`-u` to get a list of downloadable resource URLs extracted from the page. Use `--json` to get an abstract of extracted data in the JSON format.
**Warning:**
* For the time being, this feature has **NOT** been stabilized and the JSON schema may have breaking changes in the future.
### Search on Google Videos and download
You can pass literally anything to `lulu`. If it isn't a valid URL, `lulu` will do a Google search and download the most relevant video for you. (It might not be exactly the thing you wish to see, but still very likely.)
```
$ lulu "Richard Stallman eats"
```## Supported Sites
| Site | URL | Videos? | Images? | Audios? |
| :--: | :-- | :-----: | :-----: | :-----: |
| **YouTube** | |โ| | |
| **Twitter** | |โ|โ| |
| VK | |โ|โ| |
| Vine | |โ| | |
| Vimeo | |โ| | |
| Vidto | |โ| | |
| Videomega | |โ| | |
| Veoh | |โ| | |
| **Tumblr** | |โ|โ|โ|
| TED | |โ| | |
| SoundCloud | | | |โ|
| SHOWROOM | |โ| | |
| Pinterest | | |โ| |
| MusicPlayOn | |โ| | |
| MTV81 | |โ| | |
| Metacafe | |โ| | |
| Magisto | |โ| | |
| Khan Academy | |โ| | |
| Internet Archive | |โ| | |
| **Instagram** | |โ|โ| |
| InfoQ | |โ| | |
| Imgur | | |โ| |
| Heavy Music Archive | | | |โ|
| **Google+** | |โ|โ| |
| Freesound | | | |โ|
| Flickr | |โ|โ| |
| FC2 Video | |โ| | |
| Facebook | |โ| | |
| eHow | |โ| | |
| Dailymotion | |โ| | |
| Coub | |โ| | |
| CBS | |โ| | |
| Bandcamp | | | |โ|
| AliveThai | |โ| | |
| **755
ใใใดใผใดใผ** | |โ|โ| |
| **niconico
ใใณใใณๅ็ป** | |โ| | |
| **163
็ฝๆ่ง้ข
็ฝๆไบ้ณไน** |
|โ| |โ|
| 56็ฝ | |โ| | |
| **AcFun** | |โ| | |
| **Baidu
็พๅบฆ่ดดๅง** | |โ|โ| |
| ็็ฑณ่ฑ็ฝ | |โ| | |
| **bilibili
ๅๅฉๅๅฉ** | |โ| | |
| Dilidili | |โ| | |
| ่ฑ็ฃ | |โ| | |
| ๆ้ฑผ | |โ| | |
| Panda
็็ซ | |โ| | |
| ๅคๅฐ่ง้ข | |โ| | |
| ้ฃ่ก็ฝ | |โ| | |
| iQIYI
็ฑๅฅ่บ | |โ| | |
| ๆฟๅจ็ฝ | |โ| | |
| ้ ท6็ฝ | |โ| | |
| ้ ท็้ณไน | | | |โ|
| ้ ทๆ้ณไน | | | |โ|
| ไน่ง็ฝ | |โ| | |
| ่ๆFM | | | |โ|
| ็งๆ | |โ| | |
| ๅฐๅ็ง | |โ| | |
| ็ๅฎข้ฆ | |โ| | |
| PPTV่ๅ | |โ| | |
| ้ฝ้ฒ็ฝ | |โ| | |
่ พ่ฎฏ่ง้ข | |โ| | |
| ไผ้น ็ดๆญ | |โ| | |
| Sina
ๆฐๆตช่ง้ข
ๅพฎๅ็งๆ่ง้ข |
|โ| | |
| Sohu
ๆ็่ง้ข | |โ| | |
| **Tudou
ๅ่ฑ** | |โ| | |
| ่พ็ฑณ | |โ| |โ|
| ้ณๅ ๅซ่ง | |โ| | |
| **้ณๆฆTai** | |โ| | |
| **Youku
ไผ้ ท** | |โ| | |
| ๆๆTV | |โ| | |
| ๅคฎ่ง็ฝ | |โ| | |
| ่ฑ็ฃ | | |โ| |
| Naver
๋ค์ด๋ฒ | |โ| | |
| ่ๆTV | |โ| | |
| ็ซ็ซTV | |โ| | |
| ๅ จๆฐ็ดๆญ | |โ| | |
| ้ณๅ ๅฎฝ้ข็ฝ | |โ| | |
| ่ฅฟ็่ง้ข | |โ| | |
| ๅฟซๆ | |โ|โ| |
| ๆ้ณ | |โ| | |
| ้พ็ ็ดๆญ | |โ| | |
| ๅๆฌกๅ | | |โ| |
| pixivision | | |โ| |For all other sites not on the list, the universal extractor will take care of finding and downloading interesting resources from the page.
## Development
### Preparation
Install [pipenv](https://github.com/pypa/pipenv):
$ pip3 install pipenv
and [fabric](https://github.com/fabric/fabric) (**Note: fabric doesn't support python3 now, install using pip2**):
$ pip install fabric
Initialize virtualenv
$ pipenv --python 3
Install all dependencies:
$ pipenv install --dev
Use the shell:
$ pipenv shell
Run the tests:
$ fab test
### Contributing
Lulu is an open source project and welcome contributions ๐
#### Note
[@iawia002](https://github.com/iawia002) has [pep8](https://www.python.org/dev/peps/pep-0008) obsessive-compulsive disorder, all code must follow [pep8](http://pep8.org) guidelines.
You can use [flake8](https://github.com/PyCQA/flake8) to check the code before submitting.
## Authors
You can find the [list of all contributors](https://github.com/iawia002/Lulu/graphs/contributors) here.
## License
MIT