https://github.com/Nandaka/PixivUtil2
Download images from Pixiv and more!
- Host: GitHub
- URL: https://github.com/Nandaka/PixivUtil2
- Owner: Nandaka
- License: bsd-2-clause
- Created: 2011-11-28T01:14:34.000Z (about 13 years ago)
- Default Branch: master
- Last Pushed: 2024-11-02T23:30:32.000Z (2 months ago)
- Last Synced: 2024-11-03T00:19:28.266Z (2 months ago)
- Language: Python
- Homepage: http://nandaka.devnull.zone/
- Size: 13.9 MB
- Stars: 2,381
- Watchers: 99
- Forks: 255
- Open Issues: 120
Metadata Files:
- Readme: readme.md
- Changelog: changelog.txt
- License: LICENSE
Awesome Lists containing this project
- awesome-acg - PixivUtil (Pixiv Downloader) - Downloader and tag manager for [Pixiv](https://www.pixiv.net/). [English] (Downloaders)
- awesome-hacking-lists - Nandaka/PixivUtil2 - Download images from Pixiv and more! (Python)
README
# Requirements:
- Running from Windows binary:
  - minimum Windows 10 with latest updates installed.
- Running from source code:
  - Python 3.8.0+ (https://www.python.org/)
  - Additional library listed in requirements.txt
  - IDE Environment: see https://github.com/Nandaka/PixivUtil2/wiki/IDE-Enviroment-(Windows)
- Dependent software
  - FFmpeg (https://www.ffmpeg.org/) - used for converting ugoira to video.
  - [VC++ Redistributable](https://visualstudio.microsoft.com/downloads/#microsoft-visual-c-redistributable-for-visual-studio-2019) - Needed for pyexiv2 to write XMP metadata in Windows (if enabled).
# Capabilities:
- Download by member_id
- Download by image_id
- Download by tags
- Download from list (list.txt)
- Download from bookmarked artists (/bookmark.php?type=user)
including private/hidden bookmarks.
- Download from bookmarked images (/bookmark.php)
including private/hidden bookmarks.
- Download from tags list (tags.txt)
- Download new illustrations from bookmarked artist (/bookmark_new_illust.php)
- Download by Title/Caption
- Download by Tag and Member Id
- Download Member Bookmark (/bookmark.php?id=)
- Download by Group Id
- Download from supported artists (FANBOX)
- Download by artist/creator id (FANBOX)
- Download by post id (FANBOX)
- Download from followed artists (FANBOX)
- Re-encoding of all ugoira present in folder
- Batch Download from batch_job.json (experimental)
See https://github.com/Nandaka/PixivUtil2/wiki/Using-Batch-Job-(Experimental)
- Manage database:
- Show all member
- Show all downloaded images
- Export list (member_id only)
- Export list (detailed)
- Export local database (image_id)
- Show member by last downloaded date
- Show image by image_id
- Show member by member_id
- Show image by member_id
- Delete member by member_id
- Delete image by image_id
- Delete member and image (cascade deletion)
- Blacklist image by image_id
- Show all deleted member
- Export FANBOX post list
- Delete FANBOX download history by member_id
- Delete FANBOX download history by post_id
- Delete Sketch download history by member_id
- Delete Sketch download history by post_id
- Clean Up Database (remove db entry if downloaded file is missing)
- Export user bookmark (member_id) to a text file.
# Docker
```sh
$ docker build -t pixivutil2 .
$ docker run -it --rm \
-v $(pwd):/workdir \
-w /workdir \
pixivutil2 \
/bin/bash -c "python PixivUtil2.py"
```
# WARNING
Overusage can lead to Pixiv blocking your IP for a few hours.
# FAQs
## A. Usage
```
Q1. How to paste Japanese tags to the console window?
- Click the top-left icon -> select Edit -> Paste (Cannot use Ctrl-V), if
it show up as question mark -> Change the Language for non-Unicode
program to Japanese (google it).
- or use online url encoder (http://meyerweb.com/eric/tools/dencoder/)
and paste the encoded tag back to the console.
- or paste it to tags.txt and select download by tags list. Separate each
tags with space, and separate with new line for new query.

Q2. My password doesn't show up in the console!
- This is normal. The program still reads it.
- or you can put in the config.ini if not sure.

Q3. I cannot login to Pixiv!
- Check your password.
- Try to login to the Pixiv Website.
- Try to use the config.ini on the [Authentication] section.
- Check your date and time setting (e.g.: https://www.timeanddate.com/)
- Disable Daylight Saving Time and try again.
- Copy your session values from browser:
1. Open Firefox.
2. Go to Pixiv website and login, remember to enable [Remember Me]
check box.
3. Press F12 to open Developer Tools, and select the Storage tab.
4. Click the Cookies and select for the pixiv.net.
5. Look for Cookie named = PHPSESSID.
6. Copy the content value. https://imgur.com/a/BppHOoQ
7. Open config.ini, go to [Authentication] section, paste the value
to cookie. https://imgur.com/VB2g3qn

Q4. PixivUtil working from local terminal on Linux box but not working when I
used SSH with PuTTY!
- export LANG=en_US.UTF-8. PuTTY does not set locales right, when they are
not set, python does not know what to write (Thanks to nho!)
- ... and export PYTHONIOENCODING=utf-8, so it can create DB and populate
it properly (Thanks to Mailia!)

Q5. How to delete member id from Database?
- Open the application and choose Manage Database (d) then select delete
Member by Member Id.
- Open the database (db.sqlite) directly using sqlite browser and use sql
command to delete it.
- If you are downloading using Download from List.txt (3), you can create
ignore_list.txt to skip the member id.

Q6. The app doesn't download all the images! (I want to download SFW images too).
- Pixiv only allow to search up to 1000 pages if you don't have Pixiv
Premium.
- Check your pixiv website settings (refer to https://goo.gl/gQi09v),
then delete the cookie value in config.ini and retry.
- Check the value of r18mode in config.ini. Setting it to True will only
download R-18 images.

Q7. The apps show square/question mark texts in the console output!
- This is because your Windows is not set to Japanese for the Regional Settings
in control panel.
- Since 20161114+ version, you need to set the console font properties to
use font with unicode support (e.g. Arial Unicode, MS Gothic).

Q8. Where to get FFmpeg software? How to enable `createwebm`?
- Download the stable version of FFmpeg from https://www.ffmpeg.org/download.html.
- For Windows:
- Extract the archive to a folder.
- Open the extracted folder and open to the `/bin` folder.
- Copy the application `ffmpeg.exe` to your PixivUtil2 folder.
- For Linux:
- Install the package using your favorite package manager.

Q9. The downloaded images are corrupted, how to redownload it again?
- You can delete the download history in databases by manually delete the image id
from databases (enter d, followed by 10).
- Or, you can set alwaysCheckFileSize = True and verifyimage = True in config.ini
and retry the download.
Q10. I got this error またはメールアドレス、パスワードが正しいかチェックしてください。
- Use your email address for the username, or check your password in config.ini

Q11. Older windows support (e.g. Win7)?
- You can try to run from source code with the latest supported python 3.x.
See the instruction here: https://github.com/Nandaka/PixivUtil2/wiki/IDE-Enviroment-(Windows)
```
## B.Bugs/Source Code/Supports
```
Q1. Where I can report bugs?
- Please report any bug to https://github.com/Nandaka/PixivUtil2/issues.

Q2. Where I can support/donate to you?
- You can send it to my PayPal account (nchek2000[at]gmail[dot]com).
- or visit https://bit.ly/PixivUtilDonation.

Q3. I want to use/modify the source code!
- Feel free to use/modify the source code as long as you give credit to me
and make the modified source code open.
- If you want to add a feature/bug fix, you can fork the repository at
https://github.com/Nandaka/PixivUtil2 and issue Pull Requests.

Q4. I got ValueError: invalid literal for int() with base 10: ''
- Please modify _html.py from mechanize library, search for
'def unescape_charref(data, encoding):' and replace with patch in
https://pastebin.com/5bT5HFkb.

Q5. I got ' module no found error'
- Download the library from the source (see links from the Requirements
section) and copy the file into your Lib\site-packages directory.
- Or use pip install (google on how to use).
```
## C.Log Messages
```
Q1: HTTPError: HTTP Error 404: Not Found
- This is because the file doesn't exist in the pixiv server, usually
because there is no big images version for the manga mode (currently the
apps will try to download the big version first then try the normal size
if failed, this is only for the manga mode and it is normal).

Q2: Error at process_image(): (, WindowsError
(32, 'Prosessi ei voi kayttaa tiedostoa, koska se on toisen prosessin
kaytossa')
- The file is being used by another process (google translate). Either you
ran multiple instances of Pixiv downloader from the same folder, or there
are other processes locking the file/db.sqlite (usually from antivirus
or some sync/backup application).

Q3: Error at process_image(): (,
AttributeError ("'NoneType' object has no attribute 'find'",)
- Usually this is because of failed login (cookie not valid). Try to change
your password to simple one for testing, or copy the cookie from browser:
1. Open Firefox/Chrome.
2. Login to your Pixiv.
3. On Pixiv page, press F12 and choose the Storage tab (Firefox), or
Right click on the leftmost address bar/the (i) icon (Chrome)
5. Click the View Cookies button.
6. Look for Cookie named = PHPSESSID.
7. Copy the content value.
8. Open config.ini, go to [Authentication] section, paste the value to
cookie.
- Or because Pixiv has changed the layout code, so the Pixiv
downloader cannot parse the page correctly. Please tell me by posting a
comment if this happens and include the details, such as the member/image
id, dump html, and log file (check on the application folder).

Q4: URLError:
- Update version to > pixivutil20221029.
- This is because the Pixiv downloader cannot resolve the address to
download the images, please try to restart the network connection or do
ipconfig /flushdns to refresh the dns cache (windows).

Q5: Error at download_image(): (, timeout('timed out',)
- This is because the Pixiv downloader didn't receive any reply for
specified time in config.ini from Pixiv. Please retry the download again
later.

Q6: httperror_seek_wrapper: HTTP Error 403: request disallowed by robots.txt
- Set userobots = False in config.ini
```
# Command Line Option
Please run with `--help` for the latest information.
```
-h, --help show this help message and exit
-s STARTACTION, --startaction=STARTACTION
Action you want to load your program with:
1 - Download by member_id
(required: list of member_ids separated by space
optional: --include_sketch to also download Pixiv Sketch)
2 - Download by image_id
(required: followed by image_ids separated by space)
3 - Download by tags
(required: tags
optional: --use_wildcard_tag, --sp=START_PAGE, and --ep=END_PAGE, --start_date, --end_date)
4 - Download from list
(required: -f LIST_FILE and followed with optional tag)
5 - Download from user bookmark
(optional: -p BOOKMARK_FLAG [y/n/o] for private bookmark, --sp=START_PAGE, and --ep=END_PAGE)
6 - Download from image bookmark
(required: -p BOOKMARK_FLAG [y/n/o] for private bookmark
optional: --sp=START_PAGE, and --ep=END_PAGE, and followed with tag)
7 - Download from tags list
(required: -f LIST_FILE,
optional: --sp=START_PAGE, and --ep=END_PAGE, --start_date, --end_date)
8 - Download new illust from bookmark
(optional: --sp=START_PAGE, and --ep=END_PAGE)
9 - Download by Title/Caption
(required: title/caption
optional: --sp=START_PAGE, and --ep=END_PAGE, --start_date, --end_date)
10 - Download by Tag and Member Id
(required: member_id, followed by tags
optional: --sp=START_PAGE, and --ep=END_PAGE)
11 - Download Member's Bookmarked Images
(required: followed by member_ids separated by space)
12 - Download by Group ID
(required: Group ID, limit, and process external[y/n])
13 - Download by Manga Series ID
(required: Manga Series ID separated by space
optional: --sp=START_PAGE, and --ep=END_PAGE)
f1 - Download from supported artists (FANBOX)
(optional: End Page)
f2 - Download by artist/creator id (FANBOX)
(required: artist(digits only)/creator ids separated by space,
optional: end page)
f3 - Download by post id (FANBOX)
(required: post ids, separated with space)
f4 - Download from followed artists (FANBOX)
(optional: End Page)
f5 - Download from custom artist list (FANBOX)
(optional: End page, path to list)
b - Batch Download from batch_job.json (experimental)
(optional: --bf=BATCH_FILE)
l - Export local database image_id/post_id
(required: --up=USE_PIXIV, and --uf=USE_FANBOX, and --us=USE_SKETCH)
e - Export online bookmark
(required: -p BOOKMARK_FLAG [y/n/o] for private bookmark,
optional: --ef=EXPORT_FILENAME)
m - Export online user bookmark
(required: member_id, optional: --ef=EXPORT_FILENAME)
d - Manage database
-x, --exitwhendone Exit programm when done.
(only useful when DB-Manager)
-i, --irfanview start IrfanView after downloading images using
downloaded_on_%date%.txt
-n NUMBEROFPAGES, --numberofpages=NUMBEROFPAGES
temporarily overwrites numberOfPage set in config.ini
-c [PATH], --config [PATH] provide different config.ini
```
# Error Codes
- 100 = Not Logged in.
- 1001 = User ID not exist/deleted.
- 1002 = User Account is Suspended.
- 1003 = Unknown Member Error.
- 1004 = No image found.
- 1005 = Cannot login.
- 2001 = Unknown Error in Image Page.
- 2002 = Not in MyPick List, Need Permission.
- 2003 = Public works can not be viewed by the appropriate level.
- 2004 = Image not found/already deleted.
- 2005 = Image is disabled for under 18, check your setting page (R-18/R-18G).
- 2006 = Unknown Image Error.
- 9000 = Download Failed.
- 9001 = Download Failed: Harddisk related.
- 9002 = Download Failed: Network related.
- 9005 = Server Error.
# config.ini
## [Authentication]
- username
  Your pixiv username. Needed for OAuth. Please make sure the combination of username and password is valid in case of OAuth error. If you get error 103, please try changing username from pixiv ID to email address or the other way around.
- password
  Your pixiv password, in clear text! Needed for OAuth. Please make sure the combination of username and password is valid in case of OAuth error.
- cookie
  Your cookies for pixiv login, will be automatically updated in the login. See https://github.com/Nandaka/PixivUtil2/issues/814#issuecomment-711182644 for details.
- cookieFanbox
  Cookie for fanbox.cc, normally no need to fill in.
- refresh_token
  Used for OAuth refresh token to avoid logging in again too many times. Automatically generated upon successful OAuth login.
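As a rough illustration of how these keys sit in the file, the sketch below reads an `[Authentication]` section with Python's standard `configparser`. The sample values and the helper name `read_authentication` are hypothetical, and this is not necessarily how PixivUtil2 itself loads its configuration.

```python
import configparser

# Hypothetical sample content; a real config.ini has many more sections and keys.
SAMPLE = """
[Authentication]
username = user@example.com
password = secret
cookie = 0123456789abcdef
cookieFanbox =
refresh_token =
"""


def read_authentication(text):
    """Return the [Authentication] section as a plain dict."""
    # Interpolation is disabled so that a literal '%' in a value is kept as-is.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return dict(parser["Authentication"])


auth = read_authentication(SAMPLE)
# configparser lower-cases option names by default, so cookieFanbox
# is available under the key "cookiefanbox".
print(auth["username"])
print(repr(auth["cookiefanbox"]))
```

Empty values such as `cookieFanbox =` come back as empty strings, which matches the "normally no need to fill in" note above.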
## [Pixiv]
- numberofpage
  Number of pages to be processed, put `0` to process all pages.
- r18mode
  Only list images tagged R18, for member, member's bookmark, and search by tag. Set to `True` to enable.
- r18Type
  Allow filtering for R-18 type (R-18 or R-18G).
  Set `r18Type` with value `0` = both R-18 and R-18G, `1` = only R-18, or `2` = only R-18G.
- dateformat
  Pixiv DateTime format, leave blank to use default format (YYYY-MM-DD).
  Refer to http://strftime.org/ for syntax. Quick Reference:
  - %d = Day, %m = Month, %Y = Year (4 digit)
  - %H = Hour (24h), %M = Minute, %S = Seconds
- autoAddMember
  Automatically save member id to db for all downloads.
- autoAddTag
  Automatically add image tags to db for all downloads.
- autoAddCaption
  Automatically save captions to db for all downloads.
- aiDisplayFewer
  If true, filter out AI-generated images from downloading.
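The `dateformat` codes above are standard `strftime` directives. A minimal sketch of how such a format string behaves in Python follows; the helper name `format_works_date` and the fallback logic are assumptions for illustration, not PixivUtil2's own code.

```python
from datetime import datetime


def format_works_date(when, dateformat=""):
    """Format a datetime with the configured pattern."""
    # Assumption: a blank dateformat falls back to YYYY-MM-DD, as described above.
    return when.strftime(dateformat or "%Y-%m-%d")


sample = datetime(2024, 11, 2, 23, 30, 32)
print(format_works_date(sample))                       # 2024-11-02
print(format_works_date(sample, "%d/%m/%Y %H:%M:%S"))  # 02/11/2024 23:30:32
```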
## [FANBOX]
- filenameFormatFanboxContent
  Similar to filename format, but for files inside FANBOX posts.
- filenameFormatFanboxCover
  Similar to filename format, but for FANBOX post cover images.
- filenameFormatFanboxInfo
  Similar to filename format, but for info dumps.
- writeHtml
  A switch to decide whether to write FANBOX posts into HTMLs or not.
  - If set to `True`, article type posts will for sure be written into HTMLs, while non-article type posts are controlled with `minTextLengthForNonArticle` and `minImageCountForNonArticle`.
  - If set to `False`, no post will be written into HTMLs.
  - `filenameFormatFanboxInfo` will be used for filename.
  - For HTML format, please refer to 'HTML Format' section
- minTextLengthForNonArticle
  Works with `minImageCountForNonArticle`.
  When `writeHtml` is True, a non-article post should contain text longer than this value to be written into HTML.
- minImageCountForNonArticle
  Works with `minTextLengthForNonArticle`.
  When `writeHtml` is True, a non-article post should contain at least this many files/images to be written into HTML.
- useAbsolutePathsInHtml
  Set to `True` to use absolute paths in HTMLs.
  Set to `False` to use relative paths.
- downloadCoverWhenRestricted
  Set to `True` to download FANBOX post cover images even if they are restricted.
- checkDBProcessHistory
  Each FANBOX post has a updated_date value, which will be recorded/updated in database after it is processed.
  - When this is `True`, the values in database would be checked when processing each post. If record is no earlier than the newly retrieved date, which means that the post has not been processed at all or changed since last time, the post would be skipped.
  - When this is `False`, posts will be processed anyways.
- listPathFanbox
  The list file for fanbox creators. One creator per line.
  Doesn't support custom path.
## [Network]
- useproxy
  Set `True` to use proxy server, or `False` to disable it.
- proxyaddress
  Proxy server address, use this format:
  - `http://<username>:<password>@<host>:<port>` or
  - `socks5://<username>:<password>@<host>:<port>` or
  - `socks4://<username>:<password>@<host>:<port>`
- useragent
  Browser user agent to spoof. You can check it from https://www.whatismybrowser.com/detect/what-is-my-user-agent
- userobots
  Download robots.txt for mechanize.
- timeout
  Time to wait before giving up the connection, in seconds.
- retry
  Number of retries.
- retrywait
  Waiting time for each retry, in seconds.
- downloadDelay
  Set random delay up to n seconds for each image post.
  Set to 0 to disable.
- checkNewVersion
  Set to `True` to check new releases in github.
- notifyBetaVersion
  Set to `False` to ignore beta releases.
- openNewVersion
  Set to `False` to disable opening new releases in browser.
- enableSSLVerification
  Enable SSL verification; only set to `False` if you always encounter SSL errors (this disables the security check).
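To make the interplay of `retry`, `retrywait`, and `downloadDelay` concrete, here is a hedged sketch of a retry loop with a random pause. The function names are hypothetical, and the reading of `retry` as "extra attempts after the first failure" is an assumption; the actual downloader may count attempts differently.

```python
import random
import time


def download_with_retry(fetch, retry=3, retrywait=5, sleep=time.sleep):
    """Call fetch(); on a network error, wait retrywait seconds and try again.

    Assumption: `retry` counts the extra attempts after the first failure.
    """
    last_error = None
    for attempt in range(retry + 1):
        try:
            return fetch()
        except OSError as err:  # timeouts and urllib's URLError derive from OSError
            last_error = err
            if attempt < retry:
                sleep(retrywait)
    raise last_error


def random_delay(download_delay):
    """Pick a pause between 0 and download_delay seconds; 0 disables the delay."""
    return random.uniform(0, download_delay) if download_delay > 0 else 0.0
```

Passing `sleep` as a parameter keeps the loop easy to exercise without actually waiting.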
## [Debug]
- logLevel
  Set log level, valid values are CRITICAL, ERROR, WARNING, INFO, DEBUG, and NOTSET.
- enableDump
  Enable HTML Dump. Set to False to disable.
- skipDumpFilter
  Skip HTML Dump based on error code (using regex format).
  E.g.: 1.*|2.* => skip all HTML dump for error code 1xxx/2xxx.
- dumpMediumPage
  Dump all medium page for debugging. Set to True to enable.
- dumpTagSearchPage
  Dump tags search page for debugging.
- debughttp
  Print http header, useful for debugging. Set to `False` to disable.
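The `skipDumpFilter` example can be tried out with Python's `re` module. This sketch assumes the pattern is matched from the start of the decimal error code; the helper name `should_skip_dump` is hypothetical and the real matching logic may differ.

```python
import re


def should_skip_dump(error_code, skip_dump_filter):
    """Return True when the error code matches the configured regex."""
    if not skip_dump_filter:
        return False  # an empty filter skips nothing
    # Assumption: the pattern is anchored at the start of the code (re.match).
    return re.match(skip_dump_filter, str(error_code)) is not None


print(should_skip_dump(1001, "1.*|2.*"))  # True
print(should_skip_dump(2004, "1.*|2.*"))  # True
print(should_skip_dump(9002, "1.*|2.*"))  # False
```

Under this assumption the pattern `1.*` also matches the three-digit code 100, so a stricter pattern such as `1\d{3}|2\d{3}` would be needed to target only 1xxx/2xxx.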
## [IrfanView]
- IrfanViewPath
  Set directory where IrfanView is installed (needed to start IrfanView)
- startIrfanView
  Set to `True` to start IrfanView with downloaded images when exiting pixivUtil
  - This will create download-lists
  - Be sure to set IrfanView to load Unicode-Plugin on startup when there are unicode-named files!
- startIrfanSlide
  Set to `True` to start IrfanView-Slideshow with downloaded images when exiting pixivUtil.
  - This will create download-lists
  - Be sure to set IrfanView to load Unicode-Plugin on startup when there are unicode-named files!
  - Slideshow-options will be same as you have set in IrfanView before!
- createDownloadLists
  Set to `True` to automatically create download-lists.
## [Settings]
- downloadlistdirectory
  list.txt path, also used for download-lists needed for `createDownloadLists` and IrfanView-Handling.
  If left blank it will create download-lists in pixivUtil-directory.
- uselist
  Set to `True` to parse list.txt.
  This will update the DB content from the list.txt (member_id and custom folder).
- processfromdb
  Set `True` to use the member_id from the DB.
- rootdirectory
  Your root directory for saving the images.
- downloadavatar
  Set to `True` to download the member avatar as 'folder.jpg'
- usesuppresstags
  Remove the suppressed tags from %tags% meta for filename.
  The list is taken from suppress_tags.txt, each tag is separated by new line.
- tagsLimit
  Number of tags to be used for %tags% meta in filename.
  Use -1 to use all tags.
- writeImageJSON
  Set to `True` to export the compact image information to JSON file.
  The filename is following `filename(Manga)Infoformat` + .json.
  If you want the original info from source, use with `writeRawJSON`.
- writeimageinfo
  Set to `True` to export the compact image information to text file.
  The filename is following `filename(Manga)Infoformat` + .txt.
  If you want the original info from source, use with `writeRawJSON`.
- writeRawJSON
  Set to `True` to export the original JSON untouched of the image for `writeImageJSON`.
- RawJSONFilter
  Enter the JSON keys which you want to filter out for `writeRawJSON`. Keys are separated by a comma.
- includeSeriesJSON
  Set to `True` to export the series information to JSON. Non-series artwork doesn't have this info.
  The filename is following `filenameSeriesJSON` + .json.
- writeImageXMP
  Set to `True` to export the image information to a .XMP sidecar file, this does not add XMP metadata to the image header.
- writeImageXMPPerImage
  Set to `True` to export the image information to a .XMP sidecar file, one per image in the album. The data contained within the file is the same but some software requires matching file names to detect the metadata. If set to `True`, then `writeImageXMP` is ignored.
  Additionally, enabling this option will create a .XMP sidecar for every ugoira encoding enabled, and allow you to customise the name of each file using `%image_ext%`. For example, if you enable `createWebp` and `createGif`, then set your `filenameInfoFormat` to something like `%urlFilename%.%image_ext%`, then you will end up with `.gif.xmp` and `.webp.xmp` files created.
- verifyimage
  Check if downloaded files are valid image or zip. Set the value to `True` to enable.
- writeUrlInDescription
  Write all url found in the image description to a text file at the root directory. Set to `True` to enable. The list will be saved to the application folder as url_list_.txt
- stripHTMLTagsFromCaption
  Remove all HTML tags and their contents from the image caption/description when writing metadata to files. The contents of any links will be lost, so consider enabling writeUrlInDescription to retain them.
- urlBlacklistRegex
  Used to filter out the url in the description using regular expression.
- dbPath
  Use different database.
- setLastModified
  Set last modified timestamp based on pixiv upload timestamp to the file.
- useLocalTimezone
  Use local timezone in the .txt file of `writeimageinfo` and .XMP file of `writeImageXMP`.
- defaultSketchOption
  Skip the "Include Pixiv Sketch" prompt when downloading by `member_id` option by using a default option. Set the value to `y` to always include sketches or `n` to exclude sketches from the download.
## [DownloadControl]
- minFileSize
  Skip if file size is less than minFileSize, set `0` to disable.
- maxFileSize
  Skip if file size is more than maxFileSize, set `0` to disable.
- checkLastModified
  If the last-modified timestamp of the local file is the same as the upload date of the artwork, it'll log "match" and skip processing the current image_id.
  Requires `setlastmodified = True` in config.ini to work properly.
- alwaysCheckFileSize
  The file size is always checked. However, if this option and `overwrite` are both false and the file is already recorded in the db, processing of the current image_id is skipped.
  This will override the image_id checking from db (always fetch the image page to check the remote size).
- overwrite
  If true and the file size is found to be different, the file is deleted (unless `backupOldFile` is true) and the image is re-downloaded.
- backupOldFile
  Set to True to backup old file if the file size is different.
  Old filename will be renamed to filename.unix-time.extension.
- daylastupdated
  Only process member_ids that were last checked at least x days ago.
- checkUpdatedLimit
  Jump to the next member id after seeing n previously downloaded images.
  `alwaysCheckFileSize` must be set to False.
- useblacklisttags
  Skip image if containing blacklisted tags.
  The list is taken from `blacklist_tags.txt`, each tag is separated by new line.
- useblacklisttitles
  Skip image if the title contains a blacklisted character sequence.
  The list is taken from `blacklist_titles.txt`, each sequence is separated by new line.
- useblacklisttitlesregex
  Make the title blacklist check interpret each sequence as a regular expression.
- dateDiff
  Process only new images within the given date difference.
  Set `0` to disable. Skip to next member id if in 'Download by Member', stop processing if in 'Download New Illust' mode.
- enableInfiniteLoop
  Enable infinite loop for download by tags.
  Only applicable for download in descending order (newest first).
- useBlacklistMembers
  Skip image by member id based on `blacklist_members.txt` in the same folder of the application.
- downloadResized
  Download the medium size, rather than the original size.
- skipUnknownSize
  Skip downloading if the remote size is not known when `alwaysCheckFileSize` is set to True.
- enablePostProcessing
  If true, enable the post-processing command for every downloaded file. Default: False.
- postProcessingCmd
  Command to execute. Add %filename% to pass the downloaded filename.
  **NO ERROR HANDLING AT ALL, use at your own risk.**
- extensionFilter
  Provide a | separated list of acceptable file extensions to download. E.g. jpg|png|gif|ugoira
- downloadBuffer
  Download buffer size, in kilobytes, held before writing to disk; default is 512kB.
  You can change it based on your download speed. Mainly useful for smoother progress bar.
  Usually no need to change this value.
## [FFmpeg]
- ffmpeg
  ffmpeg executable path.
- ffmpegcodec
  Codec to be used for encoding, default is using `libvpx-vp9`.
- ffmpegExt
  The file extension (container format) to use for encoding. default: `webm`.
- ffmpegparam
  Parameter to be used to encode webm, default: `-lossless 0 -crf 15 -b 0 -vsync 0`.
- mkvcodec
  Codec to be used for encoding mkv, default is using `copy`.
- mkvparam
  Parameter to be used to encode mkv, default: ` `.
- avifcodec
  Codec to be used for encoding avif, default is using `libaom-av1`.
- avifparam
  Parameter to be used to encode avif, default: `-cpu-used 4 -crf 0 -row-mt 1 -tile-columns 2 -tile-rows 2 -vsync 0`.
- webpcodec
  Codec to be used for encoding webp, default is using `libwebp`.
- webpparam
  Parameter to be used to encode webp, default: `-lossless 0 -compression_level 5 -quality 100 -loop 0 -vsync 0`.
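For orientation only, the sketch below shows one way a codec and a parameter string like the defaults above could be combined into an ffmpeg argument list. The function name, the input pattern `frame_%06d.jpg`, and the overall command layout are assumptions; the command PixivUtil2 actually runs may be structured differently.

```python
import shlex


def build_encode_command(ffmpeg, codec, param, input_path, output_path):
    """Assemble an argument list suitable for subprocess.run()."""
    # Assumption: the codec goes to -c:v and the parameter string is split
    # shell-style into separate arguments.
    return [ffmpeg, "-i", input_path, "-c:v", codec, *shlex.split(param), output_path]


cmd = build_encode_command(
    "ffmpeg",                              # value of the `ffmpeg` setting
    "libvpx-vp9",                          # default ffmpegcodec
    "-lossless 0 -crf 15 -b 0 -vsync 0",   # default ffmpegparam
    "frame_%06d.jpg",                      # hypothetical input frame pattern
    "output.webm",                         # hypothetical output name using ffmpegExt
)
print(cmd)
```

Building a list rather than a single shell string avoids quoting problems when paths contain spaces.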
## [Ugoira]
- writeugoirainfo
  If set to `True`, it will write the info of ugoira frames to a `filename(Manga)Infoformat`+.zip.js file. `writeImageJSON` contains this info as well.
- createugoira
  If set to `True`, it will create .ugoira file.
  This is Pixiv's own format for animated images. You can use Honeyview to see the animation.
- createmkv
  Set to True to create mkv file (video format). The default setting is lossless (no encoding); it will pack the images in the container. Very large file size.
  Requires `createUgoira = True` and the ffmpeg executable.
- createwebm
  Set to True to create webm file (video format). The default encoding setting is lossy but high quality with the smallest file size.
  Requires `createUgoira = True` and the ffmpeg executable.
- createwebp
  Set to True to create webp file (image format). The default encoding setting is lossy but high quality with smaller file size.
  Requires `createUgoira = True` and the ffmpeg executable.
- creategif
  Set to True to convert ugoira file to gif. The default encoding setting is lossy with moderate quality and smaller file size.
  Requires `createUgoira = True` and the ffmpeg executable.
- createapng
  Set to True to convert ugoira file to animated png. The default encoding setting is lossless but gives a very large file size.
  Requires `createUgoira = True` and the ffmpeg executable.
- createavif
  Set to True to convert ugoira file to avif. The default encoding setting is lossless with file sizes comparable to webp.
  Requires `createUgoira = True` and the ffmpeg executable.
- deleteugoira
  Set to True to delete the created .ugoira after conversion.
- deleteZipFile
  If set to `True`, it will delete the original .zip (i.e. the actual image) file.
  Only active if `createUgoira = True`.
## [Filename]
- filenameformat
  The format for the filename. Reserved/illegal characters will be replaced with underscore '_', and repeated spaces will be trimmed to a single space. The filename (+full path) will be trimmed to the first 250 characters (Windows limitation).
  Refer to Filename Format Syntax for available format.
- filenamemangaformat
  Similar to filename format, but for manga pages.
- filenameinfoformat
  Similar to filename format, but for info dumps.
- filenameSeriesJSON
  Similar to filename format, but for series JSON dumps.
- avatarNameFormat
  Similar to filename format, but for the avatar image.
  Not all formats are available.
- backgroundNameFormat
  Similar to filename format, but for the background image.
  Not all formats are available.
- tagsseparator
  Separator for each tag in filename, put %space% for space and %ideo_space% for ideographic space (" ").
- createmangadir
  Create a directory if the imageMode is manga. The directory is created by splitting the image_id by '_pxx' pattern.
  This setting depends on the %urlFilename% format.
- usetagsasdir
  Append the query tags in tagslist.txt to the root directory as save folder.
- urlDumpFilename
  Define the dump filename, use python strftime() format.
  Default value is 'url_list_%Y%m%d'
- filenameFormatSketch
  Similar to filename format, but for Pixiv Sketch.
- customBadChars
  For sanitizing filenames with custom rules. Supports regular expressions.
  For detailed syntax, please refer to 'Bad chars' section.
- customCleanUpRe
  TODO.
# Filename Format Syntax
Available for filenameFormat, filenameMangaFormat, avatarNameFormat, filenameInfoFormat,
filenameFormatFanboxCover, filenameFormatFanboxContent and filenameFormatFanboxInfo:
```
-> %member_token%
Member token, might change.
-> %member_id%
Member id, in number.
-> %artist%
Artist name, might change too.
-> %urlFilename%
The actual filename stored in server without the file extensions.
-> %date%
Current date in YYYYMMDD format.
-> %date_fmt{format}%
Current date using custom format.
Use Python string format notation, refer: https://goo.gl/3UiMAb
e.g. %date_fmt{%Y-%m-%d}%
-> %image_ext%
The image's file extension (jpg, png, etc.), the "." is not included.
The correct file extension is already appended to the end of all files.
This is available if you want to add more, or want to add the image's file extension to info files etc.
```
Available for filenameFormat and filenameMangaFormat:
```
-> %image_id%
Image id, in number. (Post id for FANBOX and sketches)
-> %title%
Image title, usually in japanese character.
-> %tags%
Image tags, usually in japanese character. (not implemented for FANBOX yet)
-> %works_date%
Works date, complete with time.
-> %works_date_only%
Only the works date.
-> %works_date_fmt{format}%
works date using custom format.
Use Python string format notation, refer: https://goo.gl/3UiMAb
e.g. %works_date_fmt{%Y-%m-%d}%
-> %works_res%
Image resolution, will be containing the page count if manga.
-> %works_tools%
Tools used for the image.
-> %R-18%
Append R-18/R-18G based on image tag, can be used for creating directory
by appending directory separator, e.g.: %R-18%\%image_id%.
-> %page_big%
for manga mode, add big in the filename.
-> %page_index%
for manga mode, add page number with 0-index. It will auto-pad with 0 based on the total count.
-> %page_number%
for manga mode, add page number with 1-index. It will auto-pad with 0 based on the total count.
-> %bookmark%
for bookmark mode, add 'Bookmarks' string.
-> %original_member_id%
for bookmark mode, put original member id.
-> %original_member_token%
for bookmark mode, put original member token.
-> %original_artist%
for bookmark mode, put original artist name.
-> %searchTags%
for download by tags and bookmarked images, put searched tags.
-> %bookmark_count%
Bookmark count, will have overhead except on download by tags.
-> %image_response_count%
Image response count, will have overhead except on download by tags.
-> %manga_series_order%
the order in the manga series.
-> %manga_series_id%
original manga series id.
-> %manga_series_title%
original manga series title, different from work title.
-> %AI%
Add 'AI' for AI-generated images (aiType==2).
```
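The zero-padding described for `%page_index%` and `%page_number%` can be pictured as follows. This is a sketch only: the helper `pad_page` is hypothetical, and the rule that the width equals the number of digits in the total page count is an assumption, not something the text above confirms.

```python
def pad_page(page: int, total: int) -> str:
    """Zero-pad `page` to as many digits as `total` has (assumed rule)."""
    width = len(str(total))
    return str(page).zfill(width)


total_pages = 12
# 0-based numbering, as with %page_index%
print([pad_page(i, total_pages) for i in range(3)])      # ['00', '01', '02']
# 1-based numbering, as with %page_number%
print([pad_page(i + 1, total_pages) for i in range(3)])  # ['01', '02', '03']
```

Padding of this kind keeps the pages of a manga in order when files are sorted by name.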
Specific for PixivSketch (option 1 if PixivSketch included, s1, and s2):
```
-> %sketch_member_id%
Pixiv Sketch artist id, might be different from Pixiv's artist id.
```
Specific for Fanbox:
```
-> %fanbox_name%
Fanbox name, might be different from Pixiv's artist name.
Useful if the artist is suspended from Pixiv and there is no record in the DB to avoid interruption.
```
# list.txt Format
- This file should be built in the following way; white space will be trimmed.
See the example:
```
member_id1 directory1
member_id2 directory2
...
#comment - lines starting with # will be ignored
```
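The line layout above can be read with a few lines of Python. This is a minimal sketch, not PixivUtil2's own parser: the function name `parse_list_lines` is hypothetical, and URL-style entries, which the file also accepts, are simply skipped here.

```python
from typing import Iterable, List, Optional, Tuple


def parse_list_lines(lines: Iterable[str]) -> List[Tuple[int, Optional[str]]]:
    """Return (member_id, directory) pairs; directory is None when omitted."""
    entries = []
    for raw in lines:
        line = raw.strip()                    # surrounding white space is trimmed
        if not line or line.startswith("#"):  # skip blank lines and comments
            continue
        parts = line.split(None, 1)           # member id, then the rest of the line
        if not parts[0].isdigit():            # URL-style entries are not handled here
            continue
        directory = parts[1].strip().strip('"') if len(parts) > 1 else None
        entries.append((int(parts[0]), directory))
    return entries


sample = [
    "# a comment",
    "123456",
    r"123456 .\test",
    r'123456 "F:\new Folder\test"',
]
print(parse_list_lines(sample))
```

Splitting only on the first run of white space keeps directory names that contain spaces intact, whether or not they are quoted.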
- member_id = in number only
- directory = path to download-directory for member_id
- %root%\directory will save the folder in the rootFolder specified in config.ini
- \directory will save the folder in the root of your PixivUtil-drive
- C:\directory will save the folder in drive C: (change to any other
drive as you wish)
- .\directory will save the folder in the same directory as PixivUtil2.exe
- directory-path can end with \ or not
- Examples for list:
```
### START EXAMPLE LIST####
# this is a comment line, lines starting with # will be ignored
# here is the first member:
123456
# you can see, the line has only the member id
# usually I use it the following way:
#
# username (so I can recognize it ;) )
123456
#
# next 2 lines contain a special folder for this member
123456 .\test
123456 ".\test"
# now all images from member no. 123456 will be saved in directory "test" in the
# same directory as PixivUtil2
# as you can see you can use it with "" or without ;)
#
# next will be stored at the same partition as PixivUtil, but the directory is
# located in root-part of it
123456 \test
123456 "\test"
# this will lead to "C:\test" when pixivUtil is located on "C:\"
#
# next line uses complete path to store the files
123456 F:\new Folder\test
123456 "F:\new Folder\test"
# this will set the folder everywhere on your partitions
#
123456 %root%\special folder
123456 "%root%\special folder"
# this will set the download location to "special folder" in your rootDirectory
# given in config
http://www.pixiv.net/member.php?id=123456
http://www.pixiv.net/member_illust.php?id=123456
# also support url format.
### END EXAMPLE LIST####
```
# tags.txt Format
- This file will be used as source for Download from tags list (7)
- Separate tags with spaces, and make sure to set Use Wildcard to 'y'.
- Each line will be treated as one search.
- Save the files with UTF-8 encoding.
# suppress_tags.txt Format
- This file is used for suppressing the tags from being used in %tags%.
- If matches, the tags will be removed from filename.
- Each line is one tag only.
- Save the files with UTF-8 encoding.
# blacklist_tags.txt Format
- This file is used for tag blacklist checking for downloading image.
- If matches, the image will be skipped.
- Each line is one tag only.
- Save the files with UTF-8 encoding.
# blacklist_members.txt Format
- Similar to list.txt, but without custom folder.
# HTML Format
- A simple default format will be used when no 'template.html' is provided.
- URLs originally in the post will be overwritten with local paths.
- Currently available syntaxes are:
```
-> %coverImage%
A 'div' tag with its 'class' set to 'cover', and a child 'img' tag with
the url to the cover image as its 'src' attribute.
-> %coverImageUrl%
Simply the url to the cover image in clear text.
-> %artistName%
Same as %artist% in 'Filename Format Syntax' in clear text.
-> %imageTitle%
Title of the post in clear text.
-> %worksDate%
Published date of the post in clear text.
-> %body_text(article)%
This works for article type posts only.
A 'div' tag with its 'class' set to 'article', and the post's content,
which is already formatted HTML if the post is article, as its inner text.
-> %images(non-article)%
This works for non-article type posts only.
A 'div' tag with its 'class' set to 'non-article images', and 'a' tags
of all files in the post as its children tokens.
For each 'a' tag, its 'href' would be url to the file, and the inner text
would be an 'img' tag with its 'src' set to the url to the file if the
file's extension is 'jpg', 'jpeg', 'png' or 'bmp'. Otherwise the inner text
would simply be the url to the file.
-> %text(non-article)%
This works for non-article type posts only.
A 'div' tag with its 'class' set to 'non-article text' and all paragraphs
of text put in 'p' tags as its children tokens.
```
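The clear-text placeholders above behave like simple string substitutions. The following sketch illustrates the idea only; `fill_template` and the sample values are hypothetical, and PixivUtil2's real HTML writer may differ, for example in how it escapes text or builds the `div` blocks.

```python
def fill_template(template: str, values: dict) -> str:
    """Replace each %name% placeholder with its value; unknown ones are left as-is."""
    for name, value in values.items():
        template = template.replace(f"%{name}%", value)
    return template


template = "<h1>%imageTitle%</h1><p>%artistName%, %worksDate%</p>%coverImage%"
values = {
    "imageTitle": "Sample post",
    "artistName": "Sample artist",
    "worksDate": "2024-11-02",
}
# %coverImage% has no value in this sketch, so it stays in the output unchanged.
print(fill_template(template, values))
```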
- If there is a 'div' tag with 'main' in its 'class' in the template, 'article' or
'non-article' would be appended to its 'class' depending on the type of the post.
# Bad chars
- Originally for removing single bad chars for use between different OSs.
- Now also supports strings and regular expressions.
- The value set in option `customBadChars` would be parsed from left to right.
- Currently available syntaxes are:
```
-> %replace(your_default_replace_with)%
Use this syntax to define default value to replace with.
If this syntax gets used multiple times in the option value, the first value would be used.
If this value is not set, "_" would be used.
-> %pattern<group_name>(your_pattern)%
-> %replace<group_name>(your_replace_with)%
Use these two syntaxes to set groups of rules. Supports regular expression.
You should not use "default" as group names, otherwise the first replace would
be parsed as default value to replace with, while the others would be ignored.
Groups with no "pattern" would be ignored.
Groups with no "replace" use default value.
If multiple "pattern"s or "replace"s share the same group name, the last value set
would be used.
```
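The effect of ordered pattern/replace groups can be illustrated with plain `re.sub` calls. This is a sketch of the described behaviour, not PixivUtil2's parser: the `rules` list and the `apply_rules` helper are hypothetical, and it assumes that groups are applied one after another in order, each falling back to the default replacement when it has no replace value.

```python
import re
from typing import List, Optional, Tuple

DEFAULT_REPLACEMENT = "_"  # documented fallback when no default is set


def apply_rules(name: str, rules: List[Tuple[str, Optional[str]]]) -> str:
    """Apply (pattern, replacement) rules in order; None means use the default."""
    for pattern, replacement in rules:
        substitute = DEFAULT_REPLACEMENT if replacement is None else replacement
        name = re.sub(pattern, substitute, name)
    return name


rules = [
    (r"maze", "labyrinth"),       # first group
    (r"labyrinth", "nevermind"),  # second group sees the first group's output
    (r"[\[\]@]", None),           # no replacement given: default "_" is used
    (r"_+", "_"),                 # collapse runs of "_" into a single one
]
print(apply_rules("maze [@] runner", rules))  # nevermind _ runner
```

Because each rule works on the output of the previous one, the order of the groups matters: a later pattern can match text that an earlier replacement introduced.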
- Chars/string not wrapped with syntaxes above would be considered single chars
to be replaced with global replacement char/string, "_" if unset.
- When configuration file gets written to file, `customBadChars` would be
replaced with parsed valid value. Single chars would be placed first, followed by
`%replace(your_default_replace_with)%`, and each group.
- Examples:
```
# If you just want to replace some single chars with "_"
\@[]
# If you want to replace them with "@":
\@[]%replace(@)%
# If you want to replace certain words:
# This example would first replace all "maze" with "labyrinth",
# then all "labyrinth" with "nevermind"
%pattern<1>(maze)%%replace<1>(labyrinth)%%pattern<2>(labyrinth)%%replace<2>(nevermind)%
# If you want to replace characters within certain unicode range,
# then remove all continuous "_"s with a single "_":
%pattern([\U0001d400-\U0001ffff])%%pattern<1>(_+)%%replace<1>(_)%
```
# Development
PixivUtil2 has a robust test suite. To run it, install pytest first:
```
pip install --user pytest
pytest -v ./test_*
```
# Credits/Contributor
- Nandaka (Main Developer) - https://nandaka.devnull.zone
- Yavos (Contributor)
- Joe (Contributor)
- Hamuko
- Kwang Ketcham
- woky
- a.evseev
- pixtrix
- Abram Wiebe
- Masaki Takano
- hi117
- Wildfoot
- J.Gocke
- Magnus Boman
- Abdulah Jasim
- Yifei Fu
- nixxquality
- DukeValentine
- NHOrus
- whinette
- yzaoui
- Kieri Suizahn
- amatuerCoder
- Alex
- wmjdgla
- fireattack
- Jared Shields
- DenDen047
- Baa

** If I forget someone, please send me a pull request with the commit/merge id.
# License Agreement
See LICENSE.

[![Run on Repl.it](https://repl.it/badge/github/Nandaka/PixivUtil2)](https://repl.it/github/Nandaka/PixivUtil2)