{"id":14396140,"url":"https://github.com/deepghs/waifuc","last_synced_at":"2026-01-17T09:55:01.361Z","repository":{"id":176612178,"uuid":"659060752","full_name":"deepghs/waifuc","owner":"deepghs","description":"Efficient Train Data Collector for Anime Waifu","archived":false,"fork":false,"pushed_at":"2024-08-24T08:05:00.000Z","size":75737,"stargazers_count":371,"open_issues_count":16,"forks_count":31,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-08-24T11:05:01.142Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://deepghs.github.io/waifuc/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/deepghs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-06-27T04:14:39.000Z","updated_at":"2025-08-21T20:06:03.000Z","dependencies_parsed_at":"2024-06-28T13:27:23.503Z","dependency_job_id":"8b2f7051-1fd3-4fbd-a867-bd0e779dfd40","html_url":"https://github.com/deepghs/waifuc","commit_stats":null,"previous_names":["deepghs/waifuc"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/deepghs/waifuc","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deepghs%2Fwaifuc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deepghs%2Fwaifuc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deepghs%2Fwaifuc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deepghs%2Fwaifuc/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/deepghs",
"download_url":"https://codeload.github.com/deepghs/waifuc/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deepghs%2Fwaifuc/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28505565,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-17T06:57:29.758Z","status":"ssl_error","status_checked_at":"2026-01-17T06:56:03.931Z","response_time":85,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-29T01:01:20.235Z","updated_at":"2026-01-17T09:55:01.344Z","avatar_url":"https://github.com/deepghs.png","language":"Python","funding_links":[],"categories":["Downloaders"],"sub_categories":[],"readme":"# waifuc\n\n[![PyPI](https://img.shields.io/pypi/v/waifuc)](https://pypi.org/project/waifuc/)\n![PyPI - Python Version](https://img.shields.io/pypi/pyversions/waifuc)\n![Loc](https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/narugo1992/847b3edfcbae29b86b8b5d8b3dfb854f/raw/loc.json)\n![Comments](https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/narugo1992/847b3edfcbae29b86b8b5d8b3dfb854f/raw/comments.json)\n\n[![Code Test](https://github.com/deepghs/waifuc/workflows/Code%20Test/badge.svg)](https://github.com/deepghs/waifuc/actions?query=workflow%3A%22Code+Test%22)\n[![Package 
Release](https://github.com/deepghs/waifuc/workflows/Package%20Release/badge.svg)](https://github.com/deepghs/waifuc/actions?query=workflow%3A%22Package+Release%22)\n[![codecov](https://codecov.io/gh/deepghs/waifuc/branch/main/graph/badge.svg?token=XJVDP4EFAT)](https://codecov.io/gh/deepghs/waifuc)\n\n[![Discord](https://img.shields.io/discord/1157587327879745558?style=social\u0026logo=discord\u0026link=https%3A%2F%2Fdiscord.gg%2FTwdHJ42N72)](https://discord.gg/TwdHJ42N72)\n![GitHub Org's stars](https://img.shields.io/github/stars/deepghs)\n[![GitHub stars](https://img.shields.io/github/stars/deepghs/waifuc)](https://github.com/deepghs/waifuc/stargazers)\n[![GitHub forks](https://img.shields.io/github/forks/deepghs/waifuc)](https://github.com/deepghs/waifuc/network)\n![GitHub commit activity](https://img.shields.io/github/commit-activity/m/deepghs/waifuc)\n[![GitHub issues](https://img.shields.io/github/issues/deepghs/waifuc)](https://github.com/deepghs/waifuc/issues)\n[![GitHub pulls](https://img.shields.io/github/issues-pr/deepghs/waifuc)](https://github.com/deepghs/waifuc/pulls)\n[![Contributors](https://img.shields.io/github/contributors/deepghs/waifuc)](https://github.com/deepghs/waifuc/graphs/contributors)\n[![GitHub license](https://img.shields.io/github/license/deepghs/waifuc)](https://github.com/deepghs/waifuc/blob/master/LICENSE)\n\nEfficient Train Data Collector for Anime Waifu.\n\n**This project is still under development; the official version will be released soon.**\n\nIf you need to use it immediately, just clone it and run `pip install .`.\n\n## Installation\n\nThe PyPI version is not ready yet; please install waifuc from source.\n\n```shell\npip install git+https://github.com/deepghs/waifuc.git@main#egg=waifuc\n```\n\nIf CUDA is available in your environment, you can use the following installation command to achieve higher performance\n\n```shell\npip install git+https://github.com/deepghs/waifuc.git@main#egg=waifuc[gpu]\n```\n\nIf you 
need to process videos, you can install waifuc with\n\n```shell\npip install git+https://github.com/deepghs/waifuc.git@main#egg=waifuc[video]\n```\n\nFor more information about installation, you can refer\nto [Installation](https://deepghs.github.io/waifuc/main/tutorials/installation/index.html).\n\n## An Example\n\n### Quickly Get Character's Dataset\n\nGrab a dataset of surtr (arknights) for LoRA training\n\n```python\nfrom waifuc.action import NoMonochromeAction, FilterSimilarAction, \\\n    TaggingAction, PaddingAlignAction, PersonSplitAction, FaceCountAction, FirstNSelectAction, \\\n    CCIPAction, ModeConvertAction, ClassFilterAction, RandomFilenameAction, AlignMinSizeAction\nfrom waifuc.export import TextualInversionExporter\nfrom waifuc.source import GcharAutoSource\n\nif __name__ == '__main__':\n    # data source for surtr in arknights, images from many sites will be crawled\n    # all supported games and sites can be found at\n    # https://narugo1992.github.io/gchar/main/best_practice/supported/index.html#supported-games-and-sites\n    # ATTENTION: GcharAutoSource requires `git+https://github.com/deepghs/waifuc.git@main#egg=waifuc[gchar]`\n    s = GcharAutoSource('surtr')\n\n    # crawl images, process them, and then save them to a directory in the given format\n    s.attach(\n        # preprocess images with white background RGB\n        ModeConvertAction('RGB', 'white'),\n\n        # pre-filtering for images\n        NoMonochromeAction(),  # no monochrome, greyscale or sketch\n        ClassFilterAction(['illustration', 'bangumi']),  # no comic or 3d\n        # RatingFilterAction(['safe', 'r15']),  # filter images with rating, like safe, r15, r18\n        FilterSimilarAction('all'),  # filter duplicated images\n\n        # human processing\n        FaceCountAction(1),  # drop images with 0 or \u003e1 faces\n        PersonSplitAction(),  # crop for each person\n        FaceCountAction(1),\n\n        # CCIP: filter out characters you may not want to see in the 
dataset\n        CCIPAction(min_val_count=15),\n\n        # if min(height, width) \u003e 800, resize it to 800\n        AlignMinSizeAction(800),\n\n        # tagging with wd14 v2; if you don't need character tags, set character_threshold=1.01\n        TaggingAction(force=True),\n\n        PaddingAlignAction((512, 512)),  # align to 512x512\n        FilterSimilarAction('all'),  # filter again\n        FirstNSelectAction(200),  # first 200 images\n        # MirrorAction(),  # mirror image for data augmentation\n        RandomFilenameAction(ext='.png'),  # randomly rename files\n    ).export(\n        # save to surtr_dataset directory\n        TextualInversionExporter('surtr_dataset')\n    )\n\n```\n\n### Quickly Crawl Images from Websites\n\nThe following code will give you 10 images of surtr (arknights) with metadata saved.\n\n```python\nfrom waifuc.action import HeadCountAction, AlignMinSizeAction\nfrom waifuc.export import SaveExporter\nfrom waifuc.source import DanbooruSource\n\nif __name__ == '__main__':\n    source = DanbooruSource(['surtr_(arknights)', 'solo'])\n    source.attach(\n        # only 1 head,\n        HeadCountAction(1),\n\n        # if shorter side is over 640, just resize it to 640\n        AlignMinSizeAction(640),\n    )[:10].export(  # only first 10 images\n        # save images (with meta information from danbooru site)\n        SaveExporter('/data/surtr_arknights')\n    )\n\n```\n\nAnd this is what's in `/data/surtr_arknights` afterwards\n\n![img.png](assets/danbooru_crawler_example.png)\n\nSimilarly, you can crawl from pixiv just by changing the source\n\n```python\nfrom waifuc.action import HeadCountAction, AlignMinSizeAction, CCIPAction\nfrom waifuc.export import SaveExporter\nfrom waifuc.source import PixivSearchSource\n\nif __name__ == '__main__':\n    source = PixivSearchSource(\n        'アークナイツ (surtr OR スルト OR 史尔特尔)',\n        refresh_token='use_your_own_refresh_token'\n    )\n    source.attach(\n        # only 1 
head,\n        HeadCountAction(1),\n\n        # pixiv results often have irrelevant characters mixed in,\n        # so CCIPAction is necessary here to drop these images\n        CCIPAction(),\n\n        # if shorter side is over 640, just resize it to 640\n        AlignMinSizeAction(640),\n    )[:10].export(  # only first 10 images\n        # save images (with meta information from the pixiv site)\n        SaveExporter('/data/surtr_arknights_pixiv')\n    )\n```\n\nThis is what you can get at `/data/surtr_arknights_pixiv`\n\n![pixiv example](./assets/pixiv_crawler_example.png)\n\nHere is a list of the website sources currently supported\n\n| Name                                              | Import Statement                              |\n|:--------------------------------------------------|:----------------------------------------------|\n| [ATFBooruSource](https://booru.allthefallen.moe)  | from waifuc.source import ATFBooruSource      |\n| [AnimePicturesSource](https://anime-pictures.net) | from waifuc.source import AnimePicturesSource |\n| [DanbooruSource](https://danbooru.donmai.us)      | from waifuc.source import DanbooruSource      |\n| [DerpibooruSource](https://derpibooru.org)        | from waifuc.source import DerpibooruSource    |\n| [DuitangSource](https://www.duitang.com)          | from waifuc.source import DuitangSource       |\n| [E621Source](https://e621.net)                    | from waifuc.source import E621Source          |\n| [E926Source](https://e926.net)                    | from waifuc.source import E926Source          |\n| [FurbooruSource](https://furbooru.com)            | from waifuc.source import FurbooruSource      |\n| [GelbooruSource](https://gelbooru.com)            | from waifuc.source import GelbooruSource      |\n| [Huashi6Source](https://www.huashi6.com)          | from waifuc.source import Huashi6Source       |\n| [HypnoHubSource](https://hypnohub.net)            | from waifuc.source import HypnoHubSource      |\n| 
[KonachanNetSource](https://konachan.net)         | from waifuc.source import KonachanNetSource   |\n| [KonachanSource](https://konachan.com)            | from waifuc.source import KonachanSource      |\n| [LolibooruSource](https://lolibooru.moe)          | from waifuc.source import LolibooruSource     |\n| [PahealSource](https://rule34.paheal.net)         | from waifuc.source import PahealSource        |\n| [PixivRankingSource](https://pixiv.net)           | from waifuc.source import PixivRankingSource  |\n| [PixivSearchSource](https://pixiv.net)            | from waifuc.source import PixivSearchSource   |\n| [PixivUserSource](https://pixiv.net)              | from waifuc.source import PixivUserSource     |\n| [Rule34Source](https://rule34.xxx)                | from waifuc.source import Rule34Source        |\n| [SafebooruOrgSource](https://safebooru.org)       | from waifuc.source import SafebooruOrgSource  |\n| [SafebooruSource](https://safebooru.donmai.us)    | from waifuc.source import SafebooruSource     |\n| [SankakuSource](https://chan.sankakucomplex.com)  | from waifuc.source import SankakuSource       |\n| [TBIBSource](https://tbib.org)                    | from waifuc.source import TBIBSource          |\n| [WallHavenSource](https://wallhaven.cc)           | from waifuc.source import WallHavenSource     |\n| [XbooruSource](https://xbooru.com)                | from waifuc.source import XbooruSource        |\n| [YandeSource](https://yande.re)                   | from waifuc.source import YandeSource         |\n| [ZerochanSource](https://www.zerochan.net)        | from waifuc.source import ZerochanSource      |\n\n### 3-Stage-Cropping of Local Dataset\n\nThis code loads images from a local directory, crops them with the 3-stage-cropping method (head, halfbody, full\nperson), and then saves them to another local directory.\n\n```python\nfrom waifuc.action import ThreeStageSplitAction\nfrom waifuc.export import SaveExporter\nfrom waifuc.source import 
LocalSource\n\nsource = LocalSource('/your/path/contains/images')\nsource.attach(\n    ThreeStageSplitAction(),\n).export(SaveExporter('/your/output/path'))\n\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeepghs%2Fwaifuc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdeepghs%2Fwaifuc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeepghs%2Fwaifuc/lists"}