{"id":15029775,"url":"https://github.com/daniel-liu-c0deb0t/uwu","last_synced_at":"2025-05-14T19:04:14.641Z","repository":{"id":37411588,"uuid":"348541121","full_name":"Daniel-Liu-c0deb0t/uwu","owner":"Daniel-Liu-c0deb0t","description":"fastest text uwuifier in the west","archived":false,"fork":false,"pushed_at":"2024-01-02T09:38:37.000Z","size":151,"stargazers_count":1390,"open_issues_count":24,"forks_count":40,"subscribers_count":12,"default_branch":"uwu","last_synced_at":"2025-04-06T08:09:27.842Z","etag":null,"topics":["owo","simd","uwu"],"latest_commit_sha":null,"homepage":"https://crates.io/crates/uwuify","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Daniel-Liu-c0deb0t.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-03-17T01:18:00.000Z","updated_at":"2025-04-04T03:50:14.000Z","dependencies_parsed_at":"2024-01-08T07:57:22.871Z","dependency_job_id":"2b439d0f-ae7f-4d45-9df3-9f33daf23e24","html_url":"https://github.com/Daniel-Liu-c0deb0t/uwu","commit_stats":{"total_commits":35,"total_committers":6,"mean_commits":5.833333333333333,"dds":0.1428571428571429,"last_synced_commit":"17475b16636a8dd9bd25b1f2efc20c7f6caf63cc"},"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Daniel-Liu-c0deb0t%2Fuwu","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Daniel-Liu-c0deb0t%2Fuwu/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Daniel-Liu-c0deb0t%2Fuwu/releases","m
anifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Daniel-Liu-c0deb0t%2Fuwu/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Daniel-Liu-c0deb0t","download_url":"https://codeload.github.com/Daniel-Liu-c0deb0t/uwu/tar.gz/refs/heads/uwu","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248710404,"owners_count":21149185,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["owo","simd","uwu"],"created_at":"2024-09-24T20:11:36.641Z","updated_at":"2025-04-13T11:45:35.793Z","avatar_url":"https://github.com/Daniel-Liu-c0deb0t.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# uwuify\nfastest text uwuifier in the west\n\ntransforms\n```\nHey, I think I really love you. Do you want a headpat?\n```\ninto\n```\nhey, (ꈍᴗꈍ) i think i weawwy wuv you. ^•ﻌ•^ do y-you want a headpat?\n```\n\nthere's an [uwu'd](README_UWU.txt) version of this readme\n\n## faq\n### what?\nu want large amounts of text uwu'd in a smol amount of time\n\n### where?\nur computer, if it has a recent x86 cpu (intel, amd) that supports sse4.1\n\n### why?\nwhy not?\n\n### how?\ntldr: 128-bit simd vectorization plus some big brain algos\n\n\u003cdetails\u003e\n\u003csummary\u003eclick for more info\u003c/summary\u003e\n\u003cp\u003e\n\nafter hours of research, i've finally understood the essence of uwu'd text\n\nthere are a few transformations:\n1. replace some words (`small` -\u003e `smol`, etc.)\n2. nya-ify (eg. `naruhodo` -\u003e `nyaruhodo`)\n3. replace `l` and `r` with `w`\n4. 
stutter sometimes (`hi` -\u003e `h-hi`)\n5. add a text emoji after punctuation (`,`, `.`, or `!`) sometimes\n\nthese transformation passes take advantage of sse4.1 vector intrinsics to process 16 bytes at once.\nfor string searching, i'm using a custom simd implementation of the\n[bitap](https://en.wikipedia.org/wiki/Bitap_algorithm) algorithm for matching against multiple strings.\nfor random number generation, i'm using [XorShift32](https://en.wikipedia.org/wiki/Xorshift). for most\ncharacter-level detection within simd registers, its all masking and shifting to simulate basic state\nmachines in parallel\n\nmultithreading is supported, so u can exploit all of ur cpu cores for the noble goal\nof uwu-ing massive amounts of text\n\nutf-8 is handled elegantly by simply ignoring non-ascii characters in the input\n\nunfortunately, due to both simd parallelism and multithreading, some words may not be fully uwu'd\nif they were lucky enough to cross the boundary of a simd vector or a thread's buffer.\n*they won't escape so easily next time*\n\n\u003c/p\u003e\n\u003c/details\u003e\n\n### ok i want uwu'd text, how do i run this myself?\n#### install command-line tool\n1. install rust: run `curl https://sh.rustup.rs -sSf | sh` on unix,\nor go [here](https://www.rust-lang.org/tools/install) for more options\n2. run `cargo install uwuify`\n3. run `uwuify` which will read from stdin and output to stdout. make sure u\npress ctrl + d (unix) or ctrl + z and enter (windows) after u type stuff in stdin to send an EOF\n\nif you are having trouble running `uwuify`, make sure you have `~/.cargo/bin`\nin your `$PATH`\n\nit is possible to read and write from files by specifying the input file and\noutput file, in that order. u can use `--help` for more info. pass in\n`-v` for timings\n\nthis is on crates.io [here](https://crates.io/crates/uwuify)\n\n#### include as library\n1. put `uwuify = \"^0.2\"` under `[dependencies]` in your `Cargo.toml` file\n2. 
the library is called `uwuifier` (slightly different from the name of the binary!)\nuse it like so:\n```rust\nuse uwuifier::uwuify_str_sse;\nassert_eq!(uwuify_str_sse(\"hello world\"), \"hewwo wowwd\");\n```\n\ndocumentation is [here](https://docs.rs/uwuify/latest/uwuifier/)\n\n#### build from this repo\n\u003cdetails\u003e\n\u003csummary\u003eclick for more info\u003c/summary\u003e\n\u003cp\u003e\n\n1. install rust\n2. run `git clone https://github.com/Daniel-Liu-c0deb0t/uwu.git \u0026\u0026 cd uwu`\n3. run `cargo run --release`\n\n##### testing\n1. run `cargo test`\n\n##### benchmarking\n1. run `mkdir test \u0026\u0026 cd test`\n\n*warning: large files of 100mb and 1gb, respectively*\n\n2. run `curl -OL http://mattmahoney.net/dc/enwik8.zip \u0026\u0026 unzip enwik8.zip`\n3. run `curl -OL http://mattmahoney.net/dc/enwik9.zip \u0026\u0026 unzip enwik9.zip`\n4. run `cd .. \u0026\u0026 ./bench.sh`\n\n\u003c/p\u003e\n\u003c/details\u003e\n\n### i don't believe that this is fast. i need proof!!1!\ntldr: can be almost as fast as simply copying a file\n\n\u003cdetails\u003e\n\u003csummary\u003eclick for more info\u003c/summary\u003e\n\u003cp\u003e\n\nraw numbers from running `./bench.sh` on a 2019 macbook pro with eight\nintel 2.3 ghz i9 cpus and 16 gb of ram are shown below. the dataset\nused is the first 100mb and first 1gb of english wikipedia. 
the same\ndataset is used for the [hutter prize](http://prize.hutter1.net/)\nfor text compression\n\n```\n1 thread uwu enwik8\ntime taken: 178 ms\ninput size: 100000000 bytes\noutput size: 115095591 bytes\nthroughput: 0.55992 gb/s\n\n2 thread uwu enwik8\ntime taken: 105 ms\ninput size: 100000000 bytes\noutput size: 115095591 bytes\nthroughput: 0.94701 gb/s\n\n4 thread uwu enwik8\ntime taken: 60 ms\ninput size: 100000000 bytes\noutput size: 115095591 bytes\nthroughput: 1.64883 gb/s\n\n8 thread uwu enwik8\ntime taken: 47 ms\ninput size: 100000000 bytes\noutput size: 115095591 bytes\nthroughput: 2.12590 gb/s\n\ncopy enwik8\n\nreal\t0m0.035s\nuser\t0m0.001s\nsys\t0m0.031s\n\n1 thread uwu enwik9\ntime taken: 2087 ms\ninput size: 1000000000 bytes\noutput size: 1149772651 bytes\nthroughput: 0.47905 gb/s\n\n2 thread uwu enwik9\ntime taken: 992 ms\ninput size: 1000000000 bytes\noutput size: 1149772651 bytes\nthroughput: 1.00788 gb/s\n\n4 thread uwu enwik9\ntime taken: 695 ms\ninput size: 1000000000 bytes\noutput size: 1149772651 bytes\nthroughput: 1.43854 gb/s\n\n8 thread uwu enwik9\ntime taken: 436 ms\ninput size: 1000000000 bytes\noutput size: 1149772651 bytes\nthroughput: 2.29214 gb/s\n\ncopy enwik9\n\nreal\t0m0.387s\nuser\t0m0.001s\nsys\t0m0.341s\n```\n\n*//TODO: compare with other tools*\n\n\u003c/p\u003e\n\u003c/details\u003e\n\n### why isn't this readme uwu'd?\nso its readable\n\nif u happen to find uwu'd text more readable, there's always an [uwu'd](README_UWU.txt) version\n\n### ok but why aren't there any settings i can change?!1?!!1\nfree will is an illusion\n\n### wtf this is so unprofessional how are u gonna get hired at faang now?!\ndon't worry, i've got u covered\n\n#### Title: uwu is all you need\n\n#### Abstract\n\nRecent advances in computing have made strides in parallelization, whether\nat a fine-grained level with SIMD instructions, or at a high level with multiple\nCPU cores. 
Taking advantage of these advances, we explore how the useful\ntask of performing an uwu transformation on plain text can be scaled up to large\ninput datasets. Our contributions in this paper are threefold: first, we present,\nto our knowledge, the first rigorous definition of uwu'd text. Second, we show\nour novel algorithms for uwu-ing text, exploiting vectorization and\nmultithreading features that are available on modern CPUs. Finally, we provide\nrigorous experimental results that show how our implementation could be the\n\"fastest in the west.\" In our benchmarks, we observe that our implementation\nwas almost as fast as a simple file copy, which is entirely IO-bound.\nWe believe our work has potential applications in various domains, from data\naugmentation and text preprocessing for natural language processing, to\ngiving authors the ability to convey potentially wholesome or cute meme messages\nwith minimal time and effort.\n\n*// TODO: write paper*\n\n*// TODO: write more about machine learning so i get funding*\n\n### ok i need to use this for something and i need the license info\nmit license\n\n### ok but i have an issue with this or a suggestion or a question not answered here\nopen an issue, be nice\n\n### projects using this\n* [uwu-tray](https://github.com/Olaren15/uwu-tray): a tray icon to uwuify your text\n* [uwubot](https://github.com/yaahc/uwubot): discord bot for uwuifying text\n* [uwupedia](http://uwupedia.org/): the uwuified encycwopedia\n* [discord uwu webhook](https://github.com/bs2kbs2k/discord-uwu-webhook): automatically uwuifies all sent messages on discord via webhooks\n* [twent weznowor](https://twitter.com/twent_weznowor): best twitter bot ever\n* [alaia](https://github.com/TheRealKizu/Alaia/tree/master): a simple yet powerful intuitive chatbot for discord\n* [uwuify-mdbook](https://github.com/alyti/uwuify-mdbook): an mdbook pre-processor for all your uwuify needs\n* [uwu-joke](https://github.com/joshualeejunyi/uwu-joke): 
automatically uwuifies typed text and text copied to your clipboard\n* [discordbot (go)](https://github.com/angch/discordbot): discord (and telegram and slack) bot for fun\n* let me know if u make a project with uwuify! i appreciate u all!\n\n### references\n* https://honk.moe/tools/owo.html\n* https://github.com/IamRifki/uwuizer\n* https://github.com/deadshot465/owoify_rs\n* https://cutekaomoji.com/characters/uwu/\n* https://cutekaomoji.com/characters/owo/\n* https://cutekaomoji.com/characters/flower-girl/\n* and many more; let me know if i missed anything\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdaniel-liu-c0deb0t%2Fuwu","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdaniel-liu-c0deb0t%2Fuwu","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdaniel-liu-c0deb0t%2Fuwu/lists"}