{"id":38802120,"url":"https://github.com/schicho/vwc","last_synced_at":"2026-01-17T12:47:39.513Z","repository":{"id":49564520,"uuid":"374462280","full_name":"schicho/vwc","owner":"schicho","description":"wc (word count) rewritten in V with a significant speed up","archived":false,"fork":false,"pushed_at":"2021-06-14T15:50:32.000Z","size":15,"stargazers_count":46,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-06-11T16:59:16.672Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"V","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/schicho.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-06-06T20:57:34.000Z","updated_at":"2024-06-11T16:59:16.673Z","dependencies_parsed_at":"2022-09-15T23:50:16.428Z","dependency_job_id":null,"html_url":"https://github.com/schicho/vwc","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/schicho/vwc","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/schicho%2Fvwc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/schicho%2Fvwc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/schicho%2Fvwc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/schicho%2Fvwc/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/schicho","download_url":"https://codeload.github.com/schicho/vwc/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/schicho%2Fvwc/sbom","scorecard":null,"host":{
"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28508580,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-17T11:50:55.898Z","status":"ssl_error","status_checked_at":"2026-01-17T11:50:55.569Z","response_time":85,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-17T12:47:39.396Z","updated_at":"2026-01-17T12:47:39.478Z","avatar_url":"https://github.com/schicho.png","language":"V","funding_links":[],"categories":[],"sub_categories":[],"readme":"# vwc\n\nBeating C with 100 Lines of [V](https://vlang.io).\nA simple wc (word count) clone, designed to be faster than C.\n\nThis is my late addition to a trend from late 2019, about trying to write a simple wc clone in a few lines of code and trying to beat its performance.\n\nThe [original article](https://chrispenner.ca/posts/wc), which started the trend, rewrote it in Haskell and my implementation is based on a program written in Go by [Ajeet D'Souza](https://ajeetdsouza.github.io/blog/posts/beating-c-with-70-lines-of-go/).\n\nTo be exact, as V's syntax is by design very close to Go, I mostly just rewrote Ajeet D'Souza's Go code in V.\nThe original source code can be found [here](https://github.com/ajeetdsouza/blog-wc-go).\n\n## Benchmarking \u0026 comparison\n\nI am going to compare the results using GNU time, as done by others in their 
articles. I am comparing the performance on parsing a 100 MB and a 1 GB text file, with ASCII characters only.\n\n`$ /usr/bin/time -f \"%es %MKB\" wc test.txt`\n\nFor better comparison with the Go code, I will not rely on the stats given in the original article, but will compile the Go code myself using Go 1.16.\n\nIn [#2](https://github.com/schicho/vwc/issues/2) it was noted that as my implementation only supports ASCII, I should run GNU wc with the ASCII locale `LANG=C` to give a more fair and accurate comparison.\nSetting the locale tells wc that it only needs to expect ASCII chars, thus making the program run a bit faster.\nThe benchmarked times of GNU wc have been updated using `LANG=C time wc text.txt` to set the locale.\n\nAll benchmarks will be run on my system with the following specs:\n- Intel Core i5-8265U @ 1.60 GHz @ 4 cores, 8 threads\n- 8 GB DDR4 RAM @ 2667 MHz\n- 1 TB M.2 SSD\n- Ubuntu 20.04\n\nThe V and Go code use a 16 KB buffer for reading input.\n\n## The two approaches\n\n### Single threaded\n\nThe single threaded code reads into the buffer and then counts the words in that buffer, keeping track of whether we just started a new word previously or not. D'Souza's article goes into this in a lot more detail under the section 'Splitting the input'. The code can be directly transferred from Go to V with minor adjustments.\nThe only difference is that, instead of relying on system calls to get the file size, I decided to count all the bytes manually in the process.\n\nFirst we need two structs to organize our data in:\n\n```V\nstruct FileChunk {\nmut:\n\tprev_char_is_space bool\n\tbuffer             []byte\n}\n\nstruct Count {\nmut:\n\tline_count u32\n\tword_count u32\n}\n```\n\nThese are quite self-explanatory. In a `FileChunk` we store 16 KB of our file, along with whether the last char of the previous chunk was a space, since that would mean a new word starts with the new chunk.\n\nThe `get_count()` function is where the magic happens. 
Here we simply read every byte and compare it to the ASCII values of different chars, which gives us the logic for counting words and lines. V's `match` is the perfect candidate here, similar to the `switch()` of many other languages.\n\nNote that we need to declare all variables as mutable here, and need to initialize them ourselves, as required by V's design.\n\n```V\nfn get_count(chunk FileChunk) Count {\n\tmut count := Count{0, 0}\n\tmut prev_char_is_space := chunk.prev_char_is_space\n\n\tfor b in chunk.buffer {\n\t\tmatch b {\n\t\t\tnew_line {\n\t\t\t\tcount.line_count++\n\t\t\t\tprev_char_is_space = true\n\t\t\t}\n\t\t\tspace, tab, carriage_return, vertical_tab, form_feed {\n\t\t\t\tprev_char_is_space = true\n\t\t\t}\n\t\t\telse {\n\t\t\t\tif prev_char_is_space {\n\t\t\t\t\tprev_char_is_space = false\n\t\t\t\t\tcount.word_count++\n\t\t\t\t}\n\t\t\t}\n\t\t}\n\t}\n\n\treturn count\n}\n```\n\nI declared all the chars as `const` at the start of the file as they are also used in other functions.\n\n```V\nconst (\n\tbuffer_size     = 16 * 1024\n\tnew_line        = `\\n`\n\tspace           = ` `\n\ttab             = `\\t`\n\tcarriage_return = `\\r`\n\tvertical_tab    = `\\v`\n\tform_feed       = `\\f`\n)\n```\n\nThe only part that's still missing is the main function.\nHere you can see that we read the file into the buffer, counting the bytes read per read call, as in the last read we might have reached the end of the file before the buffer is full.\n\nError handling in V is similar to Go, but is done via the `or` block. 
This enforces proper error handling and leaves out the visual noise of checking for `err != nil`.\n\nNow we just count the words in each chunk separately and then sum up the results to finally print them on the terminal.\n\n```V\nmut total_count := Count{0, 0}\nmut byte_count := 0\nmut last_char_is_space := true\n\nmut buffer := []byte{len: buffer_size}\n\nfor {\n    nbytes := file.read(mut buffer) or {\n        match err {\n            none { // EOF 'error', just break out of the loop.\n                break\n            }\n            else {\n                println(err)\n            }\n        }\n        exit(1)\n    }\n\n    count := get_count(FileChunk{last_char_is_space, buffer[..nbytes]})\n    last_char_is_space = is_space(buffer[nbytes - 1])\n\n    total_count.line_count += count.line_count\n    total_count.word_count += count.word_count\n    byte_count += nbytes\n}\n\nprintln('$total_count.line_count $total_count.word_count $byte_count $file_path')\n```\n\nNow to the most exciting part! Comparing the results.\n\nNote: I compiled the Go programs just with `go build main.go`. For the V programs I added the `-prod` flag to get optimized builds: `v -prod vwc_chunk.v`.\nWithout the production flag, the V compiler is blazingly fast, but the builds are less optimized and the time for parsing the file is actually closer to that of GNU wc than to that of Go.\n\n| Program | File Size | Time      | Memory     |\n| ---     | ---       | ---       | ---        |\n| GNU wc  | 100 MB    |   0.40s   |  2268 KB   |\n| Go wc   | 100 MB    |   0.29s   |  1588 KB   |\n| V wc    | 100 MB    |   0.30s   |  1424 KB   |\n| GNU wc  | 1 GB      |   4.39s   |  2264 KB   |\n| Go wc   | 1 GB      |   3.26s   |  1596 KB   |\n| V wc    | 1 GB      |   3.17s   |  1476 KB   |\n\nSo as we can see, both programs can easily beat C in performance and memory use. 
For the most part Go and V are very close to each other in performance.\nThe only difference is binary size, where V can beat Go by a lot. (Not that it really matters, but it's still interesting to see.)\n\n```\nC  binary:  48 KB\nV  binary: 108 KB\nGo binary: 1.5 MB\n```\n\n### Multithreaded\n\nAs stated by D'Souza: \"Admittedly, a parallel wc is overkill, but let's see how far we can go\".\n\nIn terms of code, V can again borrow almost all lines from the Go code.\n\nThe only difference using V was that the compiler didn't just error, but often crashed using the concurrency features. I guess this is the result of V still being quite a new language and having a very small team of contributors, who are not employed by a company the way Go's are by Google.\nAlso reference types work slightly differently in V than in Go, which resulted in me being stuck with weird compiler errors and crashes. But this is probably also down to me still being new to the language and using V's concurrency features for the first time.\n\nIn the end I got it to work with a bit of trying and a bit of luck.\n\nOne thing needs to be said tho: V's concurrency features are not unstable. If it works, it works. But getting it to work in the first place was a bit tough.\n\n```V\nstruct Count {\nmut:\n\tline_count u32\n\tword_count u32\n\tbyte_count int\n}\n```\n\nFor the multithreaded version, I included the number of bytes in each `Count` struct. This is needed as we now read from multiple threads.\n\n```V\nstruct FileReader {\nmut:\n\tfile               os.File\n\tlast_char_is_space bool\n\tmutex              sync.Mutex\n}\n\nfn (mut file_reader FileReader) read_chunk(mut buffer []byte) ?FileChunk {\n\tfile_reader.mutex.@lock()\n\tdefer {\n\t\tfile_reader.mutex.unlock()\n\t}\n\n\tnbytes := file_reader.file.read(mut buffer) ? // Propagate error. 
Either EOF or read error.\n\tchunk := FileChunk{file_reader.last_char_is_space, buffer[..nbytes]}\n\tfile_reader.last_char_is_space = is_space(buffer[nbytes - 1])\n\treturn chunk\n}\n```\n\nThe `FileReader` struct is similar to the `FileChunk`, but we now have direct access to the file, and it also includes a mutex, so that multiple reading threads do not get ahead of each other and overwrite `last_char_is_space`. The mutex also gives us consistent results on HDDs, where parallel reads are not directly possible. You can see the mutex being locked and unlocked in the `read_chunk` method. This is also another example of V's error handling, in which a possible error is just propagated to an outer function using `?`.\n\n```V\nfn file_reader_counter(mut file_reader FileReader, counts chan Count) {\n\tmut buffer := []byte{len: buffer_size}\n\tmut total_count := Count{0, 0, 0}\n\n\tfor {\n\t\tchunk := file_reader.read_chunk(mut buffer) or {\n\t\t\tmatch err {\n\t\t\t\tnone {\n\t\t\t\t\t// EOF 'error', just break out of the loop.\n\t\t\t\t\tbreak\n\t\t\t\t}\n\t\t\t\telse {\n\t\t\t\t\tprintln(err)\n\t\t\t\t}\n\t\t\t}\n\t\t\texit(1)\n\t\t}\n\n\t\tcount := get_count(chunk)\n\n\t\ttotal_count.line_count += count.line_count\n\t\ttotal_count.word_count += count.word_count\n\t\ttotal_count.byte_count += chunk.buffer.len\n\t}\n\n\tcounts \u003c- total_count\n}\n```\n\nThe `file_reader_counter` function is very similar to the main function before; the only difference is that this function is now intended to be multithreaded using coroutines. You can see the channel in the function header, which is used to send the results. 
Basically each coroutine reads into its buffer and after reading is finished, the next coroutine can read, while the other coroutine counts the words.\n\nIn the main function the only task left is to start as many coroutines as the CPU has logical cores and then collect the counts from the channel and combine the results.\nWe create one `FileReader` on the heap via the `\u0026` and create an unbuffered channel of type `Count`.\n\nThen via the `go` keyword we can start the coroutines just as in Go.\n\n```V\nmut file_reader := \u0026FileReader{file, true, sync.new_mutex()}\ncounts := chan Count{}\nnum_workers := runtime.nr_cpus()\n\nfor i := 0; i \u003c num_workers; i++ {\n    go file_reader_counter(mut file_reader, counts)\n}\n\nmut total_count := Count{0, 0, 0}\n\nfor i := 0; i \u003c num_workers; i++ {\n    count := \u003c-counts\n    total_count.line_count += count.line_count\n    total_count.word_count += count.word_count\n    total_count.byte_count += count.byte_count\n}\ncounts.close()\n\nprintln('$total_count.line_count $total_count.word_count $total_count.byte_count $file_path')\n```\n\nComparing the different implementations on the same files as before yields astonishing results:\n\n| Program        | File Size | Time      | Memory     |\n| ---            | ---       | ---       | ---        |\n| GNU wc         | 100 MB    |   0.40s   |  2268 KB   |\n| GO wc parallel | 100 MB    |   0.08s   |  1944 KB   |\n| V wc parallel  | 100 MB    |   0.09s   |  2036 KB   |\n| GNU wc         | 1 GB      |   4.39s   |  2264 KB   |\n| GO wc parallel | 1 GB      |   0.71s   |  1976 KB   |\n| V wc parallel  | 1 GB      |   0.88s   |  2032 KB   |\n\nWe can tell that Go is overall minimally faster than V and also consumes a bit less RAM. The main difference in RAM usage compared to D'Souza's article is probably that my laptop runs the programs on 8 threads, thus allocating more memory for reading the file than the 4 threads in the Go article. 
Furthermore, I noticed that while Go's memory usage was very consistent on each of the multiple runs I did, V's memory usage varied a bit more significantly on all attempts, almost reaching the same amount of memory usage as C.\n\n## Conclusion\n\nV itself is transpiled to C for optimized builds. You could almost say I compared C to C, but with one of the C implementations being almost like Go.\n\nOverall it was very interesting to see the comparison, not only between C and V, but also Go and V. Using the same algorithms and functions obviously resulted in very similar results, but smaller differences could still be seen. Like in the other articles, this is not meant to be a \"V is better than C\" article. I wrote a very simplified version of the complete GNU wc. The only goal was to be faster, but still have the same results as GNU wc on pure ASCII text.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fschicho%2Fvwc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fschicho%2Fvwc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fschicho%2Fvwc/lists"}