{"id":18840227,"url":"https://github.com/maxim2266/str","last_synced_at":"2025-05-16T17:07:16.667Z","repository":{"id":38371065,"uuid":"253861907","full_name":"maxim2266/str","owner":"maxim2266","description":"str: yet another string library for C language.","archived":false,"fork":false,"pushed_at":"2024-10-29T10:54:12.000Z","size":106,"stargazers_count":338,"open_issues_count":0,"forks_count":24,"subscribers_count":10,"default_branch":"master","last_synced_at":"2025-05-07T20:02:10.467Z","etag":null,"topics":["c","string-manipulation","strings"],"latest_commit_sha":null,"homepage":null,"language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/maxim2266.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-04-07T17:13:25.000Z","updated_at":"2025-05-03T08:46:18.000Z","dependencies_parsed_at":"2022-08-25T06:00:51.580Z","dependency_job_id":"baa68867-807b-4f25-8112-9262a4e78a85","html_url":"https://github.com/maxim2266/str","commit_stats":null,"previous_names":[],"tags_count":11,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxim2266%2Fstr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxim2266%2Fstr/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxim2266%2Fstr/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxim2266%2Fstr/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/maxim2266","download_url":"https://codeload.github.com
/maxim2266/str/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253908834,"owners_count":21982685,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c","string-manipulation","strings"],"created_at":"2024-11-08T02:46:55.510Z","updated_at":"2025-05-16T17:07:16.642Z","avatar_url":"https://github.com/maxim2266.png","language":"C","readme":"# str: yet another string library for C language.\n\n[![License: BSD 3 Clause](https://img.shields.io/badge/License-BSD_3--Clause-yellow.svg)](https://opensource.org/licenses/BSD-3-Clause)\n\n## Motivation\n\nBored with developing the same functionality over and over again, unsatisfied\nwith existing libraries, so decided to make the right one, once and forever. 
🙂\n\n## Features\n\n* Handles both C and binary strings;\n* Light-weight references to strings: cheap to create, copy, or pass by value;\n* Support for copy and move semantics, although not enforceable by the C language;\n* String composition functions writing to memory, file descriptors, or file streams;\n* Can be compiled using `gcc` or `clang`, and linked with `libc` or `musl`.\n\n## Installation\nJust clone the project and copy (or symlink) the files `str.h` and `str.c` into your project,\nbut please respect the [license](LICENSE).\n\n## Code Examples\n\nString composition:\n\n```C\nstr s = str_null;\n\nstr_join(\u0026s, str_lit(\", \"),\n         str_lit(\"Here\"),\n         str_lit(\"there\"),\n         str_lit(\"and everywhere\"));\n\nstr_cat(\u0026s, s, str_lit(\"...\"));\n\nassert(str_eq(s, str_lit(\"Here, there, and everywhere...\")));\nstr_free(s);\n```\n\nSame as above, but writing to a file:\n\n```C\nFILE* const stream = fopen(...);\n\nint err = str_join(stream, str_lit(\", \"),\n                   str_lit(\"Here\"),\n                   str_lit(\"there\"),\n                   str_lit(\"and everywhere...\"));\n\nif(err != 0) { /* handle the error */ }\n```\n\n[Discussion](https://news.ycombinator.com/item?id=25212864) on Hacker News.\n\n## User Guide\n\n_**Disclaimer:** This is the good old C language, not C++ or Rust, so nothing can be enforced\non the language level, and certain discipline is required to make sure there is no corrupt\nor leaked memory resulting from using this library._\n\nA string is represented by the type `str` that maintains a pointer to some memory containing the\nactual string, and the length of the string. Objects of type `str` are small enough (a struct\nof a `const char*` and a `size_t`) to be cheap to create, copy (pass by value), and move. The\n`str` structure should be treated as opaque (i.e., do not attempt to directly access or modify\nthe fields in this structure).  
The strings are assumed to be immutable, like those in Java or\nGo, but only by means of `const char*` pointers, so it is actually possible to modify such a\nstring, although the required type cast to `char*` offers at least some (mostly psychological)\nprotection from changing the string by mistake.\n\nThis library focusses only on handling strings, not gradually composing them like\n[StringBuffer](https://docs.oracle.com/javase/7/docs/api/java/lang/StringBuffer.html)\nclass in Java.\n\nAll string objects must be initialised before use. Uninitialised objects will cause\nundefined behaviour. Use the provided constructors, or `str_null` for empty strings.\n\nThere are two kinds of `str` objects: those actually owning the memory they point to, and\nnon-owning references. This property can be queried using `str_is_owner` and `str_is_ref`\nfunctions, otherwise such objects are indistinguishable.\n\nNon-owning string objects are safe to copy and assign to each other, as long as the memory\nthey refer to is valid. They do not need to be freed. `str_free` is a no-op for reference\nobjects. A reference object can be cheaply created from a C string, a string literal,\nor from a range of bytes.\n\nOwning objects require special treatment, in particular:\n* It is a good idea to have only one owning object per each allocated string, but such\na string can have many references to its underlying string, as long as those references do not\noutlive the owning object.\nSometimes this rule may be relaxed for code clarity, like in the above example where\nthe owning object is passed directly to a function, but only if the function does not\nstore or release the object. When in doubt pass such an object via `str_ref`.\n* Direct assignments (like `s2 = s1;`) to owning objects will certainly leak memory, use\n`str_assign` function instead. 
In fact, this function can assign to any string object,\nowning or not, so it can be used everywhere, just to avoid any doubt.\n* There is no automatic memory management in C, so every owning object must be released at\nsome point using either `str_free` or `str_clear` function. String objects on the stack\ncan also be declared as `str_auto` (or `const str_auto`) for automatic cleanup when the variable\ngoes out of scope.\n* An owning object can be moved to another location by using `str_move` function. The\nfunction resets its source object to an empty string.\n* Object ownership can be passed over to another object by using `str_pass` function. The\nfunction sets its source to a non-owning reference to the original string.\n\nIt is technically possible to create a reference to a string that is not\nnull-terminated. The library accepts strings without null-terminators, but every new string\nallocated by the library is guaranteed to be null-terminated.\n\n### String Construction\n\nA string object can be constructed from any C string, string literal, or a range of bytes.\nThe provided constructors are computationally cheap to apply. Depending on the constructor,\nthe new object can either own the actual string it refers to, or be a non-owning reference.\nConstructors themselves do not allocate any memory. Importantly, constructors are the only\nfunctions in this library that return a string object, while others only assign their results\nthrough a pointer to a pre-existing string. This makes constructors suitable for initialisation\nof new string objects. 
In all other situations one should combine construction with assignment,\nfor example:\u003cbr\u003e\n`str_assign(\u0026dest, str_acquire_chars(buff, n));`\n\n### String Object Properties\n\nQuerying a property of a string object (like the length of the string via `str_len`) is a\ncheap operation.\n\n### Assigning, Moving, and Passing String Objects\n\nC language does not allow for operator overloading, so this library provides a function\n`str_assign` that takes a string object and assigns it to the destination object, freeing\nany memory owned by the destination. It is generally recommended to use this function\neverywhere outside object initialisation.\n\nAn existing object can be moved over to another location via `str_move` function.\nThe function resets the source object to `str_null` to guarantee the correct move semantics.\nThe value returned by `str_move` may be either used to initialise a new object, or\nassigned to an existing object using `str_assign`.\n\nAn existing object can also be passed over to another location via `str_pass` function. 
The function\nsets the source object to be a non-owning reference to the original string, otherwise the semantics\nand usage are the same as `str_move`.\n\n### String Composition and Generic Destination\n\nString composition [functions](#string-composition) can write their results to different\ndestinations, depending on the _type_ of their `dest` parameter:\n\n* `str*`: result is assigned to the string object;\n* `int`: result is written to the file descriptor;\n* `FILE*`: result is written to the file stream.\n\nThe composition functions return 0 on success, or the value of `errno` as retrieved at the point\nof failure (including `ENOMEM` on memory allocation error).\n\n### Detailed Example\n\nJust to make things clearer, here is the same code as in the example above, but with comments:\n```C\n// declare a variable and initialise it with an empty string; could also be declared as \"str_auto\"\n// to avoid explicit call to str_free() below.\nstr s = str_null;\n\n// join the given string literals around the separator (second parameter),\n// storing the result in object \"s\" (first parameter); in this example we do not check\n// the return values of the composition functions, thus ignoring memory allocation failures,\n// which is probably not the best idea in general.\nstr_join(\u0026s, str_lit(\", \"),\n         str_lit(\"Here\"),\n         str_lit(\"there\"),\n         str_lit(\"and everywhere\"));\n\n// create a new string concatenating \"s\" and a literal; the function only modifies its\n// destination object \"s\" after the result is computed, also freeing the destination\n// before the assignment, so it is safe to use \"s\" as both a parameter and a destination.\n// note: we pass a copy of the owning object \"s\" as the second parameter, and here it is\n// safe to do so because this particular function does not modify its arguments.\nstr_cat(\u0026s, s, str_lit(\"...\"));\n\n// check that we have got the expected result\nassert(str_eq(s, str_lit(\"Here, 
there, and everywhere...\")));\n\n// finally, free the memory allocated for the string\nstr_free(s);\n```\n\nThere are some useful [code snippets](snippets.md) provided to assist with writing code using\nthis library.\n\n## API brief\n\n`typedef struct { ... } str;`\u003cbr\u003e\nThe string object.\n\n#### String Properties\n\n`size_t str_len(const str s)`\u003cbr\u003e\nReturns the number of bytes in the string referenced by the object.\n\n`const char* str_ptr(const str s)`\u003cbr\u003e\nReturns a pointer to the first byte of the string referenced by the object. The pointer is never NULL.\n\n`const char* str_end(const str s)`\u003cbr\u003e\nReturns a pointer to the next byte past the end of the string referenced by the object.\nThe pointer is never NULL, but it is not guaranteed to point to any valid byte or location.\nFor C strings it points to the terminating null character. For any given string `s` the following\ncondition is always satisfied: `str_end(s) == str_ptr(s) + str_len(s)`.\n\n`bool str_is_empty(const str s)`\u003cbr\u003e\nReturns \"true\" for empty strings.\n\n`bool str_is_owner(const str s)`\u003cbr\u003e\nReturns \"true\" if the string object is the owner of the memory it references.\n\n`bool str_is_ref(const str s)`\u003cbr\u003e\nReturns \"true\" if the string object does not own the memory it references.\n\n#### String Construction\n\n`str_null`\u003cbr\u003e\nEmpty string constant.\n\n`str str_lit(s)`\u003cbr\u003e\nConstructs a non-owning object from a string literal. Implemented as a macro.\n\n`str str_ref(s)`\u003cbr\u003e\nConstructs a non-owning object from either a null-terminated C string, or another `str` object.\nImplemented as a macro.\n\n`str str_ref_chars(const char* const s, const size_t n)`\u003cbr\u003e\nConstructs a non-owning object referencing the given range of bytes.\n\n`str str_acquire_chars(const char* const s, const size_t n)`\u003cbr\u003e\nConstructs an owning object for the specified range of bytes. 
The pointer `s` should be safe\nto pass to `free(3)` function.\n\n`str str_acquire(const char* const s)`\u003cbr\u003e\nConstructs an owning object from the given C string. The string should be safe to pass to\n`free(3)` function.\n\n`str str_move(str* const ps)`\u003cbr\u003e\nSaves the given object to a temporary, resets the source object to `str_null`, and then\nreturns the saved object.\n\n`str str_pass(str* const ps)`\u003cbr\u003e\nSaves the given object to a temporary, sets the source object to be a non-owning reference to the\noriginal string, and then returns the saved object.\n\n#### String Deallocation\n\n`void str_free(const str s)`\u003cbr\u003e\nDeallocates any memory held by the owning string object. No-op for references. After a call to\nthis function the string object is in an unknown and unusable state.\n\nString objects on the stack can also be declared as `str_auto` instead of `str` to deallocate\nany memory held by the string when the variable goes out of scope.\n\n#### String Modification\n\n`void str_assign(str* const ps, const str s)`\u003cbr\u003e\nAssigns the object `s` to the object pointed to by `ps`. Any memory owned by the target\nobject is freed before the assignment.\n\n`void str_clear(str* const ps)`\u003cbr\u003e\nSets the target object to `str_null` after freeing any memory owned by the target.\n\n`void str_swap(str* const s1, str* const s2)`\u003cbr\u003e\nSwaps two string objects.\n\n`int str_from_file(str* const dest, const char* const file_name)`\u003cbr\u003e\nReads the entire file (of up to 64MB by default, configurable via `STR_MAX_FILE_SIZE`) into\nthe destination string. 
Returns 0 on success, or the value of `errno` on error.\n\n#### String Comparison\n\n`int str_cmp(const str s1, const str s2)`\u003cbr\u003e\nLexicographically compares the two string objects, with the usual semantics.\n\n`bool str_eq(const str s1, const str s2)`\u003cbr\u003e\nReturns \"true\" if the two strings match exactly.\n\n`int str_cmp_ci(const str s1, const str s2)`\u003cbr\u003e\nCase-insensitive comparison of two strings, implemented using `strncasecmp(3)`.\n\n`bool str_eq_ci(const str s1, const str s2)`\u003cbr\u003e\nReturns \"true\" if the two strings match case-insensitively.\n\n`bool str_has_prefix(const str s, const str prefix)`\u003cbr\u003e\nTests if the given string `s` starts with the specified prefix.\n\n`bool str_has_suffix(const str s, const str suffix)`\u003cbr\u003e\nTests if the given string `s` ends with the specified suffix.\n\n#### String Composition\n\n`int str_cpy(dest, const str src)`\u003cbr\u003e\nCopies the source string referenced by `src` to the\n[generic](#string-composition-and-generic-destination) destination `dest`. 
Returns 0 on success,\nor the value of `errno` on failure.\n\n`int str_cat_range(dest, const str* src, size_t count)`\u003cbr\u003e\nConcatenates `count` strings from the array starting at address `src`, and writes\nthe result to the [generic](#string-composition-and-generic-destination) destination `dest`.\nReturns 0 on success, or the value of `errno` on failure.\n\n`int str_cat(dest, ...)`\u003cbr\u003e\nConcatenates a variable list of `str` arguments, and writes the result to the\n[generic](#string-composition-and-generic-destination) destination `dest`.\nReturns 0 on success, or the value of `errno` on failure.\n\n`int str_join_range(dest, const str sep, const str* src, size_t count)`\u003cbr\u003e\nJoins around `sep` the `count` strings from the array starting at address `src`, and writes\nthe result to the [generic](#string-composition-and-generic-destination) destination `dest`.\nReturns 0 on success, or the value of `errno` on failure.\n\n`int str_join(dest, const str sep, ...)`\u003cbr\u003e\nJoins a variable list of `str` arguments around `sep` delimiter, and writes the result to the\n[generic](#string-composition-and-generic-destination) destination `dest`.\nReturns 0 on success, or the value of `errno` on failure.\n\n#### Searching and Sorting\n\n`bool str_partition(const str src, const str patt, str* const prefix, str* const suffix)`\u003cbr\u003e\nSplits the string `src` on the first match of `patt`, assigning a reference to the part\nof the string before the match to the `prefix` object, and the part after the match to the\n`suffix` object. Returns `true` if a match has been found, or `false` otherwise, also\nsetting `prefix` to reference the entire `src` string, and clearing the `suffix` object.\nEmpty pattern `patt` never matches.\n\n`void str_sort_range(const str_cmp_func cmp, str* const array, const size_t count)`\u003cbr\u003e\nSorts the given array of `str` objects using the given comparison function. 
A number\nof typically used comparison functions is also provided:\n* `str_order_asc` (ascending sort)\n* `str_order_desc` (descending sort)\n* `str_order_asc_ci` (ascending case-insensitive sort)\n* `str_order_desc_ci` (descending case-insensitive sort)\n\n`const str* str_search_range(const str key, const str* const array, const size_t count)`\u003cbr\u003e\nBinary search for the given key. The input array must be sorted using `str_order_asc`.\nReturns a pointer to the string matching the key, or NULL.\n\n`size_t str_partition_range(bool (*pred)(const str), str* const array, const size_t count)`\u003cbr\u003e\nReorders the string objects in the given range in such a way that all elements for which\nthe predicate `pred` returns \"true\" precede the elements for which predicate `pred`\nreturns \"false\". Returns the number of preceding objects.\n\n`size_t str_unique_range(str* const array, const size_t count)`\u003cbr\u003e\nReorders the string objects in the given range in such a way that there are two partitions:\none where each object is unique within the input range, and another partition with all the\nremaining objects. The unique partition is stored at the beginning of the array, and is\nsorted in ascending order, followed by the partition with all remaining objects.\nReturns the number of unique objects.\n\n#### UNICODE support\n\n`for_each_codepoint(var_name, src_string)`\u003cbr\u003e\nA macro that expands to a loop iterating over the given string `src_string` (of type `str`) by UTF-32\ncode points. On each iteration the variable `var_name` (of type `char32_t`) is assigned\nthe value of the next valid UTF-32 code point from the source string. 
Upon exit from the loop the\nvariable has one of the following values:\n* `CPI_END_OF_STRING`: the iteration has reached the end of source string;\n* `CPI_ERR_INCOMPLETE_SEQ`: an incomplete byte sequence has been detected;\n* `CPI_ERR_INVALID_ENCODING`: an invalid byte sequence has been detected.\n\nThe source string is expected to be encoded in the _current program locale_, as set by the most\nrecent call to `setlocale(3)`.\n\nUsage pattern:\n```c\n#include \u003cuchar.h\u003e\n...\nstr s = ...\n...\nchar32_t c;\t// variable to receive UTF-32 values on each iteration\n\nfor_each_codepoint(c, s)\n{\n\t/* process c */\n}\n\nif(c != CPI_END_OF_STRING)\n{\n\t/* handle error */\n}\n```\n\n#### Tokeniser\n\nTokeniser interface provides functionality similar to `strtok(3)` function. The tokeniser\nis fully re-entrant with no hidden state, and its input string is not modified while being\nparsed.\n\n##### Typical usage:\n```C\n// declare and initialise tokeniser state\nstr_tok_state state;\n\nstr_tok_init(\u0026state, source_string, delimiter_set);\n\n// object to receive tokens\nstr token = str_null;\n\n// token iterator\nwhile(str_tok(\u0026token, \u0026state))\n{\n    /* process \"token\" */\n}\n```\n\n##### Tokeniser API\n\n`void str_tok_init(str_tok_state* const state, const str src, const str delim_set)`\u003cbr\u003e\nInitialises tokeniser state with the given source string and delimiter set. The delimiter set\nis treated as bytes, _not_ as UNICODE code points encoded in UTF-8.\n\n`bool str_tok(str* const dest, str_tok_state* const state)`\u003cbr\u003e\nRetrieves the next token and stores it in the `dest` object. Returns `true` if the token has\nbeen read, or `false` if the end of input has been reached. Retrieved token is always\na reference to a slice of the source string.\n\n`void str_tok_delim(str_tok_state* const state, const str delim_set)`\u003cbr\u003e\nChanges the delimiter set associated with the given tokeniser state. 
The delimiter set is\ntreated as bytes, _not_ as UNICODE code points encoded in UTF-8.\n\n## Tools\n\nAll the tools are located in the `tools/` directory. Currently, there are the following tools:\n\n* `file-to-str`: The script takes a file (text or binary) and a C variable name, and\nwrites to `stdout` C source code where the variable (of type `str`) is defined\nand initialised with the content of the file.\n\n* `gen-char-class`: Generates character classification functions that do the same as their\n`isw*()` counterparts under the current locale as specified by `LC_ALL` environment variable.\nRun `tools/gen-char-class --help` for further details, or `tools/gen-char-class --space`\nto see an example of its output.\n\n## Project Status\nThe library requires at least a C11 compiler. So far it has been tested on Linux Mint versions\nfrom 19.3 to 22.0, with `gcc` versions from 9.5.0 to 13.2.0 (with either `libc` or `musl`),\nand `clang` versions up to 18.1.3; it is also reported to work on ALT Linux 9.1 for Elbrus, with\n`lcc` version 1.25.09.\n","funding_links":[],"categories":["String Manipulation","压缩"],"sub_categories":["Advanced books"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaxim2266%2Fstr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmaxim2266%2Fstr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaxim2266%2Fstr/lists"}