{"id":13441263,"url":"https://github.com/antirez/sds","last_synced_at":"2025-05-14T14:06:31.124Z","repository":{"id":13884322,"uuid":"16582463","full_name":"antirez/sds","owner":"antirez","description":"Simple Dynamic Strings library for C","archived":false,"fork":false,"pushed_at":"2025-04-18T10:59:31.000Z","size":85,"stargazers_count":5049,"open_issues_count":104,"forks_count":488,"subscribers_count":142,"default_branch":"master","last_synced_at":"2025-04-19T00:34:26.571Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/antirez.png","metadata":{"files":{"readme":"README.md","changelog":"Changelog","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2014-02-06T14:58:09.000Z","updated_at":"2025-04-18T10:59:36.000Z","dependencies_parsed_at":"2023-01-13T17:39:56.556Z","dependency_job_id":"36f9dca9-9e9b-4976-86d3-4ccee5178b3b","html_url":"https://github.com/antirez/sds","commit_stats":{"total_commits":48,"total_committers":7,"mean_commits":6.857142857142857,"dds":0.125,"last_synced_commit":"a9a03bb3304030bb8a93823a9aeb03c157831ba9"},"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antirez%2Fsds","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antirez%2Fsds/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antirez%2Fsds/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antirez%2Fsds/manifests","owner_url":"ht
tps://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/antirez","download_url":"https://codeload.github.com/antirez/sds/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254159176,"owners_count":22024558,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T03:01:31.755Z","updated_at":"2025-05-14T14:06:31.105Z","avatar_url":"https://github.com/antirez.png","language":"C","readme":"Simple Dynamic Strings\n===\n\n**Notes about version 2**: this is an updated version of SDS in an attempt\nto finally unify Redis, Disque, Hiredis, and the stand alone SDS versions.\nThis version is **NOT binary compatible** with SDS version 1, but the API\nis 99% compatible so switching to the new lib should be trivial.\n\nNote that this version of SDS may be slower with certain workloads, but\nuses less memory compared to V1 since header size is dynamic and depends on\nthe string to allocate.\n\nMoreover it includes a few more API functions, notably `sdscatfmt` which\nis a faster version of `sdscatprintf` that can be used for the simpler\ncases in order to avoid the libc `printf` family functions performance\npenalty.\n\nHow SDS strings work\n===\n\nSDS is a string library for C designed to augment the limited libc string\nhandling functionalities by adding heap allocated strings that are:\n\n* Simpler to use.\n* Binary safe.\n* Computationally more efficient.\n* But yet... 
Compatible with normal C string functions.\n\nThis is achieved using an alternative design in which instead of using a C\nstructure to represent a string, we use a binary prefix that is stored\nbefore the actual pointer to the string that is returned by SDS to the user.\n\n    +--------+-------------------------------+-----------+\n    | Header | Binary safe C alike string... | Null term |\n    +--------+-------------------------------+-----------+\n             |\n             `-\u003e Pointer returned to the user.\n\nBecause of meta data stored before the actual returned pointer as a prefix,\nand because of every SDS string implicitly adding a null term at the end of\nthe string regardless of the actual content of the string, SDS strings work\nwell together with C strings and the user is free to use them interchangeably\nwith other std C string functions that access the string in read-only.\n\nSDS was a C string library I developed in the past for my everyday C programming needs,\nlater it was moved into Redis where it is used extensively and where it was\nmodified in order to be suitable for high performance operations. Now it was\nextracted from Redis and forked as a stand alone project.\n\nBecause of its many years life inside Redis, SDS provides both higher level\nfunctions for easy strings manipulation in C, but also a set of low level\nfunctions that make it possible to write high performance code without paying\na penalty for using a higher level string library.\n\nAdvantages and disadvantages of SDS\n===\n\nNormally dynamic string libraries for C are implemented using a structure\nthat defines the string. The structure has a pointer field that is managed\nby the string function, so it looks like this:\n\n```c\nstruct yourAverageStringLibrary {\n    char *buf;\n    size_t len;\n    ... 
possibly more fields here ...\n};\n```\n\nSDS strings as already mentioned don't follow this schema, and are instead\na single allocation with a prefix that lives *before* the address actually\nreturned for the string.\n\nThere are advantages and disadvantages with this approach over the traditional\napproach:\n\n**Disadvantage #1**: many functions return the new string as value, since sometimes SDS requires to create a new string with more space, so most SDS API calls look like this:\n\n```c\ns = sdscat(s,\"Some more data\");\n```\n\nAs you can see `s` is used as input for `sdscat` but is also set to the value\nreturned by the SDS API call, since we are not sure if the call modified the\nSDS string we passed or allocated a new one. Not remembering to assign back\nthe return value of `sdscat` or similar functions to the variable holding\nthe SDS string will result in a bug.\n\n**Disadvantage #2**: if an SDS string is shared in different places in your program you have to modify all the references when you modify the string. However most of the time when you need to share SDS strings it is much better to encapsulate them into structures with a `reference count` otherwise it is too easy to incur memory leaks.\n\n**Advantage #1**: you can pass SDS strings to functions designed for C strings without accessing a struct member or calling a function, like this:\n\n```c\nprintf(\"%s\\n\", sds_string);\n```\n\nIn most other libraries this will be something like:\n\n```c\nprintf(\"%s\\n\", string-\u003ebuf);\n```\n\nOr:\n\n```c\nprintf(\"%s\\n\", getStringPointer(string));\n```\n\n**Advantage #2**: accessing individual chars is straightforward. C is a low level language so this is an important operation in many programs. 
With SDS strings accessing individual chars is very natural:\n\n```c\nprintf(\"%c %c\\n\", s[0], s[1]);\n```\n\nWith other libraries your best chance is to assign `string-\u003ebuf` (or call the function to get the string pointer) to a `char` pointer and work with this. However since the other libraries may reallocate the buffer implicitly every time you call a function that may modify the string you have to get a reference to the buffer again.\n\n**Advantage #3**: single allocation has better cache locality. Usually when you access a string created by a string library using a structure, you have two different allocations for the structure representing the string, and the actual buffer holding the string. Over time the buffer is reallocated, and it is likely that it ends in a totally different part of memory compared to the structure itself. Since the performance of modern programs is often dominated by cache misses, SDS may perform better in many workloads.\n\nSDS basics\n===\n\nThe type of SDS strings is just the char pointer `char *`. 
However SDS defines\nan `sds` type as alias of `char *` in its header file: you should use the\n`sds` type in order to make sure you remember that a given variable in your\nprogram holds an SDS string and not a C string, however this is not mandatory.\n\nThis is the simplest SDS program you can write that does something:\n\n```c\nsds mystring = sdsnew(\"Hello World!\");\nprintf(\"%s\\n\", mystring);\nsdsfree(mystring);\n\noutput\u003e Hello World!\n```\n\nThe above small program already shows a few important things about SDS:\n\n* SDS strings are created, and heap allocated, via the `sdsnew()` function, or other similar functions that we'll see in a moment.\n* SDS strings can be passed to `printf()` like any other C string.\n* SDS strings must be freed with `sdsfree()`, since they are heap allocated.\n\nCreating SDS strings\n---\n\n```c\nsds sdsnewlen(const void *init, size_t initlen);\nsds sdsnew(const char *init);\nsds sdsempty(void);\nsds sdsdup(const sds s);\n```\n\nThere are many ways to create SDS strings:\n\n* The `sdsnew` function creates an SDS string starting from a C null terminated string. We already saw how it works in the above example.\n* The `sdsnewlen` function is similar to `sdsnew` but instead of creating the string assuming that the input string is null terminated, it gets an additional length parameter. This way you can create a string using binary data:\n\n    ```c\n    char buf[3];\n    sds mystring;\n\n    buf[0] = 'A';\n    buf[1] = 'B';\n    buf[2] = 'C';\n    mystring = sdsnewlen(buf,3);\n    printf(\"%s of len %d\\n\", mystring, (int) sdslen(mystring));\n\n    output\u003e ABC of len 3\n    ```\n\n  Note: `sdslen` return value is cast to `int` because it returns a `size_t`\ntype. 
You can use the right `printf` specifier instead of casting.\n\n* The `sdsempty()` function creates an empty zero-length string:\n\n    ```c\n    sds mystring = sdsempty();\n    printf(\"%d\\n\", (int) sdslen(mystring));\n\n    output\u003e 0\n    ```\n\n* The `sdsdup()` function duplicates an already existing SDS string:\n\n    ```c\n    sds s1, s2;\n\n    s1 = sdsnew(\"Hello\");\n    s2 = sdsdup(s1);\n    printf(\"%s %s\\n\", s1, s2);\n\n    output\u003e Hello Hello\n    ```\n\nObtaining the string length\n---\n\n```c\nsize_t sdslen(const sds s);\n```\n\nIn the examples above we already used the `sdslen` function in order to get\nthe length of the string. This function works like `strlen` of the libc\nexcept that:\n\n* It runs in constant time since the length is stored in the prefix of SDS strings, so calling `sdslen` is not expensive even when called with very large strings.\n* The function is binary safe like any other SDS string function, so the length is the true length of the string regardless of the content, there is no problem if the string includes null term characters in the middle.\n\nAs an example of the binary safeness of SDS strings, we can run the following\ncode:\n\n```c\nsds s = sdsnewlen(\"A\\0\\0B\",4);\nprintf(\"%d\\n\", (int) sdslen(s));\n\noutput\u003e 4\n```\n\nNote that SDS strings are always null terminated at the end, so even in that\ncase `s[4]` will be a null term, however printing the string with `printf`\nwould result in just `\"A\"` being printed since libc will treat the SDS string\nlike a normal C string.\n\nDestroying strings\n---\n\n```c\nvoid sdsfree(sds s);\n```\n\nTo destroy an SDS string, just call `sdsfree` with the string\npointer. 
Note that even empty strings created with `sdsempty` need to be\ndestroyed as well otherwise they'll result in a memory leak.\n\nThe function `sdsfree` does not perform any operation if instead of an SDS\nstring pointer, `NULL` is passed, so you don't need to check for `NULL` explicitly before calling it:\n\n```c\nif (string) sdsfree(string); /* Not needed. */\nsdsfree(string); /* Same effect but simpler. */\n```\n\nConcatenating strings\n---\n\nConcatenating strings to other strings is likely the operation you will end up\nusing the most with a dynamic C string library. SDS provides different\nfunctions to concatenate strings to existing strings.\n\n```c\nsds sdscatlen(sds s, const void *t, size_t len);\nsds sdscat(sds s, const char *t);\n```\n\nThe main string concatenation functions are `sdscatlen` and `sdscat` that are\nidentical, the only difference being that `sdscat` does not have an explicit\nlength argument since it expects a null terminated string.\n\n```c\nsds s = sdsempty();\ns = sdscat(s, \"Hello \");\ns = sdscat(s, \"World!\");\nprintf(\"%s\\n\", s);\n\noutput\u003e Hello World!\n```\n\nSometimes you want to cat an SDS string to another SDS string, so you don't\nneed to specify the length, but at the same time the string does not need to\nbe null terminated but can contain any binary data. 
For this there is a\nspecial function:\n\n```c\nsds sdscatsds(sds s, const sds t);\n```\n\nUsage is straightforward:\n\n```c\nsds s1 = sdsnew(\"aaa\");\nsds s2 = sdsnew(\"bbb\");\ns1 = sdscatsds(s1,s2);\nsdsfree(s2);\nprintf(\"%s\\n\", s1);\n\noutput\u003e aaabbb\n```\n\nSometimes you don't want to append any special data to the string, but you want\nto make sure that there are at least a given number of bytes composing the\nwhole string.\n\n```c\nsds sdsgrowzero(sds s, size_t len);\n```\n\nThe `sdsgrowzero` function will do nothing if the current string length is\nalready `len` bytes, otherwise it will enlarge the string to `len` just padding\nit with zero bytes.\n\n```c\nsds s = sdsnew(\"Hello\");\ns = sdsgrowzero(s,6);\ns[5] = '!'; /* We are sure this is safe because of sdsgrowzero() */\nprintf(\"%s\\n\", s);\n\noutput\u003e Hello!\n```\n\nFormatting strings\n---\n\nThere is a special string concatenation function that accepts a `printf` alike\nformat specifier and cats the formatted string to the specified string.\n\n```c\nsds sdscatprintf(sds s, const char *fmt, ...);
\n```\n\nExample:\n\n```c\nsds s;\nint a = 10, b = 20;\ns = sdsnew(\"The sum is: \");\ns = sdscatprintf(s,\"%d+%d = %d\",a,b,a+b);\n```\n\nOften you need to create SDS string directly from `printf` format specifiers.\nBecause `sdscatprintf` is actually a function that concatenates strings, all\nyou need is to concatenate your string to an empty string:\n\n\n```c\nchar *name = \"Anna\";\nint loc = 2500;\nsds s;\ns = sdscatprintf(sdsempty(), \"%s wrote %d lines of LISP\\n\", name, loc);\n```\n\nYou can use `sdscatprintf` in order to convert numbers into SDS strings:\n\n```c\nint some_integer = 100;\nsds num = sdscatprintf(sdsempty(),\"%d\\n\", some_integer);\n```\n\nHowever this is slow and we have a special function to make it efficient.\n\nFast number to string operations\n---\n\nCreating an SDS string from an integer may be a common operation in certain\nkinds of programs, and while you may do this with `sdscatprintf` the performance\nhit is big, so SDS provides a specialized function.\n\n```c\nsds sdsfromlonglong(long long value);\n```\n\nUse it like this:\n\n```c\nsds s = sdsfromlonglong(10000);\nprintf(\"%d\\n\", (int) sdslen(s));\n\noutput\u003e 5\n```\n\nTrimming strings and getting ranges\n---\n\nString trimming is a common operation where a set of characters are\nremoved from the left and the right of the string. 
Another useful operation\nregarding strings is the ability to just take a range out of a larger\nstring.\n\n```c\nvoid sdstrim(sds s, const char *cset);\nvoid sdsrange(sds s, int start, int end);\n```\n\nSDS provides both the operations with the `sdstrim` and `sdsrange` functions.\nHowever note that both functions work differently than most functions modifying\nSDS strings since the return value is void: basically those functions always\ndestructively modify the passed SDS string, never allocating a new one, because\nboth trimming and ranges will never need more room: the operations can only\nremove characters from the original string.\n\nBecause of this behavior, both functions are fast and don't involve reallocation.\n\nThis is an example of string trimming where newlines and spaces are removed\nfrom an SDS string:\n\n```c\nsds s = sdsnew(\"         my string\\n\\n  \");\nsdstrim(s,\" \\n\");\nprintf(\"-%s-\\n\",s);\n\noutput\u003e -my string-\n```\n\nBasically `sdstrim` takes the SDS string to trim as first argument, and a\nnull terminated set of characters to remove from left and right of the string.\nThe characters are removed as long as they are not interrupted by a character\nthat is not in the list of characters to trim: this is why the space between\n`\"my\"` and `\"string\"` was preserved in the above example.\n\nTaking ranges is similar, but instead of taking a set of characters, it takes\ntwo indexes, representing the start and the end as specified by zero-based\nindexes inside the string, to obtain the range that will be retained.\n\n```c\nsds s = sdsnew(\"Hello World!\");\nsdsrange(s,1,4);\nprintf(\"-%s-\\n\", s);\n\noutput\u003e -ello-\n```\n\nIndexes can be negative to specify a position starting from the end of the\nstring, so that `-1` means the last character, `-2` the penultimate, and so forth:\n\n```c\nsds s = sdsnew(\"Hello World!\");\nsdsrange(s,6,-1);\nprintf(\"-%s-\\n\", s);\nsdsrange(s,0,-2);\nprintf(\"-%s-\\n\", s);\n\noutput\u003e 
-World!-\noutput\u003e -World-\n```\n\n`sdsrange` is very useful when implementing networking servers processing\na protocol or sending messages. For example the following code is used\nimplementing the write handler of the Redis Cluster message bus between\nnodes:\n\n```c\nvoid clusterWriteHandler(..., int fd, void *privdata, ...) {\n    clusterLink *link = (clusterLink*) privdata;\n    ssize_t nwritten = write(fd, link-\u003esndbuf, sdslen(link-\u003esndbuf));\n    if (nwritten \u003c= 0) {\n        /* Error handling... */\n    }\n    sdsrange(link-\u003esndbuf,nwritten,-1);\n    ... more code here ...\n}\n```\n\nEvery time the socket of the node we want to send the message to is writable\nwe attempt to write as many bytes as possible, and we use `sdsrange` in order\nto remove from the buffer what was already sent.\n\nThe function to queue new messages to send to some node in the cluster will\nsimply use `sdscatlen` in order to put more data in the send buffer.\n\nNote that the Redis Cluster bus implements a binary protocol, but since SDS\nis binary safe this is not a problem, so the goal of SDS is not just to provide\na high level string API for the C programmer but also dynamically allocated\nbuffers that are easy to manage.\n\nString copying\n---\n\nThe most dangerous and infamous function of the standard C library is probably\n`strcpy`, so perhaps it is funny how in the context of better designed dynamic\nstring libraries the concept of copying strings is almost irrelevant. 
Usually\nwhat you do is to create strings with the content you want, or concatenating\nmore content as needed.\n\nHowever SDS features a string copy function that is useful in performance\ncritical code sections, however I guess its practical usefulness is limited\nas the function never managed to get called in the context of the 50k\nlines of code composing the Redis code base.\n\n```c\nsds sdscpylen(sds s, const char *t, size_t len);\nsds sdscpy(sds s, const char *t);\n```\n\nThe string copy function of SDS is called `sdscpylen` and works like this:\n\n```c\ns = sdsnew(\"Hello World!\");\ns = sdscpylen(s,\"Hello Superman!\",15);\n```\n\nAs you can see the function receives as input the SDS string `s`, but also\nreturns an SDS string. This is common to many SDS functions that modify the\nstring: this way the returned SDS string may be the original one modified\nor a newly allocated one (for example if there was not enough room in the\nold SDS string).\n\nThe `sdscpylen` function will simply replace what was in the old SDS string with the\nnew data you pass using the pointer and length argument. There is a similar\nfunction called `sdscpy` that does not need a length but expects a null\nterminated string instead.\n\nYou may wonder why it makes sense to have a string copy function in the\nSDS library, since you can simply create a new SDS string from scratch\nwith the new value instead of copying the value in an existing SDS string.\nThe reason is efficiency: `sdsnewlen` will always allocate a new string\nwhile `sdscpylen` will try to reuse the existing string if there is enough\nroom to hold the new content specified by the user, and will allocate a new\none only if needed.\n\nQuoting strings\n---\n\nIn order to provide consistent output to the program user, or for debugging\npurposes, it is often important to turn a string that may contain binary\ndata or special characters into a quoted string. 
Here for quoted string\nwe mean the common format for string literals in programming source code.\nHowever today this format is also part of the well known serialization formats\nlike JSON and CSV, so it definitely escaped the simple goal of representing\nliteral strings in the source code of programs.\n\nAn example of quoted string literal is the following:\n\n```c\n\"\\x00Hello World\\n\"\n```\n\nThe first byte is a zero byte while the last byte is a newline, so there are\ntwo non alphanumerical characters inside the string.\n\nSDS uses a concatenation function for this goal, that concatenates to an\nexisting string the quoted string representation of the input string.\n\n```c\nsds sdscatrepr(sds s, const char *p, size_t len);\n```\n\nThe `sdscatrepr` (where `repr` means *representation*) follows the usual\nSDS string function rules accepting a char pointer and a length, so you can\nuse it with SDS strings, normal C strings by using strlen() as `len` argument,\nor binary data. The following is an example usage:\n\n```c\nsds s1 = sdsnew(\"abcd\");\nsds s2 = sdsempty();\ns1[1] = 1;\ns1[2] = 2;\ns1[3] = '\\n';\ns2 = sdscatrepr(s2,s1,sdslen(s1));\nprintf(\"%s\\n\", s2);\n\noutput\u003e \"a\\x01\\x02\\n\"\n```\n\nThese are the rules `sdscatrepr` uses for conversion:\n\n* `\\` and `\"` are quoted with a backslash.\n* It quotes special characters `'\\n'`, `'\\r'`, `'\\t'`, `'\\a'` and `'\\b'`.\n* All the other non printable characters not passing the `isprint` test are quoted in `\\x..` form, that is: backslash followed by `x` followed by two digit hex number representing the character byte value.\n* The function always adds initial and final double quotes characters.\n\nThere is an SDS function that is able to perform the reverse conversion and is\ndocumented in the *Tokenization* section below.\n\nTokenization\n---\n\nTokenization is the process of splitting a larger string into smaller strings.\nIn this specific case, the split is performed specifying another string 
that\nacts as separator. For example in the following string there are three substrings\nthat are separated by the `|-|` separator:\n\n```\nfoo|-|bar|-|zap\n```\n\nA more common separator that consists of a single character is the comma:\n\n```\nfoo,bar,zap\n```\n\nIn many programs it is useful to process a line in order to obtain the sub\nstrings it is composed of, so SDS provides a function that returns an\narray of SDS strings given a string and a separator.\n\n```c\nsds *sdssplitlen(const char *s, int len, const char *sep, int seplen, int *count);\nvoid sdsfreesplitres(sds *tokens, int count);\n```\n\nAs usual the function can work with both SDS strings or normal C strings.\nThe first two arguments `s` and `len` specify the string to tokenize, and the\nother two arguments `sep` and `seplen` the separator to use during the\ntokenization. The final argument `count` is a pointer to an integer that will\nbe set to the number of tokens (sub strings) returned.\n\nThe return value is a heap allocated array of SDS strings.\n\n```c\nsds *tokens;\nint count, j;\n\nsds line = sdsnew(\"Hello World!\");\ntokens = sdssplitlen(line,sdslen(line),\" \",1,\u0026count);\n\nfor (j = 0; j \u003c count; j++)\n    printf(\"%s\\n\", tokens[j]);\nsdsfreesplitres(tokens,count);\n\noutput\u003e Hello\noutput\u003e World!\n```\n\nThe returned array is heap allocated, and the single elements of the array\nare normal SDS strings. You can free everything calling `sdsfreesplitres`\nas in the example. 
Alternatively you are free to release the array yourself\nusing the `free` function and use and/or free the individual SDS strings\nas usual.\n\nA valid approach is to set the array elements you reused in some way to\n`NULL`, and use `sdsfreesplitres` to free all the rest.\n\nCommand line oriented tokenization\n---\n\nSplitting by a separator is a useful operation, but usually it is not enough\nto perform one of the most common tasks involving some non trivial string\nmanipulation, that is, implementing a **Command Line Interface** for a program.\n\nThis is why SDS also provides an additional function that allows you to split\narguments provided by the user via the keyboard in an interactive manner, or\nvia a file, network, or any other means, into tokens.\n\n```c\nsds *sdssplitargs(const char *line, int *argc);\n```\n\nThe `sdssplitargs` function returns an array of SDS strings exactly like\n`sdssplitlen`. The function to free the result is also identical, and is\n`sdsfreesplitres`. The difference is in the way the tokenization is performed.\n\nFor example if the input is the following line:\n\n```\ncall \"Sabrina\"    and \"Mark Smith\\n\"\n```\n\nThe function will return the following tokens:\n\n* \"call\"\n* \"Sabrina\"\n* \"and\"\n* \"Mark Smith\\n\"\n\nBasically different tokens need to be separated by one or more spaces, and\nevery single token can also be a quoted string in the same format that\n`sdscatrepr` is able to emit.\n\nString joining\n---\n\nThere are two functions doing the reverse of tokenization by joining strings\ninto a single one.\n\n```c\nsds sdsjoin(char **argv, int argc, char *sep, size_t seplen);\nsds sdsjoinsds(sds *argv, int argc, const char *sep, size_t seplen);\n```\n\nThe two functions take as input an array of strings of length `argc` and\na separator and its length, and produce as output an SDS string consisting\nof all the specified strings separated by the specified separator.\n\nThe difference between `sdsjoin` and `sdsjoinsds` 
is that the former accepts\nC null terminated strings as input while the latter requires all the strings\nin the array to be SDS strings. However because of this only `sdsjoinsds` is\nable to deal with binary data.\n\n```c\nchar *tokens[3] = {\"foo\",\"bar\",\"zap\"};\nsds s = sdsjoin(tokens,3,\"|\",1);\nprintf(\"%s\\n\", s);\n\noutput\u003e foo|bar|zap\n```\n\nError handling\n---\n\nAll the SDS functions that return an SDS pointer may also return `NULL` on\nout of memory, this is basically the only check you need to perform.\n\nHowever many modern C programs handle out of memory simply aborting the program\nso you may want to do this as well by wrapping `malloc` and other related\nmemory allocation calls directly.\n\nSDS internals and advanced usage\n===\n\nAt the very beginning of this documentation it was explained how SDS strings\nare allocated, however the prefix stored before the pointer returned to the\nuser was classified as a *header* without further details. For an advanced\nusage it is better to dig more into the internals of SDS and show the\nstructure implementing it:\n\n```c\nstruct sdshdr {\n    int len;\n    int free;\n    char buf[];\n};\n```\n\nAs you can see, the structure may resemble the one of a conventional string\nlibrary, however the `buf` field of the structure is different since it is\nnot a pointer but an array without any length declared, so `buf` actually\npoints at the first byte just after the `free` integer. So in order to create\nan SDS string we just allocate a piece of memory that is as large as the\n`sdshdr` structure plus the length of our string, plus an additional byte\nfor the mandatory null term that every SDS string has.\n\nThe `len` field of the structure is quite obvious, and is the current length\nof the SDS string, always computed every time the string is modified via\nSDS function calls. 
The `free` field instead represents the amount of free\nmemory in the current allocation that can be used to store more characters.\n\nSo the actual SDS layout is this one:\n\n    +------------+------------------------+-----------+---------------\\\n    | Len | Free | H E L L O W O R L D \\n | Null term |  Free space   \\\n    +------------+------------------------+-----------+---------------\\\n                 |\n                 `-\u003e Pointer returned to the user.\n\nYou may wonder why there is some free space at the end of the string, it\nlooks like a waste. Actually after a new SDS string is created, there is no\nfree space at the end at all: the allocation will be as small as possible to\njust hold the header, string, and null term. However other access patterns\nwill create extra free space at the end, like in the following program:\n\n```c\ns = sdsempty();\ns = sdscat(s,\"foo\");\ns = sdscat(s,\"bar\");\ns = sdscat(s,\"123\");\n```\n\nSince SDS tries to be efficient it can't afford to reallocate the string every\ntime new data is appended, since this would be very inefficient, so it uses\nthe **preallocation of some free space** every time you enlarge the string.\n\nThe preallocation algorithm used is the following: every time the string\nis reallocated in order to hold more bytes, the actual allocation size performed\nis two times the minimum required. So for instance if the string currently\nis holding 30 bytes, and we concatenate 2 more bytes, instead of allocating 32\nbytes in total SDS will allocate 64 bytes.\n\nHowever there is a hard limit to the allocation it can perform ahead, and is\ndefined by `SDS_MAX_PREALLOC`. 
SDS will never allocate more than 1MB of\nadditional space (by default, you can change this default).\n\nShrinking strings\n---\n\n```c\nsds sdsRemoveFreeSpace(sds s);\nsize_t sdsAllocSize(sds s);\n```\n\nSometimes there are classes of programs that require to use very little memory.\nAfter strings concatenations, trimming, ranges, the string may end up having\na non trivial amount of additional space at the end.\n\nIt is possible to resize a string back to its minimal size in order to hold\nthe current content by using the function `sdsRemoveFreeSpace`.\n\n```c\ns = sdsRemoveFreeSpace(s);\n```\n\nThere is also a function that can be used in order to get the size of the\ntotal allocation for a given string, and is called `sdsAllocSize`.\n\n```c\nsds s = sdsnew(\"Ladies and gentlemen\");\ns = sdscat(s,\"... welcome to the C language.\");\nprintf(\"%d\\n\", (int) sdsAllocSize(s));\ns = sdsRemoveFreeSpace(s);\nprintf(\"%d\\n\", (int) sdsAllocSize(s));\n\noutput\u003e 109\noutput\u003e 59\n```\n\nNOTE: SDS Low level API use camelCase in order to warn you that you are playing with fire.\n\nManual modifications of SDS strings\n---\n\n    void sdsupdatelen(sds s);\n\nSometimes you may want to hack with an SDS string manually, without using\nSDS functions. 
In the following example we implicitly change the length\nof the string, however we want the logical length to reflect the null terminated\nC string.\n\nThe function `sdsupdatelen` does just that, updating the internal length\ninformation for the specified string to the length obtained via `strlen`.\n\n```c\nsds s = sdsnew(\"foobar\");\ns[2] = '\\0';\nprintf(\"%d\\n\", (int) sdslen(s));\nsdsupdatelen(s);\nprintf(\"%d\\n\", (int) sdslen(s));\n\noutput\u003e 6\noutput\u003e 2\n```\n\nSharing SDS strings\n---\n\nIf you are writing a program in which it is advantageous to share the same\nSDS string across different data structures, it is absolutely advised to\nencapsulate SDS strings into structures that remember the number of references\nof the string, with functions to increment and decrement the number of references.\n\nThis approach is a memory management technique called *reference counting* and\nin the context of SDS has two advantages:\n\n* It is less likely that you'll create memory leaks or bugs due to not freeing SDS strings or freeing already freed strings.\n* You'll not need to update every reference to an SDS string when you modify it (since the new SDS string may point to a different memory location).\n\nWhile this is definitely a very common programming technique I'll outline\nthe basic ideas here. You create a structure like this:\n\n```c\nstruct mySharedString {\n    int refcount;\n    sds string;\n};\n```\n\nWhen new strings are created, the structure is allocated and returned with\n`refcount` set to 1. Then you have two functions to change the reference count\nof the shared string:\n\n* `incrementStringRefCount` will simply increment `refcount` by 1 in the structure. It will be called every time you add a reference to the string on some new data structure, variable, or whatever.\n* `decrementStringRefCount` is used when you remove a reference. 
This function is special, however: when `refcount` drops to zero, it automatically frees the SDS string and the `mySharedString` structure as well.\n\nInteractions with heap checkers\n---\n\nBecause SDS returns pointers into the middle of memory chunks allocated with\n`malloc`, heap checkers may have issues, however:\n\n* The popular Valgrind program will report SDS strings as *possibly lost* memory and never as *definitely lost*, so it is easy to tell if there is a leak or not. I used Valgrind with Redis for years and every real leak was consistently detected as \"definitely lost\".\n* OSX instrumentation tools don't report SDS strings as leaks, and are able to correctly handle pointers pointing to the middle of memory chunks.\n\nZero copy append from syscalls\n----\n\nAt this point you should have all the tools to dig deeper into the SDS\nlibrary by reading the source code. However, there is an interesting pattern\nyou can build using the exported low level API, which is used inside Redis\nin order to improve the performance of the networking code.\n\nUsing `sdsIncrLen()` and `sdsMakeRoomFor()` it is possible to build the\nfollowing scheme, to append bytes coming from the kernel to the end of an\nsds string without copying into an intermediate buffer:\n\n```c\noldlen = sdslen(s);\ns = sdsMakeRoomFor(s, BUFFER_SIZE);\nnread = read(fd, s+oldlen, BUFFER_SIZE);\n... check for nread \u003c= 0 and handle it ...\nsdsIncrLen(s, nread);\n```\n\n`sdsIncrLen` is documented inside the source code of `sds.c`.\n\nEmbedding SDS into your project\n===\n\nThis is as simple as copying the following files into your\nproject:\n\n* sds.c\n* sds.h\n* sdsalloc.h\n\nThe source code is small and every C99 compiler should deal with\nit without issues.\n\nUsing a different allocator for SDS\n===\n\nInternally sds.c uses the allocator defined in `sdsalloc.h`. 
This header\nfile just defines macros for malloc, realloc and free, and by default libc\n`malloc()`, `realloc()` and `free()` are used. Just edit this file in order\nto change the names of the allocation functions.\n\nA program using SDS can call the SDS allocator in order to manipulate\nSDS pointers (usually not needed, but sometimes the program may want to\ndo advanced things) by using the API that SDS exports for this purpose.\nThis is especially useful when the program linked to SDS\nis using a different allocator compared to what SDS is using.\n\nThe API to access the allocator used by SDS is composed of three functions: `sds_malloc()`, `sds_realloc()` and `sds_free()`.\n\nCredits and license\n===\n\nSDS was created by Salvatore Sanfilippo and is released under the BSD two clause license. See the LICENSE file in this source distribution for more information.\n\nOran Agra improved SDS version 2 by adding dynamic sized headers in order to\nsave memory for small strings and allow strings greater than 4GB.\n","funding_links":[],"categories":["C","Miscellaneous","HarmonyOS","String Manipulation","字符串操作","排序","多项混杂","String Manipulation ##","Members"],"sub_categories":["Windows Manager","Advanced books","模板库","多项混杂","Web Frameworks ###"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fantirez%2Fsds","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fantirez%2Fsds","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fantirez%2Fsds/lists"}