{"id":19458457,"url":"https://github.com/postgrespro/libblobstamper","last_synced_at":"2026-02-28T12:38:08.540Z","repository":{"id":65644713,"uuid":"395090594","full_name":"postgrespro/libblobstamper","owner":"postgrespro","description":"Framework for Structure Aware Fuzzing. Allows to build own stamps that would convert pulp-data that came from fuzzer to data with structure you need","archived":false,"fork":false,"pushed_at":"2025-07-09T06:11:49.000Z","size":217,"stargazers_count":17,"open_issues_count":1,"forks_count":2,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-10-01T23:57:07.430Z","etag":null,"topics":["fuzzing","sdl","security","structure-aware-fuzzing"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/postgrespro.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-08-11T19:02:39.000Z","updated_at":"2025-07-09T06:11:52.000Z","dependencies_parsed_at":"2025-07-22T07:33:41.673Z","dependency_job_id":"8b6bc80a-0c33-4dc8-b93a-c8f6273f1731","html_url":"https://github.com/postgrespro/libblobstamper","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/postgrespro/libblobstamper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/postgrespro%2Flibblobstamper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/postgrespro%2Flibblobstamper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/postgre
spro%2Flibblobstamper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/postgrespro%2Flibblobstamper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/postgrespro","download_url":"https://codeload.github.com/postgrespro/libblobstamper/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/postgrespro%2Flibblobstamper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29934307,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-28T12:34:16.884Z","status":"ssl_error","status_checked_at":"2026-02-28T12:34:13.721Z","response_time":90,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["fuzzing","sdl","security","structure-aware-fuzzing"],"created_at":"2024-11-10T17:27:09.197Z","updated_at":"2026-02-28T12:38:08.519Z","avatar_url":"https://github.com/postgrespro.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# LibBlobStamper\n\nTool for building structured data out of random input.\n\nWhen you do fuzzing testing of a project that is complex enough to have\nsyntax parser, and core functionality behind it, your fuzzer will probably\nspend a lot of cpu working with syntax parser. Fuzzing syntax parser is a\ngood thing, but it does not really help to fuzz core functionality. 
One of the\ngeneral approaches here is to generate input data that is syntactically\ncorrect from the perspective of the syntax parser, and meanwhile totally random from\nthe perspective of the core functionality.\n\nLibBlobStamper is a tool for building converters from random binary data\ninto random data with the required syntax.\n\nLibBlobStamper is written in C++ and can be used for generating both text data\nwith the desired syntax and structured binary data (e.g. C-structures). You can\nbuild this converter right into your binary test unit, so the conversion will be\nopaque to both the fuzzer and DSE tools.\n\n\n## Overview\n\n### Blob\n\nBlob is a chunk of binary data that presumably came from a fuzzer; it is considered to be random.\nBlob data is used by Stamps for constructing structured values (syntactically correct strings or C-structures).\nWhen a Stamp uses a chunk of the Blob's data, this data is removed from the Blob.\nYou can use a Stamp on a Blob as many times as you like until you are out of Blob data.\n\n### Stamps\n\nStamp is a C++ object that \"bites\" a chunk of binary data from a Blob and converts it\ninto a certain structured representation (a text string with syntax that is provided\nby the stamp, or a C-structure).\n```\n  char data[] =\"abcdefghijk\";\n  auto blob = std::make_shared\u003cBlob\u003e(data, strlen(data));  // Blob with \"random\" data\n\n  StampArithm\u003cshort int\u003e stamp;  // Stamp for getting short integer (both string and value representations)\n\n  std::string s;\n  short int i;\n\n  s = stamp.ExtractStr(blob);    // bite short int data from blob and save it to a string. Will get \"25185\"\n  i = stamp.ExtractValue(blob);  // bite short int data from blob and save it to short int variable. 
Will get 25699\n```\n\nAs you can see, Stamps can extract values of various types.\nEach extracted type is provided by its own extract method:\n\n* `ExtractStr` returns a `std::string` value.\nThat string will be formatted according to the syntax that is implemented in this extract method.\n\n* `ExtractValue` returns the \"value\" of a C-structure or of another C-variable.\nIn the example above it is the value of a `short int` variable.\n\n* `ExtractPValue` is the same as `ExtractValue`, but returns a pointer to the value.\nOr, more precisely, a `sized_ptr\u003cT\u003e` pointer (see below (FIXME not written yet))\n\n* `ExtractBin` is the same as `ExtractPValue`, but returns the extracted structure as an array of characters (`std::vector\u003cchar\u003e`).\nYou can work with it as a binary buffer or cast it to the desired type manually.\n\nA Stamp must have at least one of the extract methods implemented.\nIn your own stamp you will probably implement only the extract methods you need, either string or binary.\n`StampArithm\u003cT\u003e` has all of them, but this seems to be an exceptional case.\n\n#### Stamp Sizes\n\nThe amount of data that can be consumed by a Stamp is called Stamp Size.\nDepending on Min Stamp Size and Max Stamp Size, Stamps can be divided into three groups:\n\n* **Fixed Size Stamps**: Stamp consumes a fixed amount of data (Min Stamp Size == Max Stamp Size). \nFor example, a `StampArithm\u003cT\u003e` stamp always consumes `sizeof(T)` bytes.\n\n* **Variated Size Stamps**: Min Stamp Size != Max Stamp Size.\nFor example, a stamp that generates a string of random Latin letters 3 to 16 characters long.\nIt consumes 3..16 bytes and \"normalizes\" them to Latin character bytes.\n\n* **Unbounded Size Stamps**: Stamp that has a Min Size, but will consume any amount of data if provided. \n\nMin and Max Stamp Sizes are available via the `minSize()` and `maxSize()` methods.\nFor Unbounded Size Stamps `maxSize()` is set to `-1`.\n\nAlso please note that stamps are greedy: they will try to consume all the data they can.\nE.g. 
an Unbounded Size Stamp will consume all data from the Blob.\nA Variated Size Stamp will try to eat `maxSize()` bytes, but will be satisfied with anything greater than or equal to `minSize()`.\n\nTo limit a Stamp's appetite you should use Galleys.\n\n### Galleys\n\nGalley is a way to squeeze several Stamps into one object.\nYou can think of LibBlobStamper's Galley as a letterpress galley: you have several stamps, you put them into a galley, and now you have one bigger stamp.\nYou will definitely need a Galley if you want to split Blob data between several Unbounded Stamps.\nEach Stamp tries to use all the data, and a Galley is the way to divide the available data between Stamps.\nFor Variated Stamps the story is the same: they should not always get all the data they want.\n\nThere are two types of Galleys in LibBlobStamper now: GalleyVector and GalleySet.\n\n#### Galley Vector\n\nGalley Vector is used to slice all Blob data into parts using one selected stamp.\nFor a Fixed Size Stamp, the blob will be chopped into parts that fit the Stamp, and all these parts will be fed to the Stamp.\nFor Variated and Unbounded Stamps the Galley will use a tricky algorithm to decide how to split the Blob data (the algorithm will be discussed later) and then will apply the target stamp to each data chunk.\nThe Galley will return `std::vector\u003cstd::string\u003e` or `std::vector\u003cT\u003e`, depending on which extract type you are going to use.\n\nExample:\n\n```\n  char data[] =\"abcdefghijk\";\n  auto blob1 = std::make_shared\u003cBlob\u003e(data, strlen(data));  // Blob with \"random\" data\n  auto blob2 = std::make_shared\u003cBlob\u003e(data, strlen(data));  // Another Blob with same data\n\n  auto stamp = std::make_shared\u003cStampArithm\u003cshort int\u003e\u003e();  // Stamp for short integer data (both string and value)\n\n  GalleyVectorStr galley_s(stamp);\n  GalleyVectorV\u003cshort int\u003e galley_v(stamp);\n\n  std::vector\u003cstd::string\u003e res_s = galley_s.ExtractStrVector(blob1);\n  std::vector\u003cshort 
int\u003e   res_v = galley_v.ExtractValuesVector(blob2);\n```\n\n#### Galley Set\n\nGalley Set allows you to apply stamps of different types simultaneously.\nLike Galley Vector, it uses a tricky algorithm to divide Blob Data between stamps, but in this case these are different Stamps.\n\nFor now Galley Set works with String and Binary extracted types.\nIt is not quite clear how to implement a Galley with the Values extracted type using C++ facilities.\n\nExample:\n\n```\n  char data[] =\"abcdefghijk\";\n  auto blob = std::make_shared\u003cBlob\u003e(data, strlen(data));  // Blob with \"random\" data\n\n  auto stamp_i = std::make_shared\u003cStampArithm\u003cshort int\u003e\u003e();  // Stamp for short integer data (both string and value)\n  auto stamp_f = std::make_shared\u003cStampArithm\u003cfloat\u003e\u003e();      // Stamp for float numeric data (both string and value)\n\n  GalleySetStr galley({stamp_i, stamp_f});\n\n  std::vector\u003cstd::string\u003e res = galley.ExtractStrSet(blob);\n```\n\n#### Creating Stamps from Galleys\n\nGalleys and Stamps inherit from the same base class, so you can make a Stamp from a Galley by implementing the appropriate Extract method.\nThis will be explained below in the \"Creating your own Stamps\" section.\n\n### Recursion\n\nLibBlobStamper has been designed keeping in mind that it should be able to create strings with nested syntax (e.g. arithmetic expressions).\nThis work is still in progress and is too raw to be documented properly, but you can explore `examples/exampleZZ.cpp` to see the current status of Stamp recursion.\n\n\n## Creating your own Stamps\n\nGeneral idea: you should inherit from the base class that provides the Extract method you need (e.g. inherit from `StampBaseStr` to get `ExtractStr()`). 
 \nImplement the `minSize()` and `maxSize()` methods, and the Extract method you've chosen.\n\nNormally you will seldom need to work with raw blob data.\nMost probably you will combine existing basic stamps to create a complex one.\n\n### Creating String Stamp\n\nLet's imagine you need to create a stamp for complex numbers `a + bi` (e.g. `12 + 3i`). \nLet's imagine that `a` and `b` are not really big integers.\nTo build this stamp we will use two arithmetic stamps that will give us the text representation of a `short int`, and we will combine them the way we want.\n\nThe class definition will look like this.\nWe define the stamps we will use while building the string right inside the class.\n\n```\nclass ComplexIntStamp: public StampBaseStr\n{\n  protected:\n    StampArithm\u003cshort int\u003e stampA, stampB;\n  public:\n    virtual int minSize() override;\n    virtual int maxSize() override;\n    virtual std::string ExtractStr(std::shared_ptr\u003cBlob\u003e blob) override;\n};\n```\nActually here we could have one `StampArithm\u003cshort int\u003e` stamp, and apply it two times.  
 \nBut to make the example clearer, we will explicitly declare both stamps.\n\nAs we are going to apply each stamp only once, we can calculate the min and max\nsizes of our new stamp as the sum of the min and max sizes of the stamps we have used:\n\n```\nint ComplexIntStamp::minSize()\n{\n  return stampA.minSize() + stampB.minSize();\n}\n\nint ComplexIntStamp::maxSize()\n{\n  return stampA.maxSize() + stampB.maxSize();\n}\n```\n\nNow we will implement the Extract method.\nWe just extract two values with `stampA` and `stampB` and combine them into the string we want.\n\n```\nstd::string ComplexIntStamp::ExtractStr(std::shared_ptr\u003cBlob\u003e blob)\n{\n  std::string A, B;\n  A = stampA.ExtractStr(blob);\n  B = stampB.ExtractStr(blob);\n  return A + \" + \" + B + \"i\";\n}\n```\n\nNow you can use your stamp the way any stamp is used:\n\n```\nint main()\n{\n  char data[] = \"abcdef\";\n  auto blob = std::make_shared\u003cBlob\u003e(data, strlen(data));\n  ComplexIntStamp stamp;\n\n  std::string s = stamp.ExtractStr(blob);\n\n  std::cout \u003c\u003c \"String value: '\" \u003c\u003c s \u003c\u003c\"'\\n\";\n}\n```\nAs you can see, creating a new stamp is quite a simple thing.\n\n### Creating Value Stamp\n\nLet's imagine there is a C-structure we want to fill with random data from the fuzzer:\n\n```\ntypedef struct {\n  short int re;\n  short int im;\n} complex_short;\n```\n\nThis is the same story with a complex short int, but this time we want it represented as a C-structure.\n\nTo create the Stamp that produces this structure from Blob Data we should inherit\nfrom `StampBaseV\u003ccomplex_short\u003e`\n\n```\nclass ComplexIntStamp: public StampBaseV\u003ccomplex_short\u003e\n{\n  protected:\n    StampArithm\u003cshort int\u003e stampA, stampB;\n  public:\n    virtual int minSize() override;\n    virtual int maxSize() override;\n    virtual complex_short ExtractValue(std::shared_ptr\u003cBlob\u003e blob) override;\n};\n```\n\nThe `minSize()` and `maxSize()` methods here would be the same as in the String Stamp above, 
because here we extract the same two `short int` values. \nWe process them differently, but the values are the same.\n\n```\nint ComplexIntStamp::minSize()\n{\n  return stampA.minSize() + stampB.minSize();\n}\n\nint ComplexIntStamp::maxSize()\n{\n  return stampA.maxSize() + stampB.maxSize();\n}\n```\n\nIn the `ExtractValue` method we locally create the desired structure, fill it with\nvalues fetched from the Blob, and return the structure by value.\n\n```\ncomplex_short ComplexIntStamp::ExtractValue(std::shared_ptr\u003cBlob\u003e blob)\n{\n  complex_short res;\n  res.re = stampA.ExtractValue(blob);\n  res.im = stampB.ExtractValue(blob);\n  return res;\n}\n```\n\nThen we can use the stamp for extracting `complex_short` directly from the Blob:\n\n```\nint main()\n{\n  char data[] = \"abcdef\";\n  auto blob = std::make_shared\u003cBlob\u003e (data, strlen(data));\n  ComplexIntStamp stamp;\n\n  complex_short cs = stamp.ExtractValue(blob);\n\n  std::cout \u003c\u003c \" re=\" \u003c\u003c cs.re \u003c\u003c \" im=\" \u003c\u003c cs.im \u003c\u003c \"\\n\";\n}\n```\n\n#### Creating Stamp from Galley\n\nAs mentioned before, a Galley inherits from the same base class as a Stamp does.\nSo you may add an Extract method to a Galley, and use this new object as a Stamp.\nTo add an extract method to a Galley you should use multiple inheritance.\n\n```\nclass ArrayOfComplexIntStamp: public GalleyVectorStr, public StampBaseStr\n{\n  public:\n    ArrayOfComplexIntStamp(): GalleyVectorStr(std::dynamic_pointer_cast\u003cStampBaseStr\u003e(std::make_shared\u003cComplexIntStamp\u003e()))  {};\n\n    virtual std::string ExtractStr(std::shared_ptr\u003cBlob\u003e blob) override;\n};\n```\nBecause of an initialization order issue, we have to create the stamp inside the call to the parent class constructor. In this example it is created via `std::make_shared`, so the shared pointer takes care of destroying it.\n\nHere we do not need to implement `minSize()` and `maxSize()`, as the Galley properly implements them for us.\nWe implement only the Extract method we 
need.\n\n\n```\nstd::string ArrayOfComplexIntStamp::ExtractStr(std::shared_ptr\u003cBlob\u003e blob)\n{\n  std::vector\u003cstd::string\u003e data = ExtractStrVector(blob);\n  std::string res = \"\";\n\n  // Join the extracted items with \", \" and wrap them in brackets\n  for (const std::string \u0026s : data)\n  {\n    if (!res.empty())\n    {\n      res+=\", \";\n    }\n    res+= s;\n  }\n  res = \"[\" + res + \"]\";\n  return res;\n}\n```\n\n## Understanding LibBlobStamper internals\n### How Galley Vector works for Unbounded Size Stamps\n\nLet's imagine we have an Unbounded Size Stamp that produces some Item.\nHere we do not care if it is a C-structure or a string, we just know that this Item is used by the function we want to fuzz.\nMoreover, the function we want to fuzz accepts an array of these Items, so we want to provide one or more Items to the target function.\nThe number of Items is not predefined, so it should be random, to test all possible cases.\nBut it should not be really random, because we need reproducible tests, so we use Blob Data as a source of randomness.\nE.g. take the first byte of the Blob Data to tell us how many Items we will have, and use the rest of the Blob Data for the Items' content.\nIn reality it is not that simple, but this example shows the idea well.\n\nSo here we come to the concept of an Oracle.\nEach time we need to make a decision (\"how many\", \"how long\", \"in what proportion\", etc.), we take a piece of the Blob and make the decision using that value.\nThis piece of the Blob is called an Oracle.\nIn Galleys an `unsigned short int` value is used for the Oracle.\n\nThe general idea of applying the Oracle is the following.\nWe get the Oracle Value from the Blob, and use the Max Oracle Value to calculate the Target Ratio. \nKnowing the possible Max and Min values for the Target Value, we use the Target Ratio to calculate the desired Target Value. 
 \n\nFirst step: for Unbounded Size Stamps we should determine how many chunks we are going to split the Blob into.\nWe should have at least one chunk, so the minimum value here is 1.\nWe get the maximum number of chunks if we split the Blob into chunks of the stamp's `minSize()`.\nSo the maximum number of chunks is Blob Size / Stamp's `minSize()`.\nUsing the Oracle, we find the target ratio, and using the target ratio we find the desired Chunks Count, somewhere between 1 and Max Chunks Count.\n\nAs a second step we should determine the size of each chunk.\nWe reserve `minSize()` for each chunk (the fixed part of the chunk), and then spread the rest of the data among the chunks, so each chunk will get its own variable part.\nTo do that, we get an Oracle for each chunk and calculate the total sum of these Oracles.\nThen for each chunk we calculate a ratio: the ratio of the chunk's Oracle to the total sum of all Oracles. \nAfter that we use the chunks' ratios to spread the Blob data that is available for variable parts among the chunks.\n\n### How Galley Vector works for Variated Size Stamps\n\nWith Unbounded Size Stamps we had the problem that one single stamp will try to consume all available data.\nWith a Variated Size Stamp we do not have this problem: the stamp's size is limited, so we can chop Blob data for the stamps chunk by chunk, until we are out of data.\nBut we should keep in mind that a Variated Size Stamp is also greedy, and will try to consume `maxSize()` bytes if they are available. 
 \nAnd for fuzzing purposes we need results for all kinds of stamp sizes.\nThat is why, before chopping each chunk, we get an Oracle that tells us how much data (from `minSize()` to `maxSize()`) will be chopped this time.\nThis allows us to have a vector of results produced from chunks of various sizes.\n\n### How Galley Set works\n\nFor Galley Set, the story is almost the same as for Galley Vector, with one correction: in Galley Set we have all types of Stamps (Fixed, Variated, Unbounded) mixed.\nSo first we reserve the fixed part of the chunk for the fixed part of each stamp.\nThe rest of the data should be divided between Variated and Unbounded Stamps.\nTo divide it, we get an Oracle and determine in what ratio the remaining data will be divided between Variated and Unbounded Stamps.\nIf Variated Stamps get too much data, the extra data is given to Unbounded Stamps as a \"gift\".\nAfter that, data is shared within each group of Stamps in a way similar to how it is\nshared in Galley Vector.\n\n## Tricks and tips\n### Galley Vector to Array C++ trick\n\nIf you have a Stamp with a Value Extract method, and you are going to use `GalleyVectorV\u003cT\u003e` to extract an array of values, you should know that the `ExtractValuesVector` method of a Galley will return the values as `std::vector\u003cT\u003e`.\n`std::vector\u003cT\u003e` has an actual array inside, and you can just `memcpy` it, without iterating over it.\nIt would be something like this:\n\n```\nauto stamp = std::make_shared\u003cComplexIntStamp\u003e();\nGalleyVectorV\u003ccomplex_short\u003e galley(stamp);\ncomplex_short *result;\nstd::vector\u003ccomplex_short\u003e vec = galley.ExtractValuesVector(blob);\nsize_t result_size = vec.size();\n\n// Copy the vector's contiguous storage into a plain C array\nresult = (complex_short *) malloc(sizeof(complex_short) * result_size);\nmemcpy((void*) result, (void*) vec.data(), sizeof(complex_short) * result_size);\n```\n\n## Trophies\n\n### PostgreSQL\n\n* [BUG 
#18214](https://www.postgresql.org/message-id/flat/18214-891f77caa80a35cc%40postgresql.org):\n`poly_contain` (`@\u003e`) hangs forever for input data with zeros and infinities in\nPostgreSQL 14-16\n* [BUG #17962](https://www.postgresql.org/message-id/17962-4f00b6f26724858d%40postgresql.org): PostgreSQL 11 hangs on `poly_contain` (`@\u003e`) with specific data\n\n\n\n## Further reading\n\n1. Read examples. They are in the `examples` directory.\nYou can start studying LibBlobStamper by playing with them.\n\n2. Read the code.\nThe code is quite clear (I hope), except for the Galley code, which is a bit intricate.\n\n3. Read tests.\nTests are located in the `t` directory.\nMuch effort was put into covering all LibBlobStamper functionality with tests.\nSo if some functionality is not covered by this README or by the standalone examples, you can find examples of usage in the tests.\n\n## Authors\n\nLibBlobStamper was created by Nikolay Shaplov from Postgres Professional in 2021-2022.\n\nYou can contact me via e-mail: `n.shaplov@postgrespro.ru` or via matrix: `@n.shaplov:postgrespro.ru`.\nE-mail lists and chat rooms will be created when they are needed.\n\nYou can also open pull requests and file bugs on GitHub: https://github.com/postgrespro/libblobstamper\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpostgrespro%2Flibblobstamper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpostgrespro%2Flibblobstamper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpostgrespro%2Flibblobstamper/lists"}