{"id":13387711,"url":"https://github.com/ampotos/dynstruct","last_synced_at":"2025-03-13T12:32:08.551Z","repository":{"id":30824037,"uuid":"34381364","full_name":"ampotos/dynStruct","owner":"ampotos","description":"Reverse engineering tool for automatic structure recovering and memory use analysis based on DynamoRIO and Capstone","archived":false,"fork":false,"pushed_at":"2019-08-12T09:15:09.000Z","size":555,"stargazers_count":310,"open_issues_count":18,"forks_count":35,"subscribers_count":21,"default_branch":"master","last_synced_at":"2024-08-04T10:05:07.912Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ampotos.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-04-22T09:27:14.000Z","updated_at":"2024-08-02T17:56:27.000Z","dependencies_parsed_at":"2022-08-24T14:20:37.582Z","dependency_job_id":null,"html_url":"https://github.com/ampotos/dynStruct","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ampotos%2FdynStruct","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ampotos%2FdynStruct/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ampotos%2FdynStruct/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ampotos%2FdynStruct/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ampotos","download_url":"https://codeload.github.com/ampotos/dynStruct/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"http
s://github.com","kind":"github","repositories_count":221366049,"owners_count":16806154,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-30T12:01:27.866Z","updated_at":"2024-10-25T00:30:46.950Z","avatar_url":"https://github.com/ampotos.png","language":"C","readme":"# dynStruct\ndynStruct is a tool that uses DynamoRIO to monitor the memory accesses of an ELF binary via a data gatherer,\nand uses this data to recover the structures of the original code.\n\n![First screenshot](http://i.imgur.com/e5avsw0.png)\n\ndynStruct can also be used to quickly find where and by which function a member of a structure is written or read.\n\n## Papers\ndynStruct was the subject of my master's thesis and also the subject of a publication.  \n[Master Thesis](https://kar.kent.ac.uk/58461/)  \n[Publication](http://ieeexplore.ieee.org/document/7884661/)\n\n## Requirements\n### Data gatherer\n* CMake \u003e= 2.8\n* [DynamoRIO](https://github.com/DynamoRIO/dynamorio) : Do not use the latest release of DynamoRIO (there is a compilation error with one of the extensions used by dynStruct). The latest DynamoRIO [cronbuild](https://github.com/DynamoRIO/dynamorio/releases) is recommended. 
However, if you run into any problem, [build 7.91.18109](https://github.com/DynamoRIO/dynamorio/releases/tag/cronbuild-7.91.18109) was successfully tested and can be used as a safe fallback.\n\n### Structure recovery and web interface\n* Python3\n* [Capstone](http://www.capstone-engine.org/)\n* [Bottle](http://bottlepy.org/docs/dev/index.html)\n\n## Setup\n### Data gatherer\nSet the environment variable DYNAMORIO_HOME to the absolute path of your\nDynamoRIO installation.\n\nExecute `build.sh`\n\nTo compile dynStruct for a 32-bit target on a 64-bit OS, execute `build.sh 32`\n\n### Structure recovery and web interface\nInstall dependencies for dynStruct.py: `pip3 install -r requirements.txt`\n\n## Data gatherer\n\n### Usage\n\n```\ndrrun -opt_cleancall 3 -c \u003cdynStruct_path\u003e \u003cdynStruct_args\u003e -- \u003cprog_path\u003e \u003cprog_args\u003e\n\n  -h print this help\n  -o \u003cfile_name\u003e\tset output name for json file\n\t\t\t if a file with this name already exist the default name will be used\n\t\t\t in the case of forks, default name will be used for forks json files\n\t\t\t (default: \u003cprog_name\u003e.\u003cpid\u003e)\n  -d \u003cdir_name\u003e\t\tset output directory for json files\n\t\t         (default: current directory)\n  - \t\t\tprint output on console\n\t\t\t Usable only on very small programs\n  -w \u003cmodule_name\u003e\twrap \u003cmodule_name\u003e\n\t\t\t dynStruct record memory blocks only\n\t\t\t if *alloc is called from this module\n  -m \u003cmodule_name\u003e\tmonitor \u003cmodule_name\u003e\n\t\t\t dynStruct record memory access only if\n\t\t\t they are done by a monitore module\n  -a \u003cmodule_name\u003e\tis used to tell dynStruct which module implements\n\t\t\t allocs functions (malloc, calloc, realloc and free)\n\t\t\t this has to be used with the -w option (ex : \"-a ld -w ld\")\n\t\t\t this option can only be used one time\nfor -w, -a and -m options modules names are matched like \u003cmodule_name\u003e*\nthis allow to don't 
care about the version of a library\n-m libc.so match with all libc verison\n\nThe main module is always monitored and wrapped\nTha libc allocs functions are always used (regardless the use of the -a option)\n\nExample : drrun -opt_cleancall 3 -c dynStruct -m libc.so - -- ls -l\n\nThis command run \"ls -l\" and will only look at block allocated by the program\nbut will monitor and record memory access from the program and the libc\nand print the result on the console\n```\n\n### Example\n\nWe are going to analyse this little program.\n\n```C\n#include \u003cstdio.h\u003e   /* puts */\n#include \u003cstdlib.h\u003e  /* malloc, free */\n#include \u003cstring.h\u003e  /* strcpy */\n\nvoid print(char *str)\n{\n  puts(str);\n  str[1] = 'a';\n  puts(str);\n}\n\nint main()\n{\n  char *str;\n\n  str = malloc(5);\n  strcpy(str, \"test\");\n  str[4] = 0;\n  print(str);\n\n  free(str);\n}\n```\nAfter compilation, it looks like this:\n![Example disassembly](http://i.imgur.com/L2i4zJS.png)\n\nIf we run `drrun -c  dynStruct - -- tests/example` we get:\n```\ntest\ntast\nblock : 0x0000000000602010-0x0000000000602015(0x5) was free\nalloc by 0x00000000004005b5(main : 0x00000000004005a8 in test_mini) and free by 0x00000000004005ea(main : 0x00000000004005a8 in test_mini)\n\t WRITE :\n\t was access at offset 1 (1 times)\n\tdetails :\n\t\t\t 1 bytes were accessed by 0x0000000000400596 (print : 0x0000000000400576 in test_mini, opcode: c60061) 1 times\n\t was access at offset 4 (2 times)\n\tdetails :\n\t\t\t 1 bytes were accessed by 0x00000000004005c8 (main : 0x00000000004005a8 in test_mini, opcode: c6400400) 1 times\n\t\t\t 1 bytes were accessed by 0x00000000004005d4 (main : 0x00000000004005a8 in test_mini, opcode: c60000) 1 times\n\t was access at offset 0 (1 times)\n\tdetails :\n\t\t\t 4 bytes were accessed by 0x00000000004005c2 (main : 0x00000000004005a8 in test_mini, opcode: c70074657374) 1 times\n```\nWe see all the write accesses on str done by the program itself.\nNote the 4-byte access at offset 0 of the block, which is due to a gcc optimisation for initializing the string.\n\nNow if we run `drrun -c  dynStruct -m libc 
- -- tests/example` we are going to monitor all the libc accesses, and we get\n```\ntest\ntast\nblock : 0x0000000000602010-0x0000000000602015(0x5) was free\nalloc by 0x00000000004005b5(main : 0x00000000004005a8 in example) and free by 0x00000000004005ea(main : 0x00000000004005a8 in example)\n\t READ :\n\t was access at offset 1 (2 times)\n\tdetails :\n\t\t\t 1 bytes were accessed by 0x00007f51cc88b483 (_IO_default_xsputn : 0x00007f51cc88b410 in libc.so.6, opcode: 0fb67500) 1 times\n\t\t\t 1 bytes were accessed by 0x00007f51cc889aad (_IO_file_xsputn@@GLIBC_2.2.5 : 0x00007f51cc889980 in libc.so.6, opcode: 80380a) 1 times\n\t was access at offset 2 (2 times)\n\tdetails :\n\t\t\t 1 bytes were accessed by 0x00007f51cc88b483 (_IO_default_xsputn : 0x00007f51cc88b410 in libc.so.6, opcode: 0fb67500) 1 times\n\t\t\t 1 bytes were accessed by 0x00007f51cc889aad (_IO_file_xsputn@@GLIBC_2.2.5 : 0x00007f51cc889980 in libc.so.6, opcode: 80380a) 1 times\n\t was access at offset 3 (2 times)\n\tdetails :\n\t\t\t 1 bytes were accessed by 0x00007f51cc88b483 (_IO_default_xsputn : 0x00007f51cc88b410 in libc.so.6, opcode: 0fb67500) 1 times\n\t\t\t 1 bytes were accessed by 0x00007f51cc889a91 (_IO_file_xsputn@@GLIBC_2.2.5 : 0x00007f51cc889980 in libc.so.6, opcode: 807aff0a) 1 times\n\t was access at offset 0 (5 times)\n\tdetails :\n\t\t\t 1 bytes were accessed by 0x00007f51cc88b483 (_IO_default_xsputn : 0x00007f51cc88b410 in libc.so.6, opcode: 0fb67500) 1 times\n\t\t\t 16 bytes were accessed by 0x00007f51cc896d76 (strlen : 0x00007f51cc896d50 in libc.so.6, opcode: f30f6f20) 2 times\n\t\t\t 4 bytes were accessed by 0x00007f51cc89a8ae (__mempcpy_sse2 : 0x00007f51cc89a880 in libc.so.6, opcode: 8b0e) 1 times\n\t\t\t 1 bytes were accessed by 0x00007f51cc889aad (_IO_file_xsputn@@GLIBC_2.2.5 : 0x00007f51cc889980 in libc.so.6, opcode: 80380a) 1 times\n\t WRITE :\n\t was access at offset 1 (1 times)\n\tdetails :\n\t\t\t 1 bytes were accessed by 0x0000000000400596 (print : 0x0000000000400576 in 
example, opcode: c60061) 1 times\n\t was access at offset 4 (2 times)\n\tdetails :\n\t\t\t 1 bytes were accessed by 0x00000000004005c8 (main : 0x00000000004005a8 in example, opcode: c6400400) 1 times\n\t\t\t 1 bytes were accessed by 0x00000000004005d4 (main : 0x00000000004005a8 in example, opcode: c60000) 1 times\n\t was access at offset 0 (1 times)\n\tdetails :\n\t\t\t 4 bytes were accessed by 0x00000000004005c2 (main : 0x00000000004005a8 in example, opcode: c70074657374) 1 times\n```\nNow all the read accesses done by the libc are listed as well.\n\n### Multi-process programs\nIf you use the data gatherer on a multi-process program, one file per process is written.\nThis allows each process to be analysed independently.\nIf a process executes another program, the new file is named after the executed program.\nIf you don't want the data gatherer to follow new processes, use the `-no_follow_children` option of drrun (example : `drrun -no_follow_children -opt_cleancall 3 -c dynStruct -m libc.so - -- ls -l`)\n\n### Output\nTo reduce the memory overhead of the data gatherer, when an output file is used the data is written to it every 100 block frees.\n\n### Known issues\nThe buffering has an issue on my setup when analyzing xterm; there is no problem with other programs or other systems. The symptom is that part of the JSON output is missing. If you have this issue as well, please file an issue with details of your setup and of the program analyzed. 
To work around it if needed, comment out the macro in includes/out_json.h and uncomment the first macro named DS_PRINTF.\n\n## Structure recovery\n\nThe Python script dynStruct.py does the structure recovery and can start the web UI.\n\nThe goal of the structure recovery is to give a quick idea of the structures used by the program.\n\nIt's impossible to recover exactly the structures used in the original source code, so some choices had to be made.\nTo recover the size of a member, dynStruct.py looks at the sizes of the accesses at a particular offset and keeps the most used\nsize; if 2 or more sizes are used the same number of times, it keeps the smallest one.\n\nThe default types are ```int\u003csize\u003e_t``` and the default names are ```offset_\u003coffset_in_the_struct\u003e```.\nSome offsets in blocks have no read or write accesses in the output of the dynStruct DynamoRIO client, so the empty offsets are filled\nwith arrays called ```pad_offset_\u003coffset_in_the_struct\u003e```; all padding is uint8_t.\nArrays are detected: 5 or more consecutive members of a struct with the same size are considered an array.\nArrays are named ```array_\u003coffset_in_the_struct\u003e```.\nThe last thing that is detected is arrays of structures, named ```struct_array_\u003coffset_in_the_struct\u003e```.\n\ndynStruct also records the assembly instruction which performs the access and a context instruction: the next instruction in the case of a read access and the previous one in the case of a write access.  \nThis context allows the types of structure members to be recovered. The recovered types are pointer (when possible, with a comment for struct pointers and array pointers), function pointer, double, float, signed integer and unsigned integer. When a type is recovered, it replaces the default type of the structure member.  
\nThis context analysis is not 100% reliable but is usually right.\n\nThe struct recovery tries to be as compact as possible; the output will look like:\n```\nstruct struct_14{\n\tint32_t array_0x0[5];\n\tstruct {\n\t\tint64_t offset_0x0;\n\t\tint8_t offset_0x8;\n\t\tint8_t pad_offset_0x9[7];\n\t}struct_array_0x14[2];\n} \n```\nThe recovery process can take a few minutes if there are large blocks.\n### Usage\n```\nusage: dynStruct.py [-h] [-d DYNAMO_FILE] [-p PREVIOUS_FILE] [-o OUT_PICKLE]\n                    [-n] [-e \u003cfile_name\u003e] [-c] [-w] [-l BIND_ADDR]\n\nDynstruct analize tool\n\noptional arguments:\n  -h, --help        show this help message and exit\n  -d DYNAMO_FILE    output file from dynStruct dynamoRio client\n  -p PREVIOUS_FILE  file to load serialized data\n  -o OUT_PICKLE     file to store serialized data.\n  -n                just load json without recovering structures\n  -e \u003cfile_name\u003e    export structures in C style on \u003cfile_name\u003e\n  -c                print structures in C style on console\n  -w                start the web view\n  -l BIND_ADDR      bind addr for the web view default 127.0.0.1:24242\n```\n### Example\n```\ngcc tests/test.c -o tests/test\ndrrun -opt_cleancall 3 -c dynStruct -o out_test -- tests/test\npython3 dynStruct.py -d out_test  -c\n```\nwill display:\n```\n//total size : 0x20\nstruct struct_2 \n{\tint32_t offset_0x0;\n\tint32_t offset_0x4;\n\tint32_t offset_0x8;\n\tuint8_t pad_offset_0xc[4];\n\tvoid(*ptr_fun)() offset_0x10;\n\tint8_t offset_0x18;\n\tuint8_t pad_offset_0x19[7];\n};\n```\n\nThe same output can be obtained via a serialized file:\n```\npython3 dynStruct.py -d out_test  -o serialize_test\npython3 dynStruct.py -p serialize_test -c\n```\n\nA serialized file allows you to load data from a binary without having to redo the structure recovery. 
A serialized file also stores the modifications made to structures via the web UI.\n\n### Recovery accuracy\nThe structure recovery is currently not always 100% accurate, but it can still give you a good idea of the internal structures of a program. Improving this will be the main goal in the following months.  \n\n### Known issue\nFor now the script dynStruct.py keeps all the data loaded from the JSON in memory, spread across various objects. This means that for big JSON files (several hundred MB) the memory consumption of this script is very high and it may even run out of memory, making the structure recovery process through dynStruct.py impossible. This is not a priority right now, but this issue will receive particular attention in the future.\n\n## Web interface\ndynStruct has a web interface which displays raw data from the gatherer and the structures recovered by dynStruct.py.\n\nThis web interface can be started by using dynStruct.py with the -w option (and -l to change the listening IP/port of the interface).\n\nContinuing the previous example, we can start the web interface with:\n```\npython3 dynStruct.py -p serialize_test -w\n```\nWhen you go to the web interface you will see something like:\n![Block search view](http://i.imgur.com/YdJqAQx.png)\n\n### Navigation\nThe navbar on top allows you to switch between the different kinds of data dynStruct can display.  \nBlocks and accesses are raw data from the data gatherer (but more readable than the console output of the gatherer, and with search fields).  \nStructures contains the structures recovered by dynStruct.py and the structures created by the user, if any.  
\nDownload header allows you to download a C-style header with the current structures (similar to ./dynStruct.py -e or ./dynStruct.py -c).\n\n### Blocks and accesses\nThe detailed view of a block displays the block information, all accesses made in this block and a link to the corresponding structure, if any.\nThe context instruction is the previous instruction for a write access and the next one for a read access; it is used in the structure recovery process.\n![Block detailed view](http://i.imgur.com/sB4PpjD.png)\n\n### Structures\nYou can access the detailed view of a structure by clicking on its name in the structures search page.\n![structure detailed view](http://i.imgur.com/AVeVyjr.png)\n\nThis view shows the information about the structure, the members of the structure and the list of blocks which are instances of this structure. You can edit the members of the structure and the list of blocks associated with this structure.    \n  \nThere is also a detailed view for each member with the list of accesses made to this member on each block associated with the structure.\n![member detailed view](http://i.imgur.com/cGJN0ZC.png)\n\nA member can be an inner structure, an array, an array of structures or a simple member (everything which doesn't match the previous categories). Unions and bitfields are currently not handled by dynStruct.\n\nWhen you edit a structure you can remove it, remove a member (on the edit member view), or edit a member (its name, type, size and number of units in the case of an array). You can also create a new member in place of any padding member.\n![edit member view](http://i.imgur.com/GmNyNra.png)\n\nOn the Edit instance view you can select multiple blocks (by clicking on them) in the first list to remove them and multiple blocks in the second list to add them. When you click on Edit instances, both the selected removals and additions will be executed.  
\nIn the second list, only blocks with the same size as the structure and not already associated with the structure are displayed.\n![edit instance view](http://i.imgur.com/an97OHp.png)\n\n### Edits saving\nAll edits are saved in the serialized file, if any (this works with files given with the -o and -p arguments).\n\n### Known issue\nThe web interface is run via the same script as the structure recovery, so the web interface has the same memory consumption issue as the structure recovery.\n","funding_links":[],"categories":["\u003ca id=\"c8cdb0e30f24e9b7394fcd5681f2e419\"\u003e\u003c/a\u003eDynamoRIO"],"sub_categories":["\u003ca id=\"6c4841dd91cb173093ea2c8d0b557e71\"\u003e\u003c/a\u003e工具"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fampotos%2Fdynstruct","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fampotos%2Fdynstruct","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fampotos%2Fdynstruct/lists"}