https://github.com/dkogan/findglobals

Finds all global state in a program by looking at the DWARF data

* Overview
This is a proof-of-concept program that is able to find all global state in a
program by parsing its debug information (which is assumed to be available).

The global state I'm looking for is all persistent data:

- global variables (static or otherwise)
- static local variables in functions

I'm only interested in data that can change, so I look /only/ at the writeable
segment. This program is beta-quality, but works for me, and is very useful.
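
As a minimal illustration of what "persistent data" means here, the sketch below
contains one object of each kind. It is not part of this tool, and every name in
it is made up for the example.

#+BEGIN_SRC C
#include <stdio.h>

/* Hypothetical names, for illustration only. */

int request_count = 0;       /* global variable: mutable, persistent */
static int last_error = 0;   /* file-scope static: mutable, persistent */
const int max_requests = 3;  /* read-only: persistent, but cannot change */

/* Returns 0 while the limit is not exceeded, -1 afterwards. */
int handle_request(void)
{
    static int warned = 0;   /* static local: mutable, persistent */

    request_count++;
    if (request_count <= max_requests)
        return 0;

    last_error = -1;
    if (!warned) {
        fprintf(stderr, "request limit reached\n");
        warned = 1;          /* remembered across calls */
    }
    return -1;
}

int main(void)
{
    for (int i = 1; i <= 5; i++) {
        int result = handle_request();
        printf("call %d returned %d\n", i, result);
    }
    printf("request_count=%d last_error=%d\n", request_count, last_error);
    return 0;
}
#+END_SRC

Every call to =handle_request()= looks identical, yet the first three return 0
and the later ones return -1. The difference comes entirely from
=request_count=, =last_error= and =warned= surviving between calls.
=max_requests= is persistent too, but it is constant, so it is not state that
can change.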

* Example
=tst1.c=:
#+BEGIN_SRC C
// ....
struct S1 v1[10];

volatile const char volatile_const_char;
const volatile char const_volatile_char;
const char const_char = 1;

void print_tst1(void)
{
    showvar(v1);
    showvar(volatile_const_char);
    showvar(const_volatile_char);
    showvar_readonly(const_char);
}
#+END_SRC

=tst2.c=:
#+BEGIN_SRC C
// ....
int v2 = 5;

const char* const_char_pointer = NULL;
const char* const const_char_pointer_const = NULL;
char* const char_pointer_const = NULL;

const char const_string_array[] = "456";
static const char static_const_string_array[] = "abc";

struct S0 s0;
int x0[0];
int x1[30];

void print_tst2(void)
{
    static int static_int_5[5];
    static double static_double;
    double dynamic_d;

    showvar(static_int_5);
    showvar(static_double);
    showvar(v2);
    showvar(const_char_pointer);
    showvar_readonly(const_char_pointer_const);
    showvar_readonly(char_pointer_const);
    showvar_readonly(const_string_array);
    showvar_readonly(static_const_string_array);
    showvar(s0);
    showvar(x0);
    showvar(x1);
}
#+END_SRC

=show_globals_from_this_process.c=:
#+BEGIN_SRC C
// ...
#include "getglobals.h"

int main(int argc, char* argv[])
{
    printf("=================== Program sees: ===================\n");

    print_tst1();
    print_tst2();

    printf("=================== DWARF sees: ===================\n");

    get_addrs_from_this_process_from_DSO_with_function((void (*)())&print_tst1, "tst");
    return 0;
}
#+END_SRC

Results:

#+BEGIN_EXAMPLE
$ ./show_globals_from_this_process 2>/dev/null

=================== Program sees: ===================
v1 at 0x55756bc8e240, size 80
volatile_const_char at 0x55756bc8e220, size 1
const_volatile_char at 0x55756bc8e290, size 1
readonly const_char at 0x55756ba8cc65, size 1
static_int_5 at 0x55756bc8e040, size 20
static_double at 0x55756bc8e038, size 8
v2 at 0x55756bc8e010, size 4
const_char_pointer at 0x55756bc8e030, size 8
readonly const_char_pointer_const at 0x55756ba8cbd0, size 8
readonly char_pointer_const at 0x55756ba8cbc8, size 8
readonly const_string_array at 0x55756ba8cbc4, size 4
readonly static_const_string_array at 0x55756ba8cbc0, size 4
s0 at 0x55756bc8e120, size 100
x0 at 0x55756bc8e184, size 0
x1 at 0x55756bc8e1a0, size 120
=================== DWARF sees: ===================
v2 at 0x55756bc8e010, size 4
const_char_pointer at 0x55756bc8e030, size 8
readonly const_char_pointer_const at 0x55756ba8cbd0, size 8
readonly char_pointer_const at 0x55756ba8cbc8, size 8
readonly const_string_array at 0x55756ba8cbc4, size 4
readonly static_const_string_array at 0x55756ba8cbc0, size 4
s0 at 0x55756bc8e120, size 100
x1 at 0x55756bc8e1a0, size 120
static_int_5 at 0x55756bc8e040, size 20
static_double at 0x55756bc8e038, size 8
v1 at 0x55756bc8e240, size 80
volatile_const_char at 0x55756bc8e220, size 1
const_volatile_char at 0x55756bc8e290, size 1
readonly const_char at 0x55756ba8cc65, size 1

dima@shorty:$ ./show_globals_from_this_process 2>/dev/null | sort | uniq -c | sort

1 =================== DWARF sees: ===================
1 =================== Program sees: ===================
1 x0 at 0x55be5ba3d184, size 0
2 const_char_pointer at 0x55be5ba3d030, size 8
2 const_volatile_char at 0x55be5ba3d290, size 1
2 readonly char_pointer_const at 0x55be5b83bbc8, size 8
2 readonly const_char at 0x55be5b83bc65, size 1
2 readonly const_char_pointer_const at 0x55be5b83bbd0, size 8
2 readonly const_string_array at 0x55be5b83bbc4, size 4
2 readonly static_const_string_array at 0x55be5b83bbc0, size 4
2 s0 at 0x55be5ba3d120, size 100
2 static_double at 0x55be5ba3d038, size 8
2 static_int_5 at 0x55be5ba3d040, size 20
2 v1 at 0x55be5ba3d240, size 80
2 v2 at 0x55be5ba3d010, size 4
2 volatile_const_char at 0x55be5ba3d220, size 1
2 x1 at 0x55be5ba3d1a0, size 120
#+END_EXAMPLE

In the example above, the test program first reports the addresses and sizes of
the data in the test DSO. Then the instrumentation reports the addresses and
sizes that it finds in the DWARF data. The two reports should be identical: the
instrumentation should figure out the correct addresses. We then sort the output
and count duplicate lines. If the two sets of reports are identical, every line
appears exactly 2 times. We see this, with 2 trivial exceptions:

- The lines containing ======= are delimiters
- =x0= has size 0, so the instrumentation doesn't even bother to report it

It is also possible to query non-loaded ELF files by running a different
utility, and passing the DSO to query on the command line:

#+BEGIN_EXAMPLE
dima@shorty:$ ./show_globals_from_elf_file tst.so 2>/dev/null

v2 at 0x212028, size 4
const_char_pointer at 0x212050, size 8
readonly const_char_pointer_const at 0x10cc8, size 8
readonly char_pointer_const at 0x10cc0, size 8
readonly const_string_array at 0x10cbc, size 4
readonly static_const_string_array at 0x10cb8, size 4
s0 at 0x212080, size 100
x1 at 0x212100, size 120
static_int_5 at 0x212060, size 20
static_double at 0x212058, size 8
v1 at 0x2121a0, size 80
volatile_const_char at 0x212180, size 1
const_volatile_char at 0x2121f0, size 1
readonly const_char at 0x10d60, size 1
#+END_EXAMPLE

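This is the same data, but since this DSO wasn't loaded, these are the link-time
addresses stored in the file, not absolute addresses in a running process.
Loading the DSO shifts all of them by its load address.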
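
The relationship between the two kinds of addresses can be sketched as follows.
This is an illustration, not code from this tool: it assumes Linux with glibc,
uses =dl_iterate_phdr()= to find the load address of the object that contains a
given variable, and then converts between the runtime address and the address
as stored in the file. The names =match_object=, =find_load_base=,
=to_runtime_address= and =demo_global= are hypothetical.

#+BEGIN_SRC C
#define _GNU_SOURCE          /* for dl_iterate_phdr() in glibc's <link.h> */
#include <inttypes.h>
#include <link.h>
#include <stdint.h>
#include <stdio.h>

struct lookup {
    uintptr_t addr;   /* in:  runtime address we are looking for */
    uintptr_t base;   /* out: load address of the object containing it */
};

/* Called once per loaded ELF object; returns 1 to stop once addr is found. */
int match_object(struct dl_phdr_info *info, size_t size, void *data)
{
    struct lookup *l = data;
    (void)size;

    for (size_t i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_LOAD)
            continue;
        uintptr_t start = (uintptr_t)info->dlpi_addr + ph->p_vaddr;
        if (l->addr >= start && l->addr - start < ph->p_memsz) {
            l->base = (uintptr_t)info->dlpi_addr;
            return 1;
        }
    }
    return 0;
}

/* Finds the load address of the ELF object containing p. Returns 1 if found. */
int find_load_base(const void *p, uintptr_t *base)
{
    struct lookup l = { (uintptr_t)p, 0 };
    if (!dl_iterate_phdr(match_object, &l))
        return 0;
    *base = l.base;
    return 1;
}

/* Address as stored in the ELF file -> address in the running process. */
uintptr_t to_runtime_address(uintptr_t base, uintptr_t file_address)
{
    return base + file_address;
}

int demo_global = 5;   /* hypothetical variable to look up */

int main(void)
{
    uintptr_t base;
    if (!find_load_base(&demo_global, &base)) {
        fprintf(stderr, "no loaded object contains demo_global\n");
        return 1;
    }
    uintptr_t runtime = (uintptr_t)&demo_global;
    uintptr_t in_file = runtime - base;

    printf("load address:    0x%" PRIxPTR "\n", base);
    printf("address in file: 0x%" PRIxPTR "\n", in_file);
    printf("runtime address: 0x%" PRIxPTR "\n", to_runtime_address(base, in_file));
    return 0;
}
#+END_SRC

For a position-independent executable or a DSO the load address changes from
run to run, which is why the two runs in the earlier example print different
absolute addresses. For a non-PIE executable the load address is 0, and the two
kinds of addresses coincide.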

Finally, the variable ranges can be visualized with something like this:

#+BEGIN_EXAMPLE
dima@shorty:$ ./show_globals_from_elf_file tst.so 2>/dev/null | \
awk -n '!/readonly/ && /, size/ \
{ addr = and($3,0xFFFFFFFF); \
print addr,"range", 0,addr, addr+$5; \
print addr,"labels",1,$1 }' | \
feedgnuplot --domain --dataid --rangesize range 3 --rangesize labels 2 \
--style range 'with xerrorbars' \
--style labels 'with labels' \
--ymin -2 --ymax 3 \
--set 'xtics format "0x%08x"'
#+END_EXAMPLE

* Notes
- The example in this tree works with both static linking
(=show_globals_from_this_process=) and dynamic linking
(=show_globals_from_this_process_viaso=).

- This is proof-of-concept software. It's an excellent starting point if you
need this functionality, but do thoroughly test it for your use case.

- This was written with C code in mind, and tested to work with Fortran as well.
I can imagine that C++ can produce persistent state in more ways. Again, test
it thoroughly.

- The libraries I'm using to parse the DWARF are woefully underdocumented, and
I'm probably not doing everything 100% correctly. At the risk of repeating
myself: test it thoroughly.

- For ELF objects linked into the executable normally (whether statically or
dynamically) this works. If we're doing something funny like loading
libraries ourselves with =libdl=, then it /probably/ works too, but I'd test
it before assuming.

This effort uncovered a bug in gcc, and one in elfutils.

** gcc bug

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78100

It turns out that gcc erroneously omits the sizing information if you have an
array object pre-declared as an =extern= of unknown size (as was done in my
specific library). So if you have this source:

#+BEGIN_SRC C
#include <stdio.h>

extern int s[];
int s[] = { 1,2,3 };

int main(void)
{
    printf("%zu\n", sizeof(s));
    return 0;
}
#+END_SRC

then the debug information in the object produced by gcc <= 6.2 doesn't record
that =s= has /3/ integers.

** elfutils bug

Apparently elfutils was misreporting the sizes of multi-dimensional arrays.
Reported and fixed here:

https://sourceware.org/bugzilla/show_bug.cgi?id=22546
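
For reference, the size such a tool should arrive at for a multi-dimensional
array is the element size multiplied by the element count of every dimension.
The sketch below is not the elfutils code. It assumes each dimension is given
as an inclusive upper bound with a lower bound of 0, which is how DWARF
subrange entries commonly describe C arrays. The function name
=array_byte_size= is hypothetical.

#+BEGIN_SRC C
#include <stddef.h>
#include <stdio.h>

/* Byte size of an array with ndims dimensions. upper_bounds[i] is the largest
   valid index in dimension i, so that dimension holds upper_bounds[i] + 1
   elements. */
size_t array_byte_size(size_t elem_size, const size_t *upper_bounds, size_t ndims)
{
    size_t count = 1;
    for (size_t i = 0; i < ndims; i++)
        count *= upper_bounds[i] + 1;
    return count * elem_size;
}

int main(void)
{
    /* double m[3][4]: upper bounds are 2 and 3 */
    const size_t bounds[] = { 2, 3 };

    printf("computed: %zu\n", array_byte_size(sizeof(double), bounds, 2));
    printf("sizeof:   %zu\n", sizeof(double[3][4]));
    return 0;
}
#+END_SRC

Both lines print the same number. A computation that only considers one of the
dimensions would report a smaller size and miss part of the array.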

* Copyright and License
Copyright 2016 Dima Kogan
2017-2018 California Institute of Technology

Released under an MIT-style license:

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so.