We can't get all the symbols here because of -Bsymbolic on the glib
library, but we can get the higher level bit. And if we're blocking for
a period of time, it can help track things down to know we block for
longer time periods.
The kernel symbol resolver requires access to sysprofd, which might not
be available in some contexts (such as when no polkit agent is available).
This allows that to continue working by disabling the kernel with the
user-only setting.
If we're running on a GCC older than 4.9, then we won't have the
stdatomic.h available. We can just use a full barrier instead using
__sync_synchronize() to get the same effect, albeit slower.
This is useful to find allocations free'd right after they were created.
A temporary allocation is currently defined as a free() right after an
allocation of that same memory address. From a quick glance, that appears
to be similar to what I've been seeing in heaptrack all these years.
In the long term, I'd expect we can do something more useful such as
"freed from similar stack trace" since things like g_strdup_printf()
would obviously break important temporary allocations.
We might get this information from multiple sources (such as Linux's perf
or the proc data source). So only add this information once to avoid
having additional data we don't care about.
This also helps ensure we get a proper tree for the callgraph without
splitting things between updated cmdline information.
We want to be backtracing directly into the capture buffer, but also need
to skip a small number of frames.
If we call the backtrace before filling in information, we can capture to
the position *before* ev->addrs and then overwrite that data right after.
This seems to be significantly faster than doing the manual stepping. A
quick look shows that it has a number of special cases which we'd have
to duplicate, so best to just use it directly.
This removes the 8 bytes of framing data from the MappedRingBuffer which
means we can write more data without racing. But also this means that we
can eventually use the mapped ring buffer as our normal buffer for
capture writing (to be done later).
This uses a simplified form of collection without writing to capture
files directly. The data is written into a ring buffer for Sysprof to
pick up and copy into the real destination file. Using the mmap() ring
buffer allows loss of data when Sysprof cannot keep up, but on the other
hand allows the inferior to be fast enough to be useful.
This is a source that will allow the inferior to call into Sysprof to
create a new mmap()'d ring buffer to share data. This allows significantly
less overhead in the child process as Sysprof itself will take care of
copying the data out of the inferior into the final capture file. There is
more copying of course, but less intrusive to the inferior itself.
This brings over some of the techniques from the old memprof design.
Sysprof and memprof shared a lot of code, so it is pretty natural to
bring back the same callgraph view based on memory allocations.
This reuses the StackStash just like it did in memprof. While it
would be nice to reuse some existing tools out there, the fit of
memprof with sysprof is so naturally aligned, it's not really a
big deal to bring back the LD_PRELOAD. The value really comes
from seeing all this stuff together instead of multiple apps.
There are plenty of things we can implement on top of this that
we are not doing yet such as temporary allocations, cross-thread
frees, graphing the heap, and graphing differences between the
heap at to points in time. I'd like all of these things, given
enough time to make them useful.
This is still a bit slow though due to the global lock we take
to access the writer. To improve the speed here we need to get
rid of that lock and head towards a design that allows a thread
to request a new writer from Sysprof and save it in TLS (to be
destroyed when the thread exits).
This is meant to allow us to find the debug files for a given library for
podman containers running as the current user. However, we still need to
try to translate the fuse-overlayfs paths when parsing the /proc/pid/mounts
or we'll have incorrect paths coming from the event stream.