This frame type can be used to communicate with the peer over the mapped
ring buffer to denote that writing is finished and it can free any
resources for the mapping.
This removes the 8 bytes of framing data from the MappedRingBuffer which
means we can write more data without racing. But also this means that we
can eventually use the mapped ring buffer as our normal buffer for
capture writing (to be done later).
This uses a simplified form of collection without writing to capture
files directly. The data is written into a ring buffer for Sysprof to
pick up and copy into the real destination file. Using the mmap() ring
buffer allows loss of data when Sysprof cannot keep up, but on the other
hand allows the inferior to be fast enough to be useful.
This is a simplified API for the inferior to use (such as from a
LD_PRELOAD) that will use mmap()'d ring buffer created by Sysprof. Doing
so can reduce the amount of overhead in the inferior enough to make some
workloads useful. For example, collecting memory statistics and backtraces
is now fast enough to be useful.
This is a source that will allow the inferior to call into Sysprof to
create a new mmap()'d ring buffer to share data. This allows significantly
less overhead in the child process as Sysprof itself will take care of
copying the data out of the inferior into the final capture file. There is
more copying of course, but less intrusive to the inferior itself.
This is the start of a ring buffer to coordinate between processes without
the overhead of writing directly to files within the inferior process.
Instead, the parent process can monitor the ring buffer for framing
information and pass that along to the capture writer.
This brings over some of the techniques from the old memprof design.
Sysprof and memprof shared a lot of code, so it is pretty natural to
bring back the same callgraph view based on memory allocations.
This reuses the StackStash just like it did in memprof. While it
would be nice to reuse some existing tools out there, the fit of
memprof with sysprof is so naturally aligned, it's not really a
big deal to bring back the LD_PRELOAD. The value really comes
from seeing all this stuff together instead of multiple apps.
There are plenty of things we can implement on top of this that
we are not doing yet such as temporary allocations, cross-thread
frees, graphing the heap, and graphing differences between the
heap at to points in time. I'd like all of these things, given
enough time to make them useful.
This is still a bit slow though due to the global lock we take
to access the writer. To improve the speed here we need to get
rid of that lock and head towards a design that allows a thread
to request a new writer from Sysprof and save it in TLS (to be
destroyed when the thread exits).
This is meant to allow us to find the debug files for a given library for
podman containers running as the current user. However, we still need to
try to translate the fuse-overlayfs paths when parsing the /proc/pid/mounts
or we'll have incorrect paths coming from the event stream.
If the path provided to us is an executable program (instead of a syscap
file) then we can setup the path as the binary to execute in the profiler
assistant and save the user a couple clicks.