This brings over some of the techniques from the old memprof design.
Sysprof and memprof shared a lot of code, so it is pretty natural to
bring back the same callgraph view based on memory allocations.
This reuses the StackStash just like it did in memprof. While it
would be nice to reuse some existing tools out there, the fit of
memprof with sysprof is so naturally aligned, it's not really a
big deal to bring back the LD_PRELOAD. The value really comes
from seeing all this stuff together instead of multiple apps.
There are plenty of things we can implement on top of this that
we are not doing yet such as temporary allocations, cross-thread
frees, graphing the heap, and graphing differences between the
heap at two points in time. I'd like all of these things, given
enough time to make them useful.
This is still a bit slow though due to the global lock we take
to access the writer. To improve the speed here we need to get
rid of that lock and head towards a design that allows a thread
to request a new writer from Sysprof and save it in TLS (to be
destroyed when the thread exits).
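
A minimal sketch of the per-thread writer idea, assuming POSIX threads.
ThreadWriter, thread_writer_get() and record_allocation() are hypothetical
names for illustration and are not Sysprof API. The key's destructor frees
the writer when the thread exits, and no global lock is taken on the hot
path.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-thread capture writer. */
typedef struct
{
  unsigned long n_events;
} ThreadWriter;

static pthread_key_t writer_key;
static pthread_once_t writer_once = PTHREAD_ONCE_INIT;

/* Called at thread exit for every thread that stored a non-NULL writer. */
static void
thread_writer_free (void *data)
{
  free (data);
}

static void
writer_key_init (void)
{
  pthread_key_create (&writer_key, thread_writer_free);
}

/* Return the calling thread's writer, creating it on first use. */
static ThreadWriter *
thread_writer_get (void)
{
  ThreadWriter *writer;

  pthread_once (&writer_once, writer_key_init);

  writer = pthread_getspecific (writer_key);
  if (writer == NULL)
    {
      writer = calloc (1, sizeof *writer);
      if (writer != NULL)
        pthread_setspecific (writer_key, writer);
    }

  return writer;
}

/* Record one event without touching any state shared between threads. */
static void
record_allocation (void)
{
  ThreadWriter *writer = thread_writer_get ();

  if (writer != NULL)
    writer->n_events++;
}
```

Inside an LD_PRELOAD malloc hook the calloc() call above would re-enter
the hook, so a real implementation would need a guard against that.
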
If the path provided to us is an executable program (instead of a syscap
file) then we can set up the path as the binary to execute in the profiler
assistant and save the user a couple of clicks.
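
One way to make that check, sketched with plain POSIX calls. The helper
name path_is_executable_program() is hypothetical, and the actual change
may well use a GLib helper instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <unistd.h>

/* True when path names a regular file that the caller may execute. */
static bool
path_is_executable_program (const char *path)
{
  struct stat st;

  if (path == NULL || stat (path, &st) != 0)
    return false;

  /* Directories also pass X_OK, so require a regular file first. */
  return S_ISREG (st.st_mode) && access (path, X_OK) == 0;
}
```
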
This ensures that we only have one thread doing reloads of stack frame
depths at a time. While we only ref the reader in the state, it should
still be fine because cursors *always* make a copy of the reader for their
internal use. I don't think this should fix #23, but it may reduce the
chances of it happening.
It's unclear to me what could cause #23 to happen, unless for some reason
multiple threads were sharing the reader's internal buffer causing the
frame-> dereferences to be junk. But as stated with reader copies, that
should not be able to happen.
Another possible avenue is that the task is cancelled and for some reason
the task is clearing the task data while the thread is running. Again,
that is not supposed to be possible given the design of GTask as it
should not release task data until finalized.
This changes a couple of our structures to use the atomic rc box instead
of gslice directly. It shouldn't affect anything, just some general
modernization while looking at #23.
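
For readers unfamiliar with the pattern, here is a rough sketch of an
atomically reference-counted structure. It uses C11 <stdatomic.h> as a
stand-in rather than GLib's atomic rc box, and the Frame type and its
helpers are hypothetical.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical payload with an embedded atomic reference count. */
typedef struct
{
  atomic_int ref_count;
  int depth;
} Frame;

static Frame *
frame_new (int depth)
{
  Frame *frame = calloc (1, sizeof *frame);

  if (frame != NULL)
    {
      atomic_init (&frame->ref_count, 1);
      frame->depth = depth;
    }

  return frame;
}

static Frame *
frame_ref (Frame *frame)
{
  atomic_fetch_add_explicit (&frame->ref_count, 1, memory_order_relaxed);
  return frame;
}

/* Drop one reference. Returns 1 when this call freed the structure. */
static int
frame_unref (Frame *frame)
{
  if (atomic_fetch_sub_explicit (&frame->ref_count, 1,
                                 memory_order_acq_rel) == 1)
    {
      free (frame);
      return 1;
    }

  return 0;
}
```
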
On a Raspberry Pi 4, Sysprof crashes immediately when
trying to memchr() on the 'line' variable. The current
RPi4 support is admittedly poor on Linux mainline, but
having Sysprof work on it at least gives us a chance
to profile the major bottlenecks.
Protect against NULL 'line'.
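
The guard amounts to something like the sketch below. The helper name
first_line_length() and its signature are made up for illustration and
are not the code touched by this change.

```c
#include <stddef.h>
#include <string.h>

/* Length of the first line in a buffer, or 0 when there is no buffer. */
static size_t
first_line_length (const char *line, size_t len)
{
  const char *eol;

  /* memchr() must not be handed a NULL pointer, so bail out early. */
  if (line == NULL || len == 0)
    return 0;

  eol = memchr (line, '\n', len);

  return eol != NULL ? (size_t) (eol - line) : len;
}
```
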
This simplifies the visualizer sizing by avoiding the expanding sizes
when there is more space available. Doing so allows us to treat all the
sizing uniformly.
We can also make the ticks area a visualizer for more code re-use.