We need some sort of scale for the content, so we draw it with
an overlay for now. In the future we will also want something
that can handle selections.
This still needs some iteration for correctness, but it gets
the ball rolling.
If we have not yet received a proper draw for the new size
allocation (likely right after the size allocate), then we can
just use the old surface, drawn at a scaled size. This is handy
so that we don't block the main loop drawing lots of data
points. Instead we just scale the image and wait for the
high-quality version to complete.
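A rough sketch of the scale decision described above. The Size and
ScalePlan types and plan_draw() are hypothetical names for
illustration, not the actual widget code; with cairo, the resulting
factors would be passed to cairo_scale() before painting the old
surface.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper types, for illustration only. */
typedef struct { int width, height; } Size;
typedef struct { bool needs_scale; double sx, sy; } ScalePlan;

/* Decide how to paint the cached surface into the current allocation.
 * If the sizes differ, we have not yet received a draw at the new
 * size, so we stretch the old surface instead of redrawing. */
static ScalePlan
plan_draw (Size surface, Size alloc)
{
  ScalePlan plan = { false, 1.0, 1.0 };

  assert (surface.width > 0 && surface.height > 0);

  if (surface.width != alloc.width || surface.height != alloc.height)
    {
      plan.needs_scale = true;
      plan.sx = (double)alloc.width / surface.width;
      plan.sy = (double)alloc.height / surface.height;
    }

  return plan;
}
```

Once the worker finishes the surface for the new size, the sizes
match again and the plan falls back to an unscaled paint.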
This starts getting the mechanics in place for off-screen
rendering using a cairo image surface. We create our own
point cache for storing x,y pairs and then simplify our
drawing based on that.
When creating our private reader copy, we need to reset the
reader so that we start at the beginning. Otherwise, we are
likely to be at the end of the capture (especially for
in-memory captures).
This is a simple cache that keeps x,y pairs for use when drawing
visualizers. To keep this generic, and save on memory, we simply
store the x,y coordinates as floats between 0.0 and 1.0. With
4-byte floats, this saves us roughly 50% on each data point over
the two 8-byte numbers we would otherwise store.
Obviously, we could take this further and make some fancy index
storage with run-length-encoded values, but this should work for
now and allow us to get more exotic later.
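A minimal sketch of the idea, assuming a growable array of float
pairs. Point, PointCache, normalize() and the point_cache_*()
functions are hypothetical names, not the cache added by this change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical types, for illustration only. */
typedef struct
{
  float x; /* 0.0 .. 1.0, position within the time range */
  float y; /* 0.0 .. 1.0, position within the value range */
} Point;

typedef struct
{
  Point  *points;
  size_t  len;
  size_t  cap;
} PointCache;

/* Map v from [lo, hi] onto [0.0, 1.0], clamping out-of-range input. */
static float
normalize (double v, double lo, double hi)
{
  assert (hi > lo);
  if (v < lo) v = lo;
  if (v > hi) v = hi;
  return (float)((v - lo) / (hi - lo));
}

static void
point_cache_init (PointCache *cache)
{
  cache->points = NULL;
  cache->len = 0;
  cache->cap = 0;
}

/* Append one normalized pair; returns 0 on success, -1 on OOM. */
static int
point_cache_add (PointCache *cache, float x, float y)
{
  if (cache->len == cache->cap)
    {
      size_t cap = cache->cap ? cache->cap * 2 : 64;
      Point *p = realloc (cache->points, cap * sizeof *p);
      if (p == NULL)
        return -1;
      cache->points = p;
      cache->cap = cap;
    }
  cache->points[cache->len].x = x;
  cache->points[cache->len].y = y;
  cache->len++;
  return 0;
}

static void
point_cache_clear (PointCache *cache)
{
  free (cache->points);
  point_cache_init (cache);
}
```

Because the stored values are fractions, the draw code only has to
multiply by the current width and height, so the same cache can be
reused across size changes.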
This provides the plumbing to do the threaded drawing. We just
need to write the capture cursor and draw operations from the
pixman/cairo worker thread (and do so safely).
This requires that both the left and right conditions evaluate
to TRUE. We obviously will want to add more of these for things
like OR, NOT, etc. However, we can add them as necessary since
they are fairly self-contained patches.
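As a sketch of how such a condition could compose, assuming a small
tree of condition nodes. Frame, Condition and condition_match() are
hypothetical names for illustration, not the types in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified capture frame. */
typedef struct { int type; int counter_id; } Frame;

typedef enum
{
  COND_TYPE_IS,    /* leaf: frame->type == value */
  COND_COUNTER_IS, /* leaf: frame->counter_id == value */
  COND_AND         /* both left and right must match */
} CondKind;

typedef struct Condition Condition;
struct Condition
{
  CondKind         kind;
  int              value; /* used by leaf conditions only */
  const Condition *left;  /* used by COND_AND only */
  const Condition *right; /* used by COND_AND only */
};

static bool
condition_match (const Condition *cond, const Frame *frame)
{
  assert (cond != NULL && frame != NULL);

  switch (cond->kind)
    {
    case COND_TYPE_IS:
      return frame->type == cond->value;
    case COND_COUNTER_IS:
      return frame->counter_id == cond->value;
    case COND_AND:
      /* Short-circuits: right is skipped when left is FALSE. */
      return condition_match (cond->left, frame) &&
             condition_match (cond->right, frame);
    }

  return false;
}
```

Adding OR or NOT in this shape would mean one more enum value and one
more case, which is why they can land as separate patches.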
This lets us focus on the query of "show me all events related
to counter X" rather than the implementation details. That
means later on, if we build a real index, we can optimize this
without changing user code.
This API helps us simplify some of the tooling to iterate
through a capture. In particular, we might want to set up a
bunch of matches and then just iterate through the items.
This can also allow deferring the iteration until later,
which might be handy for visualizers which won't want to block
the main loop.
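A minimal sketch of that usage pattern, iterating an in-memory array
instead of a real capture. Cursor, cursor_add_match() and
cursor_foreach() are hypothetical stand-ins, not the API added here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified capture frame. */
typedef struct { int type; int counter_id; } Frame;

typedef bool (*MatchFunc) (const Frame *frame, void *user_data);
typedef bool (*VisitFunc) (const Frame *frame, void *user_data);

#define CURSOR_MAX_MATCHES 8

typedef struct
{
  const Frame *frames;
  size_t       n_frames;
  MatchFunc    matches[CURSOR_MAX_MATCHES];
  void        *match_data[CURSOR_MAX_MATCHES];
  size_t       n_matches;
} Cursor;

static void
cursor_init (Cursor *cursor, const Frame *frames, size_t n_frames)
{
  cursor->frames = frames;
  cursor->n_frames = n_frames;
  cursor->n_matches = 0;
}

/* Register a match; nothing is iterated until cursor_foreach(). */
static bool
cursor_add_match (Cursor *cursor, MatchFunc func, void *user_data)
{
  assert (cursor != NULL && func != NULL);
  if (cursor->n_matches == CURSOR_MAX_MATCHES)
    return false;
  cursor->matches[cursor->n_matches] = func;
  cursor->match_data[cursor->n_matches] = user_data;
  cursor->n_matches++;
  return true;
}

/* Visit every frame accepted by all matches. The visitor returns
 * false to stop early. Returns the number of frames visited. */
static size_t
cursor_foreach (const Cursor *cursor, VisitFunc visit, void *user_data)
{
  size_t visited = 0;

  for (size_t i = 0; i < cursor->n_frames; i++)
    {
      const Frame *frame = &cursor->frames[i];
      bool ok = true;

      for (size_t m = 0; m < cursor->n_matches && ok; m++)
        ok = cursor->matches[m] (frame, cursor->match_data[m]);
      if (!ok)
        continue;
      visited++;
      if (!visit (frame, user_data))
        break;
    }

  return visited;
}

/* Example match: "events related to counter X". */
static bool
match_counter (const Frame *frame, void *user_data)
{
  return frame->counter_id == *(const int *)user_data;
}

/* Example visitor: count the frames we are shown. */
static bool
count_frames (const Frame *frame, void *user_data)
{
  (void)frame;
  (*(size_t *)user_data)++;
  return true;
}
```

Since the matches are only stored until cursor_foreach() runs, the
caller is free to run the iteration later or from a worker thread.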
I'm not jazzed about the 64k buffer created for every cursor
due to the SpCaptureReader copy, but it's probably not a big
deal in practice until we start doing more exotic things.
This function allows copying a capture so that we can do
additional reads. This does, however, copy the buffers, which
might use more memory than we want for heavy usage. We can
tweak things as we go to figure out the cursors.
In case we are building in a flatpak, we might want to rely on a
system-installed sysprofd. This means we might want to pretend we have
sysprofd support (to be found on the system), but not actually build
sysprofd.
We need this for recent changes to how natural sizing will work now that
the max-content-* properties are stabilizing for their first public ABI
release.
The kernel says the following in kernel/events/ring_buffer.c (http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/kernel/events/ring_buffer.c?id=7a1dcf6ad#n61):
* Since the mmap() consumer (userspace) can run on a different CPU:
*
*   kernel                             user
*
*   if (LOAD ->data_tail) {            LOAD ->data_head
*                      (A)             smp_rmb()       (C)
*      STORE $data                     LOAD $data
*      smp_wmb()       (B)             smp_mb()        (D)
*      STORE ->data_head               STORE ->data_tail
*   }
*
* Where A pairs with D, and B pairs with C.
*
* In our case (A) is a control dependency that separates the load of
* the ->data_tail and the stores of $data. In case ->data_tail
* indicates there is no room in the buffer to store $data we do not.
*
* D needs to be a full barrier since it separates the data READ
* from the tail WRITE.
*
* For B a WMB is sufficient since it separates two WRITEs, and for C
* an RMB is sufficient since it separates two READs.
*
* See perf_output_begin().
So I'm pretty sure we need a full barrier (D) before writing out data_tail,
since that write must not be reordered ahead of our reads of the data.
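A sketch of the userspace side using C11 atomics, under some
assumptions: RingHeader is a stand-in for the two fields of the
perf mmap page we care about, ring_consume() is a hypothetical
helper, and the acquire and seq_cst fences are used as rough
equivalents of smp_rmb() and smp_mb(). It is not the actual
sysprof code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the head/tail fields of the perf mmap page. */
typedef struct
{
  _Atomic uint64_t data_head; /* written by the kernel */
  _Atomic uint64_t data_tail; /* written by us (userspace) */
} RingHeader;

/* Copy up to out_len unread bytes out of the ring, then publish
 * the new tail. Returns the number of bytes copied. */
static size_t
ring_consume (RingHeader    *hdr,
              const uint8_t *data,
              size_t         size,
              uint8_t       *out,
              size_t         out_len)
{
  uint64_t head, tail;
  size_t n = 0;

  assert (size > 0);

  head = atomic_load_explicit (&hdr->data_head, memory_order_relaxed);
  /* (C) keep the data reads below after the data_head read. */
  atomic_thread_fence (memory_order_acquire);

  /* Only we write data_tail, so a relaxed load is enough here. */
  tail = atomic_load_explicit (&hdr->data_tail, memory_order_relaxed);

  while (tail != head && n < out_len)
    {
      out[n++] = data[tail % size];
      tail++;
    }

  /* (D) full barrier: the data READs above must complete before
   * the tail WRITE below tells the kernel the space is free. */
  atomic_thread_fence (memory_order_seq_cst);
  atomic_store_explicit (&hdr->data_tail, tail, memory_order_relaxed);

  return n;
}
```

Without the full fence at (D), the tail store could become visible
while we are still reading, and the kernel might overwrite data we
have not finished copying.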