We have a fairly large buffer for perf events, so we should be able to
process them much less frequently. That helps keep the profiler process
itself from showing up in the profiling results.
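
To make the trade-off concrete, here is a minimal sketch of picking a drain
interval from the buffer size. It assumes a fixed average record size and a
steady per-CPU sample rate; both are hypothetical inputs, and real perf
records vary in size (especially with callchains). With a 2 MiB data area,
64-byte records and 1000 samples/s, the ring holds about 32 s of data, so
draining every ~16 s still leaves headroom.

```c
#include <stdint.h>

/*
 * Hypothetical helper: estimate how long a ring buffer can go undrained
 * and return a drain interval that leaves headroom. All inputs are
 * illustrative estimates, not values reported by perf.
 */
static uint64_t
drain_interval_ms (uint64_t ring_data_bytes,
                   uint64_t avg_sample_bytes,
                   uint64_t samples_per_sec,
                   uint64_t headroom_divisor)
{
  uint64_t samples_that_fit;
  uint64_t ms_until_full;

  if (avg_sample_bytes == 0 || samples_per_sec == 0 || headroom_divisor == 0)
    return 0;

  samples_that_fit = ring_data_bytes / avg_sample_bytes;
  ms_until_full = samples_that_fit * 1000 / samples_per_sec;

  /* Drain well before the ring could fill and start dropping records. */
  return ms_until_full / headroom_divisor;
}
```

The larger the buffer, the longer the interval, which is exactly what keeps
the profiler's own wakeups rare.
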
Waiting for the file descriptor to become writable is not enough, as that
won't trigger on the perf source. Instead, we need to check the mmap'd
header (its data_head/data_tail positions) and drive things off of that.
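
A minimal sketch of that header check, using the real `struct
perf_event_mmap_page` from `linux/perf_event.h`. The kernel advances
`data_head` as it writes records; user space advances `data_tail` (the ring
must be mapped writable) once it has consumed them. The helper names
`perf_ring_pending()` and `perf_ring_consume()` are hypothetical. The GCC
`__atomic` builtins stand in for the read/write barriers that the
perf_event_open(2) man page asks for.

```c
#include <stdint.h>
#include <linux/perf_event.h>

/*
 * Bytes the kernel has written into the ring that we have not consumed.
 * The acquire load orders our later reads of the record data after the
 * read of data_head, as the man page's rmb() requirement intends.
 */
static uint64_t
perf_ring_pending (struct perf_event_mmap_page *header)
{
  uint64_t head = __atomic_load_n (&header->data_head, __ATOMIC_ACQUIRE);
  uint64_t tail = header->data_tail;

  /* Both are free-running byte counters, so unsigned subtraction is safe. */
  return head - tail;
}

/*
 * Publish data_tail so the kernel may reuse everything before `head`.
 * The release store keeps our reads of the records ordered before it.
 */
static void
perf_ring_consume (struct perf_event_mmap_page *header,
                   uint64_t                     head)
{
  __atomic_store_n (&header->data_tail, head, __ATOMIC_RELEASE);
}
```

On each (infrequent) timer tick, a source could call `perf_ring_pending()`
for every ring and only do work when it returns non-zero, instead of relying
on fd readiness.
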
It might be nice to eventually determine how many of the samples come from
our own process and back off our timeout based on that.
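
One way such a back-off could look, as a purely hypothetical sketch: count
the samples whose `PERF_SAMPLE_TID` pid matches `getpid()` during a drain,
then double the timeout when they exceed some threshold (1% here, an
arbitrary choice) and halve it back toward the base interval otherwise. The
function name, threshold and doubling/halving policy are all assumptions.

```c
#include <stdint.h>

/*
 * Hypothetical back-off policy for the drain timeout.
 * self_samples: samples attributed to the profiler's own pid.
 * total_samples: all samples seen during the last drain.
 */
static uint32_t
next_drain_timeout_ms (uint32_t current_ms,
                       uint64_t self_samples,
                       uint64_t total_samples,
                       uint32_t base_ms,
                       uint32_t max_ms)
{
  /* More than 1% self-samples: we are showing up too much, slow down. */
  if (total_samples > 0 && self_samples * 100 > total_samples)
    return current_ms > max_ms / 2 ? max_ms : current_ms * 2;

  /* Otherwise drift back toward the base timeout. */
  return current_ms / 2 < base_ms ? base_ms : current_ms / 2;
}
```

Capping at `max_ms` bounds how long records sit in the ring, so the buffer
size still limits how far the back-off can go.
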
This is a bit different from how we did things previously, but the same
mechanics are involved. Instead of registering multiple CPUs together, we'll
just use one stream per CPU. That is partly because I intend to drop support
for profiling a single process, as that doesn't really get used much, nor
does it seem to yield very good results from perf.
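
A sketch of what "one stream per CPU" means at the syscall level, under
stated assumptions: each CPU gets its own perf_event_open(2) call with
`pid == -1` (all processes on that CPU) and `group_fd == -1` (no grouping),
so each fd can be mmap'd as an independent ring. glibc has no wrapper, so
`syscall(SYS_perf_event_open, ...)` is used directly. The CPU-clock event,
sample type and helper names are illustrative choices, not necessarily what
the profiler uses. System-wide sampling also needs a permissive
`perf_event_paranoid` setting or CAP_PERFMON.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

static int
perf_event_open_raw (struct perf_event_attr *attr,
                     pid_t                   pid,
                     int                     cpu,
                     int                     group_fd,
                     unsigned long           flags)
{
  return (int) syscall (SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Sampling attributes shared by every per-CPU stream (illustrative). */
static void
init_cpu_clock_attr (struct perf_event_attr *attr,
                     unsigned long           freq_hz)
{
  memset (attr, 0, sizeof *attr);
  attr->size = sizeof *attr;
  attr->type = PERF_TYPE_SOFTWARE;
  attr->config = PERF_COUNT_SW_CPU_CLOCK;
  attr->freq = 1;               /* interpret sample_freq as Hz */
  attr->sample_freq = freq_hz;
  attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID | PERF_SAMPLE_CALLCHAIN;
}

/*
 * Open one independent, ungrouped stream per CPU for all processes.
 * Returns how many opened; fds[i] is -1 where perf_event_open failed.
 */
static int
open_per_cpu_streams (int *fds, int n_cpus, unsigned long freq_hz)
{
  struct perf_event_attr attr;
  int opened = 0;

  init_cpu_clock_attr (&attr, freq_hz);

  for (int cpu = 0; cpu < n_cpus; cpu++)
    {
      fds[cpu] = perf_event_open_raw (&attr, -1, cpu, -1, PERF_FLAG_FD_CLOEXEC);
      if (fds[cpu] != -1)
        opened++;
    }

  return opened;
}
```

A caller would typically size `fds` with `sysconf (_SC_NPROCESSORS_CONF)`
and then mmap each successful fd as its own ring.
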