This reveals that we're getting really bogus times from these events. I
think there are two problems here: the begin time is the submit time,
not the submit-to-hardware time, and the end time is retire, which is
too delayed to be really useful. We need to move i915 over to the
low-level tracepoints.

However, this code has proved useful for vc4, where I have good
timings.