Mirror of https://github.com/varun-r-mallya/sysprof.git, synced 2026-02-12 16:10:54 +00:00
Update TODO
svn path=/trunk/; revision=367
TODO (194 lines changed)
@@ -54,8 +54,15 @@ Before 1.2:
 - copying kernel stack to userspace
 - it's always 4096 bytes these days
 - heuristically determine functions based on address
+- callbacks on the stack can be identified
+by having an offset of 0.
+- even so there is a lot of false positives.
 - is eh_frame usually loaded into memory during normal
-operation
+operation? It is mapped, but probably not paged in,
+so we will be taking a few major page faults when we
+first profile something.
+Unless of course, we store the entire stack in
+the stackstash. This may use way too much memory though.
 
 - vdso
 - assume its the same across processes, just look at
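
The "heuristically determine functions based on address" item can be sketched as a classifier for raw stack words. The sketch assumes a symbol table sorted by start address. `sketch_symbol_t`, `classify_stack_word()` and `scan_stack()` are hypothetical names, not sysprof code. A word that lands on a function at offset 0 is treated as a callback (function pointer) rather than a return address. A word pointing inside a function is kept as a candidate return address.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol table entry: the function occupies [start, start + size). */
typedef struct {
    uintptr_t start;
    size_t size;
    const char *name;
} sketch_symbol_t;

typedef enum {
    WORD_NOT_CODE,        /* points into no known function */
    WORD_FUNCTION_START,  /* offset 0: probably a callback (function pointer) */
    WORD_RETURN_ADDRESS   /* inside a function: candidate return address */
} word_kind_t;

/* Binary search over non-overlapping symbols sorted by start address. */
static const sketch_symbol_t *
find_symbol (const sketch_symbol_t *syms, size_t n, uintptr_t addr)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (addr < syms[mid].start)
            hi = mid;
        else if (addr - syms[mid].start >= syms[mid].size)
            lo = mid + 1;
        else
            return &syms[mid];
    }
    return NULL;
}

static word_kind_t
classify_stack_word (const sketch_symbol_t *syms, size_t n, uintptr_t word)
{
    const sketch_symbol_t *sym = find_symbol (syms, n, word);

    if (sym == NULL)
        return WORD_NOT_CODE;
    if (word == sym->start)
        return WORD_FUNCTION_START;
    return WORD_RETURN_ADDRESS;
}

/* Scan a copied stack and keep only the words that look like return addresses. */
static size_t
scan_stack (const sketch_symbol_t *syms, size_t n_syms,
            const uintptr_t *stack, size_t n_words,
            uintptr_t *out, size_t max_out)
{
    size_t n_out = 0;

    for (size_t i = 0; i < n_words && n_out < max_out; i++)
        if (classify_stack_word (syms, n_syms, stack[i]) == WORD_RETURN_ADDRESS)
            out[n_out++] = stack[i];
    return n_out;
}
```

This does nothing about the false positives the item mentions. Stale return addresses left behind by frames that have already returned still pass the filter.
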
@@ -76,7 +83,8 @@ Before 1.2:
 - do heuristic stackwalk in kernel
 - do heuristic stackwalk in userland
 
-* "Expand all" is horrendously slow because update screenshot gets called
+
+* "Expand all" is horrendously slow because update_screenshot gets called
 for every "expanded" signal. In fact even normal expanding is really
 slow. It's probably hopeless to get decent performance out of GtkTreeView,
 so we will have to store a list of expanded objects and keep that uptodate
@@ -87,7 +95,7 @@ Before 1.2:
 Or try to parse the machine code. Positions that are called are likely
 to be functions.
 
-* Give more sensible 'error messages'. Ie., if you get permission denied for
+* Give more sensible 'error messages'. Eg., if you get permission denied for
 a file, put "Permission denied" instead of "No map"
 
 * crc32 checking probably doesn't belong in elfparser.c
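
The "parse the machine code" idea could start with a naive scan for x86 `call rel32` instructions (opcode `0xE8`), treating each call target as a likely function entry point. This is a sketch under assumptions. `find_call_targets()` is a hypothetical helper, and it handles only this one call encoding. A linear byte scan will also match `0xE8` bytes that are really operands of other instructions, so a real implementation would need a proper disassembler.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Naively scan x86 machine code for E8 rel32 near calls and collect the
 * call targets. `text` holds the bytes of a text segment mapped at `base`. */
static size_t
find_call_targets (const uint8_t *text, size_t len, uint64_t base,
                   uint64_t *targets, size_t max_targets)
{
    size_t n = 0;

    for (size_t i = 0; i + 5 <= len && n < max_targets; i++) {
        if (text[i] != 0xE8)
            continue;

        /* rel32 is a little-endian signed displacement relative to the
         * address of the next instruction, i.e. base + i + 5. */
        uint32_t raw = (uint32_t) text[i + 1]
                     | (uint32_t) text[i + 2] << 8
                     | (uint32_t) text[i + 3] << 16
                     | (uint32_t) text[i + 4] << 24;
        int32_t rel = (int32_t) raw;
        uint64_t target = base + i + 5 + (uint64_t) (int64_t) rel;

        /* Targets outside this segment are dropped as probable false hits. */
        if (target >= base && target < base + len)
            targets[n++] = target;
    }
    return n;
}
```
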
@@ -97,6 +105,7 @@ Before 1.2:
 - it's inconvenient that you have to pass in both a parser _and_
 a record. The record should just contain a pointer to the parser.
 On the other hand, the result does depend on the parser->offset.
+So it's a bit confusing that it's not passed in.
 
 - the bin_parser_seek_record (..., 1); idiom is a little dubious
 
@@ -111,9 +120,6 @@ Before 1.2:
 
 * Make it compilable against a non-running kernel.
 
-* commandline version should check that the output file is writable
-before starting the profiling.
-
 * Maybe report idle time? Although this would come for free with the
 timelines.
 
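
One item in this hunk asks the command-line version to check that the output file is writable before starting the profiling. A minimal sketch with POSIX `access()` follows. `output_is_writable()` is a hypothetical helper, and the 4096-byte directory buffer is an arbitrary limit.

```c
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: true if `path` exists and is writable, or does not
 * exist yet but its directory allows creating it. */
static bool
output_is_writable (const char *path)
{
    char dir[4096];
    const char *slash;

    if (access (path, F_OK) == 0)
        return access (path, W_OK) == 0;

    /* The file does not exist yet: check the containing directory. */
    slash = strrchr (path, '/');
    if (slash == NULL) {
        strcpy (dir, ".");
    } else if (slash == path) {
        strcpy (dir, "/");
    } else {
        size_t len = (size_t) (slash - path);

        if (len >= sizeof dir)
            return false;
        memcpy (dir, path, len);
        dir[len] = '\0';
    }
    /* Creating a file needs write and search permission on the directory. */
    return access (dir, W_OK | X_OK) == 0;
}
```

`access()` checks the real rather than the effective user ID. The answer can also change before the file is actually written. Opening the output file up front, before sampling starts, avoids both issues.
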
@@ -135,22 +141,6 @@ Before 1.2:
 just another gtk+ bug.
 
 - Fix bugs/performance issues:
-- decorate_node should be done lazily
-- Find out why we sometimes get completely ridicoulous stacktraces,
-where main seems to be called from within Xlib etc. This happens
-even after restarting everything.
-- It looks like the stackstash-reorg code confuses "main" from
-unrelated processes. - currently it looks like if multiple
-"main"s are present, only one gets listed in the object list.
-Seems to mostly happen when multiple processes are
-involved.
-- Numbers in caller view are completely screwed up.
-- It looks like it sometimes gets confused with similar but different
-processes: Something like:
-process a spends 80% in foo() called from bar()
-process b spends 1% in foo() called from baz()
-we get reports of baz() using > 80% of the time.
-Or something.
 - add_trace_to_tree() might be a little slow when dealing with deeply
 recursive profiles. Hypothesis: seen_nodes can grow large, and the
 algorithm is O(n^2) in the length of the trace.
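
This sketch addresses the add_trace_to_tree() hypothesis. If seen_nodes is searched linearly for every frame, the cost per trace is quadratic in its length. One plausible fix is an open-addressing pointer set sized from the trace length, which makes each lookup expected constant time. This rests on an assumption about how seen_nodes is used, and `node_set_t` and its functions are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Open-addressing set of non-NULL node pointers, sized for one trace. */
typedef struct {
    const void **slots;
    size_t capacity;   /* always a power of two */
} node_set_t;

/* Size the table so at most `max_items` insertions keep load factor <= 0.5. */
static bool
node_set_init (node_set_t *set, size_t max_items)
{
    size_t cap = 16;

    while (cap < 2 * max_items)
        cap *= 2;
    set->slots = calloc (cap, sizeof *set->slots);  /* empty slots are NULL */
    set->capacity = cap;
    return set->slots != NULL;
}

/* Mix pointer bits so aligned addresses spread across the table. */
static size_t
hash_ptr (const void *p)
{
    uintptr_t x = (uintptr_t) p;

    x ^= x >> 17;
    x *= 0x9E3779B97F4A7C15ull;  /* product truncated to uintptr_t width */
    return (size_t) (x ^ (x >> 29));
}

/* Insert `node` (must not be NULL). Returns true if it was not seen before,
 * i.e. this is the first time the node appears in the current trace. */
static bool
node_set_add (node_set_t *set, const void *node)
{
    size_t mask = set->capacity - 1;

    for (size_t i = hash_ptr (node) & mask; ; i = (i + 1) & mask) {
        if (set->slots[i] == node)
            return false;
        if (set->slots[i] == NULL) {
            set->slots[i] = node;
            return true;
        }
    }
}

static void
node_set_free (node_set_t *set)
{
    free (set->slots);
    set->slots = NULL;
}
```

The set is created once per trace with the trace length as `max_items`, so the linear probe always finds an empty slot.
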
@@ -290,6 +280,67 @@ Before 1.2:
 would only need to store a list of hashcodes that we
 have generated previously.
 
+- One problem with doing DWARF walking is that the debug code
+will have to be faulted in. This can be a substantial amount
+of disk access which is undesirable to have during a
+profiling run. Even if we only have to fault in the
+.eh_frame_hdr section, that's still 18 pages for gtk+. The
+.eh_frame section for gtk+ is 72 pages.
+
+A possibility may be to consider two stacktraces identical if
+the only differing values are *outside* the text segments.
+This may work since stack frames tend to be the same size.
+
+It is then sufficient in user space to only store one
+representative for each set of considered-identical stack
+traces.
+
+User space storage: Use the stackstash tree. When a new trace
+is added, just skip over nodes that differ, but where none of
+them points to text segments. Two possibilities then:
+
+- when two traces are determined to differ, store them
+in completely separate trees. This ensures that we
+will never run the dwarf algorithm on an invalid
+stack trace, but also means that we won't get shared
+prefixes for stacktraces.
+
+- when two traces are determined to differ, branch off
+as currently. This will share more data, but the
+dwarf algorithm could be run on invalid traces. It
+may work in practice though if the compiler
+generally uses fixed stack frames.
+
+A twist on is to mark the complete stack traces as
+"complete". Then after running the DWARF algorithm,
+the generated stack trace can be saved with it. This
+way incomplete stack traces branching off a complete
+one can be completed using the DWARF information for
+the shared part.
+
+
+* How to get the user stack:
+
+/* In principle we should use get_task_mm() but
+* that will use task_lock() leading to deadlock
+* if somebody already has the lock
+*/
+if (spin_is_locked (&current->alloc_lock))
+printk ("alreadylocked\n");
+{
+struct mm_struct *mm = current->mm;
+if (mm)
+{
+printk (KERN_ALERT "stack size: %d (%d)\n",
+mm->start_stack - regs->REG_STACK_PTR,
+current->pid);
+
+stacksize = mm->start_stack - regs->REG_STACK_PTR;
+}
+else
+stacksize = 1;
+}
+
 * If interrupt happens in kernel mode, send both
 kernel stack and user space stack, have userspace stitch them
 together. well, they could be stitched together in the kernel.
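
The proposal to "consider two stacktraces identical if the only differing values are *outside* the text segments" could look roughly like the sketch below. It assumes raw stack copies stored as arrays of machine words and a list of text-segment ranges. `text_range_t`, `stacks_equivalent()` and `stack_hash()` are hypothetical names.

Words pointing into a text segment must match exactly, and all other words are ignored. Equal lengths are required, which leans on the note that stack frames tend to be the same size. The hash skips the same words, so equivalent traces always hash alike. That also fits the idea of storing only a list of previously generated hashcodes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one mapped text segment: [start, end). */
typedef struct {
    uintptr_t start, end;
} text_range_t;

static bool
in_text (const text_range_t *r, size_t n, uintptr_t w)
{
    for (size_t i = 0; i < n; i++)
        if (w >= r[i].start && w < r[i].end)
            return true;
    return false;
}

/* Same length, and equal at every position where either word points into text. */
static bool
stacks_equivalent (const text_range_t *r, size_t n,
                   const uintptr_t *a, size_t len_a,
                   const uintptr_t *b, size_t len_b)
{
    if (len_a != len_b)
        return false;
    for (size_t i = 0; i < len_a; i++)
        if ((in_text (r, n, a[i]) || in_text (r, n, b[i])) && a[i] != b[i])
            return false;
    return true;
}

/* FNV-1a over the stack, with non-text words replaced by 0 so that
 * equivalent stacks produce the same hash. */
static uint64_t
stack_hash (const text_range_t *r, size_t n, const uintptr_t *s, size_t len)
{
    uint64_t h = 14695981039346656037ull;

    for (size_t i = 0; i < len; i++) {
        uint64_t w = in_text (r, n, s[i]) ? (uint64_t) s[i] : 0;

        for (int byte = 0; byte < 8; byte++) {
            h ^= (w >> (8 * byte)) & 0xff;
            h *= 1099511628211ull;
        }
    }
    return h;
}
```
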
@@ -358,7 +409,7 @@ http://www.linuxbase.org/spec/booksets/LSB-Embedded/LSB-Embedded/ehframe.html
 look in dwarf2-frame.[ch] in the gdb distribution.
 
 Also look at bozo-profiler
-http://www-sop.inria.fr/dream/personnel/Mathieu.Lacage/bozo-profiler/bozo-profiler-1.1.tar.gz
+http://cutebugs.net/bozo-profiler/
 which has an elf32 parser/debugger
 
 - Make busy cursors more intelligent
@@ -499,12 +550,8 @@ Later:
 - Find out how to hack around gtk+ bug causing multiple double clicks
 to get eaten.
 
-- Consider what it would take to take stacktraces of other languages
-- perl,
-- python
-- java
-- bash
+- Consider what it would take to take stacktraces of other languages such
+as perl, python, java, ruby, or bash. Or scheme.
 
 Possible solution is for the script binaries to have a function
 called something like
@@ -516,10 +563,14 @@ Later:
 This function would behave essentially like a signal handler: couldn't
 call malloc(), couldn't call printf(), etc.
 
-Note thought that scripting languages will generally have a stack with
+Note though that scripting languages will generally have a stack with
 both script-binary-stack, script stack, and library stacks. We wouldn't
 want scripts to need to parse dwarf. Also if we do that thing with
-sending the entire stack to userspace, things will be further complicated.
+sending the entire stack to userspace, things will be further
+complicated.
 
+Also note languages like scheme that uses heap allocated activation
+records.
+
 - Consider this usecase:
 Someone is considering replacing malloc()/free() with a freelist
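
The sketch below illustrates what "behave essentially like a signal handler" implies. A hypothetical interpreter keeps its script-level frames in static storage. A handler then copies them into a preallocated buffer using only plain loads and stores, with no malloc() or printf(). The function name the TODO proposes is elided in this excerpt. `sample_script_stack()`, `script_frames` and `install_sampler()` are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <string.h>

#define MAX_FRAMES 64
#define MAX_NAME 32

/* Hypothetical interpreter state: the script-level call stack lives in
 * static storage, so a signal handler can read it without allocating. */
static const char *script_frames[MAX_FRAMES];
static volatile sig_atomic_t n_script_frames;

/* Preallocated output buffer, filled in signal context. */
static char sample[MAX_FRAMES][MAX_NAME];
static volatile sig_atomic_t n_sampled;

/* Runs as a signal handler: no malloc(), no printf(), no locks,
 * only plain loads and stores into memory allocated up front. */
static void
sample_script_stack (int signo)
{
    int n = n_script_frames;

    (void) signo;
    if (n > MAX_FRAMES)
        n = MAX_FRAMES;
    for (int i = 0; i < n; i++) {
        const char *src = script_frames[i];
        size_t j = 0;

        /* Copy by hand, truncating long names to fit the buffer. */
        for (; j + 1 < MAX_NAME && src[j] != '\0'; j++)
            sample[i][j] = src[j];
        sample[i][j] = '\0';
    }
    n_sampled = n;
}

/* Install the sampler for `signo` (a profiler would use e.g. SIGPROF). */
static int
install_sampler (int signo)
{
    struct sigaction sa;

    memset (&sa, 0, sizeof sa);
    sa.sa_handler = sample_script_stack;
    sigemptyset (&sa.sa_mask);
    return sigaction (signo, &sa, NULL);
}
```

In a real interpreter the frame array changes while the handler may run. Pushes and pops would need careful ordering, for example writing a frame before incrementing the count.
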
@@ -615,50 +666,73 @@ Later:
 it asynchronously.
 Visualization: A timeline with alternating CPU/disk activity.
 
-- What function is doing all the synchronous reading, and what files/offsets is
-it reading. Visualization: lots of reads across different files out of one
-function
+- What function is doing all the synchronous reading, and what
+files/offsets is it reading. Visualization: lots of reads across
+different files out of one function
 
-- A piece of the program is doing disk I/O. We can drop that entire piece of
-code. Sysprof visualization is ok, although seeing the files accessed is useful
-so that we can tell if those files are not just going to be used in
-other places. (Gnumeric plugin_init()).
+- A piece of the program is doing disk I/O. We can drop that
+entire piece of code. Sysprof visualization is ok, although seeing
+the files accessed is useful so that we can tell if those files are
+not just going to be used in other places. (Gnumeric plugin_init()).
 
-- A function is reading a file synchronously, but there is other (CPU/disk) stuff
-that could be done at the same time. Visualization: A piece of the timeline
-is diskbound with little or no CPU used.
+- A function is reading a file synchronously, but there is other
+(CPU/disk) stuff that could be done at the same time. Visualization:
+A piece of the timeline is diskbound with little or no CPU used.
 
-- Want to improve code locality of library or binary. Visualization: no GUI, just
-produce a list of functions that should be put first in the file. Then run the
-program again until the list converges. (Valgrind may be more useful here).
+- Want to improve code locality of library or binary. Visualization:
+no GUI, just produce a list of functions that should be put first in
+the file. Then run the program again until the list converges.
+(Valgrind may be more useful here).
 
-- Nautilus reads a ton of files, icons + all the files in the homedirectory.
-Normal sysprof visualization is probably useful enough.
+- Nautilus reads a ton of files, icons + all the files in the
+homedirectory. Normal sysprof visualization is probably useful
+enough.
 
 - Profiling a login session.
 
-- Many applications are running at the same time, doing IPC. It would be useful
-if we could figure out what other things a given process is waiting on. Eg., in
-poll, find out what processes have the other ends of the fd's open.
-Visualization: multiple lines on a graph. Lines join up where one process
-is blocking on another. That would show processes holding up the progress
-very clearly.
+- Many applications are running at the same time, doing IPC. It would
+be useful if we could figure out what other things a given process
+is waiting on. Eg., in poll, find out what processes have the other
+ends of the fd's open.
+Visualization: multiple lines on a graph. Lines join up where
+one process is blocking on another. That would show processes holding
+up the progress very clearly.
 This was suggested by Federico.
 
-- Need to report stat() as well. (Where do inode data end up? In the buffer-cache?)
-Also open() may cause disk reads (seeks).
+- Need to report stat() as well. (Where do inode data end up? In the
+buffer-cache?) Also open() may cause disk reads (seeks).
 
-- To generate the timeline we need to know when a disk request is issued and when it
-is completed. This way we can assign blame to all applications that have issued a
-disk request at a given point in time.
-
-The disk timeline should probably vary in intensity with the number of outstanding
-disk requests.
+- To generate the timeline we need to know when a disk request is
+issued and when it is completed. This way we can assign blame to all
+applications that have issued a disk request at a given point in time.
 
+The disk timeline should probably vary in intensity with the number
+of outstanding disk requests.
 
 
 -=-=-=-=-=-=-=-=-=-=-=-=-=-=- ALREADY DONE -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 
+* Various:
+- decorate_node should be done lazily
+- Find out why we sometimes get completely ridicoulous stacktraces,
+where main seems to be called from within Xlib etc. This happens
+even after restarting everything.
+- It looks like the stackstash-reorg code confuses "main" from
+unrelated processes. - currently it looks like if multiple
+"main"s are present, only one gets listed in the object list.
+Seems to mostly happen when multiple processes are
+involved.
+- Numbers in caller view are completely screwed up.
+- It looks like it sometimes gets confused with similar but different
+processes: Something like:
+process a spends 80% in foo() called from bar()
+process b spends 1% in foo() called from baz()
+we get reports of baz() using > 80% of the time.
+Or something.
+
+* commandline version should check that the output file is writable
+before starting the profiling.
+
 * See if we can reproduce the problem where libraries didn't get correctly
 reloaded after new versions were installed.
 This is just the (deleted) problem. Turns out that the kernel
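
The disk-timeline items can be sketched over a list of issue and completion timestamps. They ask to blame every application with a request in flight and to scale intensity by the number of outstanding requests. `disk_request_t`, `outstanding_at()` and `blame_at()` are hypothetical. The sketch assumes the kernel side already reports both timestamps for each request.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one disk request; times in nanoseconds. */
typedef struct {
    uint64_t issued;
    uint64_t completed;
    int pid;
} disk_request_t;

static int
in_flight (const disk_request_t *r, uint64_t t)
{
    return r->issued <= t && t < r->completed;
}

/* Number of requests outstanding at time t; this would drive the
 * intensity of the disk timeline at that point. */
static size_t
outstanding_at (const disk_request_t *reqs, size_t n, uint64_t t)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        if (in_flight (&reqs[i], t))
            count++;
    return count;
}

/* Collect each pid with a request in flight at time t, listing each pid once. */
static size_t
blame_at (const disk_request_t *reqs, size_t n, uint64_t t,
          int *pids, size_t max_pids)
{
    size_t n_pids = 0;

    for (size_t i = 0; i < n; i++) {
        size_t j = 0;

        if (!in_flight (&reqs[i], t))
            continue;
        while (j < n_pids && pids[j] != reqs[i].pid)
            j++;
        if (j == n_pids && n_pids < max_pids)
            pids[n_pids++] = reqs[i].pid;
    }
    return n_pids;
}
```

Each query here is a linear scan. Drawing a full timeline would instead sort the issue and completion events once and sweep over them, keeping a running count of outstanding requests.
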