From 8af6c38541909456aadf07f968951adca6f61d25 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?S=C3=B8ren=20Sandmann=20Pedersen?= Date: Sat, 11 Aug 2007 23:08:58 +0000 Subject: [PATCH] Update TODO svn path=/trunk/; revision=367 --- TODO | 194 +++++++++++++++++++++++++++++++++++++++++------------------ 1 file changed, 134 insertions(+), 60 deletions(-) diff --git a/TODO b/TODO index 112012fc..e8f93c22 100644 --- a/TODO +++ b/TODO @@ -54,8 +54,15 @@ Before 1.2: - copying kernel stack to userspace - it's always 4096 bytes these days - heuristically determine functions based on address + - callbacks on the stack can be identified + by having an offset of 0. + - even so there is a lot of false positives. - is eh_frame usually loaded into memory during normal - operation + operation? It is mapped, but probably not paged in, + so we will be taking a few major page faults when we + first profile something. + Unless of course, we store the entire stack in + the stackstash. This may use way too much memory though. - vdso - assume its the same across processes, just look at @@ -76,7 +83,8 @@ Before 1.2: - do heuristic stackwalk in kernel - do heuristic stackwalk in userland -* "Expand all" is horrendously slow because update screenshot gets called + +* "Expand all" is horrendously slow because update_screenshot gets called for every "expanded" signal. In fact even normal expanding is really slow. It's probably hopeless to get decent performance out of GtkTreeView, so we will have to store a list of expanded objects and keep that uptodate @@ -87,7 +95,7 @@ Before 1.2: Or try to parse the machine code. Positions that are called are likely to be functions. -* Give more sensible 'error messages'. Ie., if you get permission denied for +* Give more sensible 'error messages'. 
Eg., if you get permission denied for a file, put "Permission denied" instead of "No map" * crc32 checking probably doesn't belong in elfparser.c @@ -97,6 +105,7 @@ Before 1.2: - it's inconvenient that you have to pass in both a parser _and_ a record. The record should just contain a pointer to the parser. On the other hand, the result does depend on the parser->offset. + So it's a bit confusing that it's not passed in. - the bin_parser_seek_record (..., 1); idiom is a little dubious @@ -111,9 +120,6 @@ Before 1.2: * Make it compilable against a non-running kernel. -* commandline version should check that the output file is writable - before starting the profiling. - * Maybe report idle time? Although this would come for free with the timelines. @@ -135,22 +141,6 @@ Before 1.2: just another gtk+ bug. - Fix bugs/performance issues: - - decorate_node should be done lazily - - Find out why we sometimes get completely ridicoulous stacktraces, - where main seems to be called from within Xlib etc. This happens - even after restarting everything. - - It looks like the stackstash-reorg code confuses "main" from - unrelated processes. - currently it looks like if multiple - "main"s are present, only one gets listed in the object list. - Seems to mostly happen when multiple processes are - involved. - - Numbers in caller view are completely screwed up. - - It looks like it sometimes gets confused with similar but different - processes: Something like: - process a spends 80% in foo() called from bar() - process b spends 1% in foo() called from baz() - we get reports of baz() using > 80% of the time. - Or something. - add_trace_to_tree() might be a little slow when dealing with deeply recursive profiles. Hypothesis: seen_nodes can grow large, and the algorithm is O(n^2) in the length of the trace. @@ -290,6 +280,67 @@ Before 1.2: would only need to store a list of hashcodes that we have generated previously. 
+ - One problem with doing DWARF walking is that the debug code + will have to be faulted in. This can be a substantial amount + of disk access which is undesirable to have during a + profiling run. Even if we only have to fault in the + .eh_frame_hdr section, that's still 18 pages for gtk+. The + .eh_frame section for gtk+ is 72 pages. + + A possibility may be to consider two stacktraces identical if + the only differing values are *outside* the text segments. + This may work since stack frames tend to be the same size. + + It is then sufficient in user space to only store one + representative for each set of considered-identical stack + traces. + + User space storage: Use the stackstash tree. When a new trace + is added, just skip over nodes that differ, but where none of + them points to text segments. Two possibilities then: + + - when two traces are determined to differ, store them + in completely separate trees. This ensures that we + will never run the dwarf algorithm on an invalid + stack trace, but also means that we won't get shared + prefixes for stacktraces. + + - when two traces are determined to differ, branch off + as currently. This will share more data, but the + dwarf algorithm could be run on invalid traces. It + may work in practice though if the compiler + generally uses fixed stack frames. + + A twist on is to mark the complete stack traces as + "complete". Then after running the DWARF algorithm, + the generated stack trace can be saved with it. This + way incomplete stack traces branching off a complete + one can be completed using the DWARF information for + the shared part. 
+ + +* How to get the user stack: + + /* In principle we should use get_task_mm() but + * that will use task_lock() leading to deadlock + * if somebody already has the lock + */ + if (spin_is_locked (&current->alloc_lock)) + printk ("alreadylocked\n"); + { + struct mm_struct *mm = current->mm; + if (mm) + { + printk (KERN_ALERT "stack size: %d (%d)\n", + mm->start_stack - regs->REG_STACK_PTR, + current->pid); + + stacksize = mm->start_stack - regs->REG_STACK_PTR; + } + else + stacksize = 1; + } + * If interrupt happens in kernel mode, send both kernel stack and user space stack, have userspace stitch them together. well, they could be stitched together in the kernel. @@ -358,7 +409,7 @@ http://www.linuxbase.org/spec/booksets/LSB-Embedded/LSB-Embedded/ehframe.html look in dwarf2-frame.[ch] in the gdb distribution. Also look at bozo-profiler - http://www-sop.inria.fr/dream/personnel/Mathieu.Lacage/bozo-profiler/bozo-profiler-1.1.tar.gz + http://cutebugs.net/bozo-profiler/ which has an elf32 parser/debugger - Make busy cursors more intelligent @@ -499,12 +550,8 @@ Later: - Find out how to hack around gtk+ bug causing multiple double clicks to get eaten. -- Consider what it would take to take stacktraces of other languages - - - perl, - - python - - java - - bash +- Consider what it would take to take stacktraces of other languages such + as perl, python, java, ruby, or bash. Or scheme. Possible solution is for the script binaries to have a function called something like @@ -516,10 +563,14 @@ Later: This function would behave essentially like a signal handler: couldn't call malloc(), couldn't call printf(), etc. - Note thought that scripting languages will generally have a stack with + Note though that scripting languages will generally have a stack with both script-binary-stack, script stack, and library stacks. We wouldn't want scripts to need to parse dwarf. Also if we do that thing with - sending the entire stack to userspace, things will be further complicated. 
+ sending the entire stack to userspace, things will be further + complicated. + + Also note languages like scheme that uses heap allocated activation + records. - Consider this usecase: Someone is considering replacing malloc()/free() with a freelist @@ -615,50 +666,73 @@ Later: it asynchronously. Visualization: A timeline with alternating CPU/disk activity. - - What function is doing all the synchronous reading, and what files/offsets is - it reading. Visualization: lots of reads across different files out of one - function + - What function is doing all the synchronous reading, and what + files/offsets is it reading. Visualization: lots of reads across + different files out of one function - - A piece of the program is doing disk I/O. We can drop that entire piece of - code. Sysprof visualization is ok, although seeing the files accessed is useful - so that we can tell if those files are not just going to be used in - other places. (Gnumeric plugin_init()). + - A piece of the program is doing disk I/O. We can drop that + entire piece of code. Sysprof visualization is ok, although seeing + the files accessed is useful so that we can tell if those files are + not just going to be used in other places. (Gnumeric plugin_init()). - - A function is reading a file synchronously, but there is other (CPU/disk) stuff - that could be done at the same time. Visualization: A piece of the timeline - is diskbound with little or no CPU used. + - A function is reading a file synchronously, but there is other + (CPU/disk) stuff that could be done at the same time. Visualization: + A piece of the timeline is diskbound with little or no CPU used. - - Want to improve code locality of library or binary. Visualization: no GUI, just - produce a list of functions that should be put first in the file. Then run the - program again until the list converges. (Valgrind may be more useful here). + - Want to improve code locality of library or binary. 
Visualization: + no GUI, just produce a list of functions that should be put first in + the file. Then run the program again until the list converges. + (Valgrind may be more useful here). - - Nautilus reads a ton of files, icons + all the files in the homedirectory. - Normal sysprof visualization is probably useful enough. + - Nautilus reads a ton of files, icons + all the files in the + homedirectory. Normal sysprof visualization is probably useful + enough. - Profiling a login session. - - Many applications are running at the same time, doing IPC. It would be useful - if we could figure out what other things a given process is waiting on. Eg., in - poll, find out what processes have the other ends of the fd's open. - Visualization: multiple lines on a graph. Lines join up where one process - is blocking on another. That would show processes holding up the progress - very clearly. + - Many applications are running at the same time, doing IPC. It would + be useful if we could figure out what other things a given process + is waiting on. Eg., in poll, find out what processes have the other + ends of the fd's open. + Visualization: multiple lines on a graph. Lines join up where + one process is blocking on another. That would show processes holding + up the progress very clearly. This was suggested by Federico. - - Need to report stat() as well. (Where do inode data end up? In the buffer-cache?) - Also open() may cause disk reads (seeks). + - Need to report stat() as well. (Where do inode data end up? In the + buffer-cache?) Also open() may cause disk reads (seeks). - - To generate the timeline we need to know when a disk request is issued and when it - is completed. This way we can assign blame to all applications that have issued a - disk request at a given point in time. - - The disk timeline should probably vary in intensity with the number of outstanding - disk requests. 
+ - To generate the timeline we need to know when a disk request is + issued and when it is completed. This way we can assign blame to all + applications that have issued a disk request at a given point in time. + The disk timeline should probably vary in intensity with the number + of outstanding disk requests. -=-=-=-=-=-=-=-=-=-=-=-=-=-=- ALREADY DONE -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- +* Various: + - decorate_node should be done lazily + - Find out why we sometimes get completely ridicoulous stacktraces, + where main seems to be called from within Xlib etc. This happens + even after restarting everything. + - It looks like the stackstash-reorg code confuses "main" from + unrelated processes. - currently it looks like if multiple + "main"s are present, only one gets listed in the object list. + Seems to mostly happen when multiple processes are + involved. + - Numbers in caller view are completely screwed up. + - It looks like it sometimes gets confused with similar but different + processes: Something like: + process a spends 80% in foo() called from bar() + process b spends 1% in foo() called from baz() + we get reports of baz() using > 80% of the time. + Or something. + +* commandline version should check that the output file is writable + before starting the profiling. + * See if we can reproduce the problem where libraries didn't get correctly reloaded after new versions were installed. This is just the (deleted) problem. Turns out that the kernel