Update TODO

svn path=/trunk/; revision=367
2026-02-12 16:10:54 +00:00 · 2007-08-11 23:08:58 +00:00
parent ef23082882
commit 8af6c38541
1 changed files with 134 additions and 60 deletions
--- a/194
+++ b/194
@@ -54,8 +54,15 @@ Before 1.2:
 		- copying kernel stack to userspace
 			- it's always 4096 bytes these days
 		- heuristically determine functions based on address
 			- callbacks on the stack can be identified
 			  by having an offset of 0.
 			- even so there are a lot of false positives.
 		- is eh_frame usually loaded into memory during normal
-		  operation
+		  operation? It is mapped, but probably not paged in,
 		  so we will be taking a few major page faults when we
 		  first profile something.
 			Unless of course, we store the entire stack in
 		  the stackstash. This may use way too much memory though.
 	- vdso
 		- assume it's the same across processes, just look at
@@ -76,7 +83,8 @@ Before 1.2:
 		- do heuristic stackwalk in kernel
 		- do heuristic stackwalk in userland
-* "Expand all" is horrendously slow because update screenshot gets called 
+
 * "Expand all" is horrendously slow because update_screenshot gets called 
  for every "expanded" signal. In fact even normal expanding is really
  slow. It's probably hopeless to get decent performance out of GtkTreeView,
  so we will have to store a list of expanded objects and keep that up to date
@@ -87,7 +95,7 @@ Before 1.2:
  Or try to parse the machine code. Positions that are called are likely 
  to be functions.
-* Give more sensible 'error messages'. Ie., if you get permission denied for
+* Give more sensible 'error messages'. Eg., if you get permission denied for
  a file, put "Permission denied" instead of "No map"
 * crc32 checking probably doesn't belong in elfparser.c
@@ -97,6 +105,7 @@ Before 1.2:
 	- it's inconvenient that you have to pass in both a parser _and_
 	  a record. The record should just contain a pointer to the parser.
 	  On the other hand, the result does depend on the parser->offset.
 	  So it's a bit confusing that it's not passed in.
 	- the bin_parser_seek_record (..., 1); idiom is a little dubious
@@ -111,9 +120,6 @@ Before 1.2:
 * Make it compilable against a non-running kernel.
 * commandline version should check that the output file is writable
  before starting the profiling.
 * Maybe report idle time? Although this would come for free with the
  timelines.
@@ -135,22 +141,6 @@ Before 1.2:
  just another gtk+ bug.
 - Fix bugs/performance issues:
 	- decorate_node should be done lazily
 	- Find out why we sometimes get completely ridiculous stacktraces,
 	  where main seems to be called from within Xlib etc. This happens
 	  even after restarting everything.
 	- It looks like the stackstash-reorg code confuses "main" from
 	  unrelated processes. - currently it looks like if multiple
 	  "main"s are present, only one gets listed in the object list.
 		Seems to mostly happen when multiple processes are 
 		involved.
 	- Numbers in caller view are completely screwed up.
 	- It looks like it sometimes gets confused with similar but different
 	  processes: Something like:
 		process a spends 80% in foo() called from bar()
 		process b spends 1% in foo() called from baz()
 	  we get reports of baz() using > 80% of the time.
 	  Or something.
 	- add_trace_to_tree() might be a little slow when dealing with deeply
 	  recursive profiles. Hypothesis: seen_nodes can grow large, and the
 	  algorithm is O(n^2) in the length of the trace.
@@ -290,6 +280,67 @@ Before 1.2:
 		  would only need to store a list of hashcodes that we
 		  have generated previously.
 	- One problem with doing DWARF walking is that the debug code
 	  will have to be faulted in. This can be a substantial amount
 	  of disk access which is undesirable to have during a
 	  profiling run. Even if we only have to fault in the
 	  .eh_frame_hdr section, that's still 18 pages for gtk+. The 
 	  .eh_frame section for gtk+ is 72 pages.
 	  A possibility may be to consider two stacktraces identical if
 	  the only differing values are *outside* the text segments.
 	  This may work since stack frames tend to be the same size. 
 	  It is then sufficient in user space to only store one
 	  representative for each set of considered-identical stack
 	  traces.
 	  User space storage: Use the stackstash tree. When a new trace
 	  is added, just skip over nodes that differ, but where none of
 	  them points to text segments. Two possibilities then:
 	        - when two traces are determined to differ, store them
 	          in completely separate trees. This ensures that we
 	          will never run the dwarf algorithm on an invalid
 	          stack trace, but also means that we won't get shared
 	          prefixes for stacktraces.
 		- when two traces are determined to differ, branch off
 		  as currently. This will share more data, but the
 		  dwarf algorithm could be run on invalid traces. It
 		  may work in practice though if the compiler
 		  generally uses fixed stack frames.
 		  A twist on this is to mark the complete stack traces as
 		  "complete". Then after running the DWARF algorithm,
 		  the generated stack trace can be saved with it. This
 		  way incomplete stack traces branching off a complete
 		  one can be completed using the DWARF information for
 		  the shared part.
 * How to get the user stack:
   /* In principle we should use get_task_mm() but
    * that will use task_lock() leading to deadlock
    * if somebody already has the lock
    */
   if (spin_is_locked (&current->alloc_lock))
           printk ("alreadylocked\n");
   {
           struct mm_struct *mm = current->mm;
           if (mm)
           {
                   printk (KERN_ALERT "stack size: %lu (%d)\n",
                           mm->start_stack - regs->REG_STACK_PTR,
                           current->pid);
                   stacksize = mm->start_stack - regs->REG_STACK_PTR;
           }
           else
                   stacksize = 1;
   }
 * If interrupt happens in kernel mode, send both
  kernel stack and user space stack, have userspace stitch them
  together. well, they could be stitched together in the kernel. 
@@ -358,7 +409,7 @@ http://www.linuxbase.org/spec/booksets/LSB-Embedded/LSB-Embedded/ehframe.html
  look in dwarf2-frame.[ch] in the gdb distribution. 
  Also look at bozo-profiler
-	http://www-sop.inria.fr/dream/personnel/Mathieu.Lacage/bozo-profiler/bozo-profiler-1.1.tar.gz
+	http://cutebugs.net/bozo-profiler/
  which has an elf32 parser/debugger
 - Make busy cursors more intelligent
@@ -499,12 +550,8 @@ Later:
 - Find out how to hack around gtk+ bug causing multiple double clicks 
  to get eaten.
- Consider what it would take to take stacktraces of other languages
+- Consider what it would take to take stacktraces of other languages such
-
+  as perl, python, java, ruby, or bash. Or scheme.
 	- perl, 
 	- python
 	- java
 	- bash
  Possible solution is for the script binaries to have a function
  called something like 
@@ -516,10 +563,14 @@ Later:
  This function would behave essentially like a signal handler: couldn't
  call malloc(), couldn't call printf(), etc. 
-  Note thought that scripting languages will generally have a stack with 
+  Note though that scripting languages will generally have a stack with 
  both script-binary-stack, script stack, and library stacks. We wouldn't
  want scripts to need to parse dwarf. Also if we do that thing with 
-  sending the entire stack to userspace, things will be further complicated.
+  sending the entire stack to userspace, things will be further
  complicated.
  Also note languages like scheme that use heap-allocated activation
  records.
 - Consider this usecase:
 	Someone is considering replacing malloc()/free() with a freelist
@@ -615,50 +666,73 @@ Later:
 	  it asynchronously.
 	  Visualization: A timeline with alternating CPU/disk activity. 
-	- What function is doing all the synchronous reading, and what files/offsets is
+	- What function is doing all the synchronous reading, and what
-	  it reading. Visualization: lots of reads across different files out of one 
+	  files/offsets is it reading. Visualization: lots of reads across
-	  function
+	  different files out of one function
-	- A piece of the program is doing disk I/O. We can drop that entire piece of
+	- A piece of the program is doing disk I/O. We can drop that
-  	  code. Sysprof visualization is ok, although seeing the files accessed is useful
+ 	  entire piece of code. Sysprof visualization is ok, although seeing
- 	  so that we can tell if those files are not just going to be used in
+	  the files accessed is useful so that we can tell if those files are
-	  other places. (Gnumeric plugin_init()).
+	  not just going to be used in other places. (Gnumeric plugin_init()).
-	- A function is reading a file synchronously, but there is other (CPU/disk) stuff
+	- A function is reading a file synchronously, but there is other
-	  that could be done at the same time. Visualization: A piece of the timeline 
+	  (CPU/disk) stuff that could be done at the same time. Visualization:
-	  is diskbound with little or no CPU used.
+	  A piece of the timeline is diskbound with little or no CPU used.
-	- Want to improve code locality of library or binary. Visualization: no GUI, just
+	- Want to improve code locality of library or binary. Visualization:
-	  produce a list of functions that should be put first in the file. Then run the
+	  no GUI, just produce a list of functions that should be put first in
-	  program again until the list converges. (Valgrind may be more useful here).
+	  the file. Then run the program again until the list converges.
 	  (Valgrind may be more useful here).
-	- Nautilus reads a ton of files, icons + all the files in the homedirectory.
+	- Nautilus reads a ton of files, icons + all the files in the
-	  Normal sysprof visualization is probably useful enough.
+	  homedirectory. Normal sysprof visualization is probably useful
 	  enough.
 	- Profiling a login session. 
-	- Many applications are running at the same time, doing IPC. It would be useful
+	- Many applications are running at the same time, doing IPC. It would
-	  if we could figure out what other things a given process is waiting on. Eg., in
+	  be useful if we could figure out what other things a given process
-	  poll, find out what processes have the other ends of the fd's open.
+	  is waiting on. Eg., in poll, find out what processes have the other
-		Visualization: multiple lines on a graph. Lines join up where one process
+	  ends of the fd's open.
-		is blocking on another. That would show processes holding up the progress
+		Visualization: multiple lines on a graph. Lines join up where
-		very clearly.
+	  one process is blocking on another. That would show processes holding
 	  up the progress very clearly.
 	  This was suggested by Federico.
-    - Need to report stat() as well. (Where do inode data end up? In the buffer-cache?)
+    - Need to report stat() as well. (Where do inode data end up? In the
-      Also open() may cause disk reads (seeks).
+      buffer-cache?) Also open() may cause disk reads (seeks).
-    - To generate the timeline we need to know when a disk request is issued and when it
+    - To generate the timeline we need to know when a disk request is
-      is completed. This way we can assign blame to all applications that have issued a
+      issued and when it is completed. This way we can assign blame to all
-      disk request at a given point in time. 
+      applications that have issued a disk request at a given point in time. 
 	The disk timeline should probably vary in intensity with the number of outstanding
 	disk requests.
      The disk timeline should probably vary in intensity with the number
      of outstanding disk requests.
 -=-=-=-=-=-=-=-=-=-=-=-=-=-=- ALREADY DONE -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 * Various:
 	- decorate_node should be done lazily
 	- Find out why we sometimes get completely ridiculous stacktraces,
 	  where main seems to be called from within Xlib etc. This happens
 	  even after restarting everything.
 	- It looks like the stackstash-reorg code confuses "main" from
 	  unrelated processes. - currently it looks like if multiple
 	  "main"s are present, only one gets listed in the object list.
 		Seems to mostly happen when multiple processes are 
 		involved.
 	- Numbers in caller view are completely screwed up.
 	- It looks like it sometimes gets confused with similar but different
 	  processes: Something like:
 		process a spends 80% in foo() called from bar()
 		process b spends 1% in foo() called from baz()
 	  we get reports of baz() using > 80% of the time.
 	  Or something.
 * commandline version should check that the output file is writable
  before starting the profiling.
 * See if we can reproduce the problem where libraries didn't get correctly
  reloaded after new versions were installed.
 	This is just the (deleted) problem. Turns out that the kernel