Before 1.0:

* Correctness

    - grep FIXME

    - When the module is unloaded, kill all processes blocking in read
        - or block unloading until all processes have exited

* Interface

    - If the current profile has a name, display it in the title bar

    - Should just install the kernel module if it is running as root, pop
      up a dialog if not. Note we must be able to start without the module
      now, since it is useful to just load profiles from disk.
        - Is there a portable way of asking for the root password?
        - Install a small suid program that only inserts the module?
          (instant security hole ..)

    - hook up menu items view/start etc (or possibly get rid of them or
      move them)

    - Consider expanding a few more levels of a new descendants tree

* Build system

    - Need to make "make install" work (how do you know where to install
      kernel modules?)
        - in /lib/modules/`uname -r`/kernel/drivers/
        - need to run depmod as root after that
        - Then modprobe run as root should correctly find it.

    - Find out what distributions it actually works on
      (ask for success/failure stories in 0.9.x releases)

    - auto*?

    - .desktop file

    - translation should be hooked up

Before 1.2:

- Make busy cursors more intelligent
    - when you click something in the main list and we don't respond
      within 50ms (or perhaps when we expect to not be able to do so
      (can we know the size in advance?))
    - instead of as we do now: set the busy cursor unconditionally

- Reorganise stackstash and profile
    - stackstash should just take traces of addresses without knowing
      anything about what those addresses mean
    - stacktraces should then begin with a process
    - profile should take traces of pointers to presentation objects
      without knowing anything about these presentation objects.
    - Creating a profile is then
        - For each stack node, compute a presentation object
          (probably need to export opaque stacknode objects with
          set/get_user_data)
        - Send each stack trace to the profile module, along with
          presentation objects

- Charge 'self' properly to processes that don't get any stack trace at
  all (probably we get that for free with stackstash reorganisation)

- support more than one reader of the samples properly
    - Don't generate them if no one cares
    - When not profiling, sysprof shouldn't care

- Add ability to show more than one function at a time. Algorithm:

        Find all relevant nodes;
        For each relevant node
            best_so_far = relevant node
            walk towards root
                if node is relevant,
                    best_so_far = relevant
            add best_so_far to interesting
        for each interesting
            list leaves
            for each leaf
                add trace to tree (leaf, interesting)

- Consider adding KDE-style nested callgraph view

- Add support for line numbers within functions
    - consider caching [filename => bin_file]

- Have kernel module report the file the address was found in
  Should avoid a lot of potential brokenness/raciness with dlopen etc.

- Make things faster
    - Can I get it to profile itself?
    - speedprof seems to report that lots of time is spent in
      stack_stash_foreach() and also in generate_key()

- add an 'everything' object. It is really needed for a lot of things

- Non-GUI version that can save in a format the GUI can understand.
  Could be used for profiling startup etc. Would preferably be able to
  dump the data to a network socket. Should be able to react to eg.
  SIGUSR1 by dumping the data.

- Figure out how Google's pprof script works. Then add real call graph
  drawing. (google's script is really simple; uses dot from graphviz).

- hide internal stuff in ProfileDescendant

- possibly add dependency on glib 2.8 if it is released at that point.
  (g_file_replace())

Later:

- Find out how to hack around gtk+ bug causing multiple double clicks
  to get eaten.
- Consider what it would take to take stacktraces of other languages
    - perl,
    - python
    - java
    - bash

  Possible solution is for the script binaries to have a function
  called something like

        __sysprof__generate_stacktrace (char **functions, int n_functions);

  that the sysprof kernel module could call (and make return to the
  kernel). This function would behave essentially like a signal handler:
  couldn't call malloc(), couldn't call printf(), etc.

- Consider this usecase:

  Someone is considering replacing malloc()/free() with a freelist
  for a certain data structure. All use of this data structure is
  confined to one function, foo(). It is now interesting to know how
  much time that particular function spends on malloc() and free()
  combined.

  Possible UI:

    - Select foo(),
    - find an instance of malloc()
    - shift-click on it,
    - all traces with malloc are removed
    - a new item "..." appears immediately below foo()
    - malloc is added below "..."
    - same for free
    - at this point, the desired data can be read at cumulative for "..."

  Actually, with this UI, you could potentially get rid of the caller
  list: Just present the call tree under a root, and use ... to single
  out the stuff you are interested in.

  Maybe also get rid of 'callers' by having a new "show details" dialog
  or something.

- figure out a way to deal with both disk and CPU. Need to make sure that
  things that are UNINTERRUPTIBLE while there are RUNNING tasks are not
  considered bad.

  Not entirely clear that the sysprof visualization is right for disk.

  Maybe assign a size of n to traces with n *unique* disk accesses (ie.
  disk accesses that are not required by any other stack trace).

  Or assign values to nodes in the calltree based on how many disk
  accesses are contained in that tree. Ie., if I get rid of this branch,
  how many disk accesses would that get rid of.

  Or turn it around and look at individual disk accesses and see what it
  would take to get rid of it. Ie., a number of traces are associated
  with any given disk access.
  Just show those.

  Or for a given tree with contained disk accesses, figure out what
  *other* traces have the same disk accesses.

  Or visualize a set of squares with a color that is more saturated
  depending on the number of unique stack traces that access it. Then
  look for the lightly saturated ones.

  The input to the profiler would basically be

        (stack trace, badness, cookie)

  For CPU:    badness=10ms, cookie=
  For Disk:   badness=, cookie=
  For Memory: badness=, cookie=

  Cookies are used to figure out whether an access is really the same,
  ie., for two identical cookies, the size is still just one, however

  Memory is different from disk because you can't reasonably assume that
  stuff that has been read will stay in cache (for short profile runs you
  can assume that with disk, but not for long ones).

DONE:

- give profiles on the command line

- Hopefully the oops logged at the end of this file's DONE list is gone
  now that we use mmput/get_task_mm. For older kernels those symbols are
  not exported though, so we will probably have to either use the old
  way (directly accessing the mm's) or just not support those kernels.

- Need an icon

- hook up about box

- Add busy cursors,
    - when you hit "Profile"
    - when you click something in the main list and we don't respond
      within 50ms (or perhaps when we expect to not be able to do so
      (can we know the size in advance?))

- kernel module should put process to sleep before sampling. Should get
  us more accurate data

- Make sure samples label shows correct number after Open

- Move "samples" label to the toolbar, then get rid of statusbar.

- crashes when you ctrl-click the selected item in the top left pane
  ssp: looks like it doesn't handle the none-selected case

- loading and saving

- consider making ProfileObject more of an object.
- make an "everything" object
  maybe not necessary -- there is a libc_ctors_something()

- make presentation strings nicer

  four different kinds of symbols:

    a) I know exactly what this is
    b) I know in what library this is
    c) I know only the process that did this
    d) I know the name, but there is another similarly named one

  (a) is easy, (b) should be (c) should just become "???"
  (d) not sure

- processes with a cmdline of "" should get a [pid = %d] instead.

- make an "n samples" label

Process stuff:

- make threads be reported together
  (simply report pids with similar command lines together)
  (note: it seems separating by pid is way too slow (uses too much
  memory), so it has to be like this)

- stack stash should allow different pids to refer to the same root
  (ie. there is no need to create a new tree for each pid)
  The *leaves* should contain the pid, not the root. You could even
  imagine a set of processes, each referring to a set of leaves.

- when we see a new pid, immediately capture its mappings

Road map:

    - new object Process
        - hashable by pointer
        - contains list of maps
        - process_from_pid (pid_t pid, gboolean join_threads)
            - new processes get their maps immediately
            - resulting pointer must be unref()ed, but it is possible
              it just points to an existing process
            - processes with identical cmdlines are taken together
        - method lookup_symbol()
        - method get_name()
        - ref/unref
    - StackStash stores map from process to leaves
    - Profile is called with processes

  It is possible that we simply need a better concept of Process:
  If two pids have the same command line, consider them the same,
  period. This should save considerable amounts of memory.

  The assumptions:

        "No pids are reused during a profiling run"
        "Two processes with the same command line have the same mappings"

  are somewhat dubious, but probably necessary.
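
  A minimal sketch of the "same command line means same Process" idea
  from the road map, in plain C so it stands alone. Everything here is
  hypothetical and not sysprof's actual code: the name process_get(), the
  struct layout, and the linked list standing in for a hash table are all
  assumptions. The command line is passed in by the caller instead of
  being read from /proc, and the list of maps is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical Process object: one instance per distinct command line
 * (join_threads != 0) or per pid (join_threads == 0). */
typedef struct Process Process;
struct Process {
    pid_t    pid;        /* first pid seen for this Process */
    char    *cmdline;    /* private copy of the command line */
    int      ref_count;
    Process *next;       /* simple list; a hash table would do better */
};

static Process *processes = NULL;

/* Return an existing Process if one matches, otherwise create one.
 * The result must be released with process_unref(). */
static Process *
process_get (pid_t pid, const char *cmdline, int join_threads)
{
    Process *p;

    for (p = processes; p != NULL; p = p->next) {
        int same = join_threads ? strcmp (p->cmdline, cmdline) == 0
                                : p->pid == pid;
        if (same) {
            p->ref_count++;
            return p;
        }
    }

    p = malloc (sizeof *p);
    if (p == NULL)
        return NULL;

    p->cmdline = malloc (strlen (cmdline) + 1);
    if (p->cmdline == NULL) {
        free (p);
        return NULL;
    }
    strcpy (p->cmdline, cmdline);

    p->pid = pid;
    p->ref_count = 1;
    p->next = processes;
    processes = p;
    return p;
}

/* Drop one reference; unlink and free the Process when none remain. */
static void
process_unref (Process *process)
{
    Process **link;

    assert (process != NULL && process->ref_count > 0);

    if (--process->ref_count > 0)
        return;

    for (link = &processes; *link != NULL; link = &(*link)->next) {
        if (*link == process) {
            *link = process->next;
            break;
        }
    }
    free (process->cmdline);
    free (process);
}
```

  With join_threads set, two pids that share a command line get the same
  pointer back, so the pointer itself can serve as the hash key. The
  sketch inherits both dubious assumptions listed above.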
  (More complex kernel:
        have the module report
            - new pid arrived (along with mappings)
            - mapping changed for pid
            - stacktrace)

- make symbols in executable work

- the hashtables used in profile.c should not accept NULL as the key

- make callers work

- autoexpand descendant tree

- make double clicks work

- fix leaks

- Find out what happened here:

Apr 11 15:42:08 great-sage-equal-to-heaven kernel: Unable to handle kernel NULL pointer dereference at virtual address 000001b8
Apr 11 15:42:08 great-sage-equal-to-heaven kernel: printing eip:
Apr 11 15:42:08 great-sage-equal-to-heaven kernel: c017342c
Apr 11 15:42:08 great-sage-equal-to-heaven kernel: *pde = 00000000
Apr 11 15:42:08 great-sage-equal-to-heaven kernel: Oops: 0000 [#1]
Apr 11 15:42:08 great-sage-equal-to-heaven kernel: Modules linked in: sysprof_module(U) i2c_algo_bit md5 ipv6 parport_pc lp parport autofs4 sunrpc video button battery ac ohci1394 ieee1394 uhci_hcd ehci_hcd hw_random tpm_atmel tpm i2c_i801 i2c_core snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc e1000 floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod ata_piix libata sd_mod scsi_mod
Apr 11 15:42:08 great-sage-equal-to-heaven kernel: CPU: 0
Apr 11 15:42:08 great-sage-equal-to-heaven kernel: EIP: 0060:[] Not tainted VLI
Apr 11 15:42:08 great-sage-equal-to-heaven kernel: EFLAGS: 00010287 (2.6.11-1.1225_FC4)
Apr 11 15:42:08 great-sage-equal-to-heaven kernel: EIP is at grab_swap_token+0x35/0x21f
Apr 11 15:42:08 great-sage-equal-to-heaven kernel: eax: 0bd48023 ebx: d831d028 ecx: 00000282 edx: 00000000
Apr 11 15:42:08 great-sage-equal-to-heaven kernel: esi: c1b72934 edi: c1045820 ebp: c1b703f0 esp: c18dbdd8
Apr 11 15:42:08 great-sage-equal-to-heaven kernel: ds: 007b es: 007b ss: 0068
Apr 11 15:42:08 great-sage-equal-to-heaven kernel: Process events/0 (pid: 3, threadinfo=c18db000 task=f7e62000)
Apr 11 15:42:09 great-sage-equal-to-heaven kernel: Stack: 000011a8 00000000 000011a8 c1b703f0 c0151731 c016f58f 000011a8 c1b72934
Apr 11 15:42:09 great-sage-equal-to-heaven kernel: 000011a8 c0166415 c1b72934 c1b72934 c0163768 ee7ccc38 f459fbf8 bf92e7b8
Apr 11 15:42:09 great-sage-equal-to-heaven kernel: f6c6a934 c0103b92 bfdaba18 c1b703f0 00000001 c1b81bfc c1b72934 bfdaba18
Apr 11 15:42:09 great-sage-equal-to-heaven kernel: Call Trace:
Apr 11 15:42:09 great-sage-equal-to-heaven kernel: [] find_get_page+0x9/0x24
Apr 11 15:42:09 great-sage-equal-to-heaven kernel: [] read_swap_cache_async+0x32/0x83
Apr 11 15:42:09 great-sage-equal-to-heaven kernel: [] do_swap_page+0x262/0x600
Apr 11 15:42:09 great-sage-equal-to-heaven kernel: [] pte_alloc_map+0xc6/0x1e6
Apr 11 15:42:09 great-sage-equal-to-heaven kernel: [] common_interrupt+0x1a/0x20
Apr 11 15:42:09 great-sage-equal-to-heaven kernel: [] handle_mm_fault+0x1da/0x31d
Apr 11 15:42:09 great-sage-equal-to-heaven kernel: [] __follow_page+0xa2/0x10d
Apr 11 15:42:09 great-sage-equal-to-heaven kernel: [] get_user_pages+0x145/0x6ee
Apr 11 15:42:09 great-sage-equal-to-heaven kernel: [] kmap_high+0x52/0x44e
Apr 11 15:42:09 great-sage-equal-to-heaven kernel: [] common_interrupt+0x1a/0x20
Apr 11 15:42:09 great-sage-equal-to-heaven kernel: [] x_access_process_vm+0x111/0x1a5 [sysprof_module]
Apr 11 15:42:10 great-sage-equal-to-heaven kernel: [] read_user_space+0x19/0x1d [sysprof_module]
Apr 11 15:42:10 great-sage-equal-to-heaven kernel: [] read_frame+0x35/0x51 [sysprof_module]
Apr 11 15:42:10 great-sage-equal-to-heaven kernel: [] generate_stack_trace+0x8b/0xb4
Apr 11 15:42:10 great-sage-equal-to-heaven kernel: [] do_generate+0x3f/0xa0 [sysprof_module]
Apr 11 15:42:10 great-sage-equal-to-heaven kernel: [] worker_thread+0x1b0/0x450
Apr 11 15:42:10 great-sage-equal-to-heaven kernel: [] schedule+0x30d/0x780
Apr 11 15:42:10 great-sage-equal-to-heaven kernel: [] __wake_up_common+0x39/0x59
Apr 11 15:42:10 great-sage-equal-to-heaven kernel: [] do_generate+0x0/0xa0 [sysprof_module]
Apr 11 15:42:10 great-sage-equal-to-heaven kernel: [] default_wake_function+0x0/0xc
Apr 11 15:42:10 great-sage-equal-to-heaven kernel: [] worker_thread+0x0/0x450
Apr 11 15:42:10 great-sage-equal-to-heaven kernel: [] kthread+0x87/0x8b
Apr 11 15:42:10 great-sage-equal-to-heaven kernel: [] kthread+0x0/0x8b
Apr 11 15:42:10 great-sage-equal-to-heaven kernel: [] kernel_thread_helper+0x5/0xb
Apr 11 15:42:10 great-sage-equal-to-heaven kernel: Code: e0 8b 00 8b 50 74 8b 1d c4 55 3d c0 39 da 0f 84 9b 01 00 00 a1 60 fc 3c c0 39 05 30 ec 48 c0 78 05 83 c4 20 5b c3 a1 60 fc 3c c0 <3b> 82 b8 01 00 00 78 ee 81 3d ac 55 3d c0 3c 4b 24 1d 0f 85 78
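
Appendix note on the "show more than one function at a time" algorithm
from the Before 1.2 list: a rough C sketch of its first half only
(finding the "interesting" nodes). The Node type with a parent pointer
and a relevant flag, and the names topmost_relevant() and
find_interesting(), are assumptions made for illustration; the real call
tree may look quite different. Listing leaves and adding traces to the
new tree is not covered.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical call-tree node; field names are illustrative only. */
typedef struct Node Node;
struct Node {
    Node *parent;    /* NULL for the root */
    int   relevant;  /* nonzero if this is one of the selected functions */
};

/* Walk from a relevant node towards the root and return the relevant
 * node closest to the root ("best_so_far" in the notes). */
static Node *
topmost_relevant (Node *node)
{
    Node *best_so_far = node;
    Node *n;

    assert (node != NULL && node->relevant);

    for (n = node->parent; n != NULL; n = n->parent) {
        if (n->relevant)
            best_so_far = n;
    }
    return best_so_far;
}

/* Fill interesting[] with the distinct topmost relevant nodes for the
 * given relevant nodes. interesting[] must have room for n_relevant
 * entries. Returns the number of entries written. */
static size_t
find_interesting (Node **relevant, size_t n_relevant, Node **interesting)
{
    size_t n_interesting = 0;
    size_t i, j;

    for (i = 0; i < n_relevant; i++) {
        Node *best = topmost_relevant (relevant[i]);

        /* Linear duplicate check; fine for a sketch. */
        for (j = 0; j < n_interesting; j++) {
            if (interesting[j] == best)
                break;
        }
        if (j == n_interesting)
            interesting[n_interesting++] = best;
    }
    return n_interesting;
}
```

Taking the relevant node closest to the root means a selected function
nested under another selected function is counted once, under the outer
one, rather than twice.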