Analyzing run-time behavior

A sample session using flow profiling

Earlier we demonstrated how to do locality of reference tuning. No further example will be presented here since the steps presented earlier are straightforward. Important Note: when running experiments for locality of reference, try to touch every part of your code but don't bother repeating actions. Any part of the code that is not touched cannot get tuned!

This example will show how to use flow-profile logging for run-time flow and statistical analysis. Our example will be an editor.

If you intend to control the experiment while it is running (placing marks, turning on and off logging), you will need two connections to the computer (multiple windows or a separate terminal). The first connection is used to run the experiment, the second to control it. It is quite difficult to do this with only one connection.

The editor is compiled with flow-profiling enabled. Let's assume that users have complained that searches and escaping to shell take too long. For the type of analysis we are doing, accurate timestamping is required, so we will run the experiment as root. We start it in the "run" window.

   # fprof -s -CStartState=off,LogPrefix=/tmp/test editor main.c
   [ output ]

We started with logging off, because, in this case, we are not concerned about start-up time.

Let's assume that now the program is initialized. Let's take a look in /tmp in the control window.

   # ls -l /tmp/test*
   -rwxrwxr-x    1 root     other        4096 Apr 21 10:29 /tmp/test.733
   # fprof -i /tmp/test.733
   Log name: /tmp/test.733
   The experiment was run on node 'mine' at Thu Apr 21 10:29:43 1994

The log has not been compiled This is an accurate time stamp log

The program had access to these objects: /src/editor/editor found for reading /usr/X/lib/ found for reading /usr/lib/ found for reading /usr/lib/ found for reading /usr/ccs/lib/ found for reading /usr/lib/ found for reading /usr/X/lib/ found for reading

Everything looks fine. Let's start logging.

   # fprof -o /tmp/test.733

Now, let's go through each of the scenarios that users complained about, marking where each begins/ends.

   [ In the run window ]
   do a search
   [ In the control window ]
   # fprof -m /tmp/test.733

[ In the run window ] escape to shell # fprof -m /tmp/test.733

Our experiment is done. So, let's shut down the program.

   [ In the run window ]
   get out of shell
   close down the editor

It's time to analyze the results. Starting off fprof:

   $ fprof
   fprof: open /tmp/test.733
   fprof: count 0 m1

After opening the log, the count command is used to calculate the statistics from the start of the log to the first mark. Let's look at the per-function statistics.

   fprof: stats
              Function              No. of Calls    Top of Stack     On Stack   
   ------------------------------- --------------- --------------- ---------------
   kbd_key                         7               3.147132        3.147132      
   get_character                   6               0.000375        3.134960      
   find_string                     1               0.000631        1.939222     
There are, of course, many more entries in the table, but they are not relevant to this discussion. Looking from the top down, you might be interested in the first 2 entries, but they are probably not important. Unlike prof, flow profiling shows total time spent in the function (including the time spent in the system). kbd_key is blocked on a read call for most of its time. On the other hand, find_string seems to have no good reason for taking a full 2 seconds. Since it is spending most of its time calling other routines, a call to callees would seem to be in order.

   fprof: callees find_string
              Function              No. of Calls    Top of Stack     On Stack   
   ------------------------------- --------------- --------------- ---------------
   string_compare                  100             .5158641        .5158641      
   shortstring_compare             150             .3616259        .3616259      

What we see here is that 88% of the time spent in find_string was spent with one of these two routines on top of the stack. Now, the problem is to localize. If we want, we can:

Now, let's look at what happens when one attempts to escape to shell. In this code, the routine escape_to_shell is called when one wants to do this. Since
this scenario is shorter, it is simpler and more clear to read the records sequentially.
   # rewind
   # search escape_to_shell
   Calling      escape_to_shell                          at 5.342:234
   # output_until_mark
   Calling      spawn_process                            at 5.342:240
   Calling      spawn_user_command                       at 5.342:800
   Calling      visible_message                          at 5.342:810
   Returning    visible_message                          at 6.321:002
   Returning    escape_to_shell                          at 6.818:589
Again, uninteresting parts of the output have been deleted. The point of this example is that you can see, by simply viewing the list of functions and times, what happens. The function visible_message called with the right parameters pauses to allow the user to see a message on the message line. Maybe the programmer didn't even think that visible_message was used here; maybe the programmer never heard of the function visible_message. Either way, with the sequence of functions in hand, the path to a solution from here should be clear.

Next topic: prof and lprof on lprof
Previous topic: Improving program performance by programmer analysis

© 2004 The SCO Group, Inc. All rights reserved.
UnixWare 7 Release 7.1.4 - 27 April 2004