Some notes on the implementation of KCachegrind
================================================




The "recursive" Problem
------------------------

The following is not up-to-date. It's kept as a source of ideas.
The first versions of KCachegrind simple used the cost values from
the profiles without any calculatin.

However, for recursive functions, we need to search for cycles and
handle them specifically...



...
I have thought much about my recursive problem, and I have a solution for it.
Because it will need modifictions in my call-tree patch (and trace format), I
would like to ask you to verify that the following is correct...
... and sorry for this longhish mail. I hope this is understandable.

A somewhat extended example of my last mail, costs of calls are inside of the arrows, self costs are in parenthesis after function names:
Functions A,B,C; A is calling B and C, B is calling A rec., this calls C again.

Line1: =50=> A (10)
    2:       =10=> C (10)
    3:       =30=> B (10)
    4:             =20=> A(10)
    5:                   =10=> C (10)


Small definitions for arbitrary functions A,B:

S_n(A): Self cost of A for all non-recursive calls to A
S_r(A): Self cost of A for all recursive calls to A
S(A) = S_n(A) + S_r(A): Self cost of A
C_n(A): Cumulative cost of A for all non-recursive calls to A
C_r(A): Cumulative cost of A for all 1st-level recursive calls to A
C_nn(A,B): All call costs from non-recursive A to non-rec. B
C_nr(A,B): All call costs from non-recursive A to 1st-level recursive B
C_rn(A,B): All call costs from recursive A to non-rec. B
C_rr(A,B): All call costs from recursive A to 1st-level recursive B
C_xn(A,B) = C_nn(A,B) + C_rn(A,B): Call costs from A to non-rec. B
C_xr(A,B) = C_nr(A,B) + C_rr(A,B): Call costs from A to 1st-level rec. B

At the moment, in Cachegrind-CT is logged S(A), C_xn(A,B), and C_xr(A,B).


1) What's the cumulative cost of A to be shown (this should be C_n(A)) ?

In the example above, this is 50 (the call B=>A is to a recursive A).

* This is the sum of all calls to a non-recursive A, i.e.

  [1]  C_n(A) = sum over all B: C_xn(B,A) 

It's the definition for C_n(A) above.

The same as I always did, seems quite correct.
Note that the first few functions when switching to CPU emulation have
no callers: ONLY here, I give them all the name "(startup)", and sum up its self cost + cost of all calls from it. Important: "(startup)" is never called recursive inside of the program!

* This is NOT the sum of self cost of A + calls from A to non-recursive  functions, i.e.

  [2]  C_n(A) != S(A) + sum over all B: C_xn(A,B)

as stated in the last mail.
The example above shows:
  C_n(A) != S(A) + C_xn(A,B) + C_xn(A,C) = 20 + 30 + 20 = 70.

And that's the bug obvious in KCachegrind V0.2, as I stop drawing for recursive calls: it correct C_n(A) to be 70, and this leaves incorrect free spaces in the call graph :-(

It's wrong because:
- some self cost of A, more exactly S_r(A), is added twice because its in the call costs C_xn(A,B) included if B calls A recursive.
- C_xn(A,B) also includes call costs from recursive A, and thus can be added twice for a call chain: A=>C=>A=>B : Here, C_xn(A,C) includes C_xn(A,B).

* But another formula for C_n(A) is:

  [3]  C_n(A) = S_n(A) + sum over all B: C_nn(A,B)

Example above:
 C_n(A) = S_n(A) + C_nn(A,B) + C_nn(A,C) = 10 + 30 + 10 = 50.


Status for KCachegrind V0.2: S(A) and C(A) are correct.


2) What cost should be shown in the callee list of A in KCachegrind?

At the moment, I show for all B both C_xn(A,B) and C_xr(A,B), the later with
a recursive icon.
This is confusing because C_xn(A,C) includes C_xn(A,B) for A=>C=>A=>B.

CHANGE: Distinguish calls from rec/nonrec. functions, i.e.
log C_nn(A,B), C_rn(A,B), C_nr(A,B), C_rr(a,B).
This is the modification I need in Cachegrind-CT (i.e. 4 different cost sums per call).

Instead of one "recursion icon", I would add icons for "=>r", "r=>", "r=>r".
Thus, the user can check that sum of call costs without icon is smaller than
the cumulative cost.

With [3] and [1], I can calculate S_n(A) without another Cachegrind modification:

  [4]  S_n(A) = C_n(A) - sum over all B: C_nn(A,B)
              = sum over all C: C_xn(C,A) - sum over all B: C_nn(A,B)

It makes sense to show the partitioning of the self cost in the Info tab.



3) How should the call graph be drawn for the above function?

In KCachegrind V0.1, I draw recursions:
A gets a size of 50, but uses 70 for partitioning inside, because self cost of A is 20 (WRONG).
Inside A, C gets 20, B gets 30.
Inside B, A gets 20 and is drawn recursively the same again (WRONG!).

In KCachegrind V0.2, I stop at recursions:
Again, A gets a size of 50, but uses 70 for partitioning inside, because self cost of A is 20 (WRONG).
Inside A, C gets 20, B gets 30, and we stop at B (leaves WRONG empty space).

My proposal for KCachegrind V0.3:
Still stop at recursion, and include recursive cost only once.
A gets 50 (from the above nesting level and inside!): for empty space of A, S(A) = 20 is used (as always), B now gets 10, C gets 20 (CORRECT).
The rule is: aside from each outest drawing of X in the call graph (could be multiple), cost of calls to recursive functions C_xr(X,Y) is subtracted from C_n(X).
Thus, A is NOT shown inside B, because all self cost of A is already in the empty space of the outest A.
