ApacheTop Readme

ApacheTop watches a logfile generated by Apache (in standard common or
combined logformat, although it doesn't (yet) make use of any of the extra
fields in combined) and generates human-parsable output in realtime.
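
As a rough illustration of what reading a common-format line involves, here
is a minimal Python sketch. It is not ApacheTop's own parser; the names
CLF_PATTERN and parse_common are made up for this example, and it assumes the
request field contains no embedded double quotes. Because the pattern is not
anchored at the end of the line, a combined-format line also matches and its
extra fields are simply ignored.

```python
import re

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_common(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    request = match.group("request").split()
    # The request is normally "METHOD URL PROTOCOL"; fall back to the raw string.
    url = request[1] if len(request) >= 2 else match.group("request")
    size = match.group("size")
    return {
        "host": match.group("host"),
        "time": match.group("time"),
        "url": url,
        "status": int(match.group("status")),
        "bytes": 0 if size == "-" else int(size),  # "-" means no body was sent
    }

sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326')
hit = parse_common(sample)
print(hit["url"], hit["status"], hit["bytes"])
```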


Several commandline options control its behaviour:
-f logfile
	Select which file to watch.

-t logtype [common|combined|atop]
	Only common does anything at the moment; the other two are simply
	mapped as if you'd supplied common.
	Once finished, combined will use the extra Referrer information to
	provide more stats, and atop may well be a custom logformat which is
	easier for ApacheTop to parse.

-H hits | -T time
	These options are mutually exclusive. Specify only one, if any at
	all. They work as follows. ApacheTop keeps an internal table of
	every hit it has seen recently. This table has to be a finite
	size, so you need to decide how big it's going to be.
	You can either:
		Use -H to say "remember <this many> hits"
	or	Use -T to say "remember all hits in <this many> seconds"
	
	The default (at the moment) is to remember hits for 30 seconds.
	Setting this too large (whichever option you choose) will cause
	ApacheTop to use more memory and more CPU time. My experimentation
	finds that remembering no more than around 5000 requests works well.

-q
	Instructs ApacheTop to keep the querystrings rather than remove them.

-l
	Instructs ApacheTop to lowercase all URLs, thus /FOO and /foo are
	treated as the same and accumulate the same statistics.

-s segments
	Instructs ApacheTop to only keep the first <segments> parts of the
	path. Trailing slashes are kept if present. Statistics are then
	merged for each truncated URL.
	This is easiest to demonstrate with examples:
	-s 2 would produce the following:
	/media/x.jpg               ->  /media/x.jpg
	/media/images/x.jpg        ->  /media/images/
	/media/images/small/x.jpg  ->  /media/images/
	/media/images/big/x.jpg    ->  /media/images/
	Stats for the last three URLs would be merged in this case.

-p
	Instructs ApacheTop to keep the protocol (http:// usually) at the
	front of its referrer strings. Normal behaviour is to remove them
	to give more room to more useful information.


-r secs
	Set default refresh delay, in seconds.
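
The URL and referrer options above (-q, -l, -s and -p) can be pictured with a
short Python sketch. This only illustrates the behaviour described here and
is not ApacheTop's code; the function names are hypothetical, paths are
assumed to start with a slash, and the order in which the options are applied
is a guess.

```python
import re

def truncate_path(path, segments):
    """Keep only the first <segments> parts of a path (the -s option)."""
    parts = path.split("/")[1:]          # drop the empty string before the leading "/"
    if len(parts) <= segments:
        return path                      # short enough already, leave untouched
    return "/" + "/".join(parts[:segments]) + "/"

def normalise_url(url, keep_query=False, lowercase=False, segments=None):
    """Apply -q (keep_query), -l (lowercase) and -s (segments) to one URL."""
    path, sep, query = url.partition("?")
    if segments is not None:
        path = truncate_path(path, segments)
    result = path + sep + query if keep_query else path
    return result.lower() if lowercase else result

def strip_protocol(referrer, keep_protocol=False):
    """Remove a leading scheme such as http:// unless -p (keep_protocol) is set."""
    if keep_protocol:
        return referrer
    return re.sub(r"^[A-Za-z][A-Za-z0-9+.-]*://", "", referrer, count=1)

# The four -s 2 examples from the option description above.
for url in ("/media/x.jpg", "/media/images/x.jpg",
            "/media/images/small/x.jpg", "/media/images/big/x.jpg"):
    print(url, "->", normalise_url(url, segments=2))
```

The last three URLs all come out as /media/images/, which is why their
statistics end up merged.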


Once it's running, you'll see a display like this:

last hit: 09:17:07        atop runtime:  0 days, 00:58:20              09:17:08
All:       638924 reqs ( 182.65/sec)      3433539K ( 981.6K/sec)  (   5.4K/req)
2xx:  455415 (71.3%) 3xx:  175745 (27.5%) 4xx:  7746 ( 1.2%) 5xx:    10 ( 0.0%)
R ( 30s):    5195 reqs ( 173.17/sec)        25405K ( 846.8K/sec)  (   4.9K/req)
2xx:    3447 (66.4%) 3xx:    1715 (33.0%) 4xx:    33 ( 0.6%) 5xx:     0 ( 0.0%)
                                                                               
 REQS REQ/S    KB  KB/S URL                                                    
  103   3.4  2983  99.4 /                                 
   56   1.9   239   8.0 /tickerdata/story2.dat
   47   1.6   104   3.6 /home/today/patina.js
   44   1.5    82   2.8 /home/styles/home_d0e2ee.css
<snip>


The top line displays the time the last hit was seen, how long ApacheTop has
been running, and the current time.

The next two lines display information about every single hit ApacheTop has
processed in this incarnation.
First you see the total number of hits, followed by the average number of
hits/second since starting. Then comes the total number of KB transferred, the
average KB/sec, and finally the average KB per request.
The next line shows a breakdown of return codes; in this particular example you
can see that 71.3% of the hits returned a 2xx code. 27.5% were 3xx, and so on.
You also have the actual number of hits in each group.
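
The arithmetic behind these figures is simple enough to sketch in Python. The
function name summarise is invented for this example and this is not how
ApacheTop itself is written. The numbers fed in are taken from the 30-second
lines of the sample display, because there the elapsed time is known exactly;
the same sums apply to the "All" lines.

```python
def summarise(reqs, kbytes, elapsed_seconds, by_class):
    """Compute the per-second, per-request and percentage figures for one header."""
    if reqs == 0 or elapsed_seconds == 0:
        # Nothing seen yet: avoid dividing by zero.
        return {"reqs_per_sec": 0.0, "kb_per_sec": 0.0, "kb_per_req": 0.0,
                "percent": {cls: 0.0 for cls in by_class}}
    return {
        "reqs_per_sec": reqs / elapsed_seconds,
        "kb_per_sec": kbytes / elapsed_seconds,
        "kb_per_req": kbytes / reqs,
        "percent": {cls: 100.0 * n / reqs for cls, n in by_class.items()},
    }

recent = summarise(5195, 25405, 30,
                   {"2xx": 3447, "3xx": 1715, "4xx": 33, "5xx": 0})
print(f'{recent["reqs_per_sec"]:.2f}/sec  '
      f'{recent["kb_per_sec"]:.1f}K/sec  '
      f'{recent["kb_per_req"]:.1f}K/req')
# prints: 173.17/sec  846.8K/sec  4.9K/req
```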

The two lines below this are where the commandline options -H and -T come in.
The data in these lines reads the same as the two above them, but this data is
only for the hits remembered in ApacheTop's internal table (remember that?).
You can see how many seconds of data this represents in the R ( 30s) at the
beginning of the line. This is for 30 seconds. These two lines of information
are good for a "what is my server doing *right now*?" scenario, while the two
above them are good for a picture over the course of a few minutes or hours.
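
One way to picture the internal table is as a queue that drops its oldest
entries, either once it holds too many hits or once they are too old. The
Python sketch below is only an illustration under that assumption; the class
name HitWindow and its parameters are hypothetical, and ApacheTop's real data
structure may well differ.

```python
from collections import deque

class HitWindow:
    """Remember either the last max_hits hits or the last max_age seconds."""

    def __init__(self, max_hits=None, max_age=None):
        if max_hits is not None and max_age is not None:
            raise ValueError("choose either max_hits or max_age, not both")
        if max_hits is None and max_age is None:
            max_age = 30                  # the documented default: 30 seconds
        self.max_hits = max_hits
        self.max_age = max_age
        self.hits = deque()               # oldest hit on the left

    def add(self, timestamp, url, size):
        """Record one hit, then throw away anything that no longer fits."""
        self.hits.append((timestamp, url, size))
        self.expire(timestamp)

    def expire(self, now):
        if self.max_hits is not None:
            while len(self.hits) > self.max_hits:
                self.hits.popleft()
        else:
            while self.hits and now - self.hits[0][0] >= self.max_age:
                self.hits.popleft()

by_count = HitWindow(max_hits=3)
for second in range(5):
    by_count.add(second, "/", 100)
print(len(by_count.hits))                 # prints: 3
```

Either way the table stays bounded, which is what keeps memory and CPU use
under control.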

Underneath this header, you'll see a list of URLs along with their relevant
number of requests, requests per second, KB, and KB per second.
This list is generated from the internal table ApacheTop maintains. Thus, in
this example, the list is being generated from the last 30 seconds of data. You
can see the root page has been requested 103 times in the last 30 seconds,
resulting in about 3.4 hits per second. Additionally, those 103 requests have
resulted in 2983K of traffic, at an average of 99.4K/second.
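
The per-URL figures can be derived from the remembered hits roughly as
follows. This Python sketch is an illustration only; url_table is a made-up
name, and treating 1K as 1024 bytes is an assumption rather than something
this document states.

```python
from collections import defaultdict

def url_table(hits, window_seconds):
    """Build one row per URL from (url, size_in_bytes) pairs in the window."""
    totals = defaultdict(lambda: [0, 0])   # url -> [requests, bytes]
    for url, size in hits:
        totals[url][0] += 1
        totals[url][1] += size
    rows = []
    for url, (reqs, size) in totals.items():
        kbytes = size / 1024               # assumption: 1K = 1024 bytes
        rows.append({
            "url": url,
            "reqs": reqs,
            "reqs_per_sec": reqs / window_seconds,
            "kbytes": kbytes,
            "kbytes_per_sec": kbytes / window_seconds,
        })
    return rows

# 103 requests for "/" in a 30 second window, as in the sample display.
rows = url_table([("/", 2048)] * 103, 30)
print(rows[0]["reqs"], round(rows[0]["reqs_per_sec"], 1))   # prints: 103 3.4
```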

You may sort this list by any of the first four columns as follows:
	r	Sort by REQUESTS
	R	Sort by REQUESTS/SECOND
	b	Sort by KBYTES
	B	Sort by KBYTES/SECOND
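
In other words, each key simply picks the column to order by. A minimal
Python sketch of that mapping, using rows from the sample display; the names
SORT_KEYS and sort_rows are invented here, and sorting largest-first is an
assumption.

```python
# Map each sort key to the column it selects.
SORT_KEYS = {
    "r": "reqs",
    "R": "reqs_per_sec",
    "b": "kbytes",
    "B": "kbytes_per_sec",
}

def sort_rows(rows, key_pressed):
    """Return the rows ordered by the chosen column, largest first (assumed)."""
    field = SORT_KEYS[key_pressed]
    return sorted(rows, key=lambda row: row[field], reverse=True)

sample_rows = [
    {"url": "/home/today/patina.js", "reqs": 47, "reqs_per_sec": 1.6,
     "kbytes": 104, "kbytes_per_sec": 3.6},
    {"url": "/", "reqs": 103, "reqs_per_sec": 3.4,
     "kbytes": 2983, "kbytes_per_sec": 99.4},
    {"url": "/tickerdata/story2.dat", "reqs": 56, "reqs_per_sec": 1.9,
     "kbytes": 239, "kbytes_per_sec": 8.0},
]
for row in sort_rows(sample_rows, "b"):
    print(row["kbytes"], row["url"])
```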


Additionally, you can press d during runtime to switch the list of displayed
items between URLs, IPs, and REFERRERs. URLs is the default, and simply
groups together hits on your site and provides collated stats for each one.
IPs, similarly, groups hits from each IP and shows you stats for it. So you
can see how much bandwidth is being used by any given IP. REFERRERs is handy
if you want to see where your traffic is coming from. The stats here reflect
how many pages/kbytes have been served as a result of a particular referrer.
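
Switching the display amounts to grouping the same remembered hits by a
different field. The Python sketch below shows the idea; DISPLAY_MODES,
next_mode and group_hits are hypothetical names, the hit dictionaries are an
invented representation, and the URL, IP, referrer cycling order is assumed.

```python
from collections import Counter

DISPLAY_MODES = ("url", "ip", "referrer")

def next_mode(mode):
    """Return the mode that follows the current one, wrapping around."""
    index = DISPLAY_MODES.index(mode)
    return DISPLAY_MODES[(index + 1) % len(DISPLAY_MODES)]

def group_hits(hits, mode):
    """Collate request and byte counts by the field the current mode selects."""
    reqs, size = Counter(), Counter()
    for hit in hits:
        reqs[hit[mode]] += 1
        size[hit[mode]] += hit["bytes"]
    return {key: (reqs[key], size[key]) for key in reqs}

hits = [
    {"url": "/", "ip": "10.0.0.1", "referrer": "www.example.com/", "bytes": 500},
    {"url": "/", "ip": "10.0.0.2", "referrer": "www.example.com/", "bytes": 500},
    {"url": "/a", "ip": "10.0.0.1", "referrer": "-", "bytes": 200},
]
mode = "url"
print(group_hits(hits, mode))
mode = next_mode(mode)                    # now grouping by IP
print(group_hits(hits, mode))
```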

To hold the current screen at any time, press p - statistics will still be
generated in the background, but whatever is displayed at the current time
is kept onscreen until you press p again.


Ignore the marker (*) beside the list of URLs/IPs/Referrers at the moment.
It doesn't do anything yet.


Bug reports and patches are very welcome. Please send any comments to:
Chris Elsworth <chris@shagged.org>
