.\" $Id$
.TH LAT_CTX 8 "$Date$" "(c)1994 Larry McVoy" "LMBENCH"
.SH NAME
lat_ctx \- context switching benchmark
.SH SYNOPSIS
.B lat_ctx 
.I [-s size_in_kbytes]
.I #procs [#procs ...]
.SH DESCRIPTION
.B lat_ctx
measures context switching time for any reasonable
number of processes of any reasonable size.
The processes are connected in a ring of Unix pipes.  Each process
reads a token from its pipe, possibly does some work, and then writes
the token to the next process.
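.PP
The ring described above can be sketched in a few lines of Python.
This is an illustration only, not the code that
.B lat_ctx
itself uses: the function name, the lap count, and the one-byte token are
assumptions, and the figure it returns still includes pipe and interpreter
overhead.
.sp
.ft CB
.nf
```python
import os
import time


def ring_hop_seconds(nprocs, laps=200):
    """Pass a one-byte token around a ring of nprocs processes.

    Returns the mean wall-clock seconds per hop. Hypothetical helper:
    the result still includes pipe and interpreter overhead.
    """
    # Process i reads pipes[i] and writes pipes[(i + 1) % nprocs].
    pipes = [os.pipe() for _ in range(nprocs)]

    def keep_only(rfd, wfd):
        # Close every descriptor this process does not use, so that
        # end-of-file can travel around the ring at shutdown.
        for r, w in pipes:
            if r != rfd:
                os.close(r)
            if w != wfd:
                os.close(w)

    children = []
    for i in range(1, nprocs):
        pid = os.fork()
        if pid == 0:
            try:
                rfd, wfd = pipes[i][0], pipes[(i + 1) % nprocs][1]
                keep_only(rfd, wfd)
                while True:
                    token = os.read(rfd, 1)
                    if not token:
                        break
                    os.write(wfd, token)
            finally:
                os._exit(0)
        children.append(pid)

    # The parent is process 0 and injects the token.
    rfd, wfd = pipes[0][0], pipes[1 % nprocs][1]
    keep_only(rfd, wfd)
    start = time.perf_counter()
    for _ in range(laps):
        os.write(wfd, b"x")
        os.read(rfd, 1)
    elapsed = time.perf_counter() - start

    os.close(wfd)  # end-of-file now propagates from child to child
    for pid in children:
        os.waitpid(pid, 0)
    os.close(rfd)
    return elapsed / (laps * nprocs)
```
.fi
.ft
.PP
Calling ring_hop_seconds(4) forks three children and times 200 laps of a
four-process ring.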
.PP
Processes may vary in number.  Smaller numbers of processes result in
faster context switches.  More than 20 processes are not supported.
.PP
Processes may vary in size.  A size of zero is the baseline process that
does nothing except pass the token on to the next process.  A process size
of greater than zero means that the process does some work before passing
on the token.  The work is simulated as the summing up of an array of the
specified size.  The summing is an unrolled loop of about 2.7 thousand
instructions.
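.PP
The data side of this simulated work can be illustrated as follows.
The sketch assumes hypothetical helper names and only touches the data:
an interpreted loop cannot reproduce the unrolled loop or its effect on the
instruction cache.
.sp
.ft CB
.nf
```python
from array import array


def make_work_buffer(size_kbytes):
    """Return an integer array occupying about size_kbytes kilobytes."""
    item_size = array("i").itemsize
    return array("i", [1]) * (size_kbytes * 1024 // item_size)


def do_work(buf):
    """Touch every element once by summing the whole array."""
    return sum(buf)
```
.fi
.ft
.PP
A size of zero gives an empty buffer, which corresponds to the baseline
process that only passes the token on.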
.PP
The effect is that both the data and the instruction cache
get polluted by some amount before the token is passed on.  The data 
cache gets polluted by approximately the process ``size''.  The instruction
cache gets polluted by a constant amount, approximately 2.7
thousand instructions.  
.PP
The pollution of the caches results in larger context switching times for
the larger processes.  This may be confusing because the benchmark takes
pains to measure only the context switch time, not including the overhead
of doing the work.  The subtle point is that the overhead is measured using
hot caches.  As the number and size of the processes increase, the caches
become more and more polluted until the set of processes no longer fits.
The context switch times go up because a context switch is defined as the
switch time plus the time it takes to restore all of the process state,
including cache state.  This means that the switch includes the time for
the cache misses on larger processes.
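.PP
The arithmetic implied here can be written out as a small sketch.
It assumes that the reported figure is the measured time per token hop minus
the work overhead measured with hot caches; the function and parameter names
are hypothetical and the input values in the usage note are made up.
.sp
.ft CB
.nf
```python
def context_switch_us(total_us, hops, overhead_us):
    """Return the per-hop time with the hot-cache work overhead removed.

    total_us    -- elapsed time for the whole run, in microseconds
    hops        -- number of times the token was passed on
    overhead_us -- cost of one unit of work measured with hot caches
    """
    if hops <= 0:
        raise ValueError("hops must be positive")
    return total_us / hops - overhead_us
```
.fi
.ft
.PP
For example, context_switch_us(2500.0, 10, 179.0) yields 71.0.
Any cache misses caused by switching between large processes remain in the
result, because only the hot-cache overhead is subtracted.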
.SH OUTPUT
Output format is intended as input to \fBxgraph\fP or some similar program.
The format is multi-line: the first line is a title that specifies the
size and non-context-switching overhead of the test.  Each subsequent
line is a pair of numbers that indicates the number of processes and
the cost of a context switch.  The overhead and the context switch times are
in microseconds.  The numbers below are for a SPARCstation 2.
.sp
.ft CB
.nf
"size=0 ovr=179
2 71
4 104
8 134
16 333
20 438
.br
.fi
.ft
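.PP
For post-processing without \fBxgraph\fP, output in the format shown above
can be read with a short parser.
This is a minimal sketch based only on that sample; the function name is
hypothetical, and the title fields are kept as strings.
.sp
.ft CB
.nf
```python
def parse_lat_ctx(text):
    """Split lat_ctx output into (title fields, data points).

    The title line starts with a double quote and holds key=value
    fields. Each later line is "<processes> <microseconds>".
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    title = lines[0].lstrip('"')
    header = dict(field.split("=", 1) for field in title.split())
    points = []
    for ln in lines[1:]:
        nprocs, usec = ln.split()
        points.append((int(nprocs), float(usec)))
    return header, points
```
.fi
.ft
.PP
Applied to the sample above, the title fields are size and ovr, and the
first data point is the pair (2, 71.0).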
.SH BUGS
The numbers produced by this benchmark are somewhat inaccurate; they vary
by about 10 to 15% from run to run.  A series of runs may be done and the
lowest numbers reported.  The lower the number, the more accurate the result.
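.PP
Keeping the lowest number from a series of runs could be done as sketched
below.
The helper name and the representation of a run as a list of
(processes, microseconds) pairs are assumptions made for illustration.
.sp
.ft CB
.nf
```python
def best_of_runs(runs):
    """Keep the lowest time seen for each process count.

    runs is a list of runs; each run is a list of
    (processes, microseconds) pairs.
    """
    best = {}
    for run in runs:
        for nprocs, usec in run:
            if nprocs not in best or usec < best[nprocs]:
                best[nprocs] = usec
    return sorted(best.items())
```
.fi
.ft
.PP
The minimum is taken separately for each process count, so the reported
series may combine values from different runs.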
.PP
The inaccuracies are possibly caused by interaction between the
VM system and the processor caches.  It is possible that the
benchmark processes are sometimes laid out in memory such that there are
fewer TLB/cache conflicts than at other times.  This is pure speculation
on my part.
.SH ACKNOWLEDGEMENT
Funding for the development of
this tool was provided by Sun Microsystems Computer Corporation.
.SH "SEE ALSO"
lmbench(8).