.\" $Id$
.TH LAT_CTX 8 "$Date$" "(c)1994 Larry McVoy" "LMBENCH"
.SH NAME
lat_ctx \- context switching benchmark
.SH SYNOPSIS
.B lat_ctx
.I [-s size_in_kbytes]
.I #procs [#procs ...]
.SH DESCRIPTION
.B lat_ctx
measures context switching time for any reasonable
number of processes of any reasonable size.
The processes are connected in a ring of Unix pipes. Each process
reads a token from its pipe, possibly does some work, and then writes
the token to the next process.
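.PP
The ring described above can be sketched in C as follows. This is an
illustration only, not the lat_ctx source: ring_pass, RING_MAX and the
one-byte token are hypothetical names chosen for the example, no work is
done between the read and the write, and nothing is timed.
.sp
.ft CB
.nf
```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define RING_MAX 20     /* assumed cap, mirroring the documented limit */

/*
 * Hypothetical helper: pass a one-byte token `laps` times around a ring
 * of `nprocs` processes joined by pipes. Pipe i carries the token from
 * process i to process (i + 1) % nprocs; the caller acts as process 0.
 * Returns 0 on success, -1 on any error.
 */
int ring_pass(int nprocs, int laps)
{
    int fds[RING_MAX][2];
    pid_t pids[RING_MAX];
    int i, j, made, forked = 0, ok = 1;
    char token = 't';

    if (nprocs < 2 || nprocs > RING_MAX || laps < 1)
        return -1;

    /* Create every pipe before forking so each child inherits all of them. */
    for (made = 0; made < nprocs; made++) {
        if (pipe(fds[made]) < 0) {
            ok = 0;
            break;
        }
    }

    for (i = 1; ok && i < nprocs; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            ok = 0;
            break;
        }
        if (pid == 0) {
            /* Child i keeps only its own read end and write end. */
            for (j = 0; j < nprocs; j++) {
                if (j != i - 1)
                    close(fds[j][0]);
                if (j != i)
                    close(fds[j][1]);
            }
            /* Forward the token until the upstream pipe reports EOF. */
            while (read(fds[i - 1][0], &token, 1) == 1)
                if (write(fds[i][1], &token, 1) != 1)
                    _exit(1);
            _exit(0);
        }
        pids[forked++] = pid;
    }

    /* Process 0 injects the token and waits for it to come back. */
    for (i = 0; ok && i < laps; i++)
        if (write(fds[0][1], &token, 1) != 1 ||
            read(fds[nprocs - 1][0], &token, 1) != 1)
            ok = 0;

    /* Closing the caller's ends lets EOF cascade around the ring. */
    for (j = 0; j < made; j++) {
        close(fds[j][0]);
        close(fds[j][1]);
    }
    for (j = 0; j < forked; j++) {
        int status;
        if (waitpid(pids[j], &status, 0) < 0 ||
            !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            ok = 0;
    }
    return ok ? 0 : -1;
}
```
.fi
.ft
.PP
For example, ring_pass(8, 1000) sends the token around an eight-process
ring one thousand times. Every hop blocks one process in read() and wakes
the next, which is what forces a context switch per hop.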
.PP
The number of processes may vary. Smaller numbers of processes result in
faster context switches. More than 20 processes are not supported.
.PP
The size of the processes may vary. A size of zero is the baseline process that
does nothing except pass the token on to the next process. A process size
greater than zero means that the process does some work before passing
on the token. The work is simulated by summing an array of the
specified size. The summing is an unrolled loop of about 2.7 thousand
instructions.
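.PP
A minimal sketch of this kind of work, in C. The helper sum_words is
hypothetical and unrolls only four ways to stay readable; the loop in
lat_ctx itself is unrolled much further, to roughly 2.7 thousand
instructions, and its exact form is not shown here.
.sp
.ft CB
.nf
```c
#include <stddef.h>

/*
 * Hypothetical helper: touch every word of `data` by summing it,
 * four words per loop iteration. Unsigned arithmetic is used so that
 * overflow wraps instead of being undefined.
 */
unsigned long sum_words(const unsigned int *data, size_t nwords)
{
    unsigned long sum = 0;
    size_t i = 0;

    /* Unrolled body: handles groups of four words. */
    for (; i + 4 <= nwords; i += 4)
        sum += (unsigned long)data[i] + data[i + 1]
             + data[i + 2] + data[i + 3];

    /* Remainder: at most three trailing words. */
    for (; i < nwords; i++)
        sum += data[i];

    return sum;
}
```
.fi
.ft
.PP
Reading the whole array is what pulls the process ``size'' worth of data
through the data cache; the length of the unrolled code is what occupies
the instruction cache.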
.PP
The effect is that both the data and the instruction caches
are polluted by some amount before the token is passed on. The data
cache is polluted by approximately the process ``size''. The instruction
cache is polluted by a constant amount, approximately 2.7
thousand instructions.
.PP
The pollution of the caches results in larger context switching times for
the larger processes. This may be confusing because the benchmark takes
pains to measure only the context switch time, not including the overhead
of doing the work. The subtle point is that the overhead is measured using
hot caches. As the number and size of the processes increase, the caches
become more and more polluted until the set of processes no longer fits. The
context switch times go up because a context switch is defined as the switch
time plus the time it takes to restore all of the process state, including
cache state. This means that the switch includes the time for the cache
misses on larger processes.
.SH OUTPUT
The output format is intended as input to \fBxgraph\fP or a similar program.
The format is multi-line. The first line is a title that specifies the
size and non-context-switching overhead of the test. Each subsequent
line is a pair of numbers giving the number of processes and
the cost of a context switch. The overhead and the context switch times are
in microseconds. The numbers below are for a SPARCstation 2.
.sp
.ft CB
.nf
"size=0 ovr=179
2 71
4 104
8 134
16 333
20 438
.br
.fi
.ft
.SH BUGS
The numbers produced by this benchmark are somewhat inaccurate; they vary
by about 10 to 15% from run to run. A series of runs may be done and the
lowest numbers reported. The lower the number, the more accurate the result.
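.PP
Picking the lowest number from a series of runs might look like the
following C sketch. The helper lowest_usecs is hypothetical and is not
part of lmbench; it assumes the times for one process count have already
been collected, in microseconds, into an array.
.sp
.ft CB
.nf
```c
#include <stddef.h>

/*
 * Hypothetical helper: return the smallest of `nruns` context switch
 * times, or -1.0 if no runs were supplied.
 */
double lowest_usecs(const double *usecs, size_t nruns)
{
    double best;
    size_t i;

    if (usecs == NULL || nruns == 0)
        return -1.0;

    best = usecs[0];
    for (i = 1; i < nruns; i++)
        if (usecs[i] < best)
            best = usecs[i];
    return best;
}
```
.fi
.ft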
.PP
The inaccuracies are possibly due to interactions between the
VM system and the processor caches. It is possible that the
benchmark processes are sometimes laid out in memory such that there are fewer
TLB/cache conflicts than at other times. This is pure speculation on my part.
.SH ACKNOWLEDGEMENT
Funding for the development of
this tool was provided by Sun Microsystems Computer Corporation.
.SH "SEE ALSO"
lmbench(8).