.\" $Id$
.TH BW_MEM 8 "$Date$" "(c)1994 Larry McVoy" "LMBENCH"
.SH NAME
bw_mem \- time memory bandwidth
.SH SYNOPSIS
.B bw_mem
.I size
.I rd|wr|rdwr|cp|fwr|frd|fcp|bzero|bcopy
.I [unaligned]
.SH DESCRIPTION
.B bw_mem
measures memory bandwidth for a variety of memory operations, such
as read, write, and copy.  Results are reported in megabytes per
second.
.P
The
.B size
specification may end with ``k'' or ``m'' to mean
kilobytes (* 1024) or megabytes (* 1024 * 1024).
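.P
As a rough illustration of the suffix rule above, the following C sketch
converts a size argument to a byte count.  The helper name
parse_size is hypothetical and is not taken from the benchmark
source; it assumes only the lowercase suffixes described here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not from the benchmark source): converts a size
 * argument such as "512k" or "8m" to a byte count. */
static size_t parse_size(const char *s)
{
    char *end;
    unsigned long long n;

    assert(s != NULL);
    n = strtoull(s, &end, 10);      /* leading decimal digits */

    if (*end == 'k')
        n *= 1024ULL;               /* kilobytes */
    else if (*end == 'm')
        n *= 1024ULL * 1024ULL;     /* megabytes */
    return (size_t)n;
}
```

With this sketch, parse_size("8m") yields 8388608 and parse_size("512k")
yields 524288.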
.P
The optional
.B unaligned
parameter ensures that the source and destination arrays
are not aligned, and it only affects the
.IR rdwr ,
.IR cp ,
and
.I fcp
benchmarks.
.P
The illustrative code fragments included in the descriptions
below are simplifications of the actual code in the
benchmark.  The benchmark code repeats the
desired operation many times inside the inner loop to ensure that
loop overhead is less than 1%.
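.P
To make the idea of repeating the operation inside the inner loop
concrete, here is a minimal C sketch of a hand-unrolled strided read.
The function name and the unrolling factor of four are assumptions
chosen for brevity; the text above says only that the real benchmark
uses a large number of repetitions.

```c
#include <assert.h>
#include <stddef.h>

/* Sums every fourth element of array[0..n-1].
 * Hypothetical sketch: the body is unrolled four times, so one loop
 * test and one increment are amortized over four loads.
 * n must be a multiple of 16 (4 loads x stride 4). */
static int sum_stride4_unrolled(const int *array, size_t n)
{
    int sum = 0;
    size_t i;

    assert(n % 16 == 0);
    for (i = 0; i < n; i += 16) {
        sum += array[i];
        sum += array[i + 4];
        sum += array[i + 8];
        sum += array[i + 12];
    }
    return sum;
}
```

A larger unrolled body shrinks the share of time spent on the loop
test and increment, which is how the overhead can be kept small.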
.TP
.B "rd"
measures the bandwidth for reading data into the processor.
It computes the sum of every fourth element of an integer
array.
.IP ""
\fC for (i = 0; i < N; i += 4) sum += array[i];\fR
.TP
.B "wr"
measures the bandwidth for writing data to memory.
It assigns a constant value to every fourth element
of an integer array.
.IP ""
\fC for (i = 0; i < N; i += 4) array[i] = 1;\fR
.TP
.B "rdwr"
measures the bandwidth for reading data into the processor and
then writing data back to the same memory location.
For every fourth element in an array it adds the current
value to a running sum before assigning a new (constant)
value to the element.
.IP ""
\fC for (i = 0; i < N; i += 4) {
.br
sum += array[i];
.br
array[i] = 1;
.br
}\fR
.TP
.B "cp"
measures the bandwidth for copying data from one location
to another.
It copies one array to another, accessing only every fourth word.
.IP ""
\fC for (i = 0; i < N; i += 4) dest[i] = source[i];\fR
.TP
.B "frd"
measures the bandwidth for reading data into the processor.
It computes the sum of an array of integer values.
.IP ""
\fC for (i = 0; i < N; i++) sum += array[i];\fR
.TP
.B "fwr"
measures the bandwidth for writing data to memory.
It assigns a constant value to each member of an array of
integer values.
.IP ""
\fC for (i = 0; i < N; i++) array[i] = 1;\fR
.TP
.B "fcp"
measures the bandwidth for copying data from one location to another.
It does an array copy: dest[i] = source[i].
.IP ""
\fC for (i = 0; i < N; i++) dest[i] = source[i];\fR
.TP
.B "bzero"
measures how fast the system can
.I bzero
memory.
.TP
.B "bcopy"
measures how fast the system can
.I bcopy
data.
.SH OUTPUT
Output format is \f(CB"%0.2f %.2f\\n", megabytes,
megabytes_per_second\fP, i.e.,
.sp
.ft CB
8.00 25.33
.ft
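.P
The output line above could be produced by something like the
following C sketch.  The function name, its parameters, and the choice
of 1024 * 1024 bytes per megabyte (mirroring the ``m'' size suffix) are
assumptions; the benchmark's own timing code may compute these values
differently.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: formats one result line into buf.
 * bytes   - amount of memory the operation covered
 * seconds - elapsed time measured by some timing harness
 * Returns the value returned by snprintf. */
static int format_result(char *buf, size_t len, double bytes, double seconds)
{
    double megabytes;
    double megabytes_per_second;

    assert(seconds > 0.0);
    megabytes = bytes / (1024.0 * 1024.0);   /* assumed MB definition */
    megabytes_per_second = megabytes / seconds;
    return snprintf(buf, len, "%0.2f %.2f\n",
                    megabytes, megabytes_per_second);
}
```

For example, 8 megabytes processed in half a second would be formatted
as "8.00 16.00".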
.SH MEMORY UTILIZATION
This benchmark can move up to three times the requested memory.
Bcopy will use 2-3 times as much memory bandwidth:
there is one read from the source and a write to the destination.  The
write usually results in a cache line read and then a write back of
the cache line at some later point.  Memory utilization might be reduced
by 1/3 if the processor architecture implemented ``load cache line''
and ``store cache line'' instructions (as well as ``getcachelinesize'').
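.P
The 2-3 times figure can be checked with simple arithmetic, sketched
here in C.  The function and its write_allocate flag are hypothetical
names used only to model the two cases described above: a destination
write that first reads the cache line, and one that does not.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of memory traffic for copying n bytes.
 * write_allocate != 0: the destination cache line is read before it
 * is written, so the copy moves three times n bytes.
 * write_allocate == 0: the destination is written without that read,
 * so the copy moves two times n bytes. */
static size_t copy_traffic(size_t n, int write_allocate)
{
    size_t src_read, dst_fill, dst_writeback;

    assert(n <= (size_t)-1 / 3);        /* guard against overflow */
    src_read = n;                       /* read of the source */
    dst_fill = write_allocate ? n : 0;  /* cache line read on write */
    dst_writeback = n;                  /* later write back */
    return src_read + dst_fill + dst_writeback;
}
```

Under this model a 1024-byte copy moves 3072 bytes with the extra cache
line read and 2048 bytes without it, which is the 1/3 reduction
mentioned above.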
.SH "SEE ALSO"
lmbench(8).