- .\" $Id$
- .TH BW_MEM_CP 8 "$Date$" "(c)1994 Larry McVoy" "LMBENCH"
- .SH NAME
- bw_mem_cp \- time memory copy speeds
- .SH SYNOPSIS
- .B bw_mem_cp
- .I size
- .I rd|wr|rdwr|cp|fwr|frd|fcp|bzero|bcopy
- .I [unaligned]
- .SH DESCRIPTION
- .B bw_mem_cp
- measures memory bandwidth for a variety of memory operations, such
- as read, write, and copy. Results are reported in megabytes per
- second.
- .P
- The
- .B size
- specification may end with ``k'' or ``m'' to mean
- kilobytes (* 1024) or megabytes (* 1024 * 1024).
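- .P
- As a minimal sketch of the suffix rule above, the conversion could be
- written as follows. The function name parse_size is hypothetical and is
- not taken from the benchmark source; only the lowercase suffixes named
- above are handled.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper (an assumption, not the benchmark's own code):
 * convert a size argument such as "8m" or "512k" into a byte count.
 * A trailing 'k' multiplies by 1024, a trailing 'm' by 1024 * 1024.
 */
size_t parse_size(const char *arg)
{
    char *end;
    unsigned long long n;

    assert(arg != NULL);
    n = strtoull(arg, &end, 10);    /* leading decimal digits */
    if (*end == 'k')
        n *= 1024ULL;               /* kilobytes */
    else if (*end == 'm')
        n *= 1024ULL * 1024ULL;     /* megabytes */
    return (size_t)n;
}
```

- Under these assumptions a size of 8m corresponds to 8388608 bytes
- and 512k to 524288 bytes.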
- .P
- The optional
- .B unaligned
- parameter ensures that the source and destination arrays
- are not aligned, and it only affects the
- .IR rdwr ,
- .IR cp ,
- and
- .I fcp
- benchmarks.
- .P
- The illustrative code fragments included in the descriptions
- below are a simplification of the actual benchmark code.
- The benchmark repeats the desired operation many times inside
- the inner loop to ensure that the loop overhead is less than 1%.
- .TP
- .B "rd"
- measures the bandwidth for reading data into the processor.
- It computes the sum of every fourth element of an integer
- array.
- .IP ""
- \fC for (i = 0; i < N; i += 4) sum += array[i];\fR
- .TP
- .B "wr"
- measures the bandwidth for writing data to memory.
- It assigns a constant value to every fourth element
- of an integer array.
- .IP ""
- \fC for (i = 0; i < N; i += 4) array[i] = 1;\fR
- .TP
- .B "rdwr"
- measures the bandwidth for reading data into the processor and
- then writing data back to the same memory location.
- For every fourth element in an array it adds the current
- value to a running sum before assigning a new (constant)
- value to the element.
- .IP ""
- \fC for (i = 0; i < N; i += 4) {
- .br
- sum += array[i];
- .br
- array[i] = 1;
- .br
- }\fR
- .TP
- .B "cp"
- measures the bandwidth for copying data from one location
- to another.
- It does an array copy that accesses only every fourth word.
- .IP ""
- \fC for (i = 0; i < N; i += 4) dest[i] = source[i];\fR
- .TP
- .B "frd"
- measures the bandwidth for reading data into the processor.
- It computes the sum of an array of integer values.
- .IP ""
- \fC for (i = 0; i < N; i++) sum += array[i];\fR
- .TP
- .B "fwr"
- measures the bandwidth for writing data to memory.
- It assigns a constant value to each member of an array of
- integer values.
- .IP ""
- \fC for (i = 0; i < N; i++) array[i] = 1;\fR
- .TP
- .B "fcp"
- measures the bandwidth for copying data from one location to another.
- It does an array copy: dest[i] = source[i].
- .IP ""
- \fC for (i = 0; i < N; i++) dest[i] = source[i];\fR
- .TP
- .B "bzero"
- measures how fast the system can
- .I bzero
- memory.
- .TP
- .B "bcopy"
- measures how fast the system can
- .I bcopy
- data.
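- .P
- The fragments above can be collected into self-contained C functions,
- sketched below. The function names (bw_rd, bw_wr and so on) and the
- use of a long accumulator are assumptions made for illustration; the
- loops are not unrolled, so unlike the real benchmark they do not keep
- the loop overhead below 1%.

```c
#include <assert.h>
#include <stddef.h>

/* rd: sum every fourth element of an integer array. */
long bw_rd(const int *array, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i += 4)
        sum += array[i];
    return sum;
}

/* wr: assign a constant to every fourth element. */
void bw_wr(int *array, size_t n)
{
    for (size_t i = 0; i < n; i += 4)
        array[i] = 1;
}

/* rdwr: add every fourth element to a sum, then overwrite it. */
long bw_rdwr(int *array, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i += 4) {
        sum += array[i];
        array[i] = 1;
    }
    return sum;
}

/* cp: copy every fourth word from source to dest. */
void bw_cp(int *dest, const int *source, size_t n)
{
    assert(dest != NULL && source != NULL);
    for (size_t i = 0; i < n; i += 4)
        dest[i] = source[i];
}

/* frd: sum every element. */
long bw_frd(const int *array, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += array[i];
    return sum;
}

/* fwr: assign a constant to every element. */
void bw_fwr(int *array, size_t n)
{
    for (size_t i = 0; i < n; i++)
        array[i] = 1;
}

/* fcp: copy every element from source to dest. */
void bw_fcp(int *dest, const int *source, size_t n)
{
    assert(dest != NULL && source != NULL);
    for (size_t i = 0; i < n; i++)
        dest[i] = source[i];
}
```

- Returning the sum from the read variants keeps the compiler from
- discarding the loads; whether the benchmark uses the same device is
- not stated here.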
- .SH OUTPUT
- Output format is \f(CB"%0.2f %.2f\\n", megabytes,
- megabytes_per_second\fP, i.e.,
- .sp
- .ft CB
- 8.00 25.33
- .ft
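- .P
- A hedged sketch of how such a line could be produced from a byte
- count and an elapsed time is shown below. The helper name
- format_result is hypothetical, and the sketch assumes a megabyte of
- 1024 * 1024 bytes to match the size suffixes; the benchmark itself
- may use a different constant.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (an assumption, not the benchmark's own code):
 * write "megabytes megabytes_per_second\n" into buf using the format
 * string documented above. Returns the snprintf() result.
 */
int format_result(char *buf, size_t len, double bytes, double seconds)
{
    double megabytes;
    double megabytes_per_second;

    assert(seconds > 0.0);
    megabytes = bytes / (1024.0 * 1024.0);   /* assumed unit */
    megabytes_per_second = megabytes / seconds;
    return snprintf(buf, len, "%0.2f %.2f\n",
                    megabytes, megabytes_per_second);
}
```

- For example, 8388608 bytes moved in 0.5 seconds would be reported as
- 8.00 megabytes at 16.00 megabytes per second.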
- .SH MEMORY UTILIZATION
- This benchmark can move up to three times the requested memory.
- Bcopy will use 2-3 times as much memory bandwidth:
- there is one read from the source and a write to the destination. The
- write usually results in a cache line read and then a write back of
- the cache line at some later point. Memory utilization might be reduced
- by 1/3 if the processor architecture implemented ``load cache line''
- and ``store cache line'' instructions (as well as ``getcachelinesize'').
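- .P
- The arithmetic behind the 2-3 times figure can be sketched as follows.
- The function copy_traffic and its write_allocate flag are hypothetical
- names introduced for illustration; the model simply counts one source
- read, an optional cache line fill of the destination, and one write
- back.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model (an assumption, not measured behaviour): bytes
 * moved between memory and the processor when copying n bytes.
 * write_allocate is 1 if each destination cache line is read before
 * it is written, 0 if it can be stored without being read first.
 */
size_t copy_traffic(size_t n, int write_allocate)
{
    size_t source_read;
    size_t dest_fill;
    size_t dest_writeback;

    assert(write_allocate == 0 || write_allocate == 1);
    source_read = n;                       /* read the source */
    dest_fill = write_allocate ? n : 0;    /* cache line read before write */
    dest_writeback = n;                    /* later write back */
    return source_read + dest_fill + dest_writeback;
}
```

- With the fill the model gives three times the requested size; without
- it, two times, which is the one third reduction described above.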
- .SH "SEE ALSO"
- lmbench(8).