.\" $Id$
.TH BW_MEM_CP 8 "$Date$" "(c)1994 Larry McVoy" "LMBENCH"
.SH NAME
bw_mem_cp \- time memory copy speeds
.SH SYNOPSIS
.B bw_mem_cp
.I size
.I rd|wr|rdwr|cp|fwr|frd|fcp|bzero|bcopy
.I [unaligned]
.SH DESCRIPTION
.B bw_mem
measures memory bandwidth for a variety of memory operations, such
as read, write, and copy.  Results are reported in megabytes per
second.
.P
The
.I size
argument may end with ``k'' or ``m'' to specify
kilobytes (size * 1024) or megabytes (size * 1024 * 1024).
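.P
The suffix rule can be sketched as a small parser.
This is a hypothetical helper, \fCparse_size\fR, not the benchmark's own code.
Accepting upper-case suffixes and returning 0 for malformed input are assumptions.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: convert "4096", "512k", or "8m" into a byte count.
 * A trailing 'k' multiplies by 1024 and a trailing 'm' by 1024 * 1024.
 * Upper-case suffixes are accepted here as an assumption.
 * Returns 0 for malformed input. */
size_t parse_size(const char *s)
{
    char *end;
    unsigned long n = strtoul(s, &end, 10);

    if (end == s)
        return 0;                       /* no digits at all */
    switch (*end) {
    case '\0':
        return (size_t)n;               /* plain byte count */
    case 'k': case 'K':
        return end[1] == '\0' ? (size_t)n * 1024 : 0;
    case 'm': case 'M':
        return end[1] == '\0' ? (size_t)n * 1024 * 1024 : 0;
    default:
        return 0;                       /* unknown suffix */
    }
}
```

For example, \fCparse_size("8m")\fR yields 8388608 bytes.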
.P
The optional
.B unaligned
parameter forces the source and destination arrays to be unaligned;
it affects only the
.IR rdwr ,
.IR cp ,
and
.I fcp
benchmarks.
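.P
One way to obtain unaligned arrays is to offset the destination pointer by a few bytes past what
.I malloc
returned.
The sketch below is hypothetical: the \fCUNALIGN_OFFSET\fR value of 4 bytes, the \fCstruct buffers\fR layout, and the function names are assumptions, not the benchmark's actual method.

```c
#include <stdlib.h>

/* Hypothetical byte offset applied to the destination when "unaligned"
 * is requested. Four bytes keeps int accesses legal on common platforms
 * while breaking the 8- and 16-byte alignment malloc normally provides. */
#define UNALIGN_OFFSET 4

struct buffers {
    char *src_base;   /* pointers returned by malloc; pass these to free() */
    char *dst_base;
    int *src;         /* pointers the benchmark loops actually use */
    int *dst;
};

int alloc_buffers(struct buffers *b, size_t bytes, int unaligned)
{
    /* Over-allocate so the offset pointer still has `bytes` usable bytes. */
    b->src_base = malloc(bytes + UNALIGN_OFFSET);
    b->dst_base = malloc(bytes + UNALIGN_OFFSET);
    if (b->src_base == NULL || b->dst_base == NULL) {
        free(b->src_base);
        free(b->dst_base);
        return -1;
    }
    b->src = (int *)b->src_base;
    b->dst = (int *)(b->dst_base + (unaligned ? UNALIGN_OFFSET : 0));
    return 0;
}

void free_buffers(struct buffers *b)
{
    free(b->src_base);
    free(b->dst_base);
}
```

Keeping the original
.I malloc
pointers separately matters because
.I free
must receive exactly the pointer that
.I malloc
returned.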
.P
The code fragments in the descriptions below are simplified
versions of the code the benchmark actually runs.
The real inner loop repeats the operation many times per
iteration so that loop overhead is less than 1% of the
measured time.
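.P
The unrolling idea can be sketched for the
.I rd
loop.
Here each loop test and increment is amortized over eight loads.
The unroll factor of 8 and the function name \fCrd_unrolled\fR are illustrative assumptions; the benchmark repeats the operation many more times per iteration.

```c
#include <stddef.h>

/* Simplified "rd" kernel unrolled by 8: each iteration performs eight
 * strided loads, so the loop overhead is paid once per 32 elements.
 * In this sketch n must be a multiple of 32 (8 loads x stride 4). */
long rd_unrolled(const int *array, size_t n)
{
    long sum = 0;

    for (size_t i = 0; i < n; i += 32) {
        /* The (long) cast makes the whole chain add in long arithmetic. */
        sum += (long)array[i] + array[i + 4]  + array[i + 8]  + array[i + 12]
                 + array[i + 16] + array[i + 20] + array[i + 24] + array[i + 28];
    }
    return sum;
}
```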
.TP
.B "rd"
measures the bandwidth for reading data into the processor.
It computes the sum of every fourth element of an integer
array.
.IP ""
\fC	for (i = 0; i < N; i += 4) sum += array[i];\fR
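.P
A timing harness turns such a loop into a bandwidth figure.
The sketch below is hypothetical: it times repeated passes of the
.I rd
loop with the POSIX
.I clock_gettime
call and divides megabytes by elapsed seconds.
Counting the full array size as the bytes moved per pass, and taking a megabyte as 1024 * 1024 bytes, are assumptions.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <time.h>

/* Storing the final sum here stops the compiler from discarding it. */
volatile long rd_sink;

static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Hypothetical harness: run `reps` passes of the strided read loop over
 * an n-element array and return megabytes per second. */
double rd_bandwidth(const int *array, size_t n, int reps)
{
    long sum = 0;
    double start = now_seconds();

    for (int r = 0; r < reps; r++)
        for (size_t i = 0; i < n; i += 4)
            sum += array[i];

    double elapsed = now_seconds() - start;
    rd_sink = sum;

    double megabytes = (double)reps * n * sizeof(int) / (1024.0 * 1024.0);
    return elapsed > 0 ? megabytes / elapsed : 0.0;
}
```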
.TP
.B "wr"
measures the bandwidth for writing data to memory.  
It assigns a constant value to every fourth element
of an integer array.
.IP ""
\fC	for (i = 0; i < N; i += 4) array[i] = 1;\fR
.TP
.B "rdwr"
measures the bandwidth for reading data from memory and
then writing data back to the same memory location.
For every fourth element of an integer array, it adds the
current value to a running sum and then assigns a new
(constant) value to the element.
.IP ""
\fC	for (i = 0; i < N; i += 4) { 
.br
		sum += array[i];
.br
		array[i] = 1;
.br
	}\fR
.TP
.B "cp"
measures the bandwidth for copying data from one location
to another.
It copies one integer array to another, accessing only every fourth element.
.IP ""
\fC	for (i = 0; i < N; i += 4) dest[i] = source[i];\fR
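.P
The strided
.IR wr ,
.IR rdwr ,
and
.I cp
fragments above can be written as compilable functions.
The function names are hypothetical.
Assuming 4-byte integers and cache lines of at least 16 bytes, a stride of four elements still touches every cache line.

```c
#include <stddef.h>

/* "wr": store a constant into every fourth element. */
void wr_kernel(int *array, size_t n)
{
    for (size_t i = 0; i < n; i += 4)
        array[i] = 1;
}

/* "rdwr": read every fourth element, then overwrite it in place.
 * Returning the sum keeps the reads from being optimized away. */
long rdwr_kernel(int *array, size_t n)
{
    long sum = 0;

    for (size_t i = 0; i < n; i += 4) {
        sum += array[i];    /* read the old value ... */
        array[i] = 1;       /* ... then write a new one */
    }
    return sum;
}

/* "cp": copy every fourth element from source to dest. */
void cp_kernel(int *dest, const int *source, size_t n)
{
    for (size_t i = 0; i < n; i += 4)
        dest[i] = source[i];
}
```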
.TP
.B "frd"
measures the bandwidth for reading data into the processor.  
It computes the sum of an array of integer values.
.IP ""
\fC	for (i = 0; i < N; i++) sum += array[i];\fR
.TP
.B "fwr"
measures the bandwidth for writing data to memory.  
It assigns a constant value to each member of an array of 
integer values.
.IP ""
\fC	for (i = 0; i < N; i++) array[i] = 1;\fR
.TP
.B "fcp"
measures the bandwidth for copying data from one location to another.
It copies every element of a source integer array to a
destination array.
.IP ""
\fC	for (i = 0; i < N; i++) dest[i] = source[i];\fR
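.P
The full-stride variants differ from the strided ones only in visiting every element.
The sketch below shows
.IR frd ,
.IR fwr ,
and
.I fcp
as compilable functions; the names are hypothetical.

```c
#include <stddef.h>

/* "frd": sum every element. Returning the sum keeps the loads alive. */
long frd_kernel(const int *array, size_t n)
{
    long sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += array[i];
    return sum;
}

/* "fwr": store a constant into every element. */
void fwr_kernel(int *array, size_t n)
{
    for (size_t i = 0; i < n; i++)
        array[i] = 1;
}

/* "fcp": copy every element from source to dest. */
void fcp_kernel(int *dest, const int *source, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dest[i] = source[i];
}
```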
.TP
.B "bzero"
measures the bandwidth of the system's
.I bzero
routine, which fills a block of memory with zeros.
.TP
.B "bcopy"
measures the bandwidth of the system's
.I bcopy
routine, which copies a block of memory from one location to another.
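.P
.I bzero
and
.I bcopy
are BSD routines; standard C provides equivalent behavior through
.I memset
and
.IR memmove .
The sketch below only illustrates that equivalence with hypothetical wrapper names; it does not claim the benchmark calls these functions.
Note that
.I bcopy
takes its source argument first, the reverse of
.IR memmove .

```c
#include <stddef.h>
#include <string.h>

/* Same effect as bzero(buf, len): set len bytes to zero. */
void zero_bytes(void *buf, size_t len)
{
    memset(buf, 0, len);
}

/* Same effect as bcopy(src, dst, len): note the argument order is
 * reversed relative to memmove, and overlapping regions are allowed. */
void copy_bytes(const void *src, void *dst, size_t len)
{
    memmove(dst, src, len);
}
```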
.SH OUTPUT
Output format is \f(CB"%0.2f %.2f\\n", megabytes,
megabytes_per_second\fP; for example:
.sp
.ft CB
8.00 25.33
.ft
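.P
The output line can be produced with the documented format string.
The helper \fCformat_result\fR below is hypothetical; it assumes the caller supplies the megabytes moved and a positive elapsed time in seconds.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write "<megabytes> <megabytes_per_second>\n"
 * into buf using the documented format. seconds must be > 0.
 * Returns the number of characters snprintf would write. */
int format_result(char *buf, size_t len, double megabytes, double seconds)
{
    double mb_per_sec = megabytes / seconds;

    return snprintf(buf, len, "%0.2f %.2f\n", megabytes, mb_per_sec);
}
```

For instance, 8 megabytes moved in 0.5 seconds formats as the line \fC8.00 16.00\fR.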
.SH MEMORY UTILIZATION
This benchmark can move up to three times the requested amount of memory.
A copy such as
.I bcopy
uses two to three times as much memory bandwidth as the amount of data copied:
there is one read from the source and one write to the destination.
The write usually causes the destination cache line to be read first,
and the line is then written back to memory at some later point.
Memory traffic might be reduced by one third if the processor architecture
implemented ``load cache line'' and ``store cache line'' instructions
(as well as ``getcachelinesize'').
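.P
The two-to-three-times figure can be expressed as a rough traffic model.
The sketch below is a simplification under stated assumptions: it ignores prefetching, partial cache lines, and write-combining, and the function name \fCcopy_traffic\fR is hypothetical.

```c
#include <stddef.h>

/* Rough model of bytes crossing the memory bus when copying `bytes` bytes:
 * one read of the source plus one write of the destination (2x); when the
 * cache allocates on write, each destination line is also read before it
 * is overwritten (3x). */
size_t copy_traffic(size_t bytes, int write_allocate)
{
    size_t passes = write_allocate ? 3 : 2;

    return bytes * passes;
}
```

Under this model, copying 8 megabytes moves 16 megabytes without write-allocate and 24 megabytes with it; removing the extra destination read is the one-third saving described above.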
.SH "SEE ALSO"
lmbench(8).