.\" $Id$ .TH BW_MEM_CP 8 "$Date$" "(c)1994 Larry McVoy" "LMBENCH" .SH NAME bw_mem_cp \- time memory copy speeds .SH SYNOPSIS .B bw_mem_cp .I size .I rd|wr|rdwr|cp|fwr|frd|fcp|bzero|bcopy .I [unaligned] .SH DESCRIPTION .B bw_mem measures memory bandwidth for a variety of memory operations, such as read, write, and copy. Results are reported in megabytes per second. .P The .B size specification may end with ``k'' or ``m'' to mean kilobytes (* 1024) or megabytes (* 1024 * 1024). .P The optional .B unaligned parameter ensures that the source and destination arrays are not aligned, and it only affects the .IR rdwr , .IR cp , and .I fcp benchmarks. .P The illustrative code fragments included in the descriptions below are a simplification of the actual code included in the benchmark. The benchmark code includes a large number of the desired operation inside the inner loop to ensure that the loop overhead is less than 1%. .TP .B "rd" measures the bandwidth for reading data into the processor. It computes the sum of an every fourth element of an integer array. .IP "" \fC for (i = 0; i < N; i += 4) sum += array[i];\fR .TP .B "wr" measures the bandwidth for writing data to memory. It assigns a constant value to every fourth element of an integer array. .IP "" \fC for (i = 0; i < N; i += 4) array[i] = 1;\fR .TP .B "rdwr" measures the bandwidth for reading data into memory and then writing data to the same memory location. For every fourth element in an array it adds the current value to a running sum before assigning a new (constant) value to the element. .IP "" \fC for (i = 0; i < N; i += 4) { .br sum += array[i]; .br array[i] = 1; .br }\fR .TP .B "cp" measures the bandwidth for copying data from one location to another. It does an array copy and it only accesses every fourth word. .IP "" \fC for (i = 0; i < N; i += 4) dest[i] = source[i];\fR .TP .B "frd" measures the bandwidth for reading data into the processor. It computes the sum of an array of integer values. .IP "" \fC for (i = 0; i < N; i++) sum += array[i];\fR .TP .B "fwr" measures the bandwidth for writing data to memory. It assigns a constant value to each member of an array of integer values. .IP "" \fC for (i = 0; i < N; i++) array[i] = 1;\fR .TP .B "fcp" measures the bandwidth for copying data from one location to another. It does an array copy: dest[i] = source[i]. .IP "" \fC for (i = 0; i < N; i++) { .br sum += array[i]; .br array[i] = 1; .br }\fR .TP .B "bzero" measures how fast the system can .I bzero memory. .TP .B "bcopy" measures how fast the system can .I bcopy data. .SH OUTPUT Output format is \f(CB"%0.2f %.2f\\n", megabytes, megabytes_per_second\fP, i.e., .sp .ft CB 8.00 25.33 .ft .SH MEMORY UTILIZATION This benchmark can move up to three times the requested memory. Bcopy will use 2-3 times as much memory bandwidth: there is one read from the source and a write to the destionation. The write usually results in a cache line read and then a write back of the cache line at some later point. Memory utilization might be reduced by 1/3 if the processor architecture implemented ``load cache line'' and ``store cache line'' instructions (as well as ``getcachelinesize''). .SH "SEE ALSO" lmbench(8).