GAP (GaussPar) - Chapter 3: Finding a suitable number of blocks

Goto Chapter: Top 1 2 3 4 5 Ind

3 Finding a suitable number of blocks

Experiments with matrices over fields of sizes 2 to 11 and dimensions 500 to 10.000 have found values for numberBlocks from 6 to 15 to be acceptable. Note though, that this highly depends on the calculation, the number of used threads and the machine itself.

3.1 Measure Contention

3.1-1 GAUSS_MeasureContention

‣ GAUSS_MeasureContention( numberBlocks, q, A[, showOutput] ) ( function )

This function helps with finding a suitable value for numberBlocks, which has a big influence on the performance of the EchelonMatTransformationBlockwise (2.1-1) function. For inputs numberBlocks, a field size q, and a matrix A over GF(q), this function does the following:

it calls the parallel function EchelonMatTransformationBlockwise (2.1-1),
it calls the sequential function EchelonMatTransformation from the package Gauss,
it prints the wall time, that is the elapsed real time, each call took,
for the parallel function, it prints an estimate of the lock contenation ratio, a short explanation of lock contention is given below. A high lock contention ratio deteriorates the performance of the algorithm.

The input showOutput can be used to suppress printing of messages.

The influence of numberBlocks on the performance is as follows:

if numberBlocks is too small, then not enough calculations can happen in parallel
if numberBlocks is too big then:
- the lock contention ratio increases
- HPC-GAP's task framework generates a big overhead since it is not optimized.

In a parallel computation several threads may try to access an object at the same time. HPC-GAP needs to prevent situations in which one thread writes into an object that another thread is currently reading, since that could lead to corrupted data. To this end, HPC-GAP can synchronize access to objects via locks.

If a thread reads an object, that thread can acquire a lock for that object, that is no other thread can write into it until the lock is released. If a thread tries to write into said object before the lock is released we say that the lock was contended. In such a situation the contending thread needs to wait until the lock is released. This leads to waiting times or unnecessary context-switches and deteriorates performance if it happens too often.

gap> n := 4000;; numberBlocks := 8;; q := 5;;
gap> A := RandomMat(n, n, GF(q));;
gap> GAUSS_MeasureContention(numberBlocks, q, A);
Make sure you called GAP with sufficient preallocated memory via `-m` if you
try non-trivial examples! Otherwise garbage collection will be a big
overhead.

Starting the parallel algorithm
EchelonMatTransformationBlockwise.

Wall time  parallel execution (ms): 33940.
CPU  time  parallel execution (ms): 2
Lock statistics(estimates):
acquired - 181502, contended - 1120, ratio - 1.%
Locks acquired and contention counters per thread
[ thread, locks acquired, locks contended ]:
[ 5, 54093, 248 ]
[ 6, 51004, 228 ]
[ 7, 37102, 389 ]
[ 8, 39303, 255 ]

Starting the sequential algorithm
EchelonMatTransformation

Wall time Gauss pkg execution (ms): 66778.

Speedup factor (sequential / parallel wall time):
1.96

[Top of Book] [Contents] [Previous Chapter] [Next Chapter]

Goto Chapter: Top 1 2 3 4 5 Ind

generated by GAPDoc2HTML