Goto Chapter: Top 1 2 3 4 5 Ind
 [Top of Book]  [Contents]   [Previous Chapter]   [Next Chapter] 

3 Finding a suitable number of blocks
 3.1 Measure Contention

3 Finding a suitable number of blocks

Experiments with matrices over fields of sizes 2 to 11 and dimensions 500 to 10.000 have found values for numberBlocks from 6 to 15 to be acceptable. Note though, that this highly depends on the calculation, the number of used threads and the machine itself.

3.1 Measure Contention

3.1-1 GAUSS_MeasureContention
‣ GAUSS_MeasureContention( numberBlocks, q, A[, showOutput] )( function )

This function helps with finding a suitable value for numberBlocks, which has a big influence on the performance of the EchelonMatTransformationBlockwise (2.1-1) function. For inputs numberBlocks, a field size q, and a matrix A over GF(q), this function does the following:

The input showOutput can be used to suppress printing of messages.

The influence of numberBlocks on the performance is as follows:

In a parallel computation several threads may try to access an object at the same time. HPC-GAP needs to prevent situations in which one thread writes into an object that another thread is currently reading, since that could lead to corrupted data. To this end, HPC-GAP can synchronize access to objects via locks.

If a thread reads an object, that thread can acquire a lock for that object, that is no other thread can write into it until the lock is released. If a thread tries to write into said object before the lock is released we say that the lock was contended. In such a situation the contending thread needs to wait until the lock is released. This leads to waiting times or unnecessary context-switches and deteriorates performance if it happens too often.

gap> n := 4000;; numberBlocks := 8;; q := 5;;
gap> A := RandomMat(n, n, GF(q));;
gap> GAUSS_MeasureContention(numberBlocks, q, A);
Make sure you called GAP with sufficient preallocated memory via `-m` if you
try non-trivial examples! Otherwise garbage collection will be a big
overhead.

Starting the parallel algorithm
EchelonMatTransformationBlockwise.

Wall time  parallel execution (ms): 33940.
CPU  time  parallel execution (ms): 2
Lock statistics(estimates):
acquired - 181502, contended - 1120, ratio - 1.%
Locks acquired and contention counters per thread
[ thread, locks acquired, locks contended ]:
[ 5, 54093, 248 ]
[ 6, 51004, 228 ]
[ 7, 37102, 389 ]
[ 8, 39303, 255 ]

Starting the sequential algorithm
EchelonMatTransformation

Wall time Gauss pkg execution (ms): 66778.

Speedup factor (sequential / parallel wall time):
1.96
 [Top of Book]  [Contents]   [Previous Chapter]   [Next Chapter] 
Goto Chapter: Top 1 2 3 4 5 Ind

generated by GAPDoc2HTML