The way to massively parallel computation
The vector and matrix objects provided by MtxVec are designed to be "super conductive" for multiple threads. This means that when multiple objects are allocated and freed from multiple threads, the threads never contend for a synchronization lock. Multi-threaded applications can therefore use MtxVec expressions like a + b without worrying about memory allocation bottlenecks. Without MtxVec, such expressions can result in nearly 100% thread contention. We benchmarked the evaluation speed of the following function while varying the size of the input vectors:
function TestFunction(x,m,b: TVec): Vector;
var xm: Vector;
begin
    xm.Adopt(x); //adopted by Vector to allow use of expressions
    Result := 0.5*(1.0+ sgn(xm-m)*(1.0-Exp(-Abs(xm-m)/b)));
end;
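For readers less familiar with vectorized notation, the expression in TestFunction has the form of the Laplace distribution CDF, evaluated element by element. The sketch below is a plain scalar counterpart that uses only the Delphi RTL and not MtxVec. The name LaplaceCdfScalar is hypothetical, and the code assumes that sgn follows the usual convention of returning -1, 0 or 1 and that b is positive.

```delphi
program LaplaceCdfDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Math;

// Hypothetical scalar counterpart of TestFunction for one element.
// Assumes b > 0; not part of MtxVec.
function LaplaceCdfScalar(x, m, b: Double): Double;
var
  d: Double;
begin
  d := x - m;
  // Math.Sign returns -1, 0 or 1, matching the assumed meaning of sgn.
  Result := 0.5 * (1.0 + Sign(d) * (1.0 - Exp(-Abs(d) / b)));
end;

begin
  Writeln(LaplaceCdfScalar( 0.0, 0.0, 1.0):0:4);  // 0.5000
  Writeln(LaplaceCdfScalar( 1.0, 0.0, 1.0):0:4);  // 0.8161
  Writeln(LaplaceCdfScalar(-1.0, 0.0, 1.0):0:4);  // 0.1839
end.
```

In the vectorized form each sub-expression (xm-m, Abs(...), Exp(...), and so on) produces a temporary vector, which is why allocation speed matters for this kind of code.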
The memory management features:
- Dedicated memory allocated per thread typically does not exceed CPU cache size (2MB). This makes the operation very memory and CPU cache efficient.
- MtxVec memory management does not interfere with other parts of the application which continue to use the default memory manager. Only those parts of code using MtxVec based vector/matrix objects are affected.
- Intelligent memory reuse keeps the working set of memory tightly packed and small, taking optimal advantage of the CPU cache design.
- Truly linear scaling of code on multiple cores becomes a reality.
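The general idea behind lock-free, per-thread memory reuse can be illustrated with a small thread-local buffer cache. This is only a conceptual sketch and not MtxVec's actual implementation, which is not described here. All names (TSlotPool, AcquireBuffer, ReleaseBuffer, DrainPool) and both constants are hypothetical.

```delphi
program ThreadLocalPoolDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  PoolSlots = 8;          // hypothetical: buffers cached per thread
  SlotBytes = 64 * 1024;  // hypothetical: fixed size of every buffer

type
  TSlotPool = record
    Buffers: array[0..PoolSlots - 1] of Pointer;
    Count: Integer;
  end;

threadvar
  // Every thread sees its own zero-initialised copy, so no lock is needed.
  Pool: TSlotPool;

function AcquireBuffer: Pointer;
begin
  if Pool.Count > 0 then
  begin
    // Reuse the most recently released buffer (likely still in cache).
    Pool.Count := Pool.Count - 1;
    Result := Pool.Buffers[Pool.Count];
  end
  else
    GetMem(Result, SlotBytes);  // cache empty: use the default memory manager
end;

procedure ReleaseBuffer(P: Pointer);
begin
  if Pool.Count < PoolSlots then
  begin
    Pool.Buffers[Pool.Count] := P;
    Pool.Count := Pool.Count + 1;
  end
  else
    FreeMem(P);  // cache full: hand the block back
end;

// Each thread must call this before it terminates to avoid leaks.
procedure DrainPool;
begin
  while Pool.Count > 0 do
  begin
    Pool.Count := Pool.Count - 1;
    FreeMem(Pool.Buffers[Pool.Count]);
  end;
end;

var
  A, B: Pointer;
begin
  A := AcquireBuffer;
  ReleaseBuffer(A);
  B := AcquireBuffer;  // served from the thread-local cache
  Writeln(A = B);      // TRUE
  ReleaseBuffer(B);
  DrainPool;
end.
```

Because the cache is reached only through a threadvar, the acquire and release paths touch no shared state, and the bounded slot count keeps the per-thread footprint small.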
The first picture shows an example using two cores, comparing the processing gains of the super conductive memory manager with those of the standard memory manager as a function of vector length:
As the number of cores increases, the minimum vector length at which the performance penalty becomes minuscule also increases. As the vector length increases, the pressure on the CPU cache increases, possibly nullifying any gains from multi-threading. By using MtxVec these problems can be completely avoided and fully linear scaling can be achieved independently of vector length. The second picture was produced on a quad-core CPU:
On a quad-core CPU the differences increase, showing a larger advantage (a gain factor larger than 4). This indicates that the default memory manager does not scale beyond two cores for this type of processing, and that for short vectors the gains from using more than two cores can even be negative.
The source code of the benchmark is included with Dew Lab Studio trial version and Dew Lab Studio demos.
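For readers without the demos at hand, a measurement of this kind can be approximated with the Delphi RTL alone. The sketch below is not the benchmark shipped with Dew Lab Studio. It times a stand-in workload that allocates one temporary array per call through the default memory manager, first serially and then with TParallel.For, and reports the ratio of the two times. The names Work, TimeSerial, TimeParallel and the constant Repeats are hypothetical, and the definition of speedup as serial time divided by parallel time is an assumption. It requires Delphi XE7 or later for System.Threading.

```delphi
program ScalingSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Math, System.Diagnostics, System.Threading;

const
  Repeats = 5000;  // hypothetical number of calls per measurement

// Stand-in workload: one heap allocation per call, then the element-wise
// formula from TestFunction with fixed m = 0.5 and b = 0.1.
procedure Work(VecLen: Integer);
var
  Tmp: TArray<Double>;
  I: Integer;
  D: Double;
begin
  SetLength(Tmp, VecLen);  // goes through the default memory manager
  for I := 0 to VecLen - 1 do
  begin
    D := I / VecLen - 0.5;
    Tmp[I] := 0.5 * (1.0 + Sign(D) * (1.0 - Exp(-Abs(D) / 0.1)));
  end;
end;  // Tmp is released here

function TimeSerial(VecLen: Integer): Int64;
var
  SW: TStopwatch;
  I: Integer;
begin
  SW := TStopwatch.StartNew;
  for I := 1 to Repeats do
    Work(VecLen);
  Result := SW.ElapsedMilliseconds;
end;

function TimeParallel(VecLen: Integer): Int64;
var
  SW: TStopwatch;
begin
  SW := TStopwatch.StartNew;
  // Iterations are distributed over the RTL thread pool.
  TParallel.For(1, Repeats,
    procedure(I: Integer)
    begin
      Work(VecLen);
    end);
  Result := SW.ElapsedMilliseconds;
end;

var
  Len: Integer;
  TSer, TPar: Int64;
begin
  Len := 16;
  while Len <= 16384 do
  begin
    TSer := TimeSerial(Len);
    TPar := TimeParallel(Len);
    if TPar < 1 then
      TPar := 1;  // avoid division by zero on very short runs
    Writeln(Format('len %6d  serial %6d ms  parallel %6d ms  speedup %.2f',
      [Len, TSer, TPar, TSer / TPar]));
    Len := Len * 4;
  end;
end.
```

A speedup close to the number of cores across all vector lengths would correspond to the linear scaling described above; the values actually obtained depend on the machine and the memory manager in use.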