We took an Intel Core i7-7820X for a spin and compared its speed-up for scientific computations with that of an Intel Core i5-4670. The table below shows some of the results, which are typical across a wide range of scientific algorithms. The test is the "Efficient multithreading" example in the MtxVec demo; the code computes a DFT using vectorized sin, cos, add, multiply and sum operations on vectors.
Columns (timings in seconds): i5-4670, 32-bit, 4 cores | i7-7820X, 32-bit, 4 cores | i7-7820X, 64-bit, 4 cores | i7-7820X, 64-bit, 8 cores
Rows:
Pascal, one core (not vectorized)
One CPU core (vectorized)
With blocks, one CPU core
With hand-written blocks
Threaded, with blocks
Threaded, blocks, anonymous
Threaded, hand-written, DoForLoop
Threaded, blocks, TParallel.For
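The benchmark code itself lives in the MtxVec demo. As a rough, hypothetical sketch of the same idea, the Pascal program below builds each DFT bin purely from whole-vector steps (sin, cos, multiply, sum). VecSin, VecCos, VecMul and VecSum are illustrative helpers written as plain loops; they are not MtxVec routines, which would replace each loop with a single vectorized call.

```pascal
program VectorDftSketch;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$ASSERTIONS ON}

type
  TVec = array of Double;

// Hypothetical whole-vector helpers (plain loops, not MtxVec calls).
function VecCos(const a: TVec): TVec;
var
  i: Integer;
begin
  SetLength(Result, Length(a));
  for i := 0 to High(a) do
    Result[i] := Cos(a[i]);
end;

function VecSin(const a: TVec): TVec;
var
  i: Integer;
begin
  SetLength(Result, Length(a));
  for i := 0 to High(a) do
    Result[i] := Sin(a[i]);
end;

function VecMul(const a, b: TVec): TVec;
var
  i: Integer;
begin
  SetLength(Result, Length(a));
  for i := 0 to High(a) do
    Result[i] := a[i] * b[i];
end;

function VecSum(const a: TVec): Double;
var
  i: Integer;
begin
  Result := 0;
  for i := 0 to High(a) do
    Result := Result + a[i];
end;

// X[k] = sum_n x[n]*cos(2*pi*k*n/N) - i * sum_n x[n]*sin(2*pi*k*n/N)
procedure Dft(const x: TVec; out re, im: TVec);
var
  k, n, len: Integer;
  angle: TVec;
begin
  len := Length(x);
  SetLength(re, len);
  SetLength(im, len);
  SetLength(angle, len);
  for k := 0 to len - 1 do
  begin
    for n := 0 to len - 1 do
      angle[n] := 2 * Pi * k * n / len;
    re[k] := VecSum(VecMul(x, VecCos(angle)));
    im[k] := -VecSum(VecMul(x, VecSin(angle)));
  end;
end;

var
  x, re, im: TVec;
  n: Integer;
begin
  SetLength(x, 8);
  for n := 0 to 7 do
    x[n] := Cos(2 * Pi * n / 8); // one full cosine period over 8 samples
  Dft(x, re, im);
  for n := 0 to 7 do
    WriteLn('X[', n, '] = ', re[n]:8:4, ' ', im[n]:8:4, 'i');
end.
```

For a single cosine period, all the energy lands in bins 1 and 7 (real part N/2 = 4 each). A real DFT of this size would of course use an FFT; the point here is only the vector-at-a-time structure.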
The code executed with MtxVec takes full advantage of all instruction set features, including the AVX-512 extensions of the i7-7820X. Note that the turbo frequencies of the two CPUs differ. When running AVX code, the CPU also does not turbo boost up to its highest frequency. The i7-7820X was mostly boosting up to 4.0 GHz, while the i5-4670 remained at 3.4 GHz. The test was run with the default optimized motherboard configuration and without overclocking.
The best results, 1.76 s on a single core and 0.27 s on multiple cores, are shown in bold in the rightmost column. It appears that Intel's software tools (compiler and libraries) optimize for AVX-512 only in 64-bit applications. In this 64-bit case, the performance improvement per core between the two CPUs is about 1.11/0.43 = 2.6x. For 32-bit applications, the gain is only about 1.3x. On the i7-7820X, the fastest code path is 35/0.27 = 130x faster than the non-optimized code when all 8 cores are used with AVX-512. The fastest code path running on one core gives a gain of 35/1.76 = 19.9x.
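The ratios are simple quotients of the measured times, and dividing the 64-bit per-core gain by the clock ratio gives a rough per-clock figure. The sketch below assumes the boost clocks quoted above (4.0 GHz and 3.4 GHz) held throughout both runs, which is an approximation; the constant names are illustrative.

```pascal
program SpeedupRatios;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$ASSERTIONS ON}

// Speed-up of the faster run relative to the slower one (both in seconds).
function Ratio(slowerSec, fasterSec: Double): Double;
begin
  Result := slowerSec / fasterSec;
end;

const
  // Boost clocks quoted in the text; assumed to hold during the runs.
  ClockI7 = 4.0; // GHz, i7-7820X
  ClockI5 = 3.4; // GHz, i5-4670

var
  perCore, perClock: Double;
begin
  perCore := Ratio(1.11, 0.43);
  perClock := perCore / (ClockI7 / ClockI5);
  WriteLn('Per-core gain, 64-bit: ', perCore:0:2);                 // 2.58
  WriteLn('Per-clock gain, 64-bit: ', perClock:0:2);               // 2.19
  WriteLn('8 cores + AVX-512 vs. Pascal: ', Ratio(35, 0.27):0:1);  // 129.6
  WriteLn('1 core vectorized vs. Pascal: ', Ratio(35, 1.76):0:1);  // 19.9
end.
```

The per-clock value of roughly 2.2x is consistent with the approximately 2x per-clock gain expected from AVX-512.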
Interestingly, dgemm, on which linear algebra (LAPACK) mostly depends, gains only 30% even in 64-bit mode. This is possibly related to AVX-512 instructions that are available only on the 7900X-series CPUs and some Xeon CPUs. More AVX-512-capable CPUs are scheduled for release in 2018 and 2019.
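For reference, dgemm is the BLAS double-precision general matrix-matrix multiply, C := alpha*op(A)*op(B) + beta*C. The naive triple loop below only illustrates what it computes, for the non-transposed case and with row-major Pascal arrays (BLAS itself uses column-major storage). Optimized libraries get their speed from cache blocking and SIMD kernels, which is where AVX-512 support matters. NaiveGemm and TMatrix are hypothetical names.

```pascal
program NaiveDgemm;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$ASSERTIONS ON}

type
  TMatrix = array of array of Double; // row-major: M[row][col]

// Reference semantics of dgemm without transposes:
// C := alpha*A*B + beta*C, A is m x k, B is k x n, C is m x n.
procedure NaiveGemm(alpha: Double; const A, B: TMatrix; beta: Double; var C: TMatrix);
var
  i, j, p: Integer;
  acc: Double;
begin
  for i := 0 to High(A) do
    for j := 0 to High(B[0]) do
    begin
      acc := 0;
      for p := 0 to High(B) do
        acc := acc + A[i][p] * B[p][j];
      // As in reference BLAS, C is not read when beta = 0.
      if beta = 0 then
        C[i][j] := alpha * acc
      else
        C[i][j] := alpha * acc + beta * C[i][j];
    end;
end;

var
  A, B, C: TMatrix;
begin
  SetLength(A, 2, 2);
  SetLength(B, 2, 2);
  SetLength(C, 2, 2);
  A[0][0] := 1; A[0][1] := 2; A[1][0] := 3; A[1][1] := 4;
  B[0][0] := 5; B[0][1] := 6; B[1][0] := 7; B[1][1] := 8;
  NaiveGemm(1.0, A, B, 0.0, C);
  WriteLn(C[0][0]:0:0, ' ', C[0][1]:0:0); // 19 22
  WriteLn(C[1][0]:0:0, ' ', C[1][1]:0:0); // 43 50
end.
```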
AVX-512 largely delivers on its promise of increasing performance per clock by about 2x, even in heavily multithreaded scenarios. This, however, is largely absent from the various benchmarks found on the internet: either the tested applications are not 64-bit, or they are not yet properly optimized for AVX-512 (instructions and memory bandwidth). Compared with the i7-8700K, multimedia and scientific benchmarks should show an advantage of about 1.8x per core for the i7-7820X.
The first release of Dew Lab Studio in 2018 brings Linux support to MtxVec, DSP Master and Stats Master. For now, support is limited to units that do not require a GUI and to the Core Edition, with the latest Embarcadero RAD Studio 10.2 Tokyo (Update 2).
The latest update to Dew Lab Studio brings comprehensive support for the Accelerate framework on Apple devices running iOS (including iPad) and OS X. DSP Master has been complemented with cross-platform components for audio playback and recording, greatly simplifying the development and deployment of audio processing and analysis applications on mobile platforms. Additionally, MtxVec received a major upgrade of its expression parser/scripting engine, bringing it much closer to Matlab/Scilab-like capability. It is now possible to write while-loops, for-loops and if-else clauses, and to use the concatenation operator: a = [1,2 ; 3, 4];
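In the Matlab-style literal a = [1,2 ; 3, 4], a comma separates columns and a semicolon starts a new row, giving a 2 x 2 matrix. The hypothetical Pascal parser below illustrates only that separator rule for plain numbers; it is not MtxVec's expression parser, and it does not handle concatenating whole matrices or vectors.

```pascal
program MatrixLiteralSketch;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$ASSERTIONS ON}

uses
  SysUtils;

type
  TMatrix = array of array of Double;

// Hypothetical parser: ',' separates columns, ';' starts a new row.
// Only numeric elements are accepted.
function ParseMatrixLiteral(const s: string): TMatrix;
var
  m: TMatrix;
  body, cell: string;
  i, row, code: Integer;
  value: Double;

  procedure FlushCell; // append the pending cell to the current row
  begin
    Val(Trim(cell), value, code);
    if code <> 0 then
      raise EConvertError.Create('Not a number: "' + cell + '"');
    SetLength(m[row], Length(m[row]) + 1);
    m[row][High(m[row])] := value;
    cell := '';
  end;

begin
  body := Trim(s);
  if (Length(body) < 2) or (body[1] <> '[') or (body[Length(body)] <> ']') then
    raise EConvertError.Create('Expected a literal of the form [ ... ]');
  body := Copy(body, 2, Length(body) - 2);
  SetLength(m, 1);
  row := 0;
  cell := '';
  for i := 1 to Length(body) do
    case body[i] of
      ',': FlushCell;
      ';':
        begin
          FlushCell;
          Inc(row);
          SetLength(m, row + 1);
        end;
    else
      cell := cell + body[i];
    end;
  FlushCell;
  // As in Matlab, all rows must have the same number of columns.
  for i := 1 to High(m) do
    if Length(m[i]) <> Length(m[0]) then
      raise EConvertError.Create('Rows have different lengths');
  Result := m;
end;

var
  a: TMatrix;
  r, c: Integer;
begin
  a := ParseMatrixLiteral('[1,2 ; 3, 4]');
  for r := 0 to High(a) do
  begin
    for c := 0 to High(a[r]) do
      Write(a[r][c]:6:1);
    WriteLn;
  end;
end.
```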
The latest update brings support for RAD Studio 10.2 Tokyo, and the .NET version supports Visual Studio 2017. Major new features include an integer matrix type and extensive new integer math optimizations. Among other things, the expression parser received a major upgrade and now supports integers, integer vectors, integer matrices, and boolean vectors and matrices.
The latest release, FFT Properties v6, brings several major enhancements. One is a real-time, high-resolution spectrogram with full support for zoom spectrum in SignalRecorder. SignalRecorder now also allows monitoring and recording the output of any playback device. Another very useful feature is SignalAnalyzer's ability to open arbitrary (compressed) audio/video files, which makes it possible to analyze the frequency content of the audio tracks of any multi-language, multi-channel movie file as well.
We are happy to announce the availability of Dew Lab Studio 2016, which supports the latest Embarcadero RAD Studio 10.1 Berlin and brings additional performance improvements from the updated DLLs, as well as many new features.
The latest update of Dew Lab Studio for VS.NET delivers a notable performance boost to linear algebra routines, support for Visual Studio 2015 and support for the latest Steema TeeChart. There were also multiple improvements across all products.
The latest version of Dew Lab Studio 2015 adds support for Delphi RAD Studio 10 Seattle and finally delivers comprehensive cross-platform support, allowing users to deploy to Windows, Android, OS X and iOS from the same FireMonkey project. Several important issues were fixed only in Delphi 10, which makes this possible for the very first time.
We are happy to announce the availability of an update to Dew Lab Studio 2015 with full support for Android and Embarcadero RAD Studio XE8. The new release also gives a significant performance boost through multiple internal optimizations; gains vary depending on the algorithm.
We are happy to announce the availability of MtxVec v5, a new major version of our product. Its key new feature is MtxVec Core Edition, which allows you to build MtxVec-based applications using only pure Pascal code. The main purpose of this feature is to make MtxVec-based code portable to other platforms such as iOS, OS X and Android.
The new RAD Studio XE7 brings some new language syntax that can be helpful when working with vectors and matrices. The Vector and Matrix types below are from our MtxVec numerical library. The new syntax options significantly improve code readability when creating or assigning arrays in code.
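The XE7 additions in question are string-like operations on dynamic arrays: initialization from a bracketed constant, concatenation with `+`, and the standard Insert, Delete and Concat routines. A minimal sketch using a plain `array of Double` (TDoubleArray is a hypothetical type name; how MtxVec's Vector and Matrix types expose the same syntax is not shown here):

```pascal
program DynArraySyntax;
{$IFDEF FPC}{$mode delphi}{$modeswitch arrayoperators}{$ENDIF}
{$ASSERTIONS ON}

type
  TDoubleArray = array of Double;

var
  a, b: TDoubleArray;
  v: Double;
begin
  a := [1.0, 2.0, 3.0];  // initialize a dynamic array from a constant (XE7+)
  b := a + [4.0, 5.0];   // '+' concatenates dynamic arrays (XE7+)
  Delete(b, 0, 1);       // remove 1 element at index 0 -> 2 3 4 5
  for v in b do
    Write(v:5:1);
  WriteLn;
end.
```

Floating-point literals are used in the brackets on purpose: with integer literals, a bracketed list can be read as a set constant in some contexts.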
Dew Lab Studio 2014 with support for RAD Studio XE7 is now available. DSP Master received support for the new audio API available from Windows Vista onward. The codebase now also compiles for OS X and Android, although these platforms are not yet officially supported.
Dew Lab Studio 2014 with support for RAD Studio XE6 is now available for download. New methods cover the "generalized" computation of eigenvalues, Schur vectors and singular value decomposition. Eigenvalue calculations are faster, and new methods include eigenvalues and generalized eigenvalues for symmetric sparse matrices, condition numbers for eigenvalues and more.
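For orientation, the standard textbook form of the generalized eigenvalue problem and the generalized Schur (QZ) decomposition is given below. This is background notation only, not a description of the MtxVec method signatures:

```latex
A x = \lambda B x \quad\Longleftrightarrow\quad \det(A - \lambda B) = 0

A = Q S Z^{H}, \qquad B = Q T Z^{H}, \qquad \lambda_i = \frac{s_{ii}}{t_{ii}}
```

Here Q and Z are unitary and S and T are upper triangular; a zero t_ii corresponds to an infinite eigenvalue, and for B = I the problem reduces to the ordinary eigenvalue problem.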
Delphi has featured function inlining since 2005, but it was not until XE6, released in 2014, that this feature really lived up to its promise. Our MtxVec library uses a default array property on records and objects to access individual values of vectors and matrices. Even though we specified the setter and getter to be inlined, the performance did not match access to a simple dynamic array. With XE6, the speed for 1D arrays finally matches, a 6x performance improvement. Moreover, the inlined 2D property for accessing matrix elements is even faster than accessing the elements of a 2D dynamic array:
```pascal
var
  a: TMtx; // our TObject class
  d: array of array of double;
begin
  // ..
  a[i,j] := 0; // faster than d[i,j]
  // ...
  d[i,j] := 0;
  // ..
end;
```
That is a total performance improvement of 4x compared to XE5 and earlier. This makes a lot of our code noticeably faster simply by using the new XE6 Delphi compiler.
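The pattern described above, a default array property whose getter and setter are marked inline, looks roughly like this in a minimal sketch. TSimpleVec is a hypothetical record, not MtxVec's implementation, and the sketch makes no timing claims: whether the calls are actually inlined depends on the compiler version, which is exactly the XE5/XE6 difference discussed here.

```pascal
program InlineDefaultProperty;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$ASSERTIONS ON}

type
  // Hypothetical minimal vector record (not the MtxVec implementation).
  TSimpleVec = record
  private
    FData: array of Double;
    function GetItem(i: Integer): Double; inline;
    procedure SetItem(i: Integer; const Value: Double); inline;
  public
    procedure SetLen(n: Integer);
    // 'default' lets callers write v[i] instead of v.Items[i]
    property Items[i: Integer]: Double read GetItem write SetItem; default;
  end;

function TSimpleVec.GetItem(i: Integer): Double;
begin
  Result := FData[i];
end;

procedure TSimpleVec.SetItem(i: Integer; const Value: Double);
begin
  FData[i] := Value;
end;

procedure TSimpleVec.SetLen(n: Integer);
begin
  SetLength(FData, n);
end;

var
  v: TSimpleVec;
  i: Integer;
  sum: Double;
begin
  v.SetLen(1000);
  for i := 0 to 999 do
    v[i] := i;         // calls SetItem; a candidate for inlining
  sum := 0;
  for i := 0 to 999 do
    sum := sum + v[i]; // calls GetItem; a candidate for inlining
  WriteLn(sum:0:0);
end.
```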
A new update of Dew Lab Studio 2014 for VS.NET delivers many new features and important bug fixes. The installer has also been enhanced, and a new set of DLLs is included for faster performance.
Dew Lab Studio for FireMonkey has been made available to registered users for Embarcadero Delphi and C++Builder. Currently only Windows is supported. A prerelease version of MtxVec Core is included, allowing the products to compile without dependencies on external DLLs.
While working on the next code update, the code example below raised some eyebrows. How much time does it take to compute an equidistant histogram? It turns out that our old code needed about 3x more time than an array copy operation. The current version is a near match, only 10% slower than the best-optimized array copy:
```pascal
Results.SetZero;
Data.BlockInit; // block processing: work on Data one block at a time
while not Data.BlockEnd do
begin
  aData.ThresholdGT_LT(Data, Max, Max, Min, Min);
  aData.Normalize(aData, Min, BinWidth);
  aData.CopyTo(intData, TRounding.rnTrunc);
  HistoCount(intData.IData, intData.DataIndex(0), Results.IData, Results.dataIndex(0), intData.Length);
  Data.BlockNext;
end;
Data.SetFullRange; // restore the full range of Data after block processing

// Old code:
// for i := 0 to Data.Length - 1 do
// begin
//   j := Trunc((Data.Values[i] - Min)*InvBinWidth); // was round
//   if (j < 0) then Inc(leftTail)
//   else if (j >= NumBins) then Inc(righttail)
//   else Results.IValues[j] := Results.IValues[j] + 1;
// end;
```
Although vectorization might appear to be the dominant factor, it brings only a 20% speedup. The other 250% take effect only after block processing (BlockInit, BlockNext, BlockEnd) is applied.
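The effect of block processing can be illustrated without MtxVec. For arrays larger than the CPU cache, running each pass (scale, truncate, count) over the whole array streams every intermediate result through main memory. Running all passes over one short block at a time keeps the scratch buffers in cache. The plain Pascal sketch below loosely mirrors the three-pass structure of the code above; the block length of 1024 and the clipping of out-of-range values into the edge bins are assumptions of this sketch, not details taken from MtxVec.

```pascal
program HistogramSketch;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$ASSERTIONS ON}

type
  TDoubleArray = array of Double;
  TIntArray = array of Integer;

const
  BlockSize = 1024; // assumed block length (hypothetical, not an MtxVec setting)

// Equidistant histogram of data over [lo, hi) with numBins bins.
// Out-of-range values are clipped into the first/last bin.
function Histogram(const data: TDoubleArray; lo, hi: Double;
  numBins: Integer): TIntArray;
var
  scaled: TDoubleArray; // per-block scratch buffer, reused for every block
  bins: TIntArray;      // per-block scratch buffer for bin indices
  start, len, i: Integer;
  invBinWidth: Double;
begin
  Result := nil;
  SetLength(Result, numBins);
  for i := 0 to numBins - 1 do
    Result[i] := 0;
  SetLength(scaled, BlockSize);
  SetLength(bins, BlockSize);
  invBinWidth := numBins / (hi - lo);
  start := 0;
  while start < Length(data) do
  begin
    len := Length(data) - start;
    if len > BlockSize then
      len := BlockSize;
    // Pass 1: map the block to bin coordinates, clipped to [0, numBins - 1]
    for i := 0 to len - 1 do
    begin
      scaled[i] := (data[start + i] - lo) * invBinWidth;
      if scaled[i] < 0 then
        scaled[i] := 0
      else if scaled[i] > numBins - 1 then
        scaled[i] := numBins - 1;
    end;
    // Pass 2: truncate to integer bin indices
    for i := 0 to len - 1 do
      bins[i] := Trunc(scaled[i]);
    // Pass 3: count
    for i := 0 to len - 1 do
      Inc(Result[bins[i]]);
    Inc(start, len);
  end;
end;

var
  data: TDoubleArray;
  h: TIntArray;
  i: Integer;
begin
  SetLength(data, 3000); // spans three blocks: 1024 + 1024 + 952
  for i := 0 to High(data) do
    data[i] := (i mod 10) + 0.5; // 300 samples in the middle of each of 10 bins
  h := Histogram(data, 0.0, 10.0, 10);
  for i := 0 to High(h) do
    WriteLn('bin ', i, ': ', h[i]);
end.
```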
An updated version of Dew Lab Studio for Delphi, supporting Embarcadero XE5, has been made available. Most notably, support for 64-bit C++Builder has been added. This release also includes an enhanced installer that automatically finds the installed TeeChart and detects its version.