News

Desktop CPU Performance progress from 2013 until 2017 (AVX-512 tested)

We took an Intel Core i7-7820X for a spin and compared its speed-up for scientific computations against an Intel Core i5-4670. The table below shows some results, which are typical across a large range of scientific algorithms. The test run is from our "Efficient multithreading" example in the MtxVec demo. The code computes a DFT using vectorized sin, cos, add, multiply and sum operations on vectors.
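For orientation, the kind of loop measured in the non-vectorized "Pascal, one core" row can be sketched as follows. This is a minimal illustration and not the demo's actual source: the names DftBaseline, TDoubleArray and NaiveDft are assumptions made for this example, and no MtxVec calls are used.

```pascal
program DftBaseline;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  TDoubleArray = array of Double;

// Plain-Pascal DFT of a real signal (hypothetical helper, not MtxVec API).
// For each frequency bin k the signal is multiplied by cos and sin and the
// products are summed: the same sin, cos, multiply, add and sum steps that
// a vectorized version would apply to whole vectors at once.
procedure NaiveDft(const X: TDoubleArray; var Re, Im: TDoubleArray);
var
  N, k, t: Integer;
  Angle, SumRe, SumIm: Double;
begin
  N := Length(X);
  SetLength(Re, N);
  SetLength(Im, N);
  for k := 0 to N - 1 do
  begin
    SumRe := 0;
    SumIm := 0;
    for t := 0 to N - 1 do
    begin
      Angle := 2 * Pi * k * t / N;
      SumRe := SumRe + X[t] * Cos(Angle);
      SumIm := SumIm - X[t] * Sin(Angle);
    end;
    Re[k] := SumRe;
    Im[k] := SumIm;
  end;
end;

var
  X, Re, Im: TDoubleArray;
  i: Integer;
begin
  SetLength(X, 8);
  for i := 0 to High(X) do
    X[i] := Cos(2 * Pi * i / 8);  // one full cosine period over 8 samples
  NaiveDft(X, Re, Im);
  for i := 0 to High(Re) do
    WriteLn('bin ', i, ': ', Re[i]:8:4, ' ', Im[i]:8:4);
end.
```

For the single cosine period used here, the energy lands in bins 1 and 7 with a real part of about 4 (N/2); all other bins are close to zero.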

 

| Test | i5-4670, 32bit, 4 cores | i7-7820X, 32bit, 4 cores | i7-7820X, 64bit, 4 cores | i7-7820X, 64bit, 8 cores |
|---|---|---|---|---|
| Pascal, one core (not vectorized) | 40.24s | 34.59s | 35.62s | 35.19s |
| One CPU core (vectorized) | 7.12s | 5.86s | 3.72s | 3.77s |
| With blocks, one CPU core | 6.80s | 4.67s | 2.44s | 2.40s |
| With hand-written blocks | 5.75s | 4.25s | 1.75s | 1.76s |
| Threaded (naive) | 9.12s | 7.22s | 5.96s | 5.52s |
| Threaded, with blocks | 1.77s | 1.22s | 0.55s | 0.34s |
| Threaded, blocks, anonymous | 1.78s | 1.18s | 0.57s | 0.33s |
| Threaded, hand-written, DoForLoop | 1.54s | 1.11s | 0.43s | 0.27s |
| Threaded, blocks, TParallel.For | 2.93s | 2.27s | 1.20s | 0.97s |

 

The code executed with MtxVec takes full advantage of all instruction set features, including AVX-512 on the i7-7820X. Note that the "turbo" frequencies of the two CPUs differ. When using AVX, the CPU also does not "turbo boost" up to its highest frequency. The i7-7820X was mostly boosting up to 4.0GHz, while the i5-4670 remained at 3.4GHz. The test was run with the "default" optimized motherboard configuration and without overclocking.

The best results in the rightmost column are 1.76s for a single core and 0.27s for multi-core. It appears that Intel software tools (compiler + libs) only optimize for AVX-512 in 64bit apps. In this (64bit) case the performance improvement per core is about 1.11/0.43 ≈ 2.6x between both CPUs. For 32bit apps, the gain is only about 1.3x. The ratio of the fastest code path on the 7820X against non-optimized code reaches a factor of 35/0.27 ≈ 130x when all 8 cores are used with AVX-512. The fastest code path running on one core gives a gain of 35/1.76 ≈ 19.9x.
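The ratios quoted above are plain divisions of the measured times. A small sketch for recomputing them, or for trying other pairs of timings from the table, could look like this (the program name and the Speedup helper are made up for this example):

```pascal
program SpeedupRatios;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Ratio of a baseline time to an optimized time, both in seconds.
function Speedup(BaselineSec, OptimizedSec: Double): Double;
begin
  Result := BaselineSec / OptimizedSec;
end;

begin
  // Timings are copied from the results table.
  WriteLn('1.11s vs 0.43s (threaded, hand-written): ', Speedup(1.11, 0.43):0:2, 'x');
  WriteLn('35s vs 0.27s (non-optimized vs 8 cores): ', Speedup(35.0, 0.27):0:1, 'x');
  WriteLn('35s vs 1.76s (non-optimized vs 1 core):  ', Speedup(35.0, 1.76):0:1, 'x');
end.
```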

Interestingly, dgemm, on which linear algebra (LAPACK) mostly depends, gains only about 30% even in 64bit mode. This is possibly related to missing AVX-512 instructions that are available only on 7900X-series CPUs and some Xeon CPUs. More AVX-512 capable CPUs are scheduled to be released in 2018 and 2019.

AVX-512 largely delivers on the promise of increasing performance per clock by about 2x, even in heavily multithreaded scenarios. This fact, however, is largely absent from the various benchmarks that can be found on the internet. Either the tested applications are not 64bit, or they are not yet properly optimized for AVX-512 (instructions + memory bandwidth). When compared to the i7-8700K, multimedia and scientific benchmarks should show an advantage of about 1.8x per core for the i7-7820X.


Dew Lab Studio 2018 for .NET

The most recent release of Dew Lab Studio for .NET delivers a cumulative update of all the new features added to our products Math, Stats and DSP Master in 2017.


Dew Lab Studio 2018

The first release of Dew Lab Studio in 2018 brings Linux support to MtxVec, DSP Master and Stats Master. For now, the support is limited to units that do not require a GUI and to the Core Edition with the latest Embarcadero Rad Studio Tokyo 10.2 (Update 2).


Dew Lab Studio 2017 R3

The latest update to Dew Lab Studio brings comprehensive support for the Accelerate framework on Apple devices running iOS (iPhone, iPad) and OS X. DSP Master has been complemented with cross-platform components for audio playback and recording, which greatly simplifies the development and deployment of audio processing/analysis applications on mobile platforms. Additionally, MtxVec received a major upgrade to its expression parser/scripting engine, bringing it much closer to Matlab/Scilab-like capability. It is now possible to write while-loops, for-loops and if-else clauses, and to make use of the concatenation operator: a = [1,2 ; 3, 4];


Dew Lab Studio 2017 R2

The latest update brings support for Rad Studio 10.2 Tokyo, and the .NET version delivers support for Visual Studio 2017.NET. Major new features include the introduction of an integer matrix type and extensive new integer math optimizations. Among other things, the expression parser received a major upgrade and now supports integers, integer vectors, integer matrices, and boolean vectors and matrices.


FFT Properties v6 has arrived

The latest release of FFT Properties v6 brings several major enhancements. One is a real-time high-resolution spectrogram with full support for zoom spectrum in SignalRecorder. SignalRecorder now also allows monitoring and recording the output of any playback device. Another very useful feature is the ability of SignalAnalyzer to open arbitrary (compressed) audio/video files. This makes it possible to analyze the frequency content of the audio tracks of any multi-language, multi-channel movie file as well.


Dew Lab Studio 2016

We are happy to announce the availability of Dew Lab Studio 2016, which supports the latest Embarcadero Rad Studio 10.1 Berlin and brings additional performance improvements from the updated dlls, along with many new features.

Dew Lab Studio for VS.NET 2016

The latest update of Dew Lab Studio for VS.NET delivers a notable performance boost to linear algebra routines, support for VS 2015.NET and support for the latest Steema TeeChart. There were multiple improvements across all the products.

Dew Lab Studio for Delphi Seattle 10

The latest version of Dew Lab Studio 2015 adds support for Delphi Rad Studio 10 Seattle and finally delivers comprehensive cross-platform support, allowing users to deploy to Windows, Android, OS X and iOS from the same FireMonkey project. Several important issues were fixed only in Delphi 10, which makes this possible for the very first time.

Dew Lab Studio for Android and XE8

We are happy to announce the availability of an update to Dew Lab Studio 2015 with full support for Android OS and Embarcadero Rad Studio XE8. The new release also gives a significant performance boost through multiple internal optimizations. Gains vary depending on the algorithm.

Dew Lab Studio 2015 and MtxVec v5

We are happy to inform you of the availability of the new major version of our product, MtxVec v5. The key new feature of MtxVec v5 is called MtxVec Core Edition. This capability allows you to build your MtxVec-based application using only pure Pascal code. The main purpose of this new feature is to make code based on MtxVec portable to other platforms such as iOS, OS X and Android.

New language features in Delphi XE7

The new RAD Studio XE7 brings some new language syntax which can be helpful when working with vectors and matrices. The Vector and Matrix types below are from our MtxVec numerical library. The new syntax options significantly improve code readability when creating or assigning arrays in code.

 

[Image: XE7Arrays]
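As a rough illustration of the XE7 dynamic array syntax, here is a sketch that uses a plain dynamic array instead of the MtxVec Vector and Matrix types. TDoubleArray and the program name are made up for this example; how MtxVec types make use of the syntax is not shown.

```pascal
program Xe7ArraySyntax;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  TDoubleArray = array of Double;  // illustrative name, not an MtxVec type

var
  A, B, C: TDoubleArray;
  i: Integer;
begin
  A := [1.0, 2.0, 3.0];   // array literal assigned directly (new in XE7)
  B := A + [4.0, 5.0];    // concatenation with the + operator
  C := [0.5];
  Insert(C, B, 0);        // Insert now also works on dynamic arrays
  Delete(B, 1, 1);        // Delete as well: removes one element at index 1
  for i := 0 to High(B) do
    Write(B[i]:0:1, ' ');
  WriteLn;
end.
```

After these steps B holds the five values 0.5, 2, 3, 4 and 5. Before XE7 the same result required SetLength calls and element-by-element assignments.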


Dew Lab Studio for XE7

Dew Lab Studio 2014 with support for Rad Studio XE7 is now available. DSP Master received support for the new audio API available from Windows Vista onward. The codebase now also compiles for OS X and Android, although there is no official support yet.


Dew Lab Studio for XE6

Dew Lab Studio 2014 with support for Rad Studio XE6 is now available for download. New methods cover the "generalized" computation of eigenvalues, Schur vectors and singular value decomposition. Eigenvalue calculations are faster, and there are new methods for eigenvalues and generalized eigenvalues of symmetric sparse matrices, condition numbers for eigenvalues and more.


Rad Studio XE6, Lo and Behold!

Delphi has featured function inlining since 2005, but it was not until XE6 in 2014 that this feature really lived up to its promise. Our MtxVec library uses a default array property on records and objects to access individual values of vectors and matrices. Even though we specified the getter and setter to be inlined:

function getDefaultArray(const Idx: integer): double; inline;
procedure setDefaultArray(const Idx: integer; const Value: double); inline;

...

property Values[const Idx: integer]: double read getDefaultArray write setDefaultArray; default;

the performance did not match access to a simple dynamic array. With XE6, the speed for 1D arrays finally matches, which is a performance improvement of 6x. What is more, the 2D inline property for accessing matrix elements is now faster than accessing elements of 2D dynamic arrays:

var a: TMtx;  //our TObject class
   d: array of array of double;
begin
..
a[i,j] := 0;  ///faster than d[i,j]
...
d[i,j] := 0;
..
end;

This is a performance improvement of 4x in total compared to XE5 and earlier. A lot of our code becomes noticeably faster simply by using the new XE6 Delphi compiler.
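To show the inlined getter and setter in a complete, compilable context, here is a minimal sketch of a record with a default array property. TSimpleVec and its members are hypothetical stand-ins written for this example and are not the MtxVec declarations.

```pascal
program InlineDefaultProperty;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical stand-in for a vector type; not the MtxVec declaration.
  TSimpleVec = record
  private
    FData: array of Double;
    function GetValue(const Idx: Integer): Double; inline;
    procedure SetValue(const Idx: Integer; const Value: Double); inline;
  public
    procedure SetSize(const ALength: Integer);
    // "default" lets callers write V[i] instead of V.Values[i].
    property Values[const Idx: Integer]: Double read GetValue write SetValue; default;
  end;

function TSimpleVec.GetValue(const Idx: Integer): Double;
begin
  Result := FData[Idx];
end;

procedure TSimpleVec.SetValue(const Idx: Integer; const Value: Double);
begin
  FData[Idx] := Value;
end;

procedure TSimpleVec.SetSize(const ALength: Integer);
begin
  SetLength(FData, ALength);
end;

var
  V: TSimpleVec;
  i: Integer;
  Sum: Double;
begin
  V.SetSize(4);
  for i := 0 to 3 do
    V[i] := i * 0.5;      // resolved through SetValue
  Sum := 0;
  for i := 0 to 3 do
    Sum := Sum + V[i];    // resolved through GetValue
  WriteLn(Sum:0:2);       // prints 3.00
end.
```

Whether the compiler actually expands the calls inline depends on the compiler version and settings, which is the point of the comparison above.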


Dew Lab Studio 2014 for VS.NET

New update of Dew Lab Studio 2014 for VS.NET delivers many new features and important bug fixes. Enhancements have also been made to the installer and a new set of dlls is included for faster performance.


Dew Lab Studio for FireMonkey

Dew Lab Studio for FireMonkey has been made available to registered users of Embarcadero Delphi and C++Builder. Currently only Windows OS is supported. A prerelease version of MtxVec Core is included, allowing the products to compile without dependencies on external dlls.

Code snippet for Histogram

While working on the next code update, the code example below raised some eyebrows. How much time does it take to compute an equidistant histogram? It turns out that our old code needed about 3x more time than an array copy operation. The current version is a near match, being only 10% slower than the best optimized array copy:

        Results.SetZero;
        Data.BlockInit;
        while not Data.BlockEnd do
        begin
            aData.ThresholdGT_LT(Data,Max,Max,Min, Min);
            aData.Normalize(aData, Min, BinWidth);
            aData.CopyTo(intData, TRounding.rnTrunc);
            HistoCount(intData.IData, intData.DataIndex(0), Results.IData, Results.dataIndex(0), intData.Length);
            Data.BlockNext;
        end;
        Data.SetFullRange;
//Old code:
//        for i:=0 to Data.Length-1 do
//        begin
//          j:= Trunc((Data.Values[i]-Min)*InvBinWidth); // was round
//          if (j < 0) then Inc(leftTail)
//          else if (j >= NumBins) then Inc(righttail)
//          else Results.IValues[j] := Results.IValues[j] + 1;
//        end;

While vectorization seems to be the dominant factor, it brings only a 20% speedup. The other 250% comes into effect only after block processing is applied (BlockInit, BlockNext, BlockEnd).
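The idea behind block processing can be sketched without MtxVec: each pass (clamp and scale, truncate, count) runs over a short block that stays in the CPU cache before the next block is touched. The sketch below is a plain-Pascal approximation under stated assumptions. BlockLen, BlockedHistogram and the type names are made up for this example, and out-of-range values are simply clamped into the first or last bin, which may differ from what the MtxVec code does.

```pascal
program BlockHistogram;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

const
  BlockLen = 1024;  // assumed block size, small enough to stay in cache

type
  TDoubleArray = array of Double;
  TIntArray = array of Integer;

// Equidistant histogram computed block by block (hypothetical helper).
procedure BlockedHistogram(const Data: TDoubleArray; Min, Max: Double;
  NumBins: Integer; var Counts: TIntArray);
var
  Scaled: array[0..BlockLen - 1] of Double;
  Bins: array[0..BlockLen - 1] of Integer;
  Start, Len, i: Integer;
  InvBinWidth, V: Double;
begin
  SetLength(Counts, NumBins);
  for i := 0 to NumBins - 1 do
    Counts[i] := 0;
  InvBinWidth := NumBins / (Max - Min);
  Start := 0;
  while Start < Length(Data) do
  begin
    Len := Length(Data) - Start;
    if Len > BlockLen then
      Len := BlockLen;
    // Pass 1: clamp to [Min, Max] and scale to bin units.
    for i := 0 to Len - 1 do
    begin
      V := Data[Start + i];
      if V < Min then
        V := Min
      else if V > Max then
        V := Max;
      Scaled[i] := (V - Min) * InvBinWidth;
    end;
    // Pass 2: truncate to integer bin indices.
    for i := 0 to Len - 1 do
    begin
      Bins[i] := Trunc(Scaled[i]);
      if Bins[i] >= NumBins then
        Bins[i] := NumBins - 1;  // a value equal to Max goes to the last bin
    end;
    // Pass 3: count the indices of this block.
    for i := 0 to Len - 1 do
      Inc(Counts[Bins[i]]);
    Start := Start + Len;
  end;
end;

var
  Data: TDoubleArray;
  Counts: TIntArray;
  i: Integer;
begin
  SetLength(Data, 5000);
  for i := 0 to High(Data) do
    Data[i] := (i mod 10) + 0.5;   // values 0.5, 1.5, ..., 9.5
  BlockedHistogram(Data, 0.0, 10.0, 10, Counts);
  for i := 0 to High(Counts) do
    WriteLn(i, ': ', Counts[i]);   // each of the 10 bins receives 500 samples
end.
```

In scalar code like this the three passes could just as well be fused into one loop. Splitting them only pays off when each pass is a vectorized call over the block, as in the MtxVec version.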


Dew Lab Studio for XE5

An updated version of Dew Lab Studio for Delphi for Embarcadero XE5 has been made available. Most notably the support for 64bit C++Builder has been added. This release also includes an enhanced installer which automatically finds and detects the version of the installed TeeChart.


© DewResearch 1997 - 2018 All Rights Reserved.

Delphi & C++ Builder are registered trademarks of Embarcadero Corporation. All other brands and product names are trademarks or registered trademarks of their respective owners.