Introducing

MtxVec v5

Multicore math engine for science and engineering

Develop with Delphi, C# or C++ and deliver the code speed of assembler.

Comprehensive and fast numerical math library

Support for VS.NET, Embarcadero Delphi and C++ Builder

Statistical and DSP add-ons

MtxVec

Multicore math engine for science and engineering

DSP Master

Advanced signal processing package

Stats Master

Statistical package

Data Miner

Artificial intelligence enabling components

FFT Properties

Signal Analyzer and Recorder

Latest News

Desktop CPU Performance progress from 2013 until 2017 (AVX-512 tested)

We took an Intel Core i7-7820X for a spin and compared the speed-up for scientific computations against an Intel Core i5-4670. The table below shows results that are typical across a large range of scientific algorithms. The test is the "Efficient multithreading" example from the MtxVec demo: the code computes a DFT using vectorized sin, cos, add, multiply and vector sum (a short plain-Pascal sketch of this kernel appears at the end of this news item).

 

Code path                              i5-4670       i7-7820X      i7-7820X      i7-7820X
                                       32bit/4cores  32bit/4cores  64bit/4cores  64bit/8cores
-------------------------------------  ------------  ------------  ------------  ------------
Pascal, one core (not vectorized)      40.24s        34.59s        35.62s        35.19s
One CPU core (vectorized)               7.12s         5.86s         3.72s         3.77s
With blocks, one CPU core               6.80s         4.67s         2.44s         2.40s
With hand-written blocks                5.75s         4.25s         1.75s         1.76s
Threaded (naive)                        9.12s         7.22s         5.96s         5.52s
Threaded, with blocks                   1.77s         1.22s         0.55s         0.34s
Threaded, blocks, Anonymous             1.78s         1.18s         0.57s         0.33s
Threaded, hand-written, DoForLoop       1.54s         1.11s         0.43s         0.27s
Threaded, blocks, TParallel.For         2.93s         2.27s         1.20s         0.97s

 

The code executed with MtxVec takes full advantage of all instruction-set features, including the AVX-512 support of the i7-7820X. Note that the "turbo" frequencies of the two CPUs differ, and that when executing AVX code the CPU does not boost to its highest frequency: the i7-7820X mostly boosted to 4.0 GHz, while the i5-4670 remained at 3.4 GHz. The test was run with the default optimized motherboard configuration and without overclocking.

The best results, shown in the rightmost column, are 1.76s for a single core and 0.27s when all cores are used. It appears that the Intel software tools (compiler + libraries) only optimize for AVX-512 in 64-bit apps. In this (64-bit) case the per-core performance improvement is about 1.11/0.43 = 2.6x; for 32-bit apps the gain is only about 1.3x. The ratio of the fastest code path on the 7820X to the non-optimized code reaches a factor of 35/0.27 = 130x when all 8 cores are used with AVX-512. The fastest code path running on one core gives a gain of 35/1.76 = 20x.

Interestingly, dgemm, on which linear algebra (LAPACK) mostly depends, shows a gain of only about 30% even in 64-bit mode. This is possibly related to AVX-512 instructions that are available only on the 7900X-series and some Xeon CPUs. More AVX-512-capable CPUs are scheduled for release in 2018 and 2019.

AVX-512 largely delivers on the promise of roughly doubling performance per clock, even in heavily multithreaded scenarios. This fact is, however, largely absent from the various benchmarks found on the internet: either the tested applications are not 64-bit, or they are not yet properly optimized for AVX-512 (instructions + memory bandwidth). Compared to the i7-8700K, multimedia and scientific benchmarks should show an advantage of about 1.8x per core for the i7-7820X.
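
For readers who want to see what the timings above actually measure, here is a minimal plain-Pascal sketch of the kernel: one DFT bin computed from sin, cos, multiply and a running sum. The signal length and bin number are illustrative, and the MtxVec calls used in the demo's vectorized, blocked and threaded variants are not reproduced here; those variants replace this inner loop with whole-array Sin/Cos/Mul/Sum operations, optionally split into cache-sized blocks and distributed across threads.

program DftBinSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  SigLen = 8192;   // signal length (illustrative)
  Bin    = 5;      // frequency bin to compute (illustrative)

var
  x: array of Double;
  re, im, w: Double;
  i: Integer;
begin
  SetLength(x, SigLen);
  for i := 0 to SigLen - 1 do
    x[i] := Sin(2 * Pi * Bin * i / SigLen);   // test signal

  // "Not vectorized" reference path: one multiply-accumulate per sample.
  re := 0;
  im := 0;
  for i := 0 to SigLen - 1 do
  begin
    w := 2 * Pi * Bin * i / SigLen;
    re := re + x[i] * Cos(w);
    im := im - x[i] * Sin(w);
  end;

  WriteLn(Format('X[%d] = %.4f  %.4f i', [Bin, re, im]));
end.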

Read more


Numerical library for Delphi and .NET developers

Dew Research develops mathematical software for advanced scientific computing, trusted by many customers. MtxVec for Delphi, C++ Builder or .NET is an alternative to products like Matlab, LabView, OMatrix, SciLab, etc. We offer a high-performance math library, statistics library and digital signal processing (DSP) library for:

  • Embarcadero/CodeGear Delphi and C++Builder numerical libraries and components
  • Microsoft .NET components -- including Visual Studio add-ons and a .NET numerical library for C++, C#, and Visual Basic

Product Features

MtxVec is an object-oriented, vectorized math library and the core of Dew Lab Studio, featuring a comprehensive set of mathematical and statistical functions that execute at impressive speeds.
Designed for large data sets, with complete vector/matrix arithmetic, it adds the following capabilities to your development environment:
  • A comprehensive set of mathematical, signal processing and statistical functions
  • Substantial performance improvements in floating-point math by exploiting the SSE2, SSE3, SSE4.2, Intel AVX and AVX2 instruction sets offered by modern CPUs
  • Solutions built on it scale linearly with core count, making it ideal for massively parallel systems
  • Improved compactness and readability of code (a short sketch follows this list)
  • Support for native 64-bit execution, clearing the way for memory-hungry applications
  • Significantly shorter development times, by protecting the developer from a wide range of possible errors
  • Direct integration with TeeChart© to simplify and speed up charting
  • No royalty fees for distribution of compiled binaries in products you develop
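
As a rough illustration of the coding style the list above refers to, here is a tiny, self-contained stand-in for a vector type with operator overloading. TVecD and SumOf are hypothetical helpers defined in this sketch; they are not the MtxVec API, which provides optimized, SIMD-backed equivalents of such operations.

unit VecSketch;

interface

uses
  System.SysUtils;

type
  TVecD = record
    Data: TArray<Double>;
    class operator Add(const L, R: TVecD): TVecD;        // elementwise +
    class operator Multiply(const L, R: TVecD): TVecD;    // elementwise *
  end;

function SumOf(const V: TVecD): Double;

implementation

class operator TVecD.Add(const L, R: TVecD): TVecD;
var
  i: Integer;
begin
  SetLength(Result.Data, Length(L.Data));
  for i := 0 to High(L.Data) do
    Result.Data[i] := L.Data[i] + R.Data[i];
end;

class operator TVecD.Multiply(const L, R: TVecD): TVecD;
var
  i: Integer;
begin
  SetLength(Result.Data, Length(L.Data));
  for i := 0 to High(L.Data) do
    Result.Data[i] := L.Data[i] * R.Data[i];
end;

function SumOf(const V: TVecD): Double;
var
  i: Integer;
begin
  Result := 0;
  for i := 0 to High(V.Data) do
    Result := Result + V.Data[i];
end;

end.

With such a type in scope, a dot product collapses to a single readable expression, dot := SumOf(a * b), instead of an explicit loop.
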
Displaying large amounts of data

Superconductive memory manager

Linear and cubic interpolation

Optimized Functions

The base math library uses the LAPACK (Linear Algebra PACKage) version optimized for Core Duo and Core i7 CPUs, provided by Intel with its Math Kernel Library (MKL). Our library is organized as a set of "primitive", highly optimized functions covering all the basic math operations. All higher-level algorithms use these basic optimized functions, similar to the way LAPACK uses the Basic Linear Algebra Subprograms (BLAS).
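
The layering can be pictured with a small, generic sketch (the names Axpy and MatVec are illustrative, not MtxVec internals): a higher-level operation is expressed entirely through one optimized primitive, so optimizing the primitive once speeds up everything built on it.

type
  TDoubleArray = array of Double;

// Primitive operation: y := alpha * x + y (the classic BLAS "axpy" pattern).
procedure Axpy(alpha: Double; const x: TDoubleArray; var y: TDoubleArray);
var
  i: Integer;
begin
  for i := 0 to High(x) do
    y[i] := alpha * x[i] + y[i];
end;

// Higher-level operation built entirely from the primitive:
// y := A * x, with A stored as an array of columns.
procedure MatVec(const ACols: array of TDoubleArray; const x: TDoubleArray;
  var y: TDoubleArray);
var
  j: Integer;
begin
  for j := 0 to High(y) do
    y[j] := 0;                     // clear the result vector
  for j := 0 to High(ACols) do
    Axpy(x[j], ACols[j], y);       // y := y + x[j] * column j
end;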

Performance Secrets

Code vectorization

The library achieves substantial performance improvements in floating-point arithmetic by exploiting the CPU's Streaming SIMD Extensions (SSE2, SSE3 and SSE4) instruction sets. (SIMD = Single Instruction, Multiple Data.)
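
The sketch below only illustrates the data-parallel shape that SIMD exploits; it is plain Pascal and does not itself emit SSE or AVX instructions. An SSE2 register holds two packed doubles, an AVX register four and an AVX-512 register eight, so one packed instruction performs that many additions at once.

function SumScalar(const x: array of Double): Double;
var
  i: Integer;
begin
  Result := 0;
  for i := 0 to High(x) do
    Result := Result + x[i];          // one addition per loop pass
end;

function SumFourWide(const x: array of Double): Double;
var
  i: Integer;
  s0, s1, s2, s3: Double;
begin
  s0 := 0; s1 := 0; s2 := 0; s3 := 0;
  i := 0;
  while i + 3 <= High(x) do
  begin
    // four independent accumulators; an AVX register would carry these as
    // four packed doubles and add them with a single instruction
    s0 := s0 + x[i];
    s1 := s1 + x[i + 1];
    s2 := s2 + x[i + 2];
    s3 := s3 + x[i + 3];
    Inc(i, 4);
  end;
  Result := s0 + s1 + s2 + s3;
  while i <= High(x) do
  begin
    Result := Result + x[i];          // leftover tail elements
    Inc(i);
  end;
end;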

Superconductive memory management

Effective massively parallel execution is achieved with the help of superconductive memory management, which features zero thread contention and no inter-locking problems. This allows linear scaling with the number of cores while maintaining low memory consumption and without interfering with the non-computational parts of the project.
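
A minimal sketch of the general idea, assuming a simple thread-local scratch buffer; it illustrates why per-thread reuse avoids contention and is not the library's actual memory manager.

type
  TDblArray = array of Double;

threadvar
  Scratch: TDblArray;   // one instance per thread, never shared

function ThreadLocalBuffer(MinLen: Integer): TDblArray;
begin
  // Grow only when needed; in the common case this is just a length check,
  // with no allocation and no synchronization against other threads.
  if Length(Scratch) < MinLen then
    SetLength(Scratch, MinLen);
  Result := Scratch;
  // Note: Delphi does not finalize managed threadvars on thread exit, so a
  // production version would release the buffer explicitly when the thread
  // finishes.
end;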

Some of our customers

Bank for International Settlements (BIS)
Fraunhofer Institute of Optronics, System Technologies, and Image Exploitation IOSB
Accelerate Diagnostics, Inc.
marketingQED Ltd
NMISA - National Metrology Institute of South Africa
French National Institute for Agricultural Research

© Dew Research 1997 - 2018. All Rights Reserved.

Delphi & C++ Builder are registered trademarks of Embarcadero Corporation. All other brands and product names are trademarks or registered trademarks of their respective owners.