The Fast And Accurate Scalar Math Library

What Is Math387

Math387 is the scalar-math foundation of the MtxVec library — the unit you import when you want Sin, Cos, Log, Exp, Power, Sqrt, and the rest of the elementary mathematical function family to be as fast as the silicon allows and as accurate as IEEE 754 permits, without compromising one for the other.

The unit is a direct replacement for the standard Math (Delphi) or <cmath> (C++ Builder) imports in your numerical code. Nearly every function you would call from those imports — and several dozen that aren't in either — is provided in Math387 with a familiar name, a familiar signature, and a guaranteed accuracy and performance profile that no general-purpose runtime library can match.

The unit provides AVX2/FMA and SSE4.2 hand-optimized assembler implementations of the entire elementary function set. The classical Delphi runtime — and the equivalent in every other compiler — historically delegated the transcendental functions to either the x87 floating-point unit (slow, with quirky 80-bit semantics that don't always interact cleanly with double-precision code) or to platform libm calls (faster, but with the call-overhead cost and dependency on whatever the platform happens to ship). Math387's new implementations bypass both: every elementary function is implemented in inline native SSE2/SSE4.2/AVX2 code that runs in the same register file as the rest of your numerical inner loop, with no x87 round-trips and no external function call overhead.

The result, in practical terms, is that the math calls in your hot loop run between two and ten times faster than they did before — depending on the function — while also being more accurate than what the runtime previously provided. You don't have to choose between speed and accuracy; you get both:

  • Hand-tuned AVX2/FMA implementations for every elementary function. AVX2 with FMA (Fused Multiply-Add) is the instruction set extension that ships on every Intel Core processor from Haswell (2013) onward and every AMD processor from Zen (2017) onward — by now, essentially the entire installed base of x86 hardware. FMA in particular is critical for high-accuracy elementary functions: a fused a*b + c operation rounds once instead of twice, which directly improves the accuracy of polynomial-evaluation kernels at no cost in speed.
  • Hand-tuned SSE4.2 fallback implementations for every elementary function. Older hardware, virtualized environments where AVX2 may not be exposed to the guest, and certain server processors where AVX2 is disabled for power reasons all still get hand-tuned scalar SSE code that significantly outperforms what the standard runtime provides. SSE4.2 is the floor: Intel has shipped it since Nehalem (2008) and AMD since Bulldozer (2011).
  • Runtime CPU dispatch, so your application code doesn't need to know or care which instruction set the host supports. You call Math387.Sin(X); the library has already detected at startup time whether the CPU supports AVX2/FMA, and dispatches each call to the optimal implementation. The same binary runs at full speed on a current-generation desktop and on a decade-old workstation; the only difference is the dispatch path inside the function, which the user never sees.
  • Sub-1-ULP accuracy as the target for every function, every variant, every platform. ULP ("Unit in the Last Place") is the appropriate accuracy metric for floating-point code: it measures how many representable floating-point values your result is away from the true mathematical answer. An error below 1 ULP means the returned value is one of the two representable numbers that bracket the true result; an error of at most 0.5 ULP is a correctly rounded result, which cannot be improved on in the target precision. Math387's elementary functions are verified against multi-precision arithmetic ground truth on every shipped target, with a measured maximum error recorded per function.
  • Single-precision implementations as first-class citizens, not "the double version with a cast." Every function in the unit has a dedicated single-precision variant (with the f suffix: Sinf, Cosf, Expf, etc.) that is independently tuned, independently coefficient-optimized, and independently verified. Single-precision code that lived on top of double-precision math used to pay double-precision cost; with Math387's pure-single implementations, single-precision code runs roughly twice as fast as it did before.
  • 72,000 lines of assembler spread across 74 distinct math functions, adding a mere 200 KB to your executable with no external DLL dependencies.

Performance: Faster Than Intel's Own Math Library

The honest benchmark for any scalar math implementation is "how does it compare against Intel SVML — the Short Vector Math Library that ships with the Intel C++ compiler?" SVML is the reference: it's hand-tuned by Intel engineers, it has been the gold standard for fast scalar math on x86 for two decades, and every commercial numerical library is measured against it.

Math387 matches or beats Intel libm on the majority of elementary functions across the standard variants (Win64 + Win32, double + single precision, AVX2 + SSE4.2 paths). The benchmark methodology is the same one that Intel uses for SVML: pseudo-random inputs across each function's full mathematical domain, timed against libm_<func>_<core> symbols from libmmt, in a tight loop on a modern desktop processor.

A representative sample of the headline numbers, in nanoseconds per call, measured on an AMD 9950X:

Math Function Performance Comparison
Function         | Win64 AVX2 (ns) | SVML AVX2 (ns) | Ratio | Win64 SSE4.2 (ns) | SVML SSE4.2 (ns) | Ratio
Ln (double)      | 2.5             | 3.1            | 0.83× | 2.5               | 3.6              | 0.70×
Exp (double)     | 1.9             | 2.3            | 0.80× | 2.3               | 2.4              | 0.97×
Expf (single)    | 1.1             | 1.0            | 1.10× | 1.3               | 1.7              | 0.77×
Lnf (single)     | 1.3             | 1.1            | 1.20× | 1.6               | 2.1              | 0.77×
Sin (double)     | 5.2             | 4.9            | 1.05× | 6.3               | 4.9              | 1.28×
Cos (double)     | 4.9             | 4.9            | 1.01× | 6.9               | 4.8              | 1.42×
Cosf (single)    | 4.1             | 5.3            | 0.78× | 5.3               | 6.3              | 0.85×
ArcTanf (single) | 2.1             | 1.7            | 1.20× | 2.0               | 1.5              | 1.35×

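Each ratio is the Math387 time divided by the SVML time, so values below 1.0× favor Math387: a "0.83×" entry means the Math387 call takes 17% less time than the corresponding Intel SVML call. Entries close to 1.0× are statistical parity, within measurement noise. Entries above 1.0× mean Math387 is slower; in this sample the gap is at most about 40%.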

These are not benchmark-special numbers — they are the steady-state throughput you get when you call the functions in a real loop, on real data. The benchmark loop is the typical "for each input, accumulate the function's result" pattern that any data-processing application uses; the per-call cost is measured by running millions of iterations and dividing.
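The measurement pattern described above can be sketched in a few lines. This is a minimal illustration, not the vendor's benchmark harness: it times the built-in Exp so that it compiles without MtxVec installed, and the input range, seed, and pass count are arbitrary assumptions. It relies on Delphi's TStopwatch from the Diagnostics unit.

```pascal
program TimingLoopSketch;
{$APPTYPE CONSOLE}

uses
  SysUtils, Diagnostics;

const
  N = 1 shl 20;      // number of distinct inputs (assumed value)
  Passes = 32;       // repeat the sweep so the total runs long enough to time

// Returns the average cost of one Exp call in nanoseconds.
// Checksum receives the accumulated sum so that every result is used.
function MeasureExpNs(out Checksum: Double): Double;
var
  Inputs: array of Double;
  I, P: Integer;
  Acc: Double;
  SW: TStopwatch;
begin
  SetLength(Inputs, N);
  RandSeed := 12345;                       // fixed seed, repeatable inputs
  for I := 0 to N - 1 do
    Inputs[I] := Random * 20.0 - 10.0;     // pseudo-random values in [-10, 10)

  Acc := 0.0;
  SW := TStopwatch.StartNew;
  for P := 1 to Passes do
    for I := 0 to N - 1 do
      Acc := Acc + Exp(Inputs[I]);         // "for each input, accumulate"
  SW.Stop;

  Checksum := Acc;
  // Milliseconds to nanoseconds, divided by the total number of calls.
  Result := SW.ElapsedMilliseconds * 1.0e6 / (N * Passes);
end;

var
  Sum, Ns: Double;
begin
  Ns := MeasureExpNs(Sum);
  Writeln(Format('%.2f ns per call (checksum %g)', [Ns, Sum]));
end.
```

To time the library call instead, add Math387 to the uses clause and replace Exp with Math387.Exp. The stopwatch is read in whole milliseconds here, so the loop should run for at least a few hundred milliseconds for the per-call figure to be meaningful.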

Accuracy: Verified Against Multi-Precision Ground Truth

Performance gains that come at the cost of accuracy are not really gains — they are just a different kind of slow, where you have to add more computational work elsewhere to make up for the accuracy you gave up. Math387 does not make this trade.

Every elementary function in the unit is verified, on every shipped target platform, against multi-precision arithmetic ground truth. The reference is the Python mpmath multi-precision library, configured to evaluate each function to thousands of bits of precision and then correctly round to the target IEEE 754 format. Every function passes a regression suite of 500-1,000 pseudo-random inputs distributed across its full domain, plus a hand-curated set of special-case inputs covering the IEEE corner cases: zero, signed zero, ±∞, NaN, subnormal numbers, the domain boundaries, and the inputs at which range reduction can lose precision.

The accuracy target is sub-1-ULP for every function in the set. An error expressed in ULP tells you how far the result is from the true answer, in units of the spacing between representable floating-point values at that magnitude. The best any implementation can do is to round the true mathematical answer to the nearest representable float, which is an error of at most 0.5 ULP; a sub-1-ULP result is always one of the two representable values adjacent to the true answer.

In practice, the elementary functions in Math387 achieve typical maximum errors in the range of 0.6 to 1.0 ULP for most functions (verified against mpmath ground truth, measured per-function in the regression suite). A few functions in the set — notably Tan (1.15 ULP measured maximum) and ArcTan2 (1.27 ULP) — fall slightly above 1 ULP but well below 2 ULP. The acceptance threshold for any function shipping in the unit is a documented per-function maximum measured ULP, with the actual measurement available for review.
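The notion of counting representable values between two results can be made concrete with a small helper. The sketch below is not part of Math387 or of its regression suite; the function names are hypothetical. It measures the distance between two doubles in representable steps. Measuring the error of a function against the true value additionally needs a multi-precision reference, which is outside this sketch.

```pascal
program UlpDistanceSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Map a double's bit pattern to an integer that grows monotonically with the
// floating-point value, so neighbouring doubles map to neighbouring integers.
function OrderedBits(X: Double): Int64;
var
  Bits: Int64;
begin
  Move(X, Bits, SizeOf(Bits));
  if Bits < 0 then
    Result := Low(Int64) - Bits    // negative values: reverse their order
  else
    Result := Bits;
end;

// Number of representable steps between A and B (0 means identical).
// Assumes finite inputs that are not at opposite extremes of the range.
function UlpDistance(A, B: Double): Int64;
begin
  Result := Abs(OrderedBits(A) - OrderedBits(B));
end;

// Next representable double above X, valid for finite X >= 0.
function NextUp(X: Double): Double;
var
  Bits: Int64;
begin
  Move(X, Bits, SizeOf(Bits));
  Inc(Bits);
  Move(Bits, Result, SizeOf(Result));
end;

var
  A, B: Double;
begin
  A := 1.0;
  B := NextUp(NextUp(A));          // two representable steps above 1.0
  Writeln(UlpDistance(A, A));      // prints 0
  Writeln(UlpDistance(A, B));      // prints 2
end.
```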

Why Accuracy Matters

Sub-ULP accuracy is not just a marketing number. In real applications, the difference between a result rounded to the last bit and a result off by several ULP can show up in unexpected places.

Iterative algorithms — Newton-Raphson root finding, gradient descent, fixed-point iteration — accumulate accuracy losses across iterations. With sub-ULP elementary functions, the per-iteration error budget is dominated by the algorithm's own arithmetic, not by the elementary function calls inside it.

Comparison and equality logic — checking whether two computed quantities are equal-up-to-tolerance — depends on the per-call accuracy of the math. Tighter math accuracy lets you tighten your application tolerances, which lets you distinguish smaller real differences in your data.

Backward-stable algorithm proofs — the mathematical guarantees that published numerical algorithms come with — typically assume the elementary functions are correctly rounded (or close to it). Higher-accuracy elementary functions let those proofs transfer more directly to your implementation.

These concerns determine whether a numerical application gives consistent answers across different platforms, hardware generations, and rebuilds — and whether subtle bugs surface in production.

The Full Function Inventory

Math387 ships the full elementary mathematical function set, organized into the following families. Every function listed has both a double-precision variant (the natural name: Sin, Exp, Ln) and a single-precision variant (Sinf, Expf, Lnf, etc., using the f-suffix convention from C99 <math.h>). Complex-number variants (TCplx and TSCplx types) are also provided for the functions where they're mathematically meaningful.

The Logarithm and Exponent Family

  • Ln(x) — natural logarithm.
  • Lnf(x) — natural logarithm, single precision.
  • Log10(x), Log10f(x) — base-10 logarithm.
  • Log2(x), Log2f(x) — base-2 logarithm.
  • LogN(N, x), LogNf(N, x) — logarithm of arbitrary base.
  • Exp(x), Expf(x) — natural exponent (e^x).
  • Exp10(x), Exp10f(x) — base-10 exponent (10^x), 30% faster than the equivalent Power(10, x).
  • Exp2(x), Exp2f(x) — base-2 exponent (2^x), same speedup vs Power(2, x).
  • Expj(omega), Expjf(omega) — Euler's formula: e^(i·ω) returned as a complex number.

The new SVML-competitive assembler implementations apply to all of these. Ln and Exp are the most-called functions in this family for most applications and receive the deepest optimization.

The Trigonometric Family

  • Sin(x), Sinf(x) — sine.
  • Cos(x), Cosf(x) — cosine.
  • Tan(x), Tanf(x) — tangent.
  • SinCos(x, out S, out C), SinCosf(x, out S, out C) — combined sine and cosine, faster than two separate calls.
  • ArcSin(x), ArcSinf(x) — inverse sine.
  • ArcCos(x), ArcCosf(x) — inverse cosine.
  • ArcTan(x), ArcTanf(x) — inverse tangent.
  • ArcTan2(y, x), ArcTan2f(y, x) — two-argument inverse tangent, correctly handling all four quadrants.

Range reduction across all of these uses the Payne-Hanek algorithm for large arguments — the standard correct method that preserves full precision across the entire double-precision dynamic range. The naive fmod(x, 2π) reduction that some library implementations use loses precision catastrophically for arguments above 2²⁰ or so; Math387 returns a meaningful, accurate result for any finite input including astronomically large angles.
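To see why naive reduction degrades, note that the double nearest to 2π differs from the true 2π by roughly 2.45e-16, and removing k whole periods with that rounded constant carries about k times that error into the reduced argument. The sketch below is only a back-of-the-envelope estimate of this effect, not Math387's reduction code; TwoPiError and NaiveReductionError are names introduced here for illustration.

```pascal
program ReductionErrorSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Math;

const
  TwoPi = 6.283185307179586;     // rounds to the double nearest to 2*pi
  TwoPiError = 2.45e-16;         // approx. |true 2*pi - nearest double|

// Estimated absolute error in the reduced argument when X is reduced
// by removing K whole periods using the rounded constant TwoPi.
function NaiveReductionError(X: Double): Double;
var
  K: Double;
begin
  K := Int(Abs(X) / TwoPi);      // number of whole periods removed
  Result := K * TwoPiError;
end;

var
  E: Integer;
  X: Double;
begin
  E := 10;
  while E <= 50 do
  begin
    X := Ldexp(1.0, E);          // X = 2^E
    Writeln(Format('x = 2^%d  estimated error = %.3g',
      [E, NaiveReductionError(X)]));
    Inc(E, 10);
  end;
end.
```

For x = 2^20 the estimate is a few times 1e-11, several orders of magnitude above the roughly 1e-16 spacing of doubles near 1, and it grows linearly with the argument from there.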

The Hyperbolic Family

  • Sinh(x), Sinhf(x) — hyperbolic sine.
  • Cosh(x), Coshf(x) — hyperbolic cosine.
  • Tanh(x), Tanhf(x) — hyperbolic tangent.
  • SinhCosh(x, out S, out C), SinhCoshf(x, out S, out C) — combined hyperbolic sine and cosine in one call.
  • ArcSinh(x), ArcSinhf(x) — inverse hyperbolic sine.
  • ArcCosh(x), ArcCoshf(x) — inverse hyperbolic cosine.
  • ArcTanh(x), ArcTanhf(x) — inverse hyperbolic tangent.

The SinhCosh combined function is particularly noteworthy: computing Sinh(x) and Cosh(x) separately means evaluating exp(x) twice (once for each); SinhCosh evaluates it once and recombines the result, cutting the work roughly in half. Any application that needs both quantities for the same argument should call SinhCosh instead of separate Sinh and Cosh.
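The idea of sharing one exponential between both results can be sketched as follows. This is an illustration of the principle under simplifying assumptions, not the library's implementation: SinhCoshOneExp is a hypothetical name, it uses the built-in Exp, and it omits the special handling a production routine needs for very small and very large arguments.

```pascal
program SinhCoshSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Illustrative only: derive both hyperbolic values from a single Exp call.
// A production routine would add special handling for very small |X|
// (cancellation in E - Inv) and for large |X| (overflow of Exp).
procedure SinhCoshOneExp(X: Double; out S, C: Double);
var
  E, Inv: Double;
begin
  E := Exp(X);           // the only transcendental evaluation
  Inv := 1.0 / E;        // exp(-X) obtained with one division
  S := 0.5 * (E - Inv);  // sinh(X)
  C := 0.5 * (E + Inv);  // cosh(X)
end;

var
  S, C, Identity: Double;
begin
  SinhCoshOneExp(1.0, S, C);
  Writeln(S:0:15, ' ', C:0:15);
  Identity := C * C - S * S;     // should be very close to 1
  Writeln(Identity:0:12);
end.
```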

The Reciprocal Trig and Hyp Families

These functions are conspicuously absent from most standard math libraries — including the Delphi Math unit, .NET's System.Math, and the C++ standard library. Math387 provides them as first-class citizens with the same care as the standard set.

  • Sec(x), Secf(x) — secant: 1/cos(x).
  • Csc(x), Cscf(x) — cosecant: 1/sin(x).
  • Cot(x), Cotf(x) — cotangent: cos(x)/sin(x).
  • ArcSec(x), ArcSecf(x) — inverse secant.
  • ArcCsc(x), ArcCscf(x) — inverse cosecant.
  • ArcCot(x), ArcCotf(x) — inverse cotangent.
  • Sech(x), Sechf(x) — hyperbolic secant: 1/cosh(x).
  • Csch(x), Cschf(x) — hyperbolic cosecant: 1/sinh(x).
  • Coth(x), Cothf(x) — hyperbolic cotangent: cosh(x)/sinh(x).
  • ArcSech(x), ArcSechf(x) — inverse hyperbolic secant.
  • ArcCsch(x), ArcCschf(x) — inverse hyperbolic cosecant.
  • ArcCoth(x), ArcCothf(x) — inverse hyperbolic cotangent.

These functions appear naturally in physics (relativistic kinematics, optics), engineering (transmission-line equations, antenna theory), navigation (Mercator-projection conversions), and statistics (Cauchy and Student-t distribution computations). Code that previously had to compose them by hand — 1.0 / Cos(x) for secant, with the implicit numerical concerns around Cos(x) near a zero — can now call the appropriately-named function and get a numerically-stable, accuracy-verified result.

The Power and Root Family

  • Sqrt(x), Sqrtf(x) — square root, dispatched to the hardware SQRTSD/SQRTSS instruction.
  • Power(base, exp), Powerf(base, exp) — general power function for real-valued exponents.
  • IntPower(base, exp), IntPowerf(base, exp) — integer-exponent power, an order of magnitude faster than Power for the common case where the exponent is an integer.
  • Pythag(x, y), Pythagf(x, y) — numerically-stable hypotenuse computation, sqrt(x² + y²) without overflow or underflow even for extreme inputs.
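The overflow-free hypotenuse mentioned for Pythag is usually achieved by scaling with the larger input. The sketch below shows that textbook technique; it is an assumption about the general approach, not a copy of the Math387 routine, and StableHypot is a hypothetical name.

```pascal
program StableHypotSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Scale by the larger magnitude so the squared term never exceeds 1.
function StableHypot(X, Y: Double): Double;
var
  Big, Small, R: Double;
begin
  Big := Abs(X);
  Small := Abs(Y);
  if Big < Small then
  begin
    R := Big;
    Big := Small;
    Small := R;
  end;
  if Big = 0.0 then
  begin
    Result := 0.0;                     // both inputs are zero
    Exit;
  end;
  R := Small / Big;                    // 0 <= R <= 1
  Result := Big * Sqrt(1.0 + R * R);
end;

begin
  // Squaring 3e200 directly would exceed the double range (about 1.8e308).
  Writeln(StableHypot(3.0e200, 4.0e200));
  // Exact here: 4 * Sqrt(1.5625) = 4 * 1.25 = 5.
  Writeln(StableHypot(3.0, 4.0):0:1);  // prints 5.0
end.
```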

The Rounding and Conversion Family

  • Ceil(x), Floor(x) — ceiling and floor as double results.
  • CeilToInt(x), FloorToInt(x), RoundToInt(x), TruncToInt(x) — rounding directly to native machine Integer (32-bit) with no intermediate float-to-int overhead.
  • CeilToInt64(x), FloorToInt64(x), RoundToInt64(x), TruncToInt64(x) — the 64-bit-integer versions.
  • TruncAndFrac(x, out Fraction), TruncAndFracToInt(x, out Fraction) — simultaneous integer-part and fractional-part extraction.
  • Rem(x, y) — IEEE-754-conformant remainder operation (different from the language's mod operator on negative inputs).
  • FixAngle(theta) — fast angle reduction to the principal value [-π, π], useful for angle accumulation in control loops.
  • DegToRad(deg), RadToDeg(rad) — unit conversion.
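The difference between an IEEE-style remainder and a truncating one, and how the former yields an angle wrap into [-π, π], can be illustrated with plain arithmetic. The functions below are hypothetical stand-ins written for this sketch, assuming Rem follows the IEEE 754 remainder definition as stated above; they are not the Math387 implementations and are not exact for very large quotients.

```pascal
program RemainderSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Truncating remainder: the quotient is rounded toward zero, so the result
// takes the sign of X (the convention of the integer mod operator).
function TruncRem(X, Y: Double): Double;
begin
  Result := X - Y * Int(X / Y);
end;

// IEEE-style remainder: the quotient is rounded to the nearest integer
// (ties to even under the default rounding mode), so the result lies in
// [-|Y|/2, |Y|/2].
function NearestRem(X, Y: Double): Double;
begin
  Result := X - Y * Round(X / Y);
end;

// Wrap an angle into [-pi, pi] using the nearest-integer remainder.
function WrapAngle(Theta: Double): Double;
begin
  Result := NearestRem(Theta, 2.0 * Pi);
end;

begin
  Writeln(TruncRem(5.0, 3.0):0:1);      // prints 2.0
  Writeln(NearestRem(5.0, 3.0):0:1);    // prints -1.0
  Writeln(TruncRem(-5.0, 3.0):0:1);     // prints -2.0
  Writeln(NearestRem(-5.0, 3.0):0:1);   // prints 1.0
  Writeln(WrapAngle(3.5 * Pi):0:6);     // prints -1.570796
end.
```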

The Integer Math Helpers

  • GcdInt64(a, b) — greatest common divisor, native 64-bit integer.
  • LcmInt64(a, b) — least common multiple, native 64-bit integer.

These are not transcendental functions, but they are commonly needed in numerical code and Math387 provides them with the same performance discipline as the rest of the unit.
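For reference, the classical Euclidean approach to both helpers looks like this. It is a generic sketch with hypothetical names (GcdSketch, LcmSketch), not the Math387 code, and it does not guard against overflow of the least common multiple.

```pascal
program GcdLcmSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Euclid's algorithm on 64-bit integers; the result is non-negative.
function GcdSketch(A, B: Int64): Int64;
var
  T: Int64;
begin
  A := Abs(A);
  B := Abs(B);
  while B <> 0 do
  begin
    T := A mod B;
    A := B;
    B := T;
  end;
  Result := A;
end;

// lcm(A, B) = |A / gcd(A, B) * B|; dividing first keeps intermediates small.
function LcmSketch(A, B: Int64): Int64;
begin
  if (A = 0) or (B = 0) then
    Result := 0
  else
    Result := Abs((A div GcdSketch(A, B)) * B);
end;

begin
  Writeln(GcdSketch(48, 18));   // prints 6
  Writeln(LcmSketch(4, 6));     // prints 12
end.
```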

Complex-Number Math, Single and Double Precision

A feature that sets Math387 apart from essentially every standard runtime math library: complex-number variants of the elementary functions are provided as first-class citizens, in both single and double precision.

Standard runtime math libraries — Delphi's Math unit, the C/C++ standard library, .NET's System.Math, the various platform libm implementations — provide only real-valued elementary functions. C99 introduced complex number support via <complex.h>, but it remains an opt-in extension that many compilers and platforms only partially implement, and even where supported it's typically slower than the real-valued counterparts because no special tuning has been done for it. C++'s std::complex is a generic template wrapper around real-valued math; it is correct but performance-incidental. The result, in practice, is that developers who need complex-number transcendentals end up implementing them by hand — a recipe for accuracy bugs in the corner cases (NaN-and-Inf handling, branch-cut conventions, near-zero behavior).

Math387 ships these as proper library functions. Two complex types are provided:

  • TCplx — double-precision complex, two double fields.
  • TSCplx — single-precision complex, two single fields.

Both are Pascal record types (not classes), so they live on the stack and have zero allocation cost. Operator overloading is supported in Delphi for the natural arithmetic — Z := A * B + C reads the way the math reads. The records pass by value in the standard calling convention; the slightly larger TCplx (16 bytes) is passed by const-reference under the hood for efficiency, transparently to the user.

Complex-Variant Functions

The elementary set has complex-typed overloads. Both TCplx and TSCplx overloads are provided where listed; naming convention is that real-domain function names get a complex-argument overload (Sin(TCplx), Exp(TCplx)), while complex-specific operations carry a C-prefix or a descriptive name:

  • Transcendentals (overload by complex arg): Exp(z), Ln(z), Log10(z), Log2(z), Power(a, z), Power(z, b), Power(z1, z2).
  • Complex square root: CSqrt(z) (the dedicated complex sqrt — Sqrt is real-only).
  • Trigonometric (overload by complex arg): Sin(z), Cos(z), Tan(z), ArcSin(z), ArcCos(z), ArcTan(z). Branch cuts follow the standard IEEE 754 / C99 conventions.
  • Hyperbolic (overload by complex arg): Sinh(z), Cosh(z), Tanh(z), ArcSinh(z), ArcCosh(z), ArcTanh(z).
  • Reciprocal trig and hyp (overload by complex arg): Sec(z), Csc(z), Cot(z), Sech(z), Csch(z), Coth(z), and their inverses — families almost universally absent from standard runtime libraries even for the real case, let alone complex.
  • Magnitude and phase: CAbs(z) (modulus, |z|, returns a double/single), Arg(z) (argument / phase in radians, returns a real), SqrAbs(z) (squared magnitude, faster than CAbs(z) * CAbs(z) and without the intermediate square root), Norm(z) (squared magnitude under the mathematical-norm convention).
  • Coordinate conversion: CartToPolar(z): TCplx (returns a complex whose .Re field is the magnitude and .Im is the phase angle in radians); PolarToCart(z): TCplx (inverse operation, takes a TCplx packed as magnitude+phase and returns the Cartesian form).
  • Euler's formula: Expj(omega: double): TCplx returns e^(i·omega); Expjf(omega: single): TSCplx is the single-precision counterpart. Natural primitive for frequency-domain operations, phasor computation, and complex-exponential generators.
  • Complex arithmetic helpers: CInv(z) (reciprocal, 1/z), CSqr(z) (square, z*z, dedicated routine that can use a tighter implementation than the operator-overload path), CDiv(a, b) (division — with multiple overloads: complex/complex, real/complex, complex/real). The unit also includes CAbs1 (1-norm |Re| + |Im|, a faster alternative to CAbs when the precise Euclidean magnitude isn't required) and CInvMulI (a phase-rotation primitive — see the documentation for the exact form).
  • Complex hypotenuse: Pythag(z1, z2): TCplx — the complex overload of the real-valued Pythag, computing sqrt(z1² + z2²) on complex inputs.
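The packed polar convention used by CartToPolar and PolarToCart (magnitude in the real field, phase in the imaginary field) can be illustrated with a stand-in record. TDemoCplx and the two Demo functions are hypothetical and exist only for this sketch; they use the standard Math unit rather than Math387, so the snippet compiles without the library.

```pascal
program PolarPackingSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  Math;

type
  // Hypothetical stand-in for a two-field complex record such as TCplx.
  TDemoCplx = record
    Re, Im: Double;
  end;

// Pack the magnitude into Re and the phase (radians) into Im.
function DemoCartToPolar(const Z: TDemoCplx): TDemoCplx;
begin
  Result.Re := Hypot(Z.Re, Z.Im);
  Result.Im := ArcTan2(Z.Im, Z.Re);
end;

// Inverse: read the magnitude from Re and the phase from Im.
function DemoPolarToCart(const P: TDemoCplx): TDemoCplx;
begin
  Result.Re := P.Re * Cos(P.Im);
  Result.Im := P.Re * Sin(P.Im);
end;

var
  Z, P, Back: TDemoCplx;
begin
  Z.Re := 3.0;
  Z.Im := 4.0;
  P := DemoCartToPolar(Z);
  Writeln(P.Re:0:6, ' ', P.Im:0:6);        // prints 5.000000 0.927295
  Back := DemoPolarToCart(P);
  Writeln(Back.Re:0:6, ' ', Back.Im:0:6);  // prints 3.000000 4.000000
end.
```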

Why Single-Precision Complex Matters

Double-precision complex (TCplx) is the natural default for scientific computing — it gives you the same 15-17 decimal digits in the real and imaginary parts that you'd expect from double arithmetic. But applications that process large complex-valued datasets — radar I/Q signal pipelines, communications-system simulations, large-scale spectral analysis, computer-graphics complex-arithmetic kernels — frequently want to use single precision to halve their memory footprint and double their effective SIMD throughput. For these applications, the existence of a properly-tuned single-precision complex type (TSCplx) means you don't have to choose between "wasting memory on double precision when single is fine" and "writing your own complex-arithmetic helpers because the library doesn't provide them."

TSCplx is an 8-byte record (two single fields), passes through SSE registers naturally, and supports the same full set of elementary function overloads as TCplx. The single-precision complex variants are independently tuned (they are not "the double versions cast to single") and have their own per-function accuracy and performance characteristics.

What This Enables

Code that involves complex numbers is much shorter and clearer when the library supports them properly. A few representative examples:

uses
  Math387;

procedure PhasorExamples(Omega, T: Double; const Z, Z1, Z2: TCplx);
var
  Phasor, Polar: TCplx;
  Magnitude: Double;
begin
  // Phasor representation of a sinusoid at angular frequency Omega:
  Phasor := Math387.Expj(Omega * T);
  // Magnitude of the complex hypotenuse sqrt(Z1² + Z2²) (numerically stable):
  Magnitude := Math387.CAbs(Math387.Pythag(Z1, Z2));
  // Cartesian-to-polar conversion in one call (returns a TCplx
  // packed as Re=magnitude, Im=phase-in-radians):
  Polar := Math387.CartToPolar(Z);
  // Now Polar.Re is |Z| and Polar.Im is arg(Z).
end;

The same operations expressed with hand-rolled complex arithmetic on top of real-valued math are several times longer, harder to read, and substantially more error-prone in the corner cases. Math387 makes complex-valued elementary math as natural in Delphi/C++ Builder code as it would be in MATLAB or Mathematica — without leaving the native compiled-language environment.

How to Use Math387

Using Math387 in your code is as simple as uses Math387; in Delphi or the equivalent #include <Math387.hpp> in C++ Builder.

uses
  Math387;

procedure ProcessSamples(const Input: TArray<Double>; var Output: TArray<Double>);
var
  I: Integer;
  Phase, Magnitude: Double;
begin
  // Size the output so that every input sample has a slot.
  SetLength(Output, Length(Input));
  for I := 0 to High(Input) do
  begin
    Phase := Math387.ArcTan2(Input[I], 1.0);
    Magnitude := Math387.Exp(-Phase * Phase) * Math387.Sin(Phase);
    Output[I] := Magnitude;
  end;
end;

That's it. There is no initialization step, no configuration, no separate runtime to start. The first time your application calls any Math387 function, the unit detects the host CPU's instruction-set support (AVX2/FMA presence) and configures the per-function dispatch tables accordingly. From then on, every call goes through the optimal code path for the host hardware, with no per-call overhead from the dispatch.

The unit is thread-safe: multiple threads can call any combination of Math387 functions simultaneously, with no internal synchronization in the function bodies. The dispatch state is set up once at startup and read-only after that, so concurrent reads from many threads never contend.
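A one-time dispatch of this kind is commonly built as a function pointer that is assigned once and only read afterward, which is also why concurrent callers need no locking. The sketch shows that general pattern under stated assumptions; it is not Math387's internal code. CpuHasAvx2Fma is a hypothetical stub (a real probe would query CPUID), and both code paths simply call the built-in Sin.

```pascal
program DispatchSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  TUnaryFunc = function(X: Double): Double;

// Hypothetical stand-in for the baseline code path.
function SinBaseline(X: Double): Double;
begin
  Result := Sin(X);
end;

// Hypothetical stand-in for the wider-instruction-set code path.
function SinWide(X: Double): Double;
begin
  Result := Sin(X);
end;

// Hypothetical CPU probe; a real one would inspect CPUID feature bits.
function CpuHasAvx2Fma: Boolean;
begin
  Result := False;
end;

var
  FastSin: TUnaryFunc;   // written once in InitDispatch, read-only afterward

procedure InitDispatch;
begin
  if CpuHasAvx2Fma then
    FastSin := SinWide
  else
    FastSin := SinBaseline;
end;

begin
  InitDispatch;                    // one-time setup before any worker starts
  Writeln(FastSin(0.5):0:15);      // every later call is one indirect call
end.
```

Because the pointer never changes after setup, threads only ever read it, and the per-call cost is a single indirect call rather than a feature check.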

The unit qualifies as a drop-in replacement for most existing code that uses the standard Math unit. Replace uses Math with uses Math387 (or add Math387 to the import list after Math, since Delphi resolves an unqualified name to the unit listed last), and existing calls to Sin, Cos, Exp, Ln, Sqrt, Power, ArcTan2, and the rest pick up the Math387 implementations through Delphi's name-resolution rules. Existing code typically compiles unchanged and immediately gets the performance and accuracy benefits.

For new code, qualify your function calls explicitly: Math387.Sin(x) rather than Sin(x). This makes the dependency explicit, makes the source of the implementation unambiguous to future readers, and avoids any name-resolution surprises if you import multiple math-related units.

Cross-Platform: Same Code, Every Target

Math387's new assembler implementations are written and tuned for the x86_64 instruction set with AVX2 and SSE4.2 dispatch on Windows.

On non-x86_64 targets (ARM64 desktops, mobile devices, embedded systems), Math387 dispatches to platform-appropriate implementations: ARM NEON-optimized routines where the algorithm benefits from SIMD, and platform libm calls where the platform's own math library is already well-tuned. The user-visible API is identical across all of these: Math387.Sin(x) works the same way on every supported target, with results that agree across platforms to within the documented ULP tolerance.

This matters for cross-platform deployment. An application that uses Math387 in its Windows version can be cross-compiled to macOS or Linux without re-engineering the math layer. The same source code runs natively on every target; the same accuracy guarantees apply on every target; the runtime characteristics are predictable on every target.

For Windows desktop deployment on x86_64, which covers the vast majority of professional applications, Math387 delivers the hand-tuned vendor-class performance described above. For mobile and embedded deployment, you still get the API consistency and the accuracy guarantees, with the per-platform performance dictated by the platform's underlying math infrastructure.

Compatibility, Stability, and What You Gain

Math387 is a long-standing component of MtxVec. The unit has shipped through Delphi releases from XE6 (PV20) through the current Athens (PV37), with C++ Builder and .NET versions tracking the same release cadence. 

The user-visible effect is that the same code, recompiled, runs significantly faster on the same hardware. If your application currently uses Math387, just upgrade and observe the performance step. If your application uses the standard Math unit, add Math387 to your imports and watch the elementary function calls speed up.

Delphi introduced AVX2 support for assembler with RAD Studio 11.0 (Alexandria). Consequently, builds for Delphi versions released before Alexandria support only SSE4.2, and versions older than Delphi XE6 are not covered. From Alexandria onward, the library dispatches automatically between AVX2 and SSE4.2 depending on what the host hardware supports. Hardware without SSE4.2 instructions is not supported; to deploy to such devices you would need to use an older version of the MtxVec library.

The Math387 unit is part of the standard MtxVec distribution. Licensing terms match the rest of MtxVec: commercial per-developer licensing with a fully-functional evaluation version available at any time. Source-available licensing is offered for customers in regulated industries or with audit requirements; the Math387 assembler is delivered as Pascal-callable inline-assembler code that any developer can read, understand, and audit.

What This Release Delivers in Practice

The combination of all of the above adds up to a tangible change in what's feasible at the application level. Briefly:

  • Your transcendental function calls are no longer the bottleneck in your hot loops. The Sin and Exp calls that used to dominate your profile now run two to ten times faster. Algorithmic restructuring you previously did to avoid math calls (precomputed lookup tables, polynomial approximations rolled by hand, batching transcendentals to amortize call overhead) can be undone — the math calls are fast enough to invoke directly.
  • Your accuracy is verified, not hoped-for. Results within 1 ULP for most of the elementary function set, and a documented measured maximum for every function on every shipped variant, mean that the numerical correctness of your application is grounded on the same foundation that the published mathematical-software literature assumes. No more chasing subtle accumulated-error bugs that come from 3-ULP math under the hood.
  • Your code becomes shorter and clearer. The reciprocal trig and reciprocal hyperbolic families that are missing from standard libraries are right there in Math387 with their natural names. The combined SinhCosh and SinCos functions let you compute both outputs for roughly the cost of one. The Pythag function gives you numerically-stable vector magnitudes without you having to write the scaling logic yourself.
  • Your application becomes more portable. The same Math387 API works identically on Windows, Linux, macOS, iOS, and Android, with the same accuracy guarantees on every target. The numerical layer stops being the part of your code that fights cross-platform deployment.
  • Your team can focus on the application-specific algorithms, not on the foundations. The numerical foundation is solved, verified, fast, and portable. The interesting engineering, the algorithms that distinguish your product from competitors, is where you spend your time, not on re-inventing Sin and Exp.

A Migration Example

Suppose you have existing Delphi code that uses the standard Math unit:

uses Math, SysUtils;
function GaussianAt(const X, Mu, Sigma: Double): Double;
var
  Z, Norm: Double;
begin
  Z := (X - Mu) / Sigma;
  Norm := 1.0 / (Sigma * Sqrt(2.0 * Pi));
  Result := Norm * Exp(-0.5 * Z * Z);
end;

To migrate to Math387, change the import and, to make the source of each call explicit, qualify the calls:

uses Math387, SysUtils;
function GaussianAt(const X, Mu, Sigma: Double): Double;
var
  Z, Norm: Double;
begin
  Z := (X - Mu) / Sigma;
  Norm := 1.0 / (Sigma * Math387.Sqrt(2.0 * Pi));
  Result := Norm * Math387.Exp(-0.5 * Z * Z);
end;

Calls to Sqrt and Exp now route through Math387's optimized implementations. The code is otherwise unchanged. The per-call cost of Exp and Sqrt is bounded by the documented per-function performance numbers above; the application-level wall-clock difference will depend on what fraction of your hot loop is spent in these calls — easily measurable by profiling before and after.
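How much a faster math call helps the whole application follows from the fraction of run time it accounts for (Amdahl's law). The helper below is a small illustrative calculation, not a measurement of any real workload; OverallSpeedup and its parameter names are introduced here for the example.

```pascal
program SpeedupEstimate;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Fraction:    share of total run time spent in the accelerated calls (0..1).
// CallSpeedup: how many times faster those calls become (> 0).
// Returns the resulting speedup of the whole program.
function OverallSpeedup(Fraction, CallSpeedup: Double): Double;
begin
  Result := 1.0 / ((1.0 - Fraction) + Fraction / CallSpeedup);
end;

begin
  // 40% of the time in math calls that become 2x faster: 1 / (0.6 + 0.2).
  Writeln(OverallSpeedup(0.4, 2.0):0:2);   // prints 1.25
  // If the loop were nothing but math calls, the full 2x would carry over.
  Writeln(OverallSpeedup(1.0, 2.0):0:2);   // prints 2.00
end.
```

The Fraction value is what profiling before the change gives you; profiling after the change confirms the estimate.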

For applications that already extensively use the standard Math unit, the qualified-call pattern (Math387.Sin rather than Sin) is recommended for the inner-loop calls where the performance matters; the rest of the application can continue to use the standard unit if it prefers. Mixing the two is fine: both units can appear in the same uses clause, and where they export the same name Delphi resolves an unqualified call to the unit listed last, so qualifying the call removes any ambiguity.

Try It

Math387 ships as part of the standard MtxVec distribution. Download the evaluation, add Math387 to a representative numerical hot loop in your application, and measure the difference. The performance step is immediate; the accuracy improvement is silent but present.

For questions about specific functions, performance characteristics on your target hardware, or integration patterns with your existing codebase, contact the team. They are happy to help you evaluate whether Math387 meets your application's needs.

The math is the foundation. Make sure it's solid.