Case study: Ultrasonic Reference Block Study
NIST info: This example illustrates the construction of a non-linear regression model for ultrasonic calibration data. This case study demonstrates fitting a non-linear model and the use of transformations and weighted fits to deal with the violation of the assumption of constant standard deviations for the errors. This assumption is also called homogeneous variances for the errors. Full description can be found here.
Data: The ultrasonic reference block data consist of a response variable and a predictor variable. The response variable is ultrasonic response and the predictor variable is metal distance. These data were provided by the NIST scientist Dan Chwirut (avaiable here).
In this example we will follow the suggested steps, listed at NIST pages.
1) Read in the data: For non-linear regression we'll use tMtxNonLinReg control. First we have to create the control and populate it's X and Y vectors with data.
2) Plot the y=y(x) chart to get an idea about the fit model. For this, you can use the MtxVecTee.DrawValues method. These two steps can be acomplished by the following code class:
Uses MtxVec, StatTools, Math387, MtxVecTee;
procedure FitModel;
var nlr: TMtxNonLinReg;
begin
nlr := TMtxNonLinReg.Create(nil);
Chart1.FreeAllSeries; Chart1.AddSeries(TLineSeries.Create(Chart1));
try
nlr.X.LoadFromFile('xdata.vec');
nlr.Y.LoadFromFile('ydata.vec');
DrawValues(nlr.X,nlrY,Chart1.Series[0]);
#include "MtxVecCpp.h"
#include "StatTools.hpp"
#include "Math387.hpp"
#include "MtxVec.hpp"
#include "MtxVecTee.hpp"
void FitModel(void); {
TMtxNonLinReg* nlr = new TMtxNonLinReg(NULL);
Chart1->FreeAllSeries();
Chart1.AddSeries(new TLineSeries(Chart1));
try {
nlr->X->LoadFromFile("xdata.vec");
nlr->Y->LoadFromFile("ydata.vec");
DrawValues(nlr->X,nlr->Y,Chart1->Series[0],0,1,false);
using Dew.Math;
using Dew.Math.Units;
using Dew.Stats.Units;
using Dew.Stats;
namespace Dew.Tests
{
private void FitModel() {
TMtxNonLinReg nlr = new TMtxNonLinReg(null);tChart1.Series.Clear();
tChart1.Series.Add(new Steema.TeeChart.Styles.Line(tChart1.Chart));
try {
nlr.X.LoadFromFile("xdata.vec");
nlr.Y.LoadFromFile("ydata.vec");
MtxVecTee.DrawValues(nlr.X, nlr.Y, tChart1[0],0,1,false);
The y=y(x) plot shows an exponentially decaying pattern in the data. This suggests that some type of exponential function might be an appropriate model for the data. As suggested, the Exp(-a*x)/(b+c*x) function is a good choice. To perform a non-linear fit, we'll have to define the regression function and starting values for regression routine. We will determine starting values based on a grid of the parameter values. As suggested, in this case te grid is 0.1 to 1.0 in increments of 0.1, where the best parameter values minimize residual sum of squares In this case we will also "vectorize" the regression function so that all y(x) values will be calculated in one go (speed gain for large sample sizes). Below is the implementation:
// Scalar version => simpler, but slower for multiple x
function RegFun(const b: Array of TSample; x: TSample): TSample; overload;
begin
Result := Exp(-b[0]*x)/(b[1]+x*b[2]);
end;
// Vector version => more complex, but faster for multiple x
procedure RegFun(B,X,Y: TVec); overload;
var wrk1:Vector;
begin
// wrk1 = Exp(-B[0]*x)
wrk1.Normalize(X,0.0,-1.0/B.Values[0]);
wrk1.Exp;
// 1/(B[1]+B[2]*x)
Y.Copy(X);
Y.Mul(B.Values[2]);
Y.Add(B.Values[1]);
Y.Inv;
// Y = Exp(-B[0]*x)/(B[1]+B[2]*x)
Y.Mul(wrk1);
end;
procedure InitialEstimates(X,Y: TVec; B: TVec);
var rss, rssmin: TSample;
i: Integer;
parval: TSample;
wrk1,Residuals,YCalc: Vector;
begin
wrk1.Size(b);
wrk1.SetZero;
b.Copy(wrk1);
rssmin := 1.0E+10;
// iterate for all parameters
for i := 0 to b.Length - 1 do
begin
// evaluate on interval [0,1], step = 0.1
parval := 0.0;
while (parval<=1.0) do
begin
wrk1.Values[i] := parval;
// evaluate function (vectorized version => faster)
RegFun(wrk1,X,YCalc);
// calculate RSS
Residuals.Sub(YCalc,Y);
rss := Residuals.SumOfSquares;
parval := parval + 0.1;
// minimal rss ?
if (rss<rssmin) then
begin
b.Copy(wrk1);
rssmin := rss;
end;
end;
end;
end;
// Scalar version => simpler, but slower for multiple x
void double RegFun(const double* b, double x);
{
return Math::Exp(-b[0]*x)/(b[1]+x*b[2]);
}
// Vector version => more complex, but faster for multiple x
void RegFun(Vector B,Vector X,Vector Y);
{
Vector wrk1;
// wrk1 = Exp(-B[0]*x)
wrk1->Normalize(X,0.0,-1.0/B->Values[0]);
wrk1->Exp();
// 1/(B[1]+B[2]*x)
Y->Copy(X);
Y->Mul(B->Values[2]);
Y->Add(B->Values[1]);
Y->Inv();
// Y = Exp(-B[0]*x)/(B[1]+B[2]*x)
Y->Mul(wrk1);
}
void InitialEstimates(Vector X,Vector* Y, Vector B);
{
Vector residuals, wrk2, ycalc;
wrk1->Size(b);
wrk1->SetZero();
b->Copy(wrk1);
double rssmin = 1.0e+10;
double parval,rss;
// iterate for all parameters
(for int i=0; iLength; i++)
{
// evaluate on interval [0,1], step = 0.1
parval = 0.0;
while (parval<=1.0)
{
wrk1->Values[i] = parval;
// evaluate function (vectorized version => faster)
RegFun(wrk1,X,ycalc);
// calculate RSS
residuals->Sub(ycalc,Y);
rss = Residuals->SumOfSquares();
parval += 0.1;
// minimal rss ?
if (rss<rssmin)
{
b->Copy(wrk1);
rssmin = rss;
}
}
}
}
// Scalar version
double RegFun(double[] b, double x)
{
return Math.Exp(-b[0]*x)/(b[1]+b[2]*x);
}
// Vectorized version
void RegFun(TVec B, TVec X, TVec Y)
{
Vector wrk1 = new Vector(0);
// wrk1 = Exp(-B[0]*x)
wrk1.Normalize(X,0.0,-1.0/B.Values[0]);
wrk1.Exp();
// 1/(B[1]+B[2]*x)
Y.Copy(X);
Y.Mul(B.Values[2]);
Y.Add(B.Values[1]);
Y.Inv();
// Y = Exp(-B[0]*x)/(B[1]+B[2]*x)
Y.Mul(wrk1);
}
void InitialEstimates(TVec X, TVec Y, TVec B)
{
Vector wrk1 = new Vector(0);
Vector residuals = new Vector(0);
Vector ycalc = new Vector(0);
wrk1.Size(B);
wrk1.SetZero();
B.Copy(wrk1);
double rssmin = 1.0E+10;
double rss, parval;
for (int i=0; i<B.Length; i++)
{
parval = 0.0;
while (parval<=1.0)
{
wrk1.Values[i] = parval;
// evaluate function (vectorized version => faster)
RegFun(wrk1,X,ycalc);
// calculate RSS
residuals.Sub(ycalc,Y);
rss = residuals.SumOfSquares();
parval += 0.1;
// minimal rss ?
if (rss<rssmin)
{
B.Copy(wrk1);
rssmin = rss;
}
}
}
}
3) Perform non-linear regression on data: Now that we have the regression function and initial estimates for regression coefficients, we can perform the actual non-linear regression.
procedure FitModel;
var nlr: TMtxNonLinReg;
residuals: Vector;
rss: TSample;
begin
nlr := TMtxNonLinReg.Create(nil);
try
// load data
nlr.X.LoadFromFile('x.vec');
nlr.Y.LoadFromFile('y.vec');
// Define and connect to regression function
nlr.B.Length := 3;
nlr.RegressFunction := RegFun;
// initial values for coefficients
InitialEstimates(nlr.X, nlr.Y,nlr.B);
// Perform non-linear regression
nlr.Recalc;
// Calculate residuals
residuals.Sub(nlr.YCalc,nlr.Y);
rss := residuals.SumOfSquares;
finally
nlr.Free;
end;
end;
void FitModel(void);
{
Vector residuals;
TMtxNonLinReg* nlr = new TMtxNonLinReg(NULL);
try
{
nlr->LoadFromFile("x.vec");
nlr->LoadFromFile("y.vec");
// connect to regression function
nlr->RegresFun = RegFun;
// initial estimates for coefficients
nlr->B->Length = 3;
InitialEstimates(nlr->X,nlr->Y,nlr->B);
// perform non-linear regression
nlr->Recalc();
residuals->Sub(nlr->YCalc,nlr->Y);
double rss = residuals->SumOfSquares();
}
__finally
{
nlr->Free();
}
}
private void FitModel()
{
TMtxNonLinReg nlr = new TMtxNonLinReg(this);
Vector residuals = new Vector(0);
// load data
nlr.X.LoadFromFile(@".\Data\x.vec");
nlr.Y.LoadFromFile(@".\Data\y.vec");
// connect to regression function
nlr.RegressFun += new Dew.Stats.TRegressFun(RegFun);
// initial values for coefficients
nlr.B.Length = 3;
InitialEstimates(nlr.X,nlr.Y,nlr.B);
// Perform non-linear regression
nlr.Recalc();
// Calculate residuals
residuals.Sub(nlr.YCalc,nlr.Y);
double rss = residuals.SumOfSquares();
}
Additional things to try :
- Try transforming the y values before applying the non-linear regression. Suggested transformations: sqrt(y), ln(y), 1.0/y. This can be done with a single line of code class: nlr.Y.Sqrt();
- Perform a weigted regression on the same data. Define weights as specified at NIST pages.
- Plot data, fitted data and residuals by using DrawValues routine.
- Plot normal probability plot of residuals.
- Plot histogram of residuals.
- Compare the results with quoted results at NIST pages.