Stats Master VCL
Regress.StepwiseRegression Function

Stepwise regression is an optimization algorithm that aims to improve the quality of a multiple linear regression by excluding noisy variables.

Pascal
procedure StepwiseRegression(const aList: TVecList; const stdDevA: TVec; const sMethod: TStepwiseMethod; const VariableMask: TVecInt; const reportSSE: TMtx; const reportCoeff: TMtx; const MaxIter: integer = 1000; const InitMask: boolean = true; const CriteriaFun: TStepwiseQualityCriteria = nil; const CriteriaOwner: TObject = nil);
Parameters 
Description 
aList 
Contains all independent variables and the dependent variable as the last item in the list. 
stdDevA 
Holds standard deviation of all aList items on input. 
sMethod 
Specifies the stepwise regression method. 
VariableMask 
Length must be equal to number of independent variables (aList.Count-1). This vector needs to be allocated in "bit" mode: VariableMask.BitCount := NumberOfIndependentVars;  
reportSSE 
Matrix size of IterCount x (VarCount + 7). Each row starts with Step number followed by selection list of variables in columns followed by Standard Error (quality criteria). We strive to reduce standard error and the model with the smallest standard error is considered best. Additional columns are as follows:
  • [0] Standard error = sqrt(SSE/dFE), or custom quality criteria
  • [1] SSE = Residual sum of squares
  • [2] SSR = Regression sum of squares.
  • [3] SST = Total sum of squares = SSE + SSR
  • [4] R2 = Coefficient of determination
  • [5] Adjusted R2 = Adjusted coefficient of determination
  • [6] MSE = Residual variance
 
reportCoeff 
Matrix size of (IterCount*VarCount) x 5. Each iteration adds the independent variable count rows. The columns are as follows:
  • [0] Iteration Step
  • [1] Variable index
  • [2] variable selection where 0 means excluded and 1 means included.
  • [3] Normalized coefficients
  • [4] Corresponding t-values for each coefficient
  • [5] Two-tailed p-values. A larger p-value suggests that the model would likely be better if the variable were excluded.
 
MaxIter 
Limits the maximum number of iterations. The function will raise an exception if this limit is reached. 
InitMask 
If True, VariableMask is initialized to all variables excluded for forward search and to all variables included for backward search. If False, the search can start with variables preselected in VariableMask.  
CriteriaFun 
Optional extra callback function to use quality criteria other than the default "Standard Error" 
CriteriaOwner 
An optional object parameter to be passed to the CriteriaFun 

An optimal result is guaranteed only when using the "exhaustive" search method, which checks all possibilities. After the final variable selection has been obtained, run MulLinRegress followed by RegressTest if detailed statistics are required. 

There are many methods to solve this problem. This function implements four approaches: exhaustive, forward, backward and stepwise. For models with fewer than 15 variables, the exhaustive search is the recommended method. Alternatively, it is possible to perform a "backward search" by starting with all variables and removing them one at a time, or a "forward search" by starting with none and adding them one at a time. Both backward and forward search can have selected variables already pre-included (or pre-excluded). Single step mode allows the user to manually include or exclude individual variables from the model after each step. 
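
To illustrate why the exhaustive search is practical only for small models, the following stand-alone sketch enumerates every include/exclude combination for 5 variables as a bit mask. It does not call the library and is not how StepwiseRegression is implemented; the program name and the constant VarCount are hypothetical and chosen only for this illustration.

```pascal
program ExhaustiveMaskSketch;
{ Illustration only: lists all include/exclude masks for VarCount variables. }
{ Under Delphi, add the APPTYPE CONSOLE directive to build a console app.   }
const
  VarCount = 5;  { hypothetical number of independent variables }
var
  mask, bit: LongInt;
  line: string;
begin
  { Each value of mask encodes one candidate model: bit i set = variable i included }
  for mask := 0 to (1 shl VarCount) - 1 do
  begin
    line := '';
    for bit := 0 to VarCount - 1 do
      if (mask and (1 shl bit)) <> 0 then
        line := line + '1'
      else
        line := line + '0';
    WriteLn(line);
  end;
  { The number of candidate models doubles with every added variable }
  WriteLn('Candidate models: ', 1 shl VarCount);
end.
```

With 5 variables there are 32 candidate models; with 15 variables the same loop would visit 32768 masks, each of which would require its own regression fit in an exhaustive search.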

To use a quality criterion other than the default "Standard Error", pass an extra callback through CriteriaFun. Its return value determines whether a result is better or worse; a smaller value is considered better.
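
For reference, the quantities reported in the reportSSE columns relate to each other as sketched below under the usual textbook definitions. The symbols n (number of observations) and k (number of included independent variables), and the assumption that dFE = n - k - 1 for a model with an intercept, are not stated in this topic; the library may use a different convention.

```latex
\begin{aligned}
\text{Standard error} &= \sqrt{\frac{SSE}{dFE}}, \qquad dFE = n - k - 1 \quad \text{(assumed)} \\
SST &= SSE + SSR \\
R^2 &= \frac{SSR}{SST} = 1 - \frac{SSE}{SST} \\
R^2_{\text{adj}} &= 1 - \left(1 - R^2\right)\frac{n - 1}{n - k - 1} \\
MSE &= \frac{SSE}{dFE} = (\text{Standard error})^2
\end{aligned}
```

Because the default criterion divides SSE by the residual degrees of freedom, dropping a variable raises dFE by one and can lower the standard error even when SSE itself increases slightly.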

We are looking for the best fit of:  

b0 + b1*x1 + b2*x2 + b3*x3 + b4*x4 + b5*x5 = y  

where each of b1..b5 can be either zero or nonzero, and we would like to know which are best set to zero. The last column of aSrc in the code example below is the dependent variable (y). 

 

Uses MtxExpr, Regress, StatTools, Math387;
procedure TForm78.RunButtonClick(Sender: TObject);
var aSrc, reportCoeff, reportSSE: Matrix;
    stdDevA: Vector;
    aList: TVecList;
    i: integer;
    bi: VectorInt;
    sMethod: TStepwiseMethod;
begin
    Memo.Lines.BeginUpdate;
    Memo.Lines.Clear;

    aList := TVecList.Create;
    try
        aSrc.SetIt(15,6,false, [83, 34, 65, 63, 64, 106,
                                73, 19, 73, 48, 82, 92,
                                54, 81, 82, 65, 73, 102,
                                96, 72, 91, 88, 94, 121,
                                84, 53, 72, 68, 82, 102,
                                86, 72, 63, 79, 57, 105,
                                76, 62, 64, 69, 64, 97,
                                54, 49, 43, 52, 84, 92,
                                37, 43, 92, 39, 72, 94,
                                42, 54, 96, 48, 83, 112,
                                71, 63, 52, 69, 42, 130,
                                63, 74, 74, 71, 91, 115,
                                69, 81, 82, 75, 54, 98,
                                81, 89, 64, 85, 62, 96,
                                50, 75, 72, 64, 45, 103]);

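        // Split aSrc into column vectors; the last one is the dependent variable (y)
        aList.DecomposeColumnMatrix(aSrc);
        // Standard deviation of every list item, as required on input
        stdDevA.Size(aList.Count);
        for i := 0 to aList.Count-1 do stdDevA[i] := aList[i].StdDev;
        // One mask bit per independent variable
        bi.BitCount := aList.Count-1;
        sMethod := swBackward;

        StepwiseRegression(aList, stdDevA, sMethod, bi, reportSSE, reportCoeff);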

        Memo.Lines.Add('');
        reportSSE.ValuesToStrings(Memo.Lines, '', ftaRightAlign, '0.###', '0.###', true);
        Memo.Lines.Add('');
        reportCoeff.ValuesToStrings(Memo.Lines, '', ftaRightAlign, '0.###', '0.###', true);
        Memo.Lines.Add('');
    finally
        Memo.Lines.EndUpdate;
        aList.Free;
    end;
end;
Copyright (c) 1999-2025 by Dew Research. All rights reserved.