You are here: Symbol Reference > MtxForLoop Namespace > Functions > MtxForLoop.ClusteredKNN Function
MtxVec VCL
ContentsIndex
PreviousUpNext
MtxForLoop.ClusteredKNN Function

Exact brute-force K-NN with support for symmetric and asymetric multi-threading.

Pascal
procedure ClusteredKNN(computeThreads: TMtxForLoop; const srcData: TMtx; const srcCompareData: TMtx; const K: integer; const dstNearIdx: TMtxInt; const dstNearDistance: TMtx); overload;
Parameters 
Description 
computeThreads 
A TMtxFoorLoop object, which is a threading library. ComputeThreads.BlockGranularity is required to be set to 4 for this algorithm. Additionally this object allows the user to specify thread-count and thread-affinity etc..  
srcData 
A matrix where each row is one vector of the original data.  
srcCompareData 
A matrix where each row is one vector. We want to find K vectors from srcData for each srcCompareData vector.  
The number of nearest neighbours/rows from SrcData to find for each row in SrcCompareData.  
dstNearIdx 
Contains K indexes of rows from srcData for each row in SrcCompareData on output. The actual distances are stored in dstNearDistance parameter.  
dstNearDistance 
Contains K distance of rows from srcData for each row in SrcCompareData on output. The distances are not sorted.  

K-nearest-neighbours will be computing the square of L2-Norm (euclidian norm or the distance) between rows in srcData and rows in srcCompareData to find K nearest rows from srcData for each row in the srcCompareData matrix. The SrcData.Cols and SrcCompareData.Cols need to match in size or an exception will be raised. The sizes of dstNearIdx and dstNearDistance are set automatically. 

Performance notes:

  • The ideal ComputeThreads.ThreadCount passed to the routine will equal the number of hyper-threaded cores.
  • The algorithm assumes existence of 32KB L1 cache. This is true for all Intel CPUs since Sandy Bridge and AMD Zen. The performance will drop by roughly 10x, if not present.
  • MtxVec.Controller.ThreadDimension is to be set to value larger than ComputeThreads.ThreadCount or an error will be raised.
  • computeThreads.BlockGranularity needs to be 4 or an error will be raised.
  • Multi-threading is performed only across the srcCompareData rows. If that matrix has less than 10-20 rows, the user can still perform multi-threading in his own code. The maximum performance gain in this case however will scale with thread count only up to the systems memory channel count (2-8).
  • The algorithm is set to scale linearly with core count (32, 64, 128 etc...), provided that srcCompareData.Rows is large enough.
  • For a 950 000 x 950 000 problem with, K = 50, Vector length = 30, 32bit float, on 8 SkyLake cores, the expected time to finish is 90s.
  • The speed-up over naive and not threaded KNN implementation starts at around 10x for small problems in grows to 1000x and more for very large problems.
  • Double precision is about 1.5x slower than single precision.
uses MtxExpr, MtxVec, MtxExprInt, MtxForLoop; var Data, cmpData, nearDistance: Matrix; nearIdx: MatrixInt; threads: TMtxForLoop; begin // 1.) Create once ahead of time. This is the thread pool and is slow to create. threads := TMtxForLoop.Create; threads.ThreadCount := Controller.CpuCores*2; //hyperthreaded core count threads.BlockGranularity := 4; //required // 2.) This sets the "minimum" required. Also slow and needs to be set upfront and // it can only be done before any allocations are made by MtxVec. Controller.ThreadDimension := Max(threads.ThreadCount + 1, Controller.ThreadDimension); // 3.) Compiler assertions switch needs to be "off", when compling the source code data := [[1,2,3], [2,3,1], [3,2,1]]; cmpData := [[2,2,3], [2,3,4]]; ClusteredKNN(threads, Data, cmpData, 2, nearIdx, nearDistance); // Results: // nearIdx = [[0, 1], // data rows 0 and 1 are closest vectors to cmpData row 0. // [0, 1]] // data rows 0 and 1 are closest vectors to cmpData row 1. // nearDistance = [[1, 5], // [0,0] = sqr(2-1) + sqr(2-2) + sqr(3-3), distance from cmpData row 0 to Data row 0 is 1 // [3, 9]] // [1,0] = sqr(2-1) + sqr(3-2) + sqr(4-3), distance from cmpData row 1 to Data row 0 is 3 threads.Free;
Examples on GitHub
Copyright (c) 1999-2025 by Dew Research. All rights reserved.
What do you think about this topic? Send feedback!