## What is the Most Efficient Way to Compute the (euclidean) Distance of the Nearest Neighbor in a List of (x,y,z) points?

What is the most efficient way to compute the (Euclidean) distance to the nearest neighbor for each point in an array?

I have a list of 100k (X,Y,Z) points and I would like to compute a list of nearest neighbor distances. The index of the distance would correspond to the index of the point.

I've looked into PYOD and sklearn neighbors, but those seem to require "teaching". I think my problem is simpler than that. For each point: find nearest neighbor, compute distance.

Example data:

    points = [
        (0, 0, 1322.1695),
        (0.006711111, 0, 1322.1696),
        (0.026844444, 0, 1322.1697),
        (0.0604, 0, 1322.1649),
        (0.107377778, 0, 1322.1651),
        (0.167777778, 0, 1322.1634),
        (0.2416, 0, 1322.1629),
        (0.328844444, 0, 1322.1631),
        (0.429511111, 0, 1322.1627),
        ...
    ]

compute k = 1 nearest neighbor distances

result format:

results = [nearest neighbor distance]

example results:

    results = [0.005939372, 0.005939372, 0.017815632, 0.030118587, 0.041569616, 0.053475883, 0.065324964, 0.077200014, 0.089077602]

UPDATE:

I've implemented two of the approaches suggested.

- Method 1: use `scipy.spatial.distance.cdist` to compute the full distance matrix.
- Method 2: for every point, find the neighbors within a radius R, compute the distances to that subset only, and return the smallest.

Results are that Method 2 is faster than Method 1 but took a lot more effort to implement (makes sense).

It seems the limiting factor for Method 1 is the memory needed to run the full computation, especially when my data set is approaching 10^5 (x, y, z) points. For my data set of 23k points, it takes ~ 100 seconds to capture the minimum distances.
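The memory limit comes from the full n x n matrix: with 64-bit floats, 10^5 points would need 10^5 x 10^5 x 8 bytes, about 80 GB. A minimal sketch of one way around that, assuming you still want the exact brute-force answer, is to run `cdist` on blocks of rows and keep only the per-row minimum. The function name `nearest_distances_chunked` and the `chunk_size` parameter are made up for this illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist


def nearest_distances_chunked(points, chunk_size=1000):
    """Nearest-neighbor distance per point, using blocks of rows to cap memory."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    result = np.empty(n)
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        # Block of shape (stop - start, n): distances from this chunk to all points.
        block = cdist(points[start:stop], points)
        # A point is not its own neighbor: mask the self-distance entries.
        block[np.arange(stop - start), np.arange(start, stop)] = np.inf
        result[start:stop] = block.min(axis=1)
    return result


demo = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 10.0]])
print(nearest_distances_chunked(demo, chunk_size=2))
```

Peak memory is then roughly `chunk_size * n * 8` bytes instead of `n * n * 8`; the total work is still quadratic in the number of points.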

For Method 2, the run time scales as n_radius^2, that is, "neighbor radius squared", which really means that the algorithm scales roughly linearly with the number of included neighbors. Using a radius of ~5 (more than enough for my application), it took 5 seconds for the set of 23k points to produce a list of minimum distances in the same order as the point list itself. The difference between the "exact solution" and Method 2 is basically zero.
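For readers who want to try something like Method 2, here is a rough sketch of one possible radius-limited search. It is not the implementation timed above; it assumes a simple spatial hash that buckets points into cubes of side `radius`, and the name `nearest_within_radius` is hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def nearest_within_radius(points, radius):
    """For each point, distance to the nearest other point within `radius`
    (math.inf if there is none). Points are bucketed into cubes of side `radius`."""
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        cells[tuple(math.floor(c / radius) for c in p)].append(idx)

    offsets = list(product((-1, 0, 1), repeat=3))
    result = []
    for idx, p in enumerate(points):
        cx, cy, cz = (math.floor(c / radius) for c in p)
        best = math.inf
        # Any point within `radius` must lie in this cell or one of its 26 neighbors.
        for dx, dy, dz in offsets:
            for j in cells.get((cx + dx, cy + dy, cz + dz), ()):
                if j != idx:
                    d = math.dist(p, points[j])
                    if d <= radius:
                        best = min(best, d)
        result.append(best)
    return result


pts = [(0, 0, 0), (1, 0, 0), (0, 3, 0), (0, 0, 10)]
print(nearest_within_radius(pts, 5.0))
```

A point with no neighbor inside the radius comes back as `inf`, so the radius has to be chosen large enough for the data, as described above.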

Thanks for everyone's help!

Similar to Caleb's answer, but you could stop the iterative loop if you get a distance greater than some previous minimum distance (sorry - no code).

I used to program video games. Calculating the actual distance between two points took too much CPU. What we did was divide the "screen" into larger Cartesian squares and skip the actual distance calculation if the Delta-X or Delta-Y was "too far away". That's just subtraction, so maybe something like that could decide where the actual Euclidean distance calculation is needed (extend to n dimensions as needed)?

EDIT - expanding "too far away" candidate pair selection comments. For brevity, I'll assume a 2-D landscape. Take the point of interest (X0,Y0) and "draw" an nxn square around that point, with (X0,Y0) at the origin.

Go through the initial list of points and form a list of candidate points that are within that square. While doing that, if the DeltaX [ABS(Xi-X0)] is outside of the square, there is no need to calculate the DeltaY.

If there are no candidate points, make the square larger and iterate.

If there is exactly one candidate point and it is within the radius of the circle inscribed in the square, that is your minimum.

If there are "too many" candidates, make the square smaller, *but you only need to reexamine the candidate list from this iteration, not all the points.*

If there are not "too many" candidates, then calculate the distance for that list. When doing so, first calculate DeltaX^2 + DeltaY^2 for the first candidate. If for subsequent candidates the DeltaX^2 is greater than the minimum so far, there is no need to calculate the DeltaY^2.

The minimum from that calculation is the minimum if it is within the radius of the circle inscribed by the square.

If not, you need to go back to a previous candidate list that includes the points within the circle whose radius is that minimum. For example, if you ended with one candidate in a 2x2 square that happened to be on the vertex X=1, Y=1, the distance/radius would be SQRT(2). So go back to a previous candidate list that has a square with a side greater than or equal to 2xSQRT(2).

If warranted, generate a new candidate list that only includes points within the +/- SQRT(2) square. Calculate the distance for those candidate points as described above, omitting any that exceed the minimum calculated so far.

No need to do the square root of the sum of the Delta^2 until you have only one candidate.
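To make the squared-distance bookkeeping concrete, here is a minimal 2-D sketch. The function name `nearest_distance` and the tuple-based inputs are assumptions for illustration only.

```python
import math


def nearest_distance(target, candidates):
    """Distance from `target` to the closest point in `candidates` (2-D tuples)."""
    x0, y0 = target
    best_sq = math.inf
    for x, y in candidates:
        dx_sq = (x - x0) ** 2
        if dx_sq >= best_sq:
            continue  # DeltaX^2 alone already rules this candidate out
        d_sq = dx_sq + (y - y0) ** 2
        if d_sq < best_sq:
            best_sq = d_sq
    return math.sqrt(best_sq)  # single square root, taken at the very end


print(nearest_distance((0, 0), [(3, 4), (10, 0), (1, 1)]))
```

In this call the second candidate is rejected on DeltaX^2 = 100 alone, because the running minimum is already 25.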

How to size the initial square, or if it should be a rectangle, and how to increase or decrease the size of the square/rectangle could be influenced by application knowledge of the data distribution.

I would consider recursive algorithms for some of this if the language you are using supports that.
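Putting the steps above together, a simplified, iterative sketch of the square search might look like the following. It is an assumption-laden illustration, not tuned code: it only grows the square (the "too many candidates, shrink the square" step is left out), it doubles the half-width when the square is empty, and the names `nearest_by_expanding_square` and `half_width` are invented here.

```python
import math


def nearest_by_expanding_square(points, i, half_width=1.0):
    """Nearest-neighbor distance for points[i] in 2-D using a growing square."""
    if half_width <= 0:
        raise ValueError("half_width must be positive")
    x0, y0 = points[i]
    others = [p for j, p in enumerate(points) if j != i]
    if not others:
        return math.inf
    h = half_width
    while True:
        # Cheap rejection: DeltaY is only tested when DeltaX is inside the square.
        candidates = [(x, y) for x, y in others
                      if abs(x - x0) <= h and abs(y - y0) <= h]
        if not candidates:
            h *= 2  # empty square: enlarge it and try again
            continue
        best = math.sqrt(min((x - x0) ** 2 + (y - y0) ** 2 for x, y in candidates))
        if best <= h:
            return best  # inside the inscribed circle, so nothing outside is closer
        h = best  # corner case: re-check the square that contains that circle


pts = [(0, 0), (1, 1), (5, 0), (0, -1.2)]
print(nearest_by_expanding_square(pts, 0, half_width=1.0))
```

The example reproduces the corner case described above: the first square only catches (1, 1) at distance SQRT(2), which is outside the inscribed circle, so the square is widened to +/- SQRT(2) and the closer point (0, -1.2) is found.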


How about this?

    from scipy.spatial import distance

    A = (0.003467119, 0.01422762, 0.0101960126)
    B = (0.007279433, 0.01651597, 0.0045558849)
    C = (0.005392258, 0.02149997, 0.0177409387)
    D = (0.017898802, 0.02790659, 0.0006487222)
    E = (0.013564214, 0.01835688, 0.0008102952)
    F = (0.013375397, 0.02210725, 0.0286032185)

    points = [A, B, C, D, E, F]

    results = []
    for point in points:
        # Distance from `point` to every other point
        distances = [{'point': point, 'neighbor': p, 'd': distance.euclidean(point, p)}
                     for p in points if p != point]
        # Keep the entry with the smallest distance
        results.append(min(distances, key=lambda k: k['d']))

results will be a list of objects, like this:

results = [ {'point':(x1, y1, z1), 'neighbor':(x2, y2, z2), 'd':"distance from point to neighbor"}, ...]

Where `point` is the reference point and `neighbor` is that point's closest neighbor.


The fastest option available to you may be `scipy.spatial.distance.cdist`, which finds the pairwise distances between all of the points in its inputs. While finding all of those distances may not be the fastest algorithm for finding the nearest neighbors, `cdist` is implemented in C, so it will likely run faster than anything you try in pure Python.

    import numpy as np
    from scipy.spatial.distance import cdist

    points = np.array(...)  # your (n, 3) array of points

    # cdist takes two collections; pass the same array twice for all pairs
    distances = cdist(points, points)

    # An element is not its own nearest neighbor
    np.fill_diagonal(distances, np.inf)

    # Find the index of each element's nearest neighbor
    mins = distances.argmin(0)

    # Extract the nearest neighbors from the data by row indexing
    nearest_neighbors = points[mins, :]

    # Put the arrays in the specified shape
    results = np.stack((points, nearest_neighbors), 1)

You could theoretically make this run faster (mostly by combining all of the steps into one algorithm), but unless you're writing in C, you won't be able to compete with SciPy/NumPy.

(`cdist` runs in Θ(n²) time (if the size of each point is fixed), and no other step of the algorithm is slower than that, so even if you did try to optimize the code in Python, you wouldn't notice the change for small amounts of data, and the improvements would be overshadowed by `cdist` for more data.)
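If the quadratic cost becomes the bottleneck, a different technique from the one in this answer is a k-d tree, which SciPy provides as `scipy.spatial.cKDTree`. The following is only a sketch on made-up demo data. It assumes there are no duplicate points, since each point is found as its own closest hit and the real neighbor is read from the second column.

```python
import numpy as np
from scipy.spatial import cKDTree

# Small made-up data set standing in for the real (n, 3) array.
points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 10.0]])

tree = cKDTree(points)
# k=2 because the closest hit for each query point is the point itself (distance 0).
distances, indices = tree.query(points, k=2)
nearest_distance = distances[:, 1]  # distance to the nearest other point
nearest_index = indices[:, 1]       # index of that point in `points`
print(nearest_distance)
```

Building the tree needs no labels or training step, and the full n x n matrix is never stored.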
