## left_join based on closest LAT_LON in R

I am trying to find the ID of the closest LAT_LON in a data.frame with reference to my original data.frame. I have already figured this out by merging both data.frames on a unique identifier and then calculating the distance with the `distHaversine` function from `geosphere`. Now I want to take this a step further: join the data.frames without the unique identifier and find the ID of the nearest LAT-LON.
I used the following code after merging:

`v3 <-v2 %>% mutate(CTD = distHaversine(cbind(LON.x, LAT.x), cbind(LON.y, LAT.y)))`

DATA:

    loc <- data.frame(station = c('Baker Street','Bank'),
                      lat = c(51.522236,51.5134047),
                      lng = c(-0.157080, -0.08905843),
                      postcode = c('NW1','EC3V'))

    stop <- data.frame(station = c('Angel','Barbican','Barons Court','Bayswater'),
                       lat = c(51.53253,51.520865,51.490281,51.51224),
                       lng = c(-0.10579,-0.097758,-0.214340,-0.187569),
                       postcode = c('EC1V','EC1A', 'W14', 'W2'))

As a final result I would like something like this:

    df <- data.frame(loc = c('Baker Street','Bank','Baker Street','Bank','Baker Street','Bank','Baker Street','Bank'),
                     stop = c('Angel','Barbican','Barons Court','Bayswater','Angel','Barbican','Barons Court','Bayswater'),
                     dist = c('x','x','x','x','x','x','x','x'),
                     lat = c(51.53253,51.520865,51.490281,51.51224,51.53253,51.520865,51.490281,51.51224),
                     lng = c(-0.10579,-0.097758,-0.214340,-0.187569,-0.10579,-0.097758,-0.214340,-0.187569),
                     postcode = c('EC1V','EC1A', 'W14', 'W2','EC1V','EC1A', 'W14', 'W2'))

Any help is appreciated. Thanks.

As the distances between the objects are small, we can speed up the computation by using the Euclidean distance between the coordinates. As we are not near the equator, a degree of longitude covers less ground than a degree of latitude; we can make the comparison somewhat better by scaling the lng values by the cosine of the latitude.

    # Scale longitude by cos(latitude) so both axes are in comparable units;
    # one shared factor keeps both data sets on the same scale
    lng_scale <- cos(mean(c(stop$lat, loc$lat), na.rm = TRUE) / 180 * pi)
    cor_stop <- stop[, c("lat", "lng")]
    cor_stop$lng <- cor_stop$lng * lng_scale
    cor_loc <- loc[, c("lat", "lng")]
    cor_loc$lng <- cor_loc$lng * lng_scale

We can then calculate the closest stop for each location using the `FNN` package, which uses a tree-based search to quickly find the closest K neighbours. This should scale to big data sets (I have used this for datasets with millions of records):

    library(FNN)
    matches <- knnx.index(cor_stop, cor_loc, k = 1)
    matches

    ##      [,1]
    ## [1,]    4
    ## [2,]    2

We can then construct the end result:

    res <- loc
    res$stop_station  <- stop$station[matches[,1]]
    res$stop_lat      <- stop$lat[matches[,1]]
    res$stop_lng      <- stop$lng[matches[,1]]
    res$stop_postcode <- stop$postcode[matches[,1]]

And calculate the actual distance:

    library(geosphere)
    # distHaversine expects (longitude, latitude) column order
    res$dist <- distHaversine(res[, c("lng", "lat")], res[, c("stop_lng", "stop_lat")])
    res

    ##        station      lat         lng postcode stop_station stop_lat  stop_lng
    ## 1 Baker Street 51.52224 -0.15708000      NW1    Bayswater 51.51224 -0.187569
    ## 2         Bank 51.51340 -0.08905843     EC3V     Barbican 51.52087 -0.097758
    ##   stop_postcode     dist
    ## 1            W2 2387.231
    ## 2          EC1A 1026.091

If you are unsure whether the closest point in lat-long is also the closest point 'as the crow flies', you could use this method to first select the K closest points in lat-long, then calculate the actual distances for those points only, and finally select the closest one.
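The two-step idea above can be sketched in base R as follows. This is only an illustration under some assumptions: `haversine_m` is a hand-written helper (not part of `geosphere`), `K = 3` is an arbitrary choice, and the candidate search is done by brute force so the snippet has no package dependencies. On large data the candidate step would be replaced by `knnx.index(..., k = K)` from `FNN`.

```r
# Great-circle (Haversine) distance in metres; inputs in decimal degrees.
# Hypothetical helper; r defaults to the radius geosphere::distHaversine uses.
haversine_m <- function(lat1, lng1, lat2, lng2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlng <- (lng2 - lng1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlng / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

loc <- data.frame(station = c("Baker Street", "Bank"),
                  lat = c(51.522236, 51.5134047),
                  lng = c(-0.157080, -0.08905843))
stop <- data.frame(station = c("Angel", "Barbican", "Barons Court", "Bayswater"),
                   lat = c(51.53253, 51.520865, 51.490281, 51.51224),
                   lng = c(-0.10579, -0.097758, -0.214340, -0.187569))

K <- 3  # number of candidates kept per location (arbitrary for this sketch)

nearest <- vapply(seq_len(nrow(loc)), function(i) {
  # Step 1: rank stops by squared distance in plain degrees (the cheap metric)
  d_deg <- (stop$lat - loc$lat[i])^2 + (stop$lng - loc$lng[i])^2
  cand <- order(d_deg)[seq_len(min(K, nrow(stop)))]
  # Step 2: exact distance for the K candidates only; keep the smallest
  d_m <- haversine_m(loc$lat[i], loc$lng[i], stop$lat[cand], stop$lng[cand])
  cand[which.min(d_m)]
}, integer(1))

res <- data.frame(loc = loc$station, stop = stop$station[nearest])
res$dist <- haversine_m(loc$lat, loc$lng, stop$lat[nearest], stop$lng[nearest])
res
```

Only K exact distances are computed per location, so the expensive formula is never evaluated over the full M*N grid.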

All of the joining, distance calculations, and plotting can be done with available R packages.

    library(tidyverse)
    library(sf)
    #> Linking to GEOS 3.6.2, GDAL 2.2.3, PROJ 4.9.3
    library(nngeo)
    library(mapview)

    ## Original data
    loc <- data.frame(station = c('Baker Street','Bank'),
                      lat = c(51.522236,51.5134047),
                      lng = c(-0.157080, -0.08905843),
                      postcode = c('NW1','EC3V'))

    stop <- data.frame(station = c('Angel','Barbican','Barons Court','Bayswater'),
                       lat = c(51.53253,51.520865,51.490281,51.51224),
                       lng = c(-0.10579,-0.097758,-0.214340,-0.187569),
                       postcode = c('EC1V','EC1A', 'W14', 'W2'))

    df <- data.frame(loc = c('Baker Street','Bank','Baker Street','Bank','Baker Street','Bank','Baker Street','Bank'),
                     stop = c('Angel','Barbican','Barons Court','Bayswater','Angel','Barbican','Barons Court','Bayswater'),
                     dist = c('x','x','x','x','x','x','x','x'),
                     lat = c(51.53253,51.520865,51.490281,51.51224,51.53253,51.520865,51.490281,51.51224),
                     lng = c(-0.10579,-0.097758,-0.214340,-0.187569,-0.10579,-0.097758,-0.214340,-0.187569),
                     postcode = c('EC1V','EC1A', 'W14', 'W2','EC1V','EC1A', 'W14', 'W2'))

    ## Create sf objects from lat/lon points
    loc_sf <- loc %>%
      st_as_sf(coords = c('lng', 'lat'), remove = TRUE) %>%
      st_set_crs(4326)

    stop_sf <- stop %>%
      st_as_sf(coords = c('lng', 'lat'), remove = TRUE) %>%
      st_set_crs(4326)

    # Use st_nearest_feature to cbind loc to stop by nearest points
    joined_sf <- stop_sf %>%
      cbind(loc_sf[st_nearest_feature(stop_sf, loc_sf), ])

    ## mutate to add column showing distance between geometries
    joined_sf %>%
      mutate(dist = st_distance(geometry, geometry.1, by_element = TRUE))
    #> Simple feature collection with 4 features and 5 fields
    #> Active geometry column: geometry
    #> geometry type:  POINT
    #> dimension:      XY
    #> bbox:           xmin: -0.21434 ymin: 51.49028 xmax: -0.097758 ymax: 51.53253
    #> epsg (SRID):    4326
    #> proj4string:    +proj=longlat +datum=WGS84 +no_defs
    #>        station postcode    station.1 postcode.1                   geometry
    #> 1        Angel     EC1V         Bank       EC3V  POINT (-0.10579 51.53253)
    #> 2     Barbican     EC1A         Bank       EC3V POINT (-0.097758 51.52087)
    #> 3 Barons Court      W14 Baker Street        NW1  POINT (-0.21434 51.49028)
    #> 4    Bayswater       W2 Baker Street        NW1 POINT (-0.187569 51.51224)
    #>                    geometry.1         dist
    #> 1 POINT (-0.08905843 51.5134) 2424.102 [m]
    #> 2 POINT (-0.08905843 51.5134) 1026.449 [m]
    #> 3   POINT (-0.15708 51.52224) 5333.417 [m]
    #> 4   POINT (-0.15708 51.52224) 2390.791 [m]

    ## Use nngeo and mapview to plot lines on a map
    # NOT run for reprex, output image attached
    # connected <- st_connect(stop_sf, loc_sf)
    # mapview(connected) +
    #   mapview(loc_sf, color = 'red') +
    #   mapview(stop_sf, color = 'black')

Created on 2020-01-21 by the reprex package (v0.3.0)

You can avoid searching for nearest neighbours completely if you *are* able to use a projected coordinate system. If you can, then you can cheaply construct Voronoi polygons around each location - these polygons define areas that are closest to each of the input points.

You can then just use GIS intersections to find which points lie in which polygons and then calculate the distances for known pairs of closest points. I think this should be much faster. However, you can't use Voronoi polygons with geographic coordinates.

    library(sf)

    loc <- data.frame(station = c('Baker Street','Bank'),
                      lat = c(51.522236,51.5134047),
                      lng = c(-0.157080, -0.08905843),
                      postcode = c('NW1','EC3V'))

    stop <- data.frame(station = c('Angel','Barbican','Barons Court','Bayswater'),
                       lat = c(51.53253,51.520865,51.490281,51.51224),
                       lng = c(-0.10579,-0.097758,-0.214340,-0.187569),
                       postcode = c('EC1V','EC1A', 'W14', 'W2'))

    # Convert to a suitable PCS (in this case OSGB)
    stop <- st_as_sf(stop, coords=c('lng','lat'), crs=4326)
    stop <- st_transform(stop, crs=27700)
    loc <- st_as_sf(loc, coords=c('lng','lat'), crs=4326)
    loc <- st_transform(loc, crs=27700)

    # Extract Voronoi polygons around locations and convert to an sf object
    loc_voronoi <- st_collection_extract(st_voronoi(do.call(c, st_geometry(loc))))
    loc_voronoi <- st_sf(loc_voronoi, crs=st_crs(loc))  # st_crs() is the sf accessor

    # Match Voronoi polygons to locations and select that geometry
    loc$voronoi <- loc_voronoi$loc_voronoi[unlist(st_intersects(loc, loc_voronoi))]
    st_geometry(loc) <- 'voronoi'

    # Find which location is closest to each stop
    stop$loc <- loc$station[unlist(st_intersects(stop, loc))]

    # Reset locs to use the point geometry and get distances;
    # match() looks rows up by station name rather than relying on factor codes
    st_geometry(loc) <- 'geometry'
    stop$loc_dist <- st_distance(stop, loc[match(stop$loc, loc$station),], by_element=TRUE)

That gives you the following output:

    Simple feature collection with 4 features and 4 fields
    geometry type:  POINT
    dimension:      XY
    bbox:           xmin: 524069.7 ymin: 178326.3 xmax: 532074.6 ymax: 183213.9
    epsg (SRID):    27700
    proj4string:    +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +towgs84=446.448,-125.157,542.06,0.15,0.247,0.842,-20.489 +units=m +no_defs
           station postcode                  geometry          loc     loc_dist
    1        Angel     EC1V POINT (531483.8 183213.9)         Bank 2423.722 [m]
    2     Barbican     EC1A POINT (532074.6 181931.2)         Bank 1026.289 [m]
    3 Barons Court      W14 POINT (524069.7 178326.3) Baker Street 5332.478 [m]
    4    Bayswater       W2 POINT (525867.7 180813.9) Baker Street 2390.377 [m]

##### Comments

- This might be helpful stackoverflow.com/questions/21977720/…
- @RonakShah, it does not solve the question as my dataset is too large. It keeps computing for a long time.
- Here is another potential option: stackoverflow.com/questions/58831578/…. This is an M*N problem: as either data frame grows, it just takes longer. To improve performance, reduce the size of the problem, either by using a divide-and-conquer algorithm or by reducing the precision of the starting locations from five decimal places down to three. If you round the starting locations, you may find a large number of duplicates and thus save the time of recalculating.
- Thanks for that @Dave2e. I cannot reduce the precision as I am dealing with objects very close to each other. I can reduce the size of the problem. Does `distmatrix` calculate `Haversine distance` by default? Thanks
- I believe it uses the `distGeo` method, which assumes an ellipsoid and not a sphere.
- Thanks, @Jan van der Laan, will check this today. A bit occupied with something else. Thanks
- @Jan van der Laan, any idea why our distance calculations are so far apart? ~600m off for Baker to Bayswater.
- @mrhellmann I switched around the long and lat in the distHaversine call. I'll correct it once I get to a computer.
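
The deduplication idea from the comments (search only once per distinct coordinate pair, then map the result back to every row) might look roughly like this. It is a sketch under assumptions: the repeated rows in `loc` are made up for illustration, no rounding is applied (add `round()` before building `key` only if precision can be spared), and the nearest-stop search is brute force.

```r
# Hypothetical input with repeated coordinates; the duplicates are invented,
# the coordinates themselves come from the question
loc <- data.frame(id  = 1:5,
                  lat = c(51.522236, 51.5134047, 51.522236, 51.5134047, 51.522236),
                  lng = c(-0.157080, -0.08905843, -0.157080, -0.08905843, -0.157080))
stop <- data.frame(station = c("Angel", "Barbican", "Barons Court", "Bayswater"),
                   lat = c(51.53253, 51.520865, 51.490281, 51.51224),
                   lng = c(-0.10579, -0.097758, -0.214340, -0.187569))

# Collapse to distinct coordinate pairs
key <- paste(loc$lat, loc$lng)
first <- !duplicated(key)
uniq <- loc[first, c("lat", "lng")]

# Nearest stop per distinct pair, with longitude scaled by cos(latitude)
lng_scale <- cos(mean(stop$lat) * pi / 180)
uniq$nearest <- vapply(seq_len(nrow(uniq)), function(i) {
  which.min((stop$lat - uniq$lat[i])^2 +
            ((stop$lng - uniq$lng[i]) * lng_scale)^2)
}, integer(1))

# Map the result back onto every original row
loc$stop <- stop$station[uniq$nearest[match(key, key[first])]]
loc
```

Here the search runs for two distinct locations instead of five rows; the saving grows with the share of repeated coordinates.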