R - Getting row numbers of pairs of coordinates in a dataframe of pairs of coordinates

Suppose I have a dataframe called "edges" of pairs of points, say:

  x0       y0       x1       y1
1 2.464286 2.464286 2.583333 1.750000
2 0.700000 3.787500 2.464286 2.464286
3 2.464286 2.464286 3.500000 3.500000
4 3.500000 3.500000 4.300000 3.900000
5 2.250000 4.750000 3.500000 3.500000

Each row of the dataframe is an edge going from the point (x0,y0) to the point (x1,y1); e.g. my first edge goes from the point with coordinates (2.464286,2.464286) to the point (2.583333,1.750000).

From that dataframe, I can easily extract another dataframe, call it "vertices", in which each point appears only once:

  x        y
1 2.464286 2.464286
2 0.700000 3.787500
3 3.500000 3.500000
4 2.250000 4.750000
5 2.583333 1.750000
6 4.300000 3.900000

How can I label each point in "vertices" with the numbers of the rows of "edges" in which it appears, whether as the left endpoint or the right endpoint? That is, I would like to get something like this:

  x        y            occurrences
1 2.464286 2.464286     1,2,3
2 0.700000 3.787500     2
3 3.500000 3.500000     3,4,5
4 2.250000 4.750000     5
5 2.583333 1.750000     1
6 4.300000 3.900000     4

I've tried to use %in% and which, but they only compare elementwise, so two points that share just an x-coordinate or just a y-coordinate could be treated as the same point.

Also, I'll have to do this labelling many times in my simulations, so I'm hoping for something better than a solution based on for loops and if statements.

Here is a solution that uses dplyr. There may be a way to clean this up but this should get you most of the way there.

library(dplyr)

edgedf <- read.table(header = TRUE,stringsAsFactors = FALSE, text = "
x0       y0       x1       y1
2.464286 2.464286 2.583333 1.750000
0.700000 3.787500 2.464286 2.464286
2.464286 2.464286 3.500000 3.500000
3.500000 3.500000 4.300000 3.900000
2.250000 4.750000 3.500000 3.500000")


vertdf <- read.table(header = TRUE,stringsAsFactors = FALSE, text = "
x        y
2.464286 2.464286
0.700000 3.787500
3.500000 3.500000
2.250000 4.750000
2.583333 1.750000
4.300000 3.900000")

# Add row numbers
tmp_edgedf <- edgedf %>% mutate(id = 1:n())
# Stack the x0,y0 and x1,y1 coords as x,y then join
# with vertices "vertdf". Grouping by x,y and summarise
# concatenating the row numbers as occurrences.
rbind(tmp_edgedf %>%
        select(id, x0, y0) %>%
        rename(x = x0, y = y0),
      tmp_edgedf %>%
        select(id, x1, y1) %>%
        rename(x = x1, y = y1)) %>%
  right_join(vertdf, by = c("x", "y")) %>%
  group_by(x, y) %>%
  summarise(occurrences = paste(sort(id), collapse = ",")) %>%
  data.frame() # Convert to a plain data.frame so printing does not round the values as a tibble would.

Results

##          x        y occurrences
## 1 0.700000 3.787500           2
## 2 2.250000 4.750000           5
## 3 2.464286 2.464286       1,2,3
## 4 2.583333 1.750000           1
## 5 3.500000 3.500000       3,4,5
## 6 4.300000 3.900000           4

EDIT

Here is a variant that is perhaps simpler. The first inner_join matches the vertices to the (x0, y0) endpoints and the second matches them to the (x1, y1) endpoints. A temporary row-number column, id, is added to edgedf inside each join to keep track of the edge's row. Alternatively, id can be added to edgedf once before the joins, which avoids computing it twice.

rbind(
    inner_join(vertdf, 
               edgedf %>% transmute(id = 1:n(), x0, y0),
               by = c(x = "x0", y = "y0")),
    inner_join(vertdf,
               edgedf %>% transmute(id = 1:n(), x1, y1),
               by = c(x = "x1", y = "y1"))
  ) %>%
  group_by(x,y) %>%
  summarise(occurrences = paste(sort(id), collapse = ",")) %>%
  data.frame()
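
As an aside that is not part of the dplyr solutions above, here is a minimal base R sketch of the same stack-then-group idea. It assumes that a shared point is stored as exactly the same numbers in both data frames, because the text key built with paste would not match values that differ in their last digits. The helper names stacked, key_stacked, key_vert and ids_by_key are made up for illustration.

```r
edgedf <- data.frame(
  x0 = c(2.464286, 0.7, 2.464286, 3.5, 2.25),
  y0 = c(2.464286, 3.7875, 2.464286, 3.5, 4.75),
  x1 = c(2.583333, 2.464286, 3.5, 4.3, 3.5),
  y1 = c(1.75, 2.464286, 3.5, 3.9, 3.5)
)
vertdf <- data.frame(
  x = c(2.464286, 0.7, 3.5, 2.25, 2.583333, 4.3),
  y = c(2.464286, 3.7875, 3.5, 4.75, 1.75, 3.9)
)

# Stack both endpoints of every edge, remembering the edge's row number
stacked <- data.frame(
  id = rep(seq_len(nrow(edgedf)), times = 2),
  x  = c(edgedf$x0, edgedf$x1),
  y  = c(edgedf$y0, edgedf$y1)
)

# One text key per point, so x and y are always compared together
# (assumption: identical points have identical printed coordinates)
key_stacked <- paste(stacked$x, stacked$y)
key_vert    <- paste(vertdf$x, vertdf$y)

# Edge row numbers grouped by point key
ids_by_key <- split(stacked$id, key_stacked)

# Look up each vertex and collapse its edge ids into one string
vertdf$occurrences <- vapply(
  ids_by_key[key_vert],
  function(i) paste(sort(unique(i)), collapse = ","),
  character(1),
  USE.NAMES = FALSE
)
vertdf
```

Unlike the grouped summaries above, this keeps the rows of vertdf in their original order.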

Hope this helps!

library(dplyr)

edges %>%
  rowwise() %>%
  mutate(occurrences = paste(rownames(vertices)[unlist(lapply(apply(vertices, 1, paste, collapse=","), 
                                  function(i) grepl(paste(x, y, sep=','), i)))], collapse = ",")) %>%
  data.frame()

Output is:

         x        y occurrences
1 2.464286 2.464286       1,2,3
2 0.700000 3.787500           2
3 3.500000 3.500000       3,4,5
4 2.250000 4.750000           5
5 2.583333 1.750000           1
6 4.300000 3.900000           4

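Sample data (note that the names are swapped relative to the question: here edges holds the unique points and vertices holds the edge list):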

edges <- structure(list(x = c(2.464286, 0.7, 3.5, 2.25, 2.583333, 4.3), 
    y = c(2.464286, 3.7875, 3.5, 4.75, 1.75, 3.9)), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))

vertices <- structure(list(x0 = c(2.464286, 0.7, 2.464286, 3.5, 2.25), y0 = c(2.464286, 
3.7875, 2.464286, 3.5, 4.75), x1 = c(2.583333, 2.464286, 3.5, 
4.3, 3.5), y1 = c(1.75, 2.464286, 3.5, 3.9, 3.5)), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5"))

Here's a one-line approach that doesn't require dplyr:

vertices[, 'occurrences'] <- apply(vertices, 1, function(V) 
  paste(which(apply(edges, 1, function (E, V) 
    isTRUE(all.equal(V, E[1:2], check.attributes=FALSE)) || 
    isTRUE(all.equal(V, E[3:4], check.attributes=FALSE)), V=V)),
  collapse=',')
)

The code takes each row of vertices in turn and checks it against every row of edges, testing both endpoints of the edge. all.equal returns TRUE on a match and a character description of the differences otherwise, so isTRUE is needed to reduce each comparison to a plain TRUE or FALSE. which then converts the resulting logical vector into the indices of the matching rows of edges, and paste collapses those indices into a single comma-separated string.
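
To see why isTRUE is wrapped around all.equal, here is a small sketch using the first vertex and the first edge from the sample data. The variable names V, E, start_cmp, end_cmp and matches are just for illustration.

```r
V <- c(x = 2.464286, y = 2.464286)
E <- c(x0 = 2.464286, y0 = 2.464286, x1 = 2.583333, y1 = 1.75)

# Compare the vertex with each endpoint of the edge, ignoring the names
start_cmp <- all.equal(V, E[1:2], check.attributes = FALSE)
end_cmp   <- all.equal(V, E[3:4], check.attributes = FALSE)

isTRUE(start_cmp)      # TRUE: the vertex is the start of the edge
is.character(end_cmp)  # TRUE: a mismatch yields a message, not FALSE
isTRUE(end_cmp)        # FALSE

# all.equal() compares with a small tolerance, unlike ==
1 == 1 + 1e-10                   # FALSE
isTRUE(all.equal(1, 1 + 1e-10))  # TRUE

# The vertex belongs to the edge if either endpoint matches
matches <- isTRUE(start_cmp) || isTRUE(end_cmp)
```

The tolerance means that coordinates differing only by floating-point noise still count as the same point, which may or may not be what you want if distinct vertices can lie extremely close together.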

Sample data:

vertices <- structure(list(
    x = c(2.464286, 0.7, 3.5, 2.25, 2.583333, 4.3),
    y = c(2.464286, 3.7875, 3.5, 4.75, 1.75, 3.9)),
    class = "data.frame",
    row.names = c("1", "2", "3", "4", "5", "6")
)

edges <- structure(list(
    x0 = c(2.464286, 0.7, 2.464286, 3.5, 2.25),
    y0 = c(2.464286, 3.7875, 2.464286, 3.5, 4.75),
    x1 = c(2.583333, 2.464286, 3.5, 4.3, 3.5),
    y1 = c(1.75, 2.464286, 3.5, 3.9, 3.5)),
    class = "data.frame",
    row.names = c("1", "2", "3", "4", "5")
)

Output:

> vertices
         x        y occurrences
1 2.464286 2.464286       1,2,3
2 0.700000 3.787500           2
3 3.500000 3.500000       3,4,5
4 2.250000 4.750000           5
5 2.583333 1.750000           1
6 4.300000 3.900000           4

Comments
  • I get the following error: Error in data.table(.) : could not find function "data.table" I added library(data.table) at the beginning of my session. But then it masks the following: The following objects are masked from ‘package:dplyr’: between, first, last The following objects are masked from ‘package:reshape2’: dcast, melt The following object is masked from ‘package:spatstat’: shift Turns out I need some of these functions in the rest of my code. Will that be a problem? Otherwise, works fine.
  • First, that was a typo I just corrected; data.table should have been data.frame, as this was an autocomplete error in RStudio that I didn't catch. I will address the "mask" thing in a different comment.
  • In R in general, this masking pops up a lot, and using dplyr and data.table is no exception. You should not have issues from within these packages, as they should handle the usage of their own package functions correctly, but you could have a problem if you use a masked function directly, and you may have to specify the package, as in dplyr::first and data.table::first. I can't say my way of handling this is the best way, but I often load packages in an order that helps me reduce the need to specify the package name first (i.e. dplyr::). This is off topic for this question though.
  • Sorry for the delay in response, I had no access to the internet for the past few days. Thanks for the answer! Another related question: it turns out that my example was a rather simple one, as I didn't want to have overly long dataframes for the purpose of explaining my problem. In reality, my 'vertices' and 'edges' dataframes are much longer (say 10k+ rows) and their type is dataframe, not list. Shall I convert them to lists first, as you did in your example? I tried to run as.list(vertices) and as.list(edges), but then the rest didn't work out. Any idea?
  • Sample data in my answer is not a list but dataframe. It's the output of dput(edges) & dput(vertices) (it's the best way to share your data with others). In short you can run my code as is on your data. If it doesn't work then you may need to share data using dput(head(edges))/ dput(head(vertices)) along with the error message.
  • > dput(edges) structure(list(x1 = c(1.5, 2.083333, 0.85, 1.5, 2.083333), y1 = c(2, 2.583333, 2.65, 2, 2.583333), x2 = c(1.5, 2.65, 1.5, 2.083333, 1.825), y2 = c(1.9, 2.3, 2, 2.583333, 3.1)), .Names = c("x1", "y1", "x2", "y2"), class = "data.frame", row.names = c(NA, -5L)) > dput(vertices) structure(list(x = c(1.5, 2.083333, 0.85, 1.5, 2.65, 1.825), y = c(2, 2.583333, 2.65, 1.9, 2.3, 3.1)), .Names = c("x", "y"), row.names = c(NA, -6L), class = "data.frame") Error in mutate_impl(.data, dots) : Evaluation error: object 'x' not found.
  • For this new sample data you just need to swap edges with vertices (& vice-versa) in my code.
  • Sorry, I indeed didn't catch that our examples were reversed! Works perfectly fine, thank you!