## Reshape three column data frame to matrix ("long" to "wide" format)


There are many ways to do this. This answer starts with what is quickly becoming the standard method, but also includes older methods and various other methods from answers to similar questions scattered around this site.

    tmp <- data.frame(x = gl(2, 3, labels = letters[24:25]),
                      y = gl(3, 1, 6, labels = letters[1:3]),
                      z = c(1, 2, 3, 3, 3, 2))

**Using the tidyverse:**

The new cool way to do this is with `pivot_wider` from `tidyr 1.0.0`. It returns a data frame, which is probably what most readers of this answer will want. For a heatmap, though, you would need to convert this to a true matrix.

    library(tidyr)
    pivot_wider(tmp, names_from = y, values_from = z)
    ## # A tibble: 2 x 4
    ##   x         a     b     c
    ##   <fct> <dbl> <dbl> <dbl>
    ## 1 x         1     2     3
    ## 2 y         3     3     2
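For the heatmap case, the conversion to a true matrix could look roughly like this in base R. This is only a sketch: the data frame `wide` is a hypothetical stand-in built by hand to mimic the wide result, and it assumes the id column comes first.

```r
# Hand-built stand-in for the wide result (the name `wide` is an assumption)
wide <- data.frame(x = c("x", "y"), a = c(1, 3), b = c(2, 3), c = c(3, 2))

# Drop the id column, convert the value columns, and reuse the ids as row names
m <- as.matrix(wide[, -1])
rownames(m) <- as.character(wide$x)
m
```

The result is a numeric matrix that functions such as `heatmap()` or `image()` accept directly.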

The old cool way to do this is with `spread` from `tidyr`. It similarly returns a data frame.

    library(tidyr)
    spread(tmp, y, z)
    ##   x a b c
    ## 1 x 1 2 3
    ## 2 y 3 3 2

**Using reshape2**:

One of the first steps toward the tidyverse was the reshape2 package.

To get a matrix, use `acast`:

    library(reshape2)
    acast(tmp, x~y, value.var="z")
    ##   a b c
    ## x 1 2 3
    ## y 3 3 2

Or to get a data frame, use `dcast`, as here: Reshape data for values in one column.

    dcast(tmp, x~y, value.var="z")
    ##   x a b c
    ## 1 x 1 2 3
    ## 2 y 3 3 2

**Using plyr**:

In between reshape2 and the tidyverse came `plyr`, with the `daply` function, as shown here: https://stackoverflow.com/a/7020101/210673

    library(plyr)
    daply(tmp, .(x, y), function(x) x$z)
    ##    y
    ## x   a b c
    ##   x 1 2 3
    ##   y 3 3 2

**Using matrix indexing:**

This is kinda old school but is a nice demonstration of matrix indexing, which can be really useful in certain situations.

    with(tmp, {
      out <- matrix(nrow=nlevels(x), ncol=nlevels(y),
                    dimnames=list(levels(x), levels(y)))
      out[cbind(x, y)] <- z
      out
    })
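The key step is indexing a matrix with a two-column matrix, where each row is read as a (row, column) pair. A small sketch of what happens inside, using the same `tmp` data (the name `idx` is introduced here only for illustration):

```r
tmp <- data.frame(x = gl(2, 3, labels = letters[24:25]),
                  y = gl(3, 1, 6, labels = letters[1:3]),
                  z = c(1, 2, 3, 3, 3, 2))

# cbind() on two factors yields their integer codes: one (row, col) pair per line
idx <- with(tmp, cbind(x, y))
idx[1:3, ]

# Start from an empty matrix labelled by the factor levels
out <- matrix(NA_real_, nrow = nlevels(tmp$x), ncol = nlevels(tmp$y),
              dimnames = list(levels(tmp$x), levels(tmp$y)))

# Each z value is written to the cell named by its (row, col) pair
out[idx] <- tmp$z
out
```

Cells with no matching pair simply stay `NA`, and if a pair occurs twice the later value overwrites the earlier one.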

**Using xtabs:**

    xtabs(z~x+y, data=tmp)
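Two things are worth knowing about the `xtabs` result: it carries the classes `xtabs` and `table` rather than being a plain matrix, and it sums values for repeated combinations while filling absent ones with 0. A sketch of both points, with `dup` as a made-up variant of `tmp`:

```r
tmp <- data.frame(x = gl(2, 3, labels = letters[24:25]),
                  y = gl(3, 1, 6, labels = letters[1:3]),
                  z = c(1, 2, 3, 3, 3, 2))

xt <- xtabs(z ~ x + y, data = tmp)
class(xt)  # "xtabs" "table"

# Strip the table classes and the stored call to leave a plain numeric matrix
m <- unclass(xt)
attr(m, "call") <- NULL
m

# Hypothetical variant: drop the (y, c) row and repeat the (x, a) row
dup <- rbind(tmp[-6, ], tmp[1, ])
xt_dup <- xtabs(z ~ x + y, data = dup)
xt_dup  # (x, a) is summed to 2; the missing (y, c) cell becomes 0
```

If a 0 would be misleading for missing combinations, one of the approaches that leaves `NA` in empty cells may be a better fit.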

**Using a sparse matrix:**

There's also `sparseMatrix` within the `Matrix` package, as seen here: R - convert BIG table into matrix by column names

    library(Matrix)
    with(tmp, sparseMatrix(i = as.numeric(x), j = as.numeric(y), x = z,
                           dimnames = list(levels(x), levels(y))))
    ## 2 x 3 sparse Matrix of class "dgCMatrix"
    ##   a b c
    ## x 1 2 3
    ## y 3 3 2

**Using reshape:**

You can also use the base R function `reshape`, as suggested here: Convert table into matrix by column names, though you have to do a little manipulation afterwards to remove an extra column and get the names right.

    reshape(tmp, idvar="x", timevar="y", direction="wide")
    ##   x z.a z.b z.c
    ## 1 x   1   2   3
    ## 4 y   3   3   2
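One possible version of that cleanup, as a sketch rather than the original author's code: drop the id column, strip the `z.` prefix that `reshape` adds, and use the ids as row names. The names `wide` and `m` are introduced here for illustration.

```r
tmp <- data.frame(x = gl(2, 3, labels = letters[24:25]),
                  y = gl(3, 1, 6, labels = letters[1:3]),
                  z = c(1, 2, 3, 3, 3, 2))

wide <- reshape(tmp, idvar = "x", timevar = "y", direction = "wide")

# Keep only the value columns, then relabel rows and columns
m <- as.matrix(wide[, -1])
dimnames(m) <- list(as.character(wide$x), sub("^z\\.", "", colnames(m)))
m
```

The regular expression assumes the value column is called `z`; with a different column name the prefix to strip changes accordingly.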


The question is some years old but maybe some people are still interested in alternative answers.

If you don't want to load any packages, you might use this function:

    #' Converts three columns of a data.frame into a matrix -- e.g. to plot
    #' the data via image() later on. Two of the columns form the row and
    #' col dimensions of the matrix. The third column provides values for
    #' the matrix.
    #'
    #' @param data data.frame: input data
    #' @param rowtitle string: row-dimension; name of the column in data, which distinct values should be used as row names in the output matrix
    #' @param coltitle string: col-dimension; name of the column in data, which distinct values should be used as column names in the output matrix
    #' @param datatitle string: name of the column in data, which values should be filled into the output matrix
    #' @param rowdecreasing logical: should the row names be in ascending (FALSE) or in descending (TRUE) order?
    #' @param coldecreasing logical: should the col names be in ascending (FALSE) or in descending (TRUE) order?
    #' @param default_value numeric: default value of matrix entries if no value exists in data.frame for the entries
    #' @return matrix: matrix containing values of data[[datatitle]] with rownames data[[rowtitle]] and colnames data[[coltitle]]
    #' @author Daniel Neumann
    #' @date 2017-08-29
    data.frame2matrix = function(data, rowtitle, coltitle, datatitle,
                                 rowdecreasing = FALSE, coldecreasing = FALSE,
                                 default_value = NA) {

      # check, whether titles exist as columns names in the data.frame data
      if ( (!(rowtitle%in%names(data)))
           || (!(coltitle%in%names(data)))
           || (!(datatitle%in%names(data))) ) {
        stop('data.frame2matrix: bad row-, col-, or datatitle.')
      }

      # get number of rows in data
      ndata = dim(data)[1]

      # extract rownames and colnames for the matrix from the data.frame
      rownames = sort(unique(data[[rowtitle]]), decreasing = rowdecreasing)
      nrows = length(rownames)
      colnames = sort(unique(data[[coltitle]]), decreasing = coldecreasing)
      ncols = length(colnames)

      # initialize the matrix
      out_matrix = matrix(NA,
                          nrow = nrows, ncol = ncols,
                          dimnames=list(rownames, colnames))

      # iterate rows of data (seq_len also handles an empty data.frame)
      for (i1 in seq_len(ndata)) {
        # get matrix-row and matrix-column indices for the current data-row
        iR = which(rownames==data[[rowtitle]][i1])
        iC = which(colnames==data[[coltitle]][i1])

        # throw an error if the matrix entry (iR,iC) is already filled.
        if (!is.na(out_matrix[iR, iC])) stop('data.frame2matrix: double entry in data.frame')
        out_matrix[iR, iC] = data[[datatitle]][i1]
      }

      # set empty matrix entries to the default value
      out_matrix[is.na(out_matrix)] = default_value

      # return matrix
      return(out_matrix)
    }

How it works:

    myData = as.data.frame(list('dim1'=c('x', 'x', 'x', 'y','y','y'),
                                'dim2'=c('a','b','c','a','b','c'),
                                'values'=c(1,2,3,3,3,2)))

    myMatrix = data.frame2matrix(myData, 'dim1', 'dim2', 'values')

    myMatrix
    >   a b c
    > x 1 2 3
    > y 3 3 2


##### base R, `unstack`

    unstack(df, V3 ~ V2)
    #   a b c
    # 1 1 2 3
    # 2 3 3 2

This may not be a general solution but works well in this case.
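A sketch of why it is not general, using an equivalent data frame built with `data.frame()`: the result loses the `V1` labels, so they have to be added back by hand, and with unbalanced data `unstack` returns a list rather than a data frame.

```r
df <- data.frame(V1 = rep(c("x", "y"), each = 3),
                 V2 = rep(c("a", "b", "c"), times = 2),
                 V3 = c(1, 2, 3, 3, 3, 2))

# Balanced data: one column per V2 value, but the V1 labels are gone
wide <- unstack(df, V3 ~ V2)
m <- as.matrix(wide)
# Assumption: every V2 group lists the V1 values in the same order
rownames(m) <- unique(df$V1)
m

# Unbalanced data (last row dropped): the groups differ in length,
# so a list comes back instead of a data frame
res <- unstack(df[-6, ], V3 ~ V2)
is.data.frame(res)
```

When combinations may be missing or the rows are not consistently ordered, one of the other methods in this thread is the safer choice.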

##### data

    df <- structure(list(V1 = structure(c(1L, 1L, 1L, 2L, 2L, 2L),
                                        .Label = c("x", "y"), class = "factor"),
                         V2 = structure(c(1L, 2L, 3L, 1L, 2L, 3L),
                                        .Label = c("a", "b", "c"), class = "factor"),
                         V3 = c(1L, 2L, 3L, 3L, 3L, 2L)),
                    .Names = c("V1", "V2", "V3"),
                    class = "data.frame", row.names = c(NA, -6L))


For the sake of completeness, there's also a `tapply()` solution.

    with(d, tapply(z, list(x, y), sum))
    #   a b c
    # x 1 2 3
    # y 3 3 2
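Because `tapply()` applies a function per cell, the choice of `sum` matters once the data has repeated or missing combinations. A sketch with a made-up data frame `d2`, in which the pair (x, a) appears twice and (y, c) is absent:

```r
# Hypothetical data: (x, a) occurs twice, (y, c) does not occur at all
d2 <- data.frame(x = c("x", "x", "x", "y", "y", "x"),
                 y = c("a", "b", "c", "a", "b", "a"),
                 z = c(1, 2, 3, 3, 3, 10))

m <- with(d2, tapply(z, list(x, y), sum))
m
# The two (x, a) values are added together (1 + 10),
# and the empty (y, c) cell is NA
```

Swapping `sum` for another function such as `mean` or `max` changes how duplicates are combined; the result is a plain matrix either way.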

**Data**

    d <- structure(list(x = structure(c(1L, 1L, 1L, 2L, 2L, 2L),
                                      .Label = c("x", "y"), class = "factor"),
                        y = structure(c(1L, 2L, 3L, 1L, 2L, 3L),
                                      .Label = c("a", "b", "c"), class = "factor"),
                        z = c(1, 2, 3, 3, 3, 2)),
                   class = "data.frame", row.names = c(NA, -6L))


From `tidyr 0.8.3.9000`, a new function called `pivot_wider()` is introduced. It is basically an upgraded version of the previous `spread()` function (which is, moreover, no longer under active development). From the pivoting vignette:

This vignette describes the use of the new pivot_longer() and pivot_wider() functions. Their goal is to improve the usability of gather() and spread(), and incorporate state-of-the-art features found in other packages.

For some time, it’s been obvious that there is something fundamentally wrong with the design of spread() and gather(). Many people don’t find the names intuitive and find it hard to remember which direction corresponds to spreading and which to gathering. It also seems surprisingly hard to remember the arguments to these functions, meaning that many people (including me!) have to consult the documentation every time.

How to use it (using the data from @Aaron):

    pivot_wider(data = tmp, names_from = y, values_from = z)

      x         a     b     c
      <fct> <dbl> <dbl> <dbl>
    1 x         1     2     3
    2 y         3     3     2

Or in a "full" `tidyverse` fashion:

    tmp %>%
      pivot_wider(names_from = y, values_from = z)


##### Comments

- @AnandaMahto also has a great answer about this here: stackoverflow.com/a/14515736/210673
- `acast(tmp, x~y, value.var="z")` will give a matrix output, with `x` as the row.names
- Can you comment on the advantages/disadvantages of different methods?
- In most small data sets, the primary consideration should be coding in a way that is clear to future analysts (including future you) and the least susceptible to human coding mistakes. Although that will depend on your strengths and needs, generally this is considered one of the strengths of the new tidyverse set of packages. Another consideration (though not really an advantage/disadvantage) is whether you want a matrix or a data frame as a result; this question specifically asks for a matrix, and you can see in the answer that some techniques give that directly while some give a data frame.
- Computing time may also be a consideration for large data sets, especially when the code needs to be repeated multiple times or on multiple data sets. I suspect that depends in part, though, on the specific characteristics of the data set. If that is a concern for you, I suggest asking another question about optimizing for your particular situation; questions like that at one point were like catnip for this crowd. :) But I'll repeat my previous point: optimizing for the user is (usually) more important than optimizing for the computer.
- See the answer from @Aaron
- Somehow managed to miss the part at the end where he covered spread. Nice catch, thanks.
- tidyverse solutions now moved to the top.