All Levels of a Factor in a Model Matrix in R

I have a data.frame consisting of numeric and factor variables as seen below.

testFrame <- data.frame(First=sample(1:10, 20, replace=TRUE),
           Second=sample(1:20, 20, replace=TRUE), Third=sample(1:10, 20, replace=TRUE),
           Fourth=rep(c("Alice","Bob","Charlie","David"), 5),
           Fifth=rep(c("Edward","Frank","Georgia","Hank","Isaac"),4),
           stringsAsFactors=TRUE)  # R >= 4.0.0 no longer converts strings to factors by default

I want to build out a matrix that assigns dummy variables to the factors and leaves the numeric variables alone.

model.matrix(~ First + Second + Third + Fourth + Fifth, data=testFrame)

As expected, this leaves out one level of each factor as the reference level, just as lm does. However, I want a matrix with a dummy/indicator variable for every level of every factor. I am building this matrix for glmnet, so I am not worried about multicollinearity.

Is there a way to have model.matrix create the dummy for every level of the factor?
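To make the default behaviour concrete, here is a minimal sketch using only the two factor columns (`df`, `mm_default` and `mm_noint` are illustrative names, not from the question). Under R's default treatment contrasts each factor contributes one column fewer than it has levels. Removing the intercept with `0 +` is not a general fix: it restores the reference level only for the first factor in the formula.

```r
# Two factors with 4 and 5 levels (illustrative; numeric columns omitted)
df <- data.frame(
  Fourth = factor(rep(c("Alice", "Bob", "Charlie", "David"), 5)),
  Fifth  = factor(rep(c("Edward", "Frank", "Georgia", "Hank", "Isaac"), 4))
)

# Default treatment contrasts: intercept + (4 - 1) + (5 - 1) = 8 columns;
# the first levels, Alice and Edward, are absorbed into the intercept
mm_default <- model.matrix(~ Fourth + Fifth, data = df)
colnames(mm_default)

# Dropping the intercept restores Alice, but Fifth still loses Edward: 4 + 4 = 8 columns
mm_noint <- model.matrix(~ 0 + Fourth + Fifth, data = df)
colnames(mm_noint)
```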


You need to reset the contrasts for the factor variables:

model.matrix(~ Fourth + Fifth, data=testFrame, 
        contrasts.arg=list(Fourth=contrasts(testFrame$Fourth, contrasts=FALSE), 
                Fifth=contrasts(testFrame$Fifth, contrasts=FALSE)))

or, with a little less typing and without the proper names:

model.matrix(~ Fourth + Fifth, data=testFrame, 
    contrasts.arg=list(Fourth=diag(nlevels(testFrame$Fourth)), 
            Fifth=diag(nlevels(testFrame$Fifth))))
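A quick sanity check on the full-dummy matrix, sketched with only the two factor columns (`full`, `fourth_cols` and `fifth_cols` are illustrative names). Every row has exactly one 1 in each factor's block of columns, so each block sums to the intercept column. The matrix is therefore rank-deficient, which is exactly the multicollinearity the question deliberately accepts for glmnet.

```r
# Only the two factor columns are needed for this check
testFrame <- data.frame(
  Fourth = factor(rep(c("Alice", "Bob", "Charlie", "David"), 5)),
  Fifth  = factor(rep(c("Edward", "Frank", "Georgia", "Hank", "Isaac"), 4))
)

# Identity ("no contrasts") coding for both factors, as in the answer above
full <- model.matrix(~ Fourth + Fifth, data = testFrame,
                     contrasts.arg = list(Fourth = contrasts(testFrame$Fourth, contrasts = FALSE),
                                          Fifth  = contrasts(testFrame$Fifth,  contrasts = FALSE)))

ncol(full)  # 10: intercept + 4 + 5 indicator columns

# Locate each factor's block of indicator columns
fourth_cols <- grep("^Fourth", colnames(full))
fifth_cols  <- grep("^Fifth",  colnames(full))

# Each row has exactly one 1 per block, so each block sums to the intercept column
all(rowSums(full[, fourth_cols]) == 1)
all(rowSums(full[, fifth_cols]) == 1)

# Two exact linear dependencies, so the rank is 10 - 2 = 8
qr(full)$rank
```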

(Trying to redeem myself...) In response to Jared's comment on @Fabian's answer about automating this: all you need to supply is a named list of contrast matrices. contrasts() takes a factor and produces its contrast matrix, so we can use lapply() to run contrasts() on each factor in the data set. For the testFrame example provided:

> lapply(testFrame[,4:5], contrasts, contrasts = FALSE)
$Fourth
        Alice Bob Charlie David
Alice       1   0       0     0
Bob         0   1       0     0
Charlie     0   0       1     0
David       0   0       0     1

$Fifth
        Edward Frank Georgia Hank Isaac
Edward       1     0       0    0     0
Frank        0     1       0    0     0
Georgia      0     0       1    0     0
Hank         0     0       0    1     0
Isaac        0     0       0    0     1

This slots nicely into @Fabian's answer:

model.matrix(~ ., data=testFrame, 
             contrasts.arg = lapply(testFrame[,4:5], contrasts, contrasts=FALSE))
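A small variation, sketched under the assumption that the factor columns should be found by type rather than by position: `Filter(is.factor, testFrame)` keeps only the factor columns, so the `4:5` index no longer has to be kept in sync with the data. Since R 4.0.0, `data.frame()` no longer converts strings to factors by default, so `stringsAsFactors = TRUE` is needed here; contrasts() stops with an error on character vectors. `factor_cols` and `X` are illustrative names.

```r
testFrame <- data.frame(First  = sample(1:10, 20, replace = TRUE),
                        Second = sample(1:20, 20, replace = TRUE),
                        Third  = sample(1:10, 20, replace = TRUE),
                        Fourth = rep(c("Alice", "Bob", "Charlie", "David"), 5),
                        Fifth  = rep(c("Edward", "Frank", "Georgia", "Hank", "Isaac"), 4),
                        stringsAsFactors = TRUE)  # needed in R >= 4.0.0

# Keep only the factor columns, whatever their position
factor_cols <- Filter(is.factor, testFrame)

# One identity contrast matrix per factor gives a dummy column for every level
X <- model.matrix(~ ., data = testFrame,
                  contrasts.arg = lapply(factor_cols, contrasts, contrasts = FALSE))

colnames(X)  # intercept, 3 numeric columns, 4 Fourth* and 5 Fifth* dummies
```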

How to get a full set of dummy-variables (linking to All Levels of a Factor in a Model Matrix in R, asked by Jared on 30 Dec 2010): using sparse.model.matrix from the Matrix package, you can get dummy variables (now more trendily called one-hot encoding) for factor or factor-like columns of a data frame. A useful comment on Stack Exchange: for a factor with K categories, the resulting model has (a) the intercept term, which is a column of ones, and (b) K-1 additional columns.
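glmnet also accepts sparse input matrices, so the same named list of identity contrasts can be passed to sparse.model.matrix(). This is a sketch that assumes the Matrix package (a recommended package shipped with R) honours `contrasts.arg` here the same way model.matrix() does; check `colnames(Xs)` on your Matrix version. `full_contrasts` and `Xs` are illustrative names.

```r
library(Matrix)  # provides sparse.model.matrix()

testFrame <- data.frame(First  = sample(1:10, 20, replace = TRUE),
                        Second = sample(1:20, 20, replace = TRUE),
                        Third  = sample(1:10, 20, replace = TRUE),
                        Fourth = rep(c("Alice", "Bob", "Charlie", "David"), 5),
                        Fifth  = rep(c("Edward", "Frank", "Georgia", "Hank", "Isaac"), 4),
                        stringsAsFactors = TRUE)

# Same named list of identity contrasts as for model.matrix()
full_contrasts <- lapply(Filter(is.factor, testFrame), contrasts, contrasts = FALSE)

# Sparse design matrix with a dummy column for every factor level
Xs <- sparse.model.matrix(~ ., data = testFrame, contrasts.arg = full_contrasts)

dim(Xs)
colnames(Xs)
```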


caret implements a nice function, dummyVars, that does this in two lines (plus loading the package):

library(caret)
dmy <- dummyVars(" ~ .", data = testFrame)
testFrame2 <- data.frame(predict(dmy, newdata = testFrame))

Checking the final columns:

colnames(testFrame2)

"First"  "Second"         "Third"          "Fourth.Alice"   "Fourth.Bob"     "Fourth.Charlie" "Fourth.David"   "Fifth.Edward"   "Fifth.Frank"   "Fifth.Georgia"  "Fifth.Hank"     "Fifth.Isaac"   

The nicest part is that you get back the original data frame with the dummy variables added and the original factor columns used for the transformation removed.

More info: http://amunategui.github.io/dummyVar-Walkthrough/

dummyVars from caret could also be used. http://caret.r-forge.r-project.org/preprocess.html

A tidyverse answer:

library(dplyr)
library(tidyr)
result <- testFrame %>% 
    mutate(one = 1) %>% spread(Fourth, one, fill = 0, sep = "") %>% 
    mutate(one = 1) %>% spread(Fifth, one, fill = 0, sep = "")

yields the desired result (same as @Gavin Simpson's answer):

> head(result, 6)
  First Second Third FourthAlice FourthBob FourthCharlie FourthDavid FifthEdward FifthFrank FifthGeorgia FifthHank FifthIsaac
1     1      5     4           0         0             1           0           0          1            0         0          0
2     1     14    10           0         0             0           1           0          0            1         0          0
3     2      2     9           0         1             0           0           1          0            0         0          0
4     2      5     4           0         0             0           1           0          1            0         0          0
5     2     13     5           0         0             1           0           1          0            0         0          0
6     2     15     7           1         0             0           0           1          0            0         0          0
