## All Levels of a Factor in a Model Matrix in R

I have a `data.frame` consisting of numeric and factor variables, as seen below.

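```
testFrame <- data.frame(
  First  = sample(1:10, 20, replace = TRUE),
  Second = sample(1:20, 20, replace = TRUE),
  Third  = sample(1:10, 20, replace = TRUE),
  Fourth = rep(c("Alice", "Bob", "Charlie", "David"), 5),
  Fifth  = rep(c("Edward", "Frank", "Georgia", "Hank", "Isaac"), 4),
  # Needed in R >= 4.0.0, where character columns are no longer converted to factors by default
  stringsAsFactors = TRUE
)
```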

I want to build out a `matrix` that assigns dummy variables to the factors and leaves the numeric variables alone.

```
model.matrix(~ First + Second + Third + Fourth + Fifth, data = testFrame)
```

As expected when running `lm`, this leaves out one level of each factor as the reference level. However, I want to build out a `matrix` with a dummy/indicator variable for every level of all the factors. I am building this matrix for `glmnet`, so I am not worried about multicollinearity.
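For reference, a minimal sketch of the default behaviour described above: with R's default treatment contrasts for unordered factors, `model.matrix` drops the first level of each factor, which becomes the reference level absorbed by the intercept. The data mirror `testFrame`; `set.seed(1)` and the explicit `factor()` calls are additions for reproducibility, not part of the question.

```r
# Rebuild the example data; factor() makes Fourth and Fifth factors explicitly,
# independent of the stringsAsFactors default (FALSE since R 4.0.0).
set.seed(1)  # added only so the sketch is reproducible
testFrame <- data.frame(
  First  = sample(1:10, 20, replace = TRUE),
  Second = sample(1:20, 20, replace = TRUE),
  Third  = sample(1:10, 20, replace = TRUE),
  Fourth = factor(rep(c("Alice", "Bob", "Charlie", "David"), 5)),
  Fifth  = factor(rep(c("Edward", "Frank", "Georgia", "Hank", "Isaac"), 4))
)

# Default treatment coding: each factor loses its first level,
# which is absorbed into the intercept as the reference.
mm <- model.matrix(~ First + Second + Third + Fourth + Fifth, data = testFrame)
colnames(mm)
# There is no "FourthAlice" or "FifthEdward" column: those are the reference levels.
```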

Is there a way to have `model.matrix` create the dummy for every level of the factor?

You need to reset the `contrasts` for the factor variables:

```
model.matrix(~ Fourth + Fifth, data = testFrame,
             contrasts.arg = list(Fourth = contrasts(testFrame$Fourth, contrasts = FALSE),
                                  Fifth  = contrasts(testFrame$Fifth,  contrasts = FALSE)))
```

or, with a little less typing but without the proper level names (the columns come out as `Fourth1` to `Fourth4` and `Fifth1` to `Fifth5`):

```
model.matrix(~ Fourth + Fifth, data = testFrame,
             contrasts.arg = list(Fourth = diag(nlevels(testFrame$Fourth)),
                                  Fifth  = diag(nlevels(testFrame$Fifth))))
```
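To see why resetting the contrasts works, it may help to compare what `contrasts()` returns with and without `contrasts = FALSE`. This is a small sketch on stand-alone factors built like `Fourth` and `Fifth`; the names `f`, `df` and `full` are illustrative only.

```r
# A four-level factor, as in the Fourth column of testFrame
f <- factor(rep(c("Alice", "Bob", "Charlie", "David"), 5))

# Default (treatment) contrasts: 4 levels -> 3 columns; Alice is the all-zero reference row
contrasts(f)

# contrasts = FALSE: a 4 x 4 identity matrix, one indicator column per level
contrasts(f, contrasts = FALSE)

# Passing the identity matrices through contrasts.arg keeps every level
df <- data.frame(
  Fourth = f,
  Fifth  = factor(rep(c("Edward", "Frank", "Georgia", "Hank", "Isaac"), 4))
)
full <- model.matrix(~ Fourth + Fifth, data = df,
                     contrasts.arg = list(Fourth = contrasts(df$Fourth, contrasts = FALSE),
                                          Fifth  = contrasts(df$Fifth,  contrasts = FALSE)))
ncol(full)  # 10: the intercept + 4 Fourth columns + 5 Fifth columns
```

The intercept column is still present here; that is expected, since only the factor coding changes.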


(Trying to redeem myself...) In response to Jared's comment on @Fabian's answer about automating this, note that all you need to supply is a named list of contrast matrices. `contrasts()` takes a factor and produces the contrasts matrix from it, so we can use `lapply()` to run `contrasts()` on each factor in our data set. For the `testFrame` example provided:

```
> lapply(testFrame[,4:5], contrasts, contrasts = FALSE)
$Fourth
        Alice Bob Charlie David
Alice       1   0       0     0
Bob         0   1       0     0
Charlie     0   0       1     0
David       0   0       0     1

$Fifth
        Edward Frank Georgia Hank Isaac
Edward       1     0       0    0     0
Frank        0     1       0    0     0
Georgia      0     0       1    0     0
Hank         0     0       0    1     0
Isaac        0     0       0    0     1
```

This slots nicely into @Fabian's answer:

```
model.matrix(~ ., data = testFrame,
             contrasts.arg = lapply(testFrame[, 4:5], contrasts, contrasts = FALSE))
```
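The call above still hard-codes the factor positions (`4:5`). A possible way to automate it fully, sketched here as a hypothetical helper `full_dummy_matrix()` (not from any package), is to pick the factor columns with `Filter(is.factor, ...)`. It also converts character columns to factors first, on the assumption that the data may come from R >= 4.0.0, where `data.frame()` no longer does this by default.

```r
# Hypothetical helper (not from any package): build a model matrix with
# one indicator column for every level of every factor in `data`.
full_dummy_matrix <- function(formula, data) {
  # Character columns would be skipped by is.factor(), so convert them first.
  for (nm in names(data)[vapply(data, is.character, logical(1))]) {
    data[[nm]] <- factor(data[[nm]])
  }
  factor_cols <- Filter(is.factor, data)
  # Pass NULL rather than an empty list when there are no factor columns.
  contr <- if (length(factor_cols) > 0) {
    lapply(factor_cols, contrasts, contrasts = FALSE)
  } else {
    NULL
  }
  model.matrix(formula, data = data, contrasts.arg = contr)
}

# Usage on data shaped like testFrame; Fourth and Fifth are character on purpose.
set.seed(1)
testFrame <- data.frame(
  First  = sample(1:10, 20, replace = TRUE),
  Second = sample(1:20, 20, replace = TRUE),
  Third  = sample(1:10, 20, replace = TRUE),
  Fourth = rep(c("Alice", "Bob", "Charlie", "David"), 5),
  Fifth  = rep(c("Edward", "Frank", "Georgia", "Hank", "Isaac"), 4),
  stringsAsFactors = FALSE
)
X <- full_dummy_matrix(~ ., testFrame)
colnames(X)  # intercept, the 3 numeric columns, then all 4 + 5 level indicators
```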

**How to get a full set of dummy-variables - General:** All Levels of a Factor in a Model Matrix in R (tags: r, matrix, model, indicator), asked by Jared at 06:18 UTC on 30 Dec 2010. Using `sparse.model.matrix` from the Matrix package you can get dummy variables (now more trendily called one-hot encoding) for factor or factor-like columns of a data frame. I found some useful commentary on Stack Exchange: when you have K dummy variables, the resulting model will have a) the intercept term (a column of ones) and b) K-1 additional columns. The reason is that the K indicator columns always sum to the intercept column, so one of them is redundant.
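A minimal base R check of the K versus K-1 argument. It uses `model.matrix` rather than `sparse.model.matrix`, and the single-factor data frame `d` is made up for illustration. With an intercept plus all K indicators, the columns are linearly dependent, which is why treatment coding drops one level; the question accepts this redundancy because the matrix is meant for `glmnet`.

```r
# A single K = 4 level factor, coded with all K indicators plus an intercept
d <- data.frame(Fourth = factor(rep(c("Alice", "Bob", "Charlie", "David"), 5)))
X <- model.matrix(~ Fourth, data = d,
                  contrasts.arg = list(Fourth = contrasts(d$Fourth, contrasts = FALSE)))

ncol(X)                                        # 5: intercept + 4 indicators
# Each row has exactly one indicator set, so the indicators add up to the intercept column
all(rowSums(X[, -1]) == X[, "(Intercept)"])    # TRUE
qr(X)$rank                                     # 4: one column is redundant
```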

`caret` has a nice function, `dummyVars`, that does this in two lines:

```
library(caret)
dmy <- dummyVars(" ~ .", data = testFrame)
testFrame2 <- data.frame(predict(dmy, newdata = testFrame))
```

Checking the final columns:

```
> colnames(testFrame2)
"First" "Second" "Third" "Fourth.Alice" "Fourth.Bob" "Fourth.Charlie" "Fourth.David" "Fifth.Edward" "Fifth.Frank" "Fifth.Georgia" "Fifth.Hank" "Fifth.Isaac"
```

The nicest part is that you get the original data frame plus the dummy variables, with the original factor columns used for the transformation removed.
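If `caret` is not available, a base R sketch gives a similar layout: dropping the intercept (`+ 0`) and passing identity contrasts for every factor keeps the numeric columns as they are and replaces each factor with one indicator column per level. The column names differ slightly from the `dummyVars` output (`FourthAlice` rather than `Fourth.Alice`); the seed is an assumption for reproducibility.

```r
set.seed(1)
testFrame <- data.frame(
  First  = sample(1:10, 20, replace = TRUE),
  Second = sample(1:20, 20, replace = TRUE),
  Third  = sample(1:10, 20, replace = TRUE),
  Fourth = factor(rep(c("Alice", "Bob", "Charlie", "David"), 5)),
  Fifth  = factor(rep(c("Edward", "Frank", "Georgia", "Hank", "Isaac"), 4))
)

# "+ 0" drops the intercept column; identity contrasts keep every level of both factors.
X <- model.matrix(~ . + 0, data = testFrame,
                  contrasts.arg = lapply(testFrame[, 4:5], contrasts, contrasts = FALSE))
testFrame2 <- as.data.frame(X)
colnames(testFrame2)
```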

More info: http://amunategui.github.io/dummyVar-Walkthrough/


`dummyVars` from `caret` could also be used: http://caret.r-forge.r-project.org/preprocess.html


A `tidyverse` answer:

```
library(dplyr)
library(tidyr)

# Add a column of ones, then spread each factor into one indicator column per level
result <- testFrame %>%
  mutate(one = 1) %>%
  spread(Fourth, one, fill = 0, sep = "") %>%
  mutate(one = 1) %>%
  spread(Fifth, one, fill = 0, sep = "")
```

yields the desired result (same as @Gavin Simpson's answer):

```
> head(result, 6)
  First Second Third FourthAlice FourthBob FourthCharlie FourthDavid FifthEdward FifthFrank FifthGeorgia FifthHank FifthIsaac
1     1      5     4           0         0             1           0           0          1            0         0          0
2     1     14    10           0         0             0           1           0          0            1         0          0
3     2      2     9           0         1             0           0           1          0            0         0          0
4     2      5     4           0         0             0           1           0          1            0         0          0
5     2     13     5           0         0             1           0           1          0            0         0          0
6     2     15     7           1         0             0           0           1          0            0         0          0
```



