dummyVars
Creates a full set of dummy variables (i.e., a less than full rank parameterization).

Usage
dummyVars(formula, ...)
# S3 method for default
dummyVars(formula, data, sep = ".", levelsOnly = FALSE, fullRank = FALSE, ...)
# S3 method for dummyVars
print(x, ...)
# S3 method for dummyVars
predict(object, newdata, na.action = na.pass, ...)
contr.ltfr(n, contrasts = TRUE, sparse = FALSE)
class2ind(x, drop2nd = FALSE)
Arguments
formula: An appropriate R model formula; see References.
...: Additional arguments to be passed to other methods.
data: A data frame with the predictors of interest.
sep: An optional separator between factor variable names and their levels. Use sep = NULL for no separator (i.e., the normal behavior of model.matrix, as shown in the Details section).
levelsOnly: A logical; TRUE means to completely remove the variable names from the column names.
fullRank: A logical; should a full rank or less than full rank parameterization be used? If TRUE, factors are encoded to be consistent with model.matrix, and no linear dependencies are induced between the resulting columns.
x: A factor vector.
object: An object of class dummyVars.
newdata: A data frame with the required columns.
na.action: A function determining what should be done with missing values in newdata. The default is to predict NA.
n: A vector of levels for a factor, or the number of levels.
contrasts: A logical indicating whether contrasts should be computed.
sparse: A logical indicating whether the result should be sparse.
drop2nd: A logical: if the factor has two levels, should a single binary vector be returned?
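To illustrate what fullRank is guarding against, the base-R sketch below (it does not call caret) builds a less than full rank encoding of a toy three-level factor next to an intercept column and compares its rank with the treatment-contrast matrix that model.matrix produces by default. The data are an assumption for illustration; the point is that the per-level indicators sum to the intercept column, so the dependency appears once such columns are fitted alongside an intercept.

```r
# Toy factor with three levels (illustrative data, not from caret)
df <- data.frame(day = factor(c("Mon", "Tue", "Wed", "Mon", "Wed")))

# Full rank: model.matrix's default treatment contrasts (intercept + 2 dummies)
X_full <- model.matrix(~ day, df)

# Less than full rank: an intercept plus one indicator per level (1 + 3 columns)
X_ltfr <- cbind("(Intercept)" = 1, model.matrix(~ day - 1, df))

# The three indicator columns sum to the intercept column, so one is redundant
rank_full <- qr(X_full)$rank
rank_ltfr <- qr(X_ltfr)$rank
c(rank_full = rank_full, ncol_full = ncol(X_full),
  rank_ltfr = rank_ltfr, ncol_ltfr = ncol(X_ltfr))
```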
Value
The output of dummyVars is a list of class 'dummyVars' with elements for:
- the function call
- the model formula
- the names of all the variables in the model
- the names of all the factor variables in the model
- the levels of any factor variables
- NULL or a character separator
- the terms.formula object
- a logical

The predict function produces a data frame. class2ind returns a matrix (or a vector if drop2nd = TRUE). contr.ltfr generates a design matrix.
Details
Most of the contrasts functions in R produce full rank parameterizations of the predictor data. For example, contr.treatment creates a reference cell in the data and defines dummy variables for all factor levels except those in the reference cell. So if a factor with 5 levels is used alone in a model formula, contr.treatment creates a column for the intercept plus columns for all the factor levels except the first (4 dummy columns). For the data in the Examples section below, this would produce:
(Intercept) dayTue dayWed dayThu dayFri daySat daySun
          1      0      0      0      0      0      0
          1      0      0      0      0      0      0
          1      0      0      0      0      0      0
          1      0      1      0      0      0      0
          1      0      1      0      0      0      0
          1      0      0      0      1      0      0
          1      0      0      0      0      1      0
          1      0      0      0      0      1      0
          1      0      0      0      1      0      0
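The reference cell is visible directly in the contrast matrix. This base-R snippet passes the seven day levels used in the examples to contr.treatment: the Mon row is all zeros and there is no Mon column, which is why the table above has no dayMon column.

```r
# Treatment contrasts for the seven day levels used in the examples
days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
ct <- contr.treatment(days)

# Rows are levels, columns are every level except the first (the reference cell)
ct
```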
In some situations, there may be a need for dummy variables for all the levels of the factor. For the same example:
dayMon dayTue dayWed dayThu dayFri daySat daySun
     1      0      0      0      0      0      0
     1      0      0      0      0      0      0
     1      0      0      0      0      0      0
     0      0      1      0      0      0      0
     0      0      1      0      0      0      0
     0      0      0      0      1      0      0
     0      0      0      0      0      1      0
     0      0      0      0      0      1      0
     0      0      0      0      1      0      0
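For comparison, base R can produce an all-levels encoding without caret. This is not how dummyVars is implemented; it is a sketch on a small, made-up subset of the example data. For a single factor, dropping the intercept is enough; for several factors, one common approach is to pass identity contrasts (contrasts(x, contrasts = FALSE)) for each factor through the contrasts.arg argument of model.matrix.

```r
# Small illustrative data set with the same level orderings as the examples
when_small <- data.frame(
  day  = factor(c("Mon", "Wed", "Fri"),
                levels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")),
  time = factor(c("morning", "night", "afternoon"),
                levels = c("morning", "afternoon", "night"))
)

# Single factor: removing the intercept gives one indicator per level
X_day <- model.matrix(~ day - 1, when_small)

# Several factors: identity contrast matrices keep every level of every factor
full_contr <- lapply(when_small, contrasts, contrasts = FALSE)
X_all <- model.matrix(~ day + time, when_small, contrasts.arg = full_contr)
colnames(X_all)
```

Note that X_all still carries an intercept column, so (unlike a fullRank = FALSE encoding used without one) it is rank deficient.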
Given a formula and an initial data set, the dummyVars class gathers all the information needed to produce a full set of dummy variables for any data set. It uses contr.ltfr as the base function to do this.
class2ind is most useful for converting a factor outcome vector to a matrix (or vector) of dummy variables.
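A minimal base-R sketch of the same conversion, written as a hypothetical helper to_indicators() rather than caret's class2ind: each level becomes a 0/1 column, and for a two-level factor the drop2nd option keeps only the first level's column as a named vector, matching the shape of the class2ind output shown in the examples.

```r
# Hypothetical helper (NOT caret's class2ind): one 0/1 column per factor level
to_indicators <- function(y, drop2nd = FALSE) {
  stopifnot(is.factor(y))
  ind <- outer(as.integer(y), seq_len(nlevels(y)), "==") * 1
  dimnames(ind) <- list(seq_along(y), levels(y))
  # For a two-level factor, optionally keep only the first level's column
  if (drop2nd && nlevels(y) == 2) ind <- ind[, 1]
  ind
}

species_ind <- to_indicators(iris$Species)
head(species_ind)

two_levels <- factor(rep(letters[1:2], each = 5))
to_indicators(two_levels, drop2nd = TRUE)
```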
Examples
library(caret)

when <- data.frame(
time = c(
"afternoon", "night", "afternoon",
"morning", "morning", "morning",
"morning", "afternoon", "afternoon"
),
day = c(
"Mon", "Mon", "Mon",
"Wed", "Wed", "Fri",
"Sat", "Sat", "Fri"
),
stringsAsFactors = TRUE
)
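## Set a meaningful level order. Assigning a named list also adds levels
## that never occur in the data (Tue, Thu and Sun for day).
levels(when$time) <- list(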
morning = "morning",
afternoon = "afternoon",
night = "night"
)
levels(when$day) <- list(
Mon = "Mon", Tue = "Tue", Wed = "Wed", Thu = "Thu",
Fri = "Fri", Sat = "Sat", Sun = "Sun"
)
## Default behavior:
model.matrix(~day, when)
#> (Intercept) dayTue dayWed dayThu dayFri daySat daySun
#> 1 1 0 0 0 0 0 0
#> 2 1 0 0 0 0 0 0
#> 3 1 0 0 0 0 0 0
#> 4 1 0 1 0 0 0 0
#> 5 1 0 1 0 0 0 0
#> 6 1 0 0 0 1 0 0
#> 7 1 0 0 0 0 1 0
#> 8 1 0 0 0 0 1 0
#> 9 1 0 0 0 1 0 0
#> attr(,"assign")
#> [1] 0 1 1 1 1 1 1
#> attr(,"contrasts")
#> attr(,"contrasts")$day
#> [1] "contr.treatment"
#>
mainEffects <- dummyVars(~ day + time, data = when)
mainEffects
#> Dummy Variable Object
#>
#> Formula: ~day + time
#> <environment: 0x55a9001d2568>
#> 2 variables, 2 factors
#> Variables and levels will be separated by '.'
#> A less than full rank encoding is used
predict(mainEffects, when[1:3, ])
#> day.Mon day.Tue day.Wed day.Thu day.Fri day.Sat day.Sun time.morning
#> 1 1 0 0 0 0 0 0 0
#> 2 1 0 0 0 0 0 0 0
#> 3 1 0 0 0 0 0 0 0
#> time.afternoon time.night
#> 1 1 0
#> 2 0 1
#> 3 1 0
when2 <- when
## Introduce a missing value in the first row of the time column
when2[1, 1] <- NA
predict(mainEffects, when2[1:3, ])
#> day.Mon day.Tue day.Wed day.Thu day.Fri day.Sat day.Sun time.morning
#> 1 1 0 0 0 0 0 0 NA
#> 2 1 0 0 0 0 0 0 0
#> 3 1 0 0 0 0 0 0 0
#> time.afternoon time.night
#> 1 NA NA
#> 2 0 1
#> 3 1 0
predict(mainEffects, when2[1:3, ], na.action = na.omit)
#> day.Mon day.Tue day.Wed day.Thu day.Fri day.Sat day.Sun time.morning
#> 2 1 0 0 0 0 0 0 0
#> 3 1 0 0 0 0 0 0 0
#> time.afternoon time.night
#> 2 0 1
#> 3 1 0
interactionModel <- dummyVars(~ day + time + day:time,
data = when,
sep = "."
)
predict(interactionModel, when[1:3, ])
#> day.Mon day.Tue day.Wed day.Thu day.Fri day.Sat day.Sun time.morning
#> 1 1 0 0 0 0 0 0 0
#> 2 1 0 0 0 0 0 0 0
#> 3 1 0 0 0 0 0 0 0
#> time.afternoon time.night dayMon:timemorning dayTue:timemorning
#> 1 1 0 0 0
#> 2 0 1 0 0
#> 3 1 0 0 0
#> dayWed:timemorning dayThu:timemorning dayFri:timemorning daySat:timemorning
#> 1 0 0 0 0
#> 2 0 0 0 0
#> 3 0 0 0 0
#> daySun:timemorning dayMon:timeafternoon dayTue:timeafternoon
#> 1 0 1 0
#> 2 0 0 0
#> 3 0 1 0
#> dayWed:timeafternoon dayThu:timeafternoon dayFri:timeafternoon
#> 1 0 0 0
#> 2 0 0 0
#> 3 0 0 0
#> daySat:timeafternoon daySun:timeafternoon dayMon:timenight dayTue:timenight
#> 1 0 0 0 0
#> 2 0 0 1 0
#> 3 0 0 0 0
#> dayWed:timenight dayThu:timenight dayFri:timenight daySat:timenight
#> 1 0 0 0 0
#> 2 0 0 0 0
#> 3 0 0 0 0
#> daySun:timenight
#> 1 0
#> 2 0
#> 3 0
noNames <- dummyVars(~ day + time + day:time,
data = when,
levelsOnly = TRUE
)
predict(noNames, when)
#> Mon Tue Wed Thu Fri Sat Sun morning afternoon night dayMon:timemorning
#> 1 1 0 0 0 0 0 0 0 1 0 0
#> 2 1 0 0 0 0 0 0 0 0 1 0
#> 3 1 0 0 0 0 0 0 0 1 0 0
#> 4 0 0 1 0 0 0 0 1 0 0 0
#> 5 0 0 1 0 0 0 0 1 0 0 0
#> 6 0 0 0 0 1 0 0 1 0 0 0
#> 7 0 0 0 0 0 1 0 1 0 0 0
#> 8 0 0 0 0 0 1 0 0 1 0 0
#> 9 0 0 0 0 1 0 0 0 1 0 0
#> dayTue:timemorning dayWed:timemorning dayThu:timemorning dayFri:timemorning
#> 1 0 0 0 0
#> 2 0 0 0 0
#> 3 0 0 0 0
#> 4 0 1 0 0
#> 5 0 1 0 0
#> 6 0 0 0 1
#> 7 0 0 0 0
#> 8 0 0 0 0
#> 9 0 0 0 0
#> daySat:timemorning daySun:timemorning dayMon:timeafternoon
#> 1 0 0 1
#> 2 0 0 0
#> 3 0 0 1
#> 4 0 0 0
#> 5 0 0 0
#> 6 0 0 0
#> 7 1 0 0
#> 8 0 0 0
#> 9 0 0 0
#> dayTue:timeafternoon dayWed:timeafternoon dayThu:timeafternoon
#> 1 0 0 0
#> 2 0 0 0
#> 3 0 0 0
#> 4 0 0 0
#> 5 0 0 0
#> 6 0 0 0
#> 7 0 0 0
#> 8 0 0 0
#> 9 0 0 0
#> dayFri:timeafternoon daySat:timeafternoon daySun:timeafternoon
#> 1 0 0 0
#> 2 0 0 0
#> 3 0 0 0
#> 4 0 0 0
#> 5 0 0 0
#> 6 0 0 0
#> 7 0 0 0
#> 8 0 1 0
#> 9 1 0 0
#> dayMon:timenight dayTue:timenight dayWed:timenight dayThu:timenight
#> 1 0 0 0 0
#> 2 1 0 0 0
#> 3 0 0 0 0
#> 4 0 0 0 0
#> 5 0 0 0 0
#> 6 0 0 0 0
#> 7 0 0 0 0
#> 8 0 0 0 0
#> 9 0 0 0 0
#> dayFri:timenight daySat:timenight daySun:timenight
#> 1 0 0 0
#> 2 0 0 0
#> 3 0 0 0
#> 4 0 0 0
#> 5 0 0 0
#> 6 0 0 0
#> 7 0 0 0
#> 8 0 0 0
#> 9 0 0 0
head(class2ind(iris$Species))
#> setosa versicolor virginica
#> 1 1 0 0
#> 2 1 0 0
#> 3 1 0 0
#> 4 1 0 0
#> 5 1 0 0
#> 6 1 0 0
two_levels <- factor(rep(letters[1:2], each = 5))
class2ind(two_levels)
#> a b
#> 1 1 0
#> 2 1 0
#> 3 1 0
#> 4 1 0
#> 5 1 0
#> 6 0 1
#> 7 0 1
#> 8 0 1
#> 9 0 1
#> 10 0 1
class2ind(two_levels, drop2nd = TRUE)
#> 1 2 3 4 5 6 7 8 9 10
#> 1 1 1 1 1 0 0 0 0 0