QA: What are the rules for variable interactions in RevoScaleR formulas?

Here are the basic rules, and they apply to formulas in all of the analysis functions in RevoScaleR: 

1. The interaction of two continuous variables is equivalent to the multiplication of those variables, and is thus continuous. That is, w:x is the same as the arithmetic product of w and x (w*x read as ordinary multiplication, not as a formula operator).

2. The interaction of two factor (categorical) variables is a categorical variable whose categories are all possible combinations of the categories of the original two variables. Thus age:sex, if both are categorical, contains one category for each combination of an age category and a sex category.

3. The interaction of a continuous variable and a categorical variable results in an "interaction" variable in which the continuous variable is operated on within each category. Thus rxSummary(~income:sex) gives summary statistics for income within each sex category, and rxCube(~income:sex) computes average income within each sex category. For both rxSummary (a very recent change at the time of writing) and rxCube/rxCrossTab, ~income:sex is equivalent to income~sex. That is, the continuous variable can instead be placed on the left-hand side of the ~.
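
The three rules can be illustrated with a minimal base R sketch. It does not call RevoScaleR. It reproduces the described semantics with base R analogues (model.matrix, interaction, tapply), and the data frame and its column names are hypothetical, made up for illustration only.

```r
# Hypothetical toy data; column names are illustrative assumptions.
df <- data.frame(
  w      = c(1, 2, 3, 4),
  x      = c(10, 20, 30, 40),
  income = c(100, 200, 300, 500),
  age    = factor(c("young", "old", "young", "old")),
  sex    = factor(c("F", "F", "M", "M"))
)

# Rule 1: continuous:continuous is the elementwise product of the two columns.
wx <- unname(model.matrix(~ w:x - 1, data = df)[, "w:x"])
all.equal(wx, df$w * df$x)

# Rule 2: factor:factor is one factor with a level per combination of levels.
age_sex <- interaction(df$age, df$sex)
nlevels(age_sex)  # 2 age levels x 2 sex levels

# Rule 3: continuous:factor means the continuous variable is summarized
# within each category; here, the mean of income within each sex.
income_by_sex <- tapply(df$income, df$sex, mean)
income_by_sex

# With RevoScaleR installed, the calls named in rule 3 would take the same
# formula (not run here):
# rxSummary(~ income:sex, data = df)
# rxCube(~ income:sex, data = df)
```

In this sketch the base R functions only stand in for what the RevoScaleR analysis functions are described as doing internally. The output layout of rxSummary and rxCube will differ.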

These rules extend to interactions of multiple continuous and categorical variables. All of the continuous variables are multiplied by each other, and all of the categorical variables are interacted to give a single combined categorical variable. The resulting continuous variable is then operated on within each category of the resulting categorical variable.
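
As a rough illustration of the multi-variable case, the following base R sketch mimics what an interaction such as w:x:age:sex is described as doing: multiply the continuous variables, combine the factors, then summarize the product within each combined category. The data and column names are hypothetical, and base R functions stand in for the RevoScaleR internals.

```r
# Hypothetical toy data: two continuous and two categorical variables.
df <- data.frame(
  w   = c(1, 2, 3, 4, 5, 6, 7, 8),
  x   = rep(2, 8),
  age = factor(rep(c("young", "old"), times = 4)),
  sex = factor(rep(c("F", "M"), each = 4))
)

# Step 1: all continuous variables are multiplied together.
product <- df$w * df$x

# Step 2: all categorical variables are interacted into one combined factor.
combined <- interaction(df$age, df$sex)

# Step 3: the continuous result is operated on (here, averaged) within each
# category of the combined factor.
cell_means <- tapply(product, combined, mean)
cell_means
```

The mean is used in step 3 only as an example. Which operation is applied within each category depends on the analysis function receiving the formula.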

Article ID: 3104248 - Last Review: 29 Oct 2015 - Revision: 1