Dummy or binary indicator variable creation for character variables
All character variables whose number of distinct categories is <= the chosen threshold are converted into binary indicator (or dummy) variables. Say a variable called my_var has distinct categories A, B, C and D; the program will then generate 4 binary indicator variables: my_var_A, my_var_B, my_var_C and my_var_D. Each name has two parts. The first part is the original variable from which the binary indicator variable was made, and the second part is the distinct category of the original variable that the indicator represents. The data will look as shown below.
Original table
….. | My_var |
….. | A |
….. | B |
….. | C |
….. | D |
….. | A |
….. | C |
New table
….. | My_var | My_var_A | My_var_B | My_var_C | My_var_D |
….. | A | 1 | 0 | 0 | 0 |
….. | B | 0 | 1 | 0 | 0 |
….. | C | 0 | 0 | 1 | 0 |
….. | D | 0 | 0 | 0 | 1 |
….. | A | 1 | 0 | 0 | 0 |
….. | C | 0 | 0 | 1 | 0 |
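The conversion shown in the tables above can be sketched in a few lines of Python. This is only an illustration of the idea, not the program's own code: the function name add_indicator_columns, the max_levels parameter (standing in for the chosen threshold) and the threshold value of 10 are all assumptions made for the example.

```python
def add_indicator_columns(rows, var, max_levels):
    """Return copies of rows with one 0/1 column per distinct value of var.

    rows: list of dicts, one dict per record.
    var: name of the character variable to expand.
    max_levels: hypothetical threshold; a variable with more distinct
        categories than this is left unchanged.
    """
    levels = sorted({row[var] for row in rows})
    if len(levels) > max_levels:
        # Too many categories: return the rows without indicator columns.
        return [dict(row) for row in rows]
    out = []
    for row in rows:
        new_row = dict(row)
        for level in levels:
            # Column name = original variable name + "_" + category.
            new_row[f"{var}_{level}"] = 1 if row[var] == level else 0
        out.append(new_row)
    return out


# Same six records as in the original table.
rows = [{"my_var": v} for v in ["A", "B", "C", "D", "A", "C"]]
result = add_indicator_columns(rows, "my_var", max_levels=10)
for row in result:
    print(row)
```

Each output record keeps my_var and gains my_var_A to my_var_D, with exactly one of the four set to 1. Calling the function with max_levels=3 would leave the records unchanged, because my_var has 4 distinct categories.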