Whether to drop the last category in the encoded vector (default: true)
The columns to group by if a group encoding should be used.
The group weighting to use.
"equi" means dividing by the group size, "sqrt" means dividing by the square root of the group size, and "one" means no division.
Default: "sqrt"
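The three group weighting options differ only in the divisor they apply to a weight. The following is a minimal Python sketch of that arithmetic. The helper names group_divisor and scale_group_weight are hypothetical illustrations and not part of the library's API.

```python
import math


def group_divisor(group_size: int, weighting: str = "sqrt") -> float:
    """Hypothetical helper: the divisor implied by each group weighting option."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    if weighting == "equi":
        return float(group_size)      # divide by the group size
    if weighting == "sqrt":
        return math.sqrt(group_size)  # divide by the square root of the group size
    if weighting == "one":
        return 1.0                    # no division
    raise ValueError(f"unknown group weighting: {weighting!r}")


def scale_group_weight(weight: float, group_size: int, weighting: str = "sqrt") -> float:
    """Hypothetical helper: apply the group weighting to a single weight value."""
    return weight / group_divisor(group_size, weighting)
```

For a group of 4 rows and a weight of 1.0, "equi" yields 0.25, the default "sqrt" yields 0.5, and "one" leaves the weight at 1.0.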
Param for how to handle invalid data during transform(). Options are 'keep' (invalid data presented as an extra categorical feature) or 'error' (throw an error). Note that this Param is only used during transform; during fitting, invalid data will result in an error. Default: "error"
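Together, handleInvalid and the drop-last setting determine the length of each encoded vector. The sketch below computes that length. It assumes that 'keep' appends exactly one extra "invalid" category after the regular ones, which the class description states. The function name output_size is hypothetical.

```python
def output_size(num_categories: int, drop_last: bool = True, handle_invalid: str = "error") -> int:
    """Hypothetical helper: length of the encoded vector for one input column."""
    if handle_invalid not in ("keep", "error"):
        raise ValueError(f"unsupported handleInvalid value: {handle_invalid!r}")
    size = num_categories
    if handle_invalid == "keep":
        size += 1  # extra "invalid" category appended as the last category
    if drop_last:
        size -= 1  # the last category (possibly the invalid one) is dropped
    return size
```

With 5 categories, the defaults give a length of 4. 'keep' with dropLast enabled gives 5, because the dropped slot is the invalid one. 'keep' without dropLast gives 6.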
The weight to use instead of 1.0 in the one-hot encoding, specified per input column
A weighted one-hot encoder that maps a column of category indices to a column of vectors. Each row gets at most a single weight value, placed at the position given by the input category index. For example, with 5 categories and a weight of 0.3, an input value of 2.0 maps to the output vector [0.0, 0.0, 0.3, 0.0]. The last category is not included by default (configurable via dropLast), because including it makes the vector entries linearly dependent. So an input value of 4.0 maps to [0.0, 0.0, 0.0, 0.0].

When handleInvalid is configured to 'keep', an extra "category" indicating invalid values is added as the last category. So when dropLast is true, invalid values are encoded as an all-zeros vector.

A group encoding is also possible. In this case the vector can contain multiple weight values, one per row of the group, and all rows of a group share the same vector. The weight values of a group can further be scaled according to the group size.
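The sketch below reproduces the documented examples for a single value. It covers the weight, dropLast, and handleInvalid='keep'. It also gives one possible reading of the group encoding. Two points are assumptions not stated in the source: how a "valid" index is defined, and the group rule itself. Under that rule, each row contributes its own weight at its own category index, contributions are summed, and each is divided according to the group weighting option. Function names are hypothetical and do not reflect the library's API.

```python
import math
from typing import List, Sequence


def encode_value(index: float, num_categories: int, weight: float = 1.0,
                 drop_last: bool = True, handle_invalid: str = "error") -> List[float]:
    """Hypothetical single-value weighted one-hot encoding."""
    if handle_invalid not in ("keep", "error"):
        raise ValueError(f"unsupported handleInvalid value: {handle_invalid!r}")
    keep = handle_invalid == "keep"
    total = num_categories + (1 if keep else 0)  # 'keep' adds an "invalid" category last
    size = total - (1 if drop_last else 0)
    vec = [0.0] * size
    # Assumption: valid means a non-negative integral index below num_categories.
    if float(index).is_integer() and 0 <= index < num_categories:
        pos = int(index)
    elif keep:
        pos = num_categories  # slot of the extra "invalid" category
    else:
        raise ValueError(f"invalid category index: {index}")
    if pos < size:  # a dropped last category leaves the all-zeros vector
        vec[pos] = weight
    return vec


def encode_group(indices: Sequence[float], weights: Sequence[float], num_categories: int,
                 weighting: str = "sqrt", drop_last: bool = True) -> List[float]:
    """Hypothetical group encoding: one weight per row, one shared vector per group."""
    if not indices or len(indices) != len(weights):
        raise ValueError("need one weight per row and a non-empty group")
    n = len(indices)
    divisor = {"equi": float(n), "sqrt": math.sqrt(n), "one": 1.0}[weighting]
    vec = [0.0] * (num_categories - (1 if drop_last else 0))
    for idx, w in zip(indices, weights):
        row = encode_value(idx, num_categories, weight=w, drop_last=drop_last)
        # Assumption: contributions of rows are summed after dividing by the divisor.
        vec = [acc + x / divisor for acc, x in zip(vec, row)]
    return vec
```

Usage: encode_value(2.0, 5, 0.3) gives [0.0, 0.0, 0.3, 0.0] and encode_value(4.0, 5, 0.3) gives four zeros, matching the examples above. If the real group encoding overwrites rather than sums weights for repeated categories, the loop would need to change.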
When encoding multiple columns by using the inputCols and outputCols params, input and output columns come in pairs, matched by their order in the arrays, and each pair is treated independently.

See also: StringIndexer for converting categorical values into category indices
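Multi-column encoding pairs each input column with the output column at the same position. Each pair is encoded independently with that column's own weight. The following is a minimal Python sketch over a single row represented as a dict. It is not the library's API. The name encode_columns and the per-column num_categories argument are hypothetical, and it implements only handleInvalid='error'.

```python
from typing import Dict, List, Sequence


def encode_columns(row: Dict[str, float], input_cols: Sequence[str], output_cols: Sequence[str],
                   num_categories: Sequence[int], weights: Sequence[float],
                   drop_last: bool = True) -> Dict[str, object]:
    """Hypothetical multi-column encoding: pair i maps input_cols[i] to output_cols[i]."""
    if not (len(input_cols) == len(output_cols) == len(num_categories) == len(weights)):
        raise ValueError("inputCols, outputCols, sizes and weights must have equal lengths")
    out: Dict[str, object] = dict(row)  # keep the original columns
    for in_col, out_col, n, w in zip(input_cols, output_cols, num_categories, weights):
        idx = row[in_col]
        if not (float(idx).is_integer() and 0 <= idx < n):
            raise ValueError(f"invalid category index {idx} in column {in_col!r}")
        size = n - 1 if drop_last else n
        vec: List[float] = [0.0] * size
        if int(idx) < size:  # the dropped last category stays all zeros
            vec[int(idx)] = w  # each column uses its own weight instead of 1.0
        out[out_col] = vec
    return out
```

Because every pair is handled in its own loop iteration, the categories and weight of one column never affect another column's vector.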