A weighted one-hot encoder that maps a column of category indices to a column of vectors. Each row's vector has at most one nonzero entry, the weight value, and its position indicates the input category index.
For example, with 5 categories and a weight of 0.3, an input value of 2.0 maps to the output vector
[0.0, 0.0, 0.3, 0.0].
The last category is not included by default (configurable via dropLast),
because it makes the vector entries linearly dependent.
So an input value of 4.0 maps to [0.0, 0.0, 0.0, 0.0].
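The mapping described above can be sketched in plain Python. This is only an illustration of the rule, not the encoder's actual implementation; the function name weighted_one_hot and its parameters are hypothetical, and the output is a dense list rather than whatever vector type the encoder really produces.

```python
def weighted_one_hot(index, weight, num_categories, drop_last=True):
    """Return a dense weighted one-hot vector as a list of floats.

    Hypothetical helper: mirrors the documented rule, not the real encoder.
    """
    i = int(index)
    if not 0 <= i < num_categories:
        raise ValueError(f"category index {index} is out of range")
    # With drop_last, the last category has no slot of its own.
    size = num_categories - 1 if drop_last else num_categories
    vec = [0.0] * size
    if i < size:
        vec[i] = weight  # the weight takes the place of the usual 1.0
    return vec


print(weighted_one_hot(2.0, 0.3, 5))                   # [0.0, 0.0, 0.3, 0.0]
print(weighted_one_hot(4.0, 0.3, 5))                   # [0.0, 0.0, 0.0, 0.0]
print(weighted_one_hot(4.0, 0.3, 5, drop_last=False))  # [0.0, 0.0, 0.0, 0.0, 0.3]
```

With drop_last disabled, the last category gets its own slot, so the input 4.0 is no longer encoded as all zeros.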
When handleInvalid is configured to 'keep', an extra "category" indicating invalid values is
added as the last category. So when dropLast is true, invalid values are encoded as an all-zeros
vector.
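The interaction between handleInvalid = 'keep' and dropLast can be sketched as follows. The function encode_keep_invalid is hypothetical, and the sketch assumes that an index at or beyond the number of known categories counts as invalid; how the real encoder detects invalid values is not stated here.

```python
def encode_keep_invalid(index, weight, num_categories, drop_last=True):
    """Sketch of 'keep' handling: invalid values get an extra last category.

    Assumption: an index >= num_categories is treated as invalid.
    """
    i = int(index)
    if i < 0:
        raise ValueError("negative category index")
    total = num_categories + 1      # known categories plus the invalid slot
    if i >= num_categories:
        i = num_categories          # route invalid values to the extra slot
    size = total - 1 if drop_last else total
    vec = [0.0] * size
    if i < size:                    # the dropped last slot is the invalid one
        vec[i] = weight
    return vec


print(encode_keep_invalid(7.0, 0.3, 5))                   # [0.0, 0.0, 0.0, 0.0, 0.0]
print(encode_keep_invalid(4.0, 0.3, 5))                   # [0.0, 0.0, 0.0, 0.0, 0.3]
print(encode_keep_invalid(7.0, 0.3, 5, drop_last=False))  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.3]
```

Because the invalid category becomes the last one, it is the slot removed by dropLast, and the previously dropped category 4 now has a position of its own.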
Group encoding is also possible. In this case there can be multiple weight values, one per row of the group, and all rows of a group share the same vector. The weight values of a group can additionally be weighted by the group size.
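One possible reading of group encoding is sketched below. Several details are assumptions rather than documented behaviour: rows are given as (group_id, category_index, weight) tuples, the shared vector is the sum of each row's weighted one-hot vector, "weighted by the group size" is interpreted as dividing each weight by the number of rows in the group, and the last category is dropped. The name encode_groups and its parameters are hypothetical.

```python
from collections import defaultdict


def encode_groups(rows, num_categories, scale_by_group_size=False):
    """rows: list of (group_id, category_index, weight) tuples.

    Returns one vector per input row; rows of a group share the same vector.
    Assumed semantics, see the surrounding text.
    """
    size = num_categories - 1  # last category dropped (assumed default)
    members = defaultdict(list)
    for group_id, index, weight in rows:
        members[group_id].append((int(index), weight))

    vectors = {}
    for group_id, items in members.items():
        # Assumption: scaling means dividing by the number of rows in the group.
        scale = 1.0 / len(items) if scale_by_group_size else 1.0
        vec = [0.0] * size
        for i, w in items:
            if i < size:
                vec[i] += w * scale
        vectors[group_id] = vec
    return [vectors[group_id] for group_id, _, _ in rows]


rows = [("a", 1.0, 0.5), ("a", 3.0, 0.25), ("b", 0.0, 1.0)]
print(encode_groups(rows, 5))
# [[0.0, 0.5, 0.0, 0.25], [0.0, 0.5, 0.0, 0.25], [1.0, 0.0, 0.0, 0.0]]
print(encode_groups(rows, 5, scale_by_group_size=True))
# [[0.0, 0.25, 0.0, 0.125], [0.0, 0.25, 0.0, 0.125], [1.0, 0.0, 0.0, 0.0]]
```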
When encoding multiple columns using the inputCols and outputCols params, input and output columns
are paired by their position in the arrays, and each pair is treated independently.
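The positional pairing of input and output columns can be illustrated with plain dictionaries standing in for rows. Everything here is a hypothetical sketch: the function encode_columns, the category_sizes argument, and the single shared "weight" column are assumptions made for the example, and the last category is dropped as in the default configuration.

```python
def encode_columns(rows, input_cols, output_cols, category_sizes, weight_col="weight"):
    """Encode each (input, output) column pair independently.

    rows: list of dicts; category_sizes[k] is the category count of input_cols[k].
    Hypothetical sketch; a shared weight column is an assumption.
    """
    if not (len(input_cols) == len(output_cols) == len(category_sizes)):
        raise ValueError("inputCols, outputCols and category sizes must align")
    encoded = []
    for row in rows:
        new_row = dict(row)
        # Pairs are formed purely by array position.
        for in_col, out_col, n in zip(input_cols, output_cols, category_sizes):
            vec = [0.0] * (n - 1)  # last category dropped
            i = int(row[in_col])
            if i < n - 1:
                vec[i] = row[weight_col]
            new_row[out_col] = vec
        encoded.append(new_row)
    return encoded


rows = [{"color": 1.0, "shape": 0.0, "weight": 0.5}]
result = encode_columns(rows, ["color", "shape"], ["colorVec", "shapeVec"], [3, 4])
print(result[0]["colorVec"])  # [0.0, 0.5]
print(result[0]["shapeVec"])  # [0.5, 0.0, 0.0]
```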
See also: StringIndexer, for converting categorical values into category indices.