Class/Object

org.apache.spark.ml.feature

WeightHotEncoderEstimator

class WeightHotEncoderEstimator extends Estimator[WeightHotEncoderModel] with WeightHotEncoderBase with DefaultParamsWritable

A weighted one-hot encoder that maps a column of category indices to a column of vectors. Each output vector has at most one nonzero entry per row: the weight, placed at the position of the input category index. For example, with 5 categories and a weight of 0.3, an input value of 2.0 maps to the output vector [0.0, 0.0, 0.3, 0.0]. The last category is dropped by default (configurable via dropLast) because including it makes the vector entries linearly dependent, so an input value of 4.0 maps to [0.0, 0.0, 0.0, 0.0].
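
The mapping described above can be sketched in plain Scala. This is only an illustration of the documented behaviour, not the library's implementation: the object name WeightedOneHotSketch and the function encode are hypothetical, and a dense Vector[Double] stands in for whatever vector type the encoder actually emits.

```scala
// Illustrative sketch only; names here are hypothetical, not part of the library.
object WeightedOneHotSketch {
  /** Encodes a category index as a dense, weighted one-hot vector. */
  def encode(
      index: Int,
      numCategories: Int,
      weight: Double,
      dropLast: Boolean = true): Vector[Double] = {
    require(index >= 0 && index < numCategories, s"index $index out of range")
    // With dropLast the vector has one slot fewer than there are categories,
    // so the last category has no slot and encodes as all zeros.
    val size = if (dropLast) numCategories - 1 else numCategories
    Vector.tabulate(size)(i => if (i == index) weight else 0.0)
  }
}
```

With 5 categories and a weight of 0.3, encode(2, 5, 0.3) gives [0.0, 0.0, 0.3, 0.0] and encode(4, 5, 0.3) gives the all-zeros vector, matching the example in the description.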

When handleInvalid is set to 'keep', an extra "category" representing invalid values is added as the last category. As a result, when dropLast is true, invalid values are encoded as an all-zeros vector.
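
The interaction between 'keep' and dropLast can be sketched as follows. This is a plain-Scala illustration under stated assumptions, not the library's code: KeepInvalidSketch and encode are hypothetical names, and treating every index outside the known range as invalid is an assumption made for the example.

```scala
// Illustrative sketch only; names here are hypothetical, not part of the library.
object KeepInvalidSketch {
  /** Encodes an index when handleInvalid is 'keep'. */
  def encode(
      index: Int,
      numCategories: Int,
      weight: Double,
      dropLast: Boolean): Vector[Double] = {
    // 'keep' appends one extra category for invalid values after the real ones.
    val totalCategories = numCategories + 1
    // Assumption: any index outside [0, numCategories) counts as invalid.
    val effective =
      if (index >= 0 && index < numCategories) index else numCategories
    // dropLast now removes the slot of the invalid category, not of a real one.
    val size = if (dropLast) totalCategories - 1 else totalCategories
    Vector.tabulate(size)(i => if (i == effective) weight else 0.0)
  }
}
```

In this sketch, with 5 real categories and dropLast set to true, an invalid input yields five zeros while the real last category (index 4) keeps its own slot.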

Group encoding is also supported (see groupCols). In this case a vector can contain multiple weight values, one per row of the group, and all rows of a group share the same vector. The weight values of a group can additionally be scaled by the group size (see groupWeighting).
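
One possible reading of group encoding is sketched below in plain Scala. The divisor options follow the groupWeighting values documented on this page ("equi", "sqrt", "one"). Everything else is an assumption made for illustration: the names GroupEncodingSketch, divisor and encodeGroup are hypothetical, and the page does not say how rows that share a category index are combined, so the sketch simply adds them up.

```scala
// Illustrative sketch only; names and the summing rule are assumptions.
object GroupEncodingSketch {
  /** Divisor for a group of n rows under the documented groupWeighting options. */
  def divisor(groupWeighting: String, n: Int): Double = groupWeighting match {
    case "equi" => n.toDouble            // divide by the group size
    case "sqrt" => math.sqrt(n.toDouble) // divide by the square root of the group size
    case "one"  => 1.0                   // no division
    case other  =>
      throw new IllegalArgumentException(s"unknown groupWeighting: $other")
  }

  /** Builds the single vector shared by all rows of one group. */
  def encodeGroup(
      indices: Seq[Int],
      size: Int,
      weight: Double,
      groupWeighting: String): Vector[Double] = {
    require(indices.nonEmpty, "a group needs at least one row")
    val scaled = weight / divisor(groupWeighting, indices.length)
    // Assumption: each row adds its scaled weight at its category index;
    // indices without a slot (for example a dropped last category) are skipped.
    indices.foldLeft(Vector.fill(size)(0.0)) { (vec, i) =>
      if (i >= 0 && i < size) vec.updated(i, vec(i) + scaled) else vec
    }
  }
}
```

Under these assumptions, a group of four rows with indices 0, 2, 1 and 3, a weight of 1.0 and "sqrt" weighting produces 0.5 in every slot of a length-4 vector.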

Annotations
@Since( "2.3.0" )
Note

When encoding multiple columns with the inputCols and outputCols params, input and output columns are paired by their order in the arrays, and each pair is treated independently.
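
The positional pairing can be pictured with a small plain-Scala sketch. It is not the library's validation logic: ColumnPairingSketch, ColumnSpec and pair are hypothetical names, and aligning the weights array with the column arrays by position is an assumption based on the weights param being described as one weight "for each column".

```scala
// Illustrative sketch only; names here are hypothetical, not part of the library.
object ColumnPairingSketch {
  /** One independent encoding task: an input column, its output column, its weight. */
  final case class ColumnSpec(inputCol: String, outputCol: String, weight: Double)

  /** Pairs the arrays by position, the i-th input with the i-th output. */
  def pair(
      inputCols: Array[String],
      outputCols: Array[String],
      weights: Array[Double]): Seq[ColumnSpec] = {
    require(inputCols.length == outputCols.length,
      "inputCols and outputCols must have the same length")
    // Assumption: weights are aligned with the columns by position as well.
    require(weights.length == inputCols.length,
      "weights must have one entry per input column")
    inputCols.indices.map(i => ColumnSpec(inputCols(i), outputCols(i), weights(i)))
  }
}
```

For example, pair(Array("a", "b"), Array("aVec", "bVec"), Array(0.3, 1.0)) yields two specs, one for a and aVec with weight 0.3 and one for b and bVec with weight 1.0. The column names are made up for the example.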

See also

StringIndexer for converting categorical values into category indices

Linear Supertypes
DefaultParamsWritable, MLWritable, WeightHotEncoderBase, HasOutputCols, HasInputCols, HasHandleInvalid, Estimator[WeightHotEncoderModel], PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Instance Constructors

  1. new WeightHotEncoderEstimator()

    Annotations
    @Since( "2.3.0" )
  2. new WeightHotEncoderEstimator(uid: String)

    Annotations
    @Since( "2.3.0" )

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def $[T](param: Param[T]): T

    Attributes
    protected
    Definition Classes
    Params
  4. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  5. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  6. final def clear(param: Param[_]): WeightHotEncoderEstimator.this.type

    Definition Classes
    Params
  7. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  8. def copy(extra: ParamMap): WeightHotEncoderEstimator

    Definition Classes
    WeightHotEncoderEstimator → Estimator → PipelineStage → Params
    Annotations
    @Since( "2.3.0" )
  9. def copyValues[T <: Params](to: T, extra: ParamMap): T

    Attributes
    protected
    Definition Classes
    Params
  10. final def defaultCopy[T <: Params](extra: ParamMap): T

    Attributes
    protected
    Definition Classes
    Params
  11. final val dropLast: BooleanParam

    Whether to drop the last category in the encoded vector (default: true)

    Definition Classes
    WeightHotEncoderBase
    Annotations
    @Since( "2.3.0" )
  12. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  13. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  14. def explainParam(param: Param[_]): String

    Definition Classes
    Params
  15. def explainParams(): String

    Definition Classes
    Params
  16. final def extractParamMap(): ParamMap

    Definition Classes
    Params
  17. final def extractParamMap(extra: ParamMap): ParamMap

    Definition Classes
    Params
  18. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  19. def fit(dataset: Dataset[_]): WeightHotEncoderModel

    Definition Classes
    WeightHotEncoderEstimator → Estimator
    Annotations
    @Since( "2.3.0" )
  20. def fit(dataset: Dataset[_], paramMaps: Array[ParamMap]): Seq[WeightHotEncoderModel]

    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" )
  21. def fit(dataset: Dataset[_], paramMap: ParamMap): WeightHotEncoderModel

    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" )
  22. def fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): WeightHotEncoderModel

    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" ) @varargs()
  23. final def get[T](param: Param[T]): Option[T]

    Definition Classes
    Params
  24. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  25. final def getDefault[T](param: Param[T]): Option[T]

    Definition Classes
    Params
  26. def getDropLast: Boolean


    Definition Classes
    WeightHotEncoderBase
    Annotations
    @Since( "2.3.0" )
  27. def getGroupCols: Array[String]


    Definition Classes
    WeightHotEncoderBase
  28. def getGroupWeighting: String


    Definition Classes
    WeightHotEncoderBase
  29. final def getHandleInvalid: String

    Definition Classes
    HasHandleInvalid
  30. final def getInputCols: Array[String]

    Definition Classes
    HasInputCols
  31. final def getOrDefault[T](param: Param[T]): T

    Definition Classes
    Params
  32. final def getOutputCols: Array[String]

    Definition Classes
    HasOutputCols
  33. def getParam(paramName: String): Param[Any]

    Definition Classes
    Params
  34. def getWeights: Array[Double]


    Definition Classes
    WeightHotEncoderBase
  35. final val groupCols: StringArrayParam

    The columns to group by if a group encoding should be used.

    Definition Classes
    WeightHotEncoderBase
  36. final val groupWeighting: Param[String]

    The group weighting to use. Options are "equi" (divide by the group size), "sqrt" (divide by the square root of the group size), or "one" (no division). Default: "sqrt"

    Definition Classes
    WeightHotEncoderBase
  37. val handleInvalid: Param[String]

    Param for how to handle invalid data during transform(). Options are 'keep' (invalid data presented as an extra categorical feature) or 'error' (throw an error). Note that this Param is only used during transform; during fitting, invalid data will result in an error. Default: "error"

    Definition Classes
    WeightHotEncoderBase → HasHandleInvalid
    Annotations
    @Since( "2.3.0" )
  38. final def hasDefault[T](param: Param[T]): Boolean

    Definition Classes
    Params
  39. def hasParam(paramName: String): Boolean

    Definition Classes
    Params
  40. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  41. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  42. def initializeLogIfNecessary(isInterpreter: Boolean): Unit

    Attributes
    protected
    Definition Classes
    Logging
  43. final val inputCols: StringArrayParam

    Definition Classes
    HasInputCols
  44. final def isDefined(param: Param[_]): Boolean

    Definition Classes
    Params
  45. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  46. final def isSet(param: Param[_]): Boolean

    Definition Classes
    Params
  47. def isTraceEnabled(): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  48. def log: Logger

    Attributes
    protected
    Definition Classes
    Logging
  49. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  50. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  51. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  52. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  53. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  54. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  55. def logName: String

    Attributes
    protected
    Definition Classes
    Logging
  56. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  57. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  58. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  59. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  60. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  61. final def notify(): Unit

    Definition Classes
    AnyRef
  62. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  63. final val outputCols: StringArrayParam

    Definition Classes
    HasOutputCols
  64. lazy val params: Array[Param[_]]

    Definition Classes
    Params
  65. def save(path: String): Unit

    Definition Classes
    MLWritable
    Annotations
    @Since( "1.6.0" ) @throws( ... )
  66. final def set(paramPair: ParamPair[_]): WeightHotEncoderEstimator.this.type

    Attributes
    protected
    Definition Classes
    Params
  67. final def set(param: String, value: Any): WeightHotEncoderEstimator.this.type

    Attributes
    protected
    Definition Classes
    Params
  68. final def set[T](param: Param[T], value: T): WeightHotEncoderEstimator.this.type

    Definition Classes
    Params
  69. final def setDefault(paramPairs: ParamPair[_]*): WeightHotEncoderEstimator.this.type

    Attributes
    protected
    Definition Classes
    Params
  70. final def setDefault[T](param: Param[T], value: T): WeightHotEncoderEstimator.this.type

    Attributes
    protected
    Definition Classes
    Params
  71. def setDropLast(value: Boolean): WeightHotEncoderEstimator.this.type


    Annotations
    @Since( "2.3.0" )
  72. def setGroupCols(value: Array[String]): WeightHotEncoderEstimator.this.type


  73. def setGroupWeighting(value: String): WeightHotEncoderEstimator.this.type


  74. def setHandleInvalid(value: String): WeightHotEncoderEstimator.this.type


    Annotations
    @Since( "2.3.0" )
  75. def setInputCols(values: Array[String]): WeightHotEncoderEstimator.this.type


    Annotations
    @Since( "2.3.0" )
  76. def setOutputCols(values: Array[String]): WeightHotEncoderEstimator.this.type


    Annotations
    @Since( "2.3.0" )
  77. def setWeights(value: Array[Double]): WeightHotEncoderEstimator.this.type


  78. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  79. def toString(): String

    Definition Classes
    Identifiable → AnyRef → Any
  80. def transformSchema(schema: StructType): StructType

    Definition Classes
    WeightHotEncoderEstimator → PipelineStage
    Annotations
    @Since( "2.3.0" )
  81. def transformSchema(schema: StructType, logging: Boolean): StructType

    Attributes
    protected
    Definition Classes
    PipelineStage
    Annotations
    @DeveloperApi()
  82. val uid: String

    Definition Classes
    WeightHotEncoderEstimator → Identifiable
    Annotations
    @Since( "2.3.0" )
  83. def validateAndTransformSchema(schema: StructType, dropLast: Boolean, keepInvalid: Boolean): StructType

    Attributes
    protected
    Definition Classes
    WeightHotEncoderBase
  84. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  85. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  86. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  87. final val weights: DoubleArrayParam

    The weight to use instead of 1.0 for hot encoding, for each column

    Definition Classes
    WeightHotEncoderBase
  88. def write: MLWriter

    Definition Classes
    DefaultParamsWritable → MLWritable
