org.apache.spark.ml.recommendation
The per-worker mini-batch size. Default: 256
Destroys the model and releases the underlying distributed models and broadcasts. The model cannot be used anymore afterwards.
Whether other clients should be terminated as well. This is necessary if a glint cluster running in another Spark application should be terminated.
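A short sketch of the destroy call described above (the parameter name is taken from the description; treat the exact signature as an assumption):

```scala
// Release the distributed models and broadcasts once the fitted model
// is no longer needed. Passing true would also terminate a glint
// cluster running in another Spark application.
model.destroy(terminateOtherClients = false)
// the model must not be used after this call
```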
The regularization rate for the latent factor weights. Default: 0.001f
The name of the integer arrays column containing the itemCol ids of the items to filter from recommendations. If empty, recommendations are not filtered. Usually the arrays will contain the ids of the items of the user. Default: ""
The name of the item id column of integers from 0 to the number of items in the training dataset. Default: "itemid"
The name of the item feature column of sparse vectors. Default: "itemfeatures"
The regularization rate for the linear weights. Default: 0.01f
Whether the metadata of the data frame to fit should be loaded from HDFS. This allows skipping the metadata computation stages when fitting on the same data frame with different parameters. Metadata for the "crossbatch" and "uniform" samplers is compatible with each other, but "exp" requires its own metadata. Default: false
The HDFS path to load the metadata of the fit data frame from, or to save the fitted metadata to. Default: ""
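The metadata parameters above can be combined to save the metadata on a first fit and reuse it on later fits. A minimal sketch, assuming standard Spark ML setter naming (setSaveMetadata, setLoadMetadata, setMetadataPath and setNumDims are not confirmed by this documentation):

```scala
// First fit: compute the metadata and save it to HDFS.
val model1 = new GlintFMPair()
  .setSaveMetadata(true)
  .setMetadataPath("hdfs:///tmp/fmpair-metadata")
  .fit(df)

// Later fits on the same data frame with different parameters can
// skip the metadata computation stages by loading the saved metadata.
val model2 = new GlintFMPair()
  .setLoadMetadata(true)
  .setMetadataPath("hdfs:///tmp/fmpair-metadata")
  .setNumDims(300)
  .fit(df)
```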
The number of latent factor dimensions (k). Default: 150
The number of parameter servers. Default: 3
The parameter server configuration. Allows for detailed configuration of the parameter servers, with the default configuration as fallback. Default: ConfigFactory.empty()
The master host of the running parameter servers. If this is not set, a standalone parameter server cluster is started in this Spark application. Default: ""
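Since the default is ConfigFactory.empty(), a custom parameter server configuration can be built with the Typesafe Config library. A hedged sketch; the config key shown is illustrative, not taken from this documentation, and the setter names are assumed:

```scala
import com.typesafe.config.ConfigFactory

// Override selected parameter server settings; everything not set here
// falls back to the default configuration.
val psConfig = ConfigFactory.parseString("""glint.server.memory = "2g"""")

val fmPair = new GlintFMPair()
  .setNumParameterServers(3)      // setter name assumed
  .setParameterServerConfig(psConfig)  // setter name assumed
```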
Returns the top numItems items recommended for each user id in the input dataset.
The dataset containing a column of user ids and user context features. The column names must match userCol, userctxFeaturesCol and, if filtering should be used, also filterItemsCol.
The maximum number of recommendations for each user
A dataframe of (userCol: Int, recommendations), where recommendations are stored as an array of (itemCol: Int, score: Float) rows.
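A usage sketch of the recommendation method described above. The method name recommendForUserSubset is an assumption (it follows Spark's ALSModel naming); the column names are the documented defaults:

```scala
// Recommend the top 10 items for each user in the subset.
val users = df.select("userid", "userctxfeatures")
val recs = model.recommendForUserSubset(users, 10)
// recs: (userid: Int, recommendations: Array[(itemid: Int, score: Float)])
```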
The rho value to use for the "exp" sampler. Has to be between 0.0 and 1.0. Default: 1.0
The sampler to use.
"uniform" means sampling negative items uniformly, as originally proposed for BPR.
"exp" means sampling negative items with probability proportional to their exponential popularity distribution, as proposed in LambdaFM.
"crossbatch" means sampling negative items uniformly, but sharing them across the mini-batch with the crossbatch-BPR loss, as proposed in my master's thesis.
Default: "uniform"
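Selecting a sampler is a matter of setting the corresponding parameters; a minimal sketch, assuming the standard Spark ML setter names setSampler and setRho:

```scala
// Use the LambdaFM-style exponential popularity sampler.
val fmPair = new GlintFMPair()
  .setSampler("exp")
  .setRho(0.8)  // has to be between 0.0 and 1.0
```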
The name of the column of integers to use for sampling. If empty, all items are accepted as negative items; otherwise only items for which no interaction exists between the user and the item's sampling column value. Usually the sampling column is the same as itemCol, but it may also be another column with an n-to-1 relation from item column value to sampling column value.
Consider the example of playlists with "pid" as user column and tracks with "traid" as item column. Another column "artid" holds the artist of the track. With "traid" as sampling column, only tracks which are not in the playlist are accepted as negative items. With "artid" as sampling column, only tracks whose artists are not in the playlist are accepted as negative items.
Default: ""
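The playlist example above can be sketched as follows, assuming the standard Spark ML setter names (setUserCol, setItemCol, setSamplingCol are not confirmed by this documentation):

```scala
// Sample negative tracks by artist: only tracks whose artists do not
// occur in the playlist are accepted as negative items.
val fmPair = new GlintFMPair()
  .setUserCol("pid")
  .setItemCol("traid")
  .setSamplingCol("artid")
```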
Whether the metadata of the fitted data frame should be saved to HDFS. Default: false
The depth to use for tree reduce when computing the metadata. To avoid OOM errors this has to be set sufficiently large, but lower depths might lead to faster runtimes.
The UID
The name of the user id column of integers. Default: "userid"
The name of the user and context feature column of sparse vectors. Default: "userctxfeatures"
Model fitted by GlintFMPair.
For simplicity, this implementation uses the parameter servers for recommendation. Real use cases will require a different implementation that exports the linear weights and the latent factors, uses approaches like locality-sensitive hashing to recommend in sublinear time, and does not use parameter servers at recommendation time.