Leave One Out

class category_encoders.leave_one_out.LeaveOneOutEncoder(verbose=0, cols=None, drop_invariant=False, return_df=True, handle_unknown='value', handle_missing='value', random_state=None, sigma=None)
Leave one out coding for categorical features.
This is very similar to target encoding, but it excludes the current row’s target when calculating the mean target for a level, to reduce the effect of outliers.
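In symbols, the description above amounts to the following worked sketch. The notation is introduced here for illustration only: for training row i with category level c_i and target y_i, let n_c be the number of training rows in level c.

```latex
\[
\hat{x}_i = \frac{\left(\sum_{j \,:\, c_j = c_i} y_j\right) - y_i}{n_{c_i} - 1}
\]
% Worked example: one level whose three rows have targets 1, 0, 1 (sum = 2, n = 3).
% Plain target encoding gives 2/3 to every row; leave one out gives
%   row with y = 1:  (2 - 1) / (3 - 1) = 0.5
%   row with y = 0:  (2 - 0) / (3 - 1) = 1.0
```

When a level occurs only once (n_{c_i} = 1) the denominator is zero, so an implementation needs some fallback, such as the overall target mean.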
 Parameters
 verbose: int
integer indicating verbosity of the output. 0 for none.
 cols: list
a list of columns to encode; if None, all string columns will be encoded.
 drop_invariant: bool
boolean for whether or not to drop columns with 0 variance.
 return_df: bool
boolean for whether to return a pandas DataFrame from transform (otherwise it will be a numpy array).
 handle_missing: str
options are ‘error’, ‘return_nan’ and ‘value’; defaults to ‘value’, which returns the target mean.
 handle_unknown: str
options are ‘error’, ‘return_nan’ and ‘value’; defaults to ‘value’, which returns the target mean.
 sigma: float
adds normally distributed (Gaussian) noise to the training data in order to decrease overfitting (test data are left untouched). Sigma gives the standard deviation (spread or “width”) of the normal distribution. The optimal value is commonly between 0.05 and 0.6. The default is not to add noise, but that leads to significantly suboptimal results.
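To make the effect of sigma concrete, here is a minimal plain-Python sketch. The helper `maybe_add_noise` and its `training` and `seed` arguments are hypothetical and not part of category_encoders. It follows the wording of the parameter description and adds zero-mean Gaussian noise with standard deviation sigma to training encodings only; the library's exact noise scheme (for example, whether noise is added or multiplied) is an assumption here and may differ.

```python
import random
from typing import List, Optional


def maybe_add_noise(
    encoded: List[float],
    sigma: Optional[float],
    training: bool,
    seed: Optional[int] = None,
) -> List[float]:
    """Hypothetical helper: jitter leave-one-out encodings with Gaussian noise.

    Noise is applied only to training data (training=True) and only when
    sigma is set; test-time encodings are returned unchanged.
    """
    if sigma is None or not training:
        return list(encoded)
    rng = random.Random(seed)  # seeded so the noise is reproducible (assumption)
    # Additive zero-mean noise with standard deviation sigma.
    return [value + rng.gauss(0.0, sigma) for value in encoded]


train_codes = [0.0, 1.0, 0.5, 0.5, 1.0]
noisy = maybe_add_noise(train_codes, sigma=0.05, training=True, seed=0)
untouched = maybe_add_noise(train_codes, sigma=0.05, training=False, seed=0)
```

Keeping test data noise-free means predictions are made on the same deterministic encodings every time; only the model's view of the training set is perturbed.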
Methods
fit(X, y, **kwargs): Fit encoder according to X and y.
fit_transform(X[, y]): Encoders that utilize the target must make sure that the training data are transformed with transform(X, y) and not with transform(X).
get_feature_names(): Returns the names of all transformed / added columns.
get_params([deep]): Get parameters for this estimator.
set_params(**params): Set the parameters of this estimator.
transform(X[, y, override_return_df]): Perform the transformation to new categorical data.
transform_leave_one_out(X_in, y[, mapping]): Leave one out encoding uses a single column of floats to represent the means of the target variables.
fit_column_map
fit_leave_one_out

fit(X, y, **kwargs)
Fit encoder according to X and y.
 Parameters
 X : array-like, shape = [n_samples, n_features]
Training vectors, where n_samples is the number of samples and n_features is the number of features.
 y : array-like, shape = [n_samples]
Target values.
 Returns
 self : encoder
Returns self.
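As a rough picture of what fitting has to compute, the sketch below collects the target sum and row count for each category level, plus the overall target mean. It assumes these per-level statistics are what a leave-one-out encoder needs to keep; `fit_level_stats` is a hypothetical name and is not the library's `fit_leave_one_out`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def fit_level_stats(
    categories: List[str], targets: List[float]
) -> Tuple[Dict[str, Tuple[float, int]], float]:
    """Hypothetical fit step: per-level (target sum, row count) plus the global mean."""
    if not targets or len(categories) != len(targets):
        raise ValueError("need equally long, non-empty categories and targets")
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for level, y in zip(categories, targets):
        sums[level] += y    # running target sum for this level
        counts[level] += 1  # number of training rows in this level
    stats = {level: (sums[level], counts[level]) for level in counts}
    return stats, sum(targets) / len(targets)


stats, global_mean = fit_level_stats(
    ["red", "red", "blue", "blue", "blue"], [1, 0, 1, 1, 0]
)
```

Storing sums and counts rather than means is what makes leave one out cheap: a row's own target can be subtracted from the sum, and the count reduced by one, without rescanning the data.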

get_feature_names()
Returns the names of all transformed / added columns.
 Returns
 feature_names: list
A list with all feature names transformed or added. Note: potentially dropped features are not included!

transform(X, y=None, override_return_df=False)
Perform the transformation to new categorical data.
 Parameters
 X : array-like, shape = [n_samples, n_features]
 y : array-like, shape = [n_samples] when transforming with leave one out
None when transforming without target information (such as transforming a test set)
 Returns
 p : array, shape = [n_samples, n_numeric + N]
Transformed values with encoding applied.
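The two modes of transform can be sketched in plain Python. With targets (training data, as in fit_transform), each row's own target is removed from its level's sum and count; without targets (for example, a test set), each level gets its full training mean. Unseen levels fall back to the global target mean, mirroring the handle_unknown='value' description. Singleton levels, where the leave-one-out denominator would be zero, also fall back to the global mean here; that choice, the hard-coded statistics, and the name `transform_sketch` are assumptions for illustration, not the library's code.

```python
from typing import Dict, List, Optional, Tuple

Stats = Dict[str, Tuple[float, int]]  # level -> (target sum, row count)


def transform_sketch(
    categories: List[str],
    stats: Stats,
    global_mean: float,
    targets: Optional[List[float]] = None,
) -> List[float]:
    """Hypothetical transform: leave one out when targets are given, level means otherwise."""
    encoded = []
    for i, level in enumerate(categories):
        if level not in stats:
            # Unseen level: fall back to the global target mean.
            encoded.append(global_mean)
            continue
        total, count = stats[level]
        if targets is None:
            # No target information: use the level's full training mean.
            encoded.append(total / count)
        elif count > 1:
            # Leave one out: drop this row's own target from the level statistics.
            encoded.append((total - targets[i]) / (count - 1))
        else:
            # Singleton level: leave-one-out mean is undefined (assumed fallback).
            encoded.append(global_mean)
    return encoded


# Statistics a fit step would produce for targets [1, 0] (red) and [1, 1, 0] (blue).
stats = {"red": (1.0, 2), "blue": (2.0, 3)}
global_mean = 0.6
train_x = ["red", "red", "blue", "blue", "blue"]
train_y = [1, 0, 1, 1, 0]

train_codes = transform_sketch(train_x, stats, global_mean, targets=train_y)
test_codes = transform_sketch(["red", "blue", "green"], stats, global_mean)
# train_codes -> [0.0, 1.0, 0.5, 0.5, 1.0]
```

Note that the training encodings for a level vary row by row, while the test encodings for that level are a single value, which is why the training data should be transformed with y and the test data without it.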