optuna_integration.lightgbm.LightGBMTuner

class optuna_integration.lightgbm.LightGBMTuner(params, train_set, num_boost_round=1000, valid_sets=None, valid_names=None, feval=None, feature_name=None, categorical_feature=None, keep_training_booster=False, callbacks=None, time_budget=None, sample_size=None, study=None, optuna_callbacks=None, model_dir=None, *, show_progress_bar=True, optuna_seed=None)

Hyperparameter tuner for LightGBM.

It optimizes the following hyperparameters in a stepwise manner: lambda_l1, lambda_l2, num_leaves, feature_fraction, bagging_fraction, bagging_freq and min_child_samples.
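To make "stepwise" concrete, the following toy sketch tunes one group of parameters at a time and freezes the best values before moving on. It is not the tuner's implementation: the grouping, the order of the groups, the candidate grids, and the `toy_loss` function are all assumptions made for illustration, whereas the real tuner trains boosters and samples values with Optuna.

```python
from itertools import product

# Hypothetical "ideal" values; a stand-in for whatever minimizes validation loss.
TARGET = {
    "feature_fraction": 0.8,
    "num_leaves": 31,
    "bagging_fraction": 0.7,
    "bagging_freq": 5,
    "lambda_l1": 0.1,
    "lambda_l2": 1.0,
    "min_child_samples": 20,
}


def toy_loss(params):
    """Toy objective: squared distance from TARGET (no model is trained)."""
    return sum((params[name] - TARGET[name]) ** 2 for name in TARGET)


# Each dict is one step; parameters in the same dict are searched jointly.
STEPS = [
    {"feature_fraction": [0.6, 0.8, 1.0]},
    {"num_leaves": [15, 31, 63]},
    {"bagging_fraction": [0.7, 1.0], "bagging_freq": [1, 5]},
    {"lambda_l1": [0.0, 0.1, 1.0], "lambda_l2": [0.0, 1.0]},
    {"min_child_samples": [5, 20, 50]},
]


def stepwise_search(steps, start, loss):
    """Tune one group at a time, keeping earlier winners fixed."""
    best = dict(start)
    best_loss = loss(best)
    for grid in steps:
        names = list(grid)
        for values in product(*(grid[name] for name in names)):
            candidate = {**best, **dict(zip(names, values))}
            candidate_loss = loss(candidate)
            if candidate_loss < best_loss:
                best, best_loss = candidate, candidate_loss
    return best, best_loss


start = {
    "feature_fraction": 1.0,
    "num_leaves": 63,
    "bagging_fraction": 1.0,
    "bagging_freq": 1,
    "lambda_l1": 0.0,
    "lambda_l2": 0.0,
    "min_child_samples": 50,
}
best_params, best_loss = stepwise_search(STEPS, start, toy_loss)
```

The appeal of this design is cost: the number of evaluations grows with the sum of the group sizes rather than their product, at the price of ignoring interactions between groups.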

You can find the details of the algorithm and benchmark results in this blog article by Kohei Ozaki, a Kaggle Grandmaster.

Note

Arguments and keyword arguments for lightgbm.train() can be passed. For params, please check the official documentation for LightGBM.
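As a rough usage sketch, the tuner can be constructed much like a call to lightgbm.train() and then started with run(). The synthetic data, the `BASE_PARAMS` values, the `run_tuner` helper name, and the chosen time budget are assumptions for illustration; the third-party imports sit inside the function so that the snippet can be loaded without those packages installed.

```python
# Hypothetical starting parameters; anything accepted by LightGBM can go here.
BASE_PARAMS = {
    "objective": "binary",
    "metric": "binary_logloss",
    "verbosity": -1,
}


def run_tuner(n_rows=500, n_features=10, seed=0):
    """Tune on synthetic data and return the finished tuner."""
    import lightgbm as lgb
    import numpy as np
    from optuna_integration.lightgbm import LightGBMTuner

    # Synthetic binary classification data (illustration only).
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_rows, n_features))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    split = int(0.8 * n_rows)

    dtrain = lgb.Dataset(X[:split], label=y[:split])
    dvalid = lgb.Dataset(X[split:], label=y[split:], reference=dtrain)

    tuner = LightGBMTuner(
        dict(BASE_PARAMS),
        dtrain,
        num_boost_round=200,
        valid_sets=[dvalid],  # a validation set is needed to score trials
        callbacks=[lgb.early_stopping(stopping_rounds=20, verbose=False)],
        time_budget=60,  # seconds; stops tuning early if exceeded
        show_progress_bar=False,
    )
    tuner.run()
    return tuner
```

After `tuner = run_tuner()`, the results are available as `tuner.best_params`, `tuner.best_score`, and `tuner.get_best_booster()`.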

Warning

Arguments feature_name and categorical_feature were deprecated in v4.2.2 and will be removed in the future. The removal of these arguments is currently scheduled for v6.0.0, but this schedule is subject to change. See https://github.com/optuna/optuna-integration/releases/tag/v4.2.2.

The arguments that only LightGBMTuner has are listed below:

Parameters:

time_budget (int | None) – A time budget for parameter tuning in seconds.
study (optuna.study.Study | None) – A Study instance to store optimization results. The Trial instances in it have the following user attributes: elapsed_secs is the time elapsed since the optimization started; average_iteration_time is the average time per iteration spent training the booster model in the trial; lgbm_params is a JSON-serialized dictionary of the LightGBM parameters used in the trial.
optuna_callbacks (list[Callable[[Study, FrozenTrial], None]] | None) – List of Optuna callback functions that are invoked at the end of each trial. Each function must accept two parameters with the following types in this order: Study and FrozenTrial. Note that this is not the callbacks argument of lightgbm.train().
model_dir (str | None) – A directory to save boosters. By default, it is set to None and no boosters are saved. Set a shared directory (e.g., a directory on NFS) if you want to call get_best_booster() in distributed environments; otherwise, it may raise ValueError. If the directory does not exist, it will be created. The filenames of the boosters will be {model_dir}/{trial_number}.pkl (e.g., ./boosters/0.pkl).
show_progress_bar (bool) –
Whether to show progress bars. Set this to False to disable them.

Note

Logging messages from LightGBM and Optuna break up the progress bars. Suppress those messages to display the progress bars properly.
optuna_seed (int | None) –
Seed for the random number generator of TPESampler, which affects sampling for num_leaves, bagging_fraction, bagging_freq, lambda_l1, and lambda_l2.

Note

The deterministic parameter of LightGBM makes training reproducible. Please enable it when you use this argument.
params (dict[str, Any])
train_set (lgb.Dataset)
num_boost_round (int)
valid_sets (list['lgb.Dataset'] | tuple['lgb.Dataset', ...] | 'lgb.Dataset' | None)
valid_names (Any | None)
feval (Callable[..., Any] | None)
feature_name (str | None)
categorical_feature (str | None)
keep_training_booster (bool)
callbacks (list[Callable[..., Any]] | None)
sample_size (int | None)
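The optuna_callbacks signature and the per-trial user attributes described above can be combined in a small callback such as the sketch below. The `record_trial` function and the `trial_log` list are hypothetical names, and the `SimpleNamespace` object is only a stand-in for a FrozenTrial so that the callback can be exercised without running a study.

```python
import json
from types import SimpleNamespace

trial_log = []


def record_trial(study, trial):
    """Optuna-style callback: takes (Study, FrozenTrial) and returns None."""
    attrs = trial.user_attrs
    raw_params = attrs.get("lgbm_params")
    trial_log.append(
        {
            "number": trial.number,
            "elapsed_secs": attrs.get("elapsed_secs"),
            "average_iteration_time": attrs.get("average_iteration_time"),
            # lgbm_params is stored as a JSON string, so decode it for reuse.
            "lgbm_params": json.loads(raw_params) if raw_params else None,
        }
    )


# Stand-in for a finished trial (illustration only, not a real FrozenTrial).
fake_trial = SimpleNamespace(
    number=0,
    user_attrs={
        "elapsed_secs": 1.5,
        "average_iteration_time": 0.01,
        "lgbm_params": json.dumps({"num_leaves": 31, "lambda_l1": 0.0}),
    },
)
record_trial(study=None, trial=fake_trial)

# With a real tuner, the callback would be passed as:
#   LightGBMTuner(params, dtrain, valid_sets=[dvalid],
#                 optuna_callbacks=[record_trial])
```

Because the callback goes through optuna_callbacks rather than callbacks, it runs once per trial, not once per boosting iteration.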

Methods

`compare_validation_metrics`(val_score, best_score)
`get_best_booster`()	Return the best booster.
`higher_is_better`()
`run`()	Perform hyperparameter tuning with the given parameters.
`sample_train_set`()	Make a subset of the self.train_set Dataset object.
`tune_bagging`([n_trials])
`tune_feature_fraction`([n_trials])
`tune_feature_fraction_stage2`([n_trials])
`tune_min_data_in_leaf`()
`tune_num_leaves`([n_trials])
`tune_regularization_factors`([n_trials])

Attributes

`best_params`	Return parameters of the best booster.
`best_score`	Return the score of the best booster.

property best_params: dict[str, Any]

Return parameters of the best booster.

property best_score: float

Return the score of the best booster.

get_best_booster()

Return the best booster.

If the best booster cannot be found, ValueError is raised. To prevent this error when you resume tuning or run tuning in parallel, save boosters by specifying the model_dir argument of __init__().

Return type: lgb.Booster
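For inspecting what model_dir contains, the documented filename pattern {model_dir}/{trial_number}.pkl can be reproduced with a small helper. This is a sketch under assumptions: `booster_path` and `load_saved_object` are hypothetical helpers, reading the files with pickle is inferred from the .pkl extension rather than stated by the documentation, and a plain dictionary stands in for a booster. In normal use, get_best_booster() is the intended way to retrieve the model.

```python
import pickle
import tempfile
from pathlib import Path


def booster_path(model_dir, trial_number):
    """Build the documented path {model_dir}/{trial_number}.pkl."""
    return Path(model_dir) / f"{trial_number}.pkl"


def load_saved_object(model_dir, trial_number):
    """Load a saved file, assuming (not confirmed) that it is a pickle."""
    path = booster_path(model_dir, trial_number)
    if not path.exists():
        raise ValueError(f"No saved booster found at {path}")
    with path.open("rb") as f:
        return pickle.load(f)


# Round trip with a stand-in object in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    with booster_path(tmp_dir, 0).open("wb") as f:
        pickle.dump({"stand_in": "booster"}, f)
    restored = load_saved_object(tmp_dir, 0)
```

Only unpickle files from a directory you trust, since loading a pickle can execute arbitrary code.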

run()

Perform hyperparameter tuning with the given parameters.

Return type: None
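When a run needs to be repeatable, the optuna_seed argument is meant to be paired with LightGBM's deterministic parameter. The helper below is a minimal sketch of that pairing; `with_reproducibility` is a hypothetical name, and adding seed and force_row_wise is an extra assumption based on LightGBM's own parameter documentation, not something this page requires.

```python
def with_reproducibility(params, seed):
    """Return a copy of params with settings that favor repeatable training."""
    out = dict(params)  # leave the caller's dictionary untouched
    out.update(
        {
            "deterministic": True,   # recommended alongside optuna_seed
            "seed": seed,            # LightGBM's own random seed
            "force_row_wise": True,  # avoids automatic histogram-method choice
        }
    )
    return out


base_params = {"objective": "binary", "metric": "binary_logloss"}
repeatable_params = with_reproducibility(base_params, seed=42)

# With a real tuner, both seeds would be fixed together, for example:
#   tuner = LightGBMTuner(repeatable_params, dtrain, valid_sets=[dvalid],
#                         optuna_seed=42)
#   tuner.run()
```

Fixing only one of the two seeds leaves either the sampling of hyperparameters or the training itself free to vary between runs.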

sample_train_set()

Make a subset of the self.train_set Dataset object.

Return type: None