optuna_integration.lightgbm.LightGBMTunerCV
- class optuna_integration.lightgbm.LightGBMTunerCV(params, train_set, num_boost_round=1000, folds=None, nfold=5, stratified=True, shuffle=True, feval=None, feature_name='auto', categorical_feature='auto', fpreproc=None, seed=0, callbacks=None, time_budget=None, sample_size=None, study=None, optuna_callbacks=None, *, show_progress_bar=True, model_dir=None, return_cvbooster=False, optuna_seed=None)[source]
Hyperparameter tuner for LightGBM with cross-validation.

It employs the same stepwise approach as LightGBMTuner. LightGBMTunerCV invokes lightgbm.cv() to train and validate boosters, while LightGBMTuner invokes lightgbm.train(). See a simple example which optimizes the validation log loss of cancer detection.

Note

Arguments and keyword arguments for lightgbm.cv() can be passed except metrics, init_model and eval_train_metric. For params, please check the official documentation for LightGBM.

The arguments that only LightGBMTunerCV has are listed below:

- Parameters:
time_budget (int | None) – A time budget for parameter tuning in seconds.

study (optuna.study.Study | None) – A Study instance to store optimization results. The Trial instances in it have the following user attributes: elapsed_secs is the elapsed time since the optimization started, average_iteration_time is the average time per iteration to train the booster model in the trial, and lgbm_params is a JSON-serialized dictionary of the LightGBM parameters used in the trial.

optuna_callbacks (list[Callable[[Study, FrozenTrial], None]] | None) – List of Optuna callback functions that are invoked at the end of each trial. Each function must accept two parameters with the following types in this order: Study and FrozenTrial. Please note that this is not the callbacks argument of lightgbm.train().

model_dir (str | None) – A directory to save boosters. By default, it is set to None and no boosters are saved. Please set a shared directory (e.g., a directory on NFS) if you want to access get_best_booster() in distributed environments; otherwise, it may raise ValueError. If the directory does not exist, it will be created. The filenames of the boosters will be {model_dir}/{trial_number}.pkl (e.g., ./boosters/0.pkl).

show_progress_bar (bool) – Flag to show progress bars or not. To disable the progress bar, set this to False.

Note

Progress bars will be fragmented by logging messages of LightGBM and Optuna. Please suppress such messages to show the progress bars properly.

return_cvbooster (bool) – Flag to enable get_best_booster().

optuna_seed (int | None) – seed of TPESampler for the random number generator that affects sampling for num_leaves, bagging_fraction, bagging_freq, lambda_l1, and lambda_l2.

Note

The deterministic parameter of LightGBM makes training reproducible. Please enable it when you use this argument.
train_set (lgb.Dataset)
num_boost_round (int)
folds (Generator[tuple[int, int], None, None] | Iterator[tuple[int, int]] | 'BaseCrossValidator' | None)
nfold (int)
stratified (bool)
shuffle (bool)
feval (Callable[..., Any] | None)
feature_name (str)
categorical_feature (str)
fpreproc (Callable[..., Any] | None)
seed (int)
callbacks (list[Callable[..., Any]] | None)
sample_size (int | None)
Methods

compare_validation_metrics(val_score, best_score)

get_best_booster()
    Return the best cvbooster.

higher_is_better()

run()
    Perform the hyperparameter-tuning with given parameters.

sample_train_set()
    Make subset of self.train_set Dataset object.

tune_bagging([n_trials])

tune_feature_fraction([n_trials])

tune_feature_fraction_stage2([n_trials])

tune_min_data_in_leaf()

tune_num_leaves([n_trials])

tune_regularization_factors([n_trials])

Attributes

best_params
    Return parameters of the best booster.

best_score
    Return the score of the best booster.
- get_best_booster()[source]
Return the best cvbooster.

If the best booster cannot be found, ValueError will be raised. To prevent this error, please save boosters by specifying both the model_dir and the return_cvbooster arguments of __init__() when you resume tuning or run tuning in parallel.
- Return type:
lgb.CVBooster
- run()
Perform the hyperparameter-tuning with given parameters.
- Return type:
None
- sample_train_set()
Make subset of self.train_set Dataset object.
- Return type:
None