September 29, 2023 – mathematical statistics 522

I ran a model on a dataset to predict diabetes based on inactivity and obesity.
The model became overfit as I worked on it.
After identifying the overfitting, I used scikit-learn to evaluate the model with k-fold cross-validation.
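The objective of cross-validation is to split the observations in the training dataset into a number of partitions (folds), so that the model is always evaluated on data it was not trained on.
The number of partitions, k, is chosen by the analyst, and the number of observations limits how large it can sensibly be.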
I split the data into k = 5 folds, then trained and evaluated the model five times, using a different fold as the validation set each time and the remaining four folds for training.
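A minimal sketch of this train-and-validate loop, under stated assumptions: the data below is synthetic stand-in data (not the actual diabetes dataset), the two feature columns are hypothetical inactivity and obesity percentages, and `LinearRegression` is only an illustrative choice of model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

# Synthetic stand-in data (assumption): 100 observations with two features,
# standing in for % inactivity and % obesity; y stands in for % diabetes.
rng = np.random.default_rng(0)
X = rng.uniform(10, 40, size=(100, 2))
y = 0.2 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(0, 0.5, size=100)

# Split the observations into k = 5 folds (shuffled, with a fixed seed).
kf = KFold(n_splits=5, shuffle=True, random_state=0)

fold_scores = []
val_sizes = []
for train_idx, val_idx in kf.split(X):
    # Fit on the four training folds, score on the held-out fold.
    model = LinearRegression()
    model.fit(X[train_idx], y[train_idx])
    fold_scores.append(model.score(X[val_idx], y[val_idx]))  # R^2 on this fold
    val_sizes.append(len(val_idx))
```

Each observation lands in the validation set exactly once, so every score in `fold_scores` comes from data the model did not see during that round of training.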
Performance metrics from each fold are averaged to estimate the model’s generalization performance.
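As a rough illustration of how the per-fold metrics can be combined, the sketch below uses scikit-learn's `cross_val_score` and takes the mean and standard deviation of the five scores. The data is synthetic stand-in data and the model and the R^2 metric are assumptions, not necessarily what was used in the project.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (assumption), not the actual diabetes dataset.
rng = np.random.default_rng(0)
X = rng.uniform(10, 40, size=(100, 2))
y = 0.2 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(0, 0.5, size=100)

# One R^2 score per fold, for k = 5 folds.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="r2")

# The average is the cross-validated performance estimate;
# the spread shows how much it varies from fold to fold.
mean_score = scores.mean()
std_score = scores.std()
print(f"R^2 per fold: {scores}")
print(f"mean = {mean_score:.3f}, std = {std_score:.3f}")
```

A large gap between the training score and `mean_score`, or a wide spread across folds, would be one sign that the model is overfitting.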
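Cross-validation does not remove overfitting by itself, but it helps guard against it to some extent, because an overfit model shows up as a gap between training and validation performance.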