October 4, 2023 – Mathematical Statistics 522

Today we work on the project.

We are given three datasets: %diabetes, %obesity, and %inactivity. Apart from these, we will also use the region and the %child and poverty datasets.
We will be predicting diabetes in this project.
In the EDA part, we are going to find the correlation between i) obesity and diabetes and ii) inactivity and diabetes.
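A minimal sketch of the two correlations, assuming Pearson's r is the measure we want (the notes do not say which one). The numbers are made-up toy percentages, not the project data, and the helper name `pearson_r` is only illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Made-up illustrative percentages, NOT the project data.
obesity = [18.0, 19.5, 21.0, 22.5, 24.0, 25.5]
inactivity = [14.0, 16.5, 15.0, 17.5, 18.0, 19.0]
diabetes = [7.0, 7.6, 8.1, 8.9, 9.2, 10.1]

r_obesity = pearson_r(obesity, diabetes)
r_inactivity = pearson_r(inactivity, diabetes)
print(f"obesity vs diabetes:    r = {r_obesity:.3f}")
print(f"inactivity vs diabetes: r = {r_inactivity:.3f}")
```

A value of r close to +1 or -1 points to a strong linear relationship, while a value near 0 points to a weak one.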
We compute the mean (average), median, mode, standard deviation, and skewness, and we check for outliers.
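These summary statistics can be sketched with Python's `statistics` module. Two things here are assumptions, since the notes do not fix them: skewness is the moment-based (population) version, and outliers are flagged with the common 1.5 × IQR rule. The values are made-up toy percentages.

```python
import statistics

def skewness(values):
    """Moment-based (population) skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed outlier rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Made-up illustrative percentages, NOT the project data.
diabetes = [7.0, 7.6, 8.1, 8.1, 8.9, 9.2, 10.1, 19.5]

mean = statistics.fmean(diabetes)
median = statistics.median(diabetes)
mode = statistics.mode(diabetes)
std_dev = statistics.stdev(diabetes)   # sample standard deviation
skew = skewness(diabetes)
outliers = iqr_outliers(diabetes)

print(f"mean={mean:.3f} median={median:.3f} mode={mode}")
print(f"std={std_dev:.3f} skewness={skew:.3f} outliers={outliers}")
```

A positive skewness means the right tail is longer, which is what a single large value like the one in this toy list produces.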
Then we perform resampling methods on the data.
First we perform k-fold cross-validation, then the bootstrap.
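A rough sketch of both resampling methods, kept to the standard library. It assumes a simple one-predictor line (diabetes on obesity) as the model being validated and uses the bootstrap to estimate the standard error of a mean; both choices, the helper names, and the toy numbers are illustrative assumptions.

```python
import random
import statistics

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def cross_val_mse(x, y, k=5, seed=0):
    """Mean squared error on each held-out fold."""
    scores = []
    for test_idx in k_fold_indices(len(x), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(x)) if i not in held_out]
        a, b = fit_line([x[i] for i in train_idx], [y[i] for i in train_idx])
        errors = [(y[i] - (a + b * x[i])) ** 2 for i in test_idx]
        scores.append(statistics.fmean(errors))
    return scores

def bootstrap_se_mean(values, n_boot=1000, seed=0):
    """Bootstrap standard error of the mean (resampling with replacement)."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.choices(values, k=len(values)))
             for _ in range(n_boot)]
    return statistics.stdev(means)

# Made-up illustrative percentages, NOT the project data.
obesity = [18.0, 19.0, 20.5, 21.0, 22.5, 23.0, 24.5, 25.0, 26.5, 27.0]
diabetes = [7.0, 7.3, 7.9, 8.1, 8.6, 8.9, 9.3, 9.6, 10.2, 10.3]

folds = k_fold_indices(len(obesity), 5)
scores = cross_val_mse(obesity, diabetes, k=5)
se = bootstrap_se_mean(diabetes)
print("fold MSEs:", [round(s, 4) for s in scores])
print("bootstrap SE of mean diabetes:", round(se, 4))
```

Averaging the fold MSEs gives the cross-validated error estimate; the fixed seeds only make the sketch reproducible.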
As there are more than two variables (one response and at least two predictors), we have to perform multiple linear regression in this case.
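As an illustration, here is a small ordinary least squares fit of diabetes on obesity and inactivity via the normal equations, written with the standard library only so it stays self-contained. The response is deliberately constructed from a known linear rule (an assumption made purely so the fit can be checked), and the function names are hypothetical.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_multiple_regression(predictors, y):
    """OLS via the normal equations (X'X) beta = X'y, with an intercept."""
    X = [[1.0] + list(row) for row in predictors]
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve_linear_system(XtX, Xty)

# Made-up illustrative percentages, NOT the project data.
obesity = [18.0, 19.5, 21.0, 22.5, 24.0, 25.5]
inactivity = [14.0, 16.5, 15.0, 17.5, 18.0, 19.0]
# Toy response built from a known rule so the recovered coefficients can be checked.
diabetes = [1.0 + 0.2 * o + 0.1 * i for o, i in zip(obesity, inactivity)]

intercept, b_obesity, b_inactivity = fit_multiple_regression(
    list(zip(obesity, inactivity)), diabetes
)
print(f"diabetes = {intercept:.3f} + {b_obesity:.3f}*obesity "
      f"+ {b_inactivity:.3f}*inactivity")
```

On real data the response would not be an exact linear function of the predictors, so the fitted coefficients would come with residuals to inspect.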
Then we check whether the residuals are homoscedastic or heteroscedastic; for that we use the Breusch-Pagan test.
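A hedged sketch of the test for the two-predictor case. It uses the studentized (Koenker) form of the statistic, LM = n × R², where R² comes from regressing the squared residuals on the same predictors; under homoscedasticity LM is approximately chi-square with 2 degrees of freedom, whose upper tail is exactly exp(-LM/2). The data are made-up toy values and the function names are hypothetical.

```python
import math

def ols(X, y):
    """OLS coefficients via the normal equations and Gaussian elimination."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    M = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    beta = [0.0] * p
    for i in reversed(range(p)):
        tail = sum(M[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = (M[i][p] - tail) / M[i][i]
    return beta

def residuals(X, y, beta):
    """Observed minus fitted values."""
    return [yi - sum(b * v for b, v in zip(beta, row)) for row, yi in zip(X, y)]

def breusch_pagan_two_predictors(x1, x2, y):
    """Koenker LM statistic n*R^2 and its p-value (chi-square, 2 df)."""
    X = [[1.0, a, b] for a, b in zip(x1, x2)]
    e2 = [e ** 2 for e in residuals(X, y, ols(X, y))]
    # Auxiliary regression of the squared residuals on the same predictors.
    aux_resid = residuals(X, e2, ols(X, e2))
    mean_e2 = sum(e2) / len(e2)
    ss_tot = sum((v - mean_e2) ** 2 for v in e2)
    ss_res = sum(r ** 2 for r in aux_resid)
    lm = len(y) * (1.0 - ss_res / ss_tot)
    p_value = math.exp(-lm / 2.0)  # chi-square upper tail, valid for exactly 2 df
    return lm, p_value

# Made-up illustrative percentages, NOT the project data.
obesity = [18.0, 19.5, 21.0, 22.5, 24.0, 25.5, 27.0, 28.5]
inactivity = [14.0, 16.5, 15.0, 17.5, 18.0, 19.0, 18.5, 21.0]
diabetes = [7.0, 7.6, 8.1, 8.9, 9.2, 10.1, 10.4, 11.5]

lm, p_value = breusch_pagan_two_predictors(obesity, inactivity, diabetes)
print(f"LM = {lm:.3f}, p-value = {p_value:.3f}")
```

A small p-value (say below 0.05) is evidence against homoscedasticity, that is, evidence of heteroscedasticity. With a different number of predictors the degrees of freedom change and the closed-form tail used here no longer applies.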