Today we work on the project.
- We are given three datasets: %diabetes, %obesity, and %inactivity. Apart from these, we will also use the region dataset and the %child and poverty dataset.
- We will be predicting diabetes in this project.
- In the EDA part, we compute the correlation between (i) obesity and diabetes and (ii) inactivity and diabetes.
- We compute the mean, median, mode, standard deviation, and skewness of each variable, and identify outliers.
- Then we apply resampling methods to the data.
- First we perform k-fold cross-validation, then the bootstrap.
- Because there is more than one predictor (obesity and inactivity), we fit a multiple linear regression model.
- Then we test the residuals for homoscedasticity versus heteroscedasticity using the Breusch-Pagan test.
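The EDA steps in the list can be sketched in Python with NumPy and the standard-library `statistics` module. This is a minimal sketch, assuming the three county-level datasets have already been merged so the arrays line up row by row. The numbers are made-up placeholders, not project data, and both the 1.5 x IQR outlier rule and the use of Pearson correlation are assumed choices, since the notes don't name a specific rule or coefficient.

```python
import statistics

import numpy as np

# Hypothetical placeholder values, NOT the project data: one entry per county,
# already aligned across the three datasets.
diabetes = np.array([7.1, 8.4, 9.0, 6.5, 10.2, 8.4, 7.9, 9.5, 8.1, 15.0])
obesity = np.array([25.3, 28.9, 30.1, 23.8, 33.0, 29.5, 27.2, 31.4, 28.0, 36.2])
inactivity = np.array([18.2, 21.5, 22.9, 16.4, 25.8, 22.0, 20.1, 24.3, 21.0, 29.7])


def summarize(x):
    """Descriptive statistics and IQR-rule outliers for one variable."""
    n = len(x)
    mean = x.mean()
    dev = x - mean
    # Adjusted Fisher-Pearson sample skewness
    g1 = np.mean(dev ** 3) / np.mean(dev ** 2) ** 1.5
    skewness = g1 * np.sqrt(n * (n - 1)) / (n - 2)
    # Outliers: points more than 1.5 * IQR beyond the quartiles (assumed rule)
    q1, q3 = np.percentile(x, [25, 75])
    fence = 1.5 * (q3 - q1)
    outliers = x[(x < q1 - fence) | (x > q3 + fence)]
    return {
        "mean": float(mean),
        "median": float(np.median(x)),
        # multimode returns every most-frequent value; for continuous
        # percentages it is only meaningful when values actually repeat
        "mode": statistics.multimode(x.tolist()),
        "std": float(x.std(ddof=1)),  # sample standard deviation
        "skewness": float(skewness),
        "outliers": outliers.tolist(),
    }


for name, values in [("diabetes", diabetes), ("obesity", obesity), ("inactivity", inactivity)]:
    print(name, summarize(values))

# Pearson correlation of each risk factor with diabetes
r_obesity = np.corrcoef(obesity, diabetes)[0, 1]
r_inactivity = np.corrcoef(inactivity, diabetes)[0, 1]
print(f"corr(obesity, diabetes)    = {r_obesity:.3f}")
print(f"corr(inactivity, diabetes) = {r_inactivity:.3f}")
```

Since the mean and the average are the same statistic, only one is computed.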
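The resampling step (k-fold cross-validation, then the bootstrap) can be sketched with NumPy alone for a regression of diabetes on obesity and inactivity. This is a minimal sketch on synthetic data generated inside the block, not the project data. `fit_ols`, `kfold_mse`, and `bootstrap_coefs` are hypothetical helper names, and k = 5 folds and 1,000 bootstrap replicates are assumed settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the project data): n counties, two predictors.
n = 200
obesity = rng.normal(30, 4, n)
inactivity = rng.normal(22, 3, n)
diabetes = 1.0 + 0.15 * obesity + 0.10 * inactivity + rng.normal(0, 0.8, n)

X = np.column_stack([np.ones(n), obesity, inactivity])  # intercept + predictors
y = diabetes


def fit_ols(X, y):
    """Least-squares coefficients for y ~ X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def kfold_mse(X, y, k=5, seed=0):
    """Average held-out mean squared error over k folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    errors = []
    for test_idx in folds:
        train_idx = np.setdiff1d(idx, test_idx)  # all rows not in this fold
        beta = fit_ols(X[train_idx], y[train_idx])
        resid = y[test_idx] - X[test_idx] @ beta
        errors.append(np.mean(resid ** 2))
    return float(np.mean(errors))


def bootstrap_coefs(X, y, n_boot=1000, seed=0):
    """Refit OLS on n_boot row-resampled copies; return an (n_boot, p) array."""
    r = np.random.default_rng(seed)
    n_rows = len(y)
    coefs = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        sample = r.integers(0, n_rows, n_rows)  # draw n_rows rows with replacement
        coefs[b] = fit_ols(X[sample], y[sample])
    return coefs


cv_mse = kfold_mse(X, y, k=5)
boot = bootstrap_coefs(X, y)
boot_se = boot.std(axis=0, ddof=1)                          # bootstrap standard errors
ci_low, ci_high = np.percentile(boot, [2.5, 97.5], axis=0)  # percentile 95% intervals

print(f"5-fold CV MSE: {cv_mse:.3f}")
for name, se, lo, hi in zip(["intercept", "obesity", "inactivity"], boot_se, ci_low, ci_high):
    print(f"{name}: SE={se:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

Cross-validation estimates out-of-sample prediction error, while the bootstrap estimates the sampling variability of the fitted coefficients, so the two answer different questions about the same model.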
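The multiple regression and Breusch-Pagan steps can be sketched with NumPy. This is a minimal sketch on synthetic data (not the project data) in which the error spread deliberately grows with obesity, so the test should reject homoscedasticity. It implements Koenker's studentized form of the test (LM = n · R² from regressing the squared residuals on the predictors), which is one common variant. `breusch_pagan` is a hypothetical helper, and its closed-form p-value assumes exactly two predictors.

```python
import math

import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in data (NOT the project data). The error spread grows with
# obesity, so the residuals are heteroscedastic by construction.
n = 500
obesity = rng.uniform(20, 40, n)
inactivity = rng.uniform(15, 30, n)
noise_sd = 0.05 * (obesity - 15)
diabetes = 1.0 + 0.15 * obesity + 0.10 * inactivity + rng.normal(0, noise_sd)

# Multiple linear regression: diabetes ~ obesity + inactivity (with intercept)
X = np.column_stack([np.ones(n), obesity, inactivity])
beta, *_ = np.linalg.lstsq(X, diabetes, rcond=None)
resid = diabetes - X @ beta


def breusch_pagan(X, resid):
    """Koenker's studentized Breusch-Pagan test; returns (LM statistic, p-value).

    Regress the squared residuals on the same regressors; LM = n * R^2 of that
    auxiliary regression, chi-square with (number of regressors - 1) df.
    """
    u = resid ** 2
    gamma, *_ = np.linalg.lstsq(X, u, rcond=None)
    fitted = X @ gamma
    r2 = 1 - np.sum((u - fitted) ** 2) / np.sum((u - u.mean()) ** 2)
    lm = len(u) * r2
    df = X.shape[1] - 1
    if df != 2:
        # For other df, scipy.stats.chi2.sf(lm, df) gives the p-value instead.
        raise ValueError("closed-form p-value below is only valid for 2 predictors")
    p_value = math.exp(-lm / 2)  # chi-square survival function for df == 2
    return lm, p_value


lm, p_value = breusch_pagan(X, resid)
print("coefficients (intercept, obesity, inactivity):", np.round(beta, 3))
print(f"Breusch-Pagan LM = {lm:.2f}, p = {p_value:.2g}")
if p_value < 0.05:
    print("Reject homoscedasticity: residual variance depends on the predictors.")
else:
    print("No evidence against homoscedasticity at the 5% level.")
```

If statsmodels is available, `statsmodels.stats.diagnostic.het_breuschpagan(resid, exog_het)` runs the same test; in recent versions its default `robust=True` corresponds to this studentized form.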