Today’s Class:
- Geodesic distance: the length of the shortest path between two points along a curved surface, such as the Earth.
- Python packages are available to compute it, e.g. geopy and geokernels.
- We can use the imputation method to handle missing values in the dataset.
- I learned about several imputation methods: i) next or previous value, ii) k-nearest neighbors, iii) maximum or minimum value, iv) missing value prediction, v) most frequent value, vi) average or linear interpolation.
- In this dataset, if we compare age across races using pairwise t-tests, we need to run more than 15 tests because there are 7-8 groups. That is not convenient, so the professor suggested an analysis of variance instead, using the ANOVA test.
- ANOVA test: a test used to determine whether the means of three or more unrelated (independent) samples or groups differ.
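To make the geodesic distance note above concrete, here is a minimal sketch using only the standard library. It uses the haversine (great-circle) formula, which treats the Earth as a perfect sphere; this is a simplification, not what geopy does internally (geopy's geodesic distance uses an ellipsoid model). The function name `haversine_km` and the sample coordinates are my own illustrative choices.

```python
import math

# Mean Earth radius in km (assumption: Earth modelled as a sphere).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Illustrative coordinates only (roughly Boston and New York City).
distance = haversine_km(42.36, -71.06, 40.71, -74.01)
print(f"Approximate distance: {distance:.1f} km")
```

For real work the ellipsoidal result from a package like geopy should be more accurate; the spherical version is mainly useful for seeing what "shortest path on a curved surface" means.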
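To see why ANOVA is more convenient than many t-tests, here is a small sketch of a one-way ANOVA F-statistic written by hand. The ages in `toy_groups` are made-up values for illustration, not rows from the project dataset, and `one_way_anova_f` is my own helper name. In practice a library routine such as `scipy.stats.f_oneway` would likely be used, since it also returns a p-value.

```python
from math import comb

# Number of pairwise t-tests needed for k groups is k * (k - 1) / 2.
print(comb(7, 2), comb(8, 2))  # 21 and 28 tests for 7 and 8 groups


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up ages for three hypothetical groups (not taken from the dataset).
toy_groups = [[30, 35, 40], [32, 37, 42], [45, 50, 55]]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
print(f"F = {f_stat:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

A large F means the group means differ more than the spread within the groups would explain. It is a single test covering all groups at once, instead of one test per pair.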
In the project:
Basic analysis of the dataset:
- There are 8768 rows and 12 columns.
- Mean: 37.28, Std: 12.99, Min: 2.00, Max: 92.00
- Null values: 582 in name, 605 in age, 49 in gender, 210 in armed, 57 in city, and 1190 in flee.
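Since several columns have null values, here is a rough sketch of three of the imputation methods from class (average, most frequent value, and previous value), written with the standard library only. The helper names and the tiny `ages` and `genders` lists are made up for illustration and are not taken from the dataset. On the real data, pandas or scikit-learn would probably be the practical choice.

```python
from statistics import mean, mode


def impute_mean(values):
    """Replace None with the mean of the observed values (numeric columns)."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def impute_most_frequent(values):
    """Replace None with the most frequent observed value (categorical columns)."""
    observed = [v for v in values if v is not None]
    fill = mode(observed)
    return [fill if v is None else v for v in values]


def impute_previous(values):
    """Replace None with the previous observed value; leading gaps stay None."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


# Made-up sample columns, with None standing in for a null value.
ages = [25, None, 40, 31, None]
genders = ["M", None, "F", "M"]

print(impute_mean(ages))
print(impute_most_frequent(genders))
print(impute_previous(ages))
```

Which method fits depends on the column: an average suits a numeric column like age, while the most frequent value suits a categorical column like gender.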