October 11, 2023

Today’s Class:

  1. Geodesic distance: the length of the shortest curve between two points on a surface.
  2. Python packages are available for it, e.g. geopy and geokernels.
  3. We can use imputation to handle missing values in the dataset.
  4. I learned about several imputation methods: i) next or previous value, ii) K nearest neighbors, iii) maximum or minimum value, iv) missing value prediction, v) most frequent value, vi) average or linear interpolation.
  5. In this dataset, comparing age across race would require more than 15 pairwise t-tests, since there are 7-8 race categories. That is not convenient, so the professor suggested an analysis of variance instead. For that, we can use the ANOVA test.
  6. ANOVA test: a test used to determine whether the means of three or more unrelated samples or groups differ significantly.
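
To make the geodesic distance in item 1 concrete, here is a minimal sketch using the haversine formula. It treats the Earth as a perfect sphere, so it is only an approximation of the ellipsoidal geodesic that a package such as geopy computes (for example via `geopy.distance.geodesic`). The function name `great_circle_km` and the city coordinates are my own assumptions for illustration and do not come from the class.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; assumes a spherical Earth


def great_circle_km(lat1, lon1, lat2, lon2):
    """Shortest distance along a sphere between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    # Haversine formula: a is the squared half-chord length between the points
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate coordinates, chosen only for illustration
boston = (42.36, -71.06)
new_york = (40.71, -74.01)
distance_km = great_circle_km(*boston, *new_york)
print(f"{distance_km:.1f} km")
```

The result is the distance along the curved surface, not the straight line through the Earth, which is the point of item 1.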

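The reasoning in items 5 and 6 can be sketched in a few lines: count how many pairwise t-tests 7 or 8 groups would need, then compute a single one-way ANOVA F statistic instead. This is a hand-rolled illustration, not the method used in class. In practice a library routine such as `scipy.stats.f_oneway` would likely be used, and it also returns a p-value. The ages below are made up and the helper name `one_way_anova_f` is hypothetical.

```python
from math import comb


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of each observation around its own group mean
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Made-up ages for three hypothetical groups (not taken from the dataset)
ages_by_group = [[25, 30, 35], [40, 45, 50], [30, 35, 40]]
f_stat = one_way_anova_f(ages_by_group)

# Pairwise t-tests that one ANOVA replaces for 7 and 8 groups
pairs_for_7 = comb(7, 2)
pairs_for_8 = comb(8, 2)
print(f_stat, pairs_for_7, pairs_for_8)
```

A large F value suggests that at least one group mean differs from the others. It does not say which one, so a follow-up comparison would still be needed.
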
In the project:

Basic analysis of the dataset:

  1. There are 8768 rows and 12 columns.
  2. Mean: 37.28, Std: 12.99, Min: 2.00, Max: 92.00
  3. Null values: 582 in name, 605 in age, 49 in gender, 210 in armed, 57 in city, 1190 in flee.
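
Given the null counts above, a few of the imputation methods from class can be sketched in plain Python. This is a minimal illustration on made-up values, with `None` standing in for a missing entry. The helper names are hypothetical. On the real dataset, pandas calls such as `df.isnull().sum()`, `fillna`, and `ffill` would be the more usual tools.

```python
from statistics import mean, mode


def count_missing(values):
    """Number of missing (None) entries in a column."""
    return sum(v is None for v in values)


def impute_mean(values):
    """Replace missing numbers with the mean of the observed ones."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def impute_most_frequent(values):
    """Replace missing entries with the most frequent observed value."""
    fill = mode(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def impute_previous(values):
    """Replace each missing entry with the last observed value before it."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)  # stays None if nothing has been observed yet
    return filled


# Made-up sample columns (not taken from the dataset)
ages = [34, None, 51, 28, None]
flee = ["not", None, "car", "not", None]

print(count_missing(ages))
print(impute_mean(ages))
print(impute_most_frequent(flee))
print(impute_previous(ages))
```

Mean or previous-value filling suits a numeric column like age, while most-frequent filling is the natural choice for a categorical column like flee.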
