MACHINE LEARNING

APPLICATION OF SUPERVISED LEARNING

MACHINE LEARNING PIPELINE

Question
While performing exploratory data analysis on a dataset that will be used for a logistic regression model, you find that one categorical feature has missing values. You decide to handle this by imputing the missing values. The feature has the following distribution: 10% missing, 5% ‘A’, 70% ‘B’, and 15% ‘C’. What would be the best approach for this imputation?
A
Impute ‘A’ for 50% of the missing values and ‘B’ for the rest.
B
Impute ‘B’ for all missing values.
C
Impute ‘B’ for 50% of the missing values and ‘C’ for the rest.
D
Randomly impute one of the three values (A, B, or C).
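Explanation: Choice B. ‘B’ is the mode of the feature (70% of all rows, about 78% of the non-missing rows), so imputing it for every missing value is the usual most-frequent-value strategy for a categorical feature. The other choices assign rarer categories to rows with no evidence that they belong there.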
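Most-frequent-value imputation can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes missing entries are stored as None, the helper name impute_mode is hypothetical, and the 100-row sample is made up only to match the distribution given in the question.

```python
from collections import Counter


def impute_mode(values, missing=None):
    """Replace missing entries with the most frequent observed category.

    Returns the imputed list and the category that was used as the fill value.
    """
    observed = [v for v in values if v is not missing]
    if not observed:
        raise ValueError("no observed values to compute a mode from")
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is missing else v for v in values], mode


# Hypothetical sample: 10% missing, 5% 'A', 70% 'B', 15% 'C'
feature = [None] * 10 + ["A"] * 5 + ["B"] * 70 + ["C"] * 15

imputed, mode = impute_mode(feature)
counts = Counter(imputed)
print("fill value:", mode)
print("counts after imputation:", dict(counts))
```

After imputation every formerly missing row carries the fill value, so the share of the dominant category grows while the other categories keep their original counts.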

Detailed explanation-1: If you keep the value of k as 2, it gives the lowest cross-validation accuracy.

Detailed explanation-2: Variable identification: define each variable and its role in the dataset. Univariate analysis: for continuous variables, build box plots or histograms for each variable independently; for categorical variables, build bar charts to show the frequencies.
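The univariate step for a categorical variable amounts to counting how often each category occurs, with missing values counted as their own bucket. The sketch below uses only the Python standard library and a text bar chart in place of a plot; the helper name frequency_table, the "<missing>" label, and the sample data are assumptions made for illustration.

```python
from collections import Counter


def frequency_table(values):
    """Return {category: share of rows}, ordered from most to least frequent.

    None is treated as a missing value and reported under "<missing>".
    """
    counts = Counter("<missing>" if v is None else v for v in values)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.most_common()}


# Hypothetical sample matching the distribution in the question
feature = [None] * 10 + ["A"] * 5 + ["B"] * 70 + ["C"] * 15

table = frequency_table(feature)
for category, share in table.items():
    bar = "#" * round(share * 50)  # crude text stand-in for a bar chart
    print(f"{category:<10} {share:6.1%} {bar}")
```

Listing the missing bucket alongside the real categories makes it easy to see both the mode and how much of the feature would be affected by any imputation choice.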
