MACHINE LEARNING

Question

While performing exploratory data analysis on a dataset that will be used for a logistic regression model, you realize that one categorical feature has missing values. You decide to handle this by imputing the missing values. The feature has the following distribution: 10% missing values, 5% ‘A’, 70% ‘B’, and 15% ‘C’. What would be the best approach for this imputation?

A	Impute ‘A’ for 50% of the missing values and ‘B’ for the rest.
B	Impute ‘B’ for all missing values.
C	Impute ‘B’ for 50% of the missing values and ‘C’ for the rest.
D	Randomly impute one of the three values (‘A’, ‘B’, or ‘C’).

Explanation:

Detailed explanation-1: Step 2: Data Cleaning. Next, the data flows to the cleaning step. To make sure the data paints a consistent picture that your pipeline can learn from, Cortex automatically detects and scrubs away outliers, missing values, duplicates, and other errors.

Detailed explanation-2: Feature engineering can help data scientists by reducing the time it takes to extract variables from data, which allows more variables to be extracted. Automating feature engineering can help organizations and data scientists build more accurate models.

Detailed explanation-3: Which Amazon SageMaker option should the company use to train its ML models so that management overhead is reduced and the pipeline is automated for future retraining? Create and train your XGBoost algorithm on your local laptop and then use an Amazon SageMaker endpoint to host the ML model.

Detailed explanation-4: How do we perform Bayesian classification when some features are missing? Explanation: When some features are missing, Bayesian classification does not rely on general methods for handling missing values; instead, the posterior probabilities are integrated over the missing features to obtain better predictions.
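
For the imputation question itself, a common baseline for a categorical feature is most-frequent (mode) imputation, which here would correspond to choice B because ‘B’ is by far the most common observed value. The sketch below is only an illustration under that assumption: the helper name `impute_most_frequent` and the toy column are hypothetical, and missing values are assumed to be represented as `None`.

```python
from collections import Counter


def impute_most_frequent(values, missing=None):
    """Replace missing entries with the most common observed category."""
    # Learn the mode from the observed (non-missing) values only.
    observed = [v for v in values if v is not missing]
    if not observed:
        raise ValueError("no observed values to learn the mode from")
    mode = Counter(observed).most_common(1)[0][0]
    # Fill every missing entry with that single category.
    return [mode if v is missing else v for v in values]


# Toy column matching the stated proportions:
# 10% missing, 5% 'A', 70% 'B', 15% 'C' (100 rows, an assumption for illustration).
column = [None] * 10 + ["A"] * 5 + ["B"] * 70 + ["C"] * 15

imputed = impute_most_frequent(column)
counts = Counter(imputed)
print(counts)
```

After imputation the toy column has no missing entries, and the ‘A’ and ‘C’ counts are unchanged. In a real pipeline the mode would normally be learned from the training split only and then reused on validation and test data.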
