APPLICATION OF SUPERVISED LEARNING
MACHINE LEARNING PIPELINE
Question
You are preparing a large set of CSV data for a training job using K-Means. Which of the following are NOT actions that you should expect to take in this scenario?
Decide on the number of clusters you want to target.
Use a mean or median strategy to populate any missing label data.
Ensure that your IAM role has the iam:PassRole action.
Convert the data to protobuf RecordIO format.
Decide on the value you want to assign to k.
Explanation:
Detailed explanation-1: -K-Means is one of the most widely used clustering methods, and K-Means based on MapReduce is considered an advanced solution for clustering very large datasets. However, execution time remains an obstacle, because the number of iterations grows as the dataset size and the number of clusters increase.
Detailed explanation-2: -The K-Means clustering algorithm tends to give poor results when the data contains outliers, when the density of data points varies across the data space, or when the clusters have non-convex shapes.
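As a rough illustration of the outlier sensitivity mentioned above, the sketch below runs a plain Lloyd's-style K-Means on one-dimensional data. The function name `kmeans_1d`, the sample points, and the initial centroids are assumptions chosen for this example; it is not how any particular library or managed service implements K-Means.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Plain Lloyd's-style K-Means on 1-D data (illustrative, not optimized).

    `centroids` holds the initial guesses, so k = len(centroids).
    Returns (final_centroids, iterations_used).
    """
    centroids = list(centroids)
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        updated = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:  # no centroid moved: converged
            return centroids, iteration
        centroids = updated
    return centroids, max_iter


# Hypothetical sample data: two well-separated groups.
clean = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]

clean_result = kmeans_1d(clean, [1.0, 13.0])
outlier_result = kmeans_1d(clean + [100.0], [1.0, 13.0])

print(clean_result)    # ([2.0, 12.0], 2)
print(outlier_result)  # ([7.0, 100.0], 3)
```

With the clean data, the two centroids settle on the two groups. Adding a single point at 100.0 pulls one centroid onto the outlier, so the two real groups are merged under the other centroid, and convergence takes one more iteration. This happens because the update step uses the mean, which a single extreme value can shift substantially.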