A Data Scientist at a retail company is using Amazon SageMaker to classify social media posts that mention the company into one of two categories: posts that require a response from the company, and posts that do not. The Data Scientist is using a training dataset of 10,000 posts, each of which contains the timestamp, author, and full text of the post. However, the Data Scientist is missing the target labels that are required for training. Which approaches can the Data Scientist take to create valid target label data? (Select TWO.)

MACHINE LEARNING

Question

A	Ask the social media handling team to review each post using Amazon SageMaker Ground Truth and provide the label
B	Use a sentiment analysis natural language processing library to determine whether a post requires a response
C	Use Amazon Mechanical Turk to publish Human Intelligence Tasks that ask Turk workers to label the posts
D	Use the a priori probability distribution of the two classes. Then, use Monte-Carlo simulation to generate the labels
E	Use K-Means to cluster posts into various groups, and pick the most frequent word in each group as its label

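Explanation:

The valid approaches are A and C. Whether a post requires a response is a judgment that has to come from people, so the labels must be produced by human annotators: either the company's own social media team working through Amazon SageMaker Ground Truth, or Amazon Mechanical Turk workers. Sentiment scores (B) do not indicate whether a response is needed, Monte-Carlo sampling from the class prior (D) assigns labels without regard to post content, and the most frequent word in a K-Means cluster (E) is not one of the two target classes.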

Detailed explanation-1: What Amazon SageMaker option should the company use to train their ML models that reduces the management and automates the pipeline for future retraining? Create and train your XGBoost algorithm on your local laptop and then use an Amazon SageMaker endpoint to host the ML model.

Detailed explanation-2: Step 2: Data Cleaning. Next, this data flows to the cleaning step. To make sure the data paints a consistent picture that your pipeline can learn from, Cortex automatically detects and scrubs away outliers, missing values, duplicates, and other errors.

Detailed explanation-3: Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning (ML) models quickly.

Detailed explanation-4: Which of the following methods DOES NOT prevent a model from overfitting to the training set? Early stopping is a regularization technique, and can help reduce overfitting. Dropout is a regularization technique, and can help reduce overfitting. Data augmentation can help reduce overfitting by creating a larger dataset.
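
Options A and C both collect labels from human annotators. When more than one annotator labels the same post, the individual answers are usually consolidated into a single target label. The following is a minimal sketch of simple majority-vote consolidation in Python. It is not the consolidation algorithm used by Amazon SageMaker Ground Truth or Amazon Mechanical Turk; the function name `consolidate_labels`, the post IDs, the label names `respond` and `ignore`, and the `min_agreement` threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def consolidate_labels(annotations, min_agreement=0.5):
    """Majority-vote consolidation of per-post annotations (illustrative only).

    annotations: dict mapping post_id -> list of labels from different annotators.
    Returns a dict mapping post_id -> (label, agreement). The label is None when
    no single label receives strictly more than `min_agreement` of the votes.
    """
    consolidated = {}
    for post_id, votes in annotations.items():
        if not votes:
            # No annotator answered: nothing to consolidate.
            consolidated[post_id] = (None, 0.0)
            continue
        label, count = Counter(votes).most_common(1)[0]
        agreement = count / len(votes)
        # Keep the label only if it wins a clear majority of the votes.
        consolidated[post_id] = (label if agreement > min_agreement else None, agreement)
    return consolidated


# Hypothetical annotations: three posts, each labeled by two or three annotators.
annotations = {
    "post-001": ["respond", "respond", "ignore"],
    "post-002": ["ignore", "ignore", "ignore"],
    "post-003": ["respond", "ignore"],
}

labels = consolidate_labels(annotations)
for post_id, (label, agreement) in labels.items():
    print(post_id, label, round(agreement, 2))
```

Posts without a clear majority come back with a label of `None`, so they can be sent out for another round of annotation instead of entering the training set with an uncertain label.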
