MCQ IN COMPUTER SCIENCE & ENGINEERING

MACHINE LEARNING

Question
A dataset contains 100 observations, of which 50 belong to class 1 and the other 50 belong to class 2. What is the entropy of this dataset?
A. 0
B. 1
C. -1
D. 0.5
Explanation: 

Detailed explanation-1: A dataset with a 50/50 split of samples between the two classes has the maximum entropy (maximum surprise) of 1 bit, whereas an imbalanced dataset with a 10/90 split has a lower entropy, because a randomly drawn example from it is less surprising.
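
As a worked check of the 50/50 case, the calculation below uses the standard Shannon entropy with base-2 logarithms. The symbols H and p_i are notation chosen here for illustration, not taken from the quiz.

```latex
H = -\sum_{i} p_i \log_2 p_i
  = -\left(\tfrac{50}{100}\log_2\tfrac{50}{100} + \tfrac{50}{100}\log_2\tfrac{50}{100}\right)
  = -\left(0.5 \cdot (-1) + 0.5 \cdot (-1)\right)
  = 1 \text{ bit}
```

The same calculation for a 10/90 split gives about 0.469 bits, which is why choice B (1) applies to the balanced dataset in the question.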

Detailed explanation-2: If a set contains k different values, its entropy is Entropy = -Σ P(value_i) · log2 P(value_i), summed over i = 1, ..., k, where P(value_i) is the probability of getting the i-th value when randomly selecting one item from the set. Example: for 16 instances with 9 positive and 7 negative, the entropy is -(9/16) log2(9/16) - (7/16) log2(7/16) ≈ 0.989. This makes sense: the split is almost 50/50, so the entropy should be close to 1.

Detailed explanation-3: An entropy close to 1 is considered high: it indicates a high level of disorder (meaning a low level of purity). For a two-class dataset, entropy lies between 0 and 1. With more than two classes, entropy can exceed 1 (the maximum is log2 of the number of classes), but the interpretation is the same: a higher value means more disorder.

Detailed explanation-4: The entropy is 0 if all samples of a node belong to the same class, and it is maximal when the classes are uniformly distributed. In other words, the entropy of a node consisting of a single class is zero because that class has probability 1 and log(1) = 0.
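
The entropy values mentioned in the explanations can be checked with a few lines of Python. This is a minimal sketch, assuming base-2 logarithms and class counts as input; the function name `entropy` is a hypothetical helper introduced here, not part of any library.

```python
import math


def entropy(counts):
    """Shannon entropy in bits for a list of class counts (hypothetical helper)."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:  # a class with zero samples contributes nothing
            p = c / total
            h -= p * math.log2(p)
    return h


print(entropy([50, 50]))            # balanced two-class dataset: 1.0
print(round(entropy([10, 90]), 3))  # imbalanced split: 0.469
print(round(entropy([9, 7]), 3))    # 9 positive, 7 negative: 0.989
print(entropy([100, 0]))            # single class (pure node): 0.0
print(entropy([25, 25, 25, 25]))    # four balanced classes: 2.0
```

The last call illustrates that with more than two classes the entropy can exceed 1, since the maximum for k equally likely classes is log2(k).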
