Question
In which of the following scenarios is gain ratio preferred over information gain?
A
When a categorical variable has a very small number of categories
B
When a categorical variable has a very large number of categories
C
The number of categories is not the reason
D
None of the mentioned
Explanation: 

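Detailed explanation-1: -For high-cardinality categorical variables, gain ratio is preferred over information gain. Information gain is biased toward attributes with many distinct values, because splitting on them produces many small, nearly pure partitions. Gain ratio corrects for this by dividing the information gain by the split information, which is the entropy of the attribute's own value distribution.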

Detailed explanation-2: -Which splitting criterion is better for a categorical variable with high cardinality? Gain ratio is generally preferred, since it penalizes splits that break the data into a large number of small partitions.
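The effect described above can be checked on a toy dataset. The following is a minimal sketch in plain Python, not taken from any particular library; the attribute names `row_id` and `weather` and the 16-row dataset are invented for illustration. `row_id` is unique per row (maximum cardinality), while `weather` is a two-valued attribute that predicts the label well but not perfectly.

```python
from collections import Counter
from math import log2

def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return -sum((c / total) * log2(c / total) for c in Counter(items).values())

def information_gain(values, labels):
    """Entropy reduction obtained by splitting `labels` on the attribute `values`."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def gain_ratio(values, labels):
    """Information gain divided by split information (entropy of the attribute itself)."""
    split_info = entropy(values)
    if split_info == 0:
        return 0.0  # attribute has a single value, so it cannot split the data
    return information_gain(values, labels) / split_info

# Hypothetical toy data: 8 "yes" rows followed by 8 "no" rows.
labels = ["yes"] * 8 + ["no"] * 8
row_id = list(range(16))                                 # unique value per row
weather = ["sun"] * 7 + ["rain"] + ["rain"] * 7 + ["sun"]  # one exception per class

for name, attr in [("row_id", row_id), ("weather", weather)]:
    print(name, round(information_gain(attr, labels), 3), round(gain_ratio(attr, labels), 3))
```

Here information gain ranks `row_id` first (1.0 versus about 0.456), even though a unique identifier is useless for prediction on new data. Gain ratio reverses the ranking: `row_id` is divided by a split information of 4 bits and drops to 0.25, while `weather` is divided by 1 bit and stays at about 0.456.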

Detailed explanation-3: -Post-pruning lets the decision tree grow to its full depth and then removes branches to keep the model from overfitting. Cost complexity pruning (ccp) is one type of post-pruning technique.

Detailed explanation-4: -How does the Random Forest algorithm work? Step 1: It draws random samples from the given dataset. Step 2: It constructs a decision tree for each sample and collects the prediction of every tree. Step 3: By voting, it picks the result predicted by the most trees.
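The three steps above can be sketched in plain Python. This is an illustration under stated assumptions rather than a faithful random forest: each "tree" is a one-level decision stump on a single numeric feature, samples are drawn with replacement (bootstrap sampling), and the function names and toy dataset are invented for the example. Practical implementations grow full trees and usually also randomize the features considered at each split.

```python
import random
from collections import Counter

def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]

def bootstrap_sample(rows, rng):
    """Step 1: draw len(rows) rows at random, with replacement."""
    return [rng.choice(rows) for _ in rows]

def train_stump(rows):
    """Step 2 (simplified): fit a single threshold test instead of a full tree."""
    best = None
    for threshold in sorted({x for x, _ in rows}):
        left = [y for x, y in rows if x <= threshold]
        right = [y for x, y in rows if x > threshold]
        if not right:
            continue  # threshold does not split the sample
        l_lab, r_lab = majority(left), majority(right)
        errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, l_lab, r_lab)
    if best is None:
        # Every x in the sample is identical: fall back to a constant prediction.
        label = majority([y for _, y in rows])
        return lambda x: label
    _, threshold, l_lab, r_lab = best
    return lambda x: l_lab if x <= threshold else r_lab

def train_forest(rows, n_trees=25, seed=0):
    """Train one stump per bootstrap sample."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(rows, rng)) for _ in range(n_trees)]

def predict(forest, x):
    """Step 3: every tree votes and the most voted label wins."""
    return majority([tree(x) for tree in forest])

# Hypothetical toy data: (feature value, class label).
rows = [(1, "low"), (2, "low"), (3, "low"), (4, "low"),
        (6, "high"), (7, "high"), (8, "high"), (9, "high")]
forest = train_forest(rows)
print(predict(forest, 1), predict(forest, 9))
```

An odd number of trees is used so that a two-class vote cannot end in a tie.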

Detailed explanation-5: -Random forest reduces overfitting by combining multiple trees, which generally gives more accurate and stable results than a single tree. A single decision tree requires less computation and takes less time to build, but it tends to be less accurate.
