Q1. What two parameters define a normal distribution?

Answer: Its mean and its standard deviation.
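
As a small illustration, the sketch below builds a normal distribution from just those two parameters using Python's standard-library `statistics.NormalDist`. The values 100 and 15 are made-up example numbers, not taken from any dataset.

```python
from statistics import NormalDist

# Hypothetical example values: mean 100, standard deviation 15.
dist = NormalDist(mu=100, sigma=15)

# With the two parameters fixed, every probability is determined.
prob_below_mean = dist.cdf(100)                    # mass to the left of the mean
prob_within_one_sd = dist.cdf(115) - dist.cdf(85)  # mass within one standard deviation
```

Changing either parameter shifts (mean) or stretches (standard deviation) the curve; nothing else is needed to describe it.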

Q2. What is One Hot Encoding?

Answer: The process of encoding a categorical value as a sparse vector in which the element for that category is set to 1 and all other elements are set to 0.
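
A minimal sketch of the idea in plain Python follows. The function name `one_hot_encode` and the colour values are hypothetical, chosen only for illustration; real projects usually rely on a library encoder instead.

```python
def one_hot_encode(values):
    """Return the sorted category list and one 0/1 vector per input value."""
    categories = sorted(set(values))
    position = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)   # all elements start at 0
        row[position[value]] = 1      # exactly one element is set to 1
        encoded.append(row)
    return categories, encoded


categories, encoded = one_hot_encode(["red", "green", "blue", "green"])
```

Each row has the same length as the number of distinct categories and sums to 1.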

Q3. What is a residual?

Answer: It’s the difference between the observed value and the predicted value of the quantity of interest.
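
In code, a residual is just an element-wise subtraction. The sketch below uses made-up observed and predicted values, and `residuals` is a hypothetical helper name.

```python
def residuals(observed, predicted):
    """Residual = observed value minus predicted value, per data point."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return [y - y_hat for y, y_hat in zip(observed, predicted)]


observed = [3.0, 5.0, 7.5]    # hypothetical measurements
predicted = [2.5, 5.0, 8.0]   # hypothetical model outputs
res = residuals(observed, predicted)
```

A positive residual means the model under-predicted that point; a negative one means it over-predicted.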

Q4. Explain the basic concept of a random forest.

Answer: An ensemble approach that trains many decision trees on the training data and combines their predictions, by majority vote for classification or by averaging for regression, instead of relying on a single tree.
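
One common way a forest combines its trees is to aggregate their individual predictions. The sketch below shows only that aggregation step, assuming the per-tree predictions are already available; the function names and sample predictions are hypothetical.

```python
from collections import Counter
from statistics import mean


def aggregate_classification(tree_predictions):
    """Majority vote over the class labels predicted by each tree."""
    return Counter(tree_predictions).most_common(1)[0][0]


def aggregate_regression(tree_predictions):
    """Average of the numeric values predicted by each tree."""
    return mean(tree_predictions)


label = aggregate_classification(["spam", "ham", "spam"])
value = aggregate_regression([2.0, 3.0, 4.0])
```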

Q5. What is Dimensionality Reduction?

Answer: It’s the process of reducing the number of variables under consideration by obtaining a smaller set of principal variables, for example the principal components produced by PCA.
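
As a rough sketch of one such technique, the code below reduces two-dimensional points to one dimension by projecting them onto their first principal component. It assumes exactly two variables so that the principal axis has a closed form; the function name and data are hypothetical, and a real analysis would normally use a linear-algebra library.

```python
import math


def first_principal_component(points):
    """Return the unit direction of largest variance and the 1-D scores."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the 2x2 sample covariance matrix.
    var_x = sum(dx * dx for dx, _ in centered) / (n - 1)
    var_y = sum(dy * dy for _, dy in centered) / (n - 1)
    cov_xy = sum(dx * dy for dx, dy in centered) / (n - 1)

    # Closed-form angle of the principal axis of a 2x2 symmetric matrix.
    angle = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)
    direction = (math.cos(angle), math.sin(angle))

    # Each 2-D point is replaced by a single coordinate along that axis.
    scores = [dx * direction[0] + dy * direction[1] for dx, dy in centered]
    return direction, scores


points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # lie on y = 2x
direction, scores = first_principal_component(points)
```

Because these example points lie exactly on a line, the single retained coordinate loses no information.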

Q6. What does “random” mean in the term Random Forest?

Answer: The “random” part refers to building each decision tree from a random (bootstrap) sample of the training rows and considering only a random subset of features when splitting.
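
The two usual sources of randomness can be sketched with the standard `random` module. This shows only the sampling, not the tree building; the helper names, the feature names, and the choice of `k = 2` are assumptions for illustration.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (some rows repeat, some are left out)."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(feature_names, k, rng):
    """Pick k distinct features for a tree (or a split) to consider."""
    return rng.sample(feature_names, k)


rng = random.Random(0)  # seeded so the sketch is repeatable
rows = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]
feature_names = ["age", "income", "height", "weight"]

sample = bootstrap_sample(rows, rng)
features = random_feature_subset(feature_names, 2, rng)
```

Each tree would get its own sample and feature subset, which is what makes the trees differ from one another.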

Q7. Give three techniques for handling missing values.

Answer: Imputation (for example, filling in the mean or median), predicting the missing values with a model, and, if only a few values are missing, deleting the rows that contain them.
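
Two of these techniques are simple enough to sketch in plain Python: deleting incomplete rows and mean imputation. Missing values are represented as `None` here, and the helper names and data are hypothetical; predicting the missing values would require fitting a model and is not shown.

```python
from statistics import mean


def drop_missing(rows):
    """Keep only the rows that have no missing (None) entries."""
    return [row for row in rows if None not in row]


def impute_mean(rows, column):
    """Replace None in one column with the mean of that column's observed values."""
    observed = [row[column] for row in rows if row[column] is not None]
    fill = mean(observed)
    return [
        [fill if (i == column and value is None) else value
         for i, value in enumerate(row)]
        for row in rows
    ]


rows = [[1.0, 10.0], [2.0, None], [3.0, 30.0]]
dropped = drop_missing(rows)
imputed = impute_mean(rows, column=1)
```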

Q8. What is unsupervised learning?

Answer: Unsupervised learning aims to detect patterns in data where no labels are given.

Q9. What is R-Squared?

Answer: R-Squared is a statistical measure of how close the data are to the fitted regression line: the proportion of the variance in the dependent variable that the model explains. It is also known as the coefficient of determination.
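
A minimal sketch of the usual definition, one minus the ratio of the residual sum of squares to the total sum of squares, is shown below. The function name and the sample numbers are made up for illustration.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    ss_tot = sum((y - mean_observed) ** 2 for y in observed)
    return 1 - ss_res / ss_tot


observed = [1.0, 2.0, 3.0, 4.0]
good_fit = r_squared(observed, [1.1, 1.9, 3.2, 3.8])  # predictions close to the data
mean_only = r_squared(observed, [2.5] * 4)            # always predicting the mean
```

A model that always predicts the mean of the observed values scores 0, and a perfect fit scores 1.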