Here are 10 of the most common data science interview questions for junior positions. Interviewers usually focus on statistics fundamentals and machine learning concepts.
Q1. What two parameters define a normal distribution?
Answer: Its mean and its standard deviation.
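As a quick illustration, Python's standard library class `statistics.NormalDist` is built from exactly these two parameters. The mean of 170 and standard deviation of 10 are made-up values for the example.

```python
from statistics import NormalDist

# Hypothetical example: heights with mean 170 cm and standard deviation 10 cm.
heights = NormalDist(mu=170, sigma=10)

# The two parameters fully determine the distribution.
print(heights.mean, heights.stdev)

# Probability of a value within one standard deviation of the mean (roughly 68%).
within_one_sd = heights.cdf(180) - heights.cdf(160)
print(round(within_one_sd, 4))
```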
Q2. What is One Hot Encoding?
Answer: The process of encoding categorical data as a sparse vector in which one element is set to 1 and all other elements are set to 0.
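A minimal sketch of the idea in plain Python. The helper name `one_hot` and the color categories are assumptions made for this example, and real projects would normally rely on a library encoder instead.

```python
def one_hot(value, categories):
    """Return a list with 1 at the position of `value` and 0 everywhere else."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if category == value else 0 for category in categories]


colors = ["red", "green", "blue"]  # hypothetical list of known categories
encoded = [one_hot(c, colors) for c in ["green", "blue", "red"]]
print(encoded)
```

Each encoded row has exactly one element equal to 1, at the index of its category in `colors`.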
Q3. What is a residual?
Answer: It’s the difference between the observed value and the predicted value of the quantity of interest.
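In code, a residual is just a subtraction per data point. The observed and predicted values below are made up for illustration.

```python
observed = [3.0, 5.0, 7.5]   # hypothetical measured values
predicted = [2.5, 5.0, 8.0]  # hypothetical model predictions

# Residual = observed value minus predicted value, one per data point.
residuals = [y - y_hat for y, y_hat in zip(observed, predicted)]
print(residuals)
```

A positive residual means the model under-predicted that point, and a negative one means it over-predicted.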
Q4. Explain the basic concept of a random forest.
Answer: An ensemble approach that builds many decision trees on the training data and combines their predictions, by majority vote for classification or by averaging for regression, instead of relying on a single tree.
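A rough sketch of how the outputs of many trees can be combined into one prediction: a majority vote for classification, an average for regression. The per-tree predictions here are hard-coded, hypothetical values, since training real trees is outside the scope of this example.

```python
from collections import Counter
from statistics import mean


def majority_vote(predictions):
    """Classification: return the label predicted by the most trees."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical predictions from five trees for a single email.
tree_class_predictions = ["spam", "ham", "spam", "spam", "ham"]
forest_class = majority_vote(tree_class_predictions)

# Hypothetical predictions from three regression trees for a single input.
tree_value_predictions = [2.0, 3.0, 4.0]
forest_value = mean(tree_value_predictions)

print(forest_class, forest_value)
```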
Q5. What is Dimensionality Reduction?
Answer: It’s the process of reducing the number of variables under consideration by obtaining a smaller set of principal variables, for example the principal components produced by PCA.
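To make this concrete, here is a small sketch that reduces 2-D points to a single number each by projecting them onto their first principal component. It handles only the 2-D case, using the closed-form angle of the leading eigenvector of a 2x2 covariance matrix. The function name and the sample points are assumptions for this example, and a real project would typically use a library implementation of PCA.

```python
import math


def first_principal_component_2d(points):
    """Project 2-D points onto their direction of greatest variance."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the 2x2 covariance matrix.
    var_x = sum(x * x for x, _ in centered) / n
    var_y = sum(y * y for _, y in centered) / n
    cov_xy = sum(x * y for x, y in centered) / n

    # Angle of the leading eigenvector of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)
    direction = (math.cos(theta), math.sin(theta))

    # Each point is now described by one number instead of two.
    return [x * direction[0] + y * direction[1] for x, y in centered]


points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]  # made-up data on the line y = x
scores = first_principal_component_2d(points)
print(scores)
```

Because the sample points lie exactly on a line, the single score per point keeps all of the variation in the data.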
Q6. What does “random” mean in the term Random Forest?
Answer: The “random” part of the term refers to building each of the decision trees from a random selection of features and, typically, a random (bootstrap) sample of the training rows.
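The sketch below shows where the randomness can enter. It draws a random subset of features for each tree, and it also draws a bootstrap sample of rows, which random forests usually add on top. The feature names, row count, and helper `draw_tree_inputs` are hypothetical. Note that many implementations pick the feature subset at every split rather than once per tree, so drawing it per tree here is a simplification.

```python
import random

features = ["age", "income", "city", "tenure"]  # hypothetical feature names
n_rows = 6                                       # hypothetical number of training rows


def draw_tree_inputs(rng, features, n_rows, n_features):
    """Pick the random inputs one tree would be trained on."""
    # Random subset of features, sampled without replacement.
    feature_subset = rng.sample(features, n_features)
    # Bootstrap sample of row indices, sampled with replacement.
    row_indices = [rng.randrange(n_rows) for _ in range(n_rows)]
    return feature_subset, row_indices


rng = random.Random(0)  # fixed seed so the sketch is repeatable
for tree_id in range(3):
    subset, rows = draw_tree_inputs(rng, features, n_rows, n_features=2)
    print(tree_id, subset, rows)
```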
Q7. Give three techniques for handling missing values.
Answer: Imputation (for example, filling in the mean or median), predicting the missing values with a model, and, if only a few values are missing, deleting the rows that contain them.
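Two of these techniques, row deletion and mean imputation, can be sketched in a few lines of plain Python. The records below are invented for illustration, and `None` stands in for a missing value. Predicting the missing values would need a trained model, so it is left out here.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
rows = [
    {"age": 25, "income": 40000},
    {"age": None, "income": 52000},
    {"age": 35, "income": 61000},
]

# Deletion: keep only the rows that have no missing values.
complete_rows = [r for r in rows if None not in r.values()]

# Mean imputation: replace a missing age with the mean of the known ages.
age_mean = mean(r["age"] for r in rows if r["age"] is not None)
imputed_rows = [
    {**r, "age": age_mean if r["age"] is None else r["age"]} for r in rows
]

print(len(complete_rows), imputed_rows[1])
```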
Q8. What is unsupervised learning?
Answer: Unsupervised learning aims to detect patterns in data where no labels are given.
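Clustering is a typical example. The sketch below is a bare-bones one-dimensional k-means, written with assumed starting centroids and made-up data. It groups the numbers without ever being told which group each one belongs to.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Cluster numbers around the given starting centroids (no labels used)."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


values = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]  # made-up, unlabeled data
centroids, clusters = kmeans_1d(values, centroids=[0.0, 5.0])
print(centroids, clusters)
```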
Q9. What is R-Squared?
Answer: R-Squared is a statistical measure of how close the data are to the fitted regression line: the proportion of the variance in the dependent variable that the model explains. It is also known as the coefficient of determination.
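The usual formula is R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of the observed values from their mean. A minimal sketch, using made-up observations and predictions:

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_observed = sum(observed) / len(observed)
    # Sum of squared residuals (observed minus predicted).
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    # Total sum of squares around the mean of the observed values.
    ss_tot = sum((y - mean_observed) ** 2 for y in observed)
    return 1 - ss_res / ss_tot


observed = [1.0, 2.0, 3.0, 4.0]   # hypothetical data
predicted = [1.5, 2.0, 3.0, 3.5]  # hypothetical model output
score = r_squared(observed, predicted)
print(round(score, 3))
```

A value of 1 means the predictions match the observations exactly, while a value of 0 means the model does no better than always predicting the mean.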
Q10. Why is Naive Bayes naive?
Answer: Because it assumes that all of the features in a data set are equally important and independent of one another given the class, an assumption that rarely holds in real data.
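The independence assumption shows up in code as a plain product of per-feature likelihoods. The sketch below uses invented priors and word likelihoods for a toy spam filter. None of the numbers come from real data.

```python
from math import prod

# Hypothetical class priors and per-word likelihoods P(word | class).
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.5, "meeting": 0.1},
    "ham": {"free": 0.1, "meeting": 0.5},
}


def posterior(words):
    """Return P(class | words) under the naive independence assumption."""
    # Naive step: multiply the per-word likelihoods as if the words were
    # independent of each other given the class.
    scores = {
        label: priors[label] * prod(likelihoods[label][w] for w in words)
        for label in priors
    }
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}


print(posterior(["free"]))
print(posterior(["free", "meeting"]))
```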
Want to keep practicing?
Practice with 190+ interview questions carefully crafted by experienced data scientists at datasciencetrivia.com.
Follow me on Twitter to get more questions like these in your feed.