Top 10 Junior Data Science Interview Questions & Answers

Here are 10 of the most common data science interview questions for junior positions. Interviewers usually focus on statistics fundamentals and machine learning concepts.

Q1. What two parameters define a normal distribution?

Answer: Its mean and its standard deviation (or, equivalently, its variance).
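To see that these two numbers really pin the distribution down, here is a minimal sketch using NormalDist from Python's standard statistics module. The specific means and standard deviations are made-up values for illustration.

```python
from statistics import NormalDist

# Two normal distributions, each fully described by (mean, standard deviation).
standard = NormalDist(mu=0.0, sigma=1.0)
shifted = NormalDist(mu=10.0, sigma=2.0)

# Probability of landing within one standard deviation of the mean.
within_one_sigma = standard.cdf(1.0) - standard.cdf(-1.0)
shifted_within_one_sigma = shifted.cdf(12.0) - shifted.cdf(8.0)

print(round(within_one_sigma, 4))
print(round(shifted_within_one_sigma, 4))
```

Both probabilities come out at roughly 68%, because changing the mean and standard deviation only moves and stretches the same bell curve.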

Q2. What is One Hot Encoding?

Answer: The process of encoding a categorical value as a sparse vector in which the element for the observed category is set to 1 and all other elements are set to 0.
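A minimal sketch of the idea in plain Python follows. The helper name one_hot_encode and the color data are hypothetical, and real projects would normally rely on a library encoder instead.

```python
def one_hot_encode(values):
    """Return the sorted category list and one 0/1 vector per input value."""
    categories = sorted(set(values))
    position = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)   # start with all zeros
        row[position[value]] = 1      # switch on the slot for this category
        encoded.append(row)
    return categories, encoded


colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)
print(categories)
for color, row in zip(colors, encoded):
    print(color, row)
```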

Q3. What is a residual?

Answer: It’s the difference between the observed value and the predicted value of the quantity of interest.
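As a quick illustration with made-up numbers, each residual is simply the observed value minus the predicted value for the same data point:

```python
observed = [3.0, 5.0, 7.5]    # values actually measured
predicted = [2.5, 5.5, 7.0]   # values some model predicted (invented here)

# residual = observed - predicted, computed point by point
residuals = [y - y_hat for y, y_hat in zip(observed, predicted)]
print(residuals)
```

A positive residual means the model predicted too low, a negative one means it predicted too high.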

Q4. Explain the basic concept of a random forest.

Answer: An ensemble approach that trains many decision trees, each on a random sample of the training data, and combines their predictions: a majority vote for classification or an average for regression.
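The ensemble idea can be sketched in a few lines of plain Python. This is a toy stand-in, not a real random forest: it uses one-split trees (decision stumps) instead of full decision trees, gives each stump a single randomly chosen feature, and combines them by majority vote. All function names and the data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """Pick the threshold on one feature that misclassifies the fewest rows."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
        right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
        if not left or not right:
            continue
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum(lab != left_label for lab in left)
        errors += sum(lab != right_label for lab in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:
        # No usable split: always predict the majority label.
        majority = Counter(labels).most_common(1)[0][0]
        return (feature, float("inf"), majority, majority)
    return (feature, best[1], best[2], best[3])


def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Each stump sees its own bootstrap sample (rows drawn with replacement).
        picks = [rng.randrange(len(rows)) for _ in rows]
        sample_rows = [rows[i] for i in picks]
        sample_labels = [labels[i] for i in picks]
        # Each stump is also restricted to one randomly chosen feature.
        feature = rng.randrange(len(rows[0]))
        forest.append(fit_stump(sample_rows, sample_labels, feature))
    return forest


def predict_forest(forest, row):
    """Majority vote over the predictions of all stumps."""
    votes = Counter(
        left if row[feature] <= threshold else right
        for feature, threshold, left, right in forest
    )
    return votes.most_common(1)[0][0]


rows = [(1.0, 1.2), (1.5, 0.8), (2.0, 1.0), (8.0, 9.0), (8.5, 9.5), (9.0, 8.7)]
labels = ["a", "a", "a", "b", "b", "b"]
forest = fit_forest(rows, labels)
print(predict_forest(forest, (0.5, 0.5)))
print(predict_forest(forest, (9.0, 9.0)))
```

No single stump has to be good on its own; the vote across many slightly different stumps is what makes the prediction stable.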

Q5. What is Dimensionality Reduction?

Answer: It’s the process of reducing the number of variables under consideration by obtaining a smaller set of principal variables, for example the principal components produced by PCA.
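Here is a minimal sketch of the principal-component idea for the special case of two variables, where the leading direction of the covariance matrix has a closed-form angle. The data points are invented, and a real analysis would use a linear algebra library rather than this hand-rolled version.

```python
import math

# Five made-up points that lie almost on a straight line.
points = [(2.0, 1.9), (0.0, 0.1), (-1.0, -1.1), (1.0, 1.0), (-2.0, -1.9)]
n = len(points)

# Center the data.
mean_x = sum(x for x, _ in points) / n
mean_y = sum(y for _, y in points) / n
centered = [(x - mean_x, y - mean_y) for x, y in points]

# Entries of the 2x2 sample covariance matrix.
sxx = sum(x * x for x, _ in centered) / (n - 1)
syy = sum(y * y for _, y in centered) / (n - 1)
sxy = sum(x * y for x, y in centered) / (n - 1)

# Angle of the direction with the largest variance (first principal component).
theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
direction = (math.cos(theta), math.sin(theta))

# Project onto that direction: one number per point instead of two.
reduced = [x * direction[0] + y * direction[1] for x, y in centered]

# Share of the total variance kept by the single new variable.
explained_ratio = (sum(r * r for r in reduced) / (n - 1)) / (sxx + syy)
print(reduced)
print(explained_ratio)
```

Because the points are almost collinear, one variable is enough to keep nearly all of the variance.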

Q6. What does “random” mean in the term Random Forest?

Answer: The “random” part of the term refers to building each decision tree from a random sample of the training data (drawn with replacement) and considering only a random subset of the features at each split.
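In practice there are usually two random draws involved: a bootstrap sample of the rows and a random subset of the features. A minimal sketch of both draws follows. The feature names are hypothetical, and the square-root rule for the subset size is a common heuristic for classification, used here as an assumption.

```python
import math
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

n_rows = 8
feature_names = ["age", "income", "height", "weight"]  # hypothetical features

# Random draw 1: a bootstrap sample of row indices (with replacement),
# so some rows appear several times and others not at all.
bootstrap_rows = rng.choices(range(n_rows), k=n_rows)

# Random draw 2: a subset of features (without replacement) that a split
# is allowed to consider.
n_candidates = max(1, round(math.sqrt(len(feature_names))))
candidate_features = rng.sample(feature_names, k=n_candidates)

print(bootstrap_rows)
print(candidate_features)
```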

Q7. Give three techniques for handling missing values.

Answer: Imputation (for example, filling in the mean or median), predicting the missing values with a model, and, if only a few values are missing, deleting the rows that contain them.
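The three techniques can be sketched side by side on a tiny made-up table of (age, income) rows where one income is missing. The prediction step uses a simple least-squares line as a stand-in for whatever model you would actually choose.

```python
from statistics import mean

rows = [(20, 50.0), (30, 70.0), (40, None), (50, 110.0)]  # (age, income)
known = [(age, income) for age, income in rows if income is not None]

# 1. Deletion: keep only the complete rows.
dropped = known

# 2. Imputation: replace the missing income with the mean of the known ones.
mean_income = mean(income for _, income in known)
mean_imputed = [
    (age, income if income is not None else mean_income) for age, income in rows
]

# 3. Prediction: fit income = intercept + slope * age on the complete rows,
#    then use the line to fill the gap.
mean_age = mean(age for age, _ in known)
slope = sum((a - mean_age) * (i - mean_income) for a, i in known) / sum(
    (a - mean_age) ** 2 for a, _ in known
)
intercept = mean_income - slope * mean_age
predicted = [
    (age, income if income is not None else intercept + slope * age)
    for age, income in rows
]

print(dropped)
print(mean_imputed)
print(predicted)
```

In this example the mean fills the gap with about 76.7, while the fitted line fills it with about 90, because it takes the person's age into account.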

Q8. What is unsupervised learning?

Answer: Unsupervised learning aims to detect patterns in data where no labels are given.
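Clustering is a typical example. Below is a minimal sketch of k-means with two clusters on one-dimensional data. The function name kmeans_1d, the choice of starting centers, and the height values are all assumptions made for illustration.

```python
def kmeans_1d(values, n_iterations=10):
    """Split one-dimensional values into two clusters without using labels."""
    centers = [min(values), max(values)]  # simple deterministic starting point
    for _ in range(n_iterations):
        clusters = [[], []]
        for v in values:
            # Assign each value to its nearest center.
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            clusters[nearest].append(v)
        # Move each center to the mean of the values assigned to it.
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    assignments = [
        0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1 for v in values
    ]
    return centers, assignments


heights = [150.0, 152.0, 149.0, 181.0, 179.0, 183.0]
centers, assignments = kmeans_1d(heights)
print(centers)
print(assignments)
```

The algorithm is never told which group a height belongs to; the two groups emerge from the data alone.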

Q9. What is R-Squared?

Answer: R-Squared is a statistical measure of how close the data are to the fitted regression line: the proportion of the variance in the dependent variable that the model explains. It is also known as the coefficient of determination.
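Using the usual definition, R-Squared is one minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch with made-up numbers (the function name r_squared is hypothetical):

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_observed = sum(observed) / len(observed)
    # Variation the model fails to explain.
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    # Total variation of the observed values around their mean.
    ss_tot = sum((y - mean_observed) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
score = r_squared(observed, predicted)
print(round(score, 4))
```

A perfect fit gives 1, and a model that always predicts the mean of the observed values gives 0.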

Q10. Why is Naive Bayes naive?

Answer: Because it assumes that all of the features in a data set are equally important and independent of one another given the class, which is rarely true in real data.
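The independence assumption shows up directly in the arithmetic: the score for a class is the class prior multiplied by one probability per feature, with no interaction terms. Here is a minimal sketch for categorical features with add-one smoothing. The function names, the n_values parameter, and the tiny spam data set are all hypothetical.

```python
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Count how often each feature value occurs within each class."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
    return class_counts, value_counts


def predict_nb(model, row, n_values=2):
    """Return the class with the highest naive Bayes score for one row."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior probability of the class
        for i, value in enumerate(row):
            # The "naive" step: multiply per-feature probabilities as if the
            # features were independent given the class (add-one smoothing
            # keeps unseen values from zeroing out the product).
            score *= (value_counts[(label, i)][value] + 1) / (count + n_values)
        scores[label] = score
    return max(scores, key=scores.get)


# Features: (contains "free", contains "meeting")
rows = [("yes", "no"), ("yes", "no"), ("no", "yes"), ("no", "yes"), ("yes", "yes")]
labels = ["spam", "spam", "ham", "ham", "spam"]
model = train_nb(rows, labels)
print(predict_nb(model, ("yes", "no")))
print(predict_nb(model, ("no", "yes")))
```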

Want to keep practicing?

Practice with 190+ interview questions carefully crafted by experienced data scientists at datasciencetrivia.com

Follow me on Twitter to get more questions like this in your feed.

Physicist turned data scientist. Creator of http://datasciencetrivia.com, a Q&A card game to learn key data science concepts by playing.