Top 10 Junior Data Science Interview Questions & Answers

Here are 10 of the most common data science interview questions for junior positions. Interviewers usually focus on statistics fundamentals and machine learning concepts.

Q1. What two parameters define a normal distribution?

Answer: Its mean and its standard deviation. The mean sets the center of the distribution and the standard deviation sets its spread.
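
As a small illustration, Python's standard library class `statistics.NormalDist` is built from exactly these two parameters. The mean of 170 and standard deviation of 10 below are made-up values for the example.

```python
from statistics import NormalDist

# Hypothetical example: heights in cm with mean 170 and standard deviation 10.
heights = NormalDist(mu=170, sigma=10)

# Once the mean and standard deviation are fixed, every probability is determined.
p_below_mean = heights.cdf(170)                          # half of the mass lies below the mean
p_within_one_sd = heights.cdf(180) - heights.cdf(160)    # roughly 68% lies within one standard deviation

print(p_below_mean)
print(round(p_within_one_sd, 4))
```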

Q2. What is One Hot Encoding?

Answer: The process of encoding a categorical value as a sparse vector in which the element for that category is set to 1 and all other elements are set to 0.
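
A minimal sketch of the idea in plain Python. The function name `one_hot_encode` and the color values are invented for illustration; in practice you would probably reach for a library encoder instead.

```python
def one_hot_encode(values):
    """Map each categorical value to a vector with a single 1 and 0s elsewhere."""
    categories = sorted(set(values))            # fixed, reproducible column order
    index = {cat: i for i, cat in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1                   # the one "hot" position
        encoded.append(row)
    return categories, encoded

# Hypothetical toy column of colors.
categories, encoded = one_hot_encode(["red", "green", "blue", "green"])
print(categories)
print(encoded)
```

Each row has as many elements as there are distinct categories, which is why the vectors become sparse when a column has many categories.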

Q3. What is a residual?

Answer: It’s the difference between the observed value and the predicted value of the quantity of interest.
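
In code, this is one subtraction per data point. The numbers below are made up for the example.

```python
# Hypothetical observed values and model predictions.
observed = [3.0, 5.0, 7.5]
predicted = [2.5, 5.0, 8.0]

# Residual = observed value - predicted value, one per data point.
residuals = [y - y_hat for y, y_hat in zip(observed, predicted)]
print(residuals)
```

A positive residual means the model under-predicted that point, and a negative one means it over-predicted.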

Q4. Explain the basic concept of a random forest.

Answer: An ensemble method that trains many decision trees on the training data and combines their predictions: a majority vote for classification, or an average for regression.
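
A rough sketch of how a forest combines its trees for classification. The three "trees" here are hand-written rules standing in for trained decision trees, and the feature names `links` and `caps_ratio` are hypothetical. Only the aggregation step is shown, not the training.

```python
from collections import Counter

def forest_predict(trees, sample):
    """Classify a sample by majority vote over the individual trees."""
    votes = [tree(sample) for tree in trees]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical stand-ins for trained decision trees: each is a simple rule.
trees = [
    lambda s: "spam" if s["links"] > 2 else "ham",
    lambda s: "spam" if s["caps_ratio"] > 0.5 else "ham",
    lambda s: "spam" if s["links"] > 5 else "ham",
]

sample = {"links": 4, "caps_ratio": 0.7}
prediction = forest_predict(trees, sample)
print(prediction)
```

Two of the three rules vote "spam" for this sample, so the forest predicts "spam". For regression, the vote would be replaced by an average of the trees' numeric outputs.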

Q5. What is Dimensionality Reduction?

Answer: It’s the process of reducing the number of variables under consideration while keeping as much of the relevant information as possible, for example by obtaining a set of principal components.
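
To make the principal-component idea concrete, here is a minimal sketch that reduces 2-D points to 1-D by projecting them onto their first principal component. It uses the closed-form eigenvalue of a 2x2 covariance matrix and assumes the points are not all identical. The function name `pca_2d_to_1d` and the data are invented for the example.

```python
import math

def pca_2d_to_1d(points):
    """Project 2-D points onto their first principal component."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the 2x2 sample covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)

    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    largest = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)

    # Matching eigenvector, normalized to unit length.
    if b != 0:
        vx, vy = b, largest - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm

    # One coordinate per point instead of two.
    projected = [x * vx + y * vy for x, y in centered]
    explained_ratio = largest / (a + c)   # share of total variance kept
    return projected, explained_ratio

# Hypothetical data lying close to the line y = 2x.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
projected, explained_ratio = pca_2d_to_1d(points)
print(len(projected), round(explained_ratio, 3))
```

Because the points lie almost on a straight line, a single component keeps nearly all of the variance, which is the situation where dropping a dimension costs very little.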

Q6. What does “random” mean in the term “Random Forest”?

Answer: The “random” part refers to two sources of randomness: each decision tree is trained on a random (bootstrap) sample of the rows, and it considers only a random selection of features when choosing a split.
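
A simplified sketch of where the randomness enters. The helper `draw_tree_inputs` and the feature names are hypothetical. Note the simplification: this sketch draws one feature subset per tree, while the standard random forest algorithm redraws the candidate features at every split.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

n_rows = 8
feature_names = ["age", "income", "tenure", "visits"]  # hypothetical features

def draw_tree_inputs(n_rows, feature_names, n_features):
    """Pick the random ingredients used to grow one tree."""
    # Bootstrap sample: row indices drawn with replacement.
    row_indices = [rng.randrange(n_rows) for _ in range(n_rows)]
    # Random feature subset: drawn without replacement.
    features = rng.sample(feature_names, n_features)
    return row_indices, features

for tree_id in range(3):
    rows, features = draw_tree_inputs(n_rows, feature_names, n_features=2)
    print(tree_id, rows, features)
```

Because every tree sees different rows and different features, the trees make different mistakes, and combining them tends to cancel those mistakes out.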

Q7. Give three techniques for handling missing values.

Answer: Imputation (for example, filling in the mean or median), predicting the missing values with a model, and, if only a few values are missing, deleting the rows that contain them.
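
A minimal sketch of two of these techniques, deletion and mean imputation, on made-up data. `None` marks a missing value here, and the function names are invented for the example. Predicting the missing values would mean fitting a model on the complete rows, which is not shown.

```python
from statistics import mean

# Hypothetical rows of (age, income); None marks a missing value.
rows = [(25, 40000), (None, 52000), (31, None), (40, 61000)]

def drop_incomplete(rows):
    """Deletion: keep only rows with no missing values."""
    return [row for row in rows if None not in row]

def impute_column_means(rows):
    """Imputation: replace each missing value with its column mean."""
    columns = list(zip(*rows))
    col_means = [mean(v for v in col if v is not None) for col in columns]
    return [
        tuple(col_means[i] if v is None else v for i, v in enumerate(row))
        for row in rows
    ]

print(drop_incomplete(rows))
print(impute_column_means(rows))
```

Deletion halves this tiny data set, which shows why it is only reasonable when few rows are affected.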

Q8. What is unsupervised learning?

Answer: Unsupervised learning aims to detect patterns in data where no labels are given.
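
Clustering is a common example. Below is a rough sketch of k-means on 1-D data in plain Python. The function name `kmeans_1d`, the starting centroids and the measurements are all assumptions made for illustration.

```python
def kmeans_1d(values, centroids, n_iterations=10):
    """Plain k-means on 1-D data, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(n_iterations):
        # Assignment step: attach each value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical unlabeled measurements forming two loose groups.
values = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centroids, clusters = kmeans_1d(values, centroids=[0.0, 5.0])
print(centroids)
print(clusters)
```

No labels are passed in anywhere: the two groups are recovered from the values alone.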

Q9. What is R-Squared?

Answer: R-Squared is a statistical measure of how close the data are to the fitted regression line: the proportion of the variance in the dependent variable that the model explains. It is also known as the coefficient of determination.
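
A minimal sketch of the usual formula, R-Squared = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations from the mean. The function name and data are made up for the example.

```python
def r_squared(observed, predicted):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    ss_tot = sum((y - mean_observed) ** 2 for y in observed)
    return 1 - ss_res / ss_tot

observed = [1.0, 2.0, 3.0, 4.0]           # hypothetical data
print(r_squared(observed, observed))      # perfect predictions
print(r_squared(observed, [2.5] * 4))     # always predicting the mean
```

Perfect predictions give 1, and a model that always predicts the mean gives 0, which are the two reference points for reading an R-Squared value.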

Q10. Why is Naive Bayes naive?

Answer: Because it assumes that all of the features in a data set are equally important and independent of one another given the class, which is rarely true in real data.
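
The independence assumption is what lets the classifier multiply one probability per feature. Here is a small sketch with made-up probabilities for a toy spam filter. The function name `naive_bayes_scores` and every number below are assumptions for illustration.

```python
def naive_bayes_scores(priors, likelihoods, features):
    """Score each class as P(class) * product of P(feature | class)."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for feature in features:
            # Independence assumption: each feature contributes its own factor.
            score *= likelihoods[label][feature]
        scores[label] = score
    # Normalize so the scores sum to 1 across classes.
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}

# Made-up probabilities for a toy spam filter.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.5, "meeting": 0.1},
    "ham": {"free": 0.1, "meeting": 0.4},
}

posterior = naive_bayes_scores(priors, likelihoods, ["free", "meeting"])
print(posterior)
```

If the words "free" and "meeting" were strongly correlated, multiplying their probabilities as if they were unrelated would distort the result. That is the "naive" part.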

Physicist turned data scientist. Creator of a Q&A card game to learn key data science concepts by playing.