The field of data science is ever-evolving, spanning several industries and requiring an extensive skill set that includes mathematics, statistics, programming and marketing. As such, being a data scientist requires an impressive blend of technical skill, creativity and communication. Job descriptions for data scientists vary greatly, though all seek candidates with a long list of desirable skills such as critical thinking, problem-solving, data analytics, emotional intelligence, attention to detail and teamwork.
This means that interview questions for data scientists can span several different topics and range from typical soft skills queries to extremely technical discussions.
Preparation is Key
Data science interviews require a lot of preparation. Whether you’re fresh out of college or you’re looking to shift to a different company or industry, you should take time to go over the major concepts of your work. Just as you know how to drive but might have trouble reciting specific rules of the road, you might get stuck in an interview trying to articulate how a specific algorithm works. To help you prepare, we’ve compiled some of the most common questions asked in an interview for the role of a data scientist. From early screenings to second- and third-stage video and on-site interviews, you’ll encounter a wide variety of assessments of your technical skills, communication abilities and work style.
Question 1: Why do you want to work for this company?
Even if you were contacted directly through your online portfolio and invited to interview for an open position, the company will still want to know why you’ve accepted and why you think you’ll be a good fit for the job. Aside from brushing up on your technical skills, your preparation for the interview should include research on the business you’re applying to. Information about their industry, mission, staff, exactly what they do and how well they’re doing it will help you craft a specifically tailored response to this question.
Explain how your skill set will help your prospective employer to meet their goals. Find a way to express passion for one or more aspects of the role or the business, such as the company’s mission, philosophy, innovation or product line. If this is your dream job, it can be worth the time to put together a data science project ahead of the interview that solves a problem for them, such as appealing to a new demographic or scheduling deliveries more efficiently.
Question 2: Tell us more about the most recent project in your portfolio.
Be prepared in any data science interview to talk extensively about all elements of your CV, portfolio or website. Tailor your response about a project to suit your audience. If it’s an initial screening or a panel with participants from a variety of departments, your focus should be on the ways your work created positive results for the client and their business. When you get to the part of the interview process where you’re meeting with another data scientist, engineer, analyst or another technical person, a more detailed description of the data and processes involved in your work is required.
Question 3: How would you explain a recommendation engine to someone from the Marketing department?
One of the important qualities that set data scientists apart from other technical geniuses is the ability to convert, display and explain data in a way that non-technical people can understand. That makes a query like this one of the most important data scientist interview questions you’ll encounter. Interviewers want to see how well you can communicate concepts like data modelling, decision trees and linear regression to any audience. In this specific case, you’ll want to first explain in simple terms how a recommendation engine works, with examples of both content-based filtering and collaborative filtering. Then, discuss how you can work with the marketing department to combine their skills of appealing to customers with the power of the algorithm that uses collected data to help pinpoint what consumers want.
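If you want a concrete picture to fall back on, the two approaches can be sketched in a few lines of Python. This is only a toy illustration: the ratings, film genres and function names below are made up for the example, and real recommendation engines are far more elaborate.

```python
from math import sqrt

# Hypothetical ratings (1-5) that users gave to films; illustrative data only.
ratings = {
    "ana": {"Alien": 5, "Blade Runner": 4, "Notting Hill": 1},
    "ben": {"Alien": 4, "Blade Runner": 5, "The Matrix": 5},
    "cat": {"Notting Hill": 5, "Love Actually": 4, "Alien": 1},
}

# Hypothetical item attributes, used only by the content-based approach.
genres = {
    "Alien": {"sci-fi", "horror"},
    "Blade Runner": {"sci-fi", "thriller"},
    "The Matrix": {"sci-fi", "action"},
    "Notting Hill": {"romance", "comedy"},
    "Love Actually": {"romance", "comedy"},
}


def cosine(u, v):
    """Cosine similarity between two users over the items both have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)


def collaborative(user):
    """Collaborative filtering: suggest what the most similar user liked."""
    others = [name for name in ratings if name != user]
    nearest = max(others, key=lambda name: cosine(ratings[user], ratings[name]))
    return sorted(
        item for item, score in ratings[nearest].items()
        if score >= 4 and item not in ratings[user]
    )


def content_based(user):
    """Content-based filtering: suggest unseen items that share a genre
    with something the user already rated highly."""
    liked = [item for item, score in ratings[user].items() if score >= 4]
    liked_genres = set().union(*(genres[item] for item in liked))
    return sorted(
        item for item in genres
        if item not in ratings[user] and genres[item] & liked_genres
    )
```

For the user "ana", both functions suggest "The Matrix", but for different reasons: the collaborative version because a like-minded user rated it highly, the content-based version because it shares a genre with films she already enjoyed. That distinction is usually the part a marketing audience finds most useful.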
Question 4: Name data scientists you most admire and explain why.
Knowing the people who are prominent in the field as well as those currently making waves will show the interviewers that you are both knowledgeable and passionate about the industry. It’s useful to discuss data scientists who are valued in the specific area you’re applying for, like finance, medicine or the stock market.
This question calls for more than just an impressive list of names. The ‘why’ part of the equation will also show your prospective employer what you value in your field and how you’ll approach your work. If your research has shown that the company values innovation, integrity or even a certain statistical method, this is a great opportunity to let them know you share those same values.
Question 5: What are the differences between supervised and unsupervised learning?
The interviewer will want you to go into more detail, so it’s important to list the specific differences and be able to speak about the various algorithms used. You’ll also want to have some examples, either generic or from a specific project you’ve worked on, to illustrate the differences between these two types of machine learning and the instances in which each might be used. For instance, unsupervised learning may be used when launching a new product where the demographics of the customers it might appeal to are unknown.
Supervised learning:
- uses known and labelled data as input
- has a feedback mechanism
- is used for prediction
- common algorithms include decision trees, logistic regression, linear regression, support vector machines and random forests
Unsupervised learning:
- uses unlabelled data as input
- has no feedback mechanism
- is used for analysis
- common algorithms include K-means clustering, hierarchical clustering, autoencoders and association rules
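A small side-by-side sketch can make the contrast easier to talk through. The data and function names here are hypothetical, and the code is deliberately minimal: a depth-one decision tree (a "stump") stands in for the supervised case, and a two-cluster version of K-means stands in for the unsupervised case.

```python
# Hypothetical data: monthly spend per customer, illustrative only.
spend = [12, 15, 18, 22, 80, 85, 91, 97]

# Labels are only available in the supervised setting.
labels = ["basic"] * 4 + ["premium"] * 4


def fit_stump(xs, ys, positive):
    """Supervised: learn the threshold t for the rule "x > t means positive"
    that misclassifies the fewest labelled examples."""
    points = sorted(set(xs))
    # Candidate thresholds are the midpoints between neighbouring values.
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]

    def errors(t):
        return sum((x > t) != (y == positive) for x, y in zip(xs, ys))

    return min(candidates, key=errors)


def two_means(xs, iterations=10):
    """Unsupervised: split unlabelled values into two clusters (K-means, k=2).

    Assumes xs contains at least two distinct values.
    """
    centres = [min(xs), max(xs)]  # simple deterministic starting point
    for _ in range(iterations):
        groups = [[], []]
        for x in xs:
            nearest = min((0, 1), key=lambda i: abs(x - centres[i]))
            groups[nearest].append(x)
        centres = [sum(g) / len(g) for g in groups]
    return centres, groups


threshold = fit_stump(spend, labels, positive="premium")  # uses the labels
centres, groups = two_means(spend)                        # never sees the labels
```

The stump needs the labels to measure its own errors, which is the feedback mechanism, and the learned threshold can then predict a label for a new customer. The clustering step only ever sees the raw numbers and returns groups that still have to be interpreted by a person.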
Question 6: How do you avoid selection bias?
This question has many variations. You may be asked to define selection bias, to explain how to avoid it or to give a specific example of how it played a role in a project you worked on. The main issue with selection bias is that conclusions are drawn from a non-random sample. The easiest solution is to always select a random sample from a clearly defined population, but you’ll need to elaborate on why that isn’t always possible.
Be mindful that selection bias can be intentional, with subject selection or data elimination purposely done to prove a pre-conceived theory or projection. This question could therefore be an indirect way for the hiring panel to ask one of those difficult interview questions about ethics and integrity at work. You’ll ultimately want to stress that selection bias is more often a case of unintentionally or unavoidably biased data.
Elaborate on some of the areas where selection bias can occur, including sampling, time interval, data and attrition. Then give some examples of how leveraging techniques like re-sampling and boosting can help you work around non-random samples. If you’re in the portion of an interview when you’re speaking with representatives from less technical departments, use a simple example which clearly illustrates selection bias.
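One simple example you could walk through is a survey in which one group answered far more often than another. The sketch below uses re-weighting (post-stratification), a close relative of re-sampling, rather than re-sampling itself. The survey scores and population shares are invented for illustration, and the function names are hypothetical.

```python
# Hypothetical survey: satisfaction scores (1-10) grouped by age band.
sample = {
    "under_30": [8, 9, 7, 8, 9, 8, 8, 9],  # over-represented: 8 of 10 responses
    "30_plus": [4, 5],                      # under-represented: 2 of 10 responses
}

# Assumed to be known from elsewhere, e.g. customer records.
population_share = {"under_30": 0.5, "30_plus": 0.5}


def naive_mean(groups):
    """Average every response equally, ignoring who happened to respond."""
    values = [v for scores in groups.values() for v in scores]
    return sum(values) / len(values)


def reweighted_mean(groups, shares):
    """Weight each group's average by its share of the real population."""
    return sum(
        shares[name] * sum(scores) / len(scores)
        for name, scores in groups.items()
    )


biased = naive_mean(sample)                            # 7.5
adjusted = reweighted_mean(sample, population_share)   # 6.375
```

The naive average looks healthier than it should because most responses came from the more satisfied group. Re-weighting only corrects for the imbalance you know about, so it is worth saying in the interview that it is a partial remedy and not a substitute for a properly random sample.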
Question 7: How can outlier values be treated?
This is a common interview question for data scientists, as it reveals how you use the data you’re given, the methods you use to process that data and whether you’re willing to put in the time to evaluate each piece of that data. You’ll first want to talk about what constitutes an outlier: for example, a value that sits far outside the main cluster of data on a graph, or one that lies more than 2–3 standard deviations from the mean.
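The standard-deviation rule of thumb is easy to demonstrate. The following is a minimal sketch using only Python's standard library. The order counts are made-up data, `flag_outliers` is a hypothetical helper, and the threshold of 2 is just one end of the 2–3 range mentioned above.

```python
from statistics import mean, pstdev


def flag_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations
    from the mean (a simple z-score rule)."""
    centre = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return []  # all values identical, so nothing can be an outlier
    return [v for v in values if abs(v - centre) / spread > threshold]


# Hypothetical daily order counts with one unusual day.
orders = [102, 98, 101, 97, 103, 99, 100, 250]
suspects = flag_outliers(orders)  # [250]
```

Flagging a value is only the first step. An extreme value also inflates the mean and standard deviation it is measured against, so in very small samples a stricter threshold such as 3 may never be reached at all.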
The next step to dealing with outliers is evaluating why they happened. A small number of outliers that can be attributed to simple human or machine error are easily eliminated. Be sure to note, however, that even a single outlier can be a key data point rather than a problem, as it may indicate the success of a single marketing tactic, new drug ingredient or product line.
Next, you’ll want to explain how to deal with a large number of outliers, which requires more complex solutions. For example, you may need to change the model you’re using, normalise the data or switch to an algorithm that is less sensitive to outliers, such as a random forest. Once again, try to use a real-life case from your experience as a data scientist to explain the correct tactics.
Question 8: Why is data cleaning important?
Data collection and cleaning are a dominant part of your job as a data scientist, and are often said to take up to 80 per cent of your time. Whatever industry you’re applying to, expect the interview to include a question about why data cleaning is important. Interviewers will also ask about your preferred techniques and programs. You should stress why clean data is necessary to draw the correct conclusions, but it’s not just about the numbers. Explain how starting with complete, accurate, valid and uniform data directly impacts their business. Key benefits to discuss include:
- improved decision-making on company objectives
- faster customer acquisition and re-targeting of past customers
- time and resource savings due to eliminating inaccurate or duplicate data
- improved productivity
- higher team morale thanks to repeated efficient and accurate results
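If you are asked about techniques, it helps to have a tiny example in mind. The sketch below shows what making data complete, valid, uniform and free of duplicates can look like in plain Python. The records, the `clean` function and the 0 to 120 age range are all assumptions chosen for illustration.

```python
# Hypothetical raw customer records with typical quality problems.
raw = [
    {"email": " Ana@Example.com ", "age": "34"},
    {"email": "ana@example.com", "age": "34"},   # duplicate once normalised
    {"email": "ben@example.com", "age": ""},     # incomplete: age missing
    {"email": "cat@example.com", "age": "-5"},   # invalid: negative age
    {"email": "dan@example.com", "age": "41"},
]


def clean(records):
    """Return records with uniform emails, valid integer ages and no duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        # Uniform: strip whitespace and lower-case the email address.
        email = record["email"].strip().lower()
        # Complete and valid: the age must parse and fall in an assumed range.
        try:
            age = int(record["age"])
        except ValueError:
            continue
        if not 0 <= age <= 120:
            continue
        # Deduplicate on the normalised email.
        if email in seen:
            continue
        seen.add(email)
        cleaned.append({"email": email, "age": age})
    return cleaned


customers = clean(raw)
```

Here incomplete and invalid rows are simply dropped, which is only one option. Depending on the project you might instead fill in missing values or send suspect rows back to their source for correction, and explaining that trade-off is a good way to show judgement.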
Question 9: What is the goal of A/B testing?
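A/B testing compares two versions of something, such as a web page or an email, to see which performs better against a chosen metric like conversion rate. One way to set yourself apart in answering this type of interview question is to discuss how other data scientists might draw the wrong conclusions from A/B testing. Possible pitfalls include: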
- not collecting enough data over a long enough period of time
- testing too many variables at once
- not accounting for external factors that could affect traffic during the testing period
- ignoring small gains that can build over time and combine with other positive changes for increased revenue
- missing big picture interpretations like net financial gains or losses relative to conversion rates
Aside from pointing out these problems, you’ll need to express how you would solve them – or, better still, how you already have avoided them in your previous data science projects.
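To make the first pitfall concrete, here is a rough sketch of a two-proportion z-test, one common way to judge whether a difference in conversion rates is likely to be real. The visitor and conversion counts are invented, and the function name is hypothetical. In practice you would probably reach for a statistics library instead of hand-rolling this.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    Assumes large samples and a pooled rate strictly between 0 and 1.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)


# Hypothetical test: 10% vs 12.5% conversion with 2,000 visitors per variant.
z_large, p_large = two_proportion_z_test(200, 2000, 250, 2000)

# The same rates with only 200 visitors per variant.
z_small, p_small = two_proportion_z_test(20, 200, 25, 200)
```

With 2,000 visitors per variant the p-value falls below the conventional 0.05 level, while the identical conversion rates measured on 200 visitors each do not come close. That is a compact way to show why stopping a test too early can lead to the wrong conclusion.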
Question 10: You have 48 hours to solve this coding challenge.
The coding challenge may be an initial way to screen potential data scientists, or it may be a second step in the interview process after you’ve cleared the first hurdle with a recruiter or hiring manager. This can be an on-site test that takes 30 minutes to 2 hours, where you’ll be coding on a whiteboard or at a keyboard within view of the interviewer. You’re often given a choice of language, but be prepared to code in SQL or Python.
Some companies assign longer tasks, with deadlines up to a week. Whiteboard challenges may require writing fairly simple SQL queries, but longer tests are, of course, more complex. Typically, you’ll be given data and asked to make specific predictions using that data, and you’ll have to show your work.
This can be a nerve-wracking interview experience, so prepare yourself by creating and completing practice coding challenges with friends or colleagues in the data science field. You can also visit sites like Leetcode and SQLZOO for coding exercises.
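You can also build small exercises of your own. The sketch below uses Python's built-in sqlite3 module to practise the kind of fairly simple SQL query a whiteboard challenge might involve. The table, its contents and the question being answered ("which customers spent more than 25 in total?") are made up for practice.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
    INSERT INTO orders (customer, amount) VALUES
        ('ana', 30.0), ('ben', 12.5), ('ana', 20.0), ('cat', 45.0), ('ben', 7.5);
""")

# Total spend per customer, keeping only those above 25, highest first.
query = """
    SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) > 25
    ORDER BY total DESC, customer
"""
rows = conn.execute(query).fetchall()
conn.close()
```

Here `rows` holds one tuple per qualifying customer. Practising aloud why the filter belongs in HAVING and not in WHERE is the sort of explanation an interviewer is likely to ask for.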
Interview questions for data scientists can be difficult, and the overall process can be lengthy and gruelling. One of the most important things to do is stay positive, even if you feel that a portion of the interview process went poorly.
We’re often harder on ourselves than others are on us, and you could still land the job despite not answering every question as well as you would have liked. If you miss out on the opportunity, ask for feedback and use it to improve your next interview. After all, many well-established data scientists were rejected from several positions and still went on to succeed in jobs that ultimately were the better fit!