As Duke behavioral economics professor Dan Ariely once famously said, big data is a lot like teenage sex: everyone talks about it, no one really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it. As you might have heard, or felt in your own recruiting efforts, data scientists are a veritable rara avis: hard to find, let alone recruit. However, one could argue that much of this perceived scarcity comes from treating data science as a catch-all role. Companies go looking for data scientists who can build and maintain a data warehouse, set up a data pipeline for analysis, run analyses that reveal groundbreaking insights every time, and then turn around and build resilient production systems running perfectly optimized machine learning algorithms that automate those analyses and predictions and run them seamlessly every time a customer logs in.
Talk about wishful thinking. Given this job description, it is no wonder that positions remain unfilled for months and even years at a time, while companies sit on unproductive big data assets that could otherwise give them a solid competitive edge in the marketplace.
In my experience, however, great data scientists are rarely lone wolves who work in solitude and emerge only occasionally to deliver brilliant insights. In most companies that use it successfully, data science is a team sport, where data analysts, engineers and scientists work together.
Here is a brief overview of what each of these professionals does, and how they complement each other.
Data analysts are interpreters of structured data. They are spreadsheet whizzes, and most even write SQL queries to extract data from relational databases. Data analysts can be found in many functional groups within a company, including finance, marketing, operations, and business intelligence.
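To make the SQL part concrete, here is a minimal sketch of the kind of query an analyst might write. The `orders` table, its columns and the sample rows are made up for illustration, and an in-memory SQLite database stands in for whatever relational database a company actually uses.

```python
import sqlite3


def revenue_by_region(rows):
    """Load (region, amount) rows into a throwaway table and total them per region."""
    conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
    try:
        # Hypothetical schema, used only for this example.
        conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        query = """
            SELECT region, SUM(amount) AS revenue
            FROM orders
            GROUP BY region
            ORDER BY revenue DESC
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


# Made-up sample data.
sample_orders = [("East", 120.0), ("West", 80.0), ("East", 30.0), ("North", 50.0)]

for region, revenue in revenue_by_region(sample_orders):
    print(f"{region}: {revenue:.2f}")
```

The query itself is the part that carries over to a real warehouse; the surrounding Python only exists to make the example self-contained.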
While this role does not get nearly the same level of publicity as the more glamorous-sounding data scientist and data engineer, it can be a very fulfilling job in the right company, as well as a gateway to higher-level analytics positions. Data analysts can have a lot of freedom in choosing the directions they take in their analyses. They also often get to see their work directly inform management decisions on a daily basis, which can be very satisfying and a great source of professional pride.
What is more, as a data analyst, you will develop highly transferable skills, which you can apply to many other roles in a variety of industries if you ever wish to change tracks in your career. In addition, this role offers exposure to a variety of tools and analysis techniques, which not only increase your marketability as a data analyst, but are also useful in data science work, which can often be the next step on the career ladder for a seasoned analyst.
Data analysts typically have a bachelor's degree, though that is not always required if you can show that the skills you acquired in previous jobs are relevant to this role.
Some good sources for further reading on doing analytics are the following:
(The first two of these are available for free if you have a Kindle Unlimited subscription, and Anil’s book is only available in ebook format, though it is totally worth a read.)
- Data Analytics for Beginners by Paul Kinley
- Data Analysis Made Accessible by Anil Maheshwari
- Storytelling with Data: A Data Visualization Guide for Business Professionals by Cole Nussbaumer Knaflic
It has been said that a data scientist is someone who is better at statistics than any software engineer, and better at software engineering than any statistician. This saying is actually not very far from the truth, in my opinion. However, I would add that a data scientist should also be able to make their results and findings accessible to non-technical audiences, such that business stakeholders can rally around the findings and data products put forth, and see to it that they are used effectively for the benefit of the organization.
In short, data scientists are interpreters of unstructured data. A data scientist is typically able to fetch data from public APIs, integrate heterogeneous data from multiple sources, clean it, and extrapolate from it to fill in missing values. Afterwards, they are able to formulate hypotheses and test them through the use of math, statistics, visualization and predictive modeling. Once they see results, data scientists then communicate them to stakeholders, working with them to translate these results into business action items.
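The workflow described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the two data sources, the customer ids and the spend figures are invented, mean imputation is a deliberately simple stand-in for more careful ways of filling in missing values, and Welch's t statistic is just one of many ways to test a hypothesis about two groups.

```python
from math import sqrt
from statistics import mean, stdev


def integrate(signups, spend):
    """Join two sources on customer id; spend is None when a customer has no record."""
    return [
        {"id": cid, "group": group, "spend": spend.get(cid)}
        for cid, group in signups.items()
    ]


def impute_mean(records, field):
    """Replace missing values of `field` with the mean of the observed values."""
    observed = [r[field] for r in records if r[field] is not None]
    fill = mean(observed)
    return [{**r, field: fill if r[field] is None else r[field]} for r in records]


def welch_t(a, b):
    """Welch's t statistic for the difference between two sample means."""
    return (mean(a) - mean(b)) / sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))


# Hypothetical source 1: which experiment group each customer was assigned to.
signups = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
# Hypothetical source 2: spend per customer; customer 3 is missing.
spend = {1: 10.0, 2: 12.0, 4: 20.0, 5: 22.0, 6: 24.0}

records = impute_mean(integrate(signups, spend), "spend")
group_a = [r["spend"] for r in records if r["group"] == "A"]
group_b = [r["spend"] for r in records if r["group"] == "B"]

# Hypothesis: group B spends more than group A, so the statistic should be negative.
t_stat = welch_t(group_a, group_b)
print(f"mean A = {mean(group_a):.2f}, mean B = {mean(group_b):.2f}, t = {t_stat:.2f}")
```

In practice most of this would be done with libraries such as pandas and SciPy, and the result would still need to be translated into plain language for stakeholders, which is the step no code can do for you.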
Many data scientists working in industry have a PhD or another advanced degree, but I have also met many accomplished data science practitioners who started in the job with only a bachelor’s degree and relevant work experience.
In terms of further reading, the following books cover everything from intro to advanced topics in data science. Master the concepts in these three books, and you will know more than 99% of all Data Scientists out there.
- Data Science from Scratch: First Principles with Python by Joel Grus
- Programming Collective Intelligence by Toby Segaran
- Doing Data Science: Straight Talk from the Frontline by Cathy O'Neil and Rachel Schutt
Data engineers are usually data infrastructure engineers, responsible for building and maintaining the infrastructure that transports and houses big data. A data engineer will be the one setting up and configuring a Hadoop cluster, building a Spark Streaming pipeline, or migrating a company’s data assets to a public cloud service such as AWS. In some companies, machine learning engineers are also called data engineers, though the requirements of the two roles can differ greatly.
Most of the data engineers I’ve met started out as backend or full stack developers, developed an interest in data technologies, and taught themselves Hadoop, Spark and/or AWS before transitioning to data engineering. Advanced degrees are typically not required for this role.
For further reading on data engineering technologies and how to get started with each, here are some books I recommend: