The Data Scientist’s mindset

Once we have a clear idea of what Data Science is and what some of its downsides and limitations are, we can start to study what it really takes to be a Data Scientist.

In this post you will learn the main traits and qualities you need in Data Science and the mindset you have to develop to become a great Data Scientist.

Post content:

1. Data Science approach
2. Data Scientist qualities
3. Thinking like a Data Scientist
4. Wrong mindset consequences

1. Data Science approach

When we first approach the field of Data Science, it is easy to get lost in lists of keywords and technologies that you must learn to become a Data Scientist, but there are several aspects we should learn before we start opening the toolbox.

If you are new to Data Science, you probably start by typing “How can I start learning Data Science” or something similar into Google. Then you open one of the first links in the search results and get a list similar to the following one (most valuable skills to learn for a Data Scientist):

  • In-depth knowledge of Python coding. It is the most common language, along with Perl, Ruby, etc.
  • Sound knowledge of SAS/R.
  • A Data Scientist must be able to work with unstructured data, whether it comes from videos, social media, etc.
  • Sound skills in SQL database coding.
  • A Data Scientist should have a good understanding of various analytical functions, for example rank, median, etc.
  • In-depth knowledge of Machine Learning is required.
  • A Data Scientist should be familiar with Hive, Mahout, Bayesian networks, etc. In Data Science, knowledge of MySQL is an added advantage.

While these skills are great to have for a Data Scientist (some of them are actually necessary, as we will see), the first thing to do is to learn to think like a Data Scientist: to immerse yourself in a problem, ask questions and make hypotheses, and only after that start thinking about the tools and technologies.

The languages you learn, the technologies you use, and the way you frame your thoughts will be a byproduct of your attempts to solve the problem.

2. Data Scientist qualities

It is important to note the difference between a skill and a quality or trait. You can find a good description of both in Do You Understand the Difference Between Skills and Traits?:

  • Skill: the ability to do something that comes from training, experience, or practice.
  • Trait: a quality that makes one person or thing different from another.

In Data Science, skills relate to programming languages, libraries, algorithms, etc. Traits are harder to tie specifically to Data Science, because they are more general and apply to every field. Below is a list of some of the qualities of a great Data Scientist.

2.1. Curiosity

The work of a Data Scientist is in a way similar to the work of a detective: to ask questions about data and people.

5.1 - Data science detective

It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts – Sherlock Holmes

Curiosity is a quality related to inquisitive thinking such as exploration, investigation, and learning, evident by observation in humans and other animals. Curiosity is heavily associated with all aspects of human development, from which derives the process of learning and the desire to acquire knowledge and skill.

When first approaching a problem or a dataset, a curious Data Scientist will come up with lots of questions and ideas to investigate. This sense of wonder is what keeps the Data Scientist looking for explanations and discoveries about how something works or why it works that way.

The dominant trait among data scientists is an intense curiosity, a desire to go beneath the surface of a problem, find the questions at its heart, and distill them into a very clear set of hypotheses that can be tested – DJ Patil

Curiosity is also necessary when it comes to keeping up to date in the field. Data Science is an ever-evolving field, and curiosity helps you to build and maintain the necessary skills.

2.2. Creativity

Creativity is the act of turning new and imaginative ideas into reality. Creativity is characterised by the ability to perceive the world in new ways, to find hidden patterns, to make connections between seemingly unrelated phenomena, and to generate solutions.

5.2 - Data science creativity

The ability to find hidden patterns and to make connections is a great trait to have in Data Science. It allows you to ask new questions and to find solutions in a new, sometimes unorthodox way. Creativity means being able to come up with something new.

Creativity allows the Data Scientist to think “outside the box”, to look at the problems in a new way and to perceive patterns that are not obvious. The application of creativity to problem solving is known as Lateral Thinking.

Lateral thinking is solving problems through an indirect and creative approach, using reasoning that is not immediately obvious and involving ideas that may not be obtainable by using only traditional step-by-step logic.

2.3. Pragmatism

As a Data Scientist working on a project, it is important to be practical and to consider the resources, time and effort needed for a given solution. It’s easy to be tempted to use the latest technology or the most complex algorithm to solve any problem, but a good Data Scientist always keeps in mind what is important for the project and/or the company.

One example of pragmatism in Data Science is the concept of ROI (Return On Investment), a performance measure used to evaluate the efficiency of an investment or to compare the efficiency of a number of different investments. A great Data Scientist will have a data strategy in which the ROI is at the center. From The ROI of a Modern Data Strategy:

A modern data strategy will identify the optimal projects and corresponding implementation order so you can get the fastest ROI. In terms of measuring ROI, one of the best places to start is by looking at decreased costs or increased revenue that result from each project.

5.3 - ROI roadmap project
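To make the idea of ordering projects by ROI concrete, here is a minimal sketch in Python. It assumes the common definition ROI = (gain - cost) / cost. The project names and figures are made up for illustration and do not come from the article quoted above.

```python
def roi(gain: float, cost: float) -> float:
    """Return on investment as a fraction: (gain - cost) / cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (gain - cost) / cost


# Hypothetical projects: (name, expected gain, expected cost).
# All figures are invented for this example.
projects = [
    ("churn model", 120_000, 40_000),
    ("deep learning recommender", 150_000, 100_000),
    ("sales dashboard", 30_000, 5_000),
]

# Rank projects from highest to lowest ROI.
ranked = sorted(projects, key=lambda p: roi(p[1], p[2]), reverse=True)

for name, gain, cost in ranked:
    print(f"{name}: ROI = {roi(gain, cost):.0%}")
```

In this toy example the least glamorous project (the dashboard) comes out on top, which is the kind of result a pragmatic Data Scientist should be ready to accept.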

The key (and hardest) part is finding the balance between pragmatism and other desirable qualities such as creativity and work quality. Striking this balance is something of an art, because there is a clear tradeoff between perfect, innovative work and limited resources and time.

2.4. Persistence

Data Science is hard. After many days (even months) of work you could end up with an irrelevant solution to your problem. You need to be persistent to stick with a frustrating problem.

Data do not give up their secrets easily. They must be tortured to confess.

Sometimes, especially in big companies, a Data Scientist has to deal with bureaucracy and with obstacles put in the way of their projects and ideas. Persistence is necessary here to make yourself heard and to stick with your idea.

Persistence is also known as grit:

Grit is that inner drive that pulls top data scientists over obstacles, recasts setbacks as design constraints, propels them through fear of failure, keeps them walking through actual failure, helps them to resist the impulse to take things personally, and brushes that dirt off their shoulders.

2.5. Clarity

Clarity refers to understanding what we’re doing and why we’re doing it at every step of a Data Science project. As the name implies, Data Science is a science, and the worst thing a Data Scientist can do is to just try things at random with the data.

If you have a clear understanding of the whats and whys, it will be easy to explain your work to someone else, especially to audiences that do not work with data. This is key for any Data Scientist, because, as we saw in our first article, the ability to communicate clearly and effectively about the patterns found in data is a must in Data Science.

A great data scientist can contextualize and translate a problem and its solution to interested parties of wildly varying backgrounds using common ground, metaphor, skillful listening, and storytelling. This includes the written communication that goes into a statement of work or a report, visual communication for clear and intuitive plots and visualization, and spoken communication for presentations, project specifications, check-in meetings, and iterative design.

2.6. Skepticism

Skepticism is generally any questioning attitude or doubt towards one or more items of putative knowledge or belief.

Like pragmatism, skepticism sits on the other side of the balance from creativity. While being creative, a Data Scientist must keep at least one foot on the ground, and that is exactly what some skepticism is for.

This skeptical attitude is well captured by a quote from the statistician George E. P. Box:

All models are wrong but some are useful.

5.4 - Data skepticism

From the ebook On Being a Data Skeptic:

“Data is here, it’s growing, and it’s powerful.” Author Cathy O’Neil argues that the right approach to data is skeptical, not cynical: it understands that, while powerful, data science tools often fail. Data is nuanced, and “a really excellent skeptic puts the term ‘science’ into ‘data science.’” The big data revolution shouldn’t be dismissed as hype, but current data science tools and models shouldn’t be hailed as the end-all-be-all, either.

2.7. Humility

Because of the high level of knowledge required, a Data Scientist could be tempted to feel some kind of intellectual superiority over others. It is common to read about Data Unicorns or Data Wizards, and while most of the time this comes with some dose of humour, it is important for the Data Scientist to be humble.

Humility allows you to improve and keep looking for new things to learn, because you can recognise that you don’t know everything (and probably you are far from it).

A humble Data Scientist will also adapt to the audience, setting aside pride and explaining the project without fancy vocabulary when it is not necessary.

5.5 - Data science unicorn

You don’t have to be a unicorn. We’re looking for people who have one of the major skill sets and some comfort level with the others; the ability to be creative, handle ambiguity and communicate well.

3. Thinking like a Data Scientist

3.1. Analytical thinking

The ability to think analytically is mandatory for any Data Scientist. Data Science is about solving problems, so a great Data Scientist spends most of their time asking questions and looking for new problems to solve.

Of the traits listed in the previous section, curiosity and skepticism are the ones with the most influence on the way a Data Scientist thinks:

  • Curiosity is what keeps a Data Scientist asking questions and then asking some more (What?, Why? and So what?). These questions are the foundation of the hypotheses that become great problems to solve.
  • Skepticism is what allows a Data Scientist to see things with a critical eye and to question everything they are given. It is very important to be aware of any assumptions derived from the data.

The real essential skill of a data scientist is the ability to understand the business and the problem, and the intellectual curiosity to want to do so.

3.2. Think like a Bayesian

Thinking like a Bayesian means updating your personal beliefs as new information arises. To form new beliefs you must incorporate both newly observed information and prior information formed through intuition and experience.

5.6 - Bayesian inductive cycle

It is common to base decisions on what happened most recently (this is called recency bias), but if you adopt a Bayesian approach to thinking you will improve your understanding of the world with each new piece of evidence.

Another way this approach helps is by guarding you against absolute certainty. There is no such thing as perfect certainty, and as a Data Scientist you should think about everything in terms of how probable it is.
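The updating process described in this section can be sketched with Bayes' rule, P(H | E) = P(E | H) P(H) / P(E). The following Python example is only an illustration: the hypothesis (a new feature improves conversion), the starting belief of 0.5 and the two likelihoods are all hypothetical numbers chosen for the example.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior P(H | E) computed with Bayes' rule."""
    numerator = p_evidence_if_true * prior
    evidence = numerator + p_evidence_if_false * (1.0 - prior)
    return numerator / evidence


# Hypothetical likelihoods: how often an experiment "wins"
# when the hypothesis is true and when it is false.
P_WIN_IF_TRUE = 0.8
P_WIN_IF_FALSE = 0.3

belief = 0.5          # prior belief before seeing any data (assumed)
history = [belief]

# Three experiment outcomes: two wins followed by one loss.
for experiment_won in [True, True, False]:
    if experiment_won:
        belief = bayes_update(belief, P_WIN_IF_TRUE, P_WIN_IF_FALSE)
    else:
        belief = bayes_update(belief, 1 - P_WIN_IF_TRUE, 1 - P_WIN_IF_FALSE)
    history.append(belief)
    print(f"experiment won = {experiment_won}: belief = {belief:.3f}")
```

Notice two things: the final belief depends on all the evidence together with the prior, not only on the last experiment, and it never reaches exactly 0 or 1.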

3.3. Business communication

Relating to technology, a Data Scientist should think carefully before using a tool (language, algorithm, etc.). The Data Scientist’s toolbox is not useful until you have a clear understanding of the problem to solve.

Before languages and technologies, a Data Scientist should develop some business acumen and be able to comfortably explain their findings and solutions to non-expert colleagues. Without this, the actionable insights a Data Scientist produces will not reach their potential impact.

A Data Scientist has to think of themselves as an ambassador between the data and the company.

From The one language a Data Scientist must master….:

[Data Scientists] have to be able to present their findings in a clear and simple way – in the language of their business. Not all people understand the technical jargon. [Data Scientists] who can explain what they have achieved without blowing my mind with jargon are those who usually go far. Accurate numbers and graphs are one thing, but only the Data Scientist understands them well enough to be able to draw the crucial business conclusions. They have to interpret and translate.

4. Wrong mindset consequences

When, as a Data Scientist, you fail to have the right mindset, you can fall victim to one (or several) of the many cognitive biases.

A cognitive bias is a systematic pattern of deviation from norm or rationality in judgment. Individuals create their own “subjective social reality” from their perception of the input. An individual’s construction of social reality, not the objective input, may dictate their behaviour in the social world. Thus, cognitive biases may sometimes lead to perceptual distortion, inaccurate judgment, illogical interpretation, or what is broadly called irrationality.

5.7 - Cognitive biases

A common example of cognitive bias, related to thinking like a Bayesian, is the base rate fallacy. The base rate fallacy is the tendency of the mind to ignore base rate information (i.e. generic, general information) and focus on specific information (information pertaining only to a certain case) when presented with both.
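A small worked example shows how misleading it is to ignore the base rate. The Python sketch below uses a hypothetical fraud detection scenario, and all three rates are invented for illustration. Intuition tends to focus on the specific information (the model catches 99% of fraud) and forget the generic information (only 1% of transactions are fraudulent).

```python
# Hypothetical fraud-detection scenario; all three rates are made up.
base_rate = 0.01            # 1% of all transactions are fraudulent
true_positive_rate = 0.99   # the model flags 99% of fraudulent transactions
false_positive_rate = 0.05  # it also flags 5% of legitimate transactions

# Total probability that any given transaction is flagged.
p_flagged = (true_positive_rate * base_rate
             + false_positive_rate * (1 - base_rate))

# Bayes' rule: probability that a flagged transaction is really fraud.
p_fraud_given_flag = true_positive_rate * base_rate / p_flagged

print(f"P(flagged)         = {p_flagged:.4f}")
print(f"P(fraud | flagged) = {p_fraud_given_flag:.1%}")
```

With these numbers only about one flagged transaction in six is actually fraudulent, even though the model looks "99% accurate" on fraud, because legitimate transactions vastly outnumber fraudulent ones.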

With the wrong mindset it is also easy to forget that, in Data Science, data is important, but science is key. Or, as stated in the Simply Statistics blog, The key word in “Data Science” is not Data, it is Science.

Data science is only useful when the data are used to answer a question. That is the science part of the equation. The problem with this view of data science is that it is much harder than the view that focuses on data size or tools.

This is the third in a series of articles that will introduce you to the data science field. In the following articles we will describe the tools and technologies needed to become a data scientist.

If you like what you just read, please share it and make sure you are subscribed to our email newsletter.
