Introduction

When I started this blog, the first article I published was titled “What is data science?” Now that I have some experience in the field, I thought it would be interesting to explore my current perspective on some commonly used but frequently undefined terms. The terms I am choosing to focus on are the following:

  • Analytics
  • AI (Artificial Intelligence)
  • Machine learning
  • Supervised, unsupervised and semi-supervised learning
  • Deep learning
  • Simulation
  • Probability
  • Statistics

Since my first post, I have discovered that people have their own fairly unique perspectives on, and “mental maps” of, how the terms, techniques and disciplines above interrelate. The following is the mental map that I currently use. In many cases, the terms here overlap with each other. Some terms, such as Optimization, are perhaps missing altogether. I am aware of both the overlaps and the omissions. I further realize that, as these fields mature and new fields emerge (such as quantum computing), my map will likely change to reflect the “state of the art and science.”

A single image is worth a thousand words.

An Analytics Perspective

[Diagram: An Analytics Perspective]

Comments about the diagram…

As mentioned above, I choose to use the all-encompassing term “Analytics”. My choice of this term is based on the fact that I have a Master of Science degree from the Georgia Institute of Technology… in “Analytics”.

At the current time, I see Analytics as consisting of Artificial Intelligence, Simulation, and Probability and Statistics. I am considering replacing “Simulation” with the broader term “Operations Research.” I think this substitution will allow me to more clearly place “Optimization” on the diagram.

I view Statistics and Probability as ways to describe the relationship between a population and samples of the population. In probability, we often make statements about the likelihood of some event occurring based on some preconditions. For example, what is the probability that I will pick a green marble out of a bag that contains 10 green marbles and 10 red marbles? In the reverse case, we might have, say, 20 individual samples that we have drawn at random from a big bag of marbles. From these samples, we may want to estimate how many marbles of a certain color there are in the overall population. When we talk about going from populations to samples, we use probability. When we make inferences about populations based on samples, we use statistics.
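The marble example above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names are made up for this post, and the “big bag” of 300 green and 700 red marbles is a hypothetical population that the analyst would not normally get to see.

```python
import random
from fractions import Fraction


def probability_of_green(green: int, red: int) -> Fraction:
    """Population -> sample: chance that a single random draw is green."""
    return Fraction(green, green + red)


def estimate_green_share(sample: list[str]) -> float:
    """Sample -> population: use the sample proportion as a point estimate."""
    return sample.count("green") / len(sample)


# Probability: the contents of the bag are known (10 green, 10 red).
p_green = probability_of_green(10, 10)

# Statistics: the contents are unknown to the analyst.
# Hypothetical population, used here only so there is something to sample.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
big_bag = ["green"] * 300 + ["red"] * 700
sample = rng.sample(big_bag, 20)  # 20 draws without replacement

share = estimate_green_share(sample)
estimated_green = round(share * len(big_bag))

print(f"P(green) from the known bag: {p_green}")
print(f"Estimated share of green from 20 draws: {share:.2f}")
print(f"Estimated number of green marbles in the big bag: {estimated_green}")
```

The first function reasons from a known population to a single draw, which is probability. The second reasons from 20 draws back to the population, which is statistics. With only 20 draws, the estimate can land some distance from the true share, which is why a real analysis would normally report an interval rather than a single number.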

To me, the term Data Science is redundant. The scientific method involves collecting data by conducting experiments in order to support or refute a hypothesis. By this definition, it seems to me that anything considered science is also data science. You cannot take the data out of science without it becoming something else.

Wikipedia defines data science as: “Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from noisy, structured and unstructured data, and apply knowledge and actionable insights from data across a broad range of application domains. Data science is related to data mining, machine learning and big data.”

Yeah, ok. To me, the above just seems like an attempt to glorify the term “data science.” That said, I am ok with someone saying “I am doing data science!”, if they are using data and following the scientific method.

The fact that I am being a bit negative about the term “data science” on a blog that is hosted at datascience.netlify.app isn’t lost on me. In full disclosure, I should probably also mention that my current job title is “Lead Data Scientist”. When I first started writing this blog, I was very excited about the term “Data Science”. As I have gained more knowledge in this space, I have also come to realize how huge it is and how much there is that I don’t know. I am tempted to say that, if I had to do it all over again, I might not use “datascience” in the name of my blog. That, however, wouldn’t be truthful. The term “data science”, for better or worse, seems to have stuck. Call it marketing, I guess. I plan on continuing to write about topics in Analytics on my blog: “datascience.netlify.app”!

Disclaimer

Again, this “mental map” is based on my academic work and what I see in my career. It likely will not match everyone else’s, and I am just fine with other definitions.