Talk the Talk: Data Science Jargon for Everyone Else (Part 1: The Basics)

Bite-sized bits of data science for the non-data scientist

Disclaimer: all terms marked with * will be defined at a later point, as will many others.

data scientist: a role that includes basic engineering, analytics, and statistics; often builds machine learning models

  • depending on the company, might be a product analyst, research scientist, statistician, AI specialist, or other
  • a job title made up by a guy at Facebook and a guy at LinkedIn trying to get better candidates for advanced analytics positions

in a sentence: We need to hire a data scientist!

data science: advanced analytics, plus coding and machine learning

in a sentence: We need to hire a data science team!

artificial intelligence: the ability of a machine to draw inferences from input without human direction

  • used to describe everything from basic data science to self-driving cars to Ava from Ex Machina
  • no one agrees on a definition
  • “AI is whatever hasn’t been done yet.” ~ Larry Tesler, as quoted by Douglas Hofstadter

in a sentence: We just got our AI startup funded–join our team and become our first data scientist!

machine learning: algorithms and statistical models that enable computers to uncover patterns in data

  • claims a large part of old-school statistics as its own, plus some fun new algorithms
  • it’s probably logistic regression. or a random forest….or linear regression.
  • AI if you’re feeling fancy

in a sentence: We need you to machine learn [the core of our startup].

model: a hypothesized relationship in the data, usually associated with an algorithm

  • may be used to refer to the algorithm itself, the relationship, a fit* model, the statistical model, maybe the mathematical model, perhaps Emily Ratajkowski, certainly not a small locomotive
  • it’s probably logistic regression. or random forest….or linear regression.

in a sentence: I’m training a model. (and not how to turn left)
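For the curious, here is roughly what "training a model" can look like at its smallest. This is only a sketch: the wine numbers are made up, the helper names `fit_line` and `predict` are hypothetical, and the "hypothesized relationship" is assumed to be a straight line fitted by ordinary least squares (the simplest flavor of linear regression).

```python
# Made-up toy data: sommelier points and bottle price in dollars.
points = [85, 88, 90, 92, 95]
prices = [15.0, 21.0, 25.0, 29.0, 35.0]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# "Training" the model: estimating the two numbers that define the line.
slope, intercept = fit_line(points, prices)


def predict(wine_points):
    """Use the fitted relationship to guess a price for an unseen wine."""
    return slope * wine_points + intercept


print(f"price ~ {slope:.2f} * points + {intercept:.2f}")
print(f"predicted price for a 91-point wine: {predict(91):.2f}")
```

The hypothesis is "price is a straight-line function of points"; the training step just finds the slope and intercept that best match the data.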

feature: an input to the model; x value; a predictor; a column of input data

  • when a data scientist is ‘engineering,’ this is usually what they’re making
  • if your model says that wine points predict price, then the sommeliers’ ratings are your feature

in a sentence: Feature engineering will make or break this model.
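To make "engineering" a little more concrete, here is a minimal sketch of turning raw rows into features. Everything in it is an assumption for illustration: the wines, the column names, the `REFERENCE_YEAR` constant, and the `engineer_features` helper are all hypothetical.

```python
# Made-up raw rows; none of these wines or numbers come from real data.
raw_wines = [
    {"name": "Casa Roja Reserva", "vintage": 2015, "points": 92, "price": 40.0},
    {"name": "Blue Hill Chardonnay", "vintage": 2019, "points": 87, "price": 18.0},
]

REFERENCE_YEAR = 2020  # assumed "today", fixed so the example is reproducible


def engineer_features(row):
    """Turn one raw row into model inputs (features)."""
    return {
        # A raw column passed through unchanged.
        "points": row["points"],
        # A derived column: how old the wine is, in years.
        "age_years": REFERENCE_YEAR - row["vintage"],
        # A flag pulled out of free text: 1 if the name mentions a reserve.
        "is_reserve": int("reserv" in row["name"].lower()),
    }


# One feature dictionary per wine: the x values the model gets to see.
features = [engineer_features(row) for row in raw_wines]
# The price column is kept apart: it is what the model is asked to predict.
prices = [row["price"] for row in raw_wines]

for feature_row, price in zip(features, prices):
    print(feature_row, "->", price)
```

The point is that `age_years` and `is_reserve` did not exist in the raw data; someone had to decide they might matter and build them.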

label: the output of a predictive model; y value; the column of output data

  • the thing the startup got funded to predict