Talk the Talk: Data Science Jargon for Everyone Else (Part 1: The Basics)

Bite-sized bits of data science for the non-data scientist

Disclaimer: All terms marked with * will be defined at a later point, as will many others.

data scientist: a role that includes basic engineering, analytics, and statistics; often builds machine learning models

  • depending on the company, might be a product analyst, research scientist, statistician, AI specialist, or something else
  • a job title made up by a guy at Facebook and a guy at LinkedIn trying to get better candidates for advanced analytics positions

in a sentence: We need to hire a data scientist!

data science: advanced analytics, plus coding and machine learning

in a sentence: We need to hire a data science team!

artificial intelligence: the ability of a machine to draw inferences from input without human direction

  • used to describe everything from basic data science to self-driving cars to Ava from Ex Machina
  • no one agrees on a definition
  • “AI is whatever hasn’t been done yet.” ~ Douglas Hofstadter 

in a sentence: We just got our AI startup funded–join our team and become our first data scientist!

machine learning: algorithms and statistical models that enable computers to uncover patterns in data

  • claims a large part of old school statistics as its own, plus some fun new algorithms
  • it’s probably logistic regression. or a random forest….or linear regression.
  • AI if you’re feeling fancy

in a sentence: We need you to machine learn [the core of our startup].

model: a hypothesized relationship within the data, usually associated with an algorithm

  • may be used to refer to the algorithm itself, the relationship, a fit* model, the statistical model, maybe the mathematical model, perhaps Emily Ratajkowski, certainly not a small locomotive
  • it’s probably logistic regression. or random forest….or linear regression.

in a sentence: I’m training a model. (and not how to turn left)

feature: an input to the model; x value; a predictor; a column of input data

  • when a data scientist is ‘engineering,’ this is usually what they’re making
  • if your model says that wine points predict price, then sommeliers’ ratings are your feature

in a sentence: Feature engineering will make or break this model.
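For the technically curious, here is a minimal sketch of what "engineering" a few features can look like, in the spirit of the change-in-weekly-spend columns from the churn story later in this post. The spend numbers, customer names, and the `spend_change_features` helper are all hypothetical, made up for illustration; the point is only that eleven weeks of raw spend become ten new input columns, one per week-over-week change.

```python
# Made-up raw data: total spend per week for 11 consecutive weeks,
# oldest first, for two hypothetical customers.
weekly_spend = {
    "customer_a": [120, 115, 130, 90, 95, 60, 70, 40, 45, 20, 10],
    "customer_b": [50, 55, 52, 60, 58, 61, 65, 63, 70, 72, 75],
}


def spend_change_features(spend):
    """Turn N weeks of spend into N - 1 week-over-week change columns."""
    return {
        f"spend_change_week_{i}": spend[i] - spend[i - 1]
        for i in range(1, len(spend))
    }


# One row per customer, one engineered feature column per weekly change.
feature_table = {
    customer: spend_change_features(spend)
    for customer, spend in weekly_spend.items()
}

for customer, row in feature_table.items():
    print(customer, row)
```

Each key in a row (`spend_change_week_1` through `spend_change_week_10`) is a feature: ten freshly "shipped" columns, in data science terms.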

label: the value a predictive model learns to predict; y value; the target; the column of output data

  • the thing the startup got funded to predict
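To see features and labels side by side, here is a minimal sketch using the wine example above: sommelier points are the feature (x) and price is the label (y). The numbers are made up and deliberately chosen to fall on a straight line so the result is easy to check. `fit_line` and `predict` are hypothetical helper names, and the model is plain one-feature linear regression fit by ordinary least squares (because, as noted above, it's probably linear regression).

```python
# Made-up data: each position is one wine.
points = [84, 86, 88, 90, 92]          # feature (x): sommelier rating
price = [10.0, 14.0, 18.0, 22.0, 26.0]  # label (y): price in dollars


def fit_line(x, y):
    """Ordinary least squares for one feature: y is modeled as slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var = sum((xi - mean_x) ** 2 for xi in x)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


# "Training a model": learn slope and intercept from features and labels.
slope, intercept = fit_line(points, price)


def predict(x):
    """Use the fitted model to predict a label for a new feature value."""
    return slope * x + intercept


print(slope, intercept)  # 2.0 -158.0
print(predict(89))       # 20.0
```

Training uses both columns; prediction needs only the feature, and the model supplies its best guess at the label.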

Talk the Talk: A Short Story on Why You Need at Least the Bare Bones of Data Science Jargon

This is a true, grimace-inducing story from a past position. Names and details have been changed to protect the innocent…and everyone else involved.

I’m running a model and taking a late lunch in a near-empty cafeteria when my (then) boss Slacks me. He wants an immediate video call. Like many, Adam was an engineering manager-turned-data science manager. He was very hands-off for the most part. But he was prone to sudden bursts of interest.

“Where are you in your project?” His tone and lack of greeting make it clear he is stressed.

Adam’s background didn’t lend itself to data science jargon, so I had long since learned to explain my work in straightforward language–particularly when he was already frustrated.

“I’m running a model now. I just finished making more columns–err–input variables–to improve predictions.”

“How long will that take?”

“We’re already using predictions from an early model, but I’ve been working back and forth with some of the subject matter experts, creating new input data, training models, improving performance…”

“Well, when are you going to have a deliverable?” He’s agitated.

“We’re already using the predictions. I’m just working on improving–“

“But when will you have something to ship?” he interrupts. “James has already shipped four features this quarter.”

I stare at the screen. He glares back at me from beneath the fingerprint smudges, brows knit and demanding an explanation.

James was a data scientist on a different team. We were friendly and caught up regularly around the modern-day water cooler (READ: pamplemousse La Croix-filled fridge). As is often the case, we kept abreast of each other’s work out of vague collaborative intention, common interest in the company, and a general sense of camaraderie. Also, snacks.

So, I happen to know that James had already accomplished much more that quarter than building four columns.

But why is Adam so impressed by feature engineering a few columns? And why is he calling them “shipped”?

“I need to see what features you’ve shipped.” He’s looking down at his phone, the edge of which is in view of the camera.

Why is he calling them “shipped”?

“Are you working on the website or the app?”

What?

“You need to be shipping features.” He gestures with his phone still in his hand.

Why…….ohhhhh shit.

Where do I begin?

“So…a feature–in data science–is a column of input data,” I explain.

Blank stare.

A pair of analysts I know sit at my table. Extra shit. I check my pocket for headphones. No luck.

“What do you mean, a feature in data science?” His aggression is waning, but I scoot to the edge of the table nonetheless.

“It’s a column of data–the input values for a model.” I pause to see where that lands. “And feature engineering means building columns of input data. James built four columns of input data.” And to give credit where it’s due, “…He’s also done a lot more than that.”

Adam’s looking at me suspiciously, but not altogether in disbelief.

“In data science, a feature isn’t like….a button on an app or a drop-down menu or something.” The change in his facial expression tells me–yep, that’s what he was thinking. His phone drops out of sight. “A feature is a predictor–an input value.” I search for another way to say it. “A column of X.”

Nothing.

“Ok, so today I finished creating change-in-weekly-spend columns because I think they might help predict churn.”

He looks at me blankly. This is getting nowhere.

“So, the features are change-in-weekly-spend. I made one for each of the 10 weeks prior to a set of given dates.” Pause. “So I feature engineered those input values. I’m training a model using them now.”

Silence.

“I shipped 10 features this morning.”

“Ok, see this is what I need to know.” He was relieved. “I need to be showing your value. You should be telling me this. I would have said so this morning.”

After a quick bit of chat commiserating about meetings, the call ends.

I slump on the bench. What else has been lost in translation? I think to myself. What does he think I do? What is he saying at those–

“Did he just ask you what a feature was?” one of the analysts asks.