Onboarding LLM-ployees

You are a new member of the Data team and you are on your first rotation in answering ad-hoc questions from the business. How do you approach the following request?

“What is the number of daily active users?”

You really want this question resolved. After all, it is your first week on the job and you have to show the company you deserve to be here. You search around the existing dashboards using keywords and send the URL over to the stakeholder. Huzzah! Task done!

But the person quickly responds “Yep, I knew about that dashboard but it doesn’t give me what I need. I actually need daily active users separated out by each product. And I really only care about the top 3 markets for the last fiscal year.”

At this point, you have more questions than answers.

  • What does top mean? By number of users, by highest usage percent, by revenue?
  • What does market mean? By geography, by sector, by company size/revenue?
  • When does the fiscal year start/end?
  • How are products differentiated here?
  • What action(s) make a user active?

Being the good little data steward you are, you dive deep into the relevant documentation (e.g., data catalogs, definitions, lineage). This context helps you answer the bulk of your questions. You write some SQL to calculate the numbers and excitedly send them over.

“Hey thanks for the fast turnaround, but this does not seem in the range of what I was expecting. Can you double check what you sent?”

Oof. Instead of nervously rattling off questions in hopes of clarification, you sheepishly turn to your teammate for assistance. They spot check your numbers and quickly sense that something is wrong because your number is an order of magnitude higher than they expect. Once they review in depth, they highlight the issue and provide you with a recommendation to fix it. This critique helps you form intuition by establishing a baseline. You are more likely to detect an abnormal result because you know what normal looks like. You can return to the stakeholder with more certainty in your future responses.
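
The spot check your teammate performs can be sketched in a few lines of code. This is only an illustration, assuming a baseline figure is already known: the function name `is_plausible`, the tolerance of one order of magnitude, and the user counts are all made up for the example.

```python
import math


def is_plausible(value: float, baseline: float, max_orders: float = 1.0) -> bool:
    """Return True if value is within max_orders orders of magnitude of baseline.

    Both numbers must be positive. The baseline stands for what "normal"
    looks like for the metric (a hypothetical figure in this sketch).
    """
    if value <= 0 or baseline <= 0:
        raise ValueError("value and baseline must be positive")
    # Distance between the two numbers on a log10 scale.
    orders_apart = abs(math.log10(value / baseline))
    return orders_apart < max_orders


# Hypothetical numbers: the team expects roughly 12,000 daily active users.
expected_dau = 12_000
print(is_plausible(15_000, expected_dau))   # same ballpark as the baseline
print(is_plausible(150_000, expected_dau))  # more than 10x too high
```

The check is only as good as the baseline behind it, which is exactly what a new analyst lacks and a seasoned teammate supplies.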

Let’s rewind to the motives of “you the data analyst” and replace that phrase with “the Large Language Model”, or “the LLM” for short:

  1. The LLM really wants to finish a task.
  2. The LLM will make reasonable guesses when faced with an ambiguous request.

Now let’s focus on what made “you the data analyst” better and perform the same replacement:

  1. The LLM needs context to produce more relevant answers tailored to the business.
  2. The LLM needs critical reviews to improve.

Though I am drawing anthropomorphic comparisons, I think the thesis is thought-provoking. The unique contextual data and human-in-the-loop critiques used to onboard an LLM will drive how successful it becomes within an organization. Much like onboarding our human counterparts!