Huh, look at that: OpenAI’s ChatGPT projects absolute confidence while giving plainly wrong answers. But ChatGPT also provides helpful responses much of the time, so one kind of does want to use it. Sounds an awful lot like every other machine learning model deployed in 2022.
But really, how do we turn fallible machine learning models into products to be used by humans?
Not by injecting its answers straight into StackOverflow. Its mistakes put too much strain on a system designed to receive answers from self-editing humans.
tl;dr: use it to generate solutions that are hard for humans to find, easy to verify, and easy to correct.
While this is Hillel’s three-point plan to make ChatGPT useful to programmers, it’s also sound product management advice for a broad range of machine learning-based products. Let’s dissect this a bit.
1) Predictions That Are Hard to Find
There are two dimensions to this: the difficulty of a single question, task, or observation, and the difficulty of scaling to answer millions of questions. Machine learning comes in handy when it takes a human a long time to process a case, or when a lot of information needs to be brought together from different sources before a decision can be made. Machine learning is also great at looking at all cases and returning a prioritized list, whether that’s products that need to be reordered in a business context or the feed of videos on the YouTube homepage. The human can then “verify the answers” for the most relevant cases first.
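As a minimal sketch of the prioritized-list idea, assume each case carries a model score where higher means more relevant. The `Case` class, the `score` field, and the SKU names are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Case:
    case_id: str
    score: float  # hypothetical model output: higher means more relevant


def prioritize(cases: List[Case], top_k: int) -> List[Case]:
    """Return the top_k highest-scoring cases, most relevant first."""
    return sorted(cases, key=lambda c: c.score, reverse=True)[:top_k]


# Made-up example data: three products scored by an assumed reorder model.
cases = [Case("sku-1", 0.2), Case("sku-2", 0.9), Case("sku-3", 0.5)]
review_queue = prioritize(cases, top_k=2)
print([c.case_id for c in review_queue])
```

The model looks at every case, and the human works through the queue from the top, checking the most relevant suggestions first.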
Which brings us to the next point.
2) Predictions That Are Easy to Verify
Unless the human consuming the model prediction is willing to do so blindly (do you always watch whatever YouTube recommends next?), a machine learning product needs to provide sufficient context to let the human make a decision on whether or not to trust the prediction.
I can decide whether or not to watch the next recommended video based on its title and thumbnail; I don’t have to first watch the video to verify that the recommendation suits me. It also allows me to discard poor recommendations with ease. The context is an entirely natural part of the product and arguably core to it, more so than the selection via the model.
An internal business application can be surprisingly similar. For example, when listing forecasts and reorder suggestions that are manually approved, historical sales and current inventory data make a sanity check straightforward.
This kind of context based on facts (in the broadest sense of the word) is faster to parse and easier to trust than “model explanations” which themselves would need to be verified by the human.
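For the reorder example, here is a rough sketch of what it could look like to show each suggestion together with its factual context. All names here (`ReorderSuggestion`, `render`, `looks_off`, the `tolerance` threshold) are assumptions made up for illustration, not a reference to any real system.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class ReorderSuggestion:
    sku: str
    forecast_units: float    # hypothetical model forecast for the next period
    suggested_order: int     # hypothetical model-derived reorder quantity
    recent_sales: List[int]  # factual context: units sold in recent periods
    on_hand: int             # factual context: current inventory


def render(s: ReorderSuggestion) -> str:
    """Show the prediction next to the facts a reviewer needs to check it."""
    avg = mean(s.recent_sales)
    return (f"{s.sku}: order {s.suggested_order} "
            f"(forecast {s.forecast_units:.0f}, "
            f"recent avg sales {avg:.0f}, on hand {s.on_hand})")


def looks_off(s: ReorderSuggestion, tolerance: float = 2.0) -> bool:
    """Flag forecasts far above recent average sales (assumed heuristic)."""
    return s.forecast_units > tolerance * mean(s.recent_sales)


# Made-up example data.
suggestion = ReorderSuggestion("sku-2", 55.0, 30, [40, 50, 60], 25)
print(render(suggestion))
print(looks_off(suggestion))
```

The point of the design is that the reviewer never sees the forecast alone: the same row carries the sales history and stock level that make a quick sanity check possible.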
Context is something that ChatGPT is missing entirely. ChatGPT offers no sources, no citations. There are only the prompt and the prediction. This puts it in stark contrast to the context offered by the citations in Google’s Featured Snippets, or by votes and comments on StackOverflow answers.
3) Predictions That Are Easy to Correct
When the model fails and the user needs to correct its output, the product needs to become more than just an API: you need to provide a path forward. Similarly, when augmenting a process, you want users to be able to use your product as a tool. That’s only feasible if they can take its output and carry it a step further.
The danger zone is high-stakes situations in which a human can’t judge the model’s recommendation and, because of the task’s complexity, can’t do the task without it either. Don’t make people put their rubber stamp on an API. Provide them with a tool that augments their capabilities.
As ChatGPT is provided as essentially a raw API with some CSS, it demonstrates why a product needs to be more than just a model’s prediction to be trustworthy and effective. Let’s take the API and build more products around it.