Challenge

How could we control the tendency of LLMs to hallucinate so that a RAG-based chatbot could be trustworthy enough for our customers?

Solution

A custom agent-based system that constrains the potential for hallucinations, designed to provide fallbacks when information is lacking.

My Role

Staff AI product designer

Personal Takeaways

Launching Fin was a unique experience: something I’d helped build (on a very small team) became a strategic priority for the entire company. It was truly “all hands on deck,” and it was immensely satisfying that so many people found it compelling as a product experience.

SITUATION

When GPT-3.5 first launched, we tested it heavily while planning and building our AI Assist features. Intercom had a long history of providing both chat-based products and chatbots, so it was natural for us to hope that the new generation of LLMs could power something like an autonomous chatbot. During those early explorations of the model’s capabilities, we determined that the tech was too prone to hallucination to be useful as a chatbot.

After shipping the AI Assist features, however, we decided to take another look and see if we could find some way to rein in that hallucinatory tendency. It was important to us to build something that didn’t bullsh*t users—and we suddenly had the time and space to investigate further.

PROJECT GOALS

If we could get the tech to work, we wanted to be first to market with an LLM-based chatbot. Intercom already had a chatbot product, Resolution Bot, which used different models to let customers build scripted answers and then serve them in response to questions the bot had been trained on. We saw an opportunity to give our customers an upgraded version of Resolution Bot that required little to no training and could immediately serve up high-quality answers from an existing knowledge base.

Our primary concern was quality, though. We could foresee competitors simply putting a UI layer on top of the GPT model, pretending they’d built something new, and pushing the risk of hallucinations onto their customers. We weren’t willing to do that. We wanted to find out whether the tech could be controlled enough to be reliable.

DESIGN APPROACH

I began working closely with an engineer a few weeks after he started experimenting with a system design for the bot. For two to three weeks, I tested the system and designed improvements to the user flow, the fallback experiences, and the conversational experience for edge-case scenarios and for areas where we’d explicitly built guardrails to take over from GPT-3.5. At that point, we felt the prototype had legs and that we could show it to the rest of the AI team. The reception was extremely positive, and we decided to move the full team onto the project.

We moved quickly after that to make the prototype more robust. A significant portion of my time went to designing an optimal “answer” output and then working to achieve it through numerous prompt engineering iterations and manual offline review (which I did myself). My rationale was that answer quality is the most important part of the experience of interacting with a chatbot designed to answer questions.

PROTOTYPE + TESTING

With a working prototype, we started backtesting Fin’s answers against real conversations to get a sense of how robustly it performed on actual data. We were pleasantly surprised, and company execs decided to release the bot for the public to test out, with a sample knowledge base as the backend. We called this the marketing launch. We used the conversations from this public usage to better understand the bot’s shortcomings and to make improvements.

We launched the bot for paying customers in June 2023, after upgrading it to use the newly released and more powerful GPT-4 model from OpenAI.

In the four or five months after that, I was instrumental in some major improvements to Fin, including multi-source answer generation and disambiguation, both of which significantly improved Fin’s resolution rate (the percentage of answers that are deemed correct).

LAUNCH + OUTCOMES

Fin had a marketing launch in March 2023, which generated 30,000+ wait list signups. Fin launched to paying customers in June 2023. By December 2023 it had become the best-selling product launch in Intercom’s history.

Details

URL: Launch blog post
Date: Feb 2023 – June 2023, ongoing
Role: Staff AI product designer
Tools: Figma, GPT-3.5, GPT-4