Leveraging RAG for Personalized and Traceable Enterprise AI Applications

Image: A computer room with a database surrounded by robots, created with Adobe Firefly.

Generative AI has revolutionized how we interact with technology, opening up exciting possibilities for innovation and creativity. Retrieval-augmented generation, or RAG, takes this a step further by combining the power of generative models with the efficiency of retrieving relevant information.

Curious about its use cases and applications, I spoke with Dennis Perpetua, Kyndryl’s Global CTO for Digital Workplace Services and Experience Officer. Kyndryl, an information technology and services provider, spun out from IBM in 2021 and delivers a spectrum of solutions, from fortified data services to enterprise-grade AI technologies and data modernization services.

In my interview, we delved into what RAG is and how the enterprise is leveraging the technology to build smarter applications that push the boundaries of what’s possible in the digital realm.

Editor's Note: Be sure to catch excerpts from the interview with Kyndryl's Dennis Perpetua in the upcoming February 23 edition of my LinkedIn newsletter, "The AI Economy." Subscribe today to receive notifications as soon as it's published.

What is Retrieval-Augmented Generation?

RAG is a process that extends the capabilities of Large Language Models (LLMs) to specific domains or an organization’s internal knowledge base. Patrick Lewis and his team coined the term in a 2020 research paper. Described by Perpetua as a technique rather than a framework or protocol, it may not be something readily used by consumer apps; it’s perhaps more applicable to the enterprise.

“The two most important things that RAG does, relative to the enterprise, is it allows us to source the answers, and have that be traceable,” Perpetua explains. “And then it allows us to also add context to it. That context could be a handful of things…whether or not the information is current…a way to augment the [LLM] with more current information.”

He references GPT-4’s knowledge cutoff date and says using RAG can help modernize the LLM’s responses, while also “providing traceability of the answers.”

The application of RAG in the enterprise hinges on the specific needs of an organization. Perpetua suggests that this could stem from a necessity to delve deeper into particular scenarios within the company. Alternatively, it might arise from a desire to safeguard corporate information that doesn’t warrant inclusion in a public model. “RAG is really important because it allows me to actually give much more personalized answers without having the overhead of going in and tweaking those LLMs,” Perpetua tells me.
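To make that retrieve-then-generate flow concrete, here is a minimal Python sketch of the pattern Perpetua describes: pull relevant documents from an internal knowledge base, ground the prompt in them, and hand back the sources alongside the answer so it stays traceable. The corpus, document IDs, and the call_llm placeholder are all invented for illustration; a production system would use an embedding-based retriever and a real model API.

```python
# Minimal RAG sketch: retrieve internal documents, ground the prompt in them,
# and return the sources with the answer so every response is traceable.
# Corpus contents and `call_llm` are placeholders, not a real API.

KNOWLEDGE_BASE = [
    {"id": "it-0107", "source": "IT Support Wiki / Hardware",
     "text": "Broken laptops are replaced through the internal portal under Device Swap."},
    {"id": "fin-0042", "source": "Finance Handbook 2024",
     "text": "Expense reports must be submitted within 30 days of the purchase date."},
]

def call_llm(prompt: str) -> str:
    """Placeholder: swap in your actual model API call here."""
    return "[model response would appear here]"

def retrieve(question: str, k: int = 2) -> list[dict]:
    """Naive keyword-overlap retrieval; a real system would use embeddings."""
    terms = set(question.lower().split())
    ranked = sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: len(terms & set(doc["text"].lower().split())),
        reverse=True,
    )
    return ranked[:k]

def answer_with_sources(question: str) -> dict:
    docs = retrieve(question)
    context = "\n".join(f"[{d['id']}] {d['text']}" for d in docs)
    prompt = (
        "Answer using only the context below, and cite the [id] you relied on.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return {
        "answer": call_llm(prompt),
        "sources": [d["source"] for d in docs],  # the traceability Perpetua highlights
    }

print(answer_with_sources("How do I get my broken laptop replaced?"))
```

In practice the keyword-overlap retriever would be swapped for a vector store, but the shape of the flow, retrieve, augment, generate, cite, stays the same.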

Integrating Gen AI with RAG

Is it difficult for developers to implement RAG with their company’s AI tech stack? Perpetua emphasizes that it’s a complex problem, one “vastly overlooked.” First, it’s dependent on the LLM — each one will have a different way to integrate with it. However, I’m told it’s a “relatively simple” process, likely accessible through a Platform as a Service (PaaS) offering.

Perpetua stresses that developers must address how traceability is fed back into the system and ensure content audits are conducted. He also cautions against overlooking prompt engineering and the knowledge database that has to be managed: how do you expire old content, and how do you ensure new content is being added and cataloged?
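As a rough illustration of that housekeeping, here is a plain-Python sketch (with invented field names and an arbitrary one-year freshness threshold) that tags every catalog entry with source and date metadata so stale content can be flagged for expiry or re-review.

```python
# Sketch of the knowledge-base housekeeping Perpetua describes: every catalog
# entry carries source and freshness metadata so stale content can be flagged
# for expiry or re-review. Field names and the one-year threshold are illustrative.

from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=365)  # assumption: content older than a year gets re-reviewed

catalog = [
    {"id": "pol-001", "source": "Travel Policy v3",
     "added": datetime(2023, 1, 10, tzinfo=timezone.utc)},
    {"id": "pol-002", "source": "Expense Policy v7",
     "added": datetime(2025, 6, 2, tzinfo=timezone.utc)},
]

def audit(entries: list[dict]) -> dict:
    """Split the catalog into current documents and candidates for expiry."""
    now = datetime.now(timezone.utc)
    current = [d for d in entries if now - d["added"] <= MAX_AGE]
    stale = [d for d in entries if now - d["added"] > MAX_AGE]
    return {"current": current, "stale": stale}

report = audit(catalog)
print("Keep:", [d["id"] for d in report["current"]])
print("Review or expire:", [d["id"] for d in report["stale"]])
```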

RAG Utility in Closed Systems

When asked about the necessity of a substantial knowledge management library for implementing RAG, Perpetua says it depends on the use case. However, he emphasizes that the issues addressed at Kyndryl typically revolve around highly specific company concerns. In such scenarios, he notes, the utility of RAG increases significantly.

To illustrate his perspective, he recounts Kyndryl’s initial foray into AI over a decade ago. “When we first deployed one, it was for a financial institution. The top question — which we didn’t expect — was ‘What’s the address of the downtown branch location?’…We expected them to ask, ‘My laptop’s broken. How do I get a new one?’ So we didn’t have an answer to that.”

He goes on to say this is an opportunity where RAG would be beneficial, because organizations can specify the type of information to feed into it. If the AI relies on an open system, consuming current events, political ramblings, and irrelevant information, there’s a risk of misinformation or worse. Look at what happened to Air Canada, or at the social media posts showing company chatbots being tricked into saying outrageous things.

With a closed system, the kind Perpetua primarily deals with, the AI only has access to data centered on answering customer-specific questions. Using RAG there can provide more personalized, relevant responses and minimize support frustration.
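Here is a toy sketch of that closed-system idea, using an obviously invented, bank-flavored corpus: retrieval is limited to a curated set of approved answers, and questions that match nothing are escalated to a person rather than answered from open-web content.

```python
# Toy sketch of a closed system: retrieval is limited to a curated set of
# approved answers, and questions that match nothing are escalated to a human
# rather than answered from open-web content. Corpus and threshold are invented.

APPROVED_CORPUS = {
    "branch-hours": "The downtown branch is open 9am to 5pm, Monday through Friday.",
    "card-replacement": "Lost cards can be replaced at any branch with photo ID.",
}

def closed_retrieve(question: str, min_overlap: int = 2):
    """Return the best-matching approved answer, or None if out of scope."""
    terms = set(question.lower().split())
    best_key, best_score = None, 0
    for key, text in APPROVED_CORPUS.items():
        score = len(terms & set(text.lower().split()))
        if score > best_score:
            best_key, best_score = key, score
    if best_score < min_overlap:
        return None  # out of scope: hand off to a person instead of guessing
    return APPROVED_CORPUS[best_key]

print(closed_retrieve("What are the downtown branch opening hours?"))
print(closed_retrieve("Who should win the next election?"))  # None, escalate
```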

Applications Beyond Chatbots

Facebook CEO Mark Zuckerberg on stage at the company’s F8 developer conference in 2015 discussing the Facebook Messenger Platform. Photo credit: Ken Yeung

Chatbots are the most prevalent AI application, but are there others Perpetua sees out in the wild? He tells me there’s not a “widespread set of use cases, but there are differences between them.” His favorite harkens back to his college days: having the AI generate “crib notes for case summarization.”

“This is pretty popular in a couple of different use cases where we’re doing [things] like ticket summarization, from a support perspective…How do I actually get context around tickets coming in and customer support? Those are two different things…so one would be IT-specific and then the other one would be customer support use cases where you could actually leverage RAG from a product perspective so that you could tailor some of the customer responses based on product-specific information.”

He intimates that, from a product perspective, RAG could generate recommendations based on a customer’s past purchases. On the support side, it could help summarize cases, ensuring that tickets are categorized correctly and routed to the appropriate team.
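Here is a minimal sketch of how that ticket flow might look; the categories, routing table, and product notes are made up, and call_llm again stands in for whatever model endpoint you would actually use.

```python
# Sketch of the support use case: retrieve product-specific context for an
# incoming ticket, ask the model for a summary, and route by category.
# Categories, routing table, product notes, and `call_llm` are all invented.

ROUTING = {"billing": "finance-team", "hardware": "it-desk", "account": "customer-support"}

PRODUCT_NOTES = {
    "hardware": "Model X laptops shipped in 2023 have a known battery recall.",
    "billing": "Invoices are issued on the first business day of each month.",
}

def call_llm(prompt: str) -> str:
    """Placeholder: swap in your actual model API call here."""
    return "[model-generated summary would appear here]"

def triage_ticket(ticket_text: str, category: str) -> dict:
    context = PRODUCT_NOTES.get(category, "")
    prompt = (
        "Summarize this support ticket in two sentences, using the product note "
        f"for context.\nNote: {context}\nTicket: {ticket_text}"
    )
    return {
        "summary": call_llm(prompt),
        "route_to": ROUTING.get(category, "general-queue"),
        "context_used": context,  # keeps the summary traceable to its source note
    }

print(triage_ticket("My Model X laptop won't hold a charge anymore.", "hardware"))
```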

Perpetua also points to a use case developed by Kyndryl. In January, the company unveiled Workflow Orchestration Services, a generative AI-powered solution to help modernize business processes. He offers a couple of ways in which RAG techniques have affected different business areas:

Employee onboarding: Use RAG to “extend questions that may be coming in from a potential new hire or a new hire asking HR questions, so we can feed it and augment it with HR materials, specifically for new hires.”

Machinery inspection: “There’s a use case where we have to inspect heavy equipment. We’re able to use generative AI there to ask questions about what it is we’re seeing. But we’re able to then use RAG to extend it with very product-specific information that allows it to be traceable to make sure that when somebody’s asking you a question” you can be very specific in assessing the items needed for repair or replacement.
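To make the inspection example a bit more concrete, here is a small sketch under obviously invented manual entries and part names: an inspector’s observation is matched against product-specific documentation, and the recommendation carries the exact section it came from, which is the traceability Perpetua is describing.

```python
# Sketch of the inspection use case: an observation is matched against
# product-specific repair documentation, and the recommendation carries the
# exact manual section it came from. Entries and part names are invented.

REPAIR_MANUAL = [
    {"part": "hydraulic hose", "section": "Manual 4.2.1",
     "guidance": "Replace the hose if cracking or seepage is visible at the fitting."},
    {"part": "track roller", "section": "Manual 7.5.3",
     "guidance": "Rollers with more than 3mm of flat wear must be replaced, not repaired."},
]

def assess(observation: str) -> dict:
    """Match an inspector's note to manual entries and keep the citation."""
    obs = observation.lower()
    matches = [entry for entry in REPAIR_MANUAL if entry["part"] in obs]
    return {
        "recommendations": [m["guidance"] for m in matches],
        "traceable_to": [m["section"] for m in matches],  # inspector can verify the source
    }

print(assess("Cracking visible on the hydraulic hose near the pump."))
```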

Providing Non-Experts with Quality Information

We all likely turn to the internet to help troubleshoot when there’s a problem with something we own. Whether there’s an issue with a pool drainage system, snowblower, showerhead, or television, we either enter a search query on Google or, these days, ask ChatGPT or Perplexity. In some cases, we need only snap a picture and upload it to find a solution.

While such advancements benefit consumers, people in B2B, industrial, or service sectors need contextual answers that are free of inaccuracies. That’s what makes RAG particularly beneficial: it becomes a safety backstop when no one with an expert understanding of the domain is on hand.

Implementing RAG can potentially lessen the chances of encountering hallucinations, Perpetua tells me.

“Even if there is a hallucination, which is still going to be possible because of just the nature of generative AI — it doesn’t know the meaning of words, it just knows the sequence of words. But the traceability is the assurance that is there to make sure that when you get the answer, you can click on it and say, ‘Alright, this is where this is pulling from.’ It turns into a very accurate and interesting Table of Contents.”

Think Before You Implement RAG

A word of caution from Perpetua before you dash off to tell your IT department you want RAG with your AI solution: don’t become enamored with the proverbial “shiny metal object” of AI. Instead, he counsels, consider the outcomes you want generated.

He reiterates that RAG is a technique companies use to achieve business outcomes. But to maximize the chances of an AI project succeeding, organizations must first identify “what that business objective is and then pick the right thing.” You may not need generative AI at all; Perpetua says the same results can sometimes be achieved using traditional AI or even traditional sentiment analysis.

“There’s a massive cost difference between traditional AI and generative AI…so it’s really kind of being a tool in your toolbox — pick the right tools for it,” he remarks. “And in some cases, if you look at the problem RAG is solving, traditional AI could actually fit in there and be a much less complex scenario.”

Advice For AI Startups

Towards the end of our interview, I asked Perpetua about the advice he would offer to AI startups. He shared with me two suggestions:

The ROI on generative AI is elusive, and hard savings are difficult to calculate: “In general, folks see it. And they can viscerally feel, ‘Hey, this is cool.’ I think the number one thing that folks should include in their use case, ideation, and when they’re getting started is to start with the problem, but also include how you’re going to quantify the impact associated with what you’re doing.”

Additionally, Perpetua recommends reviewing the technical debt associated with the overhead of the chosen LLM. He references the pre-generative AI era, when companies might use a natural language processing model developed by a “hyperscaler” that might still be around in five or 10 years. “If you’ve created your own NLU model, I’m nervous about the technical debt and the stickiness…The more you can use the foundations that have been created, the better off and more longevity I see afforded and less overhead in terms of running your startup because building your own NLU — or even if you’re ambitious enough to do your own LLM — the more technical debt and costs are going to be associated with your solution.”

Learn more about Kyndryl and Dennis Perpetua.

Subscribe to “The AI Economy”

New issues published on Fridays, exclusively on LinkedIn
