Article summary
ChatGPT has been out for over two years now, and the world is starting to adjust to what it and other LLMs can do. Most current use cases treat them as chatbots: the user enters a question or a task, and the LLM uses its context and training to answer the prompt. I would like to explore ways LLMs can be baked into system architectures and used to build both prototype and production apps. Below are four technical architectures that incorporate LLMs at varying levels of complexity.
LLM Only
This is the simplest way to build an application with an LLM: use the model as an advanced chatbot. Given a system prompt and adequate context, the LLM can walk through basic workflows and search the web for answers.
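As a rough sketch, the entire application can be one chat loop around an API call. This assumes the openai Python package and an API key in the environment; the model name and system prompt are placeholders.

```python
# Minimal sketch of an "LLM only" app: a single chat loop with a system prompt.
# Assumes the openai package and an OPENAI_API_KEY environment variable;
# the model name and system prompt are placeholders.
from openai import OpenAI

client = OpenAI()

messages = [
    {"role": "system", "content": "You are a support assistant. Walk the user through our return process step by step."}
]

while True:
    user_input = input("You: ")
    if user_input.lower() in {"quit", "exit"}:
        break
    messages.append({"role": "user", "content": user_input})
    response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})  # conversation history lives only in this list
    print("Assistant:", reply)
```

Note that the only memory here is the messages list itself, which is exactly the limitation discussed below.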
Pros
Simple to implement because there are no additional components to build or maintain.
It is flexible since all the normal behaviors of an LLM can be accessed.
Cons
Limited context window. If the user interacts with the application for a long period of time without clearing the context window, the LLM will eventually forget details. This can be troublesome if the user is entering values that should theoretically be persisted.
There is no persistent memory outside the LLM, so data can be lost and the model can lose track of where it is in a process.
Using LLM system prompts and context for an app is a good place to prototype. However, if any feature requires long-term memory or following a specific process, a more complex architecture will be required.
LLM as Frontend
The next level of complexity is using the LLM as a frontend for the user to interact with. OpenAI and Gemini both support giving the LLM access to custom functions that it can call when the user’s prompt requires it. This grants the ability to persist data outside the LLM, perform custom searches, or send requests to an external system.
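A minimal sketch of this pattern using OpenAI-style function calling is below; the save_note function and model name are hypothetical stand-ins for whatever backend the app actually talks to.

```python
# Sketch of the "LLM as frontend" pattern with OpenAI-style function calling.
# save_note and the model name are placeholders for a real persistence layer.
import json
from openai import OpenAI

client = OpenAI()

def save_note(text: str) -> str:
    """Stand-in for a real backend call (database write, API request, etc.)."""
    with open("notes.txt", "a") as f:
        f.write(text + "\n")
    return "saved"

tools = [{
    "type": "function",
    "function": {
        "name": "save_note",
        "description": "Persist a note for the user outside the LLM's context.",
        "parameters": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
    },
}]

messages = [{"role": "user", "content": "Remember that the demo is on Friday."}]
response = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
message = response.choices[0].message

# If the model decided a function call is needed, run it and report the result back.
if message.tool_calls:
    messages.append(message)
    for call in message.tool_calls:
        if call.function.name == "save_note":
            args = json.loads(call.function.arguments)
            result = save_note(args["text"])
            messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
    final = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
    print(final.choices[0].message.content)
else:
    print(message.content)
```

The important shift is that the LLM only decides which function to call; the function itself is ordinary, deterministic code.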
Pros
Minimal frontend development is required; libraries like Streamlit give a good starting point for a chat interface (a rough sketch follows this list).
Grants the LLM access to traditional computing resources for persistent storage or deterministic data processing.
The user can describe what they want and the LLM takes the expected action. This can be a smoother user experience: instead of looking for a specific button, the user just tells the LLM which button they would like to click.
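To make the Streamlit point concrete, here is a rough sketch of a chat interface with the LLM call stubbed out; save it as app.py and run it with streamlit run app.py.

```python
# Rough sketch of a Streamlit chat interface; the echo reply is a placeholder
# for a real LLM call like the one shown above.
import streamlit as st

if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay the conversation so far.
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.write(msg["content"])

prompt = st.chat_input("Ask something")
if prompt:
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.write(prompt)
    reply = f"(placeholder) You said: {prompt}"  # swap in an actual LLM call here
    st.session_state.messages.append({"role": "assistant", "content": reply})
    with st.chat_message("assistant"):
        st.write(reply)
```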
Cons
It is difficult to test why the LLM is or is not calling specific functions. This can lead to regressions as new functions are added or existing ones modified.
The chat interface limits the possible user interactions, and users still have to learn the correct terminology to talk to the LLM.
LLM as a frontend is good for adding basic interactions with outside systems in a small-scale app. If many behaviors are added, a traditional web interface with the LLM in the backend may offer more consistency, which is covered next.
RAG app
A RAG app, or Retrieval-Augmented Generation application, is a software system that gives an LLM access to a traditional data source, like a database or a document, so that source can be used to respond to prompts or perform actions. This can take many forms, but one option is for a UI to take in a search or question and for an LLM to process the question and determine what information is relevant. Once that decision is made, traditional computing queries gather the relevant information and add it to the context. After the context is built, a second LLM call answers the original question with all the provided context.
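The following is a minimal sketch of that two-step flow: one LLM call decides what to retrieve, ordinary code performs the retrieval, and a second call answers with the retrieved context. The documents dictionary, lookup logic, and model name are placeholders for a real database or vector store.

```python
# Sketch of the two-step RAG flow described above: one LLM call decides what to
# retrieve, traditional code does the retrieval, a second call answers the question.
# The documents dict, lookup logic, and model name are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

documents = {
    "returns": "Items can be returned within 30 days with a receipt.",
    "shipping": "Standard shipping takes 3-5 business days.",
}

def answer(question: str) -> str:
    # Step 1: ask the LLM which topic is relevant to the question.
    routing = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Which one of these topics best matches the question: {list(documents)}?\n"
                       f"Question: {question}\nAnswer with just the topic name.",
        }],
    )
    topic = routing.choices[0].message.content.strip().lower()

    # Step 2: traditional retrieval -- here just a dictionary lookup.
    context = documents.get(topic, "")

    # Step 3: a second LLM call answers using only the retrieved context.
    final = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Answer using only this context: {context}"},
            {"role": "user", "content": question},
        ],
    )
    return final.choices[0].message.content

print(answer("How long do I have to return a purchase?"))
```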
Pros
This allows for persistent data storage in the layers between the LLMs. This storage can help the LLMs track historical prompts.
Gives checkpoints to perform validation before returning answers to the user. LLMs can make up information or return poorly shaped data, and RAG apps let developers put safeguards between the LLMs and traditional software (a small validation sketch follows this list).
LLM integrations can be added to existing applications instead of needing to start from scratch.
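As an illustration of such a checkpoint, the sketch below validates that an LLM response parses as JSON with the expected fields before anything downstream trusts it; the required fields and fallback behavior are assumptions, not part of any particular library.

```python
# Illustrative checkpoint between an LLM and downstream code: validate the model's
# output against an expected shape before anything else trusts it.
import json

REQUIRED_FIELDS = {"answer", "sources"}

def validate_llm_output(raw: str) -> dict:
    """Return the parsed output if it matches the expected shape, else raise."""
    data = json.loads(raw)  # raises a ValueError subclass on malformed JSON
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"LLM response missing fields: {missing}")
    return data

try:
    result = validate_llm_output('{"answer": "30 days", "sources": ["returns"]}')
except ValueError:
    result = {"answer": "Sorry, something went wrong.", "sources": []}  # safe fallback
```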
Cons
Handrolled LLM interactions and state management can be tricky. These interactions include calling functions the LLM said needed to be called, as well as tracking the overall state of the current conversation.
A RAG app is the most likely form a production app will take when it includes an LLM component: RAG allows for testing at the boundaries of the LLMs, and it has the flexibility to take whatever shape a use case requires. A tool like Langgraph can be used to address the challenge of managing interactions between an application and multiple LLMs.
RAG app combined with Langgraph
Langgraph is best suited for when each interaction with an LLM can be drawn out as a state diagram. It specializes in defining a series of stages for each user prompt to follow. Each stage can be an interaction with an LLM or arbitrary traditional code, and it also handles cases where the LLM needs to call functions. Building an application with Langgraph is the best way to handle workflows with multiple branching paths or multiple LLMs, but it can be overkill for a prototype or simple application.
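A minimal sketch of that idea, assuming the langgraph package, is below: each node is a stage in the state diagram, and a conditional edge picks which branch to follow. The node logic is stubbed out; real nodes would call an LLM or traditional code.

```python
# Minimal Langgraph sketch: each node is a stage, a conditional edge picks a branch.
# The node logic here is stubbed out; real nodes would call an LLM or other services.
from typing import TypedDict
from langgraph.graph import StateGraph, END

class State(TypedDict):
    question: str
    category: str
    answer: str

def classify(state: State) -> dict:
    # A real implementation would ask an LLM to categorize the question.
    category = "billing" if "invoice" in state["question"].lower() else "general"
    return {"category": category}

def handle_billing(state: State) -> dict:
    return {"answer": "Routing you to billing support."}

def handle_general(state: State) -> dict:
    return {"answer": "Here is a general answer to your question."}

graph = StateGraph(State)
graph.add_node("classify", classify)
graph.add_node("billing", handle_billing)
graph.add_node("general", handle_general)
graph.set_entry_point("classify")
graph.add_conditional_edges(
    "classify",
    lambda state: state["category"],
    {"billing": "billing", "general": "general"},
)
graph.add_edge("billing", END)
graph.add_edge("general", END)

app = graph.compile()
print(app.invoke({"question": "Why is my invoice wrong?"})["answer"])
```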
Overall, LLMs offer a variety of integration levels depending on how complex the application is and what level of use is expected. As the technology matures, there will be more established patterns for how to interact with it. For now, the best way to figure out how to work with LLMs is to experiment and find the best uses.