Article summary
In my current project, I’ve been diving deep into the world of agents and exploring the abundant resources available within the LangChain ecosystem. I wanted to expand my toolkit and get hands-on experience with Retrieval Augmented Generation (RAG), and it turns out that while RAG may sound intense, using Pinecone makes it remarkably straightforward to implement.
Pinecone
Pinecone offers two products: a managed vector database and an Assistant. The vector database has APIs for uploading text to an index and then searching for matching text in that index. The Assistant product has a simpler API: you upload files directly and then interact with a chatbot-style interface powered by those documents. While the Assistant product is really easy to use, I wanted to dig deeper into implementing RAG, so this article will focus on the vector database approach using the Python SDK.
Uploading Text
We need an index for storing the text. You can create it using one of the various SDKs or directly in the Pinecone console. I used the console and selected “multilingual-e5-large” for the configuration. This creates a dense vector index, which supports semantic search.
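If you’d rather create the index from code, here’s a minimal sketch using the Pinecone Python SDK. The index name (“my-index”), cloud, and region are my assumptions chosen to match the scripts below, and multilingual-e5-large produces 1024-dimensional vectors:
from pinecone import Pinecone, ServerlessSpec

# Assumes PINECONE_API_KEY is already set in your environment (covered next)
pc = Pinecone()

# multilingual-e5-large embeddings are 1024-dimensional dense vectors
pc.create_index(
    name="my-index",  # must match the index name used in the scripts below
    dimension=1024,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)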
Next you’ll need to create a Pinecone API key and add it to a .env file in the root of your project:
PINECONE_API_KEY=<your api key>
Then install the following packages:
pip install langchain-pinecone langchain-community pypdf pinecone
Rather than using Pinecone’s Python SDK directly, we’re using utilities from a couple of LangChain packages to simplify this process, as well as PyPDF for converting the PDF content to text.
Now we can create a Python script that splits a PDF into smaller chunks of text and then uploads those chunks:
from dotenv import load_dotenv
from langchain_community.document_loaders import PyPDFLoader
from langchain_pinecone import PineconeEmbeddings, PineconeVectorStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
# Ensure that PINECONE_API_KEY is loaded from .env
load_dotenv()


def upload_file(file_path: str):
    # Load the PDF
    loader = PyPDFLoader(file_path)
    data = loader.load()

    # Chunk the document text into overlapping ~1000-character pieces
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
    documents = text_splitter.split_documents(data)

    index_name = "my-index"
    namespace = "default"

    # Initialize embeddings
    embeddings = PineconeEmbeddings(model="multilingual-e5-large")

    # Embed the chunks and upload them to the index
    PineconeVectorStore.from_documents(
        documents=documents,
        index_name=index_name,
        embedding=embeddings,
        namespace=namespace,
    )


if __name__ == "__main__":
    import sys

    if len(sys.argv) != 2:
        print("Usage: python document_uploader.py <file_path>")
        sys.exit(1)
    upload_file(sys.argv[1])
You can run python document_uploader.py <file_path> with any PDF and then use the browser in the Pinecone console to see the text chunks that were uploaded.
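If you’d rather verify from code instead of the console, a quick sanity check is to print the index stats. This sketch assumes the same index name as above and should show the vector count for the “default” namespace (counts can take a few seconds to reflect fresh uploads):
from dotenv import load_dotenv
from pinecone import Pinecone

load_dotenv()

pc = Pinecone()
index = pc.Index("my-index")

# Prints the total vector count and per-namespace counts for the index
print(index.describe_index_stats())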
Querying Text
Now we can create a script for retrieving relevant text for a query string. We’re using the same LangChain packages here to query the vector store:
from dotenv import load_dotenv
from langchain_pinecone import PineconeEmbeddings, PineconeVectorStore
from pinecone import Pinecone
# Ensure that PINECONE_API_KEY is loaded from .env
load_dotenv()


def retrieve_documents(query: str):
    index_name = "my-index"
    namespace = "default"

    # Get the index
    pc = Pinecone()
    index = pc.Index(index_name)

    # Initialize embeddings
    embeddings = PineconeEmbeddings(model="multilingual-e5-large")
    vector_store = PineconeVectorStore(
        index=index, embedding=embeddings, namespace=namespace
    )

    # Create a retriever that returns the top 2 matches
    retriever = vector_store.as_retriever(search_kwargs={"k": 2})

    # Retrieve the documents using the provided query
    retrieved_docs = retriever.invoke(query)

    # Display the retrieved documents
    print("\n\n".join([doc.page_content for doc in retrieved_docs]))


if __name__ == "__main__":
    import sys

    if len(sys.argv) != 2:
        print('Usage: python document_retriever.py "<query>"')
        sys.exit(1)
    retrieve_documents(sys.argv[1])
You can run this script with a query, for example: python document_retriever.py "mobile app testing tools".
Next Steps
These are the main pieces you need for the “retrieval augmented” part of RAG. The final piece is generation, where you pass the retrieved documents as context to an LLM call along with the user’s question.
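As a rough sketch of that last step (not part of the scripts above), you could feed the retrieved chunks into a chat model. This example assumes the langchain-openai package is installed, an OPENAI_API_KEY is in your .env, and a model choice like gpt-4o-mini; swap in whichever LLM you prefer:
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain_pinecone import PineconeEmbeddings, PineconeVectorStore
from pinecone import Pinecone

# Requires PINECONE_API_KEY and OPENAI_API_KEY in .env
load_dotenv()


def answer_question(query: str) -> str:
    # Same retrieval setup as document_retriever.py
    pc = Pinecone()
    index = pc.Index("my-index")
    embeddings = PineconeEmbeddings(model="multilingual-e5-large")
    vector_store = PineconeVectorStore(
        index=index, embedding=embeddings, namespace="default"
    )
    retriever = vector_store.as_retriever(search_kwargs={"k": 2})
    docs = retriever.invoke(query)

    # Provide the retrieved chunks as context alongside the user's question
    context = "\n\n".join(doc.page_content for doc in docs)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    llm = ChatOpenAI(model="gpt-4o-mini")  # model choice is an assumption
    return llm.invoke(prompt).content


if __name__ == "__main__":
    import sys

    print(answer_question(sys.argv[1]))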
While RAG can sound like a complicated concept (and the underlying vector search details are certainly beyond my comprehension), it’s surprisingly easy to implement with a tool like Pinecone. You can easily experiment with various settings like different embedding models, text splitting algorithms, and search params to get the best results.
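For example, when tuning chunk sizes or the number of matches, it can help to look at the raw similarity scores. Here’s a small sketch using LangChain’s similarity_search_with_score, assuming the same index name and namespace as above:
from dotenv import load_dotenv
from langchain_pinecone import PineconeEmbeddings, PineconeVectorStore
from pinecone import Pinecone

load_dotenv()

pc = Pinecone()
embeddings = PineconeEmbeddings(model="multilingual-e5-large")
vector_store = PineconeVectorStore(
    index=pc.Index("my-index"), embedding=embeddings, namespace="default"
)

# Print each match's similarity score next to a preview of its text
results = vector_store.similarity_search_with_score("mobile app testing tools", k=4)
for doc, score in results:
    print(f"{score:.3f}  {doc.page_content[:80]}")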