Simplify RAG with Pinecone + LangChain

In my current project, I’ve been diving deep into the world of agents and exploring the abundant resources available within the LangChain ecosystem. I wanted to expand my toolkit and get hands-on experience with Retrieval Augmented Generation (RAG), and it turns out that while RAG may sound intense, using Pinecone makes it remarkably straightforward to implement.

Pinecone

Pinecone offers two products: a managed vector database and an Assistant. The vector database has APIs for uploading text to an index and then searching for matching text in that index. The Assistant product offers a simpler workflow: you upload files directly and then interact with a chatbot-style API powered by those documents. While the Assistant is really easy to use, I wanted to dig deeper into implementing RAG, so this article will focus on the vector database approach using the Python SDK.

Uploading Text

We need an index for storing the text. You can create it using one of the various SDKs or directly in the Pinecone console. I used the console and selected “multilingual-e5-large” for the configuration. This creates a dense-vector index, which supports semantic search.
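
If you'd rather create the index from code, here's a rough sketch of the equivalent setup using the Pinecone Python SDK. It assumes the PINECONE_API_KEY environment variable described below is already set, and the cloud and region values are just examples; adjust them to your setup:

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone()  # reads PINECONE_API_KEY from the environment

# multilingual-e5-large produces 1024-dimensional dense vectors
pc.create_index(
	name="my-index",
	dimension=1024,
	metric="cosine",
	spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)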

Next you’ll need to create a Pinecone API key and add it to a .env file in the root of your project:

PINECONE_API_KEY=<your api key>

Then install the following packages:

pip install langchain-pinecone langchain-community langchain-text-splitters pypdf pinecone python-dotenv

Rather than using Pinecone’s Python SDK directly, we’re using utilities from a couple of LangChain packages to simplify the process, along with PyPDF for converting the PDF content to text.

Now we can create a Python script for converting a PDF to smaller chunks of text and then upload the text:


from dotenv import load_dotenv
from langchain_community.document_loaders import PyPDFLoader
from langchain_pinecone import PineconeEmbeddings, PineconeVectorStore
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Ensure that PINECONE_API_KEY is loaded from .env
load_dotenv()

def upload_file(file_path: str):
	# Load the PDF
	loader = PyPDFLoader(file_path)
	data = loader.load()

	# Chunk the document text
	text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
	documents = text_splitter.split_documents(data)

	index_name = "my-index"
	namespace = "default"

	# Initialize embeddings
	embeddings = PineconeEmbeddings(model="multilingual-e5-large")

	# Upload documents
	PineconeVectorStore.from_documents(
		documents=documents,
		index_name=index_name,
		embedding=embeddings,
		namespace=namespace,
	)

if __name__ == "__main__":
	import sys
	if len(sys.argv) != 2:
		print("Usage: python document_uploader.py ")
		sys.exit(1)
		
	upload_file(sys.argv[1])

You can run python document_uploader.py <file_path> with any PDF and then use the browser in the Pinecone console to see the text chunks that were uploaded.
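
If you'd rather verify the upload from code instead of the console, the SDK's describe_index_stats method reports the vector count per namespace, which should match the number of chunks you uploaded:

from pinecone import Pinecone

pc = Pinecone()
index = pc.Index("my-index")

# Prints totals and per-namespace vector counts for the index
print(index.describe_index_stats())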

Querying Text

Now we can create a script for retrieving relevant text for a query string. We’re using the same LangChain packages here to query the vector store:


from dotenv import load_dotenv

from langchain_pinecone import PineconeEmbeddings, PineconeVectorStore
from pinecone import Pinecone

# Ensure that PINECONE_API_KEY is loaded from .env
load_dotenv()

def retrieve_documents(query: str):
	index_name = "my-index"
	namespace = "default"

	# Get the index
	pc = Pinecone()
	index = pc.Index(index_name)
	
	# Initialize embeddings
	embeddings = PineconeEmbeddings(model="multilingual-e5-large")
	vector_store = PineconeVectorStore(
		index=index, embedding=embeddings, namespace=namespace
	)

	# Create a retriever that returns top 2 matches
	retriever = vector_store.as_retriever(search_kwargs={"k": 2})
	
	# Retrieve the documents using the provided query
	retrieved_docs = retriever.invoke(query)
	
	# Display the retrieved documents
	print("\n\n".join([doc.page_content for doc in retrieved_docs]))

if __name__ == "__main__":
	import sys
	if len(sys.argv) != 2:
		print("Usage: python document_retriever.py ")
		sys.exit(1)
		
	retrieve_documents(sys.argv[1])

You can run this script with a query like this: python document_retriever.py "mobile app testing tools".

Next Steps

These are the main pieces you need for the “retrieval augmented” part of RAG. The final piece is generation, where you provide the retrieved documents as context to an LLM call with a user’s question.
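
To sketch what that final step could look like, here's a minimal example that reuses the retrieval setup from document_retriever.py and passes the results to a chat model. It assumes you've installed langchain-openai and added an OPENAI_API_KEY to your .env file, and the model name is just a placeholder:

from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain_pinecone import PineconeEmbeddings, PineconeVectorStore
from pinecone import Pinecone

# Ensure that PINECONE_API_KEY and OPENAI_API_KEY are loaded from .env
load_dotenv()

def answer_question(query: str) -> str:
	# Same retrieval setup as document_retriever.py
	index = Pinecone().Index("my-index")
	embeddings = PineconeEmbeddings(model="multilingual-e5-large")
	vector_store = PineconeVectorStore(
		index=index, embedding=embeddings, namespace="default"
	)
	retriever = vector_store.as_retriever(search_kwargs={"k": 2})

	# Retrieve the relevant chunks and join them into a context string
	docs = retriever.invoke(query)
	context = "\n\n".join(doc.page_content for doc in docs)

	# Generation: ask the LLM to answer using only the retrieved context
	prompt = (
		"Answer the question using only the context below.\n\n"
		f"Context:\n{context}\n\n"
		f"Question: {query}"
	)
	llm = ChatOpenAI(model="gpt-4o-mini")  # placeholder model name
	return llm.invoke(prompt).content

if __name__ == "__main__":
	import sys
	print(answer_question(sys.argv[1]))

You'd run it the same way as the retriever script, passing a question as the single argument.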

While RAG can sound like a complicated concept (and the underlying vector search details are certainly beyond my comprehension), it’s surprisingly easy to implement with a tool like Pinecone. You can experiment with different embedding models, text-splitting strategies, and search parameters to tune the results.
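
For example, swapping the retriever over to maximal marginal relevance (MMR) search for more diverse results is a one-line change in LangChain (the fetch_k value here is just a starting point to experiment with):

retriever = vector_store.as_retriever(
	search_type="mmr",
	search_kwargs={"k": 2, "fetch_k": 10},
)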
