Sunday, June 29, 2025

Hitchhiker’s Guide to RAG with ChatGPT API and LangChain


Generative models can generate tons of words and responses based on general knowledge, but what happens when we need answers requiring accurate and specific knowledge? Purely generative models frequently struggle to answer domain-specific questions for a bunch of reasons: maybe the data they were trained on is now outdated, maybe what we’re asking for is really specific and specialized, maybe we want responses that take into account personal or corporate data that just isn’t public… 🤷‍♀️ the list goes on.

So, how can we leverage generative AI while keeping our responses accurate, relevant, and down-to-earth? A good answer to this question is the Retrieval-Augmented Generation (RAG) framework. RAG consists of two key components: retrieval and generation (duh!). Unlike purely generative models, which rely only on the data they were pre-trained on, RAG incorporates an extra retrieval step that allows us to push additional information into the model from an external source, such as a database or a document. To put it differently, a RAG pipeline provides coherent and natural responses (thanks to the generation step) that are also factually accurate and grounded in a knowledge base of our choice (thanks to the retrieval step).

In this way, RAG can be an extremely valuable tool for applications where highly specialized data is needed, for instance customer support, legal advice, or technical documentation. One typical example of a RAG application is a customer support chatbot that answers customer issues based on a company’s database of support documents and FAQs. Another example would be complex software or technical products with extensive troubleshooting guides. One more example would be legal advice: a RAG model would access and retrieve custom data from law libraries, previous cases, or firm guidelines. The examples are really endless; however, in all these cases, access to external, specific, and contextually relevant data enables the model to provide more precise and accurate responses.

So, in this post, I walk you through building a simple RAG pipeline in Python, using the ChatGPT API, LangChain, and FAISS.

What about RAG?

From a more technical perspective, RAG is a technique used to enhance an LLM’s responses by injecting additional, domain-specific information. In essence, RAG allows a model to also take into account additional external information (like a recipe book, a technical manual, or a company’s internal knowledge base) while forming its responses.

This is important because it allows us to reduce a bunch of problems inherent to LLMs, for instance:

  • Hallucinations: making things up
  • Outdated information: if the model wasn’t trained on recent data
  • Lack of transparency: not knowing where responses are coming from

To make this work, the external documents are first processed into vector embeddings and stored in a vector database. Then, when we submit a prompt to the LLM, any relevant data is retrieved from the vector database and passed to the LLM along with our prompt. As a result, the response of the LLM is formed by considering both our prompt and any relevant information present in the vector database in the background. Such a vector database can be hosted locally or in the cloud, using a service like Pinecone or Weaviate.
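To make the retrieval step more concrete, here is a minimal sketch in plain Python. It assumes cosine similarity as the relevance measure (real vector stores may use other distance measures) and uses tiny hand-made vectors in place of real embeddings; the names `cosine_similarity`, `retrieve`, and `toy_store` are made up for illustration and are not part of any library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vector, store, k=2):
    """Return the k texts whose vectors are most similar to the query."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Hypothetical 3-dimensional "embeddings", made up for illustration;
# real embedding models return far longer vectors.
toy_store = [
    ("Refunds are processed within 14 days.", [0.9, 0.1, 0.0]),
    ("Our office is closed on Sundays.", [0.1, 0.9, 0.1]),
    ("Refund requests need an order number.", [0.8, 0.2, 0.1]),
]

query_vector = [1.0, 0.0, 0.0]  # pretend this encodes "How do refunds work?"
top_chunks = retrieve(query_vector, toy_store, k=2)

# The retrieved text is passed to the LLM together with the question
prompt = "Context:\n" + "\n".join(top_chunks) + "\n\nQuestion: How do refunds work?"
print(prompt)
```

In the actual pipeline, the embedding model produces the vectors and the vector database performs this ranking for us.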

Image by author

What about ChatGPT API, LangChain, and FAISS?

The first component for building a RAG pipeline is the LLM that will generate the responses. This can be any LLM, like Gemini or Claude, but in this post, I will be using OpenAI’s ChatGPT models via their API platform. In order to use their API, we need to sign up and obtain an API key. We also need to make sure the respective Python libraries are installed.

pip install openai

The other major component of building a RAG is processing external data: generating embeddings from documents and storing them in a vector database. One of the most popular frameworks for this task is LangChain. In particular, LangChain allows us to:

  • Load and extract text from various document types (PDFs, DOCX, TXT, etc.)
  • Split the text into chunks suitable for generating the embeddings
  • Generate vector embeddings (in this post, with the help of OpenAI’s API)
  • Store and search embeddings via vector databases like FAISS, Chroma, and Pinecone
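To illustrate the chunking step, here is a minimal sketch in plain Python that splits text into fixed-size, overlapping character windows. This is not how LangChain’s own text splitters are implemented; the function name `split_into_chunks` and the sizes used are assumptions chosen for illustration.

```python
def split_into_chunks(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character windows that overlap."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # stop once the current window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


# Made-up sample text, repeated to get something long enough to split
sample = "RAG pipelines retrieve relevant text before generating an answer. " * 5
chunks = split_into_chunks(sample, chunk_size=100, overlap=20)
for i, chunk in enumerate(chunks):
    print(i, len(chunk), repr(chunk[:30]))
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one chunk, which tends to help retrieval. For a very small file like the one used in this post, chunking may not be needed at all.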

We can easily install the required LangChain libraries with:

pip install langchain langchain-community langchain-openai

In this post, I’ll be using LangChain together with FAISS, a similarity search library developed by Facebook AI Research that can serve as a local vector database. FAISS is a very lightweight package, and is thus appropriate for building a simple/small RAG pipeline. It can be easily installed with:

pip install faiss-cpu

Putting everything together

So, in summary, I’ll use:

  • ChatGPT models via OpenAI’s API as the LLM
  • LangChain, along with OpenAI’s API, to load the external files, process them, and generate the vector embeddings
  • FAISS to generate a local vector database

The file that I will be feeding into the RAG pipeline for this post is a text file with some information about me. This text file is located in the folder ‘rag_files’.

Now we’re all set up, and we can start by specifying our API key and initializing our model:

from langchain_openai import ChatOpenAI

# ChatGPT API key
api_key = "your key"

# initialize LLM
llm = ChatOpenAI(openai_api_key=api_key, model="gpt-4o-mini", temperature=0.3)
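Hardcoding the key is fine for a quick demo, but it is easy to leak by accident. As a minimal sketch, the key could instead be read from an environment variable. The helper name `load_api_key` is made up for illustration; `OPENAI_API_KEY` is the variable name OpenAI’s own client looks for by default, but any name would work here.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the key stored in the given environment variable, or raise."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

With this in place, the key assignment becomes api_key = load_api_key(), and the script fails early with a clear message if the variable is missing.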

Then we can load the files we want to use for the RAG, generate the embeddings, and store them as a vector database as follows:

import os

from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# load the documents to be used for RAG
text_folder = "rag_files"

all_documents = []
for filename in os.listdir(text_folder):
    if filename.lower().endswith(".txt"):
        file_path = os.path.join(text_folder, filename)
        loader = TextLoader(file_path)
        all_documents.extend(loader.load())

# generate embeddings
embeddings = OpenAIEmbeddings(openai_api_key=api_key)

# create vector database with FAISS
vector_store = FAISS.from_documents(all_documents, embeddings)
retriever = vector_store.as_retriever()

Finally, we can wrap everything in a simple executable Python file:

def main():
    print("Welcome to the RAG Assistant. Type 'exit' to quit.\n")

    while True:
        user_input = input("You: ").strip()
        if user_input.lower() == "exit":
            print("Exiting…")
            break

        # get relevant documents
        relevant_docs = retriever.invoke(user_input)
        retrieved_context = "\n\n".join([doc.page_content for doc in relevant_docs])

        # system prompt
        system_prompt = (
            "You are a helpful assistant. "
            "Use ONLY the following knowledge base context to answer the user. "
            "If the answer is not in the context, say you don't know.\n\n"
            f"Context:\n{retrieved_context}"
        )

        # messages for LLM
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_input}
        ]

        # generate response
        response = llm.invoke(messages)
        assistant_message = response.content.strip()
        print(f"\nAssistant: {assistant_message}\n")

if __name__ == "__main__":
    main()

Notice how the system prompt is defined. Essentially, a system prompt is an instruction given to the LLM that sets the behavior, tone, or constraints of the assistant before the user interacts. For example, we could set the system prompt to make the LLM respond as if talking to a 4-year-old or a rocket scientist. Here we ask it to provide responses based only on the external data we provided, the ‘Maria info
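To show how the system prompt steers the assistant, here is a small sketch that factors the message assembly into a helper with a swappable persona. The function name `build_messages`, its `persona` parameter, and the sample context are assumptions made up for illustration; the constraint wording mirrors the system prompt used in this post.

```python
def build_messages(user_input, context_chunks, persona="You are a helpful assistant."):
    """Assemble the system and user messages for one chat turn."""
    context = "\n\n".join(context_chunks)
    system_prompt = (
        f"{persona} "
        "Use ONLY the following knowledge base context to answer the user. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_input},
    ]


# Hypothetical context chunk and persona, made up for illustration
messages = build_messages(
    "Where is the office?",
    ["The office is in Athens."],
    persona="You explain things as if talking to a 4-year-old.",
)
print(messages[0]["content"])
```

Changing only the persona changes the tone of the answers, while the "use ONLY the context" constraint keeps them grounded in the retrieved text.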

So, let’s see what we’ve cooked! 🍳

Firstly, I ask a question that is irrelevant to the provided external data source, to make sure that the model only uses the provided data source when forming its responses and not general knowledge.


… and then I asked some questions specifically about the file I provided…

✨✨✨✨

On my mind

Admittedly, this is a very simplistic example of a RAG setup: there’s much more to consider when implementing it in a real business environment, such as security concerns around how data is handled, or performance issues when dealing with a larger, more realistic knowledge corpus and increased token usage. Nonetheless, I believe OpenAI’s API is truly impressive and offers immense, untapped potential for building custom, context-specific AI applications.


