In the world of financial services, Know Your Customer (KYC) and Anti-Money Laundering (AML) are essential lines of defense against illicit activities. KYC is naturally modelled as a graph problem, where customers, accounts, transactions, IP addresses, devices, and locations are all interconnected nodes in a vast network of relationships. Investigators sift through these complex webs of connections, trying to connect seemingly disparate dots to uncover fraud, sanctions violations, and money laundering rings.
This is a great use case for AI grounded by a knowledge graph (GraphRAG). The intricate web of connections requires capabilities beyond standard document-based RAG (typically based on vector similarity search and reranking techniques).
Disclosure
I’m a Senior Product Manager for AI at Neo4j, the graph database featured in this post. Although the snippets focus on Neo4j, the same patterns can be applied with any graph database. My main aim is to share practical guidance on building GraphRAG agents with the AI/ML community. All code in the linked repository is open-source and free for you to explore, experiment with, and adapt.
All images in this blog post were created by the author.
A GraphRAG KYC Agent
This blog post provides a hands-on guide for AI engineers and developers on how to build an initial KYC agent prototype with the OpenAI Agents SDK. We’ll explore how to equip our agent with a set of tools to uncover and investigate potential fraud patterns.
The diagram below illustrates the agent processing pipeline used to answer questions raised during a KYC investigation.
Let’s walk through the main components:
- The KYC Agent: It leverages the OpenAI Agents SDK and acts as the “brain,” deciding which tool to use based on the user’s query and the conversation history. It plays the role of MCP host and MCP client to the Neo4j MCP Cypher Server. Most importantly, it runs a very simple loop that takes a question from the user, invokes the agent, and processes the results, while keeping the conversation history.
- The Toolset: A collection of tools available to the agent.
  - GraphRAG Tools: These are graph data retrieval functions that each wrap a very specific Cypher query. For example:
    - Get Customer Details: A graph retrieval tool that, given a customer ID, retrieves information about a customer, including their accounts and recent transaction history.
  - Neo4j MCP Server: A Neo4j MCP Cypher Server exposing tools to interact with a Neo4j database. It provides three essential tools:
    - Get the schema from the database.
    - Run a READ Cypher query against the database.
    - Run a WRITE Cypher query against the database.
  - A Text-To-Cypher tool: A Python function wrapping a fine-tuned Gemma3-4B model running locally via Ollama. The tool translates natural language questions into Cypher graph queries.
  - A Memory Creation tool: This tool allows investigators to document their findings directly in the knowledge graph. It creates a “memory” (of an investigation) in the knowledge graph and links it to all relevant customers, transactions, and accounts. Over time, this helps build a valuable knowledge base for future investigations.
- A KYC Knowledge Graph: A Neo4j database storing a knowledge graph of 8,000 fictitious customers, their accounts, transactions, devices and IP addresses. It is also used as the agent’s long-term memory store.
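The “very simple loop” run by the KYC Agent can be sketched roughly as follows. To keep the sketch self-contained, the agent call is passed in as a plain function (`run_agent`, a hypothetical name), and `echo_agent` is a stand-in used only for illustration. In the actual project that call goes through the OpenAI Agents SDK, which is not shown here.

```python
from typing import Callable

# A message is a dict like {"role": "user", "content": "..."}.
Message = dict[str, str]


def chat_turn(
    run_agent: Callable[[list[Message]], str],
    history: list[Message],
    question: str,
) -> list[Message]:
    """Run one turn: append the question, call the agent, append the answer."""
    messages = history + [{"role": "user", "content": question}]
    answer = run_agent(messages)
    return messages + [{"role": "assistant", "content": answer}]


def echo_agent(messages: list[Message]) -> str:
    """Stand-in agent for illustration only: reports how many messages it saw."""
    return f"I have seen {len(messages)} message(s)."


# Each turn receives the full history, so the agent keeps the conversation context.
history: list[Message] = []
for question in ["Get me the schema of the database", "Show me 5 watchlisted customers"]:
    history = chat_turn(echo_agent, history, question)
```

The point of the design is that the history is rebuilt and passed back in on every turn, so follow-up questions can refer to earlier answers.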
Want to try out the agent now? Just follow the instructions on the project repo. You can come back and read how the agent was built later.
Why GraphRAG for KYC?
Traditional RAG systems focus on finding information within large bodies of text that are chunked up into fragments. KYC investigations rely on finding interesting patterns in a complex web of interconnected data: customers linked to accounts, accounts connected through transactions, transactions tied to IP addresses and devices, and customers associated with personal and employer addresses.
Understanding these relationships is key to uncovering sophisticated fraud patterns.
- “Does this customer share an IP address with someone on a watchlist?”
- “Is this transaction part of a circular payment loop designed to obscure the source of funds?”
- “Are multiple new accounts being opened by individuals working for the same, newly-registered, shell company?”
These are questions of connectivity. A knowledge graph, where customers, accounts, transactions, and devices are nodes and their relationships are explicit edges, is the ideal data structure for this task. GraphRAG (data retrieval) tools make it simple to identify unusual patterns of activity.
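To make the first question concrete, here is a toy, in-memory sketch of “does this customer share an IP address with someone on a watchlist?”. It is not the Neo4j implementation: the customer IDs, IP addresses, and the `shares_ip_with_watchlisted` helper are all invented for illustration.

```python
# Toy data: which IP addresses each customer has used (all values are made up).
customer_ips = {
    "CUST_A": {"10.0.0.1", "10.0.0.2"},
    "CUST_B": {"10.0.0.2"},
    "CUST_C": {"10.0.0.9"},
}
watchlist = {"CUST_B"}


def shares_ip_with_watchlisted(customer_id: str) -> set[str]:
    """Return watchlisted customers that share at least one IP with customer_id."""
    own_ips = customer_ips.get(customer_id, set())
    return {
        other
        for other in watchlist
        if other != customer_id and own_ips & customer_ips.get(other, set())
    }
```

In a graph database the same question becomes a two-hop traversal (customer to IP address to customer), which stays readable as more entity types and hops are added.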

A Synthetic KYC Dataset
For the purposes of this blog, I have created a synthetic dataset with 8,000 fictitious customers and their accounts, transactions, registered addresses, devices and IP addresses.
The image below shows the “schema” of the database after the dataset is loaded into Neo4j. In Neo4j, a schema describes the types of entities and relationships stored in the database. In our case, the main entities are: Customer, Address, Account, Device, IP Address, and Transaction. The main relationships among them are illustrated below.

The dataset contains a few anomalies. Some customers are involved in suspicious transaction rings. There are a few isolated devices and IP addresses (not linked to any customer or account). There are some addresses shared by a large number of customers. Feel free to explore the synthetic dataset generation script if you want to understand or modify the dataset to suit your requirements.
A Basic Agent with OpenAI Agents SDK
Let’s walk through the main parts of our KYC Agent.
The implementation is mostly inside kyc_agent.py. The full source code and step-by-step instructions on how to run the agent are available on GitHub.
First, let’s define the agent’s core identity with suitable instructions.
import os
from agents import Agent, Runner, function_tool
# ... other imports
# Define the instructions for the agent
instructions = """You are a KYC analyst with access to a knowledge graph. Use the tools to answer questions about customers, accounts, and suspicious patterns.
You are also a Neo4j expert and can use the Neo4j MCP server to query the graph.
If you get a question about the KYC database that you can not answer with GraphRAG tools, you should
- use the Neo4j MCP server to fetch the schema of the graph (if needed)
- use the generate_cypher tool to generate a Cypher query from question and the schema
- use the Neo4j MCP server to query the graph to answer the question
"""
The instructions are crucial. They set the agent’s persona and provide a high-level strategy for how to approach problems, especially when a pre-defined tool doesn’t fit the user’s request.
Now, let’s start with a minimal agent definition: just the instructions, with placeholders for the tools we will add next.
# Agent Definition, we will add tools later.
kyc_agent = Agent(
    name="KYC Analyst",
    instructions=instructions,
    tools=[...],  # We will populate this list
    mcp_servers=[...]  # And this one
)
Let’s add some tools to our KYC Agent
An agent is only as good as its tools. Let’s examine five tools we’re giving our KYC analyst.
Tool 1 & 2: Pre-defined Cypher Queries
For common and critical queries, it’s best to have optimized, pre-written Cypher queries wrapped in Python functions. You can use the @function_tool decorator from the OpenAI Agents SDK to make these functions available to the agent.
Tool 1: `find_customer_rings`
This tool is designed to detect recursive patterns characteristic of money laundering, specifically “circular transactions” where funds cycle through multiple accounts to disguise their origin.
In a KYC graph, this translates directly to finding cycles, or paths that return to or near their starting point, within a directed transaction graph. Implementing such detection involves complex graph traversal algorithms, often using variable-length paths to explore connections up to a certain “hop” distance.
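To illustrate the idea of hop-limited cycle detection outside the database, here is a minimal depth-limited search in plain Python. It is only a sketch under simplifying assumptions: each transfer is collapsed into a single account-to-account edge (in the post’s graph, transactions are nodes sitting between accounts), and the account names and the `find_cycles` helper are invented for illustration.

```python
def find_cycles(edges: dict[str, list[str]], start: str, max_hops: int) -> list[list[str]]:
    """Return simple cycles that begin and end at `start`, using at most max_hops edges."""
    cycles: list[list[str]] = []

    def walk(node: str, path: list[str]) -> None:
        if len(path) - 1 >= max_hops:  # the path already uses max_hops edges
            return
        for nxt in edges.get(node, []):
            if nxt == start:
                cycles.append(path + [start])
            elif nxt not in path:  # keep the cycle simple (no repeated accounts)
                walk(nxt, path + [nxt])

    walk(start, [start])
    return cycles


# Made-up transfers: acc1 -> acc2 -> acc3 -> acc1, plus a dead end acc2 -> acc4.
transfers = {"acc1": ["acc2"], "acc2": ["acc3", "acc4"], "acc3": ["acc1"]}
rings = find_cycles(transfers, "acc1", max_hops=6)
```

A Cypher variable-length pattern expresses the same traversal declaratively and lets the database do the search, which is why the tool below delegates the work to a query.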
The code snippet below shows a find_customer_rings function that executes a Cypher query against the KYC database and returns up to 10 potential customer rings. For each ring, the following information is returned: the customers, accounts and transactions involved.
@function_tool
def find_customer_rings(max_number_rings: int = 10, customer_in_watchlist: bool = True, ...):
    """
    Detects circular transaction patterns (up to 6 hops) involving high-risk customers.
    Finds account cycles where the accounts are owned by customers matching specified
    risk criteria (watchlisted and/or PEP status).
    Args:
        max_number_rings: Maximum rings to return (default: 10)
        customer_in_watchlist: Filter for watchlisted customers (default: True)
        customer_is_pep: Filter for PEP customers (default: False)
        customer_id: Specific customer to focus on (not implemented)
    Returns:
        dict: Contains ring paths and associated high-risk customers
    """
    logger.info(f"TOOL: FIND_CUSTOMER_RINGS")
    with driver.session() as session:
        result = session.run(
            f"""
            MATCH p=(a:Account)-[:FROM|TO*6]->(a:Account)
            WITH p, [n IN nodes(p) WHERE n:Account] AS accounts
            UNWIND accounts AS acct
            MATCH (cust:Customer)-[r:OWNS]->(acct)
            WHERE cust.on_watchlist = $customer_in_watchlist
            // ... more Cypher to collect results ...
            """,
            max_number_rings=max_number_rings,
            customer_in_watchlist=customer_in_watchlist,
        )
        # ... Python code to process and return results ...
It’s worth noting that the documentation string (docstring) is automatically used by the OpenAI Agents SDK as the tool description, so good Python function documentation pays off!
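As a rough illustration of why docstrings matter, the sketch below derives a tool description from a plain Python function using only the standard library. It is not the SDK’s actual implementation; `describe_tool` and `get_customer_risk` are hypothetical names used only for this example.

```python
import inspect


def get_customer_risk(customer_id: str) -> str:
    """Return a short risk summary for the given customer ID."""
    return f"No risk data for {customer_id}"  # placeholder body


def describe_tool(func) -> dict:
    """Build a minimal tool description from a function's name, docstring and parameters."""
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": list(inspect.signature(func).parameters),
    }


tool_info = describe_tool(get_customer_risk)
```

Whatever text ends up in the description is what the model reads when deciding which tool to call, so a vague docstring tends to produce vague tool selection.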
Tool 2: `get_customer_and_accounts`
A simple, yet essential, tool for retrieving a customer’s profile, including their accounts and most recent transactions. This is the bread-and-butter of any investigation. The code is similar to our previous tool: a function that takes a customer ID and wraps around a simple Cypher query.
Once again, the function is decorated with @function_tool to make it available to the agent.
The Cypher query wrapped by this Python function is shown below:
result = session.run(
    """
    MATCH (c:Customer {id: $customer_id})-[o:OWNS]->(a:Account)
    WITH c, a
    CALL (c, a) {
        MATCH (a)-[:TO|FROM]->(t:Transaction)
        ORDER BY t.timestamp DESC
        LIMIT $tx_limit
        RETURN collect(t) as transactions
    }
    RETURN c as customer, a as account, transactions
    """,
    customer_id=input.customer_id
)
A notable aspect of this tool’s design is the use of Pydantic to specify the function’s output. The OpenAI Agents SDK uses Pydantic models returned by the function to automatically generate a text description of the output parameters.
If you look carefully, the function returns
return CustomerAccountsOutput(
    customer=CustomerModel(**customer),
    accounts=[AccountModel(**a) for a in accounts],
)
The CustomerModel and AccountModel include each of the properties returned for each Customer, its accounts and a list of recent transactions. You can see their definition in schemas.py.
Tools 3 & 4: Where Neo4j MCP Server meets Text-To-Cypher
This is where our KYC agent gets some more interesting powers.
A significant challenge in building versatile AI agents is enabling them to interact dynamically with complex data sources, beyond pre-defined, static functions. Agents need the ability to perform general-purpose querying, where new insights might require spontaneous data exploration, without requiring a priori Python wrappers for every possible action.
This section explores a common architectural pattern to address this: a tool to translate natural language questions into Cypher, coupled with another tool to allow dynamic query execution.
We demonstrate this mechanism using the Neo4j MCP Server to expose dynamic graph query execution and a Google Gemma3-4B fine-tuned model for Text-to-Cypher translation.
Tool 3: Adding the Neo4j MCP server toolset
For a robust agent to operate effectively with a knowledge graph, it needs to understand the graph’s structure and to execute Cypher queries. These capabilities enable the agent to introspect the data and execute dynamic ad-hoc queries.
The MCP Neo4j Cypher server provides the basic tools: get-neo4j-schema (to retrieve the graph schema dynamically), read-neo4j-cypher (for executing arbitrary read queries), and write-neo4j-cypher (for create, update, delete queries).
Fortunately, the OpenAI Agents SDK has support for MCP. The code snippet below shows how easy it is to add the Neo4j MCP Server to our KYC Agent.
# Tool 3: Neo4j MCP server setup
neo4j_mcp_server = MCPServerStdio(
    params={
        "command": "uvx",
        "args": ["[email protected]"],
        "env": {
            "NEO4J_URI": NEO4J_URI,
            "NEO4J_USERNAME": NEO4J_USER,
            "NEO4J_PASSWORD": NEO4J_PASSWORD,
            "NEO4J_DATABASE": NEO4J_DATABASE,
        },
    },
    cache_tools_list=True,
    name="Neo4j MCP Server",
)
You can learn more about how MCP is supported in OpenAI Agents SDK here.
Tool 4: A Text-To-Cypher Tool
The ability to dynamically translate natural language into powerful graph queries often relies on specialized Large Language Models (LLMs), fine-tuned for schema-aware query generation.
We can use open-weights, publicly available Text-to-Cypher models on Hugging Face, such as neo4j/text-to-cypher-Gemma-3-4B-Instruct-2025.04.0. This model was specifically fine-tuned to generate accurate Cypher queries from a user question and a schema.
In order to run this model on a local machine, we can turn to Ollama. Using llama.cpp, it’s relatively straightforward to convert any Hugging Face model to GGUF format, which is required to run a model in Ollama. Using the ‘convert-hf-to-GGUF’ Python script, I generated a GGUF version of the Gemma3-4B fine-tuned model and uploaded it to Ollama.
If you are an Ollama user, you can download this model to your local machine with:
ollama pull ed-neo4j/t2c-gemma3-4b-it-q8_0-35k
What happens when a user asks a question that doesn’t match any of our pre-defined tools?
For example, “For customer CUST_00001, find his addresses and check if they are shared with other customers”
Instead of failing, our agent can generate a Cypher query on the fly…
@function_tool
async def generate_cypher(request: GenerateCypherRequest) -> str:
    """
    Generate a Cypher query from natural language using a local finetuned text2cypher Ollama model
    """
    USER_INSTRUCTION = """..."""  # Detailed prompt instructions
    user_message = USER_INSTRUCTION.format(
        schema=request.database_schema,
        question=request.question
    )
    # Generate Cypher query using the text2cypher model
    model: str = "ed-neo4j/t2c-gemma3-4b-it-q8_0-35k"
    response = await chat(
        model=model,
        messages=[{"role": "user", "content": user_message}]
    )
    return response['message']['content']
The generate_cypher tool addresses the challenge of Cypher query generation, but how does the agent know when to use this tool? The answer lies in the agent instructions.
You may remember that at the start of the blog, we defined the instructions for the agent as follows:
instructions = """You are a KYC analyst with access to a knowledge graph. Use the tools to answer questions about customers, accounts, and suspicious patterns.
You are also a Neo4j expert and can use the Neo4j MCP server to query the graph.
If you get a question about the KYC database that you can not answer with GraphRAG tools, you should
- use the Neo4j MCP server to get the schema of the graph (if needed)
- use the generate_cypher tool to generate a Cypher query from question and the schema
- use the Neo4j MCP server to query the graph to answer the question
"""
This time, note the specific instructions to handle ad-hoc queries that can not be answered by the graph retrieval based tools.
When the agent goes down this path, it goes through the following steps:
- The agent gets a novel question.
- It first calls `neo4j-mcp-server.get-neo4j-schema` to get the schema of the database.
- It then feeds the schema and the user’s question to the `generate_cypher` tool. This will generate a Cypher query.
- Finally, it takes the generated Cypher query and runs it using `neo4j-mcp-server.read-neo4j-cypher`.
If there are errors, in either the Cypher generation or the execution of the Cypher, the agent retries generating the Cypher and reruns it.
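In the agent described here, the retry is decided by the LLM itself while following its instructions. If you wanted to make the same generate, execute, retry cycle explicit in code, a minimal sketch could look like the one below. The `generate` and `execute` callables and the `answer_with_retries` helper are hypothetical stand-ins, not functions from the project.

```python
from typing import Callable


def answer_with_retries(
    question: str,
    schema: str,
    generate: Callable[[str, str, str], str],
    execute: Callable[[str], list],
    max_attempts: int = 3,
) -> list:
    """Generate Cypher, run it, and retry with the last error as extra context."""
    last_error = ""
    for _ in range(max_attempts):
        # The previous error (if any) is passed back so the generator can adjust.
        query = generate(question, schema, last_error)
        try:
            return execute(query)
        except Exception as exc:  # in practice, catch the driver's specific errors
            last_error = str(exc)
    raise RuntimeError(f"No valid query after {max_attempts} attempts: {last_error}")
```

Capping the number of attempts keeps a persistently failing question from looping forever and gives you a clear signal to log for later review.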
As you can see, the above approach is not bullet-proof. It relies heavily on the Text-To-Cypher model to produce valid and correct Cypher. In most cases, it works. However, in cases where it doesn’t, you should consider:
- Defining explicit Cypher retrieval tools for this type of question.
- Adding some form of end-user feedback (thumbs up / down) in your UI/UX. This will help flag questions that the agent is struggling with. You can then decide the best approach to handle this class of questions (e.g. a Cypher retrieval tool, better instructions, improvements to the text2cypher model, guardrails, or just getting your agent to politely decline to answer the question).
Tool 5 – Adding Memory to the KYC Agent
The topic of agent memory is getting a lot of attention lately.
While agents inherently manage short-term memory through conversational history, complex, multi-session tasks like financial investigations demand a more persistent and evolving long-term memory.
This long-term memory isn’t just a log of past interactions; it’s a dynamic knowledge base that can accumulate insights, track ongoing investigations, and provide context across different sessions and even different agents.
The create_memory tool implements a form of explicit knowledge graph memory, where summaries of investigations are stored as dedicated nodes and explicitly linked to relevant entities (customers, accounts, transactions).
@function_tool
def create_memory(content: str, customer_ids: list[str] = [], account_ids: list[str] = [], transaction_ids: list[str] = []) -> str:
    """
    Create a Memory node and link it to specified customers, accounts, and transactions
    """
    logger.info(f"TOOL: CREATE_MEMORY")
    with driver.session() as session:
        # Note: UNWIND over an empty list yields no rows, so an empty ID list
        # skips the clauses that follow it in the query.
        result = session.run(
            """
            CREATE (m:Memory {content: $content, created_at: datetime()})
            WITH m
            UNWIND $customer_ids as cid
            MATCH (c:Customer {id: cid})
            MERGE (m)-[:FOR_CUSTOMER]->(c)
            WITH m
            UNWIND $account_ids as aid
            MATCH (a:Account {id: aid})
            MERGE (m)-[:FOR_ACCOUNT]->(a)
            WITH m
            UNWIND $transaction_ids as tid
            MATCH (t:Transaction {id: tid})
            MERGE (m)-[:FOR_TRANSACTION]->(t)
            RETURN m.content as content
            """,
            content=content,
            customer_ids=customer_ids,
            account_ids=account_ids,
            transaction_ids=transaction_ids
            # ...
        )
Additional considerations for implementing “agent memory” include:
- Memory Architectures: Exploring different types of memory (episodic, semantic, procedural) and their common implementations (vector databases for semantic search, relational databases, or knowledge graphs for structured insights).
- Contextualization: How the knowledge graph structure allows for rich contextualization of memories, enabling powerful retrieval based on relationships and patterns, rather than just keyword matching.
- Update and Retrieval Strategies: How memories are updated over time (e.g., appended, summarized, refined) and how they are retrieved by the agent (e.g., through graph traversal, semantic similarity, or fixed rules).
- Challenges: The complexities of managing memory consistency, handling conflicting information, preventing “hallucinations” in memory retrieval, and ensuring the memory stays relevant and up-to-date without becoming overly large or noisy.
This is a rapidly evolving area of active development, with many frameworks addressing some of the considerations above.
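As one example of retrieval by graph traversal, the sketch below builds a parameterized Cypher query that fetches the most recent memories linked to a customer. It reuses the node label, relationship type, and properties written by the memory tool, but the `build_memory_lookup` helper itself is hypothetical and is not part of the project’s code.

```python
def build_memory_lookup(customer_id: str, limit: int = 5) -> tuple[str, dict]:
    """Return a Cypher query and parameters that fetch recent memories for a customer."""
    query = (
        "MATCH (m:Memory)-[:FOR_CUSTOMER]->(c:Customer {id: $customer_id}) "
        "RETURN m.content AS content, m.created_at AS created_at "
        "ORDER BY created_at DESC LIMIT $limit"
    )
    # Values are passed as parameters rather than formatted into the query string.
    return query, {"customer_id": customer_id, "limit": limit}


query, params = build_memory_lookup("CUST_00001")
```

The returned pair could then be handed to a driver session or to a read-only query tool; keeping the values as parameters avoids building queries by string concatenation.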
Putting it all together – An Example Investigation
Let’s see how our agent handles a typical workflow. You can run this yourself (or feel free to follow along with the step-by-step instructions on the KYC agent GitHub repo).
1. “Get me the schema of the database”
   - Agent Action: The agent identifies this as a schema query and uses the Neo4j MCP Server’s `get-neo4j-schema` tool.
2. “Show me 5 watchlisted customers involved in suspicious rings”
   - Agent Action: This directly matches the purpose of our custom tool. The agent calls `find_customer_rings` with `customer_in_watchlist=True`.
3. “For each of these customers, find their addresses and find out if they are shared with other customers”
   - Agent Action: This is a question that can’t be answered with any of the GraphRAG tools. The agent should follow its instructions:
     - It already has the schema (from our first interaction above).
     - It calls `generate_cypher` with the question and schema. The tool returns a Cypher query that tries to answer the investigator’s question.
     - It executes this Cypher query using the Neo4j MCP Cypher Server `read-neo4j-cypher` tool.
4. “For the customer whose address is shared, can you get me more details?”
   - Agent Action: The agent determines that the `get_customer_and_accounts` tool is the right match and calls it with the customer’s ID.
5. “Write a 300-word summary of this investigation. Store it as a memory. Make sure to link it to every account and transaction belonging to this customer.”
   - Agent Action: The agent first uses its internal LLM capabilities to generate the summary. Then, it calls the `create_memory` tool, passing the summary text and the list of all customer, account, and transaction IDs it has encountered during the conversation.
Key Takeaways
If you got this far, I hope you enjoyed the journey of getting familiar with a basic implementation of a KYC GraphRAG Agent. Lots of cool technologies here: OpenAI Agents SDK, MCP, Neo4j, Ollama and a Gemma3-4B fine-tuned Text-To-Cypher model!
I hope you gained some appreciation for:
- GraphRAG, or more specifically graph-powered data retrieval, as an essential for connected-data problems. It allows agents to answer questions about heavily connected data that would be impossible to answer with standard RAG.
- A balanced toolkit is powerful. Combine MCP Server tools with your own optimized tools.
- MCP Servers are a game-changer. They allow you to connect your agents to a growing set of MCP servers.
  - Experiment with more MCP Servers so you get a better sense of the possibilities.
- Agents should be able to write back to your data store in a controlled manner.
  - In our example we saw how an analyst can persist their findings (e.g., adding Memory nodes to the knowledge graph), creating a virtuous cycle in which the agent improves the underlying knowledge base for entire teams of investigators.
  - The agent adds information to the knowledge graph and it never updates or deletes existing information.
The patterns and tools discussed here are not limited to KYC. They can be applied to supply chain analysis, digital twin management, drug discovery, and any other domain where the relationships between data points are as important as the data itself.
The era of graph-aware AI agents is here.
What’s Next?
You have built a simple AI agent on top of OpenAI Agents SDK with MCP, Neo4j and a Text-to-Cypher model. All running on a single machine.
While this initial agent provides a strong foundation, transitioning to a production-level system involves addressing several additional requirements, such as:
- Agent UI/UX: This is the central place for your users to interact with your agent. It will ultimately be a key driver of the adoption and success of your agent.
- Long-running tasks and multi-agent systems: Some tasks are valuable but take a significant amount of time to run. In these cases, agents should be able to offload parts of their workload to other agents.
  - OpenAI does provide some support for handing off to subagents, but it might not be suitable for long-running agents.
- Agent Guardrails – OpenAI Agents SDK provides some support for Guardrails.
- Agent Hosting – It exposes your agent to your users.
- Securing comms to your agent – End-user authentication and authorization to your agent.
- Database access controls – Managing access control to the data stored in the KYC Knowledge Graph.
- Conversation History.
- Agent Observability.
- Agent Memory.
- Agent Evaluation – What is the impact of changing the agent instructions and/or adding/removing a tool?
- And more…
In the meantime, I hope this has inspired you to keep learning and experimenting!
