Multi-agent orchestration is one of the most promising applications of LLMs, and CrewAI has rapidly become a popular framework for building agent teams. But one of its most important features, the hierarchical manager-worker process, simply doesn't function as documented. In real workflows, the manager doesn't effectively coordinate agents; instead, CrewAI executes tasks sequentially, leading to incorrect reasoning, unnecessary tool calls, and very high latency. This issue has been raised in several online forums with no clear resolution.
In this article, I demonstrate why CrewAI's hierarchical process fails, show the evidence from actual Langfuse traces, and provide a reproducible pathway to make the manager-worker pattern work reliably using custom prompting.
Multi-agent Orchestration
Before we get into the details, let us understand what orchestration means in an agentic context. In simple terms, orchestration is managing and coordinating multiple inter-dependent tasks in a workflow. But haven't workflow management tools (e.g., RPA) been available forever to do just that? So what changed with LLMs?
The answer is the ability of LLMs to understand meaning and intent from natural language instructions, just as people in a team would. Earlier workflow tools were rule-based and rigid. With LLMs functioning as agents, the expectation is that they will be able to understand the intent of the user's query, use reasoning to create a multi-step plan, infer the tools to be used, derive their inputs in the correct formats, and synthesize all the different intermediate results into a precise response to the user's query. The orchestration frameworks are meant to guide the LLM with appropriate prompts for planning, tool-calling, generating the response, and so on.
Among the orchestration frameworks, CrewAI, with its natural-language-based definition of tasks, agents and crews, depends the most on the LLM's ability to understand language and manage workflows. While not as deterministic as LangGraph (since LLM outputs cannot be fully deterministic), it abstracts away most of the complexity of routing, error handling and so on into simple, user-friendly constructs with parameters that the user can tune for appropriate behavior. So it is a good framework for creating prototypes by product teams and even non-developers.
Except that the manager-worker pattern does not work as intended…
To illustrate, let's take a use case to work with, and evaluate the response based on the following criteria:
- Quality of orchestration
- Quality of final response
- Explainability
- Latency and usage cost
Use Case
Take the case where a team of customer support agents resolves technical or billing tickets. When a ticket comes in, a triage agent categorizes the ticket, then assigns it to the technical or billing support specialist for resolution. A Customer Support Manager coordinates the team, delegates tasks and validates the quality of the response.
Together they will be solving queries such as:
- Why is my laptop overheating?
- Why was I charged twice last month?
- My laptop is overheating and also, I was charged twice last month?
- My invoice amount is incorrect after system glitch?
The first query is purely technical, so only the technical support agent should be invoked by the manager; the second is billing only; and the third and fourth require answers from both the technical and billing agents.
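The expected routing for these four test tickets can be written down as a small lookup, which is handy later when checking the traces. This is only an illustrative sketch: the names `EXPECTED_AGENTS`, `TEST_TICKETS` and `expected_agents` are hypothetical and not part of CrewAI.

```python
# Expected routing for the four test tickets, as described in the text.
# The keys mirror the three possible answers of the triage agent.
EXPECTED_AGENTS = {
    "Technical": ["technical"],
    "Billing": ["billing"],
    "Both": ["technical", "billing"],
}

# (ticket text, category we expect the triage agent to return)
TEST_TICKETS = [
    ("Why is my laptop overheating?", "Technical"),
    ("Why was I charged twice last month?", "Billing"),
    ("My laptop is overheating and also, I was charged twice last month?", "Both"),
    ("My invoice amount is incorrect after system glitch?", "Both"),
]


def expected_agents(category: str) -> list:
    """Return the specialists the manager should invoke for a triage category."""
    return EXPECTED_AGENTS[category]


for ticket, category in TEST_TICKETS:
    print(f"{category:9} -> {expected_agents(category)} | {ticket}")
```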
Let's build this team of CrewAI agents and see how well it works.
Hierarchical Process
According to the CrewAI documentation, "adopting a hierarchical approach allows for a clear hierarchy in task management, where a 'manager' agent coordinates the workflow, delegates tasks, and validates outcomes for streamlined and effective execution." Also, the manager agent can be created in two ways: automatically by CrewAI or explicitly set by the user. In the latter case, you have more control over the instructions to the manager agent. We will try both ways for our use case.
CrewAI Code
Following is the code for the use case. I have used gpt-4o as the LLM and Langfuse for observability.
from crewai import Agent, Crew, Process, Task, LLM
from dotenv import load_dotenv
import os
from observe import *  # Langfuse trace
load_dotenv()
verbose = False
max_iter = 4
API_VERSION = os.getenv('API_VERSION')
# Create your LLM
llm_a = LLM(
    model="gpt-4o",
    api_version=API_VERSION,
    temperature=0.2,
    max_tokens=8000,
)
# Define the manager agent
supervisor = Agent(
    role="Customer Support Manager",
    goal="Oversee the support team to ensure timely and effective resolution of customer inquiries. Use the tool to categorize the user query first, then decide the next steps. Synthesize responses from different agents if needed to provide a comprehensive answer to the customer.",
    backstory=( """
    You do not try to find an answer to the user ticket {ticket} yourself.
    You delegate tasks to coworkers based on the following logic:
    Note the category of the ticket first by using the triage agent.
    If the ticket is categorized as 'Both', always assign it first to the Technical Support Specialist, then to the Billing Support Specialist, then print the final combined response. Ensure that the final response answers both technical and billing issues raised in the ticket based on the responses from both Technical and Billing Support Specialists.
    ELSE
    If the ticket is categorized as 'Technical', assign it to the Technical Support Specialist, else skip this step.
    Before proceeding further, analyse the ticket category. If it is 'Technical', print the final response. Terminate further actions.
    ELSE
    If the ticket is categorized as 'Billing', assign it to the Billing Support Specialist.
    Finally, compile and present the final response to the customer based on the outputs from the assigned agents.
    """
    ),
    llm=llm_a,
    allow_delegation=True,
    verbose=verbose,
)
# Define the triage agent
triage_agent = Agent(
    role="Query Triage Specialist",
    goal="Categorize the user query into technical or billing related issues. If a query requires both aspects, reply with 'Both'.",
    backstory=(
        "You are a seasoned expert in analysing intent of user query. You reply precisely with one word: 'Technical', 'Billing' or 'Both'."
    ),
    llm=llm_a,
    allow_delegation=False,
    verbose=verbose,
)
# Define the technical support agent
technical_support_agent = Agent(
    role="Technical Support Specialist",
    goal="Resolve technical issues reported by customers promptly and effectively",
    backstory=(
        "You are a highly skilled technical support specialist with a strong background in troubleshooting software and hardware issues. "
        "Your primary responsibility is to assist customers in resolving technical problems, ensuring their satisfaction and the smooth operation of their products."
    ),
    llm=llm_a,
    allow_delegation=False,
    verbose=verbose,
)
# Define the billing support agent
billing_support_agent = Agent(
    role="Billing Support Specialist",
    goal="Handle customer inquiries related to billing, payments, and account management",
    backstory=(
        "You are an experienced billing support specialist with expertise in handling customer billing inquiries. "
        "Your main objective is to provide clear and accurate information regarding billing processes, resolve payment issues, and assist with account management to ensure customer satisfaction."
    ),
    llm=llm_a,
    allow_delegation=False,
    verbose=verbose,
)
# Define tasks
categorize_tickets = Task(
    description="Categorize the incoming customer support ticket: '{ticket}' based on its content to determine if it is technical or billing-related. If a query requires both aspects, reply with 'Both'.",
    expected_output="A categorized ticket labeled as 'Technical' or 'Billing' or 'Both'. Do not be verbose, just reply with one word.",
    agent=triage_agent,
)
resolve_technical_issues = Task(
    description="Resolve technical issues described in the ticket: '{ticket}'",
    expected_output="Detailed solutions provided to each technical issue.",
    agent=technical_support_agent,
)
resolve_billing_issues = Task(
    description="Resolve billing issues described in the ticket: '{ticket}'",
    expected_output="Comprehensive responses to each billing-related inquiry.",
    agent=billing_support_agent,
)
# Instantiate your crew with a custom manager and hierarchical process
crew_q = Crew(
    agents=[triage_agent, technical_support_agent, billing_support_agent],
    tasks=[categorize_tickets, resolve_technical_issues, resolve_billing_issues],
    # manager_llm=llm_a,  # Uncomment for auto-created manager
    manager_agent=supervisor,  # Comment out for auto-created manager
    process=Process.hierarchical,
    verbose=verbose,
)
As is evident, the program mirrors the team of human agents. Not only is there a manager, a triage agent, and technical and billing support agents, but the CrewAI objects such as Agent, Task and Crew are self-evident in their meaning and easy to visualize. Another observation is that there is very little Python code; most of the reasoning, planning and behavior is natural-language based, which depends on the ability of the LLM to derive meaning and intent from language, then reason and plan for the goal.
CrewAI code therefore scores high on ease of development. It is a low-code way of creating a flow quickly, with most of the heavy lifting of the workflow done by the orchestration framework rather than the developer.
How well does it work?
As we are testing the hierarchical process, the process parameter is set to Process.hierarchical in the Crew definition. We will try different features of CrewAI as follows and measure performance:
- Manager agent auto-created by CrewAI
- Using our custom manager agent
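For reference, a test ticket can be passed to a crew through `kickoff(inputs=...)`, where the `ticket` key fills the `{ticket}` placeholder in the task descriptions. The helper below is a minimal sketch of that call with wall-clock timing added. `run_ticket` and `EchoCrew` are hypothetical names, and the stand-in crew is there only so the snippet runs without API keys. The latency and token figures reported in this article come from Langfuse, not from this helper.

```python
import time


def run_ticket(crew, ticket: str):
    """Run one ticket through a crew and return (result, seconds elapsed).

    `crew` is any object exposing a CrewAI-style `kickoff(inputs=...)` method.
    """
    start = time.perf_counter()
    result = crew.kickoff(inputs={"ticket": ticket})
    elapsed = time.perf_counter() - start
    return result, elapsed


class EchoCrew:
    """Hypothetical stand-in for a real Crew, used only for a dry run."""

    def kickoff(self, inputs):
        return f"handled: {inputs['ticket']}"


if __name__ == "__main__":
    result, elapsed = run_ticket(EchoCrew(), "Why is my laptop overheating?")
    print(result, f"({elapsed:.3f}s)")
```

With the real crew defined earlier, the call would be `run_ticket(crew_q, "Why is my laptop overheating?")`.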
1. Auto-created manager agent
Input query: Why is my laptop overheating?
Here is the Langfuse trace:

The key observations are as follows:
- First, the output is "Based on the provided context, it seems there is a misalignment between the nature of the issue (laptop overheating) and its categorization as a billing concern. To clarify the connection, it would be important to determine if the customer is requesting a refund for the laptop due to the overheating issue, disputing a charge related to the purchase or repair of the laptop, or seeking compensation for repair costs incurred due to the overheating…" For a query that was clearly a technical issue, this is a poor response.
- Why does it happen? The left panel shows that the execution first went to the triage specialist, then to technical support and then, strangely, to the billing support specialist as well. The following graphic depicts this as well:

Looking closely, we find that the triage specialist correctly identified the ticket as "Technical" and the technical support agent gave a good answer as follows:

But then, instead of stopping and replying with the above as the response, the Crew Manager went to the Billing support specialist and tried to find a non-existent billing issue in the purely technical user query.

This resulted in the Billing agent's response overwriting the Technical agent's response, with the Crew Manager doing a sub-optimal job of validating the quality of the final response against the user's query.
Why did it happen?
Because in the Crew definition, I specified the tasks as categorize_tickets, resolve_technical_issues, resolve_billing_issues, and although the process is supposed to be hierarchical, the Crew Manager does not perform any orchestration; it simply executes all the tasks sequentially.
crew_q = Crew(
    agents=[triage_agent, technical_support_agent, billing_support_agent],
    tasks=[categorize_tickets, resolve_technical_issues, resolve_billing_issues],
    manager_llm=llm_a,
    process=Process.hierarchical,
    verbose=verbose,
)
If you now ask a billing-related query, it will appear to produce a correct answer simply because resolve_billing_issues is the last task in the sequence.
What about a query that requires both technical and billing support, such as "My laptop is overheating and also I was charged twice last month?" In this case too, the triage agent correctly categorizes the ticket type as "Both", and the technical and billing agents give correct answers to their individual queries, but the manager is unable to combine all the responses into a coherent answer to the user's query. Instead, the final response only considers the billing response, since that is the last task to be called in sequence.
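The observed behaviour can be reproduced with a toy model in plain Python. This is a simplified sketch of what the traces show, not CrewAI's internal implementation: every task runs in list order and each output replaces the previous one, so the last task always wins. The stub handlers are hypothetical stand-ins for the real agents.

```python
def run_sequential(tasks, ticket):
    """Toy model: run every task in list order and keep only the last output."""
    final_output = None
    for name, handler in tasks:
        final_output = handler(ticket)  # each result replaces the previous one
    return final_output


# Stub specialists (hypothetical) standing in for the real agents.
tasks = [
    ("categorize_tickets", lambda t: "Technical"),
    ("resolve_technical_issues", lambda t: f"[technical answer] {t}"),
    ("resolve_billing_issues", lambda t: f"[billing answer] {t}"),
]

# The triage result is ignored, so a technical ticket ends with a billing answer.
print(run_sequential(tasks, "Why is my laptop overheating?"))
# prints: [billing answer] Why is my laptop overheating?
```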

Latency and Usage: As can be seen from the above image, the Crew execution took almost 38 secs and spent 15,759 tokens. The final output is only about 200 tokens. The rest of the tokens were spent on all the thinking, agent calling, generating intermediate responses and so on, all to generate an unsatisfactory response at the end. The performance can be classified as "Poor".
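To put those numbers in proportion, here is the arithmetic as a short sketch. The 200-token figure is the approximate size of the final output quoted above, so the resulting percentage is approximate as well.

```python
total_tokens = 15759          # tokens reported for the whole crew run
final_output_tokens = 200     # approximate size of the final answer

overhead_tokens = total_tokens - final_output_tokens
final_share = final_output_tokens / total_tokens

print(f"Overhead tokens: {overhead_tokens}")        # prints: Overhead tokens: 15559
print(f"Final answer share: {final_share:.1%}")     # prints: Final answer share: 1.3%
```

In other words, roughly 99% of the token spend went into intermediate steps rather than the answer the user sees.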
Evaluation of this approach
- Quality of orchestration: Poor
- Quality of final output: Poor
- Explainability: Poor
- Latency and Usage: Poor
But perhaps the above result is due to the fact that we relied on CrewAI's built-in manager, which did not have our custom instructions. Therefore, in our next approach we replace the CrewAI automatic manager with our custom manager agent, which has detailed instructions on what to do in case of Technical, Billing or Both tickets.
2. Using Custom Manager Agent
Our Customer Support Manager is defined with the following very specific instructions. Note that this requires some experimentation to get working, and a generic manager prompt such as the one in the CrewAI documentation will give the same erroneous results as the built-in manager agent above.
role="Customer Support Manager",
goal="Oversee the support team to ensure timely and effective resolution of customer inquiries. Use the tool to categorize the user query first, then decide the next steps. Synthesize responses from different agents if needed to provide a comprehensive answer to the customer.",
backstory=( """
You do not try to find an answer to the user ticket {ticket} yourself.
You delegate tasks to coworkers based on the following logic:
Note the category of the ticket first by using the triage agent.
If the ticket is categorized as 'Both', always assign it first to the Technical Support Specialist, then to the Billing Support Specialist, then print the final combined response. Ensure that the final response answers both technical and billing issues raised in the ticket based on the responses from both Technical and Billing Support Specialists.
ELSE
If the ticket is categorized as 'Technical', assign it to the Technical Support Specialist, else skip this step.
Before proceeding further, analyse the ticket category. If it is 'Technical', print the final response. Terminate further actions.
ELSE
If the ticket is categorized as 'Billing', assign it to the Billing Support Specialist.
Finally, compile and present the final response to the customer based on the outputs from the assigned agents.
"""
And in the Crew definition, we use the custom manager instead of the built-in one:
crew_q = Crew(
    agents=[triage_agent, technical_support_agent, billing_support_agent],
    tasks=[categorize_tickets, resolve_technical_issues, resolve_billing_issues],
    # manager_llm=llm_a,
    manager_agent=supervisor,
    process=Process.hierarchical,
    verbose=verbose,
)
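The branching that the manager's backstory describes in natural language can be sketched as ordinary control flow, which makes it easier to check the traces against what we asked for. This is only an illustration of the intended logic under stated assumptions: `manage_ticket` and the stub callables are hypothetical, and in CrewAI the manager LLM interprets the prompt rather than executing code like this.

```python
def manage_ticket(ticket, triage, technical, billing):
    """Route a ticket the way the custom manager prompt asks for.

    `triage`, `technical` and `billing` are callables taking the ticket text
    and returning a string; they stand in for the three worker agents.
    """
    category = triage(ticket).strip().lower()
    if category == "both":
        # Technical first, then billing, then one combined response.
        return technical(ticket) + "\n\n" + billing(ticket)
    if category == "technical":
        return technical(ticket)  # stop here; billing is never called
    if category == "billing":
        return billing(ticket)
    raise ValueError(f"Unexpected triage category: {category!r}")


# Hypothetical stubs for a dry run without any LLM calls.
def stub_triage(ticket):
    text = ticket.lower()
    tech = "overheating" in text
    bill = "charged" in text
    return "Both" if tech and bill else "Technical" if tech else "Billing"


def stub_technical(ticket):
    return "[technical answer]"


def stub_billing(ticket):
    return "[billing answer]"


print(manage_ticket("Why is my laptop overheating?",
                    stub_triage, stub_technical, stub_billing))
# prints: [technical answer]
```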
Let's repeat the test cases.
Input query: Why is my laptop overheating?
The trace is the following:


The most important observation is that now, for this technical query, the flow did not go to the Billing support specialist agent. The manager correctly followed instructions, classified the query as technical and stopped execution once the Technical Support Specialist had generated its response. From the response preview displayed, it is evident that it is a good response to the user query. Also, the latency is 24 secs and token usage is 10k.
Input query: Why was I charged twice last month?
The trace is as follows:


As can be seen, the manager correctly skipped executing the Technical Support Specialist, although that task comes before the Billing agent's task in the Crew definition. Instead, the response generated is of good quality and comes from the Billing Support Specialist only. Latency is 16 secs and token usage only 7,700.
Input query: My laptop is overheating and also, I was charged twice last month?
The trace shows the Manager executed both Technical and Billing support agents and provided a combined response.

The response preview in the figure above does not show the full response, which is as follows, and combines responses from both support agents. Latency is 38 secs and token usage is 20k, which is commensurate with the multi-agent orchestration and the detailed response generated.
Dear Customer,
Thank you for reaching out to us regarding the issues you're experiencing. We sincerely apologize for any inconvenience caused. Below are the detailed solutions to address your concerns:
**1. Laptop Overheating Issue:**
- **Check for Proper Ventilation**: Ensure your laptop is placed on a hard, flat surface to allow proper airflow. Avoid using it on soft surfaces like beds or couches that can block the vents. Consider using a laptop cooling pad or stand with built-in fans to improve airflow.
- **Clean the Laptop's Vents and Fans**: Dust and debris can accumulate in the vents and fans, restricting airflow. Power off the laptop, unplug it, and use a can of compressed air to gently blow out dust from the vents. If you are comfortable, you can clean the internal fans and components more thoroughly, or take the laptop to a professional technician for internal cleaning.
- **Monitor Running Applications and Processes**: Open the Task Manager (Windows: Ctrl + Shift + Esc, macOS: Activity Monitor) and check for processes consuming high CPU or GPU usage. Close unnecessary applications or processes to reduce the load on the system.
- **Update Drivers and Software**: Update your operating system, drivers (especially graphics drivers), and any other critical software to the latest versions.
- **Check for Malware or Viruses**: Run a full system scan using a reputable antivirus program to detect and remove any malware.
- **Adjust Power Settings**: Adjust your power settings to "Balanced" or "Power Saver" mode (Windows: Control Panel > Power Options, macOS: System Preferences > Energy Saver).
- **Inspect the Laptop's Hardware**: If the laptop is still overheating, there may be an issue with the hardware, such as a failing fan or thermal paste that needs replacement. Consult a professional technician to inspect and replace the thermal paste or faulty hardware components if necessary.
- **Environmental Factors**: Operate the laptop in a cool, well-ventilated environment. Avoid using the laptop in direct sunlight or near heat sources.
- **Consider Upgrading Components**: If the laptop is older, consider upgrading components such as RAM or switching to an SSD to reduce the strain on the system and help with heat management.
- **Monitor Temperature Levels**: Install a temperature monitoring tool (e.g., HWMonitor, Core Temp, or Macs Fan Control) to keep track of the CPU and GPU temperatures. This can help identify if the laptop is consistently running at high temperatures and provide insights into when the overheating occurs.
If the problem persists after trying all the above solutions, please contact the laptop manufacturer's support team or a certified repair technician for further assistance.
**2. Duplicate Billing Issue:**
- **Verification of Account Details**: I reviewed your account information to ensure accuracy and confirmed that the account is active and matches the reported issue.
- **Billing Records Check**: Upon examining the billing records for last month, I identified two identical charges processed on the same day. This confirms the duplicate billing issue.
- **Payment Method Details**: I verified the payment method used for the transactions and confirmed that both charges were processed using the same payment method.
- **Resolution**:
  - Since the duplicate charge was confirmed, I've initiated a refund for the second transaction. The refund will be processed back to the original payment method within 5-7 business days.
  - Additionally, I've applied a credit to your account as a gesture of goodwill for the inconvenience caused.
- **Customer Communication**: You'll receive an email or phone call from us to inform you of the resolution, including the refund timeline and the applied credit.
We sincerely apologize for the inconvenience caused by these issues. Please let us know if you need further assistance or have any additional questions. We're here to help.
And our final, more nuanced query is as follows:
Input query: My invoice amount is incorrect after system glitch?
The trace shows that the query was correctly categorized as 'Both', requiring responses from the Technical and Billing agents, and a combined response was provided. Latency is 24 secs and tokens used are 12k.

The full response is included here:
Thank you for bringing this issue to our attention. Based on the technical investigation, here are the steps we're taking to resolve the system glitch that caused the incorrect invoice amount:
1. We've verified the details of the incorrect invoice amount and identified the issue in the system.
2. Our technical team has analyzed the system logs and reviewed the invoice calculation logic to pinpoint the root cause of the error.
3. The necessary fixes have been implemented to address the glitch, and the system has been thoroughly tested to ensure the issue is resolved.
4. A corrected invoice has been generated, and we have verified its accuracy.
Additionally, here's the complete resolution for the billing issue related to the incorrect invoice amount caused by the system glitch:
1. **Corrected Invoice Verification**: The technical team has resolved the system glitch and generated a corrected invoice. The corrected invoice has been personally verified to ensure its accuracy, including cross-checking the amounts, dates, and any applicable charges or credits.
2. **Customer Communication**: We'll promptly notify you about the resolution of the issue. This communication includes:
- An apology for the inconvenience caused by the incorrect invoice.
- Confirmation that the system glitch has been resolved.
- Assurance that the corrected invoice has been thoroughly reviewed for accuracy.
- A copy of the corrected invoice for your records.
3. **Additional Steps Taken**: To prevent similar issues in the future, the technical team has implemented measures to ensure system stability and accuracy in invoice generation.
4. **Account Adjustment (if applicable)**: If the incorrect invoice resulted in any overpayment or underpayment, the necessary adjustments will be made to your account. This includes issuing a refund for any overpayment or providing clear instructions for settling any outstanding balance.
5. **Follow-Up**: We're here to assist you with any further questions or concerns regarding your account or billing. Please don't hesitate to reach out to us, and we will be happy to help. For your convenience, we have provided direct contact information for further communication.
We sincerely apologize for any inconvenience this may have caused and assure you that we're taking steps to prevent similar issues in the future. Thank you for your understanding and patience.
Evaluation of this approach
- Quality of orchestration: Good
- Quality of final output: Good
- Explainability: Good (we understand why it did what it did)
- Latency and Usage: Fair (commensurate with the complexity of the output)
Takeaway
In summary, the hierarchical manager-worker pattern in CrewAI does not function as documented. The core orchestration logic is weak; instead of allowing the manager to selectively delegate tasks, CrewAI executes all tasks sequentially, causing incorrect agent invocation, overwritten outputs, and inflated latency and token usage. The failure comes down to the framework's internal routing: hierarchical mode does not enforce conditional branching or true delegation, so the final response is effectively determined by whichever task happens to run last. The fix is to introduce a custom manager agent with explicit, step-wise instructions: it uses the triage result, conditionally calls only the required agents, synthesizes their outputs, and terminates execution at the right point, restoring correct routing, improving output quality, and keeping token usage in line with what each query actually needs.
Conclusion
CrewAI, in the spirit of keeping the LLM at the center of orchestration, depends on it for most of the heavy lifting, using user prompts combined with detailed scaffolding prompts embedded in the framework. Unlike LangGraph and AutoGen, this approach sacrifices determinism for developer-friendliness, and it sometimes results in unexpected behavior for important features such as the manager-worker pattern, which is crucial for many real-life use cases. This article attempts to demonstrate a pathway for achieving the desired orchestration for this pattern using careful prompting. In future articles, I intend to explore more features of CrewAI, LangGraph and others for their applicability in practical use cases.
You can use CrewAI to design an interactive conversational assistant on a document store and further make the responses truly multimodal. Refer to my articles on GraphRAG Design and Multimodal RAG.
Connect with me and share your comments at www.linkedin.com/in/partha-sarkar-lets-talk-AI
All images in this article were drawn by me or generated using Copilot or Langfuse. Code shared is written by me.
