# Getting Started with CrewAI: A Practical Guide
I spent three hours debugging a CrewAI agent that kept hallucinating API endpoints, and it taught me something crucial: **CrewAI’s power is also its biggest trap**. If you’re coming from LangChain or just starting with multi-agent systems, you’ll quickly realize that orchestrating multiple AI agents isn’t just about chaining prompts—it’s about managing dependencies, context, and failure modes. This guide walks through what I learned building a real-world CrewAI system for automated blog content generation, including the exact code that broke and how I fixed it.
## The Pain Point: Why Not Just Use a Single Agent?
You’ve probably tried generating a blog post with a single LLM call. It works—until you need fact-checking, SEO optimization, and formatting. The single agent forgets context, contradicts itself, or generates nonsense. CrewAI solves this by letting you define **specialized agents** that pass tasks to each other, like a human team. But here’s the catch: CrewAI’s simplicity hides complexity. If you don’t design your agents’ roles and tasks carefully, you’ll get circular dependencies, infinite loops, or agents that refuse to hand off work.
## Step 1: Install CrewAI (and the Gotcha)
```bash
pip install crewai
```
I assumed this would pull everything I needed. **Wrong.** CrewAI depends on `langchain` and `openai`—but not the latest versions. If you’re using Python 3.12, you’ll hit a `pydantic` conflict. Here’s the exact fix I used:
```bash
pip install crewai langchain==0.1.0 openai==1.6.1 pydantic==2.5.0
```
Without pinning these, you’ll get `ImportError: cannot import name 'BaseModel' from 'pydantic'`. I wasted 30 minutes on this.
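Before going further, it can help to confirm that the pins actually took effect in the environment you are running. The following is a minimal sketch using only the standard library; `check_pins` and `EXPECTED` are hypothetical names introduced here, and the version strings are simply copied from the install command above.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the install command above.
EXPECTED = {
    "langchain": "0.1.0",
    "openai": "1.6.1",
    "pydantic": "2.5.0",
}


def check_pins(expected, get_version=version):
    """Return a list of readable mismatches; an empty list means every pin matches."""
    problems = []
    for package, wanted in expected.items():
        try:
            found = get_version(package)
        except PackageNotFoundError:
            problems.append(f"{package}: not installed (wanted {wanted})")
            continue
        if found != wanted:
            problems.append(f"{package}: found {found}, wanted {wanted}")
    return problems
```

Calling `check_pins(EXPECTED)` at the top of a script and printing the result gives a quick warning when another package has silently upgraded one of these dependencies.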
## Step 2: Define Your Agents (Be Specific or Suffer)
Agents in CrewAI are instances of the `Agent` class, configured with a `role`, `goal`, and `backstory`. The backstory is easy to treat as filler, but it is critical: it controls the agent’s tone and behavior. Here’s what I started with:
```python
from crewai import Agent

# Agent is instantiated directly; role, goal, and backstory are keyword arguments.
researcher = Agent(
    role="Researcher",
    goal="Find recent news about AI in healthcare",
    backstory="You are a meticulous researcher who verifies sources.",
)
```
This works, but it’s too vague. The agent will generate generic responses. After testing, I learned to add **constraints**:
```python
researcher = Agent(
    role="Senior Healthcare AI Researcher",
    goal="Find 3 recent (2024) peer-reviewed papers on AI in oncology",
    backstory="""You have 10 years of experience in medical AI.
You always cite specific PMIDs or DOI links.
You never fabricate sources.""",
)
```
Notice the **explicit instruction to cite sources** and the **year constraint**. Without these, my agent invented fake papers. CrewAI doesn’t validate facts—it trusts the LLM.
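Because nothing in the pipeline checks the citations, a cheap format check on the researcher's output can at least catch answers that contain no identifiers at all. The sketch below is an assumption-laden illustration: `extract_identifiers` and `has_enough_citations` are hypothetical helpers, the regular expressions only approximate what DOIs and PMIDs look like, and a match says nothing about whether the paper really exists.

```python
import re

# Rough DOI shape: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")
# Rough PMID shape: the label "PMID" followed by 1-8 digits.
PMID_RE = re.compile(r"\bPMID:?\s*(\d{1,8})\b", re.IGNORECASE)


def extract_identifiers(text):
    """Pull DOI-like and PMID-like strings out of free text."""
    # Trailing punctuation usually belongs to the sentence, not the DOI.
    dois = [d.rstrip(".,;)") for d in DOI_RE.findall(text)]
    pmids = PMID_RE.findall(text)
    return {"dois": dois, "pmids": pmids}


def has_enough_citations(text, minimum=3):
    """True if the text contains at least `minimum` identifiers in total."""
    ids = extract_identifiers(text)
    return len(ids["dois"]) + len(ids["pmids"]) >= minimum
```

This is a format check only. A paper cited with both a DOI and a PMID counts twice, and a fabricated identifier in the right shape still passes, so treat it as a first filter rather than verification.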
## Step 3: Create Tasks That Chain Correctly
Tasks are where most people fail. A task has a `description`, `expected_output`, and `agent`. The trick is **making tasks dependent on previous outputs**. I built a two-agent pipeline:
```python
from crewai import Task

# `researcher` is the agent from Step 2; `writer` is a second Agent defined the same way.
research_task = Task(
    description="Find 3 recent AI healthcare papers. Output a list with titles and links.",
    expected_output="A bullet list of 3 papers with title, year, and URL",
    agent=researcher,
)

writing_task = Task(
    description="""Based on the research output, write a 500-word blog post
summarizing the findings. Include citations to the papers provided.""",
    expected_output="A markdown blog post with headings and cited sources",
    agent=writer,
)
```
Here’s the bug: `writing_task` doesn’t explicitly reference `research_task`’s output. CrewAI passes context **implicitly** through the agent’s memory, but it’s unreliable. I fixed it by using **task dependencies**:
```python
writing_task = Task(
    description="""Based on the research output from the previous task,
write a 500-word blog post summarizing the findings.
Include citations to the papers provided.""",
    expected_output="A markdown blog post with headings and cited sources",
    agent=writer,
    context=[research_task],  # Explicit dependency
)
```
The `context` parameter is a list of tasks whose outputs are passed to the agent along with the description, so the description needs no placeholder for them. Without it, my writer agent hallucinated its own research.
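To make the idea concrete, here is a conceptual sketch in plain Python of what passing context amounts to: the outputs of upstream tasks end up in front of the model together with the task description. This is not CrewAI's internal code; `build_prompt` and the "Context from earlier tasks" label are hypothetical names used only for illustration.

```python
def build_prompt(description, context_outputs):
    """Append upstream task outputs to a task description (illustration only)."""
    if not context_outputs:
        # No upstream tasks: the model sees the description alone.
        return description
    context_block = "\n\n".join(context_outputs)
    return f"{description}\n\nContext from earlier tasks:\n{context_block}"
```

The point of the sketch is the empty case: when no context is supplied, the writer sees only its instructions, which is exactly the situation where it starts inventing research.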
## Step 4: Run the Crew (and Handle Failures)
Now you create a `Crew` and run it:
```python
from crewai import Crew

crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    verbose=True,  # Essential for debugging
)

result = crew.kickoff()
print(result)
```
When I first ran this, it worked—but took 45 seconds and cost $0.12 in API calls. The verbose output showed the researcher agent making 3 separate API calls to find papers, then the writer agent making 2 more calls to write the post. CrewAI’s default is **sequential execution**, meaning each task waits for the previous one to finish.
**The flaw:** If the first task fails (e.g., the API returns an error), the entire crew crashes. CrewAI doesn’t have built-in retry logic. I added a simple retry wrapper:
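```python
import time


def safe_kickoff(crew, max_retries=3):
    """Run crew.kickoff(), retrying with exponential backoff on any exception."""
    for attempt in range(max_retries):
        try:
            return crew.kickoff()
        except Exception as e:
            print(f"Attempt {attempt + 1} failed: {e}")
            if attempt == max_retries - 1:
                # Out of retries: surface the last error instead of sleeping again.
                raise RuntimeError(f"Crew failed after {max_retries} attempts") from e
            time.sleep(2 ** attempt)  # Exponential backoff: 1s, 2s, 4s, ...
```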
## Step 5: Real-World Optimization Tips
After a week of testing, here’s what actually improved reliability:
1. **Limit agent memory**: By default, agents remember the entire conversation. For long tasks, this blows context windows. Set `memory=False` on agents that don’t need history:
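```python
# ROLE, GOAL, and BACKSTORY stand for the strings from Step 2.
researcher = Agent(role=ROLE, goal=GOAL, backstory=BACKSTORY, memory=False)
```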
2. **Use `allow_delegation=False`**: By default, agents can delegate tasks to other agents. This creates loops. Unless you’re building a complex hierarchy, disable it:
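```python
# ROLE, GOAL, and BACKSTORY stand for the strings from Step 2.
researcher = Agent(role=ROLE, goal=GOAL, backstory=BACKSTORY, allow_delegation=False)
```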
3. **Cache results**: CrewAI caches tool results by default, so a repeated tool call with the same arguments reuses the earlier result. This is great for debugging but dangerous in production, where you might serve stale data. Disable caching with:
```python
crew = Crew(agents=[...], tasks=[...], cache=False)
```
4. **Monitor token usage**: CrewAI doesn’t expose token counts. I added a simple callback:
```python
from langchain.callbacks import get_openai_callback

# OpenAI calls made through LangChain inside this block are counted.
with get_openai_callback() as cb:
    result = crew.kickoff()

print(f"Total tokens: {cb.total_tokens}, Cost: ${cb.total_cost:.4f}")
```
## The Biggest Limitation CrewAI Doesn’t Tell You
After building a 5-agent system for content generation, I hit a wall: **CrewAI has no built-in error recovery for agent failures**. If your researcher agent returns gibberish, the writer agent will still try to use it. The only fix is to validate outputs in the task description:
```python
research_task = Task(
    description="""Find 3 papers. If you cannot find 3, output 'NO_RESULTS'
and explain why. Do not fabricate papers.""",
    ...
)
```
Then in the writing task, check for this sentinel value. It’s hacky, but it works.
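If you would rather not rely on the writer agent to notice the sentinel, the same check can be done in ordinary Python. The sketch below assumes you run the research step on its own and inspect its output before starting the writer; `check_research_output` and `ResearchFailed` are hypothetical names, not part of CrewAI.

```python
SENTINEL = "NO_RESULTS"


class ResearchFailed(Exception):
    """Raised when the research step reports that it found nothing usable."""


def check_research_output(raw_output):
    """Return the cleaned research text, or raise if it is empty or flagged."""
    text = str(raw_output).strip()
    if not text:
        raise ResearchFailed("research output was empty")
    if SENTINEL in text:
        # Keep whatever explanation the agent gave after the sentinel.
        reason = text.split(SENTINEL, 1)[1].strip(" :.-\n") or "no reason given"
        raise ResearchFailed(f"researcher returned {SENTINEL}: {reason}")
    return text
```

Raising an exception here means a failed research step stops the pipeline, and it pairs naturally with a retry wrapper like the one in Step 4.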
## Next Step: Build a Real Project
Don’t start with theory. Clone my broken example from [github.com/your-repo/crewai-blog-generator](https://github.com/your-repo/crewai-blog-generator) and fix the intentional bugs. The `README` lists three things I deliberately broke:
1. Missing `context` parameter on the writer task
2. No retry logic for API failures
3. Agents with `allow_delegation=True` causing infinite loops
Fix these, then extend the system to add a `FactChecker` agent that validates the writer’s citations. You’ll learn more in 30 minutes of debugging than in an hour of reading the docs. And when you inevitably break something, remember: the verbose output is your best friend. Set `verbose=True` and watch every decision your agents make.