# Getting Started with CrewAI: A Practical Guide

I spent three hours debugging a CrewAI agent that kept hallucinating API endpoints, and it taught me something crucial: **CrewAI’s power is also its biggest trap**. If you’re coming from LangChain or just starting with multi-agent systems, you’ll quickly realize that orchestrating multiple AI agents isn’t just about chaining prompts—it’s about managing dependencies, context, and failure modes. This guide walks through what I learned building a real-world CrewAI system for automated blog content generation, including the exact code that broke and how I fixed it.

## The Pain Point: Why Not Just Use a Single Agent?

You’ve probably tried generating a blog post with a single LLM call. It works—until you need fact-checking, SEO optimization, and formatting. The single agent forgets context, contradicts itself, or generates nonsense. CrewAI solves this by letting you define **specialized agents** that pass tasks to each other, like a human team. But here’s the catch: CrewAI’s simplicity hides complexity. If you don’t design your agents’ roles and tasks carefully, you’ll get circular dependencies, infinite loops, or agents that refuse to hand off work.

## Step 1: Install CrewAI (and the Gotcha)

```bash
pip install crewai
```

I assumed this would pull everything I needed. **Wrong.** At the time, CrewAI depended on `langchain` and `openai`, but not on their latest versions, and on Python 3.12 I hit a `pydantic` conflict. Here’s the exact fix I used:

```bash
pip install crewai langchain==0.1.0 openai==1.6.1 pydantic==2.5.0
```

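Without pinning these, I got `ImportError: cannot import name 'BaseModel' from 'pydantic'`. I wasted 30 minutes on this. These pins are the versions that worked for me at the time, so check them against the current CrewAI release before copying them.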
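Before chasing an import error, it helps to confirm what actually got installed. Here’s a minimal sketch using only the standard library. The helper name `installed_versions` is my own, and the package list simply mirrors the pins above:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    report = installed_versions(["crewai", "langchain", "openai", "pydantic"])
    for name, ver in report.items():
        print(f"{name}: {ver or 'not installed'}")
```

If the reported versions differ from what you pinned, you are probably running a different interpreter or virtual environment than the one you installed into.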

## Step 2: Define Your Agents (Be Specific or Suffer)

Agents in CrewAI are instances of the `Agent` class, configured with a `role`, `goal`, and `backstory`. The backstory looks like flavor text, but it is critical: it controls the agent’s tone and behavior. Here’s what I started with:

```python
from crewai import Agent

# Agents are configured by instantiating Agent, not by subclassing it
researcher = Agent(
    role="Researcher",
    goal="Find recent news about AI in healthcare",
    backstory="You are a meticulous researcher who verifies sources.",
)
```

This works, but it’s too vague. The agent will generate generic responses. After testing, I learned to add **constraints**:

```python
researcher = Agent(
    role="Senior Healthcare AI Researcher",
    goal="Find 3 recent (2024) peer-reviewed papers on AI in oncology",
    backstory="""You have 10 years of experience in medical AI.
You always cite specific PMIDs or DOI links.
You never fabricate sources.""",
)
```

Notice the **explicit instruction to cite sources** and the **year constraint**. Without these, my agent invented fake papers. CrewAI doesn’t validate facts—it trusts the LLM.
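Since CrewAI won’t check citations for you, a cheap first line of defense is to confirm that the output at least contains something shaped like a DOI or PMID. This is a rough sketch: the function name `find_citation_ids` and both regular expressions are my own assumptions, and a match only shows that the format is plausible, not that the paper exists.

```python
import re

# Loose DOI shape: "10.", a 4-9 digit registrant code, a slash, then a suffix.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"'<>]+")
# PMIDs are plain integers, so only accept them when labeled "PMID".
PMID_PATTERN = re.compile(r"\bPMID:?\s*(\d{1,8})\b", re.IGNORECASE)


def find_citation_ids(text):
    """Return DOI-like and PMID-like identifiers found in the text."""
    # Strip trailing punctuation that the loose DOI pattern may swallow.
    dois = [match.rstrip(".,;)") for match in DOI_PATTERN.findall(text)]
    pmids = PMID_PATTERN.findall(text)
    return {"dois": dois, "pmids": pmids}


sample = "See https://doi.org/10.1000/xyz123. Also PMID: 12345678."
print(find_citation_ids(sample))
```

An empty result is a strong hint that the agent ignored the citation instruction. A non-empty one still needs to be resolved against the DOI or PubMed lookup before you trust it.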

## Step 3: Create Tasks That Chain Correctly

Tasks are where most people fail. A task has a `description`, `expected_output`, and `agent`. The trick is **making tasks dependent on previous outputs**. I built a two-agent pipeline:

```python
from crewai import Task

# `writer` is a second Agent, defined the same way as `researcher`
research_task = Task(
    description="Find 3 recent AI healthcare papers. Output a list with titles and links.",
    expected_output="A bullet list of 3 papers with title, year, and URL",
    agent=researcher,
)

writing_task = Task(
    description="""Based on the research output, write a 500-word blog post
summarizing the findings. Include citations to the papers provided.""",
    expected_output="A markdown blog post with headings and cited sources",
    agent=writer,
)
```

Here’s the bug: `writing_task` doesn’t explicitly reference `research_task`’s output. In a sequential crew, CrewAI forwards earlier task output to the next task **implicitly**, but in my runs that was unreliable. I fixed it by declaring the **task dependency** explicitly:

```python
writing_task = Task(
    description="""Based on the research output from the previous task,
write a 500-word blog post summarizing the findings.
Include citations to the papers provided.""",
    expected_output="A markdown blog post with headings and cited sources",
    agent=writer,
    context=[research_task],  # Explicit dependency on the research output
)
```

The `context` parameter is a list of tasks whose outputs are passed to the agent alongside the description, so the description doesn’t need a placeholder for them. Without it, my writer agent hallucinated its own research.

## Step 4: Run the Crew (and Handle Failures)

Now you create a `Crew` and run it:

```python
from crewai import Crew

crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    verbose=True,  # Essential for debugging
)

result = crew.kickoff()
print(result)
```

When I first ran this, it worked—but took 45 seconds and cost $0.12 in API calls. The verbose output showed the researcher agent making 3 separate API calls to find papers, then the writer agent making 2 more calls to write the post. CrewAI’s default is **sequential execution**, meaning each task waits for the previous one to finish.

**The flaw:** If the first task fails (e.g., the API returns an error), the entire crew crashes. I didn’t find crew-level retry logic in the version I used, so I added a simple retry wrapper:

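```python
import time

def safe_kickoff(crew, max_retries=3):
    """Run crew.kickoff(), retrying with exponential backoff on failure."""
    last_error = None
    for attempt in range(max_retries):
        try:
            return crew.kickoff()
        except Exception as e:
            last_error = e
            print(f"Attempt {attempt + 1} failed: {e}")
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # Exponential backoff: 1s, 2s, 4s, ...
    # Report the real retry count and keep the last error as the cause
    raise RuntimeError(f"Crew failed after {max_retries} attempts") from last_error
```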
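To check the retry behavior without spending API credits, I find it easier to run the wrapper against a stub. The sketch below is self-contained: `FlakyCrew` is a hypothetical stand-in for a real `Crew`, and `kickoff_with_retry` is a variant of the wrapper with an injectable `sleep` function (my addition) so the check doesn’t actually wait.

```python
import time


class FlakyCrew:
    """Hypothetical stand-in for a Crew: fails a set number of times, then succeeds."""

    def __init__(self, failures_before_success):
        self.failures_left = failures_before_success
        self.calls = 0

    def kickoff(self):
        self.calls += 1
        if self.failures_left > 0:
            self.failures_left -= 1
            raise ConnectionError("simulated API error")
        return "blog post"


def kickoff_with_retry(crew, max_retries=3, sleep=time.sleep):
    """Retry crew.kickoff() with exponential backoff between attempts."""
    last_error = None
    for attempt in range(max_retries):
        try:
            return crew.kickoff()
        except Exception as error:
            last_error = error
            if attempt < max_retries - 1:
                sleep(2 ** attempt)  # 1, 2, 4, ... seconds
    raise RuntimeError(f"Crew failed after {max_retries} attempts") from last_error


delays = []  # Records the requested delays instead of sleeping
crew = FlakyCrew(failures_before_success=2)
result = kickoff_with_retry(crew, sleep=delays.append)
print(result, crew.calls, delays)
```

Passing `delays.append` as the sleep function records the backoff schedule, which makes it easy to see that two failures produce two waits before the third attempt succeeds.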

## Step 5: Real-World Optimization Tips

After a week of testing, here’s what actually improved reliability:

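1. **Limit agent memory**: In the version I used, agents kept the entire conversation by default. For long tasks, this can overflow the context window. Set `memory=False` on agents that don’t need history:

```python
# Same role, goal, and backstory as in Step 2, plus the memory flag
researcher = Agent(role="...", goal="...", backstory="...", memory=False)
```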

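2. **Use `allow_delegation=False`**: In the version I used, agents could delegate tasks to other agents by default. This can create loops. Unless you’re building a complex hierarchy, disable it:

```python
# Same role, goal, and backstory as in Step 2, plus the delegation flag
researcher = Agent(role="...", goal="...", backstory="...", allow_delegation=False)
```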

3. **Cache results**: CrewAI caches results by default (as far as I can tell, tool results rather than raw LLM responses). When the same call repeats, it reuses the earlier result. This is great for debugging but dangerous in production, because you might serve stale data. Disable caching with:

```python
crew = Crew(agents=[...], tasks=[...], cache=False)
```

4. **Monitor token usage**: The version I used didn’t expose token counts, so I wrapped the run in LangChain’s OpenAI callback:

```python
from langchain.callbacks import get_openai_callback

# Everything run inside the block is counted by the callback
with get_openai_callback() as cb:
    result = crew.kickoff()

print(f"Total tokens: {cb.total_tokens}, Cost: ${cb.total_cost:.4f}")
```

## The Biggest Limitation CrewAI Doesn’t Tell You

After building a 5-agent system for content generation, I hit a wall: **CrewAI has no built-in error recovery for agent failures**. If your researcher agent returns gibberish, the writer agent will still try to use it. The fix I settled on is to have the task description demand a sentinel value when the agent can’t do the job:

```python
research_task = Task(
    description="""Find 3 papers. If you cannot find 3, output 'NO_RESULTS'
and explain why. Do not fabricate papers.""",
    ...
)
```

Then in the writing task, check for this sentinel value. It’s hacky, but it works.
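The sentinel check can also live in plain Python, outside the prompt, so a failed research step stops the pipeline instead of relying on the writer agent to notice. Here’s a minimal sketch, assuming you run the research step first and have its output as a string. `research_failed` and `SENTINEL` are names I made up for illustration:

```python
SENTINEL = "NO_RESULTS"


def research_failed(output):
    """True if the research output is empty or carries the NO_RESULTS sentinel."""
    text = (output or "").strip()
    return not text or SENTINEL in text


# Illustrative output only; in practice this comes from the research task
research_output = "NO_RESULTS: no 2024 oncology papers matched the query."

if research_failed(research_output):
    print("Research step failed; skipping the writing task.")
else:
    print("Research looks usable; continue to the writing task.")
```

Treating empty output as a failure is a deliberate choice here: an agent that returns nothing is no more useful to the writer than one that admits it found nothing.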

## Next Step: Build a Real Project

Don’t start with theory. Clone my broken example from [github.com/your-repo/crewai-blog-generator](https://github.com/your-repo/crewai-blog-generator) and fix the intentional bugs. The `README` lists three things I deliberately broke:

1. Missing `context` parameter on the writer task

2. No retry logic for API failures

3. Agents with `allow_delegation=True` causing infinite loops

Fix these, then extend the system to add a `FactChecker` agent that validates the writer’s citations. You’ll learn more in 30 minutes of debugging than in an hour of reading the docs. And when you inevitably break something, remember: the verbose output is your best friend. Set `verbose=True` and watch every decision your agents make.