# How to Use CrewAI for Open Source: What Actually Works (and What Doesn't)
I spent three days trying to get CrewAI to automate my open source project's issue triage. The first 48 hours were a disaster. The documentation promised "autonomous AI agents working together" but delivered cryptic errors and agents that couldn't agree on what day it was. Here's what I learned after breaking things enough times to find the actual working patterns.
## The Real Problem with Open Source Automation
Managing an open source project means drowning in repetitive tasks: triaging issues, reviewing PRs, updating documentation, and answering the same questions over and over. You *could* hire a team, but you're probably broke like me. CrewAI promises to let you build a team of AI agents that collaborate. The promise is seductive. The reality is more nuanced.
## What CrewAI Actually Does Well
After testing across 5 different open source repositories, I found CrewAI shines in three specific areas:
1. **Structured workflows** where each step depends on the previous one
2. **Tasks that require domain-specific knowledge** (like code review)
3. **Scenarios where you need multiple perspectives** (like triage + testing)
It fails spectacularly at anything requiring real-time collaboration or complex state management.
## Setting Up Your First Crew
Let me walk you through the setup that finally worked for me. I'm using Python 3.11 and CrewAI 0.30.0.
```bash
pip install crewai==0.30.0
```
**Important:** Version matters. The 0.40.x branch broke my entire setup. Stick with 0.30.0 for now.
### The Minimal Viable Crew
Here's the skeleton that actually works:
```python
import os

import requests
from crewai import Agent, Task, Crew, Process
# If this import fails on an older release, try `from crewai_tools import tool`
# (the decorator used to live in the separate crewai-tools package)
from crewai.tools import tool

# Better: export the key in your shell. setdefault keeps an existing value.
os.environ.setdefault("OPENAI_API_KEY", "your-key-here")  # Or use Ollama for free


# Define a simple tool
@tool("Read GitHub Issue")
def read_github_issue(issue_url: str) -> str:
    """Read the content of a GitHub issue from its URL"""
    # Note: fetching the web URL returns the page's raw HTML
    response = requests.get(issue_url, timeout=10)
    return response.text[:2000]  # Truncate to avoid token limits


# Create agents
triage_agent = Agent(
    role="Issue Triage Specialist",
    goal="Categorize and prioritize GitHub issues",
    backstory="Expert at understanding bug reports and feature requests",
    tools=[read_github_issue],
    verbose=True,
    allow_delegation=False,  # Critical: prevents infinite loops
)

response_agent = Agent(
    role="Community Responder",
    goal="Draft helpful responses to issues",
    backstory="Friendly open source maintainer who explains things clearly",
    verbose=True,
    allow_delegation=False,
)

# Define tasks
triage_task = Task(
    description="Read the issue at {issue_url} and categorize it as 'bug', 'feature', or 'question'",
    expected_output="One word: bug, feature, or question",
    agent=triage_agent,
)

response_task = Task(
    description="Based on the triage result, draft a response for the issue",
    expected_output="A helpful response to the issue author",
    agent=response_agent,
    context=[triage_task],  # This is how you chain tasks
)

# Create the crew
crew = Crew(
    agents=[triage_agent, response_agent],
    tasks=[triage_task, response_task],
    process=Process.sequential,  # Agents work one after another
    verbose=True,
)

# Run it; {issue_url} in the task description is filled from `inputs`
result = crew.kickoff(inputs={"issue_url": "https://github.com/example/repo/issues/1"})
print(result)
```
This worked for me on the third try. The first two failed because:
1. I set `allow_delegation=True` and agents started arguing with each other
2. I didn't use `context` to pass results between tasks
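One caveat with the `read_github_issue` tool above: requesting the issue's web URL returns the page's HTML, so the first 2000 characters may be mostly markup. Here is a minimal sketch of one alternative, assuming you fetch the issue as JSON from GitHub's REST API (`/repos/{owner}/{repo}/issues/{number}`) and flatten it to plain text. The helper names `issue_api_url` and `format_issue` are hypothetical, and the sample dict only mimics the `title`, `body`, and `labels` fields of the API response:

```python
from urllib.parse import urlparse


def issue_api_url(issue_url: str) -> str:
    """Map a github.com issue URL to its REST API equivalent."""
    parts = urlparse(issue_url).path.strip("/").split("/")
    # Expected path shape: owner/repo/issues/number
    if len(parts) != 4 or parts[2] != "issues" or not parts[3].isdigit():
        raise ValueError(f"Not a GitHub issue URL: {issue_url}")
    owner, repo, _, number = parts
    return f"https://api.github.com/repos/{owner}/{repo}/issues/{number}"


def format_issue(issue: dict, max_chars: int = 2000) -> str:
    """Flatten the JSON fields an agent needs into readable text."""
    labels = ", ".join(label["name"] for label in issue.get("labels", [])) or "none"
    body = (issue.get("body") or "").strip()
    text = f"Title: {issue.get('title', '')}\nLabels: {labels}\n\n{body}"
    return text[:max_chars]  # Same truncation idea as the original tool


# Hand-written sample data, not a real API response
sample = {"title": "Crash on startup", "body": "Traceback ...", "labels": [{"name": "bug"}]}
print(issue_api_url("https://github.com/example/repo/issues/1"))
print(format_issue(sample))
```

Inside the tool you would fetch `issue_api_url(url)` with `requests.get(..., timeout=10).json()` and pass the result to `format_issue`. Unauthenticated API calls are rate-limited, so a token header is worth adding for batch runs.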
## The Critical Failure Points I Discovered
### 1. Agent Memory Is a Lie
CrewAI's "long-term memory" feature sounds great, but it's a memory hog. After processing 10 issues, my agents started hallucinating previous conversations. My fix was to turn memory off entirely rather than try to reset it between runs:
```python
crew = Crew(
    agents=[...],
    tasks=[...],
    memory=False,  # Turn this off unless you really need it
    cache=True,  # But keep caching on for speed
)
```
### 2. Tool Output Formatting Matters
My first tools returned raw JSON. The agents couldn't parse it. I learned to format tool outputs as plain text:
```python
@tool("Search Codebase")
def search_codebase(query: str) -> str:
    """Search for code patterns in the repository"""
    results = grep_code(query)  # Your actual search logic
    # Don't return JSON. Return readable text.
    lines = [f"- {r['file']}:{r['line']}" for r in results[:5]]
    return f"Found {len(results)} matches:\n" + "\n".join(lines)
```
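`grep_code` above is a placeholder. Here is a minimal sketch of what it could look like, assuming a plain regex search over local `.py` files is enough for your project. The function name matches the placeholder, but the `root` and `pattern` parameters and the returned dict shape (`file`, `line`) are my assumptions, chosen to fit the formatting code:

```python
import re
from pathlib import Path


def grep_code(query: str, root: str = ".", pattern: str = "*.py") -> list:
    """Return one {'file', 'line'} dict per line matching the regex `query`."""
    regex = re.compile(query)
    matches = []
    for path in sorted(Path(root).rglob(pattern)):
        try:
            lines = path.read_text(encoding="utf-8").splitlines()
        except (UnicodeDecodeError, OSError):
            continue  # Skip binary or unreadable entries
        for number, text in enumerate(lines, start=1):
            if regex.search(text):
                matches.append({"file": str(path), "line": number})
    return matches
```

Because the query is compiled as a regular expression, wrap it in `re.escape` first if the agent is likely to pass literal strings containing characters like `(` or `.`.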
### 3. The Token Budget Trap
Each agent call costs tokens. My first crew processed 50 issues and cost $12 in API calls. Here's how I cut that to $2:
```python
Agent(
    role="...",
    goal="...",
    backstory="...",
    # Cap how many reasoning/tool-use iterations the agent may take
    max_iter=3,  # Default was 25 in the version I tested! Way too many
    max_execution_time=60,  # Kill runaway agents (seconds)
    # Use smaller models for simple tasks
    llm="gpt-3.5-turbo",  # Not gpt-4 for routine tasks
)
```
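To put those numbers in per-issue terms, here is a quick back-of-the-envelope calculation using only the figures quoted above ($12 and $2 for 50 issues). The `per_issue` helper is just for illustration:

```python
def per_issue(total_usd: float, issues: int) -> float:
    """Average API spend per processed issue."""
    return total_usd / issues


before = per_issue(12.0, 50)  # First crew
after = per_issue(2.0, 50)    # After capping iterations and switching models
savings = 1 - after / before

print(f"before: ${before:.2f}/issue, after: ${after:.2f}/issue, saved: {savings:.0%}")
# → before: $0.24/issue, after: $0.04/issue, saved: 83%
```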
## Real-World Pattern: Automated PR Review
Here's the setup I actually use in production. It reviews pull requests and catches common issues:
```python
from crewai import Agent, Task, Crew, Process
from crewai.tools import tool


@tool("Read PR Diff")
def read_pr_diff(pr_url: str) -> str:
    """Get the diff of a pull request"""
    # Your GitHub API logic here
    raise NotImplementedError("fetch and return the diff text for pr_url")


@tool("Check Coding Standards")
def check_coding_standards(code: str, language: str) -> str:
    """Run linters and style checkers"""
    # Your linting logic
    raise NotImplementedError("run your linters and return the violations as text")


# Specialized agents
style_agent = Agent(
    role="Style Enforcer",
    goal="Ensure code follows project conventions",
    backstory="Strict about PEP8 and project-specific rules",
    tools=[read_pr_diff, check_coding_standards],
    max_iter=2,
)

logic_agent = Agent(
    role="Logic Reviewer",
    goal="Find logical errors and edge cases",
    backstory="Sees bugs others miss",
    tools=[read_pr_diff],
    max_iter=3,
)

# Defined by name so it can be registered with the crew below
summary_agent = Agent(role="Review Summarizer", goal="...", backstory="...")

# Independent reviews: neither task depends on the other
review_style = Task(
    description="Review the PR diff at {pr_url} for style violations",
    expected_output="List of style issues found",
    agent=style_agent,
)

review_logic = Task(
    description="Review the PR diff at {pr_url} for logical errors",
    expected_output="List of potential bugs found",
    agent=logic_agent,
)

# Sequential task that combines results
summarize = Task(
    description="Combine style and logic reviews into a final PR comment",
    expected_output="A complete PR review comment",
    agent=summary_agent,
    context=[review_style, review_logic],
)

crew = Crew(
    agents=[style_agent, logic_agent, summary_agent],
    tasks=[review_style, review_logic, summarize],
    # A manager LLM assigns and reviews the tasks; this is delegation,
    # not true parallelism
    process=Process.hierarchical,
    manager_llm="gpt-4",  # Use smarter model for coordination
)

result = crew.kickoff(inputs={"pr_url": "https://github.com/example/repo/pull/1"})
```
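The body of `read_pr_diff` is only a placeholder above. One way to fill it, sketched under the assumption that you go through GitHub's REST API: requesting a pull request with the `application/vnd.github.diff` media type returns the diff as plain text instead of JSON. The helper name `build_diff_request` and the optional `token` parameter are hypothetical:

```python
import urllib.request
from typing import Optional
from urllib.parse import urlparse


def build_diff_request(pr_url: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a request for a pull request's diff."""
    parts = urlparse(pr_url).path.strip("/").split("/")
    # Expected path shape: owner/repo/pull/number
    if len(parts) != 4 or parts[2] != "pull" or not parts[3].isdigit():
        raise ValueError(f"Not a GitHub pull request URL: {pr_url}")
    owner, repo, _, number = parts
    api_url = f"https://api.github.com/repos/{owner}/{repo}/pulls/{number}"
    # This media type asks GitHub for the diff itself rather than JSON metadata
    headers = {"Accept": "application/vnd.github.diff"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(api_url, headers=headers)


request = build_diff_request("https://github.com/example/repo/pull/1")
print(request.full_url)
```

Sending it is then a matter of `urllib.request.urlopen(request, timeout=10).read().decode()`. Truncate the result the same way the issue tool does, since large diffs eat the token budget quickly.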
## What I'd Do Differently
If I were starting over:
1. **Use Ollama first** - Test your agents locally with `ollama run llama3.1:70b` before spending money on GPT-4
2. **Start with 2 agents max** - More agents = more failure modes
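3. **Hardcode expected outputs** - Treat `expected_output` as a contract, not just documentation: make it strict (a single word from a fixed set, for example) and check the result in your own code, because as far as I can tell it only shapes the prompt
4. **Add human-in-the-loop** - Setting `human_input=True` on a task lets you give feedback on an agent's answer, but nothing gates what your own code does with the result afterward. Build that gate yourself: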
```python
def human_approve(response):
    print(f"Agent suggests: {response}")
    return input("Approve? (y/n): ").strip().lower() == 'y'


# Use this before critical actions
agent_response = str(result)  # `result` is whatever crew.kickoff() returned
if human_approve(agent_response):
    # Proceed with action
    pass
```
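For point 3, one way to make `expected_output` enforceable is to validate the final text yourself before acting on it. Here is a minimal sketch for the triage task's one-word output. `parse_triage_label` is a hypothetical helper, and it assumes you pass it the crew result converted with `str()`:

```python
ALLOWED_LABELS = {"bug", "feature", "question"}


def parse_triage_label(raw: str) -> str:
    """Normalize the model's answer and reject anything off-contract."""
    # Models often add whitespace, quotes, markdown emphasis, or a trailing period
    label = raw.strip().strip(".'\"`*").lower()
    if label not in ALLOWED_LABELS:
        raise ValueError(f"Unexpected triage output: {raw!r}")
    return label


print(parse_triage_label(" Bug.\n"))  # → bug
```

Raising on anything outside the allowed set means a chatty or off-topic answer stops the pipeline instead of silently producing a mislabeled issue.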
## The Honest Bottom Line
CrewAI is powerful but fragile. It works great for:
- **Batch processing** existing issues/PRs
- **Generating first drafts** of responses
- **Running code quality checks** automatically
It fails at:
- **Real-time collaboration** - Agents can't work simultaneously on shared state
- **Complex decision trees** - In my tests, crews with more than 5 sequential tasks broke down
- **Tasks requiring external API calls** - Tool integration is brittle
Your next step: Clone my starter template at `github.com/yourname/crewai-oss-starter`. Replace the tool functions with your project's actual APIs. Run it against 3 real issues. Fix the inevitable errors. Then scale from there.
Remember: CrewAI is a tool for *augmenting* your open source work, not replacing it. The goal is to handle the boring stuff so you can focus on the actual community building.