
# How to Use CrewAI for Open Source: What Actually Works (and What Doesn't)

I spent three days trying to get CrewAI to automate my open source project's issue triage. The first 48 hours were a disaster. The documentation promised "autonomous AI agents working together" but delivered cryptic errors and agents that couldn't agree on what day it was. Here's what I learned after breaking things enough times to find the actual working patterns.

## The Real Problem with Open Source Automation

Managing an open source project means drowning in repetitive tasks: triaging issues, reviewing PRs, updating documentation, and answering the same questions over and over. You *could* hire a team, but you're probably broke like me. CrewAI promises to let you build a team of AI agents that collaborate. The promise is seductive. The reality is more nuanced.

## What CrewAI Actually Does Well

After testing across 5 different open source repositories, I found CrewAI shines in three specific areas:

1. **Structured workflows** where each step depends on the previous one

2. **Tasks that require domain-specific knowledge** (like code review)

3. **Scenarios where you need multiple perspectives** (like triage + testing)

It fails spectacularly at anything requiring real-time collaboration or complex state management.

## Setting Up Your First Crew

Let me walk you through the setup that finally worked for me. I'm using Python 3.11 and CrewAI 0.30.0.

```bash
pip install crewai==0.30.0
```

**Important:** Version matters. The 0.40.x branch broke my entire setup. Stick with 0.30.0 for now.

### The Minimal Viable Crew

Here's the skeleton that actually works:

```python
import os

import requests
from crewai import Agent, Task, Crew, Process
# The `tool` decorator's import path varies by CrewAI version
# (some releases ship it in the separate `crewai_tools` package).
from crewai.tools import tool

# Placeholder key; in real use, export OPENAI_API_KEY in your shell instead
os.environ["OPENAI_API_KEY"] = "your-key-here"  # Or use Ollama for free

# Define a simple tool
@tool("Read GitHub Issue")
def read_github_issue(issue_url: str) -> str:
    """Read the content of a GitHub issue from its URL"""
    response = requests.get(issue_url, timeout=30)
    response.raise_for_status()  # Fail loudly instead of handing the agent an error page
    return response.text[:2000]  # Truncate to avoid token limits

# Create agents
triage_agent = Agent(
    role="Issue Triage Specialist",
    goal="Categorize and prioritize GitHub issues",
    backstory="Expert at understanding bug reports and feature requests",
    tools=[read_github_issue],
    verbose=True,
    allow_delegation=False,  # Critical: prevents infinite loops
)

response_agent = Agent(
    role="Community Responder",
    goal="Draft helpful responses to issues",
    backstory="Friendly open source maintainer who explains things clearly",
    verbose=True,
    allow_delegation=False,
)

# Define tasks
triage_task = Task(
    description="Read the issue at {issue_url} and categorize it as 'bug', 'feature', or 'question'",
    expected_output="One word: bug, feature, or question",
    agent=triage_agent,
)

response_task = Task(
    description="Based on the triage result, draft a response for the issue",
    expected_output="A helpful response to the issue author",
    agent=response_agent,
    context=[triage_task],  # This is how you chain tasks
)

# Create the crew
crew = Crew(
    agents=[triage_agent, response_agent],
    tasks=[triage_task, response_task],
    process=Process.sequential,  # Agents work one after another
    verbose=True,
)

# Run it; the inputs dict fills the {issue_url} placeholder in the task description
result = crew.kickoff(inputs={"issue_url": "https://github.com/example/repo/issues/1"})
print(result)
```

This worked for me on the third try. The first two failed because:

1. I set `allow_delegation=True` and agents started arguing with each other

2. I didn't use `context` to pass results between tasks
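One refinement worth considering for the tool itself: requesting the issue's web page returns HTML, so much of the 2000-character budget can go to markup rather than the report. Below is a minimal sketch of an alternative, assuming the public GitHub REST API endpoint for issues. The helper names `issue_api_url`, `format_issue`, and `fetch_issue_text` are hypothetical and not part of CrewAI; unauthenticated requests are also rate limited, so treat this as a starting point.

```python
import json
import re
import urllib.request


def issue_api_url(issue_url: str) -> str:
    """Map a github.com issue URL to the matching REST API URL."""
    match = re.fullmatch(
        r"https://github\.com/([^/]+)/([^/]+)/issues/(\d+)/?", issue_url
    )
    if match is None:
        raise ValueError(f"Not a GitHub issue URL: {issue_url}")
    owner, repo, number = match.groups()
    return f"https://api.github.com/repos/{owner}/{repo}/issues/{number}"


def format_issue(issue: dict, max_chars: int = 2000) -> str:
    """Render the fields an agent needs as plain text, then truncate."""
    labels = ", ".join(label["name"] for label in issue.get("labels", [])) or "none"
    text = (
        f"Title: {issue.get('title', '')}\n"
        f"Labels: {labels}\n"
        f"Body:\n{issue.get('body') or ''}"
    )
    return text[:max_chars]


def fetch_issue_text(issue_url: str) -> str:
    """Fetch the issue as JSON and return readable text (needs network access)."""
    request = urllib.request.Request(
        issue_api_url(issue_url),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return format_issue(json.load(response))
```

The body of `read_github_issue` could then simply return `fetch_issue_text(issue_url)`, which keeps the truncation but spends it on the title, labels, and report text.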

## The Critical Failure Points I Discovered

### 1. Agent Memory Is a Lie

CrewAI's "long-term memory" feature sounds great but it's a memory hog. After processing 10 issues, my agents started hallucinating previous conversations. Solution: turn memory off so every run starts clean:

```python
crew = Crew(
    agents=[...],
    tasks=[...],
    memory=False,  # Turn this off unless you really need it
    cache=True,  # But keep caching on for speed
)
```

### 2. Tool Output Formatting Matters

My first tools returned raw JSON. The agents couldn't parse it. I learned to format tool outputs as plain text:

```python
@tool("Search Codebase")
def search_codebase(query: str) -> str:
    """Search for code patterns in the repository"""
    results = grep_code(query)  # Your actual search logic
    # Don't return JSON. Return readable text.
    return f"Found {len(results)} matches:\n" + "\n".join(
        f"- {r['file']}:{r['line']}" for r in results[:5]
    )
```
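The `grep_code` helper above is left to the reader. As one possible stand-in, here is a minimal sketch that walks a directory with the standard library and returns the `file`/`line` dictionaries the tool expects. The signature, the `root` and `suffixes` parameters, and the choice to treat `query` as a regular expression are all assumptions for illustration.

```python
import re
from pathlib import Path


def grep_code(query: str, root: str = ".", suffixes: tuple = (".py",)) -> list:
    """Return [{'file': ..., 'line': ...}] for every line matching the regex `query`."""
    pattern = re.compile(query)
    matches = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            lines = path.read_text(encoding="utf-8").splitlines()
        except (UnicodeDecodeError, OSError):
            continue  # Skip binary or unreadable files
        for line_number, line in enumerate(lines, start=1):
            if pattern.search(line):
                matches.append(
                    {"file": str(path.relative_to(root)), "line": line_number}
                )
    return matches
```

On a large repository a dedicated search tool would be faster; the point here is only that the helper hands back structured results and the tool turns them into readable text.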

### 3. The Token Budget Trap

Each agent call costs tokens. My first crew processed 50 issues and cost $12 in API calls. Here's how I cut that to $2:

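```python
Agent(
    role="...",
    goal="...",
    backstory="...",
    # Cap how many reasoning/tool iterations the agent may take
    max_iter=3,  # Default is 25! Way too many
    max_execution_time=60,  # Kill runaway agents (seconds)
    # Use smaller models for simple tasks.
    # Depending on the CrewAI version, `llm` may need to be an LLM object
    # rather than a model-name string.
    llm="gpt-3.5-turbo",  # Not gpt-4 for routine tasks
)
```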
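For scale, $12 across 50 issues is $0.24 per issue, and $2 is $0.04. To see why capping iterations moves the bill so much, here is a back-of-the-envelope sketch. Every input below (calls per issue, tokens per call, price per 1K tokens) is a hypothetical placeholder, not a measured figure or a real price; substitute numbers from your own provider's dashboard.

```python
def estimate_run_cost(
    issues: int,
    llm_calls_per_issue: int,
    tokens_per_call: int,
    usd_per_1k_tokens: float,
) -> float:
    """Rough API cost in USD: total tokens times the per-1K-token price."""
    total_tokens = issues * llm_calls_per_issue * tokens_per_call
    return total_tokens / 1000 * usd_per_1k_tokens


# Hypothetical inputs, chosen only to show how the knobs interact
before = estimate_run_cost(
    issues=50, llm_calls_per_issue=10, tokens_per_call=1500, usd_per_1k_tokens=0.01
)
after = estimate_run_cost(
    issues=50, llm_calls_per_issue=3, tokens_per_call=1500, usd_per_1k_tokens=0.01
)
print(f"before: ${before:.2f}, after: ${after:.2f}")
```

Cost scales linearly with the number of LLM calls, so a lower `max_iter` and a cheaper model multiply together rather than merely adding up.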

## Real-World Pattern: Automated PR Review

Here's the setup I actually use in production. It reviews pull requests and catches common issues:

```python
from crewai import Agent, Task, Crew, Process
# The `tool` decorator's import path varies by CrewAI version
# (some releases ship it in the separate `crewai_tools` package).
from crewai.tools import tool

@tool("Read PR Diff")
def read_pr_diff(pr_url: str) -> str:
    """Get the diff of a pull request"""
    # Your GitHub API logic here; it should produce `diff_text`
    return diff_text

@tool("Check Coding Standards")
def check_coding_standards(code: str, language: str) -> str:
    """Run linters and style checkers"""
    # Your linting logic; it should produce `violations_text`
    return violations_text

# Specialized agents
style_agent = Agent(
    role="Style Enforcer",
    goal="Ensure code follows project conventions",
    backstory="Strict about PEP8 and project-specific rules",
    tools=[read_pr_diff, check_coding_standards],
    max_iter=2,
)

logic_agent = Agent(
    role="Logic Reviewer",
    goal="Find logical errors and edge cases",
    backstory="Sees bugs others miss",
    tools=[read_pr_diff],
    max_iter=3,
)

# Defined once so it can be listed in the crew as well as on the task
summary_agent = Agent(role="Review Summarizer", goal="...", backstory="...")

# Independent review tasks: neither depends on the other's output
review_style = Task(
    description="Review the PR diff for style violations",
    expected_output="List of style issues found",
    agent=style_agent,
)

review_logic = Task(
    description="Review the PR diff for logical errors",
    expected_output="List of potential bugs found",
    agent=logic_agent,
)

# Final task that combines both reviews
summarize = Task(
    description="Combine style and logic reviews into a final PR comment",
    expected_output="A complete PR review comment",
    agent=summary_agent,
    context=[review_style, review_logic],
)

crew = Crew(
    agents=[style_agent, logic_agent, summary_agent],
    tasks=[review_style, review_logic, summarize],
    process=Process.hierarchical,  # A manager LLM delegates and coordinates the tasks
    manager_llm="gpt-4",  # Use smarter model for coordination
)
```
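The body of `read_pr_diff` is a placeholder above. One way it could be filled in, sketched under the assumption that the repository is public and that the GitHub REST API's diff media type (`application/vnd.github.diff`) is acceptable for your use: the helper names `build_diff_request` and `fetch_pr_diff`, and the 8000-character cap, are hypothetical choices for illustration.

```python
import re
import urllib.request


def build_diff_request(pr_url: str) -> urllib.request.Request:
    """Turn a github.com pull request URL into an API request for its diff."""
    match = re.fullmatch(
        r"https://github\.com/([^/]+)/([^/]+)/pull/(\d+)/?", pr_url
    )
    if match is None:
        raise ValueError(f"Not a GitHub pull request URL: {pr_url}")
    owner, repo, number = match.groups()
    return urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/pulls/{number}",
        # This media type asks GitHub for the diff instead of JSON metadata
        headers={"Accept": "application/vnd.github.diff"},
    )


def fetch_pr_diff(pr_url: str, max_chars: int = 8000) -> str:
    """Download the diff as text, truncated to keep token use bounded (needs network)."""
    with urllib.request.urlopen(build_diff_request(pr_url), timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")[:max_chars]
```

Private repositories would additionally need an `Authorization` header carrying a token, which this sketch leaves out.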

## What I'd Do Differently

If I were starting over:

1. **Use Ollama first** - Test your agents locally with `ollama run llama3.1:70b` before spending money on GPT-4

2. **Start with 2 agents max** - More agents = more failure modes

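3. **Hardcode expected outputs** - Spell out the exact format in `expected_output`, then check the result against it in your own code. The field guides the agent; it doesn't enforce anything by itself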

4. **Add human-in-the-loop** - CrewAI has no approval workflow. Build one:

```python
def human_approve(response):
    print(f"Agent suggests: {response}")
    return input("Approve? (y/n): ").lower() == 'y'

# Use this before critical actions
if human_approve(agent_response):
    # Proceed with action
    pass
```
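Point 3 can be handled in the same spirit. Here is a minimal sketch of checking a triage answer in code, assuming the task was told to reply with one word out of `bug`, `feature`, or `question`; the function name `parse_triage_label` and the normalization rules are my own illustration, not a CrewAI feature.

```python
ALLOWED_LABELS = {"bug", "feature", "question"}


def parse_triage_label(raw_output: str) -> str:
    """Normalize an agent's answer and reject anything outside the allowed set."""
    # Strip whitespace, then stray quotes or a trailing period, then lowercase
    label = raw_output.strip().strip("'\".").lower()
    if label not in ALLOWED_LABELS:
        raise ValueError(f"Unexpected triage label: {raw_output!r}")
    return label
```

Feed it the text of the triage task's output and treat a `ValueError` as a signal to retry or hand the issue to a human, rather than letting a chatty answer flow into the next step.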

## The Honest Bottom Line

CrewAI is powerful but fragile. It works great for:

- **Batch processing** existing issues/PRs

- **Generating first drafts** of responses

- **Running code quality checks** automatically

It fails at:

- **Real-time collaboration** - Agents can't work simultaneously on shared state

- **Complex decision trees** - Chains of more than 5 sequential tasks broke in my testing

- **Tasks requiring external API calls** - Tool integration is brittle

Your next step: Clone my starter template at `github.com/yourname/crewai-oss-starter`. Replace the tool functions with your project's actual APIs. Run it against 3 real issues. Fix the inevitable errors. Then scale from there.

Remember: CrewAI is a tool for *augmenting* your open source work, not replacing it. The goal is to handle the boring stuff so you can focus on the actual community building.