Last month, I was running a project that required spinning up dozens of automated coding agents to refactor a massive monolithic codebase. Within 48 hours, my OpenAI bill had shot past $300. Claude Sonnet was even pricier at that scale. I needed a model that could handle agentic coding loops—reading files, writing code, running tests, and iterating—without requiring me to take out a second mortgage.
That's when I started looking seriously at GLM 5.2 from Zhipu AI. I'd heard it mentioned in a few developer forums as a budget-friendly coding model, but I was skeptical. Budget models usually mean budget results. After a week of testing it across my agent harness, though, I'm convinced it deserves a real spot in the rotation—especially for sub-agent tasks where you don't need frontier-model quality on every single call.
Here's the practical guide I wish I'd had when I started.
What GLM 5.2 Actually Is (And Why It Matters for Agents)
GLM 5.2 is the latest release in Zhipu AI's General Language Model series, developed out of Tsinghua University. It's not a general chatbot model trying to also write code—it's built specifically with agentic coding as a design target. That distinction matters more than you'd think.
The key specs that caught my attention:
- Code generation that's competitive with GPT-4o on HumanEval benchmarks
- 128K token context window, which is crucial when you're stuffing entire file trees into prompts
- Reliable function calling and tool use—this is the make-or-break feature for agents, and where a lot of cheaper models fall apart
- Strong multilingual performance, particularly in Chinese, but fully capable with English codebases
The real selling point for me was the bullet about function calling. When you're running an agent loop, the model needs to consistently output structured tool calls: read this file, edit that function, run this test. Models that hallucinate tool formats or skip steps break the entire loop. GLM 5.2 has been surprisingly consistent here.
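To make "structured tool calls" concrete, here is a minimal sketch of one tool definition in the OpenAI-style function-calling format, plus the kind of check an agent loop might run before executing a call. The `read_file` tool and the `validate_tool_call` helper are hypothetical names I made up for illustration; they are not part of any SDK.

```python
import json

# Hypothetical tool definition in the OpenAI-style "tools" format.
READ_FILE_TOOL = {
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Return the contents of a file in the repository.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Repo-relative path."}
            },
            "required": ["path"],
        },
    },
}


def validate_tool_call(name, arguments_json, tools):
    """Return (ok, error) for a tool call emitted by the model.

    Checks only that the tool exists, the arguments parse as a JSON
    object, and every required parameter is present.
    """
    specs = {t["function"]["name"]: t["function"] for t in tools}
    if name not in specs:
        return False, f"unknown tool: {name}"
    try:
        args = json.loads(arguments_json)
    except json.JSONDecodeError as exc:
        return False, f"arguments are not valid JSON: {exc}"
    if not isinstance(args, dict):
        return False, "arguments must be a JSON object"
    required = specs[name]["parameters"].get("required", [])
    missing = [p for p in required if p not in args]
    if missing:
        return False, f"missing required parameters: {missing}"
    return True, ""


print(validate_tool_call("read_file", '{"path": "auth.py"}', [READ_FILE_TOOL]))
```

A rejected call can be fed back to the model as an error message instead of crashing the loop, which is usually enough for it to correct itself on the next turn.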
The Setup: Getting GLM 5.2 Running via OpenRouter
Here's where I hit my first wall. You can't just swap gpt-4o for glm-5.2 in your API call and call it a day. Zhipu AI's direct API works, but if you're already using OpenAI-compatible tooling (which most agent frameworks are), you need a routing layer. OpenRouter is the cleanest solution I've found.
Step 1: Get Your OpenRouter API Key
Head to openrouter.ai and create an account. Generate an API key from the dashboard. This is the only key you'll need—OpenRouter handles the routing to Zhipu's infrastructure.
Step 2: Configure Your Agent Framework
I primarily use Claude Code and a custom Python agent harness. Here's how to point both at GLM 5.2.
For Claude Code, you need to set environment variables:
export OPENROUTER_API_KEY="sk-or-v1-your-key-here"
export ANTHROPIC_BASE_URL="https://openrouter.ai/api/v1"
Then in your Claude Code config, set the model to:
openrouter:zhipu/glm-5.2
For a custom Python harness using the OpenAI SDK:
import os

from openai import OpenAI

# Read the key from the environment instead of hardcoding it in source.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

response = client.chat.completions.create(
    model="zhipu/glm-5.2",
    messages=[
        {"role": "system", "content": "You are a coding agent. Read files, write code, and run tests."},
        {"role": "user", "content": "Refactor the authentication module to use JWT tokens."},
    ],
    tools=[...],  # Your tool definitions here
    temperature=0.1,
)
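The call above is only one turn. In an agent loop, you then execute whatever tool calls come back and append the results as "tool" messages before calling the model again. Here is a rough sketch of that dispatch step. It assumes the tool calls have the shape the OpenAI SDK returns (`id`, `function.name`, `function.arguments`); the `run_tool_calls` helper and the `read_file` handler are hypothetical, and I use a stand-in object so the example runs without a network call.

```python
import json
from types import SimpleNamespace


def run_tool_calls(tool_calls, handlers):
    """Execute each tool call and return the "tool" messages to append."""
    messages = []
    for call in tool_calls or []:
        handler = handlers.get(call.function.name)
        if handler is None:
            result = f"error: unknown tool {call.function.name}"
        else:
            try:
                result = handler(**json.loads(call.function.arguments))
            except Exception as exc:  # report failures back to the model
                result = f"error: {exc}"
        messages.append(
            {"role": "tool", "tool_call_id": call.id, "content": str(result)}
        )
    return messages


# Stand-in for response.choices[0].message.tool_calls from the SDK.
fake_calls = [
    SimpleNamespace(
        id="call_1",
        function=SimpleNamespace(name="read_file", arguments='{"path": "auth.py"}'),
    )
]
handlers = {"read_file": lambda path: f"<contents of {path}>"}
print(run_tool_calls(fake_calls, handlers))
```

Returning errors as tool output, rather than raising, keeps the loop alive and gives the model a chance to fix a bad call.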
Step 3: Verify It's Working
Before wiring this into a complex agent loop, test it with a simple call:
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-or-v1-your-key-here" \
  -d '{
    "model": "zhipu/glm-5.2",
    "messages": [{"role": "user", "content": "Write a Python function that checks if a string is a valid email address."}]
  }'
You should get back a clean response with working code. If you get a model not found error, double-check the model string—it needs to be exactly zhipu/glm-5.2.
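If you would rather not guess at the model string, one option is to search the model list programmatically. This is a sketch, assuming OpenRouter exposes its list at `https://openrouter.ai/api/v1/models` and returns a payload shaped like `{"data": [{"id": ...}]}`; verify both against the OpenRouter docs before relying on it. The helper names are my own, and the example below runs against a made-up payload rather than the live endpoint.

```python
import json
import urllib.request

# Assumed endpoint; confirm against the OpenRouter API reference.
MODELS_URL = "https://openrouter.ai/api/v1/models"


def fetch_models(url=MODELS_URL):
    """Download the model list (network call, not executed in this example)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def matching_model_ids(models_payload, needle):
    """Return model IDs containing `needle`, case-insensitively."""
    needle = needle.lower()
    return sorted(
        m["id"] for m in models_payload.get("data", []) if needle in m["id"].lower()
    )


# Offline example with a made-up payload in the assumed shape.
sample = {"data": [{"id": "zhipu/glm-5.2"}, {"id": "openai/gpt-4o"}]}
print(matching_model_ids(sample, "glm"))  # ['zhipu/glm-5.2']
```

Against the live list you would call `matching_model_ids(fetch_models(), "glm")` and copy the exact ID from the result.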
The Mistakes I Made So You Don't Have To
Mistake #1: Using the wrong model identifier. I initially tried glm-5.2, zhipu/glm5.2, and zhipu/glm-5 before landing on the correct zhipu/glm-5.2. OpenRouter is picky about these strings. Check the model list at openrouter.ai/models if you're unsure.
Mistake #2: Treating it like GPT-4o for complex planning. GLM 5.2 is excellent at executing well-defined coding tasks, but it's not as strong at high-level architectural planning. I tried having it design a microservices architecture from scratch, and the results were mediocre. When I switched to using Claude for the initial design and GLM 5.2 for the implementation sub-tasks, the workflow became both cheaper and more reliable.
Mistake #3: Skimping on context. GLM 5.2 has a 128K context window, and it actually uses it well. Early on, I was being too aggressive about truncating context to save tokens. When I started loading full file contents and relevant dependencies up front, the code quality improved noticeably.
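For the context-loading point in Mistake #3, here is roughly how I think about it in code: load whole files in priority order and skip a file entirely rather than truncating it. This is a sketch, not my exact harness. The 4-characters-per-token estimate and the 100,000-token budget (leaving headroom under the 128K window for the response) are assumptions, and `build_context` is a hypothetical helper.

```python
from pathlib import Path


def estimate_tokens(text):
    """Very rough token estimate: about 4 characters per token (assumption)."""
    return len(text) // 4 + 1


def build_context(paths, budget_tokens=100_000):
    """Concatenate whole files, in priority order, within a token budget.

    A file that would exceed the budget is skipped whole, never truncated.
    Returns (context_text, skipped_paths).
    """
    parts, skipped, used = [], [], 0
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        block = f"File: {path}\n{text}\n"
        cost = estimate_tokens(block)
        if used + cost > budget_tokens:
            skipped.append(str(path))
            continue
        parts.append(block)
        used += cost
    return "\n".join(parts), skipped
```

Put the file being edited first, then its tests, then dependencies, so the most important material is the last to be dropped. Logging `skipped` makes it obvious when a run went ahead without something it needed.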
The Prompting Pattern That Actually Works
After a lot of trial and error, I've settled on a consistent pattern for getting good results from GLM 5.2 in agent loops:
- Give it a clear, specific goal: not "improve the code" but "refactor the process_payment function to separate validation logic from processing logic"
- Load relevant context up front: include the file contents, related imports, and test files in the initial prompt
- Define explicit success criteria: "the refactored code must pass all existing tests and the validation function must return a tuple of (is_valid, error_message)"
Here's a real prompt template I use:
You are a coding agent working on a Python codebase.
GOAL: {specific_task}
CONTEXT:
File: {filename}
{file_contents}
Related files:
{related_file_contents}
SUCCESS CRITERIA:
- {criterion_1}
- {criterion_2}
- All existing tests must pass
Execute this task step by step. Read any files you need, make your changes, and verify the result.
This pattern consistently produces working code on the first or second attempt.
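Filling the template by hand gets old quickly, so here is a small sketch of a builder for it. The `build_prompt` function and its argument names are hypothetical; the template text is the one above, with the per-criterion bullets generated from a list.

```python
PROMPT_TEMPLATE = """You are a coding agent working on a Python codebase.
GOAL: {specific_task}
CONTEXT:
File: {filename}
{file_contents}
Related files:
{related_file_contents}
SUCCESS CRITERIA:
{criteria}
- All existing tests must pass
Execute this task step by step. Read any files you need, make your changes, and verify the result."""


def build_prompt(specific_task, filename, file_contents, related_files, criteria):
    """Fill the template; `related_files` maps path -> contents."""
    related = "\n".join(f"{path}\n{body}" for path, body in related_files.items())
    bullets = "\n".join(f"- {c}" for c in criteria)
    return PROMPT_TEMPLATE.format(
        specific_task=specific_task,
        filename=filename,
        file_contents=file_contents,
        related_file_contents=related or "(none)",
        criteria=bullets,
    )


prompt = build_prompt(
    specific_task="Refactor process_payment to separate validation from processing",
    filename="payments.py",
    file_contents="def process_payment(order):\n    return {}",
    related_files={"tests/test_payments.py": "def test_ok():\n    pass"},
    criteria=["The validation function must return a tuple of (is_valid, error_message)"],
)
print(prompt)
```

Because the file contents are passed as arguments to `str.format` rather than pasted into the template string, braces inside the source code do not need escaping.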
Real Results From My Project
On that monolith refactoring project I mentioned, here's what the numbers looked like after I switched the sub-agent tasks to GLM 5.2:
- Cost per agent run: dropped from ~$0.45 (GPT-4o) to ~$0.06 (GLM 5.2)
- Success rate on well-defined tasks: ~85% for GLM 5.2 vs ~92% for GPT-4o
- Time to completion: roughly the same, maybe 10-15% slower
The 7-point drop in success rate sounds significant, but remember: when a sub-agent fails, you just retry. At roughly 7.5x cheaper per run, you can afford a lot of retries and still come out way ahead.
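The retry argument is easy to check with a back-of-the-envelope calculation. If each attempt succeeds independently with probability p, the expected number of attempts until the first success is 1/p, so the expected spend per successful task is cost divided by p. The independence assumption is mine and is optimistic: a task that fails once is often more likely to fail again. With the numbers above, this works out to about $0.07 per success for GLM 5.2 versus about $0.49 for GPT-4o.

```python
def expected_cost_per_success(cost_per_run, success_rate):
    """Expected spend until the first success, assuming independent retries."""
    return cost_per_run / success_rate


def success_within(attempts, success_rate):
    """Probability that at least one of `attempts` independent runs succeeds."""
    return 1 - (1 - success_rate) ** attempts


glm = expected_cost_per_success(0.06, 0.85)
gpt = expected_cost_per_success(0.45, 0.92)
print(f"GLM 5.2: ${glm:.3f} per successful task")
print(f"GPT-4o:  ${gpt:.3f} per successful task")
print(f"GPT-4o costs {gpt / glm:.1f}x more per success")
print(f"GLM 5.2 success within 2 attempts: {success_within(2, 0.85):.3f}")
```

Under the same assumption, allowing GLM 5.2 a single retry lifts its success rate to roughly 98%, above GPT-4o's single-attempt rate, while still costing far less.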
Honest Limitations
GLM 5.2 isn't a replacement for frontier models across the board. Here's where it falls short:
- Complex multi-step reasoning that requires holding many abstract constraints in mind simultaneously—it can lose the thread
- Novel architectural decisions where there's not a well-established pattern to draw from
- Edge case handling in unfamiliar domains—it tends to be more conventional in its solutions, which is great for standard coding but less ideal for creative problem-solving
- Documentation quality—the code it writes works, but inline comments and docstrings are often sparse or generic
I also noticed occasional quirks with tool call formatting when the tool schema gets very complex (more than 10-15 parameters). Keeping tool definitions simple and well-documented helps a lot.
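A quick lint pass over your tool definitions can catch the schemas most likely to cause trouble. This is a sketch, assuming the OpenAI-style tool format; `schema_warnings` is a hypothetical helper, and the default threshold of 10 parameters is simply the low end of the range I mentioned above.

```python
def schema_warnings(tool, max_params=10):
    """Return a list of warnings for one OpenAI-style tool definition."""
    fn = tool["function"]
    props = fn.get("parameters", {}).get("properties", {})
    warnings = []
    if len(props) > max_params:
        warnings.append(f"{fn['name']}: {len(props)} parameters (limit {max_params})")
    for name, spec in props.items():
        if spec.get("type") == "object":
            warnings.append(f"{fn['name']}.{name}: nested object, consider flattening")
        if not spec.get("description"):
            warnings.append(f"{fn['name']}.{name}: missing description")
    return warnings


# Hypothetical tool with one nested, undocumented parameter.
edit_tool = {
    "type": "function",
    "function": {
        "name": "edit_file",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Repo-relative path."},
                "options": {"type": "object"},
            },
        },
    },
}
for warning in schema_warnings(edit_tool):
    print(warning)
```

Running this over every tool at startup is cheap, and it flags problems before they show up as malformed calls in the middle of a run.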
Practical Tips
Use it as a sub-agent, not the orchestrator. Let Claude or GPT-4o handle the planning and task decomposition, then dispatch implementation tasks to GLM 5.2.
Keep your tool schemas clean. Complex nested objects in tool definitions cause more issues than they're worth. Flatten parameters where possible.
Set temperature low. I use 0.1 for coding tasks. GLM 5.2 doesn't need "creativity" for writing a refactoring function—it needs consistency.
Always include test files in context. When GLM 5.2 can see the tests it needs to pass, success rates jump significantly.
Monitor your OpenRouter usage. The costs are low, but they add up fast when you're running parallel agent loops. Set spending limits in the OpenRouter dashboard.
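The first tip, sub-agent rather than orchestrator, comes down to a small routing decision per task. Here is a minimal sketch of how that could look. The task kinds, the `request_settings` helper, and the orchestrator model placeholder are all hypothetical; only the GLM 5.2 identifier and the 0.1 temperature come from earlier in this post.

```python
SUB_AGENT_MODEL = "zhipu/glm-5.2"              # identifier used earlier in this post
ORCHESTRATOR_MODEL = "your-frontier-model-id"  # placeholder, fill in your own

# Hypothetical task kinds that need high-level planning.
PLANNING_KINDS = {"plan", "decompose", "design"}


def request_settings(task_kind):
    """Pick request settings for a task: frontier model for planning,
    GLM 5.2 at low temperature for well-defined implementation work."""
    if task_kind in PLANNING_KINDS:
        return {"model": ORCHESTRATOR_MODEL}
    return {"model": SUB_AGENT_MODEL, "temperature": 0.1}


for kind in ("design", "refactor", "fix_bug"):
    print(kind, request_settings(kind))
```

The returned dict can be unpacked straight into a chat completions call, so the rest of the harness does not need to know which model is handling a given task.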
GLM 5.2 isn't going to replace GPT-4o or Claude for everything. But for the 70% of agent tasks that are well-defined implementation work—writing functions, fixing bugs, refactoring modules—it's more than good enough at a fraction of the price. That's a trade I'll make every time.