
LLM Structured Output and Function Calling in 2026: A Complete Guide from JSON Mode to Tool Use

Author
XiDao
XiDao provides stable, high-speed, and cost-effective LLM API gateway services for developers worldwide. One API Key to access OpenAI, Anthropic, Google, Meta models with smart routing and auto-retry.

Introduction: Why Structured Output Matters
#

In 2026, large language model (LLM) applications have evolved from simple chatbots to complex autonomous agent systems. Throughout this evolution, one fundamental technical challenge has persisted: how to make LLM outputs reliably parseable by programs.

Traditional LLM output is free-form text, forcing developers to use fragile approaches like regex and string matching to extract information. Structured Output and Function Calling (Tool Use) technologies have completely changed this paradigm.

By some industry estimates, over 85% of production-grade LLM applications in 2026 rely on some form of structured output. Whether it's AI agents calling external tools, data extraction pipelines, or multi-model collaboration systems, structured output is indispensable infrastructure.

Core Concepts Explained
#

What is Structured Output?
#

Structured output means the LLM returns data conforming to a predefined JSON Schema, rather than free-form text. This guarantees output predictability and parseability.

What is Function Calling?
#

Function Calling (also known as Tool Use) allows developers to define a set of “tool functions.” The LLM can autonomously decide which function to call and what parameters to pass based on the user’s request. This is the core capability for building AI agents.

The Relationship Between the Two
#

Structured output and function calling are essentially two sides of the same coin:

  • Structured Output: Constrains the model’s output format to ensure parseability
  • Function Calling: Lets the model select tools and generate structured call parameters
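
To make the duality concrete, here is an illustrative sketch of one JSON Schema reused both ways. The exact field names (`json_schema`, `input_schema`) vary by provider, so treat this as a shape comparison rather than a definitive API call:

```python
# One JSON Schema describing the desired structure
weather_schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "temp_c": {"type": "number"},
    },
    "required": ["city"],
}

# As structured output: the schema constrains the model's entire reply
response_format = {"type": "json_schema", "json_schema": weather_schema}

# As function calling: the same schema becomes a tool's parameter spec
tool = {
    "name": "get_weather",
    "description": "Get current weather for a city",
    "input_schema": weather_schema,
}
```

Either way, the model's output is validated against the same contract; only the framing differs.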

2026 LLM Function Calling Capability Comparison
#

| Model | Structured Output | Parallel Calls | Nested Schema | Streaming Tool Use |
|---|---|---|---|---|
| Claude 4.7 (Anthropic) | JSON Mode | Yes | Yes | Yes |
| GPT-5.5 (OpenAI) | Structured Outputs | Yes | Yes | Yes |
| Gemini 2.5 Pro (Google) | JSON Mode | Yes | Yes | Yes |
| Llama 4 Maverick (Meta) | JSON Mode | Yes | Partial | Yes |
| Qwen3 (Alibaba) | JSON Mode | Yes | Yes | Yes |
| DeepSeek-V3 | JSON Mode | Yes | Partial | Yes |

Tutorial 1: Structured Data Extraction with Claude 4.7
#

Basic JSON Mode
#

import anthropic
import json

client = anthropic.Anthropic(
    base_url="https://api.xidao.online/v1",  # Using XiDao API Gateway
    api_key="your-xidao-api-key"
)

# Define a JSON Schema
schema = {
    "name": "extract_contact_info",
    "description": "Extract contact information from text",
    "input_schema": {
        "type": "object",
        "properties": {
            "name": {"type": "string", "description": "Full name"},
            "email": {"type": "string", "format": "email"},
            "phone": {"type": "string"},
            "company": {"type": "string"},
            "role": {"type": "string"}
        },
        "required": ["name"]
    }
}

text = """
Hi, I'm Sarah Chen, working as a senior ML engineer at Google DeepMind.
You can reach me at sarah.chen@google.com or call me at +1-650-555-0142.
"""

response = client.messages.create(
    model="claude-4-7-20250514",
    max_tokens=1024,
    tools=[schema],
    messages=[{
        "role": "user",
        "content": f"Extract contact information from the following text:\n\n{text}"
    }]
)

# Parse tool call results
for block in response.content:
    if block.type == "tool_use":
        result = block.input
        print(json.dumps(result, indent=2))

Output:

{
  "name": "Sarah Chen",
  "email": "sarah.chen@google.com",
  "phone": "+1-650-555-0142",
  "company": "Google DeepMind",
  "role": "Senior ML Engineer"
}
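
Even with tool use, it is prudent to validate the payload before passing it downstream. Below is a minimal stdlib sketch (not a full JSON Schema validator; for production, a package like `jsonschema` is the usual choice). The compact `schema` and `check_payload` here are illustrative stand-ins, not part of the Anthropic SDK:

```python
# Compact stand-in for the extract_contact_info schema defined above
schema = {
    "name": "extract_contact_info",
    "input_schema": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "email": {"type": "string"},
        },
        "required": ["name"],
    },
}

def check_payload(payload, schema):
    """Return a list of problems: missing required keys or mistyped values."""
    spec = schema["input_schema"]
    problems = [f"missing: {k}" for k in spec.get("required", []) if k not in payload]
    type_map = {"string": str, "number": (int, float), "integer": int, "boolean": bool}
    for key, value in payload.items():
        expected = type_map.get(spec["properties"].get(key, {}).get("type"))
        if expected and not isinstance(value, expected):
            problems.append(f"wrong type: {key}")
    return problems

print(check_payload({"email": "x@example.com"}, schema))  # ['missing: name']
```

An empty list means the payload is safe to hand to the next stage.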

Nested Schema for Complex Data Extraction
#

For more complex scenarios, Claude 4.7 supports deeply nested schema definitions:

complex_schema = {
    "name": "extract_invoice",
    "description": "Extract invoice information",
    "input_schema": {
        "type": "object",
        "properties": {
            "invoice_number": {"type": "string"},
            "date": {"type": "string", "format": "date"},
            "vendor": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "tax_id": {"type": "string"},
                    "address": {"type": "string"}
                },
                "required": ["name"]
            },
            "items": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "description": {"type": "string"},
                        "quantity": {"type": "number"},
                        "unit_price": {"type": "number"},
                        "total": {"type": "number"}
                    },
                    "required": ["description", "total"]
                }
            },
            "total_amount": {"type": "number"},
            "currency": {"type": "string"}
        },
        "required": ["invoice_number", "items", "total_amount"]
    }
}
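
A schema constrains shape, not arithmetic, so cross-field consistency is worth checking after extraction. A hypothetical post-check for the invoice schema above:

```python
def invoice_is_consistent(invoice, tolerance=0.01):
    """Verify that line-item totals add up to the stated total_amount."""
    items_total = sum(item["total"] for item in invoice["items"])
    return abs(items_total - invoice["total_amount"]) <= tolerance

invoice = {
    "invoice_number": "INV-001",
    "items": [
        {"description": "GPU hours", "total": 120.0},
        {"description": "Storage", "total": 30.0},
    ],
    "total_amount": 150.0,
}
print(invoice_is_consistent(invoice))  # True
```

If the check fails, a common pattern is to re-prompt the model with the discrepancy rather than silently accepting the extraction.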

Tutorial 2: GPT-5.5 Structured Outputs
#

OpenAI further enhanced structured output capabilities in GPT-5.5 with the response_format parameter:

from openai import OpenAI
from pydantic import BaseModel, Field
from typing import List, Optional

client = OpenAI(
    base_url="https://api.xidao.online/v1",
    api_key="your-xidao-api-key"
)

# Define output structure using Pydantic models
class SentimentResult(BaseModel):
    sentiment: str = Field(description="positive, negative, or neutral")
    confidence: float = Field(ge=0, le=1, description="Confidence score 0-1")
    key_phrases: List[str] = Field(description="Key sentiment words")
    summary: str = Field(description="One-line summary")

class AnalysisResponse(BaseModel):
    results: List[SentimentResult]
    overall_sentiment: str
    processing_notes: Optional[str] = None

# Force output format using response_format
completion = client.beta.chat.completions.parse(
    model="gpt-5.5-turbo",
    messages=[
        {"role": "system", "content": "You are a professional text sentiment analysis assistant."},
        {"role": "user", "content": "Analyze the following reviews:\n1. This product is amazing, highly recommended!\n2. Terrible service attitude, never coming back.\n3. It's okay, nothing special."}
    ],
    response_format=AnalysisResponse
)

result = completion.choices[0].message.parsed
print(result.model_dump_json(indent=2))
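
Pydantic enforces the Field constraints (e.g. `ge=0, le=1`) when the response is parsed. If you want the same guarantees elsewhere in a pipeline without the dependency, an equivalent stdlib dataclass works; the `Sentiment` class below is a hypothetical simplification of `SentimentResult`, not part of any SDK:

```python
from dataclasses import dataclass, field

VALID_SENTIMENTS = {"positive", "negative", "neutral"}

@dataclass
class Sentiment:
    sentiment: str
    confidence: float
    key_phrases: list = field(default_factory=list)

    def __post_init__(self):
        # Mirror the Pydantic Field constraints from the model above
        if self.sentiment not in VALID_SENTIMENTS:
            raise ValueError(f"invalid sentiment: {self.sentiment}")
        if not 0 <= self.confidence <= 1:
            raise ValueError(f"confidence out of range: {self.confidence}")

s = Sentiment(sentiment="positive", confidence=0.95, key_phrases=["amazing"])
print(s.sentiment)  # positive
```

Validating at the boundary like this means malformed model output fails loudly instead of corrupting downstream state.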

Tutorial 3: Building a Multi-Tool Agent System
#

The real power of Function Calling lies in building agent systems that can make autonomous decisions. Here’s a complete multi-tool agent example:

import anthropic
import json
from datetime import datetime

client = anthropic.Anthropic(
    base_url="https://api.xidao.online/v1",
    api_key="your-xidao-api-key"
)

# Define tool set
tools = [
    {
        "name": "get_weather",
        "description": "Get weather information for a specified city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"], "default": "celsius"}
            },
            "required": ["city"]
        }
    },
    {
        "name": "search_flights",
        "description": "Search for flight information",
        "input_schema": {
            "type": "object",
            "properties": {
                "origin": {"type": "string", "description": "Departure city"},
                "destination": {"type": "string", "description": "Destination city"},
                "date": {"type": "string", "format": "date", "description": "Departure date"},
                "passengers": {"type": "integer", "minimum": 1, "default": 1}
            },
            "required": ["origin", "destination", "date"]
        }
    },
    {
        "name": "book_hotel",
        "description": "Book a hotel room",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "check_in": {"type": "string", "format": "date"},
                "check_out": {"type": "string", "format": "date"},
                "stars": {"type": "integer", "minimum": 1, "maximum": 5},
                "budget_max": {"type": "number"}
            },
            "required": ["city", "check_in", "check_out"]
        }
    }
]

# Simulated tool execution functions
def execute_tool(name, params):
    if name == "get_weather":
        return {"temp": 28, "condition": "Sunny", "humidity": 45}
    elif name == "search_flights":
        return {"flights": [{"airline": "United", "price": 380, "time": "08:30"}]}
    elif name == "book_hotel":
        return {"hotel": "Marriott Waikiki", "price": 220, "status": "confirmed"}
    return {"error": "Unknown tool"}

# Agent loop - supports multi-step reasoning
def run_agent(user_message, max_iterations=5):
    messages = [{"role": "user", "content": user_message}]

    for i in range(max_iterations):
        response = client.messages.create(
            model="claude-4-7-20250514",
            max_tokens=4096,
            tools=tools,
            messages=messages
        )

        # Check for tool calls
        tool_uses = [b for b in response.content if b.type == "tool_use"]

        if not tool_uses:
            # No tool calls - return the final text response (if any)
            text_block = next((b for b in response.content if b.type == "text"), None)
            return text_block.text if text_block else ""

        # Execute all tool calls and collect results
        messages.append({"role": "assistant", "content": response.content})
        tool_results = []

        for tool_use in tool_uses:
            result = execute_tool(tool_use.name, tool_use.input)
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": tool_use.id,
                "content": json.dumps(result)
            })

        messages.append({"role": "user", "content": tool_results})

    return "Max iterations reached"

# Run the agent
response = run_agent(
    "I want to travel to Hawaii next Wednesday. Check the weather in San Francisco, "
    "search for flights to Honolulu, and book me a 4-star hotel."
)
print(response)
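
The prompt above uses a relative date ("next Wednesday"), which models can misresolve; computing the concrete date up front and interpolating it into the prompt is more reliable, and puts the otherwise-unused `datetime` import to work. A sketch (the `next_weekday` helper is my addition, not part of any SDK):

```python
from datetime import date, timedelta

def next_weekday(target, today=None):
    """Return the next occurrence of a weekday (Monday=0 ... Sunday=6), always in the future."""
    today = today or date.today()
    days_ahead = (target - today.weekday()) % 7 or 7  # same weekday means 7 days out
    return today + timedelta(days=days_ahead)

# 2026-03-02 is a Monday, so "next Wednesday" resolves to 2026-03-04
print(next_weekday(2, today=date(2026, 3, 2)))  # 2026-03-04
```

You would then phrase the request as, e.g., `f"I want to travel to Hawaii on {next_weekday(2)}. ..."` so the model never has to guess the calendar.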

Tutorial 4: Gemini 2.5 JSON Mode with Grounding
#

Google’s Gemini 2.5 Pro has unique advantages in structured output, especially in Grounding scenarios combined with Google Search:

from google import genai
from google.genai import types
import json

client = genai.Client(
    api_key="your-xidao-api-key",
    http_options={"base_url": "https://api.xidao.online/gemini"}
)

# Define response schema
response_schema = {
    "type": "object",
    "properties": {
        "company": {"type": "string"},
        "founded_year": {"type": "integer"},
        "founders": {"type": "array", "items": {"type": "string"}},
        "valuation_usd_billion": {"type": "number"},
        "key_products": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "launch_year": {"type": "integer"},
                    "description": {"type": "string"}
                }
            }
        },
        "recent_news": {"type": "string"}
    },
    "required": ["company", "founded_year"]
}

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Tell me about the latest developments at Anthropic",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=response_schema,
        temperature=0.3
    )
)

data = json.loads(response.text)
print(json.dumps(data, indent=2))
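
With `response_mime_type="application/json"` set, `response.text` should be bare JSON, but models called without that constraint (or through intermediaries) sometimes wrap JSON in a markdown fence. A defensive parser is cheap insurance; `parse_json_loose` below is my own helper, not part of the Gemini SDK:

```python
import json
import re

def parse_json_loose(text):
    """Parse JSON, tolerating a surrounding ```json ... ``` markdown fence."""
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", text, re.DOTALL)
    if match:
        text = match.group(1)
    return json.loads(text)

print(parse_json_loose('```json\n{"company": "Anthropic"}\n```'))
```

The same helper handles unfenced input unchanged, so it can sit in front of any provider's JSON mode.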

Best Practices and Common Pitfalls
#

1. Schema Design Principles
#

# BAD: Schema is too loose
bad_schema = {
    "type": "object",
    "properties": {
        "data": {"type": "string"}  # Anything can go in here
    }
}

# GOOD: Schema is strictly constrained
good_schema = {
    "type": "object",
    "properties": {
        "status": {"type": "string", "enum": ["success", "error", "pending"]},
        "code": {"type": "integer", "minimum": 100, "maximum": 599},
        "message": {"type": "string", "maxLength": 500},
        "timestamp": {"type": "string", "format": "date-time"}
    },
    "required": ["status", "code"],
    "additionalProperties": False
}

2. Error Handling and Retry Strategies
#

import tenacity
import anthropic
import json

@tenacity.retry(
    stop=tenacity.stop_after_attempt(3),
    retry=tenacity.retry_if_exception_type((
        anthropic.APIError,
        anthropic.RateLimitError
    )),
    wait=tenacity.wait_exponential(multiplier=1, min=2, max=30)
)
def safe_structured_call(client, schema, prompt):
    """Structured output call with retry logic"""
    response = client.messages.create(
        model="claude-4-7-20250514",
        max_tokens=2048,
        tools=[schema],
        messages=[{"role": "user", "content": prompt}]
    )

    for block in response.content:
        if block.type == "tool_use":
            return block.input

    # If the model didn't call the tool (rare), fall back to any text output and try parsing it
    text = next((b.text for b in response.content if b.type == "text"), "")
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        raise ValueError(f"Model failed to return valid JSON: {text[:200]}")
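
tenacity covers transport-level errors; schema-level failures (the model returns JSON that doesn't validate) deserve their own retry loop, ideally feeding the error back into the prompt. A provider-agnostic sketch with a stubbed `call_model` function (hypothetical, standing in for any of the SDK calls above):

```python
import json

def call_with_validation(call_model, prompt, validate, max_attempts=3):
    """Retry when output fails validation, appending the error to the next prompt."""
    last_error = None
    for _ in range(max_attempts):
        attempt_prompt = (prompt if last_error is None
                          else f"{prompt}\n\nPrevious attempt was invalid: {last_error}")
        raw = call_model(attempt_prompt)
        try:
            data = json.loads(raw)
            validate(data)  # should raise ValueError on schema problems
            return data
        except (json.JSONDecodeError, ValueError) as e:
            last_error = str(e)
    raise ValueError(f"No valid output after {max_attempts} attempts: {last_error}")

# Stub that fails once, then succeeds
attempts = []
def flaky_model(prompt):
    attempts.append(prompt)
    return "not json" if len(attempts) == 1 else '{"status": "success"}'

result = call_with_validation(flaky_model, "Return status JSON", lambda d: None)
print(result)  # {'status': 'success'}
```

Echoing the validation error back to the model usually fixes the output within one or two extra attempts.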

3. Streaming Function Calling
#

For real-time applications, streaming processing is essential:

def stream_tool_use(client, tools, message):
    """Stream processing for Function Calls"""
    current_tool = None
    accumulated_json = ""

    with client.messages.stream(
        model="claude-4-7-20250514",
        max_tokens=4096,
        tools=tools,
        messages=[{"role": "user", "content": message}]
    ) as stream:
        for event in stream:
            if event.type == "content_block_start":
                if hasattr(event.content_block, 'name'):
                    current_tool = event.content_block.name
                    print(f"Calling tool: {current_tool}")
            elif event.type == "content_block_delta":
                if hasattr(event.delta, 'partial_json'):
                    accumulated_json += event.delta.partial_json
                    print(f"  Accumulating params: {len(accumulated_json)} chars", end="\r")
            elif event.type == "content_block_stop":
                if current_tool:
                    params = json.loads(accumulated_json)
                    print(f"  Final params: {json.dumps(params, indent=2)}")
                    current_tool = None
                    accumulated_json = ""

4. Optimizing Tool Descriptions
#

The description field of a tool is critical for the model’s decision-making. Here are optimization tips:

# Poor description
{"name": "search", "description": "Search"}

# Good description
{
    "name": "search_products",
    "description": "Search the product database for items. Use this tool when the user "
                   "asks about product information, pricing, or availability. Supports "
                   "searching by name, category, and price range. Returns a list of "
                   "matching items with name, price, and stock status."
}

Performance Optimization Tips
#

Batch Processing
#

For large-scale data processing scenarios, using batch APIs can significantly reduce costs:

# Batch process structured extraction for multiple texts
batch_prompts = [
    "Extract: John Doe, 123 Main St, phone 555-0123",
    "Extract: Jane Smith, 456 Oak Ave, email jane@example.com",
    # ... more texts
]

# Using Batch API (supported by Claude)
batch_requests = [
    {
        "custom_id": f"extract-{i}",
        "params": {
            "model": "claude-4-7-20250514",
            "max_tokens": 512,
            "tools": [contact_schema],
            "messages": [{"role": "user", "content": prompt}]
        }
    }
    for i, prompt in enumerate(batch_prompts)
]

# Submit the batch request (Anthropic's Message Batches API, typically ~50% cheaper)
batch = client.messages.batches.create(requests=batch_requests)
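
Batch endpoints cap the number of requests per submission (limits vary by provider), so large jobs need chunking before submission. A generic stdlib helper:

```python
def chunked(items, size):
    """Yield successive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

requests = [{"custom_id": f"extract-{i}"} for i in range(250)]
batches = list(chunked(requests, 100))
print([len(b) for b in batches])  # [100, 100, 50]
```

Each chunk is then submitted as its own batch, with the `custom_id` fields used to reassemble results in order.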

Using XiDao Gateway’s Smart Routing
#

When your application needs different models for different scenarios, XiDao's smart routing automatically chooses the optimal one:

import requests

# XiDao Smart Routing API
response = requests.post(
    "https://api.xidao.online/v1/chat/completions",
    headers={"Authorization": "Bearer your-xidao-api-key"},
    json={
        "model": "auto",  # Automatically select the optimal model
        "messages": [{"role": "user", "content": "Extract invoice information"}],
        "response_format": {"type": "json_object"},
        "tools": [invoice_schema]
    }
)

Summary
#

In 2026, structured output and function calling have become essential capabilities for LLM application development. Choosing the right model, designing good schemas, and implementing robust error handling are the keys to building reliable AI applications. With the XiDao API Gateway, you can seamlessly access structured output capabilities across all major models with a single API key, focusing on business logic rather than infrastructure.

Whether you’re building AI agents, data processing pipelines, or multi-model collaboration systems, mastering structured output is a must-have skill for every AI developer in 2026.


This article was written by the XiDao Tech Team. XiDao provides developers worldwide with stable, high-speed, and cost-effective LLM API gateway services, supporting unified access to Claude, GPT, Gemini, Llama, and other leading models. Visit global.xidao.online to learn more.
