
LLM Structured Output and Function Calling in 2026: A Complete Guide from JSON Mode to Tool Use

Author
XiDao
XiDao provides stable, high-speed, and cost-effective LLM API gateway services for developers worldwide. One API Key to access OpenAI, Anthropic, Google, Meta models with smart routing and auto-retry.

Introduction: Why Structured Output Matters
#

In 2026, large language model (LLM) applications have evolved from simple chatbots to complex autonomous agent systems. Throughout this evolution, one fundamental technical challenge has persisted: how to make LLM outputs reliably parseable by programs.

Traditional LLM output is free-form text, forcing developers to use fragile approaches like regex and string matching to extract information. Structured Output and Function Calling (Tool Use) technologies have completely changed this paradigm.

By some industry estimates, over 85% of production-grade LLM applications in 2026 rely on some form of structured output. Whether it's AI agents calling external tools, data extraction pipelines, or multi-model collaboration systems, structured output is indispensable infrastructure.

Core Concepts Explained
#

What is Structured Output?
#

Structured output means the LLM returns data conforming to a predefined JSON Schema, rather than free-form text. This guarantees output predictability and parseability.

What is Function Calling?
#

Function Calling (also known as Tool Use) allows developers to define a set of “tool functions.” The LLM can autonomously decide which function to call and what parameters to pass based on the user’s request. This is the core capability for building AI agents.

The Relationship Between the Two
#

Structured output and function calling are essentially two sides of the same coin:

  • Structured Output: Constrains the model’s output format to ensure parseability
  • Function Calling: Lets the model select tools and generate structured call parameters
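
To make the duality concrete, here is an illustrative sketch of one JSON Schema reused both ways. The exact field names (`json_schema`, `input_schema`) vary by provider, so treat this as a shape comparison rather than a definitive API call:

```python
# One JSON Schema describing the desired structure
weather_schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "temp_c": {"type": "number"},
    },
    "required": ["city"],
}

# As structured output: the schema constrains the model's entire reply
response_format = {"type": "json_schema", "json_schema": weather_schema}

# As function calling: the same schema becomes a tool's parameter spec
tool = {
    "name": "get_weather",
    "description": "Get current weather for a city",
    "input_schema": weather_schema,
}
```

Either way, the model's output is validated against the same contract; only the framing differs.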

2026 LLM Function Calling Capability Comparison
#

| Model | Structured Output | Parallel Calls | Nested Schema | Streaming Tool Use |
|---|---|---|---|---|
| Claude 4.7 (Anthropic) | JSON Mode | Yes | Yes | Yes |
| GPT-5.5 (OpenAI) | Structured Outputs | Yes | Yes | Yes |
| Gemini 2.5 Pro (Google) | JSON Mode | Yes | Yes | Yes |
| Llama 4 Maverick (Meta) | JSON Mode | Yes | Partial | Yes |
| Qwen3 (Alibaba) | JSON Mode | Yes | Yes | Yes |
| DeepSeek-V3 | JSON Mode | Yes | Partial | Yes |

Tutorial 1: Structured Data Extraction with Claude 4.7
#

Basic JSON Mode
#

import anthropic
import json

client = anthropic.Anthropic(
    base_url="https://api.xidao.online/v1",  # Using XiDao API Gateway
    api_key="your-xidao-api-key"
)

# Define a JSON Schema
schema = {
    "name": "extract_contact_info",
    "description": "Extract contact information from text",
    "input_schema": {
        "type": "object",
        "properties": {
            "name": {"type": "string", "description": "Full name"},
            "email": {"type": "string", "format": "email"},
            "phone": {"type": "string"},
            "company": {"type": "string"},
            "role": {"type": "string"}
        },
        "required": ["name"]
    }
}

text = """
Hi, I'm Sarah Chen, working as a senior ML engineer at Google DeepMind.
You can reach me at sarah.chen@google.com or call me at +1-650-555-0142.
"""

response = client.messages.create(
    model="claude-4-7-20250514",
    max_tokens=1024,
    tools=[schema],
    messages=[{
        "role": "user",
        "content": f"Extract contact information from the following text:\n\n{text}"
    }]
)

# Parse tool call results
for block in response.content:
    if block.type == "tool_use":
        result = block.input
        print(json.dumps(result, indent=2))

Output:

{
  "name": "Sarah Chen",
  "email": "sarah.chen@google.com",
  "phone": "+1-650-555-0142",
  "company": "Google DeepMind",
  "role": "Senior ML Engineer"
}
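
Even with tool use, it is prudent to validate the payload before passing it downstream. Below is a minimal stdlib sketch (not a full JSON Schema validator; for production, a package like `jsonschema` is the usual choice). The compact `schema` and `check_payload` here are illustrative stand-ins, not part of the Anthropic SDK:

```python
# Compact stand-in for the extract_contact_info schema defined above
schema = {
    "name": "extract_contact_info",
    "input_schema": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "email": {"type": "string"},
        },
        "required": ["name"],
    },
}

def check_payload(payload, schema):
    """Return a list of problems: missing required keys or mistyped values."""
    spec = schema["input_schema"]
    problems = [f"missing: {k}" for k in spec.get("required", []) if k not in payload]
    type_map = {"string": str, "number": (int, float), "integer": int, "boolean": bool}
    for key, value in payload.items():
        expected = type_map.get(spec["properties"].get(key, {}).get("type"))
        if expected and not isinstance(value, expected):
            problems.append(f"wrong type: {key}")
    return problems

print(check_payload({"email": "x@example.com"}, schema))  # ['missing: name']
```

An empty list means the payload is safe to hand to the next stage.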

Nested Schema for Complex Data Extraction
#

For more complex scenarios, Claude 4.7 supports deeply nested schema definitions:

complex_schema = {
    "name": "extract_invoice",
    "description": "Extract invoice information",
    "input_schema": {
        "type": "object",
        "properties": {
            "invoice_number": {"type": "string"},
            "date": {"type": "string", "format": "date"},
            "vendor": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "tax_id": {"type": "string"},
                    "address": {"type": "string"}
                },
                "required": ["name"]
            },
            "items": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "description": {"type": "string"},
                        "quantity": {"type": "number"},
                        "unit_price": {"type": "number"},
                        "total": {"type": "number"}
                    },
                    "required": ["description", "total"]
                }
            },
            "total_amount": {"type": "number"},
            "currency": {"type": "string"}
        },
        "required": ["invoice_number", "items", "total_amount"]
    }
}
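
A schema constrains shape, not arithmetic, so cross-field consistency is worth checking after extraction. A hypothetical post-check for the invoice schema above:

```python
def invoice_is_consistent(invoice, tolerance=0.01):
    """Verify that line-item totals add up to the stated total_amount."""
    items_total = sum(item["total"] for item in invoice["items"])
    return abs(items_total - invoice["total_amount"]) <= tolerance

invoice = {
    "invoice_number": "INV-001",
    "items": [
        {"description": "GPU hours", "total": 120.0},
        {"description": "Storage", "total": 30.0},
    ],
    "total_amount": 150.0,
}
print(invoice_is_consistent(invoice))  # True
```

If the check fails, a common pattern is to re-prompt the model with the discrepancy rather than silently accepting the extraction.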

Tutorial 2: GPT-5.5 Structured Outputs
#

OpenAI further enhanced structured output capabilities in GPT-5.5 with the response_format parameter:

from openai import OpenAI
from pydantic import BaseModel, Field
from typing import List, Optional

client = OpenAI(
    base_url="https://api.xidao.online/v1",
    api_key="your-xidao-api-key"
)

# Define output structure using Pydantic models
class SentimentResult(BaseModel):
    sentiment: str = Field(description="positive, negative, or neutral")
    confidence: float = Field(ge=0, le=1, description="Confidence score 0-1")
    key_phrases: List[str] = Field(description="Key sentiment words")
    summary: str = Field(description="One-line summary")

class AnalysisResponse(BaseModel):
    results: List[SentimentResult]
    overall_sentiment: str
    processing_notes: Optional[str] = None

# Force output format using response_format
completion = client.beta.chat.completions.parse(
    model="gpt-5.5-turbo",
    messages=[
        {"role": "system", "content": "You are a professional text sentiment analysis assistant."},
        {"role": "user", "content": "Analyze the following reviews:\n1. This product is amazing, highly recommended!\n2. Terrible service attitude, never coming back.\n3. It's okay, nothing special."}
    ],
    response_format=AnalysisResponse
)

result = completion.choices[0].message.parsed
print(result.model_dump_json(indent=2))
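
Pydantic enforces the Field constraints (e.g. `ge=0, le=1`) when the response is parsed. If you want the same guarantees elsewhere in a pipeline without the dependency, an equivalent stdlib dataclass works; the `Sentiment` class below is a hypothetical simplification of `SentimentResult`, not part of any SDK:

```python
from dataclasses import dataclass, field

VALID_SENTIMENTS = {"positive", "negative", "neutral"}

@dataclass
class Sentiment:
    sentiment: str
    confidence: float
    key_phrases: list = field(default_factory=list)

    def __post_init__(self):
        # Mirror the Pydantic Field constraints from the model above
        if self.sentiment not in VALID_SENTIMENTS:
            raise ValueError(f"invalid sentiment: {self.sentiment}")
        if not 0 <= self.confidence <= 1:
            raise ValueError(f"confidence out of range: {self.confidence}")

s = Sentiment(sentiment="positive", confidence=0.95, key_phrases=["amazing"])
print(s.sentiment)  # positive
```

Validating at the boundary like this means malformed model output fails loudly instead of corrupting downstream state.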

Tutorial 3: Building a Multi-Tool Agent System
#

The real power of Function Calling lies in building agent systems that can make autonomous decisions. Here’s a complete multi-tool agent example:

import anthropic
import json
from datetime import datetime

client = anthropic.Anthropic(
    base_url="https://api.xidao.online/v1",
    api_key="your-xidao-api-key"
)

# Define tool set
tools = [
    {
        "name": "get_weather",
        "description": "Get weather information for a specified city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"], "default": "celsius"}
            },
            "required": ["city"]
        }
    },
    {
        "name": "search_flights",
        "description": "Search for flight information",
        "input_schema": {
            "type": "object",
            "properties": {
                "origin": {"type": "string", "description": "Departure city"},
                "destination": {"type": "string", "description": "Destination city"},
                "date": {"type": "string", "format": "date", "description": "Departure date"},
                "passengers": {"type": "integer", "minimum": 1, "default": 1}
            },
            "required": ["origin", "destination", "date"]
        }
    },
    {
        "name": "book_hotel",
        "description": "Book a hotel room",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "check_in": {"type": "string", "format": "date"},
                "check_out": {"type": "string", "format": "date"},
                "stars": {"type": "integer", "minimum": 1, "maximum": 5},
                "budget_max": {"type": "number"}
            },
            "required": ["city", "check_in", "check_out"]
        }
    }
]

# Simulated tool execution functions
def execute_tool(name, params):
    if name == "get_weather":
        return {"temp": 28, "condition": "Sunny", "humidity": 45}
    elif name == "search_flights":
        return {"flights": [{"airline": "United", "price": 380, "time": "08:30"}]}
    elif name == "book_hotel":
        return {"hotel": "Marriott Waikiki", "price": 220, "status": "confirmed"}
    return {"error": "Unknown tool"}

# Agent loop - supports multi-step reasoning
def run_agent(user_message, max_iterations=5):
    messages = [{"role": "user", "content": user_message}]

    for i in range(max_iterations):
        response = client.messages.create(
            model="claude-4-7-20250514",
            max_tokens=4096,
            tools=tools,
            messages=messages
        )

        # Check for tool calls
        tool_uses = [b for b in response.content if b.type == "tool_use"]

        if not tool_uses:
            # No tool calls - return the final text response (if any)
            text_block = next((b for b in response.content if b.type == "text"), None)
            return text_block.text if text_block else ""

        # Execute all tool calls and collect results
        messages.append({"role": "assistant", "content": response.content})
        tool_results = []

        for tool_use in tool_uses:
            result = execute_tool(tool_use.name, tool_use.input)
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": tool_use.id,
                "content": json.dumps(result)
            })

        messages.append({"role": "user", "content": tool_results})

    return "Max iterations reached"

# Run the agent
response = run_agent(
    "I want to travel to Hawaii next Wednesday. Check the weather in San Francisco, "
    "search for flights to Honolulu, and book me a 4-star hotel."
)
print(response)
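
The prompt above uses a relative date ("next Wednesday"), which models can misresolve; computing the concrete date up front and interpolating it into the prompt is more reliable, and puts the otherwise-unused `datetime` import to work. A sketch (the `next_weekday` helper is my addition, not part of any SDK):

```python
from datetime import date, timedelta

def next_weekday(target, today=None):
    """Return the next occurrence of a weekday (Monday=0 ... Sunday=6), always in the future."""
    today = today or date.today()
    days_ahead = (target - today.weekday()) % 7 or 7  # same weekday means 7 days out
    return today + timedelta(days=days_ahead)

# 2026-03-02 is a Monday, so "next Wednesday" resolves to 2026-03-04
print(next_weekday(2, today=date(2026, 3, 2)))  # 2026-03-04
```

You would then phrase the request as, e.g., `f"I want to travel to Hawaii on {next_weekday(2)}. ..."` so the model never has to guess the calendar.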

Tutorial 4: Gemini 2.5 JSON Mode with Grounding
#

Google’s Gemini 2.5 Pro has unique advantages in structured output, especially in Grounding scenarios combined with Google Search:

from google import genai
from google.genai import types
import json

client = genai.Client(
    api_key="your-xidao-api-key",
    http_options={"base_url": "https://api.xidao.online/gemini"}
)

# Define response schema
response_schema = {
    "type": "object",
    "properties": {
        "company": {"type": "string"},
        "founded_year": {"type": "integer"},
        "founders": {"type": "array", "items": {"type": "string"}},
        "valuation_usd_billion": {"type": "number"},
        "key_products": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "launch_year": {"type": "integer"},
                    "description": {"type": "string"}
                }
            }
        },
        "recent_news": {"type": "string"}
    },
    "required": ["company", "founded_year"]
}

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Tell me about the latest developments at Anthropic",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=response_schema,
        temperature=0.3
    )
)

data = json.loads(response.text)
print(json.dumps(data, indent=2))
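
With `response_mime_type="application/json"` set, `response.text` should be bare JSON, but models called without that constraint (or through intermediaries) sometimes wrap JSON in a markdown fence. A defensive parser is cheap insurance; `parse_json_loose` below is my own helper, not part of the Gemini SDK:

```python
import json
import re

def parse_json_loose(text):
    """Parse JSON, tolerating a surrounding ```json ... ``` markdown fence."""
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", text, re.DOTALL)
    if match:
        text = match.group(1)
    return json.loads(text)

print(parse_json_loose('```json\n{"company": "Anthropic"}\n```'))
```

The same helper handles unfenced input unchanged, so it can sit in front of any provider's JSON mode.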

Best Practices and Common Pitfalls
#

1. Schema Design Principles
#

# BAD: Schema is too loose
bad_schema = {
    "type": "object",
    "properties": {
        "data": {"type": "string"}  # Anything can go in here
    }
}

# GOOD: Schema is strictly constrained
good_schema = {
    "type": "object",
    "properties": {
        "status": {"type": "string", "enum": ["success", "error", "pending"]},
        "code": {"type": "integer", "minimum": 100, "maximum": 599},
        "message": {"type": "string", "maxLength": 500},
        "timestamp": {"type": "string", "format": "date-time"}
    },
    "required": ["status", "code"],
    "additionalProperties": False
}

2. Error Handling and Retry Strategies
#

import tenacity
import anthropic
import json

@tenacity.retry(
    stop=tenacity.stop_after_attempt(3),
    retry=tenacity.retry_if_exception_type((
        anthropic.APIError,
        anthropic.RateLimitError
    )),
    wait=tenacity.wait_exponential(multiplier=1, min=2, max=30)
)
def safe_structured_call(client, schema, prompt):
    """Structured output call with retry logic"""
    response = client.messages.create(
        model="claude-4-7-20250514",
        max_tokens=2048,
        tools=[schema],
        messages=[{"role": "user", "content": prompt}]
    )

    for block in response.content:
        if block.type == "tool_use":
            return block.input

    # If the model didn't call the tool (rare), fall back to any text output and try parsing it
    text = next((b.text for b in response.content if b.type == "text"), "")
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        raise ValueError(f"Model failed to return valid JSON: {text[:200]}")
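
tenacity covers transport-level errors; schema-level failures (the model returns JSON that doesn't validate) deserve their own retry loop, ideally feeding the error back into the prompt. A provider-agnostic sketch with a stubbed `call_model` function (hypothetical, standing in for any of the SDK calls above):

```python
import json

def call_with_validation(call_model, prompt, validate, max_attempts=3):
    """Retry when output fails validation, appending the error to the next prompt."""
    last_error = None
    for _ in range(max_attempts):
        attempt_prompt = (prompt if last_error is None
                          else f"{prompt}\n\nPrevious attempt was invalid: {last_error}")
        raw = call_model(attempt_prompt)
        try:
            data = json.loads(raw)
            validate(data)  # should raise ValueError on schema problems
            return data
        except (json.JSONDecodeError, ValueError) as e:
            last_error = str(e)
    raise ValueError(f"No valid output after {max_attempts} attempts: {last_error}")

# Stub that fails once, then succeeds
attempts = []
def flaky_model(prompt):
    attempts.append(prompt)
    return "not json" if len(attempts) == 1 else '{"status": "success"}'

result = call_with_validation(flaky_model, "Return status JSON", lambda d: None)
print(result)  # {'status': 'success'}
```

Echoing the validation error back to the model usually fixes the output within one or two extra attempts.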

3. Streaming Function Calling
#

For real-time applications, streaming processing is essential:

def stream_tool_use(client, tools, message):
    """Stream processing for Function Calls"""
    current_tool = None
    accumulated_json = ""

    with client.messages.stream(
        model="claude-4-7-20250514",
        max_tokens=4096,
        tools=tools,
        messages=[{"role": "user", "content": message}]
    ) as stream:
        for event in stream:
            if event.type == "content_block_start":
                if hasattr(event.content_block, 'name'):
                    current_tool = event.content_block.name
                    print(f"Calling tool: {current_tool}")
            elif event.type == "content_block_delta":
                if hasattr(event.delta, 'partial_json'):
                    accumulated_json += event.delta.partial_json
                    print(f"  Accumulating params: {len(accumulated_json)} chars", end="\r")
            elif event.type == "content_block_stop":
                if current_tool:
                    params = json.loads(accumulated_json)
                    print(f"  Final params: {json.dumps(params, indent=2)}")
                    current_tool = None
                    accumulated_json = ""

4. Optimizing Tool Descriptions
#

The description field of a tool is critical for the model’s decision-making. Here are optimization tips:

# Poor description
{"name": "search", "description": "Search"}

# Good description
{
    "name": "search_products",
    "description": "Search the product database for items. Use this tool when the user "
                   "asks about product information, pricing, or availability. Supports "
                   "searching by name, category, and price range. Returns a list of "
                   "matching items with name, price, and stock status."
}

Performance Optimization Tips
#

Batch Processing
#

For large-scale data processing scenarios, using batch APIs can significantly reduce costs:

# Batch process structured extraction for multiple texts
batch_prompts = [
    "Extract: John Doe, 123 Main St, phone 555-0123",
    "Extract: Jane Smith, 456 Oak Ave, email jane@example.com",
    # ... more texts
]

# Using Batch API (supported by Claude)
batch_requests = [
    {
        "custom_id": f"extract-{i}",
        "params": {
            "model": "claude-4-7-20250514",
            "max_tokens": 512,
            "tools": [contact_schema],
            "messages": [{"role": "user", "content": prompt}]
        }
    }
    for i, prompt in enumerate(batch_prompts)
]

# Submit the batch request (Anthropic's Message Batches API, typically ~50% cheaper)
batch = client.messages.batches.create(requests=batch_requests)
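
Batch endpoints cap the number of requests per submission (limits vary by provider), so large jobs need chunking before submission. A generic stdlib helper:

```python
def chunked(items, size):
    """Yield successive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

requests = [{"custom_id": f"extract-{i}"} for i in range(250)]
batches = list(chunked(requests, 100))
print([len(b) for b in batches])  # [100, 100, 50]
```

Each chunk is then submitted as its own batch, with the `custom_id` fields used to reassemble results in order.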

Using XiDao Gateway’s Smart Routing
#

When your application needs different models for different scenarios, XiDao's smart routing automatically chooses the optimal one:

import requests

# XiDao Smart Routing API
response = requests.post(
    "https://api.xidao.online/v1/chat/completions",
    headers={"Authorization": "Bearer your-xidao-api-key"},
    json={
        "model": "auto",  # Automatically select the optimal model
        "messages": [{"role": "user", "content": "Extract invoice information"}],
        "response_format": {"type": "json_object"},
        "tools": [invoice_schema]
    }
)

Summary
#

In 2026, structured output and function calling have become essential capabilities for LLM application development. Choosing the right model, designing good schemas, and implementing robust error handling are the keys to building reliable AI applications. With the XiDao API Gateway, you can seamlessly access structured output capabilities across all major models with a single API key, focusing on business logic rather than infrastructure.

Whether you’re building AI agents, data processing pipelines, or multi-model collaboration systems, mastering structured output is a must-have skill for every AI developer in 2026.


This article was written by the XiDao Tech Team. XiDao provides developers worldwide with stable, high-speed, and cost-effective LLM API gateway services, supporting unified access to Claude, GPT, Gemini, Llama, and other leading models. Visit global.xidao.online to learn more.
