Understanding Data Generation
This guide explains how Schemathesis generates test data for your API, from raw schemas to complete HTTP requests. Understanding this process helps you write better extensions, troubleshoot unexpected behavior, and optimize your testing strategy.
The Generation Hierarchy
Schemathesis data generation is built on the following hierarchy:
```text
Hypothesis              → Core data generation primitives
    ↓
hypothesis-jsonschema   → Schema-aware generation for OpenAPI
hypothesis-graphql      → Schema-aware generation for GraphQL
    ↓
Schemathesis            → Complete API testing workflow
```
How it works:
- Hypothesis provides the foundation—strategies for generating strings, integers, objects, etc.
- hypothesis-jsonschema and hypothesis-graphql translate your API schemas into Hypothesis strategies
- Schemathesis orchestrates the entire process: parsing schemas, generating all request components, sending requests, and validating responses
This layered approach means Schemathesis inherits Hypothesis's powerful features (like automatic shrinking) while adding API-specific intelligence.
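To make the layering concrete, here is a toy version of the middle layer's job: turning schema keywords into a generator. This is a sketch of the concept only; the real hypothesis-jsonschema builds Hypothesis strategies rather than plain functions, and `strategy_for` is a hypothetical name, not part of any public API.

```python
import random

def strategy_for(schema):
    """Translate a JSON Schema fragment into a generator function.

    Toy illustration of the idea behind hypothesis-jsonschema: each
    schema keyword narrows the space of values the generator may produce.
    """
    if schema.get("type") == "integer":
        lo = schema.get("minimum", -(2**31))
        hi = schema.get("maximum", 2**31 - 1)
        return lambda rnd: rnd.randint(lo, hi)
    if schema.get("type") == "string":
        min_len = schema.get("minLength", 0)
        max_len = schema.get("maxLength", 20)
        return lambda rnd: "".join(
            rnd.choice("abcdefghijklmnopqrstuvwxyz")
            for _ in range(rnd.randint(min_len, max_len))
        )
    raise NotImplementedError(schema)

rnd = random.Random(0)
draw = strategy_for({"type": "integer", "minimum": 0, "maximum": 100})
values = [draw(rnd) for _ in range(5)]
# Every drawn value respects the schema constraints
assert all(0 <= v <= 100 for v in values)
```

In the real stack, Schemathesis composes such per-parameter strategies into a strategy for a whole test case, which is why Hypothesis features like shrinking apply to entire requests.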
Testing Phases
Schemathesis generates test cases through multiple independent phases, each targeting different aspects of API testing.
Examples Phase
Uses `example` and `examples` values from your schema, filling missing parameters with generated data.
Example:

```yaml
# Schema
parameters:
  - name: limit
    schema:
      type: integer
    examples: [10, 50, 100]

# Produces: 3 test cases with limit=10, limit=50, limit=100
```
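The fill-in behavior can be sketched as follows. `example_cases` and `generate` are hypothetical names for illustration, not the Schemathesis API, and real example handling is richer (named examples, combinations across parameters):

```python
def example_cases(parameters, generate):
    """One test case per explicit example; parameters without examples
    are filled by `generate` (a stand-in for Hypothesis-generated data)."""
    with_examples = [p for p in parameters if p.get("examples")]
    if not with_examples:
        return []
    anchor = with_examples[0]
    cases = []
    for value in anchor["examples"]:
        # Start from generated data, then pin the exemplified parameter
        case = {p["name"]: generate(p) for p in parameters}
        case[anchor["name"]] = value
        cases.append(case)
    return cases

params = [
    {"name": "limit", "schema": {"type": "integer"}, "examples": [10, 50, 100]},
    {"name": "offset", "schema": {"type": "integer"}},
]
cases = example_cases(params, generate=lambda p: 0)
assert [c["limit"] for c in cases] == [10, 50, 100]
assert all(c["offset"] == 0 for c in cases)
```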
Coverage Phase
Generates boundary values, enum combinations, and constraint violations systematically.
Example:

```text
# Schema: {"type": "string", "minLength": 2, "maxLength": 10}
# Produces: strings of length 2, 3, 9, 10 (boundaries)
```
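Boundary selection for string lengths can be sketched like this: probe the exact limits and the values just inside them. This is a simplification of the coverage phase, not Schemathesis's actual algorithm:

```python
def length_boundaries(schema):
    """Boundary lengths a coverage-style phase would probe for a string
    schema: the limits themselves plus the values one step inside them."""
    lo, hi = schema["minLength"], schema["maxLength"]
    candidates = {lo, lo + 1, hi - 1, hi}
    # Guard against tiny ranges where lo + 1 > hi, etc.
    return sorted(n for n in candidates if lo <= n <= hi)

assert length_boundaries({"type": "string", "minLength": 2, "maxLength": 10}) == [2, 3, 9, 10]
```

The same idea applies to numeric `minimum`/`maximum` constraints; the key property is that the set of probes is deterministic, which is why this phase produces a predictable number of test cases.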
Fuzzing Phase
Generates random, diverse data within schema constraints using Hypothesis strategies.
Example:

```text
# Schema: {"type": "integer", "minimum": 0, "maximum": 100}
# Produces: random integers like 0, 47, 100,
# plus unusual values Hypothesis finds interesting
```
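A toy model of that bias toward "interesting" values: draw uniformly most of the time, but deliberately revisit edge values. Hypothesis's real heuristics are far more sophisticated; `fuzz_integer` is a hypothetical name for illustration:

```python
import random

def fuzz_integer(schema, rnd):
    """Random draw within constraints, biased toward edge values
    (the bounds and zero) that tend to expose bugs."""
    lo, hi = schema["minimum"], schema["maximum"]
    interesting = [v for v in (lo, 0, hi) if lo <= v <= hi]
    if rnd.random() < 0.3:  # occasionally pick an edge value on purpose
        return rnd.choice(interesting)
    return rnd.randint(lo, hi)

rnd = random.Random(1)
draws = [fuzz_integer({"minimum": 0, "maximum": 100}, rnd) for _ in range(50)]
# All draws stay within the schema's constraints
assert all(0 <= v <= 100 for v in draws)
```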
Stateful Phase
Runs when OpenAPI schemas define links between operations. Creates sequences where response data feeds into subsequent requests.
Example:

```text
# Schema with links: POST /users → GET /users/{id}
# Produces: POST /users, extract ID, then GET /users/{extracted_id}
```
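The chaining idea, stripped to its core: the link tells Schemathesis which field of the first response feeds which parameter of the next request. Below, `api` is a hypothetical callable standing in for a real HTTP client, and the in-memory fake exists only to make the sketch runnable:

```python
def run_chain(api):
    """Stateful sequence driven by an OpenAPI link:
    POST /users returns an id, which feeds GET /users/{id}."""
    created = api("POST", "/users", json={"name": "test"})
    user_id = created["id"]  # value extracted as the link definition specifies
    return api("GET", f"/users/{user_id}")

def fake_api(method, path, json=None):
    """In-memory stand-in for a real API, for illustration only."""
    if method == "POST":
        return {"id": 42, **json}
    assert path == "/users/42"
    return {"id": 42, "name": "test"}

assert run_chain(fake_api) == {"id": 42, "name": "test"}
```

The payoff of this phase is that the `GET` request targets a resource that actually exists, instead of a randomly generated id that the API would reject with a 404.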
Generation Modes
Schemathesis can generate two fundamentally different types of test data:
Positive Testing
Generates data that should be accepted by your API — valid according to your schema.
Negative Testing
Generates data that should be rejected by your API — deliberately invalid according to your schema.
How it works: Schemathesis mutates your schema to produce invalid data.
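A toy version of one such mutation: violate a single constraint at a time, so the API's rejection can be attributed to that constraint. `negate` is a hypothetical helper; Schemathesis applies many mutation kinds (wrong types, out-of-range values, missing required fields, and more):

```python
def negate(schema):
    """Produce a value that deliberately violates the schema
    by breaking one constraint."""
    if schema.get("type") == "integer":
        if "minimum" in schema:
            return schema["minimum"] - 1  # just below the allowed range
        return "not-an-integer"           # wrong type entirely
    raise NotImplementedError(schema)

assert negate({"type": "integer", "minimum": 0}) == -1
assert negate({"type": "integer"}) == "not-an-integer"
```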
Serialization Process
The final step transforms generated Python objects into actual HTTP requests based on your API's media types.
Media type support:
Schemathesis supports many common media types out of the box, including JSON, XML (with OpenAPI XML annotations), form data, plain text, and others. For unsupported media types, you can add custom serializers.
Example:

```python
# Generated Python object
{"user_id": 123, "name": "test"}

# For application/json -> {"user_id": 123, "name": "test"}
# For application/xml  -> <data><user_id>123</user_id><name>test</name></data>
```
If Schemathesis can't serialize data for a media type, those test cases are skipped. This keeps your test runs focused on actually testable scenarios.
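The dispatch above can be sketched as a toy serializer: same generated object, different wire formats, and an error (leading to a skipped case) when no serializer matches. The XML branch is a deliberate simplification; real XML serialization follows the schema's OpenAPI XML annotations:

```python
import json

def serialize(data, media_type):
    """Render one generated object for a given media type."""
    if media_type == "application/json":
        return json.dumps(data, separators=(",", ":"))
    if media_type == "application/xml":
        items = "".join(f"<{k}>{v}</{k}>" for k, v in data.items())
        return f"<data>{items}</data>"
    # An unsupported media type means the test case is skipped
    raise ValueError(f"unsupported media type: {media_type}")

obj = {"user_id": 123, "name": "test"}
assert serialize(obj, "application/json") == '{"user_id":123,"name":"test"}'
assert serialize(obj, "application/xml") == "<data><user_id>123</user_id><name>test</name></data>"
```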
Shrinking and Failure Handling
When Schemathesis finds a failing test case, it automatically shrinks it to the minimal example that reproduces the failure.
Shrinking is enabled by default. Disable it with `--no-shrink` for faster runs.
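The essence of shrinking is a greedy search for a smaller input that still fails. The sketch below handles only integers; Hypothesis's real shrinker works on arbitrary data structures and uses much smarter candidate selection:

```python
def shrink(value, fails):
    """Greedily minimize a failing integer input: repeatedly try
    smaller candidates that still make `fails` return True."""
    assert fails(value)
    while True:
        for candidate in (0, value // 2, value - 1):
            if candidate < value and fails(candidate):
                value = candidate
                break
        else:
            return value  # no smaller candidate still fails: minimal

# Bug triggers for any input >= 13; shrinking finds the minimal reproducer.
assert shrink(1000, fails=lambda n: n >= 13) == 13
```

This is why a failure report shows a small, readable request rather than the sprawling random input that first triggered the bug.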
How Many Test Cases Does Schemathesis Generate?
Short answer: Up to `--max-examples` per operation (default: 100), but often fewer.
Why fewer:
- Limited possibilities: a schema with `enum: ["A", "B"]` only generates 2 test cases
- Phase limits: the examples phase generates exactly the number of examples in your schema
- Coverage phase: generates a deterministic count based on your constraints
Why more:
- Rejected cases: Invalid data that can't be serialized gets discarded and retried
- Shrinking: Additional test cases generated when minimizing failures