Understanding Data Generation
This guide explains how Schemathesis generates test data for your API, from raw schemas to complete HTTP requests. Understanding this process helps you write better extensions, troubleshoot unexpected behavior, and optimize your testing strategy.
The Generation Hierarchy
Schemathesis data generation is built on the following hierarchy:
```text
Hypothesis              → Core data generation primitives
    ↓
hypothesis-jsonschema   → Schema-aware generation for OpenAPI
hypothesis-graphql      → Schema-aware generation for GraphQL
    ↓
Schemathesis            → Complete API testing workflow
```
How it works:
- Hypothesis provides the foundation—strategies for generating strings, integers, objects, etc.
- hypothesis-jsonschema and hypothesis-graphql translate your API schemas into Hypothesis strategies
- Schemathesis orchestrates the entire process: parsing schemas, generating all request components, sending requests, and validating responses
This layered approach means Schemathesis inherits Hypothesis's powerful features (like automatic shrinking) while adding API-specific intelligence.
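To make the layering concrete, here is a toy version of the middle layer's job: turning schema keywords into a generator. This is a sketch of the concept only; the real hypothesis-jsonschema builds Hypothesis strategies rather than plain functions, and `strategy_for` is a hypothetical name, not part of any public API.

```python
import random

def strategy_for(schema):
    """Translate a JSON Schema fragment into a generator function.

    Toy illustration of the idea behind hypothesis-jsonschema: each
    schema keyword narrows the space of values the generator may produce.
    """
    if schema.get("type") == "integer":
        lo = schema.get("minimum", -(2**31))
        hi = schema.get("maximum", 2**31 - 1)
        return lambda rnd: rnd.randint(lo, hi)
    if schema.get("type") == "string":
        min_len = schema.get("minLength", 0)
        max_len = schema.get("maxLength", 20)
        return lambda rnd: "".join(
            rnd.choice("abcdefghijklmnopqrstuvwxyz")
            for _ in range(rnd.randint(min_len, max_len))
        )
    raise NotImplementedError(schema)

rnd = random.Random(0)
draw = strategy_for({"type": "integer", "minimum": 0, "maximum": 100})
values = [draw(rnd) for _ in range(5)]
# Every drawn value respects the schema constraints
assert all(0 <= v <= 100 for v in values)
```

In the real stack, Schemathesis composes such per-parameter strategies into a strategy for a whole test case, which is why Hypothesis features like shrinking apply to entire requests.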
Testing Phases
Schemathesis generates test cases through multiple independent phases, each targeting different aspects of API testing.
Examples Phase
Uses `example` and `examples` values from your schema, filling missing parameters with generated data.
Example:

```yaml
# Schema
parameters:
  - name: limit
    schema:
      type: integer
    examples: [10, 50, 100]

# Produces: 3 test cases with limit=10, limit=50, limit=100
```
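The fill-in behavior can be sketched as follows. `example_cases` and `generate` are hypothetical names for illustration, not the Schemathesis API, and real example handling is richer (named examples, combinations across parameters):

```python
def example_cases(parameters, generate):
    """One test case per explicit example; parameters without examples
    are filled by `generate` (a stand-in for Hypothesis-generated data)."""
    with_examples = [p for p in parameters if p.get("examples")]
    if not with_examples:
        return []
    anchor = with_examples[0]
    cases = []
    for value in anchor["examples"]:
        # Start from generated data, then pin the exemplified parameter
        case = {p["name"]: generate(p) for p in parameters}
        case[anchor["name"]] = value
        cases.append(case)
    return cases

params = [
    {"name": "limit", "schema": {"type": "integer"}, "examples": [10, 50, 100]},
    {"name": "offset", "schema": {"type": "integer"}},
]
cases = example_cases(params, generate=lambda p: 0)
assert [c["limit"] for c in cases] == [10, 50, 100]
assert all(c["offset"] == 0 for c in cases)
```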
Coverage Phase
Generates boundary values, enum combinations, and constraint violations systematically.
Example:

```text
# Schema: {"type": "string", "minLength": 2, "maxLength": 10}
# Produces: strings of length 2, 3, 9, 10 (boundaries)
```
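Boundary selection for string lengths can be sketched like this: probe the exact limits and the values just inside them. This is a simplification of the coverage phase, not Schemathesis's actual algorithm:

```python
def length_boundaries(schema):
    """Boundary lengths a coverage-style phase would probe for a string
    schema: the limits themselves plus the values one step inside them."""
    lo, hi = schema["minLength"], schema["maxLength"]
    candidates = {lo, lo + 1, hi - 1, hi}
    # Guard against tiny ranges where lo + 1 > hi, etc.
    return sorted(n for n in candidates if lo <= n <= hi)

assert length_boundaries({"type": "string", "minLength": 2, "maxLength": 10}) == [2, 3, 9, 10]
```

The same idea applies to numeric `minimum`/`maximum` constraints; the key property is that the set of probes is deterministic, which is why this phase produces a predictable number of test cases.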
Fuzzing Phase
Generates random, diverse data within schema constraints using Hypothesis strategies.
Example:

```text
# Schema: {"type": "integer", "minimum": 0, "maximum": 100}
# Produces: random integers like 0, 47, 100,
# plus unusual values Hypothesis finds interesting
```
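A toy model of that bias toward "interesting" values: draw uniformly most of the time, but deliberately revisit edge values. Hypothesis's real heuristics are far more sophisticated; `fuzz_integer` is a hypothetical name for illustration:

```python
import random

def fuzz_integer(schema, rnd):
    """Random draw within constraints, biased toward edge values
    (the bounds and zero) that tend to expose bugs."""
    lo, hi = schema["minimum"], schema["maximum"]
    interesting = [v for v in (lo, 0, hi) if lo <= v <= hi]
    if rnd.random() < 0.3:  # occasionally pick an edge value on purpose
        return rnd.choice(interesting)
    return rnd.randint(lo, hi)

rnd = random.Random(1)
draws = [fuzz_integer({"minimum": 0, "maximum": 100}, rnd) for _ in range(50)]
# All draws stay within the schema's constraints
assert all(0 <= v <= 100 for v in draws)
```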
Stateful Phase
Runs when OpenAPI schemas define links between operations. Creates sequences where response data feeds into subsequent requests.
Example:

```text
# Schema with links: POST /users → GET /users/{id}
# Produces: POST /users, extract ID, then GET /users/{extracted_id}
```
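The chaining idea, stripped to its core: the link tells Schemathesis which field of the first response feeds which parameter of the next request. Below, `api` is a hypothetical callable standing in for a real HTTP client, and the in-memory fake exists only to make the sketch runnable:

```python
def run_chain(api):
    """Stateful sequence driven by an OpenAPI link:
    POST /users returns an id, which feeds GET /users/{id}."""
    created = api("POST", "/users", json={"name": "test"})
    user_id = created["id"]  # value extracted as the link definition specifies
    return api("GET", f"/users/{user_id}")

def fake_api(method, path, json=None):
    """In-memory stand-in for a real API, for illustration only."""
    if method == "POST":
        return {"id": 42, **json}
    assert path == "/users/42"
    return {"id": 42, "name": "test"}

assert run_chain(fake_api) == {"id": 42, "name": "test"}
```

The payoff of this phase is that the `GET` request targets a resource that actually exists, instead of a randomly generated id that the API would reject with a 404.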
Generation Modes
Schemathesis can generate two fundamentally different types of test data:
Positive Testing
Generates data that should be accepted by your API — valid according to your schema.
Negative Testing
Generates data that should be rejected by your API — deliberately invalid according to your schema.
How it works: Schemathesis mutates your schema to produce invalid data.
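A toy version of one such mutation: violate a single constraint at a time, so the API's rejection can be attributed to that constraint. `negate` is a hypothetical helper; Schemathesis applies many mutation kinds (wrong types, out-of-range values, missing required fields, and more):

```python
def negate(schema):
    """Produce a value that deliberately violates the schema
    by breaking one constraint."""
    if schema.get("type") == "integer":
        if "minimum" in schema:
            return schema["minimum"] - 1  # just below the allowed range
        return "not-an-integer"           # wrong type entirely
    raise NotImplementedError(schema)

assert negate({"type": "integer", "minimum": 0}) == -1
assert negate({"type": "integer"}) == "not-an-integer"
```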
Serialization Process
The final step transforms generated Python objects into actual HTTP requests based on your API's media types.
Media type support:
Schemathesis supports many common media types out of the box, including JSON, XML (with OpenAPI XML annotations), form data, plain text, and others. For unsupported media types, you can add custom serializers.
Example:

```python
# Generated Python object
{"user_id": 123, "name": "test"}

# For application/json -> {"user_id": 123, "name": "test"}
# For application/xml  -> <data><user_id>123</user_id><name>test</name></data>
```
If Schemathesis can't serialize data for a media type, those test cases are skipped. This keeps your test runs focused on actually testable scenarios.
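The dispatch above can be sketched as a toy serializer: same generated object, different wire formats, and an error (leading to a skipped case) when no serializer matches. The XML branch is a deliberate simplification; real XML serialization follows the schema's OpenAPI XML annotations:

```python
import json

def serialize(data, media_type):
    """Render one generated object for a given media type."""
    if media_type == "application/json":
        return json.dumps(data, separators=(",", ":"))
    if media_type == "application/xml":
        items = "".join(f"<{k}>{v}</{k}>" for k, v in data.items())
        return f"<data>{items}</data>"
    # An unsupported media type means the test case is skipped
    raise ValueError(f"unsupported media type: {media_type}")

obj = {"user_id": 123, "name": "test"}
assert serialize(obj, "application/json") == '{"user_id":123,"name":"test"}'
assert serialize(obj, "application/xml") == "<data><user_id>123</user_id><name>test</name></data>"
```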
Shrinking and Failure Handling
When Schemathesis finds a failing test case, it automatically shrinks it to the minimal example that reproduces the failure.
Shrinking is enabled by default. Disable it with `--no-shrink` for faster runs.
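The essence of shrinking is a greedy search for a smaller input that still fails. The sketch below handles only integers; Hypothesis's real shrinker works on arbitrary data structures and uses much smarter candidate selection:

```python
def shrink(value, fails):
    """Greedily minimize a failing integer input: repeatedly try
    smaller candidates that still make `fails` return True."""
    assert fails(value)
    while True:
        for candidate in (0, value // 2, value - 1):
            if candidate < value and fails(candidate):
                value = candidate
                break
        else:
            return value  # no smaller candidate still fails: minimal

# Bug triggers for any input >= 13; shrinking finds the minimal reproducer.
assert shrink(1000, fails=lambda n: n >= 13) == 13
```

This is why a failure report shows a small, readable request rather than the sprawling random input that first triggered the bug.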
How Many Test Cases Does Schemathesis Generate?
Short answer: Up to `--max-examples` per operation (default: 100), but often fewer.
Why fewer:
- Limited possibilities: a schema with `enum: ["A", "B"]` only generates 2 test cases
- Phase limits: the examples phase generates exactly the number of examples in your schema
- Coverage phase: generates a deterministic count based on your constraints
Why more:
- Rejected cases: Invalid data that can't be serialized gets discarded and retried
- Shrinking: Additional test cases generated when minimizing failures