Document extraction guide
This guide documents the extraction API and its two supported modes: vision and layout.
Guide structure
Modes at a glance
| Mode | Best for | Notes |
|---|---|---|
| vision | Handwritten forms, free-layout scans, image-heavy pages | Vision-language extraction mode |
| layout | Stable business documents such as invoices, orders, and contracts | Structured extraction mode with optional grounding |
Recommended reading order
- Start with Modes to understand how the two modes differ and when to choose each
- Then read Extract schema to learn how to write an extract schema
- Finally, read Response structure to understand how the returned results carry field values and grounding information