This is a synchronous API endpoint: it returns the result once the document has been processed.
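Because the call blocks until processing finishes, a client may want to cap how long it waits. The following is a minimal sketch, not an official client: the helper names `buildSyncRequest` and `extractSync` are hypothetical, and the 120-second default is an arbitrary assumption rather than a documented limit.

```javascript
// Sketch: call the synchronous endpoint with a client-side timeout.
// The default timeout below is an assumption, not a documented server limit.
function buildSyncRequest(apiKey, body, timeoutMs = 120_000) {
  return {
    method: 'POST',
    headers: {
      'x-api-key': apiKey,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(body),
    // Aborts the request if it has not completed within timeoutMs.
    signal: AbortSignal.timeout(timeoutMs),
  };
}

async function extractSync(apiKey, body, timeoutMs) {
  const response = await fetch(
    'https://api.getomni.ai/extract/sync',
    buildSyncRequest(apiKey, body, timeoutMs),
  );
  // fetch resolves on HTTP errors too, so check the status explicitly.
  if (!response.ok) {
    throw new Error(`Extraction failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

If the timeout fires, the `fetch` promise rejects, so callers would typically wrap `extractSync` in a `try`/`catch`.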
When using templates, you can provide a templateId to load predefined
configurations. Any configuration parameters (schema, extractPerPage, etc.)
explicitly specified in the API request will override the corresponding
template settings.
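The precedence rule above can be pictured as a shallow merge in which request values win. This sketch only models the documented behavior on the client side; the template contents and the `resolveConfig` helper are hypothetical, and the service's actual merge logic is not shown here.

```javascript
// Hypothetical settings stored in a template (illustrative values only).
const templateSettings = {
  schema: { type: 'object', properties: { total: { type: 'number' } } },
  maintainFormat: true,
  extractPerPage: [],
};

// Parameters sent explicitly in the API request.
const requestParams = {
  extractPerPage: ['total'],
};

// Later spreads win, so request values replace template values key by key.
function resolveConfig(template, request) {
  return { ...template, ...request };
}

const effective = resolveConfig(templateSettings, requestParams);
console.log(effective.extractPerPage); // [ 'total' ]
console.log(effective.maintainFormat); // true
```

Settings the request does not mention, such as `maintainFormat` here, keep their template values.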
Body Parameters
Either file or url is required, but not both. See Accepted File Types.
URL of the document to extract data from
The file to extract data from. Use multipart/form-data as the Content-Type
header.
The template ID used for extraction.
JSON schema to define the structure of extracted data. See JSON schema
examples.
Overrides the schema from the template if provided.
Whether to exclude OCR result from the response. Defaults to false.
Whether to maintain format from the previous page. Defaults to false.
Overrides the maintainFormat from the template if provided.
Array of page numbers to process. Defaults to all pages.
Array of schema properties to extract per page. Defaults to empty array.
Overrides the extractPerPage from the template if provided.
If true, both document images and OCR result will be used to extract data.
Defaults to false.
Whether to bypass the cache and process the document from scratch. Defaults to
false.
Whether to extract directly from document images. Defaults to false.
Whether to include confidence intervals. Defaults to false.
Unique identifier for the webhook callback
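For file uploads, the body is sent as multipart/form-data instead of JSON. The sketch below assumes a runtime with global `FormData`, `Blob`, and `fetch` (for example Node 18+ or a browser), and assumes that other parameters such as templateId can be sent as ordinary form fields next to the file. The helper names are hypothetical.

```javascript
// Enforces the "either file or url, but not both" rule before sending.
function assertSingleSource({ file, url }) {
  if ((file == null) === (url == null)) {
    throw new Error('Provide either file or url, but not both.');
  }
}

// Builds the multipart body. Sending templateId as a form field is an assumption.
function buildFileForm(file, fileName, templateId) {
  const form = new FormData();
  form.append('file', file, fileName); // file must be a Blob or File
  if (templateId != null) {
    form.append('templateId', templateId);
  }
  return form;
}

async function extractFromFile(apiKey, file, fileName, templateId) {
  assertSingleSource({ file });
  const response = await fetch('https://api.getomni.ai/extract/sync', {
    method: 'POST',
    // No explicit Content-Type: fetch sets multipart/form-data with its boundary.
    headers: { 'x-api-key': apiKey },
    body: buildFileForm(file, fileName, templateId),
  });
  if (!response.ok) {
    throw new Error(`Extraction failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

Leaving the Content-Type header unset is deliberate: setting it by hand would omit the boundary that the multipart body needs.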
Example JSON Schema
This is a JSON Schema that defines the structure and validation rules for the extracted data. For more examples and details, see JSON Schema Examples.
{
  "type": "object",
  "properties": {
    "bill_to": {
      "type": "string",
      "description": "The name of person who receives the invoice"
    },
    "ship_to": {
      "type": "string",
      "description": "The location of the person who receives the invoice"
    },
    "balance_due": {
      "type": "number",
      "description": "The total balance due"
    }
  }
}
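A schema like the one above can be passed in the request through the schema body parameter. This is a minimal sketch that assumes the schema is sent as a nested JSON object in a JSON body; `buildSchemaRequest` is a hypothetical helper name.

```javascript
// The example invoice schema, expressed as a JavaScript object.
const invoiceSchema = {
  type: 'object',
  properties: {
    bill_to: {
      type: 'string',
      description: 'The name of person who receives the invoice',
    },
    ship_to: {
      type: 'string',
      description: 'The location of the person who receives the invoice',
    },
    balance_due: {
      type: 'number',
      description: 'The total balance due',
    },
  },
};

// Builds fetch options that send the document URL together with the schema.
function buildSchemaRequest(apiKey, fileUrl, schema) {
  return {
    method: 'POST',
    headers: {
      'x-api-key': apiKey,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ url: fileUrl, schema }),
  };
}
```

The returned options object can be handed to `fetch` together with the endpoint URL `https://api.getomni.ai/extract/sync`.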
Response
Unique identifier for the job
Response object containing OCR results and extracted data
OCR results from document processing
Array of processed pages
Raw text content extracted from the page
Length of the extracted content
Name of the processed file
Number of input tokens processed for OCR
Number of output tokens generated for OCR
Processing time in milliseconds
Structured data extracted according to the provided schema
Total number of input tokens used in the extraction
Total number of output tokens generated for the extraction
Confidence intervals for OCR and extracted values
Confidence intervals per OCR page
Confidence interval for the page
Confidence intervals for extracted structured data
const options = {
  method: 'POST',
  headers: {
    'x-api-key': '<your-api-key>',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    url: '<file-url>',
    templateId: '<template-id>',
  }),
};
fetch('https://api.getomni.ai/extract/sync', options)
  .then((response) => response.json())
  .then((response) => console.log(response))
  .catch((err) => console.error(err));
{
  "jobId": "550e8400-e29b-41d4-a716-446655440000",
  "result": {
    "ocr": {
      "pages": [
        {
          "page": 1,
          "content": "# Invoice ...",
          "contentLength": 698
        }
      ],
      "fileName": "7faf9e7fd6cb4b3dbb4accca979023bb",
      "inputTokens": 931,
      "outputTokens": 220,
      "completionTime": 8593
    },
    "extracted": {
      "file_type": "invoice"
    },
    "inputTokens": 292,
    "outputTokens": 7
  }
}
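A response shaped like the example above can be reduced to the fields a caller usually needs. This sketch is illustrative only: `summarizeResult` is a hypothetical helper, and it treats the ocr object as optional on the assumption that it is absent when OCR results are excluded from the response.

```javascript
// Summarizes a sync extraction response into a flat object.
function summarizeResult(response) {
  const { ocr, extracted, inputTokens, outputTokens } = response.result;
  // ocr may be missing if OCR output was excluded (assumption).
  const pages = ocr?.pages ?? [];
  return {
    jobId: response.jobId,
    pageCount: pages.length,
    // Join per-page text in page order.
    text: pages.map((p) => p.content).join('\n\n'),
    ocrTokens: ocr ? ocr.inputTokens + ocr.outputTokens : 0,
    extractionTokens: inputTokens + outputTokens,
    extracted,
  };
}

// Sample data mirroring the example response.
const sampleResponse = {
  jobId: '550e8400-e29b-41d4-a716-446655440000',
  result: {
    ocr: {
      pages: [{ page: 1, content: '# Invoice ...', contentLength: 698 }],
      fileName: '7faf9e7fd6cb4b3dbb4accca979023bb',
      inputTokens: 931,
      outputTokens: 220,
      completionTime: 8593,
    },
    extracted: { file_type: 'invoice' },
    inputTokens: 292,
    outputTokens: 7,
  },
};

const summary = summarizeResult(sampleResponse);
console.log(summary.extracted.file_type, summary.pageCount);
```

Keeping OCR token counts separate from extraction token counts mirrors the response layout, which reports them at different levels.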