Overview

The Jobs API handles document conversion requests, allowing you to upload files, track conversion progress, and download results.

Create Job

POST /api/v1/jobs

Request

Content-Type: multipart/form-data
file
file
required
The document file to convert. Must not exceed the server's file size limit (100MB by default; see File Size Limits).
output_format
string
required
Output format for the converted document. Options:
  • md - Markdown format
  • json - Structured JSON format
webhook_url
string
URL to receive job completion notifications. Format: must be a valid HTTP/HTTPS URL.
use_ocr
boolean
default:"false"
Enable OCR processing for images and scanned documents.
ocr_provider
string
default:"paddle"
OCR provider to use when use_ocr is enabled. Options:
  • paddle - PaddleOCR (recommended)
  • easyocr - EasyOCR
  • mistral - Mistral AI (requires API key)
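The request parameters above can be checked on the client before the upload starts. The sketch below assumes Python with the `requests` library (as used in the polling example later on this page); `build_job_form` and `create_job` are hypothetical helper names, and sending booleans as the strings "true"/"false" is an assumption about how the server parses form fields.

```python
from urllib.parse import urlparse

# Allowed values copied from the parameter list above.
OUTPUT_FORMATS = {"md", "json"}
OCR_PROVIDERS = {"paddle", "easyocr", "mistral"}


def build_job_form(output_format, webhook_url=None, use_ocr=False, ocr_provider="paddle"):
    """Validate Create Job parameters client-side and return the form fields.

    The document itself is sent separately as the multipart "file" part.
    """
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"output_format must be one of {sorted(OUTPUT_FORMATS)}")
    if ocr_provider not in OCR_PROVIDERS:
        raise ValueError(f"ocr_provider must be one of {sorted(OCR_PROVIDERS)}")

    fields = {"output_format": output_format}
    if webhook_url is not None:
        parsed = urlparse(webhook_url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError("webhook_url must be an absolute HTTP/HTTPS URL")
        fields["webhook_url"] = webhook_url
    if use_ocr:
        # Assumption: booleans are sent as the strings "true"/"false".
        fields["use_ocr"] = "true"
        fields["ocr_provider"] = ocr_provider
    return fields


def create_job(path, base_url="http://localhost:8000", **options):
    """POST a file to /api/v1/jobs and return the parsed job JSON (hypothetical helper)."""
    import requests  # imported here so build_job_form works without requests installed

    data = build_job_form(**options)
    with open(path, "rb") as fh:
        response = requests.post(
            f"{base_url}/api/v1/jobs",
            files={"file": fh},
            data=data,
            timeout=60,
        )
    response.raise_for_status()
    return response.json()
```

For example, `create_job("document.pdf", output_format="md", webhook_url="https://your-site.com/webhook")` mirrors the curl request shown below while rejecting invalid options before any bytes are uploaded.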

Response

id
string
required
Unique job identifier
status
string
required
Current job status. Values: pending, processing, completed, failed
progress
integer
required
Job progress percentage (0-100)
file_name
string
required
Original filename
output_format
string
required
Requested output format
use_ocr
boolean
required
Whether OCR is enabled
ocr_provider
string
OCR provider used (when OCR is enabled)
created_at
string
required
Job creation timestamp (ISO 8601)
updated_at
string
required
Last update timestamp (ISO 8601)
webhook_url
string
Webhook URL for notifications
curl -X POST "http://localhost:8000/api/v1/jobs" \
  -F "file=@document.pdf" \
  -F "output_format=md" \
  -F "webhook_url=https://your-site.com/webhook"

Response Example

{
  "id": "123e4567-e89b-12d3-a456-426614174000",
  "status": "pending",
  "progress": 0,
  "file_name": "document.pdf",
  "output_format": "md",
  "use_ocr": false,
  "ocr_provider": "paddle",
  "created_at": "2024-01-15T10:00:00Z",
  "updated_at": "2024-01-15T10:00:00Z",
  "webhook_url": "https://your-site.com/webhook"
}
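A typed wrapper can make the Create Job response easier to work with. This is a client-side sketch, not part of the API: the `Job` class and `parse_timestamp` helper are hypothetical, and only the fields documented above are modeled.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp such as "2024-01-15T10:00:00Z"."""
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11 on,
    # so normalise it to an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


@dataclass
class Job:
    id: str
    status: str
    progress: int
    file_name: str
    output_format: str
    use_ocr: bool
    created_at: datetime
    updated_at: datetime
    ocr_provider: Optional[str] = None  # optional in the response
    webhook_url: Optional[str] = None   # only present if one was supplied

    @classmethod
    def from_json(cls, data):
        """Build a Job from the decoded Create Job response body."""
        return cls(
            id=data["id"],
            status=data["status"],
            progress=int(data["progress"]),
            file_name=data["file_name"],
            output_format=data["output_format"],
            use_ocr=bool(data["use_ocr"]),
            created_at=parse_timestamp(data["created_at"]),
            updated_at=parse_timestamp(data["updated_at"]),
            ocr_provider=data.get("ocr_provider"),
            webhook_url=data.get("webhook_url"),
        )
```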

Get Job Status

GET /api/v1/jobs/{job_id}

Request

job_id
string
required
The unique identifier of the job

Response

id
string
required
Job identifier
status
string
required
Current job status. Values:
  • pending - Job queued for processing
  • processing - Currently being converted
  • completed - Conversion finished successfully
  • failed - Conversion failed
progress
integer
required
Completion percentage (0-100)
file_name
string
required
Original filename
output_format
string
required
Output format
created_at
string
required
Job creation timestamp
updated_at
string
required
Last update timestamp
completed_at
string
Completion timestamp (only when status is completed or failed)
error_message
string
Error details (only when status is failed)
metadata
object
Additional job metadata
curl "http://localhost:8000/api/v1/jobs/123e4567-e89b-12d3-a456-426614174000"

Response Examples

{
  "id": "123e4567-e89b-12d3-a456-426614174000",
  "status": "pending",
  "progress": 0,
  "file_name": "document.pdf",
  "output_format": "md",
  "use_ocr": false,
  "created_at": "2024-01-15T10:00:00Z",
  "updated_at": "2024-01-15T10:00:00Z"
}
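Because `completed_at` and `error_message` appear only in some states, client code should read them conditionally. A minimal sketch, assuming the URL layout shown in the curl examples; `job_url` and `describe_job` are hypothetical helper names.

```python
from urllib.parse import quote


def job_url(base_url, job_id, result=False):
    """Build the status URL, or the result URL when result=True."""
    url = f"{base_url.rstrip('/')}/api/v1/jobs/{quote(job_id, safe='')}"
    return url + "/result" if result else url


def describe_job(job):
    """One-line summary that reads optional fields only in the states that carry them."""
    status = job["status"]
    if status == "failed":
        # error_message is documented only for failed jobs.
        return f"failed: {job.get('error_message') or 'no error message'}"
    if status == "completed":
        # completed_at is documented only for finished jobs.
        return f"completed at {job.get('completed_at', 'unknown time')}"
    return f"{status} ({job['progress']}%)"
```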

Download Result

GET /api/v1/jobs/{job_id}/result

Request

job_id
string
required
The unique identifier of the completed job

Response

The response is the converted document file with appropriate headers:
Content-Type
header
MIME type based on output format. Values:
  • text/markdown; charset=utf-8 for Markdown
  • application/json; charset=utf-8 for JSON
Content-Disposition
header
Filename for download. Format: attachment; filename="document.md"
curl "http://localhost:8000/api/v1/jobs/123e4567-e89b-12d3-a456-426614174000/result" \
  -o converted_document.md
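Instead of hard-coding the output name as in the curl example, a client can take it from the Content-Disposition header. A sketch assuming `requests` for the HTTP call; `filename_from_disposition` and `download_result` are hypothetical names, and the header is parsed with the standard library's `email.message.Message.get_filename`.

```python
import os
from email.message import Message

# Fallback extensions for the two documented output formats.
EXTENSIONS = {"md": ".md", "json": ".json"}


def filename_from_disposition(header, fallback):
    """Extract filename="..." from a Content-Disposition header, else return fallback."""
    if not header:
        return fallback
    msg = Message()
    msg["Content-Disposition"] = header
    return msg.get_filename(fallback)


def download_result(url, output_format, dest_dir="."):
    """Stream a completed job's result to disk and return the saved path."""
    import requests  # lazy import: the header helper above needs only the stdlib

    with requests.get(url, stream=True, timeout=60) as response:
        response.raise_for_status()
        fallback = "result" + EXTENSIONS.get(output_format, "")
        name = filename_from_disposition(response.headers.get("Content-Disposition"), fallback)
        # Keep only the base name so a crafted header cannot write outside dest_dir.
        path = os.path.join(dest_dir, os.path.basename(name))
        with open(path, "wb") as fh:
            for chunk in response.iter_content(chunk_size=65536):
                fh.write(chunk)
    return path
```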

Error Responses

400
error
Bad Request - Job is not completed yet
{
  "detail": "Job is not completed yet"
}
404
error
Not Found - Job doesn’t exist or result file is missing
{
  "detail": "Job not found"
}
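The documented error responses can be mapped to distinct exceptions so callers can tell "not ready yet" apart from "not found". Only the 400 and 404 cases and the `detail` field come from this page; the exception classes and `check_result_response` are hypothetical.

```python
class JobNotReady(Exception):
    """400: the job exists but has not completed yet."""


class JobNotFound(Exception):
    """404: the job does not exist or its result file is missing."""


def check_result_response(status_code, body):
    """Raise a specific exception for the documented error responses.

    `body` is the decoded JSON error object, e.g. {"detail": "Job not found"}.
    Returns None for non-error status codes.
    """
    detail = (body or {}).get("detail", "")
    if status_code == 400:
        raise JobNotReady(detail)
    if status_code == 404:
        raise JobNotFound(detail)
    if status_code >= 400:
        raise RuntimeError(f"Unexpected error {status_code}: {detail}")
```

A caller that receives `JobNotReady` can fall back to polling the job status, while `JobNotFound` usually means the job ID is wrong or the result was removed.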

Job Status States

  1. pending - Job has been created and queued for processing
  2. processing - Document is currently being converted
  3. completed - Conversion finished successfully; the result is available
  4. failed - Conversion failed; check error_message for details
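The four states can be modeled as an enum. Treating `completed` and `failed` as final follows the polling example on this page; the `JobStatus` name is illustrative.

```python
from enum import Enum


class JobStatus(str, Enum):
    """Job states; inheriting from str lets members compare equal to raw JSON values."""

    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"

    @property
    def is_terminal(self):
        # completed and failed are final; pending and processing mean "keep waiting".
        return self in (JobStatus.COMPLETED, JobStatus.FAILED)

    @property
    def result_available(self):
        # Only a completed job has a downloadable result.
        return self is JobStatus.COMPLETED
```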

Polling for Completion

import time

import requests

BASE_URL = "http://localhost:8000/api/v1"


def wait_for_completion(job_id, timeout=300, interval=2):
    """Poll a job until it completes, fails, or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout

    while time.monotonic() < deadline:
        response = requests.get(f"{BASE_URL}/jobs/{job_id}", timeout=10)
        response.raise_for_status()  # surface 404s and server errors immediately
        job = response.json()

        if job["status"] == "completed":
            return job
        if job["status"] == "failed":
            raise RuntimeError(f"Job failed: {job.get('error_message', 'Unknown error')}")

        # Still pending or processing: report progress and try again shortly.
        print(f"Progress: {job['progress']}%")
        time.sleep(interval)

    raise TimeoutError(f"Job {job_id} did not complete within {timeout} seconds")


# Usage
job_id = "123e4567-e89b-12d3-a456-426614174000"
completed_job = wait_for_completion(job_id)
print(f"Job completed: {completed_job['id']}")

Rate Limiting

Rate limiting is not implemented in the current version. For production deployments, implement appropriate rate limiting middleware.
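One common approach is a per-client token bucket. The sketch below is framework-agnostic and in-memory only (single process, not thread-safe); `TokenBucket` and `RateLimiter` are hypothetical names. Wiring `allow()` into your framework's middleware, and returning 429 Too Many Requests when it is False, depends on the framework and is not shown.

```python
import time


class TokenBucket:
    """Token bucket: refills `rate` tokens per second, holds at most `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self):
        """Consume one token and return True, or return False if the bucket is empty."""
        now = self.clock()
        # Refill in proportion to the time elapsed since the previous call.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class RateLimiter:
    """Keeps one bucket per client key (for example the caller's IP address)."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.buckets = {}

    def allow(self, key):
        bucket = self.buckets.get(key)
        if bucket is None:
            bucket = self.buckets[key] = TokenBucket(self.rate, self.capacity, self.clock)
        return bucket.allow()
```

The `capacity` sets how large a burst a client may send, while `rate` sets the sustained requests per second; the injectable `clock` makes the behavior easy to test.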

File Size Limits

Default limit: 100MB per file. Configuration: set the MAX_FILE_SIZE environment variable (in bytes).
Uploads that exceed the limit are rejected with a 413 Request Entity Too Large error.
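Clients can check the size before uploading instead of waiting for a 413. This sketch assumes "100MB" means 100 * 1024 * 1024 bytes, and reads MAX_FILE_SIZE locally only as a convenience for mirroring the server's setting; `max_file_size` and `check_file_size` are hypothetical names.

```python
import os

# Assumption: "100MB" means 100 * 1024 * 1024 bytes; confirm against your server.
DEFAULT_MAX_FILE_SIZE = 100 * 1024 * 1024


def max_file_size():
    """Use MAX_FILE_SIZE (bytes) if it is set in this environment, else the default."""
    value = os.environ.get("MAX_FILE_SIZE")
    return int(value) if value else DEFAULT_MAX_FILE_SIZE


def check_file_size(path, limit=None):
    """Raise before uploading if the file exceeds the limit; return its size otherwise."""
    limit = max_file_size() if limit is None else limit
    size = os.path.getsize(path)
    if size > limit:
        raise ValueError(f"{path} is {size} bytes; the limit is {limit} bytes")
    return size
```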

Supported File Types

Documents

PDF, DOCX, RTF, TXT, MD

Presentations

PPTX, PPTM, POTX, POTM

Spreadsheets

XLSX, XLSM, XLS, CSV

Images

PNG, JPG, GIF, BMP, WebP, TIFF

Web

HTML, XML

Data

JSON, CSV
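A client-side extension check against the lists above. The sketch matches only the spellings listed on this page (for example `.jpg` but not `.jpeg`); whether the server accepts other variants is not documented here. `SUPPORTED_TYPES`, `file_categories`, and `is_supported` are illustrative names.

```python
from pathlib import Path

# Extensions as listed above; CSV appears under both Spreadsheets and Data.
SUPPORTED_TYPES = {
    "Documents": {"pdf", "docx", "rtf", "txt", "md"},
    "Presentations": {"pptx", "pptm", "potx", "potm"},
    "Spreadsheets": {"xlsx", "xlsm", "xls", "csv"},
    "Images": {"png", "jpg", "gif", "bmp", "webp", "tiff"},
    "Web": {"html", "xml"},
    "Data": {"json", "csv"},
}


def file_categories(filename):
    """Return the sorted categories whose extension list contains this file's suffix."""
    ext = Path(filename).suffix.lower().lstrip(".")
    return sorted(name for name, exts in SUPPORTED_TYPES.items() if ext in exts)


def is_supported(filename):
    """True if the file's extension appears in any supported category."""
    return bool(file_categories(filename))
```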

Next Steps

Health API

Check API health and readiness

Error Handling

Handle API errors properly

Webhooks

Set up real-time notifications

Examples

See complete integration examples