Overview

The Document Converter API provides RESTful endpoints for uploading, converting, and retrieving documents. All endpoints return JSON responses and use standard HTTP status codes.
Base URL: http://localhost:8000/api/v1
Content-Type: application/json for responses, multipart/form-data for file uploads

Authentication

The current version does not require authentication. For production deployments, implement authentication middleware.

Jobs API

Create Job

curl -X POST "http://localhost:8000/api/v1/jobs" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@document.pdf" \
  -F "output_format=md" \
  -F "webhook_url=https://your-site.com/webhook"
Request Parameters:
  • file (file, required): The document file to convert. See supported formats.
  • output_format (string, required): Output format, either md for Markdown or json for structured JSON.
  • webhook_url (string, optional): URL to receive job completion notifications.
  • use_ocr (boolean, default: false): Enable OCR processing for images and scanned documents.
  • ocr_provider (string, default: paddle): OCR provider, one of paddle, easyocr, or mistral.
Response:
{
  "id": "123e4567-e89b-12d3-a456-426614174000",
  "status": "pending",
  "progress": 0,
  "file_name": "document.pdf",
  "output_format": "md",
  "use_ocr": false,
  "created_at": "2024-01-15T10:00:00Z",
  "updated_at": "2024-01-15T10:00:00Z"
}
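
For clients that are not shell scripts, the same request can be assembled in code. The following is a minimal sketch using only the Python standard library; the function name build_create_job_request is hypothetical, and sending booleans as the strings "true" and "false" is an assumption about how the server parses form fields. It builds the request without sending it, so the parameter validation can be checked offline.

```python
import uuid
from urllib.request import Request

BASE_URL = "http://localhost:8000/api/v1"
OUTPUT_FORMATS = {"md", "json"}
OCR_PROVIDERS = {"paddle", "easyocr", "mistral"}


def build_create_job_request(file_name, file_bytes, output_format,
                             webhook_url=None, use_ocr=False,
                             ocr_provider="paddle"):
    """Build, but do not send, the multipart POST for /jobs."""
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"output_format must be one of {sorted(OUTPUT_FORMATS)}")
    if ocr_provider not in OCR_PROVIDERS:
        raise ValueError(f"ocr_provider must be one of {sorted(OCR_PROVIDERS)}")

    # Assumption: booleans are sent as the strings "true" / "false".
    fields = {
        "output_format": output_format,
        "use_ocr": "true" if use_ocr else "false",
        "ocr_provider": ocr_provider,
    }
    if webhook_url:
        fields["webhook_url"] = webhook_url

    # Encode the text fields, then the file part, then the closing boundary.
    boundary = uuid.uuid4().hex
    body = b""
    for name, value in fields.items():
        body += (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode()
    body += (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode() + file_bytes + b"\r\n"
    body += f"--{boundary}--\r\n".encode()

    return Request(
        f"{BASE_URL}/jobs",
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

Passing the returned object to urllib.request.urlopen would submit the job; the JSON body of the response then carries the id used by the other endpoints.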

Get Job Status

curl "http://localhost:8000/api/v1/jobs/123e4567-e89b-12d3-a456-426614174000"
Response:
{
  "id": "123e4567-e89b-12d3-a456-426614174000",
  "status": "completed",
  "progress": 100,
  "file_name": "document.pdf",
  "output_format": "md",
  "use_ocr": false,
  "created_at": "2024-01-15T10:00:00Z",
  "updated_at": "2024-01-15T10:05:00Z",
  "completed_at": "2024-01-15T10:05:00Z"
}
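
Because conversion is asynchronous, a client without a webhook typically polls this endpoint until the job reaches a final state. The sketch below shows one way to do that in Python. The helper names fetch_job and wait_for_job are hypothetical, and "failed" is an assumed status name: this page only shows pending, processing, and completed.

```python
import json
import time
from urllib.request import urlopen

BASE_URL = "http://localhost:8000/api/v1"
# "completed" appears in the docs; "failed" is an assumed name for the error state.
TERMINAL_STATUSES = {"completed", "failed"}


def fetch_job(job_id):
    """GET /jobs/{job_id} and decode the JSON body."""
    with urlopen(f"{BASE_URL}/jobs/{job_id}") as resp:
        return json.load(resp)


def wait_for_job(job_id, fetch=fetch_job, interval=2.0, timeout=300.0,
                 sleep=time.sleep):
    """Poll until the job reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job['status']!r} after {timeout}s")
        sleep(interval)
```

The fetch and sleep parameters are injectable so the loop can be exercised without a running server.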

Download Result

curl "http://localhost:8000/api/v1/jobs/123e4567-e89b-12d3-a456-426614174000/result" \
  -o converted_document.md

Health API

Health Check

curl "http://localhost:8000/api/v1/health"
Response:
{
  "status": "healthy",
  "timestamp": "2024-01-15T10:30:00Z",
  "version": "1.0.0"
}

Readiness Check

curl "http://localhost:8000/api/v1/ready"
Response:
{
  "status": "ready",
  "timestamp": "2024-01-15T10:30:00Z",
  "services": {
    "redis": true,
    "celery": true
  }
}
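
A deployment script may want to know which dependency is holding readiness back. As a small sketch, assuming the services object always maps a service name to a boolean as in the response above, the hypothetical helpers below summarize a readiness payload.

```python
def unready_services(payload):
    """Return the names of services reported as not available, sorted."""
    services = payload.get("services", {})
    return sorted(name for name, ok in services.items() if not ok)


def is_ready(payload):
    """True only if the status is "ready" and every listed service is up."""
    return payload.get("status") == "ready" and not unready_services(payload)
```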

File Download Behavior

When downloading converted files via the /api/v1/jobs/{job_id}/result endpoint, the filename is automatically generated based on the original filename and output format.

Filename Generation Rules

1. Original extension is removed: the file extension from the uploaded file is stripped.
2. New extension is added: the appropriate extension for the output format is appended.
3. Base name is preserved: the original filename (without extension) is kept intact.

Examples

| Original Filename | Output Format | Download Filename |
| --- | --- | --- |
| document.pdf | md | document.md |
| presentation.pptx | md | presentation.md |
| spreadsheet.xlsx | json | spreadsheet.json |
| My Report (v2).docx | md | My Report (v2).md |
| data-2024.csv | json | data-2024.json |
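
The naming rules can be mirrored on the client to predict the download name. This is a sketch of the rules as described here, not the server's actual code; download_filename is a hypothetical helper, and it assumes only the last extension is stripped (so a name like archive.tar.gz would become archive.tar.md).

```python
from pathlib import PurePath

# Output format -> extension, as listed under Supported Output Formats.
EXTENSIONS = {"md": ".md", "json": ".json"}


def download_filename(original_name, output_format):
    """Strip the original extension and append the output format's extension."""
    return PurePath(original_name).stem + EXTENSIONS[output_format]
```

Applied to the rows above, download_filename("My Report (v2).docx", "md") yields "My Report (v2).md".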

Supported Output Formats

Markdown (.md)

Human-readable Markdown with images embedded as base64

JSON (.json)

Structured data with content, metadata, and base64-encoded images

Response Headers

The download response includes appropriate headers:
  • Content-Type:
    • text/markdown for .md files
    • application/json for .json files
  • Content-Disposition: attachment; filename="generated_filename"
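
A client that wants to save the result under the server-generated name can read it from the Content-Disposition header. The sketch below uses Python's standard email.message parser for this; the helper name is hypothetical, and it assumes the header has the plain quoted filename="..." form shown above.

```python
from email.message import Message


def filename_from_content_disposition(header_value):
    """Extract the filename parameter from a Content-Disposition value, or None."""
    msg = Message()
    msg["Content-Disposition"] = header_value
    return msg.get_filename()
```

With urllib, the header value would typically come from resp.headers.get("Content-Disposition") on the download response.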

Error Responses

400 Bad Request: The request was invalid or cannot be served.
{
  "detail": "Job is not completed yet"
}
404 Not Found: The requested resource was not found.
{
  "detail": "Job not found"
}
413 Payload Too Large: The uploaded file exceeds the maximum size limit.
{
  "detail": "File too large. Maximum size is 100MB"
}
422 Unprocessable Entity: The request was well-formed but contains semantic errors.
{
  "detail": [
    {
      "loc": ["body", "output_format"],
      "msg": "Invalid output format",
      "type": "value_error"
    }
  ]
}
500 Internal Server Error: An unexpected error occurred on the server.
{
  "detail": "Internal server error"
}
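
Note that detail is a plain string in most error bodies but a list of objects for validation errors. A client can normalize both shapes into one message; the following is a minimal sketch with a hypothetical helper name, based only on the two shapes shown above.

```python
def error_message(payload):
    """Flatten an error body's "detail" field into a single readable string."""
    detail = payload.get("detail")
    if isinstance(detail, str):
        return detail
    if isinstance(detail, list):
        # Validation errors: join the location path and message of each item.
        parts = []
        for item in detail:
            loc = ".".join(str(p) for p in item.get("loc", []))
            parts.append(f"{loc}: {item.get('msg', '')}")
        return "; ".join(parts)
    return "Unknown error"
```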

Rate Limiting

Rate limiting is not implemented in the current version. For production deployments, implement rate limiting middleware.

Webhook Notifications

When a job finishes, whether it succeeds or fails, a webhook notification is sent to the webhook_url supplied when the job was created:
{
  "job_id": "123e4567-e89b-12d3-a456-426614174000",
  "status": "completed",
  "progress": 100,
  "created_at": "2024-01-15T10:00:00Z",
  "updated_at": "2024-01-15T10:05:00Z",
  "completed_at": "2024-01-15T10:05:00Z",
  "result_url": "http://localhost:8000/api/v1/jobs/123e4567-e89b-12d3-a456-426614174000/result",
  "metadata": {}
}
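
On the receiving side, the handler needs to decode this payload and decide whether there is a result to fetch. Note that the notification identifies the job as job_id, whereas the Jobs API responses use id. The sketch below is one possible shape for that logic; the class and function names are hypothetical, and treating any status other than "completed" as "nothing to download" is an assumption, since the failure payload is not documented here.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobNotification:
    job_id: str
    status: str
    progress: int
    result_url: Optional[str]


def parse_notification(raw_body):
    """Decode the webhook's JSON body into a JobNotification."""
    data = json.loads(raw_body)
    return JobNotification(
        job_id=data["job_id"],
        status=data["status"],
        progress=data.get("progress", 0),
        result_url=data.get("result_url"),  # may be absent for failed jobs (assumption)
    )


def should_download(notification):
    """Only completed jobs that carry a result_url have something to fetch."""
    return notification.status == "completed" and notification.result_url is not None
```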

Example Workflow

Here’s a complete example of converting a document:
1. Upload Document

curl -X POST "http://localhost:8000/api/v1/jobs" \
  -F "file=@presentation.pptx" \
  -F "output_format=md"
Response: {"id": "job_123", "status": "pending", ...}

2. Check Status

curl "http://localhost:8000/api/v1/jobs/job_123"
Response: {"status": "processing", "progress": 50, ...}
Repeat this request until status is completed.

3. Download Result

curl "http://localhost:8000/api/v1/jobs/job_123/result" -o presentation.md
The converted document is saved as presentation.md.