Prerequisites

Before getting started, ensure you have the following installed:

Python 3.12+

Required for running the application

Redis Server

Used for task queuing and job storage

Docker (Optional)

For containerized deployment

uv Package Manager

Recommended for fast dependency management

Option 1: Docker Compose

The fastest way to get started is using Docker Compose:

1. Clone the repository

git clone https://github.com/pratyush618/doc_loader.git
cd doc_loader

2. Configure environment

cp .env.example .env
Edit the .env file with your configuration:
# Redis Configuration
REDIS_URL=redis://redis:6379/0

# OCR Configuration (optional)
MISTRAL_API_KEY=your_mistral_api_key_here
EASY_OCR_USE_GPU=false

# File Storage
MAX_FILE_SIZE=104857600  # 100MB
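
MAX_FILE_SIZE is given in bytes, and the 100MB in the comment corresponds to 100 × 1024 × 1024. The following is a minimal sketch of that conversion for picking a different limit; the helper name mb_to_bytes is illustrative and not part of the project.

```python
def mb_to_bytes(megabytes: int) -> int:
    """Convert a size in MB (1 MB = 1024 * 1024 bytes) to bytes."""
    return megabytes * 1024 * 1024


# The value used in the .env example corresponds to 100 MB.
print(mb_to_bytes(100))  # 104857600
```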

3. Start the services

docker-compose up -d
This will start:
  • API server on port 8000
  • Redis server
  • Celery worker for processing

4. Verify the setup

curl http://localhost:8000/api/v1/health
You should see:
{
  "status": "healthy",
  "timestamp": "2024-01-15T10:30:00Z",
  "version": "1.0.0"
}
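
The containers can take a few seconds to accept connections after they start, so a script may want to retry the health check rather than call it once. Below is a rough sketch using only the standard library. The function names, retry count, and delay are assumptions for illustration; only the URL and the "healthy" status value come from the example above.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable

HEALTH_URL = "http://localhost:8000/api/v1/health"


def fetch_health(url: str = HEALTH_URL, timeout: float = 5.0) -> dict:
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def wait_until_healthy(
    fetch: Callable[[], dict] = fetch_health,
    attempts: int = 10,
    delay: float = 2.0,
) -> bool:
    """Poll until the API reports status "healthy" or attempts run out."""
    for attempt in range(attempts):
        try:
            if fetch().get("status") == "healthy":
                return True
        except (urllib.error.URLError, OSError, ValueError):
            pass  # Server not accepting connections yet, or body is not JSON.
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling wait_until_healthy() with no arguments polls the local API; passing a different fetch callable makes the loop easy to exercise without a running server.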

Option 2: Development Setup with uv

For local development, we recommend using uv for fast package management:

1. Install uv

# Windows (for macOS and Linux, see the uv installation guide)
winget install --id=astral-sh.uv -e

2. Set up the project

git clone https://github.com/pratyush618/doc_loader.git
cd doc_loader
uv sync  # Creates .venv and installs dependencies

3. Start Redis

# macOS
brew services start redis

# Linux
sudo systemctl start redis

# Windows
# Download and install Redis from https://redis.io/download

4. Start the API server

uv run python run_api.py
The API will be available at http://localhost:8000

5. Start the worker (new terminal)

uv run python run_worker.py
This starts the Celery worker for processing jobs

Option 3: Manual Installation

If you prefer not to use uv:

1. Install dependencies

pip install -r requirements.txt

2. Set up environment

cp .env.example .env
# Edit .env file as needed

3. Start Redis

redis-server

4. Start the services

# Terminal 1: API Server
python run_api.py

# Terminal 2: Worker
python run_worker.py

Your First Conversion

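Once everything is running, convert your first document. The upload field name (file) and the filename document.pdf below are placeholders; check the API Reference for the exact form fields:
curl -X POST "http://localhost:8000/api/v1/jobs" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@document.pdf" \
  -F "output_format=md"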
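
The same upload can be made from Python. This sketch builds the multipart/form-data body by hand with the standard library so that it has no extra dependencies. The output_format field comes from the curl command above; the upload field name file and the helper names are assumptions, so adjust them to match the API Reference.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path

JOBS_URL = "http://localhost:8000/api/v1/jobs"


def build_multipart(fields: dict[str, str], file_field: str, path: Path) -> tuple[bytes, str]:
    """Encode form fields plus one file as multipart/form-data.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    parts: list[bytes] = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{path.name}"\r\nContent-Type: {file_type}\r\n\r\n'.encode()
    )
    parts.append(path.read_bytes())
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def submit_job(path: Path, output_format: str = "md", file_field: str = "file") -> bytes:
    """POST a document to the jobs endpoint and return the raw response body.

    The file_field default is an assumption, not a confirmed field name.
    """
    body, content_type = build_multipart({"output_format": output_format}, file_field, path)
    request = urllib.request.Request(
        JOBS_URL, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()
```

For example, submit_job(Path("document.pdf")) sends the file with output_format=md and returns whatever the server responds with, undecoded.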

API Endpoints

Submit Job

POST /api/v1/jobs
Upload a file for conversion

Get Job Status

GET /api/v1/jobs/{job_id}
Check conversion progress

Download Result

GET /api/v1/jobs/{job_id}/result
Get converted document

Health Check

GET /api/v1/health
Check API health status
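
These endpoints suggest a submit, poll, download workflow. The sketch below covers the polling step. The paths come from the list above, but the response shape is an assumption: it supposes the status payload has a "status" field that ends in "completed" or "failed", which this page does not confirm. The function names are illustrative.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:8000/api/v1"

# Assumption: the status payload has a "status" field with these terminal values.
TERMINAL_STATES = {"completed", "failed"}


def status_url(job_id: str) -> str:
    """URL for GET /api/v1/jobs/{job_id}."""
    return f"{BASE_URL}/jobs/{job_id}"


def result_url(job_id: str) -> str:
    """URL for GET /api/v1/jobs/{job_id}/result."""
    return f"{BASE_URL}/jobs/{job_id}/result"


def get_json(url: str) -> dict:
    """Fetch a URL and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def wait_for_job(
    job_id: str,
    fetch: Callable[[str], dict] = get_json,
    interval: float = 2.0,
    max_polls: int = 150,
) -> dict:
    """Poll the status endpoint until the job reaches a terminal state."""
    for _ in range(max_polls):
        job = fetch(status_url(job_id))
        if job.get("status") in TERMINAL_STATES:
            return job
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} still running after {max_polls} polls")
```

Once wait_for_job returns a finished job, the converted document can be fetched from result_url(job_id). A bounded poll count keeps a stuck job from blocking the caller indefinitely.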

Configuration Options

Key environment variables you can configure:
# Application
APP_NAME=doc-converter
DEBUG=false

# API
API_HOST=0.0.0.0
API_PORT=8000

# Redis
REDIS_URL=redis://localhost:6379/0

# Storage paths
UPLOAD_DIR=./uploads
OUTPUT_DIR=./outputs
MAX_FILE_SIZE=104857600  # 100MB
FILE_TTL_HOURS=24

# OCR Providers
MISTRAL_API_KEY=your_api_key_here
EASY_OCR_USE_GPU=false
EASY_OCR_LANG=en
PADDLE_OCR_USE_GPU=false

# Webhooks
DEFAULT_WEBHOOK_URL=https://your-site.com/webhook
WEBHOOK_TIMEOUT=30
WEBHOOK_MAX_RETRIES=3
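
Environment variables arrive as strings, so numeric and boolean settings need to be converted before use. The sketch below shows one way to read a few of the variables above into typed values. How the application itself parses them is not shown on this page, so the Settings class, the parse_bool helper, and the use of the listed values as defaults are all assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def parse_bool(value: str) -> bool:
    """Interpret common truthy spellings; anything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    api_host: str
    api_port: int
    redis_url: str
    max_file_size: int  # bytes
    file_ttl_hours: int
    debug: bool

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Settings":
        """Build settings from a mapping, falling back to the listed values."""
        return cls(
            api_host=env.get("API_HOST", "0.0.0.0"),
            api_port=int(env.get("API_PORT", "8000")),
            redis_url=env.get("REDIS_URL", "redis://localhost:6379/0"),
            max_file_size=int(env.get("MAX_FILE_SIZE", "104857600")),
            file_ttl_hours=int(env.get("FILE_TTL_HOURS", "24")),
            debug=parse_bool(env.get("DEBUG", "false")),
        )
```

Settings.from_env() reads the current process environment, while passing a plain dict makes it possible to check a configuration without exporting anything.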

Testing Your Setup

Run the readiness check to confirm that the API can reach Redis and Celery:
curl http://localhost:8000/api/v1/ready
Expected response:
{
  "status": "ready",
  "timestamp": "2024-01-15T10:30:00Z",
  "services": {
    "redis": true,
    "celery": true
  }
}
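
When the readiness check fails, the services map indicates which dependency is down. A small sketch of picking out the failing entries from a response shaped like the one above follows; the function name is illustrative.

```python
import json


def unavailable_services(ready_body: str) -> list[str]:
    """Return the names of services the readiness payload reports as false."""
    payload = json.loads(ready_body)
    services = payload.get("services", {})
    return sorted(name for name, ok in services.items() if not ok)


sample = '{"status": "not_ready", "services": {"redis": true, "celery": false}}'
print(unavailable_services(sample))  # ['celery']
```

The "not_ready" status in the sample is made up for the example; only the services map is inspected.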
You’re all set! Your Document Converter is ready to process files. Check out the API Reference for detailed endpoint documentation.

Next Steps

Explore Features

Learn about supported formats and conversion options

Configure OCR

Set up OCR providers for image and PDF processing

Set Up Webhooks

Configure real-time notifications for job completion

Production Deployment

Deploy to production with scaling and monitoring

Troubleshooting

If you see Redis connection errors:
  1. Ensure Redis is running: redis-cli ping
  2. Check the Redis URL in your .env file
  3. Verify firewall settings if using a remote Redis instance
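
The first two checks can be combined in a script: read the host and port from REDIS_URL, then send the same PING that redis-cli ping sends. The sketch below uses a raw socket and the Redis inline command form, so it needs no client library. It assumes a server without authentication or TLS, and the function names are illustrative.

```python
import socket
from urllib.parse import urlsplit


def redis_address(redis_url: str) -> tuple[str, int]:
    """Extract (host, port) from a redis:// URL, defaulting to port 6379."""
    parts = urlsplit(redis_url)
    return parts.hostname or "localhost", parts.port or 6379


def redis_ping(redis_url: str, timeout: float = 3.0) -> bool:
    """Send an inline PING and report whether the server answered +PONG."""
    host, port = redis_address(redis_url)
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")
            return conn.recv(64).startswith(b"+PONG")
    except OSError:
        return False  # Refused, timed out, or host not resolvable.
```

A False result from redis_ping("redis://localhost:6379/0") while redis-cli ping succeeds usually points at a mismatch between the URL in .env and where Redis is actually listening.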
If OCR providers fail to initialize:
  1. For EasyOCR: Ensure PyTorch is properly installed
  2. For PaddleOCR: Check model downloads in ./models/
  3. For Mistral: Verify your API key in .env
If file uploads fail:
  1. Check file size limits in configuration
  2. Verify upload directory permissions
  3. Ensure the file format is supported
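
The size and permission checks can be run before an upload when the client and server share a machine. This is a minimal sketch: the limit mirrors the MAX_FILE_SIZE value from the configuration, the function name is illustrative, and the format check is left out because the list of supported formats is not given on this page.

```python
import os
from pathlib import Path

MAX_FILE_SIZE = 104857600  # bytes; mirrors MAX_FILE_SIZE in the configuration


def upload_problems(path: Path, upload_dir: Path, max_size: int = MAX_FILE_SIZE) -> list[str]:
    """Return human-readable reasons an upload is likely to fail (empty if none)."""
    problems: list[str] = []
    if not path.is_file():
        problems.append(f"{path} does not exist or is not a regular file")
    elif path.stat().st_size > max_size:
        problems.append(f"{path.name} is {path.stat().st_size} bytes; limit is {max_size}")
    if not upload_dir.is_dir():
        problems.append(f"upload directory {upload_dir} does not exist")
    elif not os.access(upload_dir, os.W_OK):
        problems.append(f"upload directory {upload_dir} is not writable")
    return problems
```

For example, upload_problems(Path("document.pdf"), Path("./uploads")) returns an empty list when the file is within the limit and the directory is writable by the current user.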