The Document Converter supports a wide range of input formats and provides flexible output options to meet your needs.
Supported Input Formats
Documents
- PDF: Portable Document Format (.pdf)
- Word: Microsoft Word (.docx)
- RTF: Rich Text Format (.rtf)
- Text: Plain text (.txt), Markdown (.md), Log files
Presentations
- PowerPoint: .pptx, .pptm, .potx, .potm
- OpenDocument: .odp (planned)
- Google Slides: Via export (planned)
Spreadsheets
- Excel: .xlsx, .xlsm, .xls
- CSV: Comma-separated values (.csv)
- OpenDocument: .ods (planned)
Images
- Raster: PNG, JPEG, GIF, BMP, WebP, ICO, TIFF
- Vector: SVG (planned)
- Note: extracting text from images requires OCR
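As a rough illustration of the lists above, a client could check a file's extension before submitting it. This is a minimal sketch, not the converter's own detection logic: the function name `classify_input` is hypothetical, and the `.log`, `.jpg`, and `.tif` spellings are assumptions (the lists above only name "Log files", "JPEG", and "TIFF"). Planned formats are deliberately left out.

```python
from pathlib import Path
from typing import Optional

# Extensions taken from the lists above. Planned formats (.odp, .ods, .svg)
# are omitted. ".log", ".jpg", and ".tif" are assumed spellings.
SUPPORTED_EXTENSIONS = {
    "document": {".pdf", ".docx", ".rtf", ".txt", ".md", ".log"},
    "presentation": {".pptx", ".pptm", ".potx", ".potm"},
    "spreadsheet": {".xlsx", ".xlsm", ".xls", ".csv"},
    "image": {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".webp", ".ico", ".tif", ".tiff"},
}


def classify_input(filename: str) -> Optional[str]:
    """Return the input category for a filename, or None if it is not listed."""
    suffix = Path(filename).suffix.lower()
    for category, extensions in SUPPORTED_EXTENSIONS.items():
        if suffix in extensions:
            return category
    return None
```

A `None` result corresponds to a file that would likely need converting to a supported format first.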
Output Formats
- Markdown
- JSON
Markdown
Human-readable format with embedded images.
Features:
- Preserves document structure with headers
- Maintains formatting (bold, italic, lists)
- Embeds images as base64 data URLs
- Compatible with documentation systems
- Easy to read and edit
Use Cases:
- Documentation generation
- Content management systems
- Static site generators
- Wiki systems
- README files
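To make "embeds images as base64 data URLs" concrete, the sketch below builds a Markdown image reference from raw image bytes using Python's standard `base64` module. The function name `image_to_markdown` is hypothetical, and the converter's actual output may differ in details such as alt text.

```python
import base64


def image_to_markdown(image_bytes: bytes, mime_type: str, alt_text: str) -> str:
    """Build a Markdown image whose source is an inline base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"![{alt_text}](data:{mime_type};base64,{encoded})"
```

Because base64 encoding grows the payload by roughly one third, Markdown output with many embedded images can be noticeably larger than the source file.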
Format-Specific Features
PDF Processing
Capabilities:
- Multi-page document extraction
- Table detection and extraction
- Image extraction with OCR
- Metadata preservation (author, creation date)
- Bookmark and outline preservation
Limitations:
- Scanned PDFs require OCR
- Complex layouts may need manual review
- Password-protected PDFs are not supported
- Some fonts may not render correctly
Word Document Processing
Capabilities:
- Heading hierarchy preservation
- Table extraction
- Image and embedded object handling
- Style and formatting preservation
- Comments and tracked changes (basic)
Supported Elements:
- Paragraphs and headings
- Lists (bulleted and numbered)
- Tables with headers
- Images and shapes
- Headers and footers
PowerPoint Processing
Capabilities:
- Slide-by-slide extraction
- Title and content separation
- Speaker notes extraction
- Image and media handling
- Slide layout preservation
Supported Elements:
- Slide titles and subtitles
- Bullet points and lists
- Text boxes and shapes
- Images and charts
- Tables and diagrams
Excel Processing
Capabilities:
- Multi-sheet workbook support
- Table and data extraction
- Formula preservation (as text)
- Chart and graph handling
- Metadata extraction
Data Handling:
- Header row detection
- Data type inference
- Empty cell handling
- Large dataset sampling
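The sketch below illustrates what data type inference and empty cell handling can look like for tabular input such as CSV. It is an assumption-laden illustration, not the converter's algorithm: the function names are hypothetical, the first row is simply treated as the header, and empty cells are ignored when deciding a column's type.

```python
from typing import Dict, List


def infer_cell_type(value: str) -> str:
    """Classify one cell as empty, boolean, integer, float, or string."""
    text = value.strip()
    if text == "":
        return "empty"
    if text.lower() in {"true", "false"}:
        return "boolean"
    try:
        int(text)
        return "integer"
    except ValueError:
        pass
    try:
        float(text)
        return "float"
    except ValueError:
        return "string"


def infer_column_types(rows: List[List[str]]) -> Dict[str, str]:
    """Infer one type per column, treating the first row as the header."""
    header, *data = rows
    types = {}
    for index, name in enumerate(header):
        # Collect the types seen in this column, skipping short rows and empty cells.
        seen = {infer_cell_type(row[index]) for row in data if index < len(row)}
        seen.discard("empty")
        if not seen:
            types[name] = "empty"
        elif len(seen) == 1:
            types[name] = seen.pop()
        elif seen == {"integer", "float"}:
            types[name] = "float"  # mixed numeric columns widen to float
        else:
            types[name] = "string"  # anything else falls back to string
    return types
```

For example, a column holding `2.5` and `3` is reported as `float`, while a column holding an empty cell and `ok` is reported as `string`.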
Image Processing
Capabilities:
- OCR text extraction
- Image format standardization
- Compression and optimization
- Metadata preservation
- Multi-language support
OCR Providers:
- PaddleOCR: High accuracy, 80+ languages
- EasyOCR: Simple setup, good performance
- Mistral AI: AI-powered, context-aware
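When more than one OCR provider is available, a caller can try them in order and keep the first non-empty result. The sketch below shows that fallback pattern only. The provider callables are placeholders supplied by the caller, and nothing here calls the real PaddleOCR, EasyOCR, or Mistral AI APIs; the name `ocr_with_fallback` is hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

# A provider is any callable that takes image bytes and returns extracted text.
OcrFunction = Callable[[bytes], str]


def ocr_with_fallback(
    image: bytes, providers: Iterable[Tuple[str, OcrFunction]]
) -> Tuple[str, str]:
    """Try each (name, function) pair in order; return (name, text) of the first success."""
    errors: List[str] = []
    for name, run_ocr in providers:
        try:
            text = run_ocr(image)
        except Exception as exc:  # a failing provider should not stop the chain
            errors.append(f"{name}: {exc}")
            continue
        if text.strip():
            return name, text
        errors.append(f"{name}: empty result")
    raise RuntimeError("OCR extraction failed (" + "; ".join(errors) + ")")
```

Ordering the list by preference (for example, the fastest provider first and the most accurate last) keeps the common case cheap while still recovering from failures.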
Conversion Quality
| Quality | Typical Inputs | Expected Result |
|---|---|---|
| High | Text documents, modern Office files | 99%+ accuracy for text extraction |
| Good | PDFs with standard fonts, simple layouts | 95%+ accuracy with proper formatting |
| Variable | Scanned documents, complex layouts, images | Depends on OCR quality and image resolution |
Best Practices
Choose the Right Format
- Use Markdown for documentation and content management
- Use JSON for data processing and API integration
- Consider your downstream processing needs
Optimize for OCR
- Use high-resolution images (300+ DPI)
- Ensure good contrast and lighting
- Avoid skewed or rotated text
- Choose an appropriate OCR provider
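The 300+ DPI guideline can be checked with simple arithmetic when the physical size of the scanned page is known: effective DPI is pixel width divided by width in inches. The helper names below are hypothetical, and the sketch assumes you already know both values. It does not read image metadata.

```python
def effective_dpi(pixel_width: int, physical_width_inches: float) -> float:
    """Effective resolution of a scan: pixels per inch along the width."""
    return pixel_width / physical_width_inches


def meets_ocr_guideline(
    pixel_width: int, physical_width_inches: float, min_dpi: float = 300.0
) -> bool:
    """True when the scan reaches the recommended minimum DPI."""
    return effective_dpi(pixel_width, physical_width_inches) >= min_dpi
```

For a US Letter page (8.5 inches wide), a scan needs to be at least 2550 pixels wide to reach 300 DPI.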
Handle Large Files
- Monitor file size limits
- Consider chunking very large documents
- Use appropriate timeout settings
- Implement progress monitoring
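For plain text input, chunking can be as simple as splitting on paragraph boundaries and packing paragraphs into pieces below a size budget. The sketch below shows one way to do that. The function name `chunk_text` and the 10,000-character default are assumptions, and binary formats such as PDF would need a format-aware splitter instead.

```python
from typing import List


def chunk_text(text: str, max_chars: int = 10_000) -> List[str]:
    """Pack blank-line-separated paragraphs into chunks of at most max_chars."""
    chunks: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single paragraph longer than max_chars becomes its own oversized chunk.
            current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

Splitting at paragraph boundaries keeps each chunk readable on its own, at the cost of letting an unusually long paragraph exceed the budget.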
File Size Limits
| File Type | Recommended Size | Maximum Size |
|---|---|---|
| Text files | < 10MB | 100MB |
| PDFs | < 50MB | 100MB |
| Images | < 20MB | 100MB |
| Office files | < 30MB | 100MB |
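The limits in the table can be enforced on the client side before upload. The sketch below is one possible check. The function name `check_size` and the category keys are hypothetical, and it assumes 1MB means 1024 × 1024 bytes, which the table does not specify.

```python
MB = 1024 * 1024  # assumption: the table's "MB" is binary megabytes

# (recommended, maximum) in MB, copied from the table above.
SIZE_LIMITS_MB = {
    "text": (10, 100),
    "pdf": (50, 100),
    "image": (20, 100),
    "office": (30, 100),
}


def check_size(file_type: str, size_bytes: int) -> str:
    """Return 'ok', 'above_recommended', or 'rejected' for a file of the given type."""
    recommended, maximum = SIZE_LIMITS_MB[file_type]
    size_mb = size_bytes / MB
    if size_mb > maximum:
        return "rejected"
    if size_mb >= recommended:
        return "above_recommended"
    return "ok"
```

A result of `above_recommended` is a hint to expect longer processing times rather than a failure.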
Performance Considerations
Processing Time
- Text files: < 1 second
- Simple PDFs: 2-10 seconds
- Complex documents: 30-60 seconds
- OCR processing: 1-5 minutes
Resource Usage
- CPU: High during OCR processing
- Memory: 500MB-2GB per job
- Storage: 2-3x original file size
- Network: Minimal for local processing
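The storage and memory figures above translate into quick capacity estimates. The sketch below applies the 2-3x storage guideline and budgets the 2GB upper bound per job. Both function names are hypothetical, and real usage will vary with file type and OCR settings.

```python
from typing import Tuple

GB = 1024 ** 3


def estimate_storage_bytes(file_size_bytes: int) -> Tuple[int, int]:
    """Return (low, high) storage estimates using the 2-3x guideline."""
    return 2 * file_size_bytes, 3 * file_size_bytes


def max_concurrent_jobs(available_memory_bytes: int, per_job_bytes: int = 2 * GB) -> int:
    """Worst-case number of parallel jobs, budgeting per_job_bytes for each."""
    return available_memory_bytes // per_job_bytes
```

For example, a host with 8GB of free memory can run four jobs in parallel under the worst-case 2GB-per-job assumption.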
Error Handling
Common conversion errors and solutions:
Unsupported Format
Error: “No suitable converter found”
Solution:
- Check the file extension and MIME type
- Verify the file is not corrupted
- Convert the file to a supported format first
Corrupted File
Error: “Failed to read file”
Solution:
- Verify file integrity
- Try opening the file in its original application
- Re-export or re-save the file
Memory Issues
Error: “Out of memory”
Solution:
- Reduce file size
- Increase system memory
- Process in smaller chunks
OCR Failures
Error: “OCR extraction failed”
Solution:
- Try a different OCR provider
- Improve image quality
- Check language settings
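Client code can surface these suggestions automatically by matching the error text. The sketch below is a simple lookup keyed on the four messages listed above. The function name `suggest_fixes` is hypothetical, and it assumes the error message arrives as a plain string that contains one of those phrases.

```python
from typing import Dict, List

# Error phrases and remedies copied from the troubleshooting entries above.
TROUBLESHOOTING: Dict[str, List[str]] = {
    "No suitable converter found": [
        "Check the file extension and MIME type",
        "Verify the file is not corrupted",
        "Convert the file to a supported format first",
    ],
    "Failed to read file": [
        "Verify file integrity",
        "Try opening the file in its original application",
        "Re-export or re-save the file",
    ],
    "Out of memory": [
        "Reduce file size",
        "Increase system memory",
        "Process in smaller chunks",
    ],
    "OCR extraction failed": [
        "Try a different OCR provider",
        "Improve image quality",
        "Check language settings",
    ],
}


def suggest_fixes(error_message: str) -> List[str]:
    """Return remedies for a known error phrase (case-insensitive), else an empty list."""
    lowered = error_message.lower()
    for phrase, steps in TROUBLESHOOTING.items():
        if phrase.lower() in lowered:
            return steps
    return []
```

An empty list signals an error that is not covered here and is worth logging in full.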
Integration Examples
Next Steps
- Configure OCR: Set up OCR providers for image processing
- Output Formats: Learn about JSON structure and Markdown features
- Webhooks: Set up real-time notifications
- API Reference: Explore the complete API documentation