Arabic OCR
Arabic OCR powered by a local Ollama vision model. A single universal prompt handles all document types — handwritten text, certificates, IDs, tables, forms, and printed Arabic — and extracts structured text from PDFs and images.
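To make the flow concrete, here is a minimal sketch of how one page image could be packaged for Ollama's `/api/generate` endpoint. The function name `build_ocr_request` and the prompt text are hypothetical placeholders; the script's actual universal prompt and request code are not shown in this README.

```python
import base64

# Placeholder only: the script's real universal prompt is not reproduced here.
PLACEHOLDER_PROMPT = "Extract all Arabic text from this image."


def build_ocr_request(image_bytes: bytes,
                      model: str = "qwen2.5vl:7b",
                      num_ctx: int = 12288,
                      prompt: str = PLACEHOLDER_PROMPT) -> dict:
    """Build a JSON-serialisable body for POST <host>/api/generate."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Stream the answer chunk by chunk rather than waiting for the whole page.
        "stream": True,
        # Context window size, corresponding to the --ctx option.
        "options": {"num_ctx": num_ctx},
    }
```

The resulting dictionary can be sent with any HTTP client; the defaults mirror the documented `--model` and `--ctx` values.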
Requirements
- Python 3.10+
- Ollama running locally or on a reachable host
- poppler-utils system package (for PDF rendering only)
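As an illustration of why poppler is only needed for PDFs, the sketch below builds a `pdftoppm` command (part of poppler-utils) that renders PDF pages to PNG at a given resolution. This is an assumption about how rendering could be done, not necessarily how the script does it; `pdftoppm_command` and its parameters are hypothetical names.

```python
from pathlib import Path
from typing import Optional


def pdftoppm_command(pdf_path: str,
                     out_prefix: str,
                     dpi: int = 300,
                     poppler_bin: Optional[str] = None) -> list[str]:
    """Return the argument list for rendering a PDF to PNG files with pdftoppm."""
    # If a poppler bin directory is given (e.g. on Windows), use the binary from there.
    exe = str(Path(poppler_bin) / "pdftoppm") if poppler_bin else "pdftoppm"
    # -r sets the resolution in DPI; -png selects PNG output.
    return [exe, "-r", str(dpi), "-png", str(pdf_path), str(out_prefix)]
```

The list can be passed to `subprocess.run(cmd, check=True)`; image inputs (JPEG, PNG) skip this step entirely.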
Setup
Linux / macOS
python3 -m venv arabic_ocr_env
source arabic_ocr_env/bin/activate
pip install -r requirements.txt
sudo apt-get install -y poppler-utils # Debian/Ubuntu; on macOS use: brew install poppler
Windows
python -m venv arabic_ocr_env
arabic_ocr_env\Scripts\activate
pip install -r requirements.txt
# Download poppler for Windows: https://github.com/oschwartz10612/poppler-windows/releases
# Extract and pass the bin\ path to the script with --poppler
Pulling a Model
From Ollama registry
ollama pull qwen2.5vl:7b
List the available qwen2.5vl tags:
curl -s "https://ollama.com/library/qwen2.5vl/tags" | grep -oP 'qwen2.5vl:[a-zA-Z0-9._-]+' | sort -u
From Hugging Face (GGUF)
Ollama can pull GGUF models directly from Hugging Face. Use the hf.co/ prefix with the repo path:
# Download (if not already present) and run the default quant from a HF repo
ollama run hf.co/bartowski/Qwen2.5-VL-7B-Instruct-GGUF
# Pick a specific quantization (append as a tag)
ollama run hf.co/bartowski/Qwen2.5-VL-7B-Instruct-GGUF:Q4_K_M
ollama run hf.co/bartowski/Qwen2.5-VL-7B-Instruct-GGUF:Q8_0
Common quantization suffixes: Q4_K_M (good balance), Q5_K_M (better quality), Q8_0 (near-lossless), IQ3_M (small/fast).
From a direct GGUF URL (Modelfile)
Create a Modelfile pointing at any GGUF URL:
FROM https://huggingface.co/bartowski/Qwen2.5-VL-7B-Instruct-GGUF/resolve/main/Qwen2.5-VL-7B-Instruct-Q4_K_M.gguf
Then build and run it:
ollama create my-arabic-ocr -f Modelfile
ollama run my-arabic-ocr
Pass the model name to the script with --model my-arabic-ocr.
Usage
source arabic_ocr_env/bin/activate
# PDF
python arabic_ocr_smart.py document.pdf
# JPEG or PNG image
python arabic_ocr_smart.py scan.jpg
python arabic_ocr_smart.py photo.png
# Write output to a specific file
python arabic_ocr_smart.py document.pdf output.txt
# Use a different Ollama model
python arabic_ocr_smart.py scan.pdf --model llava:13b
# Use a remote Ollama host
python arabic_ocr_smart.py scan.pdf --host http://192.168.1.10:11434
# PDF render resolution (PDF only, default: 300)
python arabic_ocr_smart.py scan.pdf --dpi 150
# Model context window in tokens (default: 12288)
python arabic_ocr_smart.py scan.pdf --ctx 8192
# Seconds to wait between streaming chunks (default: 600)
python arabic_ocr_smart.py scan.pdf --timeout 900
# Windows: point at your poppler bin\ directory (PDF only)
python arabic_ocr_smart.py scan.pdf --poppler "C:\poppler\bin"
Default output is saved as <input>_ocr.txt alongside the input file.
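The default naming rule can be sketched as follows. This assumes `<input>` means the input file name without its extension (so `document.pdf` becomes `document_ocr.txt`); the README does not say whether the extension is kept, and `default_output_path` is a hypothetical helper name.

```python
from pathlib import Path


def default_output_path(input_path: str) -> Path:
    """Derive the default output file next to the input file."""
    p = Path(input_path)
    # Assumption: the extension is dropped before appending "_ocr.txt".
    return p.with_name(p.stem + "_ocr.txt")
```

For example, `default_output_path("scans/document.pdf")` gives `scans/document_ocr.txt` under this assumption.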
Options
| Flag | Default | Description |
|---|---|---|
| input | (none) | Input file: PDF, JPEG, or PNG (required) |
| output | <input>_ocr.txt | Output text file (optional positional argument) |
| --host | http://192.168.122.1:11434 | Ollama host URL |
| --model | qwen2.5vl:7b | Ollama model name |
| --dpi | 300 | PDF render resolution (PDF only) |
| --ctx | 12288 | Model context window in tokens |
| --timeout | 600 | Seconds to wait between streaming chunks |
| --poppler | (none) | Path to poppler bin/ directory (Windows only) |
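For reference, the options above map naturally onto an `argparse` definition. The following is a sketch reconstructed from the table, not the script's actual parser; the function name `build_parser` and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Command-line interface mirroring the documented options and defaults."""
    parser = argparse.ArgumentParser(
        description="Arabic OCR via a local Ollama vision model")
    parser.add_argument("input", help="Input file: PDF, JPEG, or PNG")
    # Optional positional; None means "derive <input>_ocr.txt".
    parser.add_argument("output", nargs="?", default=None,
                        help="Output text file")
    parser.add_argument("--host", default="http://192.168.122.1:11434",
                        help="Ollama host URL")
    parser.add_argument("--model", default="qwen2.5vl:7b",
                        help="Ollama model name")
    parser.add_argument("--dpi", type=int, default=300,
                        help="PDF render resolution (PDF only)")
    parser.add_argument("--ctx", type=int, default=12288,
                        help="Model context window in tokens")
    parser.add_argument("--timeout", type=int, default=600,
                        help="Seconds to wait between streaming chunks")
    parser.add_argument("--poppler", default=None,
                        help="Path to poppler bin/ directory")
    return parser
```

Calling `build_parser().parse_args(["scan.pdf", "--dpi", "150"])` yields `dpi == 150` with every other option at its documented default.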
Output Format
Pages are separated by === headers:
============================================================
Page 1
============================================================
نوع الشهادة: ...
اسم الجهة المانحة: ...
...
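Downstream tools may want the text of each page separately. The sketch below splits an output file on the headers shown above; it assumes every header is a line of `=` characters, a `Page N` line, and another line of `=` characters, and it does not depend on the exact header width. `split_pages` is a hypothetical helper, not part of the script.

```python
import re

# A page header: a rule of '=' characters, "Page N", and another rule.
PAGE_HEADER = re.compile(
    r"^={3,}[ \t]*\nPage (\d+)[ \t]*\n={3,}[ \t]*\n", re.MULTILINE)


def split_pages(text: str) -> dict[int, str]:
    """Map each page number to the text that follows its header."""
    # re.split with one capture group yields:
    # [text before first header, "1", body 1, "2", body 2, ...]
    parts = PAGE_HEADER.split(text)
    pages: dict[int, str] = {}
    for number, body in zip(parts[1::2], parts[2::2]):
        pages[int(number)] = body.strip()
    return pages
```

Read the output file with `encoding="utf-8"` before passing it in, so the Arabic text is decoded correctly.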