arabic-ocr/README.md
Randa 26954bb01f Add README command to list available qwen2.5vl tags
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-27 11:37:11 +04:00

3.8 KiB

Arabic OCR

Arabic OCR powered by a local Ollama vision model. A single universal prompt handles all document types — handwritten text, certificates, IDs, tables, forms, and printed Arabic — and extracts structured text from PDFs and images.

Requirements

  • Python 3.10+
  • Ollama running locally or on a reachable host
  • poppler-utils system package (for PDF rendering only)

Setup

Linux / macOS

python3 -m venv arabic_ocr_env
source arabic_ocr_env/bin/activate
pip install -r requirements.txt
sudo apt-get install -y poppler-utils   # macOS: brew install poppler

Windows

python -m venv arabic_ocr_env
arabic_ocr_env\Scripts\activate
pip install -r requirements.txt
# Download poppler for Windows: https://github.com/oschwartz10612/poppler-windows/releases
# Extract and pass the bin\ path to the script with --poppler

Pulling a Model

From Ollama registry

ollama pull qwen2.5vl:7b

List the available qwen2.5vl tags:

curl -s "https://ollama.com/library/qwen2.5vl/tags" | grep -oP 'qwen2.5vl:[a-zA-Z0-9._-]+' | sort -u

From Hugging Face (GGUF)

Ollama can pull GGUF models directly from Hugging Face. Use the hf.co/ prefix with the repo path:

# Pull the default quant from a HF repo
ollama run hf.co/bartowski/Qwen2.5-VL-7B-Instruct-GGUF

# Pick a specific quantization (append as a tag)
ollama run hf.co/bartowski/Qwen2.5-VL-7B-Instruct-GGUF:Q4_K_M
ollama run hf.co/bartowski/Qwen2.5-VL-7B-Instruct-GGUF:Q8_0

Common quantization suffixes: Q4_K_M (good balance), Q5_K_M (better quality), Q8_0 (near-lossless), IQ3_M (small/fast).

From a direct GGUF URL (Modelfile)

Create a Modelfile pointing at any GGUF URL:

FROM https://huggingface.co/bartowski/Qwen2.5-VL-7B-Instruct-GGUF/resolve/main/Qwen2.5-VL-7B-Instruct-Q4_K_M.gguf

Then build and run it:

ollama create my-arabic-ocr -f Modelfile
ollama run my-arabic-ocr

Pass the model name to the script with --model my-arabic-ocr.

Usage

source arabic_ocr_env/bin/activate

# PDF
python arabic_ocr_smart.py document.pdf

# JPEG or PNG image
python arabic_ocr_smart.py scan.jpg
python arabic_ocr_smart.py photo.png

# Write output to a specific file
python arabic_ocr_smart.py document.pdf output.txt

# Use a different Ollama model
python arabic_ocr_smart.py scan.pdf --model llava:13b

# Use a remote Ollama host
python arabic_ocr_smart.py scan.pdf --host http://192.168.1.10:11434

# PDF render resolution (PDF only, default: 300)
python arabic_ocr_smart.py scan.pdf --dpi 150

# Model context window in tokens (default: 12288)
python arabic_ocr_smart.py scan.pdf --ctx 8192

# Seconds to wait between streaming chunks (default: 600)
python arabic_ocr_smart.py scan.pdf --timeout 900

# Windows: point at your poppler bin\ directory (PDF only)
python arabic_ocr_smart.py scan.pdf --poppler "C:\poppler\bin"

Default output is saved as <input>_ocr.txt alongside the input file.

Options

Flag Default Description
input Input file: PDF, JPEG, or PNG (required)
output <input>_ocr.txt Output text file (optional positional argument)
--host http://192.168.122.1:11434 Ollama host URL
--model qwen2.5vl:7b Ollama model name
--dpi 300 PDF render resolution (PDF only)
--ctx 12288 Model context window in tokens
--timeout 600 Seconds to wait between streaming chunks
--poppler Path to poppler bin/ directory (Windows only)

Output Format

Pages are separated by === headers:

============================================================
Page 1
============================================================

نوع الشهادة: ...
اسم الجهة المانحة: ...
...