deepseek-ai/DeepSeek-OCR
Frontier OCR model exploring optical context compression for LLMs, optimized for document parsing and markdown generation.
Optical context compression for efficient OCR and document understanding
Overview
DeepSeek-OCR is a frontier OCR model exploring optical context compression for LLMs. It is optimized for document parsing, free-form OCR, and markdown generation from images, and ships with a custom n-gram logits processor for optimal quality.
Prerequisites
- Hardware: a single GPU with >= 8 GB VRAM is typically sufficient for BF16 inference.
- vLLM: current stable release (tested with `uv pip install -U vllm --torch-backend auto`).
- Python: 3.10+
Install vLLM:
```shell
uv venv
source .venv/bin/activate
uv pip install -U vllm --torch-backend auto
```
Client Usage
Offline OCR (Python)
```python
from vllm import LLM, SamplingParams
from vllm.model_executor.models.deepseek_ocr import NGramPerReqLogitsProcessor
from PIL import Image

# OCR workloads rarely reuse prefixes or images, so disable the related caches.
llm = LLM(
    model="deepseek-ai/DeepSeek-OCR",
    enable_prefix_caching=False,
    mm_processor_cache_gb=0,
    logits_processors=[NGramPerReqLogitsProcessor],
)

image_1 = Image.open("path/to/your/image_1.png").convert("RGB")
image_2 = Image.open("path/to/your/image_2.png").convert("RGB")
prompt = "<image>\nFree OCR."

model_input = [
    {"prompt": prompt, "multi_modal_data": {"image": image_1}},
    {"prompt": prompt, "multi_modal_data": {"image": image_2}},
]

sampling_param = SamplingParams(
    temperature=0.0,
    max_tokens=8192,
    # Arguments consumed by the n-gram logits processor.
    extra_args=dict(
        ngram_size=30,
        window_size=90,
        whitelist_token_ids={128821, 128822},  # <td>, </td>
    ),
    skip_special_tokens=False,
)

for output in llm.generate(model_input, sampling_param):
    print(output.outputs[0].text)
```
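The processor's parameters are easiest to understand from the repetition pattern they target: OCR models can fall into loops where the same token sequence is emitted over and over. One plausible reading of `ngram_size` and `window_size` is a check for whether the trailing n-gram already occurred inside the recent window, with whitelisted tokens (such as table-cell tags, which legitimately repeat) exempt. Below is an illustrative pure-Python sketch of that detection logic; it is our simplification, not vLLM's actual `NGramPerReqLogitsProcessor` implementation.

```python
def repeats_ngram(tokens, ngram_size, window_size, whitelist=frozenset()):
    """Return True if the trailing `ngram_size` tokens already appeared
    earlier within the last `window_size` tokens (a repetition-loop signal).
    Illustrative sketch only -- not vLLM's actual implementation."""
    if len(tokens) < ngram_size:
        return False
    tail = tuple(tokens[-ngram_size:])
    if all(t in whitelist for t in tail):
        return False  # whitelisted tokens (e.g. <td>/</td>) may repeat freely
    window = tokens[-window_size:]
    # Scan every earlier n-gram in the window, excluding the tail itself.
    for i in range(len(window) - ngram_size):
        if tuple(window[i:i + ngram_size]) == tail:
            return True
    return False
```

With the defaults above (`ngram_size=30`, `window_size=90`), a 30-token sequence repeated within the last 90 tokens would trigger suppression, while `<td>`/`</td>` runs in large tables would not.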
Online OCR serving
```shell
vllm serve deepseek-ai/DeepSeek-OCR \
  --logits_processors vllm.model_executor.models.deepseek_ocr:NGramPerReqLogitsProcessor \
  --no-enable-prefix-caching \
  --mm-processor-cache-gb 0
```
```python
import time

from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1", timeout=3600)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://ofasys-multimodal-wlcb-3-toshanghai.oss-accelerate.aliyuncs.com/wpf272043/keepme/image/receipt.png"}},
            {"type": "text", "text": "Free OCR."},
        ],
    }
]

start = time.time()
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-OCR",
    messages=messages,
    max_tokens=2048,
    temperature=0.0,
    extra_body={
        "skip_special_tokens": False,
        # Per-request arguments forwarded to the n-gram logits processor.
        "vllm_xargs": {
            "ngram_size": 30,
            "window_size": 90,
            "whitelist_token_ids": [128821, 128822],
        },
    },
)
print(f"Response time: {time.time() - start:.2f}s")
print(f"Generated text: {response.choices[0].message.content}")
```
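To OCR a local file through the same endpoint, the image can be inlined as a base64 `data:` URL, which OpenAI-compatible servers accept in the `image_url` field. A small helper sketch (the function name `to_data_url` is ours, not part of any API):

```python
import base64

def to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a data: URL for the image_url content field."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"

# Usage inside the messages payload above, in place of the remote URL:
# {"type": "image_url",
#  "image_url": {"url": to_data_url(open("page.png", "rb").read())}}
```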
Troubleshooting / Configuration Tips
- Load the model together with the custom NGramPerReqLogitsProcessor shown above; it is required for optimal OCR and markdown generation quality.
- Unlike multi-turn chat, OCR tasks do not typically benefit from prefix caching or image reuse, so disable these features to avoid unnecessary hashing and caching overhead.
- DeepSeek-OCR works better with plain prompts than instruction formats. See the official example prompts.
- Depending on your hardware, adjust `max_num_batched_tokens` for better throughput.
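Putting the tips together, a serving command might look like the following; the token budget here is an arbitrary starting point to tune for your GPU, not a recommended value:

```shell
vllm serve deepseek-ai/DeepSeek-OCR \
  --logits_processors vllm.model_executor.models.deepseek_ocr:NGramPerReqLogitsProcessor \
  --no-enable-prefix-caching \
  --mm-processor-cache-gb 0 \
  --max-num-batched-tokens 16384
```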