Mistral AI on Tuesday released OCR 4, a document intelligence model that returns structured representations of entire documents, complete with bounding boxes, block-type classification, and per-word confidence scores. It is the company's fourth generation of optical character recognition technology in roughly 15 months.

The model supports 170 languages across 10 language groups and accepts PDF, DOC, PPT, and OpenDocument formats. It can be deployed as a single container on an organization's own infrastructure, a capability Mistral is pitching directly to enterprises in regulated industries that cannot route sensitive documents through U.S.-jurisdiction cloud APIs.

"Mistral OCR 4 extracts and structures content from a wide range of documents," the company said in its announcement. "Where previous generations focused on converting a page into clean text and tables, OCR 4 returns a structured representation of the document." The model is available immediately through the Mistral API, Document AI in Mistral Studio, and Amazon SageMaker.

The release arrives as Mistral's pitch for European AI sovereignty is gaining commercial relevance. By offering an on-premises deployment option that does not depend on U.S. cloud providers, the company aims to stand out in a market where compliance and data residency are increasingly critical for financial services, healthcare, and government clients.

Although Mistral has iterated rapidly on its OCR capabilities, it faces entrenched competition from established products such as Google Cloud Document AI and Microsoft's Azure AI Document Intelligence, which offer similar structured extraction. Mistral's bet on sovereignty and self-hosted deployment may carry weight in Europe, but scaling adoption against cloud giants with deeper enterprise relationships remains a challenge.