DocLang aims to make documents readable by AI, not humans
AIs struggle to understand documents designed for humans; the DocLang working group seeks to flip that imbalance with its specification for machine-readable business documents “built from the ground up for LLM tokenizers.” The working group, founded by IBM, Nvidia, and Red Hat and hosted by the Linux Foundation’s LF AI & Data project, aims to create an open, universal, AI-native document format designed to improve how enterprises prepare, exchange, and govern document data for AI systems. ABBYY and Human Signal will also be involved in its development, and other contributors are welcome. “Enterprises today work across a fragmented landscape of document formats, including PDFs, JPEGs, and other file types built primarily for human consumption rather than AI interpretation,” the group said in its launch announcement. “This disconnect can introduce complexity, raise costs, and reduce reliability when extracting meaning from business documents,” as organizations increasingly rely on genera