# SoMark Overview

> Overview of SoMark capabilities and core features

# Welcome to SoMark

SoMark converts PDFs, PPTs, images, and many other document formats into machine-readable structured output with high accuracy, high speed, and strong cost efficiency, providing high-quality data for LLM training and RAG applications.

<CardGroup cols={3}>
  <Card title="99% OCR Accuracy" icon="bullseye">
    Industry-leading recognition accuracy with coordinate traceback to pinpoint every element in the source document.
  </Card>

  <Card title="100 Pages in 5 Seconds" icon="bolt">
    High-speed parsing with horizontally scalable cluster deployment for large-scale batch workloads.
  </Card>

  <Card title="Pay As You Go" icon="circle-dollar-to-slot">
    Usage-based billing or one-time licensing. Private deployment starts from a single RTX 3090 GPU.
  </Card>

  <Card title="21 Component Types" icon="layer-group">
    Detects titles, tables, equations, figures, chemical structures, stamps, QR codes, and 14 more element types.
  </Card>

  <Card title="Multiple Output Formats" icon="file-export">
    Outputs Markdown and JSON, ready for LLM training pipelines and RAG applications.
  </Card>

  <Card title="Broad Document Coverage" icon="files">
    Supports research papers, reports, whitepapers, contracts, scanned books, government files, and more.
  </Card>
</CardGroup>

## Supported file formats

`pdf` `png` `jpg` `jpeg` `bmp` `tiff` `jp2` `dib` `ppm` `pgm` `pbm` `gif` `heic` `heif` `webp` `xpm` `tga` `dds` `xbm` `doc` `docx` `ppt` `pptx` `xlsx` `xlsm` `xls`
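Before submitting files, it can help to filter them against the list above on the client side. The following is a minimal sketch, not part of any SoMark SDK: the names `SUPPORTED_EXTENSIONS` and `is_supported` are hypothetical, and the case-insensitive comparison is an assumption, since this page does not say how the service treats uppercase extensions.

```python
from pathlib import Path

# Extensions copied from the supported file formats list on this page.
SUPPORTED_EXTENSIONS = frozenset({
    "pdf", "png", "jpg", "jpeg", "bmp", "tiff", "jp2", "dib", "ppm", "pgm",
    "pbm", "gif", "heic", "heif", "webp", "xpm", "tga", "dds", "xbm",
    "doc", "docx", "ppt", "pptx", "xlsx", "xlsm", "xls",
})


def is_supported(path: str) -> bool:
    """Return True if the file's extension appears in the supported list."""
    # Path.suffix keeps the leading dot (".PDF"), so strip it and lowercase.
    # Lowercasing is an assumption about how extensions should be matched.
    return Path(path).suffix.lstrip(".").lower() in SUPPORTED_EXTENSIONS
```

Only the final extension is checked, so a file named `archive.tar.gz` is rejected, and so is `scan.tif`, because `tif` is not in the list while `tiff` is.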

## Recognized document elements

SoMark recognizes 21 document element types. The table below lists each one with its type identifier:

| Category                       | Elements                                                                                         |
| ------------------------------ | ------------------------------------------------------------------------------------------------ |
| Text structure                 | Title `title`, text block `text`, header `header`, footer `footer`, footnote `footnote`          |
| Figures and tables             | Figure `figure`, figure caption `figure_caption`, table `table`, table caption `table_caption`   |
| Specialized content            | Equation `equation`, chemical structure `cs`, chemical equation `cs_equation`, code block `code` |
| Navigation and layout          | Sidebar `sider`, table of contents `cate`, TOC entry `cate_item`                                 |
| Education and structured items | Choice item `choice`, fill-in-the-blank `blank`, reference `reference`                           |
| Special elements               | QR code `qrcode`, stamp `stamp`                                                                  |
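The type identifiers in the table can be used to group or filter parsed elements. The sketch below makes one loud assumption: that each parsed element is a dictionary with a `type` key holding one of the identifiers above. This page does not describe the JSON output schema, so the `type` key, `ELEMENT_TYPES`, `CATEGORY_OF`, and `count_by_category` are all hypothetical names used for illustration.

```python
from collections import Counter

# Categories and type identifiers, copied from the table above.
ELEMENT_TYPES = {
    "Text structure": ["title", "text", "header", "footer", "footnote"],
    "Figures and tables": ["figure", "figure_caption", "table", "table_caption"],
    "Specialized content": ["equation", "cs", "cs_equation", "code"],
    "Navigation and layout": ["sider", "cate", "cate_item"],
    "Education and structured items": ["choice", "blank", "reference"],
    "Special elements": ["qrcode", "stamp"],
}

# Reverse lookup: type identifier -> category name.
CATEGORY_OF = {
    type_id: category
    for category, type_ids in ELEMENT_TYPES.items()
    for type_id in type_ids
}


def count_by_category(elements):
    """Count elements per category; unrecognized types fall under 'Unknown'."""
    # Assumes each element is a dict with a "type" key (hypothetical schema).
    return Counter(CATEGORY_OF.get(e.get("type"), "Unknown") for e in elements)
```

For example, `count_by_category([{"type": "table"}, {"type": "table_caption"}, {"type": "stamp"}])` counts two elements under "Figures and tables" and one under "Special elements". Check the API reference for the actual field names before relying on this shape.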

<div style={{ display: 'grid', gridTemplateColumns: 'repeat(7, minmax(0, 1fr))', gap: '12px' }}>
  <div style={{ border: '1px solid var(--gray-200)', borderRadius: '12px', padding: '12px', textAlign: 'center', display: 'flex', flexDirection: 'column', alignItems: 'center' }}><img src="https://mintcdn.com/soulcode-aa7e5a93/I8QD66YzkFVIEkWW/images/%E6%A0%87%E9%A2%98.png?fit=max&auto=format&n=I8QD66YzkFVIEkWW&q=85&s=9f8a53e551693b954194e176eadb617a" alt="Title" width="28" data-path="images/标题.png" /><div>Title</div></div>
  <div style={{ border: '1px solid var(--gray-200)', borderRadius: '12px', padding: '12px', textAlign: 'center', display: 'flex', flexDirection: 'column', alignItems: 'center' }}><img src="https://mintcdn.com/soulcode-aa7e5a93/I8QD66YzkFVIEkWW/images/%E6%96%87%E5%AD%97%E6%AE%B5.png?fit=max&auto=format&n=I8QD66YzkFVIEkWW&q=85&s=4cf36413a73acc0068b8487823592976" alt="Text block" width="28" data-path="images/文字段.png" /><div>Text block</div></div>
  <div style={{ border: '1px solid var(--gray-200)', borderRadius: '12px', padding: '12px', textAlign: 'center', display: 'flex', flexDirection: 'column', alignItems: 'center' }}><img src="https://mintcdn.com/soulcode-aa7e5a93/I8QD66YzkFVIEkWW/images/%E5%9B%BE%E7%89%87.png?fit=max&auto=format&n=I8QD66YzkFVIEkWW&q=85&s=73ddc445d445494414bb3892417b1875" alt="Figure" width="28" data-path="images/图片.png" /><div>Figure</div></div>
  <div style={{ border: '1px solid var(--gray-200)', borderRadius: '12px', padding: '12px', textAlign: 'center', display: 'flex', flexDirection: 'column', alignItems: 'center' }}><img src="https://mintcdn.com/soulcode-aa7e5a93/I8QD66YzkFVIEkWW/images/%E5%9B%BE%E4%BE%8B.png?fit=max&auto=format&n=I8QD66YzkFVIEkWW&q=85&s=f06418f6495a58c0c5fa6fecf0679748" alt="Figure caption" width="28" data-path="images/图例.png" /><div>Figure caption</div></div>
  <div style={{ border: '1px solid var(--gray-200)', borderRadius: '12px', padding: '12px', textAlign: 'center', display: 'flex', flexDirection: 'column', alignItems: 'center' }}><img src="https://mintcdn.com/soulcode-aa7e5a93/I8QD66YzkFVIEkWW/images/%E8%A1%A8%E6%A0%BC.png?fit=max&auto=format&n=I8QD66YzkFVIEkWW&q=85&s=e3079c324e45dae0d223a7e390753b14" alt="Table" width="28" data-path="images/表格.png" /><div>Table</div></div>
  <div style={{ border: '1px solid var(--gray-200)', borderRadius: '12px', padding: '12px', textAlign: 'center', display: 'flex', flexDirection: 'column', alignItems: 'center' }}><img src="https://mintcdn.com/soulcode-aa7e5a93/I8QD66YzkFVIEkWW/images/%E8%A1%A8%E4%BE%8B.png?fit=max&auto=format&n=I8QD66YzkFVIEkWW&q=85&s=f927e7f04c1571bcabb15fa83156ec96" alt="Table caption" width="28" data-path="images/表例.png" /><div>Table caption</div></div>
  <div style={{ border: '1px solid var(--gray-200)', borderRadius: '12px', padding: '12px', textAlign: 'center', display: 'flex', flexDirection: 'column', alignItems: 'center' }}><img src="https://mintcdn.com/soulcode-aa7e5a93/I8QD66YzkFVIEkWW/images/%E5%85%AC%E5%BC%8F.png?fit=max&auto=format&n=I8QD66YzkFVIEkWW&q=85&s=2f76a543d6710bcc4665512268f222f8" alt="Equation" width="28" data-path="images/公式.png" /><div>Equation</div></div>
  <div style={{ border: '1px solid var(--gray-200)', borderRadius: '12px', padding: '12px', textAlign: 'center', display: 'flex', flexDirection: 'column', alignItems: 'center' }}><img src="https://mintcdn.com/soulcode-aa7e5a93/I8QD66YzkFVIEkWW/images/%E9%A1%B5%E7%9C%89.png?fit=max&auto=format&n=I8QD66YzkFVIEkWW&q=85&s=dcddb88a7b2ed21f3b2735800175f090" alt="Header" width="28" data-path="images/页眉.png" /><div>Header</div></div>
  <div style={{ border: '1px solid var(--gray-200)', borderRadius: '12px', padding: '12px', textAlign: 'center', display: 'flex', flexDirection: 'column', alignItems: 'center' }}><img src="https://mintcdn.com/soulcode-aa7e5a93/I8QD66YzkFVIEkWW/images/%E9%A1%B5%E8%84%9A.png?fit=max&auto=format&n=I8QD66YzkFVIEkWW&q=85&s=225add72be26b3fc9e42272f100cead9" alt="Footer" width="28" data-path="images/页脚.png" /><div>Footer</div></div>
  <div style={{ border: '1px solid var(--gray-200)', borderRadius: '12px', padding: '12px', textAlign: 'center', display: 'flex', flexDirection: 'column', alignItems: 'center' }}><img src="https://mintcdn.com/soulcode-aa7e5a93/I8QD66YzkFVIEkWW/images/%E4%BE%A7%E8%BE%B9%E6%A0%8F.png?fit=max&auto=format&n=I8QD66YzkFVIEkWW&q=85&s=1a31efcbbdc229996ed8502d18a96a73" alt="Sidebar" width="28" data-path="images/侧边栏.png" /><div>Sidebar</div></div>
  <div style={{ border: '1px solid var(--gray-200)', borderRadius: '12px', padding: '12px', textAlign: 'center', display: 'flex', flexDirection: 'column', alignItems: 'center' }}><img src="https://mintcdn.com/soulcode-aa7e5a93/I8QD66YzkFVIEkWW/images/%E8%84%9A%E6%B3%A8.png?fit=max&auto=format&n=I8QD66YzkFVIEkWW&q=85&s=fe78efb097fe7885865c9e24fdcab2de" alt="Footnote" width="28" data-path="images/脚注.png" /><div>Footnote</div></div>
  <div style={{ border: '1px solid var(--gray-200)', borderRadius: '12px', padding: '12px', textAlign: 'center', display: 'flex', flexDirection: 'column', alignItems: 'center' }}><img src="https://mintcdn.com/soulcode-aa7e5a93/I8QD66YzkFVIEkWW/images/%E7%9B%AE%E5%BD%95.png?fit=max&auto=format&n=I8QD66YzkFVIEkWW&q=85&s=7e7939f79f4fb23c1071403baf6f31df" alt="TOC" width="28" data-path="images/目录.png" /><div>TOC</div></div>
  <div style={{ border: '1px solid var(--gray-200)', borderRadius: '12px', padding: '12px', textAlign: 'center', display: 'flex', flexDirection: 'column', alignItems: 'center' }}><img src="https://mintcdn.com/soulcode-aa7e5a93/I8QD66YzkFVIEkWW/images/%E7%9B%AE%E5%BD%95%E6%9D%A1%E7%9B%AE.png?fit=max&auto=format&n=I8QD66YzkFVIEkWW&q=85&s=de3b35db3624c9ab97a4b91929599cd0" alt="TOC entry" width="28" data-path="images/目录条目.png" /><div>TOC entry</div></div>
  <div style={{ border: '1px solid var(--gray-200)', borderRadius: '12px', padding: '12px', textAlign: 'center', display: 'flex', flexDirection: 'column', alignItems: 'center' }}><img src="https://mintcdn.com/soulcode-aa7e5a93/I8QD66YzkFVIEkWW/images/%E9%80%89%E9%A1%B9.png?fit=max&auto=format&n=I8QD66YzkFVIEkWW&q=85&s=657ec593723f803e43e3637ea821e1b8" alt="Choice" width="28" data-path="images/选项.png" /><div>Choice</div></div>
  <div style={{ border: '1px solid var(--gray-200)', borderRadius: '12px', padding: '12px', textAlign: 'center', display: 'flex', flexDirection: 'column', alignItems: 'center' }}><img src="https://mintcdn.com/soulcode-aa7e5a93/I8QD66YzkFVIEkWW/images/%E4%BB%A3%E7%A0%81%E6%AE%B5.png?fit=max&auto=format&n=I8QD66YzkFVIEkWW&q=85&s=48574c42391c2117bce29ee62e0a099f" alt="Code block" width="28" data-path="images/代码段.png" /><div>Code block</div></div>
  <div style={{ border: '1px solid var(--gray-200)', borderRadius: '12px', padding: '12px', textAlign: 'center', display: 'flex', flexDirection: 'column', alignItems: 'center' }}><img src="https://mintcdn.com/soulcode-aa7e5a93/I8QD66YzkFVIEkWW/images/%E5%A1%AB%E7%A9%BA%E7%A9%BA%E7%99%BD.png?fit=max&auto=format&n=I8QD66YzkFVIEkWW&q=85&s=0b5e31383fa659098fad9539a76a49e1" alt="Blank" width="28" data-path="images/填空空白.png" /><div>Blank</div></div>
  <div style={{ border: '1px solid var(--gray-200)', borderRadius: '12px', padding: '12px', textAlign: 'center', display: 'flex', flexDirection: 'column', alignItems: 'center' }}><img src="https://mintcdn.com/soulcode-aa7e5a93/I8QD66YzkFVIEkWW/images/%E5%8F%82%E8%80%83%E6%96%87%E7%8C%AE.png?fit=max&auto=format&n=I8QD66YzkFVIEkWW&q=85&s=757bf1aac35041ed83899c850045c8f9" alt="Reference" width="28" data-path="images/参考文献.png" /><div>Reference</div></div>
  <div style={{ border: '1px solid var(--gray-200)', borderRadius: '12px', padding: '12px', textAlign: 'center', display: 'flex', flexDirection: 'column', alignItems: 'center' }}><img src="https://mintcdn.com/soulcode-aa7e5a93/I8QD66YzkFVIEkWW/images/%E4%BA%8C%E7%BB%B4%E7%A0%81.png?fit=max&auto=format&n=I8QD66YzkFVIEkWW&q=85&s=ab5f1acb8cbc1bcaa8e69bf9711a0e25" alt="QR code" width="28" data-path="images/二维码.png" /><div>QR code</div></div>
  <div style={{ border: '1px solid var(--gray-200)', borderRadius: '12px', padding: '12px', textAlign: 'center', display: 'flex', flexDirection: 'column', alignItems: 'center' }}><img src="https://mintcdn.com/soulcode-aa7e5a93/I8QD66YzkFVIEkWW/images/%E5%8D%B0%E7%AB%A0.png?fit=max&auto=format&n=I8QD66YzkFVIEkWW&q=85&s=866c073e09c95f03f7215126d8cf025e" alt="Stamp" width="28" data-path="images/印章.png" /><div>Stamp</div></div>
  <div style={{ border: '1px solid var(--gray-200)', borderRadius: '12px', padding: '12px', textAlign: 'center', display: 'flex', flexDirection: 'column', alignItems: 'center' }}><img src="https://mintcdn.com/soulcode-aa7e5a93/I8QD66YzkFVIEkWW/images/%E5%8C%96%E5%AD%A6%E7%BB%93%E6%9E%84%E5%BC%8F.png?fit=max&auto=format&n=I8QD66YzkFVIEkWW&q=85&s=1c90a25d538bfc8d44dc9744b5490a20" alt="Chemical structure" width="28" data-path="images/化学结构式.png" /><div>Chemical structure</div></div>
  <div style={{ border: '1px solid var(--gray-200)', borderRadius: '12px', padding: '12px', textAlign: 'center', display: 'flex', flexDirection: 'column', alignItems: 'center' }}><img src="https://mintcdn.com/soulcode-aa7e5a93/I8QD66YzkFVIEkWW/images/%E5%8C%96%E5%AD%A6%E6%96%B9%E7%A8%8B%E5%BC%8F.png?fit=max&auto=format&n=I8QD66YzkFVIEkWW&q=85&s=6e181921c9cf0375b93b360360ee90b1" alt="Chemical equation" width="28" data-path="images/化学方程式.png" /><div>Chemical equation</div></div>
</div>

## Get started

See the [Quickstart Guide](/en/documentation/get-started-overview) to begin. To review API capabilities and limits first, go to the [API overview](/en/api-reference/index). For common questions, see the [FAQs](/en/qa).
