
Table extraction

Table extraction tries to infer a table's structure from its presentation and then to convert the data to a structured form.
Because tables come in many different styles, formats and layouts, automated table extraction is a challenge.
Approaches based on artificial intelligence, such as deep learning models, may achieve better results.

Subtasks

Table extraction (TE) comprises three subtasks:
Table detection (TD): Locate the table in a document.
Table structure recognition (TSR): Find the rows, columns and cells in the table.
Functional analysis (FA): Determine the table data's keys and values.
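
To make the last subtask more concrete, here is a minimal sketch of functional analysis in Python. It assumes that TD and TSR have already produced a list of cells with row and column indices, and that the table has a single header row whose cells act as keys. The names `Cell` and `to_records` are hypothetical and not taken from any library. The sample values are the ICDAR-2013 table counts mentioned in these notes.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """One table cell as TSR might report it (hypothetical structure)."""
    row: int
    col: int
    text: str
    is_header: bool = False


def to_records(cells):
    """Pair every data cell with the header cell of its column.

    Assumes exactly one header row; real tables can have nested or
    row-wise headers, which this sketch does not handle.
    """
    # Keys: column index -> header text
    headers = {c.col: c.text for c in cells if c.is_header}
    rows = {}
    for c in cells:
        if c.is_header:
            continue
        # Values: store the cell text under its column's header
        rows.setdefault(c.row, {})[headers[c.col]] = c.text
    # One dict per data row, in row order
    return [rows[r] for r in sorted(rows)]


cells = [
    Cell(0, 0, "Subtask", is_header=True),
    Cell(0, 1, "Tables", is_header=True),
    Cell(1, 0, "TD"), Cell(1, 1, "257"),
    Cell(2, 0, "TSR"), Cell(2, 1, "257"),
    Cell(3, 0, "FA"), Cell(3, 1, "92"),
]

records = to_records(cells)
for record in records:
    print(record)
```

Each resulting dictionary is one row in structured form, with the header texts as keys and the cell texts as values.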

ICDAR-2013 Dataset

The ICDAR-2013 dataset was the first dataset to address the three table extraction subtasks.
Because of its quality and relative completeness compared to other datasets, it is still popular for benchmarking, although it contains only 257 tables for TD and TSR and 92 tables for FA.

TODO

Table-transformer (GitHub repository) is a deep learning model based on object detection for extracting tables from PDFs and images. It was first proposed in PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents.

See also

table
