AI Lund Lunch seminar: Reading Älvsborg’s Ransom: How to turn 16th-century handwritten tax records into structured economic information

Webinar
by AI Lund
2022-03-02 12:00 to 2022-03-02 13:15

We present a system for extracting tabular information from loosely structured handwritten documents. The system consists of three parts:

  • a U-Net-like, CNN-based method for text detection and segmentation,
  • an attention-based method for simultaneous text recognition and classification of word parts, and
  • a method for matching the word parts into a tabular structure for each entry.

A key contribution is the observation that the attention-based recognition and classification module enables improved spatial analysis of the tabular information. The method is evaluated on a unique historical document: the Swedish Wealth Tax of 1571, consisting of 11,453 pages of handwritten tax records. The evaluation shows that the system provides a significant improvement over the state of the art on the problem of tabular extraction from loosely structured historical documents.
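
To make the third part more concrete, the following is a minimal sketch of how classified word parts could be matched into table rows. It is not the presenters' method: the `WordPart` type, the class labels ("name", "item", "amount"), the coordinates, and the `row_tolerance` threshold are all hypothetical, chosen only to illustrate the idea that a class label plus a position is enough to place each word part in a row and a column.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class WordPart:
    text: str   # recognized text of the word part
    label: str  # hypothetical class label, e.g. "name", "item", "amount"
    x: float    # left edge of the bounding box (pixels)
    y: float    # vertical center of the bounding box (pixels)


def group_into_entries(parts, row_tolerance=10.0):
    """Group word parts into rows by vertical proximity, then build
    one dict per row that maps each class label to its text."""
    rows = []
    for part in sorted(parts, key=lambda p: p.y):
        # Start a new row when the vertical gap to the previous part is large.
        if rows and abs(part.y - rows[-1][-1].y) <= row_tolerance:
            rows[-1].append(part)
        else:
            rows.append([part])

    table = []
    for row in rows:
        entry = defaultdict(list)
        # Read each row left to right so multi-word fields keep their order.
        for part in sorted(row, key=lambda p: p.x):
            entry[part.label].append(part.text)
        table.append({label: " ".join(texts) for label, texts in entry.items()})
    return table


# Made-up example data, not taken from the tax records.
parts = [
    WordPart("Olof", "name", 10, 100),
    WordPart("silver", "item", 120, 103),
    WordPart("4", "amount", 200, 98),
    WordPart("Per", "name", 12, 140),
    WordPart("2", "amount", 205, 142),
]
table = group_into_entries(parts)
```

In this sketch the class label decides the column and the vertical position decides the row, so a missing field (the second entry has no "item") simply leaves that column empty instead of shifting the remaining values.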

Attributes

Language, Vision