Thesis etd-11182019-154858

Thesis type

Master's thesis

Author

SHEK, ABHI

URN

etd-11182019-154858

Title

Information extraction from documents through the application of Computer Vision and Deep Learning technologies

Department

COMPUTER SCIENCE

Degree programme

DATA SCIENCE AND BUSINESS INFORMATICS

Supervisors

supervisor Attardi, Giuseppe

Keywords

Tensorflow
RNN
Python
OpenCV
OCR
Keras
CTC
CNN
Tesseract

Start date of the graduation session

06/12/2019

Availability

Not available for consultation

Release date

06/12/2089

Abstract

The goal of this thesis project is to develop a solution for extracting information from documents by applying Deep Learning technologies. The solution enables the automation of business processes that are triggered by the information contained in documents, as is typical in the banking and insurance sectors. Text recognition in images is a pivotal problem in machine learning. The first two chapters describe the digitization procedure and its related issues, along with some traditional computer vision methods for Optical Character Recognition (OCR). Contrary to popular belief, text recognition remains a challenging problem for conventional OCR systems, especially when the document image is noisy or varies in font, contrast, and saturation. Chapter 3 presents the development of a customized OCR model designed to overcome these information extraction issues. To obtain an efficient, high-quality approach, it describes an implementation that combines preprocessing and data augmentation with a deep learning model. Deep learning has provided solutions to many pattern recognition problems, and various approaches to OCR have been implemented in the literature, but classic techniques always have some drawbacks. Finally, a customized approach is defined to address them. The final OCR model is robust and requires no human intervention for preprocessing. Sequence-object detection becomes more reliable with this kind of model, because the model can be fed images containing text of different lengths. The model automatically recognizes the text in the image.

File

File name	Size
Thesis not available for consultation. Contact the author

ETD

Digital archive of the theses defended at the Università di Pisa
