ETD

Digital archive of theses defended at the University of Pisa

Thesis etd-11182019-154858


Thesis type
Master's thesis
Author
SHEK, ABHI
URN
etd-11182019-154858
Title
Information extraction from documents through the application of Computer Vision and Deep Learning technologies
Department
INFORMATICA
Degree program
DATA SCIENCE AND BUSINESS INFORMATICS
Supervisors
supervisor Attardi, Giuseppe
Keywords
  • Tensorflow
  • RNN
  • Python
  • OpenCV
  • OCR
  • Keras
  • CTC
  • CNN
  • Tesseract
Defense session start date
06/12/2019
Availability
Not available
Release date
06/12/2089
Abstract
The goal of this thesis project is to develop a solution for extracting information from documents by applying Deep Learning technologies. The solution will allow the automation of business processes triggered by the information contained in documents, as typically happens in the banking and insurance sectors. Text recognition in images is a pivotal problem in machine learning. The first two chapters of this thesis describe the digitization procedure and its related issues, along with some traditional computer vision methods for Optical Character Recognition (OCR). Contrary to popular belief, text recognition with conventional OCR systems remains a challenging problem, especially when the document image is noisy or contains diverse fonts and varying contrast and saturation. Chapter 3 presents the development of a customized OCR model to overcome these information extraction issues. To obtain an efficient and high-quality approach, it describes an implementation of a deep learning model combined with preprocessing and data augmentation. Deep learning has provided solutions to many pattern recognition problems, and various approaches to OCR have been implemented in the literature, but classic techniques always have some drawbacks. Finally, a customized approach is defined to address them. The final OCR model is robust and does not need human intervention for preprocessing. Sequence-object detection becomes more reliable with this kind of model, as the model can be fed images containing text of different lengths and will automatically recognize the text in the image.
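The keywords list CTC, and the abstract notes that the model accepts text of different lengths. As an illustration only, and not the thesis's actual implementation, the following minimal sketch shows the standard greedy CTC decoding rule: take the best class per time step, merge consecutive repeats, and drop the blank symbol. The function name, the blank index of 0, and the toy alphabet are assumptions made for this example.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical convention)


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], alphabet: str) -> str:
    """Collapse per-frame class scores into a string using the CTC rule.

    Class 0 is the blank; class i (i >= 1) maps to alphabet[i - 1].
    """
    # Pick the highest-scoring class for each time step.
    best_path = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]

    decoded: List[str] = []
    previous = BLANK
    for index in best_path:
        # Keep a symbol only if it is not blank and differs from the previous frame.
        if index != BLANK and index != previous:
            decoded.append(alphabet[index - 1])
        previous = index
    return "".join(decoded)


# Toy scores over the classes (blank, 'a', 'b') for five time steps.
frames = [
    [0.1, 0.8, 0.1],
    [0.1, 0.7, 0.2],
    [0.9, 0.05, 0.05],
    [0.2, 0.1, 0.7],
    [0.1, 0.1, 0.8],
]
print(ctc_greedy_decode(frames, "ab"))
```

Because the number of output characters is decided by the collapse step rather than fixed in advance, the same network output shape can yield strings of different lengths, which is one common way a recognizer handles variable-length text.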
File