Thesis etd-03252019-110144

Thesis type

Master's degree thesis

URN

etd-03252019-110144

Title

Extraction of Technical Information from Unusual Sources

Department

ENERGY, SYSTEMS, TERRITORY AND CONSTRUCTION ENGINEERING

Degree programme

MANAGEMENT ENGINEERING

Supervisors

supervisor Prof. Fantoni, Gualtiero
co-supervisor Dr. Chiarello, Filippo

Keywords

Database
Information Extraction
Keywords
Natural Language Processing
POS Tagging
Regular Expressions
Soft Skills
Taxonomy
Text mining
Tool

Examination session start date

02/05/2019

Availability

Not available for consultation

Release date

02/05/2089

Abstract

Nowadays, society finds itself in the so-called “Information Age”, in which the exponential growth of computing capability has enabled connectivity among compatible devices and has, moreover, led to a mass proliferation of data. Connectivity and data have the power to shape the way people live and work; for these reasons, organizations are starting to take action to collect, manage, represent, store and secure all useful data. As noted above, a huge amount of information is now available, not only in a structured and organized form but also in an unstructured form, such as documents and texts. The key activity firms now pursue is therefore understanding this body of knowledge, a task that would be impractical with a manual approach. Text Mining helps handle this unstructured nature: it is the process of extracting valuable, high-quality information from documents. This thesis uses Information Extraction techniques, whose objective is to derive structured information from unstructured data; Information Extraction is a narrower field than Text Mining. The process of Information Extraction (IE) is defined as the automated retrieval of specific information related to a given topic from one or more bodies of text; in brief, its tasks involve collecting, processing and representing data in order to present the corresponding results.
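As a minimal illustration of Information Extraction in the sense defined above, the following Python sketch uses a regular expression to turn an unstructured job-ad sentence into structured skill records. The trigger phrases, the `SkillMention` record and the `extract_skills` function are hypothetical and chosen only for illustration; they are not the method used in the thesis.

```python
import re
from dataclasses import dataclass

# Hypothetical trigger phrases (matched case-insensitively via a scoped flag),
# followed by one or more capitalized words that are captured as the skill.
SKILL_PATTERN = re.compile(
    r"\b(?i:experience (?:with|in)|knowledge of|proficiency in)\s+"
    r"([A-Z][\w+#]*(?:\s+[A-Z][\w+#]*)*)"
)


@dataclass(frozen=True)
class SkillMention:
    skill: str    # the extracted skill name
    trigger: str  # the full matched phrase, kept for traceability
    start: int    # character offset of the match in the source text


def extract_skills(text: str) -> list[SkillMention]:
    """Turn unstructured text into a list of structured SkillMention records."""
    return [
        SkillMention(skill=m.group(1), trigger=m.group(0), start=m.start())
        for m in SKILL_PATTERN.finditer(text)
    ]


ad = ("We are looking for an engineer with experience in Python and "
      "knowledge of SQL. Proficiency in Microsoft Excel is a plus.")
for mention in extract_skills(ad):
    print(mention.skill)
```

The skill capture itself is case-sensitive, so it stops at the first lowercase word ("and", "is"). This is a crude heuristic that a POS tagger would replace in a more robust pipeline.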
The methodology outlined in this thesis builds on the advantages offered by Information Extraction tools, which make online searching more reliable by collecting a large amount of data and then comparing it automatically; this also reduces bias in the results. The approach differs from one that would rely on domain experts, in this case labor market experts, to identify a list of sources. Although such a consultation could yield high-confidence results, it is much more costly and time-consuming than processing the same documents automatically with Information Extraction tools.
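The idea of reducing bias by comparing data collected from many sources can be sketched as a simple agreement filter: a term is kept only if several independent sources mention it. The criterion, the `consensus_terms` function and the `min_sources` threshold are assumptions made for illustration, not the comparison procedure used in the thesis.

```python
from collections import Counter


def consensus_terms(sources: dict[str, set[str]],
                    min_sources: int = 2) -> list[tuple[str, int]]:
    """Keep terms attested by at least `min_sources` distinct sources.

    `sources` maps a source identifier (e.g. a website) to the set of terms
    extracted from it; using sets means each source votes at most once per term.
    """
    support: Counter[str] = Counter()
    for terms in sources.values():
        support.update(terms)
    kept = [(term, n) for term, n in support.items() if n >= min_sources]
    # Most widely attested terms first; ties broken alphabetically.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


extracted = {
    "site_a": {"teamwork", "python", "communication"},
    "site_b": {"teamwork", "sql", "communication"},
    "site_c": {"teamwork", "python", "negotiation"},
}
print(consensus_terms(extracted))
# [('teamwork', 3), ('communication', 2), ('python', 2)]
```

Raising `min_sources` trades recall for confidence: fewer terms survive, but each is supported by more independent evidence.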
This information concerns not only production processes but also strategic decision-making tasks, such as Human Resources selection. The digitalization of HR Management has made communication between recruiters and job seekers easier: candidates can now access job offers online, while recruiters can access candidates' online profiles. This new way of sourcing suitable candidates enables recruiters to concentrate on the interview, which is the most important phase of recruitment and is still based on human interaction. Moreover, the job market is now represented by digital information that can be processed at large scale, which has opened the way to a computer-assisted recruitment process.
The purpose of this thesis is twofold. On the one hand, it researches the Unusual Sources that contain Technical Information, such as the knowledge and abilities that firms seek in employees; on the other hand, it researches the Hard and Soft Skills present in the selected job application websites. For the first task, the Elsevier Scopus database was used to search for accessible, free and relevant web sources; for the second, job resumes were collected and each website was then evaluated on the richness of occurrence of the desired keywords. Furthermore, the thesis presents an approach for finding the required Technical Information through linguistic awareness of word dependencies and of the relations between words, defining hyponyms and hypernyms and constructing a taxonomy of terms.
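The two analyses described above can be sketched in Python under explicit assumptions. `keyword_richness` is a hypothetical richness measure (keyword coverage plus total hits), since the exact metric is not specified here. `build_taxonomy` uses Hearst-style "such as" patterns, a standard lexico-syntactic technique swapped in for illustration, to derive hypernym/hyponym pairs. It keeps only the single word before "such as" as the hypernym; recovering full noun phrases such as "soft skills" would require POS tagging or noun-phrase chunking.

```python
import re
from collections import defaultdict


def keyword_richness(text: str, keywords: list[str]) -> tuple[float, int]:
    """Hypothetical richness measure for one website's text.

    Returns (coverage, occurrences): the fraction of desired keywords that
    appear at least once, and the total number of keyword hits
    (case-insensitive, whole-word matches).
    """
    counts = [
        len(re.findall(r"\b" + re.escape(kw) + r"\b", text, flags=re.IGNORECASE))
        for kw in keywords
    ]
    coverage = sum(1 for c in counts if c > 0) / len(keywords)
    return coverage, sum(counts)


# Hearst-style pattern: "<hypernym> such as <hyponym list>".
# Simplification: the hypernym is only the single word before "such as".
SUCH_AS = re.compile(r"\b(\w+),?\s+such as\s+([^.;:]+)")
# Separators inside the hyponym list: commas, "and", "or" (incl. Oxford comma).
LIST_SEP = re.compile(r"\s*,\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+")


def build_taxonomy(text: str) -> dict[str, set[str]]:
    """Map each hypernym to the set of hyponyms listed after 'such as'."""
    taxonomy: defaultdict[str, set[str]] = defaultdict(set)
    for match in SUCH_AS.finditer(text):
        hypernym = match.group(1).lower()
        for item in LIST_SEP.split(match.group(2)):
            if item.strip():
                taxonomy[hypernym].add(item.strip().lower())
    return dict(taxonomy)


page = ("We value soft skills such as teamwork, leadership and problem solving. "
        "Candidates should know programming languages such as Python or Java.")
print(keyword_richness(page, ["teamwork", "Python", "negotiation"]))
print(build_taxonomy(page))
```

On the sample text, `build_taxonomy` places teamwork, leadership and problem solving under "skills", and Python and Java under "languages". Richness scores computed per website can then be compared to rank the sources.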

File

File name	Size
Thesis not available for consultation. Contact the author.

ETD

Digital archive of theses defended at the University of Pisa
