Thesis etd-11262022-022333

Thesis type

Master's thesis (laurea magistrale)

Author

VITI, ELENA

URN

etd-11262022-022333

Title

Methods for integrating survey data and big non-survey data

Department

ECONOMIA E MANAGEMENT

Degree programme

ECONOMICS

Supervisors

supervisor Prof. Giusti, Caterina
supervisor Prof. Münnich, Ralph
co-supervisor Dr. Straubinger, Johannes

Keywords

big data
calibration
common variables
data integration
mass imputation
regression estimator

Defence session start date

16/12/2022

Availability

Thesis not available for consultation

Abstract

The purpose of this thesis is to investigate methods for integrating data from different surveys. Data integration is a broad topic associated with many different statistical techniques. This thesis addresses the topic systematically by first proposing integration techniques for probability samples and then integration techniques between a probability sample and big data. First, the conditions that determine the choice of technique, namely the sample type and the available information, are presented. Then a suitable statistical technique is selected and applied for survey integration.
For a more in-depth analysis, the selected statistical technique is applied to different models, in the case of integration between two probability samples, or to different types of big data, in the case of integration between a probability sample and big data. The estimates obtained in each case study are then examined, with particular regard to the relationship between the technique applied and the different models or different types of big data.
The thesis is organized as follows. First, the dataset used is presented. The type of sample and the information available in each sample, which form the basis for applying the methodologies, are selected from this dataset, so in-depth knowledge of it is essential.
Subsequently, the approaches used for integrating probability samples are discussed. In the macro approach, the samples are combined to obtain a more efficient estimator of the parameter of interest. In the micro approach, the aim is to create a single synthetic dataset containing the information available in both samples and then use it for estimation. The methods are applied to different types of information available in the samples and to different sample sizes.
Next, the statistical techniques used for integrating a probability sample and big data are discussed. A few words are spent on the general characteristics of big data to explain how it was possible to obtain big data starting from the dataset used for the analysis. The proposed method is applied to different types of big data.
For every case, simulations are implemented and the results are analyzed. Finally, general conclusions are drawn and directions for future research are indicated.

File

File name	Size
Thesis not available for consultation. Contact the author

ETD

Digital archive of theses defended at the Università di Pisa
