Thesis etd-11262022-022333
Thesis type
Master's thesis
Author
VITI, ELENA
URN
etd-11262022-022333
Title
Methods for integrating survey data and big non-survey data
Department
ECONOMIA E MANAGEMENT
Degree programme
ECONOMICS
Supervisors
supervisor Prof. Giusti, Caterina
supervisor Prof. Münnich, Ralph
co-supervisor Dr. Straubinger, Johannes
Keywords
- big data
- calibration
- common variables
- data integration
- mass imputation
- regression estimator
Examination session start date
16/12/2022
Availability
Thesis not available for consultation
Abstract
The purpose of this thesis is to investigate methods for integrating data from different surveys. Data integration is a broad topic associated with many different statistical techniques. This thesis addresses the topic systematically, first proposing techniques for integrating probability samples and then techniques for integrating a probability sample with big data. First, the conditions that determine the choice of technique, namely the sample type and the available information, are presented. A suitable statistical technique is then selected and applied to integrate the surveys.
For a more in-depth analysis, the selected statistical technique is applied to different models when two probability samples are integrated, or to different types of big data when a probability sample is integrated with big data. Finally, the estimates obtained in each case study are examined, with particular regard to the relationship between the technique applied and the different models or types of big data.
The thesis is organized as follows. First, the dataset used is presented. The type of sample and the information available in each sample, which form the basis for applying the methodologies, are drawn from this dataset, so in-depth knowledge of it is essential.
Subsequently, the approaches used for integrating probability samples are discussed. In the macro approach, the samples are combined to obtain a more efficient estimator of the parameter of interest. In the micro approach, the aim is to create a single synthetic dataset containing the information available in both samples and then use it for estimation. The methods are applied to different types of information available in the samples and to different sample sizes.
Next, the statistical techniques used for integrating a probability sample with big data are discussed. The general characteristics of big data are briefly described to explain how big data could be obtained from the dataset used for the analysis. The proposed method is applied to different types of big data.
For every case, simulations are implemented and the results are analyzed. Finally, general conclusions are drawn and directions for future research are indicated.
File
File name | Size |
---|---|
Thesis not available for consultation. |