| Thesis etd-06222017-112501 | 
  
    Thesis type
  
  
    Master's thesis
  
    Author
  
  
    RICATTO, MATTIA  
  
    URN
  
  
    etd-06222017-112501
  
    Title
  
  
    Towards Effective Use of Massive Cancer Genomic Data with Cluster Computing Frameworks
  
    Department
  
  
    INFORMATION ENGINEERING
  
    Degree programme
  
  
    BIOMEDICAL ENGINEERING
  
    Supervisors
  
  
    supervisor Dr. Bechini, Alessio
  
    Keywords
  
- apache spark
- biological data mining
- distributed algorithms
    Exam session start date
  
  
    14/07/2017
  
    Availability
  
  
    Full
  
    Abstract
  
  "Too much information, not enough knowledge" is one of the maxims of these first two decades of the 21th century. Thanks to the technological advances, an unprecedented amounts of data are now available, and these data collections become so large and complex - this is why they are called Big Data - that traditional data processing application software is inadequate to deal with them. Biomedical sciences are already massively contributing to the Big Data revolution, due to advances in genome sequencing technology and digital imaging, growth of clinical data warehouses, increased role of the patient in managing his own health information. In this work, thanks to Apache Spark - a fast and general engine for large-scale data processing, with built-in modules for streaming, SQL, machine learning and graph processing - it has been possible to work with The Cancer Genome Atlas data - a project that aims to catalogue genetic mutations responsible for cancer, using genome sequencing and bioinformatics - in order to develop a scalable and reproducible method for data preparation and data investigation Succesively, such method has been applied in order to investigate Copy Number Variations data with classification algorithms tailored for distribute computing on Apache Spark. The results are encouraging and underline the effectiveness of data mining on biomedical big data.
    File
  
  | File name | Size | 
|---|---|
| Tesi_intera.pdf | 5.33 MB | 
 
		