ETD

Digital archive of theses defended at the Università di Pisa

Thesis etd-03302026-171514


Thesis type
Master's degree thesis
URN
etd-03302026-171514
Title
Design and Secure Deployment of Hierarchical Infrastructures: The ODA Case Study
Department
INFORMATION ENGINEERING
Degree programme
CYBERSECURITY
Keywords
  • ansible
  • apache kafka
  • cloud computing
  • data replication
  • data streaming
  • edge computing
  • hierarchical architecture
  • infrastructure as code
  • kafka mirrormaker
  • mitre att&ck
  • security-as-code
  • security-by-design
  • stride
  • threat modeling
  • zero trust
Exam session start date
15/04/2026
Availability
Full
Abstract (English)
The exponential growth of Industry 4.0 and the Internet of Things has catalyzed a paradigm shift from centralized cloud infrastructures to distributed edge computing. Modern data streaming applications, such as the Observable Data Access (ODA) platform developed within the NEST project, require hierarchical Edge-to-Cloud architectures to scale dependably across geographically dispersed locations. However, distributing these microservice-based applications across untrusted transport networks and Zero Trust environments introduces critical security vulnerabilities. Transitioning from a localized, implicitly trusted deployment to a wide-area ecosystem exposes the infrastructure to severe threats, ranging from network eavesdropping to advanced local administrative compromises.

To address these challenges without altering the underlying microservice logic, this thesis proposes a comprehensive Security-as-Code methodology. Operating under the constraint that the baseline ODA application must be treated as a trusted black box, the research delegates security enforcement entirely to the infrastructure and transport layers. This strategy leverages the Infrastructure as Code (IaC) paradigm, utilizing Ansible not merely as a standard configuration provisioning tool, but as a continuous, declarative security orchestrator capable of autonomously maintaining the desired security state across the entire distributed fleet.

The architectural foundation relies on Apache Kafka and Kafka MirrorMaker 2 to establish the data replication pipelines. The study formalizes and evaluates two distinct replication topologies: the Child-Push architecture, characterized by decentralized replication governance where edge nodes actively push data to the core, and the Root-Pull architecture, which centralizes governance by having the core actively pull data from the edge. To systematically evaluate the security posture of these paradigms, a macro-architectural Threat Modeling phase is conducted using the STRIDE and MITRE ATT&CK frameworks. By profiling three specific adversarial personas (the Network Adversary, the Malicious Observer, and the Compromised Administrator), the analysis maps the vulnerabilities inherent to each topology and defines a strategic baseline of actionable mitigations.
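
The difference between the two topologies can be illustrated with a small Python sketch. It is not taken from the thesis: the function name, the cluster aliases, and the hostnames are hypothetical, and the only real names are the MirrorMaker 2 property keys (`clusters`, `<alias>.bootstrap.servers`, `<source>-><target>.enabled`, `<source>-><target>.topics`). Under these assumptions, the replication flow is edge to core in both cases; what changes is which site hosts the MirrorMaker 2 process and therefore which side opens the connection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationPlan:
    runs_on: str     # alias of the cluster whose site hosts the MirrorMaker 2 process
    properties: str  # contents of a MirrorMaker 2 properties file


def mm2_plan(topology: str, edge: str, core: str,
             edge_bootstrap: str, core_bootstrap: str) -> ReplicationPlan:
    """Build an edge -> core replication plan for either topology (illustrative only)."""
    if topology == "child-push":
        runs_on = edge  # the edge site opens the outbound connection to the core
    elif topology == "root-pull":
        runs_on = core  # the core site opens the connection toward the edge
    else:
        raise ValueError(f"unknown topology: {topology}")
    lines = [
        f"clusters = {edge}, {core}",
        f"{edge}.bootstrap.servers = {edge_bootstrap}",
        f"{core}.bootstrap.servers = {core_bootstrap}",
        f"{edge}->{core}.enabled = true",
        f"{edge}->{core}.topics = .*",
    ]
    return ReplicationPlan(runs_on=runs_on, properties="\n".join(lines))
```

Calling `mm2_plan("child-push", "edge1", "core", "edge1.example:9093", "core.example:9093")` and the same call with `"root-pull"` yields identical properties text; only `runs_on` differs, which is the governance distinction the abstract draws.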

These theoretical mitigations are subsequently translated into an automated Proof of Concept. The declarative Ansible repository is engineered to dynamically scaffold the infrastructure, enforcing security at multiple levels. To mitigate the Network Adversary, the pipeline automates the generation and distribution of an internal Public Key Infrastructure, establishing mutual TLS (mTLS) for all inter-node communications. To protect against the Malicious Observer, sensitive deployment artifacts and cryptographic materials are isolated at the host level through POSIX file system hardening and Ansible Vault encryption. Furthermore, to contain the blast radius of a Compromised Administrator, the orchestrator dynamically injects fine-grained Kafka Access Control Lists (ACLs) and network bandwidth quotas directly into the application layer. Crucially, the deployment features an event-driven, autonomous self-healing mechanism capable of detecting and reverting unauthorized configuration tampering without human intervention.
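
As a rough illustration of the drift-reconciliation idea applied to POSIX file permissions, the following Python sketch compares the actual mode of each protected file against a declared desired state and reverts any deviation. It is a simplified stand-in, not the Ansible-based mechanism built in the thesis: the function names and the desired-state dictionary are assumptions introduced here, and a POSIX host is assumed.

```python
import os
import stat


def find_drift(desired: dict[str, int]) -> dict[str, int]:
    """Return {path: actual_mode} for every file whose mode differs from the desired one."""
    drift = {}
    for path, mode in desired.items():
        actual = stat.S_IMODE(os.stat(path).st_mode)  # keep only the permission bits
        if actual != mode:
            drift[path] = actual
    return drift


def reconcile(desired: dict[str, int]) -> list[str]:
    """Revert drifted files to their desired mode and return the paths that were fixed."""
    fixed = []
    for path in find_drift(desired):
        os.chmod(path, desired[path])
        fixed.append(path)
    return fixed
```

For example, with a hypothetical desired state of `{"/path/to/broker.key": 0o600}`, a file that has been loosened to `0o644` would be reported by `find_drift` and restored by `reconcile`. An event-driven deployment would trigger such a check on a change notification rather than calling it manually.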

To verify the efficacy of these declarative defenses, both the Child-Push and Root-Pull architectures were provisioned on a live, VPN-routed testbed and actively subjected to simulated cyber-attacks. The empirical outcomes conclusively demonstrated the resilience of the applied countermeasures. The mTLS enforcement successfully dropped unauthorized network probing, the POSIX boundaries restricted unprivileged local credential harvesting, and the dynamic access controls combined with autonomous drift reconciliation effectively mitigated data poisoning and application-level Denial of Service attacks. Alongside the security validation, an empirical performance benchmark quantified the physical overhead of the security layers. The evaluation revealed that the computational footprint remains highly sustainable, requiring minimal CPU usage during sustained operations on resource-constrained edge devices.

By contrasting the operational realities and benchmarking insights of the two replication architectures, the research concludes that the optimal architectural choice is strictly contextual. The Child-Push architecture emerged as the recommended solution for highly decentralized deployments operating over standard public networks, as it natively traverses Network Address Translation gateways and distributes the computational burden of encryption across the edge. Conversely, the Root-Pull architecture was identified as highly suitable for tightly integrated enterprise environments operating over dedicated virtual private networks. By centralizing the replication engines, it adheres to Kafka best practices and provides unparalleled administrative governance over the entire pipeline, albeit introducing significant orchestration complexity at scale. Ultimately, this thesis demonstrates that by leveraging a declarative Security-as-Code pipeline, hierarchical data streaming topologies can be securely deployed and autonomously maintained without sacrificing data confidentiality, integrity, or availability.
Abstract (Italian)
Files