A Unified Batch-and-Streaming Data Architecture for Machine Learning Applications Incorporating Predictive Fault Detection and Validation
Abstract
Machine learning (ML) products increasingly depend on data platforms that must simultaneously support high-throughput batch analytics, low-latency streaming decisions, and continuously evolving schemas, features, and model requirements. Yet many enterprises still operate split architectures in which batch ETL, real-time pipelines, and ML lifecycle tooling are assembled as loosely coupled systems, which amplifies operational risk, data quality regressions, and silent ML failures. This paper proposes UBSDA (Unified Batch-and-Streaming Data Architecture), a lakehouse-centered reference architecture that unifies batch and streaming ingestion, storage, transformation, and feature publication while embedding predictive fault detection and validation as first-class capabilities. UBSDA introduces (i) a single source-of-truth data layer serving both offline training and online inference, (ii) contract-driven schema governance with support for schema evolution, (iii) multi-stage validation gates that combine statistical checks, constraint-aware learning, and drift monitoring, and (iv) a fault prediction service trained on pipeline telemetry to anticipate failures before service-level objectives are violated. We detail the architecture, formalize validation and fault-risk scoring, and present an evaluation methodology showing how unified storage combined with proactive detection reduces duplicated transformations, shortens recovery loops, and improves ML reliability across domains.
How to Cite This Article
Sai Kiran Pullela (2024). A Unified Batch-and-Streaming Data Architecture for Machine Learning Applications Incorporating Predictive Fault Detection and Validation. Global Multidisciplinary Perspectives Journal (GMPJ), 1(4), 69-73. DOI: https://doi.org/10.54660/GMPJ.2024.1.4.69-73