This study aims to design a high-fidelity predictive model for Autism Spectrum Disorder (ASD) that combines multi-modal behavioural, developmental, and interactional data to support early diagnosis and intervention planning. The methodology uses a novel multi-module deep learning pipeline spanning temporal modelling, multi-modal fusion (MMF), and causal-inference-based learning. These components are designed with the heterogeneous traits of ASD in mind. Five specialized modules are introduced: (1) Dynamic Time Warping Auto-Encoder (DTW-AE): a DTW-embedded LSTM autoencoder that dynamically aligns and compresses longitudinal developmental features, capturing individualized developmental trajectories; (2) Cross-Modal Graph Convolutional Network (CM-GCN): a graph convolutional network that jointly learns inter-modality relationships and fuses the behavioural modalities of eye gaze, facial dynamics, and prosody; (3) LSCE-CL: a contrastive-learning-based module for extracting latent representations of social reciprocity from unstructured video/audio data; (4) BSTN: a Bayesian Structural Time-Series Network for estimating the causal impact of interventions on behavioural outcomes and, in turn, conducting counterfactual simulations; (5) AAM-RIS: an adaptive attention-based module for real-time scoring of social engagement signals within downstream diagnostic classification. The data inputs are developmental logs, behavioural sensor data, and interaction videos recorded by clinicians. The integrated pipeline improves on the baselines, with diagnostic accuracy of 93-95%, AUC-ROC of 0.95-0.97, and early ASD detection (>3 years) accuracy of 88-90%. Furthermore, DTW-AE increases temporal representation accuracy by ~15%, CM-GCN improves classification accuracy by ~12%, and BSTN reduces intervention forecasting error (RMSE) by ~15%.
The study contributes neural alignment of developmental time series, real-time graph fusion of behavioural signals, and counterfactual modelling for ASD prediction. Future work will scale the pipeline toward systems for personalized therapy recommendation and real-time clinical deployment for neurodevelopmental monitoring.
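The temporal alignment performed by the DTW-AE module builds on dynamic time warping. As an illustration only, the following minimal sketch computes the classic DTW distance between two one-dimensional sequences. It is not the DTW-embedded LSTM autoencoder described above; the function name `dtw_distance` and the example trajectories are hypothetical and chosen purely for demonstration.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences.

    Illustrative sketch only: uses absolute difference as the local cost
    and no warping-window constraint.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its current point
                cost[i][j - 1],      # b advances, a repeats its current point
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


# Hypothetical developmental scores: the second trajectory reaches the same
# levels but lingers one extra time step at the middle level.
child_a = [1.0, 2.0, 3.0]
child_b = [1.0, 2.0, 2.0, 3.0]
print(dtw_distance(child_a, child_b))
```

Because DTW allows one point to align with several points in the other sequence, two trajectories that pass through the same levels at different paces yield a distance of zero, whereas a pointwise comparison would require equal lengths and would penalize the timing shift.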
