Fault detection in district heating (DH) substations is critical for energy efficiency and reliability, but it is hampered by scarce fault labels, low-frequency data, privacy concerns, and battery-constrained gateways. We propose a novel hybrid semi-supervised federated domain adaptation architecture for fault detection in DH, built on a one-class variational autoencoder (VAE) that learns from heterogeneous sensor streams across 434 distributed substations. First, we perform cross-network unsupervised pre-training on multi-source data from two independent real-world DH networks, fusing their return temperature dynamics into a robust shared manifold. Second, we use maintenance metadata to admit only verified-normal clients to each round of model fine-tuning. Third, we drastically reduce uplink costs by compressing each client's weight delta with 10% top-k sparsification, and we demonstrate that our pipeline enables robust few-shot fine-tuning with only 20% of the normal operational data while retaining high detection performance. With this training strategy, our method achieves F1 and G-mean scores of up to 97% and an AUC ≥ 99% on real-world DH data. To our knowledge, this is the first work to study cross-domain data fusion for fault detection in the DH field, with the aim of enabling effective, scalable, and energy-efficient monitoring of substations.
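As an illustration of the uplink compression step, the following is a minimal sketch of top-k sparsification applied to a client's weight delta, assuming the delta has been flattened into a single vector and that the top 10% of entries are selected globally by magnitude. The function names `topk_sparsify` and `densify` are hypothetical, and details such as per-layer selection or error feedback are not stated here and are left out.

```python
import numpy as np

def topk_sparsify(delta, fraction=0.10):
    """Keep only the largest-magnitude `fraction` of entries of a weight delta.

    Returns (indices, values, size): the sparse payload a client would upload.
    Assumption: selection is global over the flattened delta.
    """
    flat = np.asarray(delta, dtype=np.float64).ravel()
    k = max(1, int(round(fraction * flat.size)))
    # argpartition puts the k largest-magnitude indices in the last k slots
    idx = np.argpartition(np.abs(flat), flat.size - k)[-k:]
    idx = np.sort(idx)
    return idx, flat[idx], flat.size

def densify(idx, values, size):
    """Server side: rebuild a dense delta, with zeros for dropped entries."""
    dense = np.zeros(size)
    dense[idx] = values
    return dense

# Toy delta with 10 entries; fraction=0.2 keeps the 2 largest magnitudes.
delta = np.array([0.5, -3.0, 0.1, 2.0, -0.2, 0.0, 0.05, -0.01, 0.3, 1.0])
idx, values, size = topk_sparsify(delta, fraction=0.2)
restored = densify(idx, values, size)
```

In this sketch the client sends only the index and value arrays, so the payload shrinks roughly in proportion to the kept fraction (plus the overhead of the indices).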