Explaining machine learning models trained to predict Copernicus DEM errors in different land cover environments | Natural Hazards Research Australia

This study trains five machine learning models to correct vertical biases in the Copernicus DEM and explains them using SHapley Additive exPlanation (SHAP) values.

Research theme

Sustainable, safe and healthy natural landscapes

Publication type

Journal Article

Published date

07/2026

Author	Michael Meadows, A/Prof Karin Reinke, Simon Jones
Abstract	Machine learning models are increasingly used to correct the vertical biases (mainly due to vegetation and buildings) in global Digital Elevation Models (DEMs), for downstream applications which need "bare earth" elevations. The predictive accuracy of these models has improved significantly as more flexible model architectures are developed and new explanatory datasets produced, leading to the recent release of three model-corrected DEMs (FABDEM, DiluviumDEM and FathomDEM). However, there has been relatively little focus so far on explaining or interrogating these models, especially important in this context given their downstream impact on many other applications (including natural hazard simulations). In this study we train five separate models (by land cover environment) to correct vertical biases in the Copernicus DEM and then explain them using SHapley Additive exPlanation (SHAP) values. Comparing the models, we find significant variation in terms of the specific input variables selected and their relative importance, suggesting that an ensemble of models (specialising by land cover) is likely preferable to a general model applied everywhere. Visualising the patterns learned by the models (using SHAP dependence plots) provides further insights, building confidence in some cases (where patterns are consistent with domain knowledge and past studies) and highlighting potentially problematic variables in others (such as proxy relationships which may not apply in new application sites). Our results have implications for future DEM error prediction studies, particularly in evaluating a very wide range of potential input variables (160 candidates) drawn from topographic, multispectral, Synthetic Aperture Radar, vegetation, climate and urbanisation datasets.
Year of Publication	2026
Journal	Artificial Intelligence in Geosciences
Date Published	07/2026
DOI	https://doi.org/10.1016/j.aiig.2025.100141
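
For readers unfamiliar with the SHAP values mentioned in the abstract, the following is a minimal sketch of how an exact Shapley attribution can be computed for a single prediction. It is not the authors' implementation: the toy error model, its three feature names and its coefficients are hypothetical, and absent features are assumed to be replaced by a fixed baseline value. In practice, studies of this kind typically rely on a dedicated library and approximate methods, since exact enumeration becomes infeasible with many input variables.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample.

    Features outside a coalition are set to their baseline value
    (an assumption of this sketch, not of the study).
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_error_model(features):
    """Hypothetical DEM error model (metres); coefficients are illustrative only."""
    canopy_height, building_fraction, slope = features
    return 0.5 * canopy_height + 3.0 * building_fraction + 0.1 * canopy_height * slope


x = [20.0, 0.0, 10.0]        # hypothetical forested, sloping pixel
baseline = [0.0, 0.0, 0.0]   # hypothetical flat, bare reference pixel
phi = shapley_values(toy_error_model, x, baseline)

for name, value in zip(["canopy_height", "building_fraction", "slope"], phi):
    print(f"{name}: {value:.2f}")
```

The attributions sum to the difference between the prediction for the sample and the prediction for the baseline, which is the property that lets SHAP values be read as per-variable contributions to a predicted error.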

Related projects

Project
Correcting vertical errors in a global Digital Elevation Model to derive a bare earth terrain surface for improved flood modelling in data-scarce regions