Nikolaus Umlauf, Universität Innsbruck
Flexible Distributional Regression Models for Very Large Datasets
Over the last decades, interest has grown in distributional regression models, which allow all parameters of the response distribution, such as location, scale and shape, and thereby the entire conditional distribution, to be modeled as functions of covariates. In particular, structured additive distributional regression makes it possible to specify different types of effects (linear, non-linear or interaction effects) on every distribution parameter, which yields a very flexible and generic framework suited to many complex real-data problems. However, estimating distributional regression models on datasets with more than $10^6$ observations is difficult. We propose a new method that builds on ideas from stochastic gradient descent algorithms and copes easily with much larger datasets. Moreover, the algorithm performs automatic variable selection, and in most cases its performance matches or exceeds that of other implementations for distributional regression. An implementation is provided in the R package bamlss (https://cran.r-project.org/package=bamlss). We illustrate the usefulness of the approach by implementing a state-of-the-art prediction model for flash occurrence and counts in complex terrain using a neural network distributional regression model.
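To make the stochastic gradient idea concrete, the following is a minimal sketch and not the algorithm implemented in bamlss. It assumes a Gaussian response with a linear predictor for the mean and for the log standard deviation, and it updates both coefficient vectors from the gradient of the negative log-likelihood on small random batches. The function names `simulate` and `fit_sgd`, the learning rate and the batch size are illustrative assumptions; variable selection and non-linear effects are deliberately left out.

```python
import math
import random


def simulate(n, beta, gamma, seed=1):
    """Draw (x, y) pairs with y ~ N(mu, sigma^2), where
    mu = beta[0] + beta[1] * x and log(sigma) = gamma[0] + gamma[1] * x."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        mu = beta[0] + beta[1] * x
        sigma = math.exp(gamma[0] + gamma[1] * x)
        data.append((x, rng.gauss(mu, sigma)))
    return data


def fit_sgd(data, epochs=10, batch_size=100, lr=0.02, seed=2):
    """Mini-batch SGD on the Gaussian negative log-likelihood.

    Only one batch is touched per update, so the cost of a step does
    not depend on the total number of observations.
    """
    rng = random.Random(seed)
    beta = [0.0, 0.0]   # coefficients of the location predictor
    gamma = [0.0, 0.0]  # coefficients of the log-scale predictor
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            g_beta = [0.0, 0.0]
            g_gamma = [0.0, 0.0]
            for x, y in batch:
                mu = beta[0] + beta[1] * x
                sigma2 = math.exp(2.0 * (gamma[0] + gamma[1] * x))
                resid = y - mu
                # Derivatives of log(sigma) + resid^2 / (2 sigma^2)
                d_mu = -resid / sigma2                  # w.r.t. mu
                d_eta = 1.0 - resid * resid / sigma2    # w.r.t. log(sigma)
                g_beta[0] += d_mu
                g_beta[1] += d_mu * x
                g_gamma[0] += d_eta
                g_gamma[1] += d_eta * x
            m = len(batch)
            for j in range(2):
                beta[j] -= lr * g_beta[j] / m
                gamma[j] -= lr * g_gamma[j] / m
    return beta, gamma
```

Calling `fit_sgd(simulate(20000, (1.0, 2.0), (-1.0, 0.5)))` should return coefficient estimates close to the values used for simulation. The point of the sketch is only that location and scale are estimated jointly from batches, which is what lets this kind of update scale to very large datasets.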