> Or is it lacking scalability in practice? Only speaking from my own little per...

NeedMoreTime4Me · on Jan 8, 2022

Do you have an anecdotal guess on the scalability barrier maybe? Like does it take too long with more than 10,000 data points having 100 features? Just to get a feel.

shakow · on Jan 8, 2022

Please don't quote me on that, as it was academic work in a given language and a given library and might not be representative of the whole ecosystem.

But in a nutshell, on OK-ish CPUs (Xeons a few generations old), we started seeing problems past a few thousand points with a few dozen features.

And it wasn't only training that was slow, inference was too: because we used the whole sampled chain of the weight distributions' parameters, memory consumption was a sight to behold, and inference time quickly went through the roof when subsampling was not used.

And all that was on standard NNs, so no complexity added by e.g. convolution layers.
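
To illustrate the inference cost described above, here is a minimal sketch, assuming the chain is stored as a list of sampled weight sets for a small one-hidden-layer network. The chain here is just random numbers standing in for real sampler output, and `mlp_forward`, `predictive_mean` and the `thin` parameter are hypothetical names, not taken from the library used in that work. Both memory and prediction time grow linearly with the number of samples kept, which is why subsampling (thinning) the chain helps.

```python
import numpy as np

def mlp_forward(X, W1, b1, W2, b2):
    # One hidden tanh layer, scalar output per row of X.
    return np.tanh(X @ W1 + b1) @ W2 + b2

def predictive_mean(X, chain, thin=1):
    # Average the network output over every `thin`-th weight sample.
    # Cost is one full forward pass per kept sample.
    kept = chain[::thin]
    preds = [mlp_forward(X, *theta) for theta in kept]
    return np.mean(preds, axis=0), len(kept)

rng = np.random.default_rng(0)
d, h, S = 20, 16, 2000  # features, hidden units, chain length (all made up)

# Stand-in for a sampled chain: NOT real posterior samples.
chain = [
    (rng.standard_normal((d, h)), rng.standard_normal(h),
     rng.standard_normal((h, 1)), rng.standard_normal(1))
    for _ in range(S)
]

X = rng.standard_normal((50, d))
full_mean, n_full = predictive_mean(X, chain)           # 2000 forward passes
thin_mean, n_thin = predictive_mean(X, chain, thin=20)  # 100 forward passes
```

With a real chain, thinning trades some Monte Carlo accuracy for a proportional cut in memory and inference time.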

rsfern · on Jan 9, 2022

The main bottleneck in GP models is the inversion of an NxN covariance matrix, so training with the most straightforward algorithm has cubic time complexity in N (and quadratic memory complexity). Around 10k instances is what I’ve seen as the limit of tractability.
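
A rough sketch of where that cost comes from, using plain NumPy rather than any particular GP library. The RBF kernel, its fixed hyperparameters, the toy sine data and the function names are all assumptions made for illustration. The Cholesky factorization of the N x N matrix is the cubic step, and the matrix itself is the quadratic memory cost.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    # Squared Euclidean distances between all row pairs of A and B.
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def gp_posterior_mean(X, y, X_new, noise=1e-2):
    n = X.shape[0]
    K = rbf_kernel(X, X) + noise * np.eye(n)  # N x N matrix: O(N^2) memory
    L = np.linalg.cholesky(K)                 # O(N^3) time
    # Solve K alpha = y using the Cholesky factor instead of an explicit inverse.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return rbf_kernel(X_new, X) @ alpha

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
X_new = np.linspace(-2, 2, 5)[:, None]
mean = gp_posterior_mean(X, y, X_new)

# Memory for the covariance matrix alone at N = 10,000 in float64.
cov_bytes = 10_000 ** 2 * 8  # 800,000,000 bytes, i.e. 800 MB
```

Hyperparameter optimization repeats the factorization at every step, so the practical cost is a multiple of a single solve.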

The input dimensionality doesn’t necessarily matter since it’s a kernel method, but if you have many features and want to do feature selection or optimize parameters, things can really stack up.

There are scalable approximate inference algorithms with pretty good library support (GPflow, GPyTorch, etc.), but they don't seem to be widely known, and there are definitely tradeoffs to consider among the different methods.
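
As one concrete example of such an approximation, here is a minimal NumPy sketch of the subset-of-regressors (inducing point) predictive mean. It is hand-rolled for illustration and is not how GPflow or GPyTorch implement their sparse models; the kernel, the evenly spaced inducing points `Z`, the jitter value and the toy data are all assumptions. With M inducing points the dominant cost is O(N M^2) instead of O(N^3), and the N x N matrix is never formed.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    # Squared Euclidean distances between all row pairs of A and B.
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def sor_posterior_mean(X, y, X_new, Z, noise=1e-2, jitter=1e-6):
    # Subset-of-regressors mean:
    #   K_*z (K_zx K_xz + noise * K_zz)^{-1} K_zx y
    Kzz = rbf_kernel(Z, Z)                 # M x M
    Kzx = rbf_kernel(Z, X)                 # M x N
    A = Kzx @ Kzx.T + noise * Kzz + jitter * np.eye(len(Z))  # O(N M^2)
    w = np.linalg.solve(A, Kzx @ y)        # O(M^3)
    return rbf_kernel(X_new, Z) @ w

rng = np.random.default_rng(0)
N, M = 5000, 15
X = rng.uniform(-3, 3, size=(N, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(N)
Z = np.linspace(-3, 3, M)[:, None]         # hypothetical inducing locations
X_new = np.linspace(-2, 2, 5)[:, None]
approx_mean = sor_posterior_mean(X, y, X_new, Z)
```

The tradeoff shows up in the choice of M and of the inducing locations: too few, or badly placed, and the approximation degrades, particularly in its uncertainty estimates, which this sketch does not compute.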