
Initially we wrote the library to help us answer the question of which change might be causing impact to our customers' experiences. Out of the billions of time series metrics we have, we knew there were roughly 10,000 likely candidates (for this initial use case), and we wanted to reduce that set as far as possible and either (1) find /the/ candidate causing the problem or (2) produce a short list of things for the humans to look into. To get to the point where we could ensure a high likelihood of apropos correlations, we needed to do some work on the signals first.

First we detect whether any of the candidate signals were born or died in the interesting time period, and we use those time points to narrow the window we'll pass to the correlation functions. We can also detect changepoints in the time series and apply similar logic. Once we've determined the best window bounds for each candidate signal, we use Pearson and Spearman correlation functions to get a score for the pair of signals -- the initial signal that started the inquiry and the candidate signal -- over the determined time window.
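To make that concrete, here's a minimal sketch of the windowing + scoring idea (not our actual library code; the function names and the simple "active window = first to last non-NaN sample" heuristic are assumptions, and real changepoint detection would further trim the bounds):

```python
# Sketch: trim each candidate to its "alive" window, then score it
# against the base signal with Pearson and Spearman correlation.
import numpy as np
from scipy.stats import pearsonr, spearmanr

def active_window(series):
    """Return (start, end) indices where the signal exists (non-NaN).

    A birth shows up as leading NaNs, a death as trailing NaNs.
    """
    alive = np.flatnonzero(~np.isnan(series))
    if alive.size == 0:
        return None
    return alive[0], alive[-1] + 1

def score_candidate(base, candidate):
    """Correlate the base signal with a candidate over the candidate's
    active window. Returns (pearson, spearman) or None if unscorable."""
    w = active_window(candidate)
    if w is None:
        return None
    lo, hi = w
    a, b = base[lo:hi], candidate[lo:hi]
    # Too few points or a constant signal makes correlation undefined.
    if len(a) < 3 or np.std(a) == 0 or np.std(b) == 0:
        return None
    p, _ = pearsonr(a, b)
    s, _ = spearmanr(a, b)
    return p, s
```

In practice you'd run `score_candidate` across all ~10,000 candidates and rank by score to produce the short list.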

The code is about 98% data preparation, signal analysis, and window determination and about 2% correlation work.

(I've tried to summarize quite a bit, let me know if you'd like clarifications or have other questions.)


