algorithm - Combining daily Welford computed variance into monthly
I'm using Welford's method to compute running variance and standard deviation, as described many times on Stack Overflow and in John D. Cook's excellent blog post.
I store the timestamp, count, sum, and a running calculation of "sk" and stdev in a database table per day.
I would like to combine, or roll up, the daily computed count, sum, and sk values to get a monthly standard deviation.
John D. Cook's blog has another post that provides an algorithm to combine two "RunningStats" into one (see the operator+ method). This works for combining 2 days into 1. I could use this to iterate through the days and combine all the days in the month. However, unlike calculating the daily stdev, where I have a large number of samples that must be dealt with in a streaming fashion, I have access to all the daily data, and would like a single formula to combine all the days in the month at once. That would lend itself to the creation of a database view.
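For reference, the pairwise approach can be sketched as a fold over the daily rows. This is a minimal Python sketch, assuming each row is a (count, sum, sk) tuple as stored in the table. The function name merge is hypothetical, and the update is the standard pairwise merge of sums of squared deviations, which I assume is what the operator+ method does for the variance term.

```python
from functools import reduce


def merge(a, b):
    """Merge two (count, sum, sk) summaries into one (hypothetical helper)."""
    na, sa, ska = a
    nb, sb, skb = b
    # An empty summary contributes nothing.
    if na == 0:
        return b
    if nb == 0:
        return a
    n = na + nb
    # Difference between the two means.
    delta = sb / nb - sa / na
    # Sums of squared deviations add, plus a correction for the differing means.
    sk = ska + skb + delta * delta * na * nb / n
    return (n, sa + sb, sk)


# The three example days listed under "Example data".
rows = [
    (60, 514, 1556.733336),
    (51, 455, 1523.686274),
    (61, 556, 1494.196722),
]

n, total, sk = reduce(merge, rows)
monthly_variance = sk / (n - 1)
print(n, total, monthly_variance)
```

This works, but it has to walk the days one pair at a time, which is awkward to express as a view.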
It does not appear that simply summing the sk values and dividing by the total monthly count - 1 produces an accurate variance.
example data:
date, count, sum, sk, stddev
1-jun-15, 60, 514, 1556.733336, 5.14
2-jun-15, 51, 455, 1523.686274, 5.52
3-jun-15, 61, 556, 1494.196722, 4.99
...
Let x1, ..., xn be the values for one day. Then, as I understand it, sk is defined (mathematically; the rolling implementation is different) as follows.
mk = (sum_{i=1}^n xi) / n [mk = sum / count]
sk = sum_{i=1}^n (xi - mk)^2.
The problem with summing the sk column is that each value is computed with respect to a different mean, so the overall variance will be an underestimate. Instead, each day should contribute a term like
sum_{i=1}^n (xi - (mk + delta))^2,
which we can rewrite in terms of the existing quantities.
sum_{i=1}^n (xi - (mk + delta))^2
= sum_{i=1}^n (xi - mk - delta)^2
= sum_{i=1}^n ((xi - mk)^2 - 2 (xi - mk) delta + delta^2)
= sk + n delta^2.
The cross term vanishes because sum_{i=1}^n (xi - mk) = 0 by the definition of mk.
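Here delta is the monthly mean minus the daily mean, so the monthly variance is the sum of (sk + n delta^2) over all days in the month, divided by the total monthly count - 1. Once again, I do not warrant numerical stability.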
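The formula above can be sketched as a single pass over the daily rows. This is a minimal Python sketch under stated assumptions, not a definitive implementation: the function name combined_variance and the (count, sum, sk) tuple layout are hypothetical, and the rows are the three example days from the question.

```python
import math


def combined_variance(days):
    """Sample variance of all values, given per-day (count, sum, sk) rows.

    Hypothetical helper: each day contributes sk + n * delta^2, where
    delta is the monthly mean minus that day's mean.
    """
    days = list(days)
    n_total = sum(n for n, _, _ in days)
    if n_total < 2:
        raise ValueError("need at least two samples for a sample variance")
    monthly_mean = sum(s for _, s, _ in days) / n_total

    sk_total = 0.0
    for n, s, sk in days:
        if n == 0:
            continue  # an empty day contributes nothing
        delta = monthly_mean - s / n
        sk_total += sk + n * delta * delta
    return sk_total / (n_total - 1)


if __name__ == "__main__":
    rows = [
        (60, 514, 1556.733336),
        (51, 455, 1523.686274),
        (61, 556, 1494.196722),
    ]
    var = combined_variance(rows)
    print(var, math.sqrt(var))
```

Because the per-day term needs only that day's row plus the monthly totals of count and sum, the same expression should map onto a view that computes the totals first and then aggregates once over the daily rows.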