algorithm - Combining daily Welford computed variance into monthly


I'm using Welford's method to compute a running variance and standard deviation, as described many times on Stack Overflow and in John D. Cook's excellent blog post.

I store the timestamp, count, sum, and the running calculation of "sk" and stdev in a database table, one row per day.
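For reference, here is a minimal sketch of the kind of per-day accumulator described above. The class name DailyStats and its attribute names are hypothetical, chosen only for illustration; the update step is the standard Welford recurrence, not code taken from the question.

```python
import math


class DailyStats:
    """Streaming accumulator for one day (hypothetical name and layout)."""

    def __init__(self):
        self.count = 0
        self.total = 0.0   # plain running sum, stored as "sum" in the table
        self.mean = 0.0    # running mean used by the Welford update
        self.sk = 0.0      # running sum of squared deviations from the mean

    def push(self, x):
        """Fold one sample into the running statistics."""
        self.count += 1
        self.total += x
        delta = x - self.mean
        self.mean += delta / self.count
        # Uses the old mean (via delta) and the new mean, as in Welford's method.
        self.sk += delta * (x - self.mean)

    def stddev(self):
        """Sample standard deviation; 0.0 when fewer than two samples."""
        if self.count < 2:
            return 0.0
        return math.sqrt(self.sk / (self.count - 1))
```

At the end of a day, count, total, sk, and stddev() would be the values written to the daily row.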

I would like to combine, or roll up, the daily computed count, sum, and sk values to get a monthly standard deviation.

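John D. Cook's blog has another post that provides an algorithm to combine two "RunningStats" objects into one (see the operator+ method). This works for combining 2 days into 1, and I could use it to iterate through the days and combine all of the days in a month. However, unlike calculating the daily stdev, where I have a large number of samples that must be dealt with in a streaming fashion, here I have access to all of the daily data, and I would like a single formula that combines all of the days in the month at once. That would lend itself to the creation of a database view.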
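To make the pairwise approach concrete, here is a rough sketch of merging two daily summaries and folding that merge over a month. It is not a transcription of the operator+ method from the blog post; it uses the well-known pairwise update for the sum of squared deviations, and the function names combine and rollup_pairwise, as well as the (count, sum, sk) tuple layout, are assumptions made for this example.

```python
from functools import reduce


def combine(a, b):
    """Merge two (count, sum, sk) summaries into a single summary."""
    n_a, sum_a, sk_a = a
    n_b, sum_b, sk_b = b
    # An empty summary contributes nothing, so return the other one unchanged.
    if n_a == 0:
        return b
    if n_b == 0:
        return a
    n = n_a + n_b
    # Difference between the two means.
    delta = sum_b / n_b - sum_a / n_a
    # Each side's sk is taken about its own mean; the last term accounts
    # for the distance between those two means.
    sk = sk_a + sk_b + delta * delta * n_a * n_b / n
    return (n, sum_a + sum_b, sk)


def rollup_pairwise(days):
    """Fold combine() over an iterable of daily (count, sum, sk) tuples."""
    return reduce(combine, days, (0, 0.0, 0.0))
```

The monthly sample variance would then be sk / (count - 1) of the folded result. This works, but it is inherently iterative, which is why it does not map neatly onto a view.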

It does not appear that summing the sk values and dividing by the total monthly count - 1 produces an accurate variance.

example data:

date, count, sum, sk, stddev
1-jun-15, 60, 514, 1556.733336, 5.14
2-jun-15, 51, 455, 1523.686274, 5.52
3-jun-15, 61, 556, 1494.196722, 4.99
...

Let x1, ..., xn be the values for 1 day. Then, as I understand it, sk is defined (mathematically; the rolling implementation is different) as follows.

mk = (sum_{i=1}^n xi) / n     [mk = sum / count]
sk = sum_{i=1}^n (xi - mk)^2.

The problem with summing the sk column is that each value is computed with respect to a different mean, so the overall variance will be an underestimate. Instead, you should have a term like

sum_{i=1}^n (xi - (mk + delta))^2, 

which we can rewrite in terms of existing quantities.

  sum_{i=1}^n (xi - (mk + delta))^2
    = sum_{i=1}^n (xi - mk - delta)^2
    = sum_{i=1}^n ((xi - mk)^2 - 2 (xi - mk) delta + delta^2)
    = sk + n delta^2.

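The cross term vanishes because sum_{i=1}^n (xi - mk) = 0 by the definition of mk. Here delta is the monthly mean minus the daily mean, so the monthly sum of squared deviations is the sum over all days of (sk + n delta^2), and dividing that by the total monthly count - 1 gives the monthly variance. Once again, I do not warrant numerical stability.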
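As a sanity check, here is a minimal sketch of that one-shot combination, written in Python rather than SQL. The function name monthly_stddev and the (count, sum, sk) tuple layout are assumptions for illustration, and the sketch assumes every day has at least one sample and the month has at least two.

```python
import math


def monthly_stddev(days):
    """Sample standard deviation for a month of (count, sum, sk) rows."""
    total_n = sum(n for n, _, _ in days)
    total_sum = sum(s for _, s, _ in days)
    monthly_mean = total_sum / total_n
    # For each day add sk + n * delta^2, where delta is the monthly mean
    # minus that day's mean (sum / count).
    m2 = sum(sk + n * (monthly_mean - s / n) ** 2 for n, s, sk in days)
    return math.sqrt(m2 / (total_n - 1))


# The three rows shown in the example data; the rows elided by "..." in the
# question are not included, so this is not the real June figure.
example_days = [
    (60, 514, 1556.733336),
    (51, 455, 1523.686274),
    (61, 556, 1494.196722),
]
print(monthly_stddev(example_days))
```

Because the result depends only on the total count, the total sum, and a per-row expression that can be summed, the same calculation should be expressible as a grouped aggregate over the daily table, with the monthly mean computed first from the summed count and sum.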

