Distribution Statistics


Alexander Liss






Analysis of random values is supported by histograms, which show the distribution of a random value.

Analysis of stochastic processes, especially comparative analysis of stochastic processes with multiple parameters, can be supported by a similar statistic, a Distribution Statistics, which is defined here.

A Distribution Statistics shows the distribution of one parameter of the stochastic process in relation to another parameter, or the distribution of a parameter over time.

These statistics are useful, for example, in the analysis of trading.

     A Distribution Statistics allows presentation of various uneven distributions in a compact and visual form.


     When events have two characteristics, A and B, and there is a set of N events and hence a set of pairs


(A1,B1), …, (AN,BN)


a histogram of “A at B” is created as follows.

     First, an interval [z0,z1] is selected, which contains the values


B1, …, BN,


z0 <= Bj <= z1 for every j = 1, …, N


     Second, it is divided into sub-intervals by points bi


z0 = b0 < b1 < … < bn = z1


     For each of the n intervals [bi-1,bi), with i from 1 to n, one adds up the values Aj of the events whose characteristic Bj belongs to the interval [bi-1,bi). The result is ai.

     Note that the very last interval has to be inclusive, [bn-1,bn], so that an event with Bj equal to z1 is counted.
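
     The summation step can be sketched in Python as follows. This is only an illustration under assumptions made here: the function name distribution_sum, the use of the standard bisect module, and the sample numbers are not part of the definition above.

```python
from bisect import bisect_right


def distribution_sum(a_values, b_values, boundaries):
    """Sum A_j over each interval [b_{i-1}, b_i); the last interval is closed.

    boundaries is the increasing list b_0 < b_1 < ... < b_n (hypothetical name).
    Returns the list [a_1, ..., a_n].
    """
    n = len(boundaries) - 1
    totals = [0.0] * n
    for a, b in zip(a_values, b_values):
        if not boundaries[0] <= b <= boundaries[-1]:
            raise ValueError(f"B value {b} lies outside [z0, z1]")
        # bisect_right counts the boundaries that are <= b, so a value in
        # [b_{i-1}, b_i) gets the zero-based bucket index i - 1.
        i = bisect_right(boundaries, b) - 1
        # A value equal to b_n would get index n; it belongs to the last,
        # inclusive interval [b_{n-1}, b_n].
        i = min(i, n - 1)
        totals[i] += a
    return totals


# Invented sample: A could be traded volume and B the trade price.
sample = distribution_sum([1, 2, 3, 4, 5], [0, 9.5, 10, 25, 30], [0, 10, 20, 30])
print(sample)
```

     In this sample the value B = 10 falls into the second interval, because the first one is half-open, while B = 30 falls into the last interval, because that one is inclusive.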

     Now, for each interval [bi-1,bi), there is a number ai, where i=1,…,n. We denote this set:


(a1, …, an)

     Instead of adding up the values Aj of the events whose characteristic Bj belongs to the interval [bi-1,bi), one could average them. This creates a different type of statistic, with averages over “buckets”.
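
     The averaging variant can be sketched in the same way. The function name distribution_mean and the choice to return None for a bucket without events are assumptions of this sketch; the text above does not say how empty buckets should be treated.

```python
from bisect import bisect_right


def distribution_mean(a_values, b_values, boundaries):
    """Average A_j over each interval [b_{i-1}, b_i); the last interval is closed."""
    n = len(boundaries) - 1
    totals = [0.0] * n
    counts = [0] * n
    for a, b in zip(a_values, b_values):
        if not boundaries[0] <= b <= boundaries[-1]:
            raise ValueError(f"B value {b} lies outside [z0, z1]")
        # Same bucket rule as for sums; b == b_n goes to the last bucket.
        i = min(bisect_right(boundaries, b) - 1, n - 1)
        totals[i] += a
        counts[i] += 1
    # Assumption: a bucket with no events has no average, reported as None.
    return [t / c if c else None for t, c in zip(totals, counts)]


print(distribution_mean([2, 4, 6], [1, 2, 25], [0, 10, 20, 30]))
```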

     When there are a few statistics (a1,…,ak) for the same set of intervals [bi-1,bi), for example a few statistics accumulated over one-minute time intervals, they are presented as a series, in which aji is the value of statistic aj on the i-th interval:


(b0)a11,a21, … ak1(b1)a12,a22, … ak2 … (bn-1)a1n,a2n, … akn(bn)
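
     A possible way to produce such a series as a text string is sketched below. The function name format_series and the exact punctuation (commas between statistics, no spaces) are assumptions made for the illustration.

```python
def format_series(boundaries, statistics):
    """Interleave boundaries with the values of k statistics.

    boundaries: b_0, ..., b_n.
    statistics: k lists of n values each; statistics[j][i] is the value of
    statistic j + 1 on interval i + 1 (zero-based Python indices).
    """
    n = len(boundaries) - 1
    if any(len(s) != n for s in statistics):
        raise ValueError("every statistic needs exactly one value per interval")
    parts = []
    for i in range(n):
        parts.append(f"({boundaries[i]})")
        # All k statistics for interval i, in the order a_1, ..., a_k.
        parts.append(",".join(str(s[i]) for s in statistics))
    parts.append(f"({boundaries[n]})")
    return "".join(parts)


# Two invented statistics over two intervals.
print(format_series([0, 10, 20], [[3, 5], [1.5, 2.5]]))
```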


     When characteristic B is the time of an event, the same procedure defines a Distribution Statistics over time.

When ti is the end of the i-th time interval, with t0 = “start time”, t1 = “start time” + “time interval”, and so on up to tn = “end time”, one presents a set of statistics a1,…,ak in a compact form:


(t0)a11,a21, … ak1(t1)a12,a22, … ak2 … (tn-1)a1n,a2n, … akn(tn)


To normalize the presentation, instead of values ti, values


zi = ti – “start time”


are used:


(0)a11,a21, … ak1(z1)a12,a22, … ak2 … (zn-1)a1n,a2n, … akn(zn)
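
The normalized presentation over time can be sketched for a single statistic as follows. The sketch assumes integer timestamps (for example, seconds since midnight), equal time intervals, and an end time that is a whole number of intervals after the start time; the function name normalized_time_series and the sample data are invented for the illustration.

```python
def normalized_time_series(times, values, start, end, step):
    """Sum values per time interval and label boundaries with z_i = t_i - start.

    Assumptions of this sketch: integer timestamps, equal intervals of
    length step, and end - start being a positive multiple of step.
    """
    span = end - start
    if step <= 0 or span <= 0 or span % step != 0:
        raise ValueError("end - start must be a positive multiple of step")
    n = span // step
    totals = [0] * n
    for t, v in zip(times, values):
        z = t - start  # normalized time of the event
        if not 0 <= z <= span:
            raise ValueError(f"time {t} lies outside [start, end]")
        # Intervals are [z_{i-1}, z_i); the last one also includes z_n.
        i = min(z // step, n - 1)
        totals[i] += v
    parts = [f"({i * step}){totals[i]}" for i in range(n)]
    parts.append(f"({span})")
    return "".join(parts)


# Invented events between 09:30:00 (34200 s) and 09:33:00 (34380 s),
# accumulated over one-minute intervals.
print(normalized_time_series(
    [34200, 34230, 34260, 34380], [100, 50, 200, 25],
    start=34200, end=34380, step=60,
))
```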