Distribution Statistics

Alexander Liss

02/07/2009

Analysis of random values is supported with histograms, which show the distribution of a random value.

Analysis of stochastic processes, especially comparative analysis of stochastic processes with multiple parameters, could be supported with a similar kind of statistics, Distribution Statistics, which are defined here.

Distribution Statistics show the distribution of one parameter of a stochastic process in relation to another parameter, or the distribution of a parameter over time.

These statistics are useful, for example, in analysis of trading.

Distribution Statistics allow various uneven distributions to be presented in a compact and visual form.

When there are two characteristics of events A and B, and there is a set of N events and hence a set of pairs

(A1,B1), …, (AN,BN)

a histogram of “A at B” is created as follows.

First, an interval [z0,z1] is selected that contains all the values

B1, …, BN,

that is, z0 <= Bj <= z1 for every j = 1, …, N.

Second, it is divided into sub-intervals by points bi

z0 = b0 < b1 < … < bn = z1

For each of the n intervals [bi-1,bi), with i from 1 to n, one adds up the values Aj of those events whose characteristic Bj belongs to the interval [bi-1,bi). The result is ai.

Note that the very last interval has to be inclusive: [bn-1,bn].

Now, for each interval [bi-1,bi), there is a number ai, where i=1,…,n. We denote this set:

(b0)a1(b1)a2(b2)…(bn-1)an(bn)
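
The construction above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names bucket_index, distribution_sums and format_statistic are hypothetical and are not part of the definition, and the sample pairs are made up.

```python
from bisect import bisect_right


def bucket_index(b, edges):
    """Return the 0-based index i with edges[i] <= b < edges[i + 1].

    The very last interval is treated as closed, so b == edges[-1]
    falls into the last bucket.
    """
    if not edges[0] <= b <= edges[-1]:
        raise ValueError(f"value {b} lies outside [{edges[0]}, {edges[-1]}]")
    if b == edges[-1]:
        return len(edges) - 2
    return bisect_right(edges, b) - 1


def distribution_sums(pairs, edges):
    """Add up A over each bucket of B; pairs is an iterable of (A, B)."""
    sums = [0] * (len(edges) - 1)
    for a, b in pairs:
        sums[bucket_index(b, edges)] += a
    return sums


def format_statistic(edges, values):
    """Render the result in the (b0)a1(b1)...an(bn) notation."""
    parts = [f"({edges[0]})"]
    for edge, value in zip(edges[1:], values):
        parts.append(f"{value}({edge})")
    return "".join(parts)


# Made-up sample: four events given as (A, B), two buckets [0,1) and [1,2].
pairs = [(10, 0.5), (5, 1.0), (2, 1.5), (7, 2.0)]
edges = [0, 1, 2]
print(format_statistic(edges, distribution_sums(pairs, edges)))  # (0)10(1)14(2)
```

The event with B = 2.0 lands in the last bucket only because that interval is closed on the right, as noted above.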

Instead of adding up the values Aj of events whose characteristic Bj belongs to the interval [bi-1,bi), one could average them. This creates a different type of statistic with averages over “buckets”.
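
A minimal Python sketch of the averaging variant is given below. The function name distribution_averages is hypothetical, and returning None for a bucket that contains no events is an assumption, since the text does not say how empty buckets are to be treated.

```python
from bisect import bisect_right


def distribution_averages(pairs, edges):
    """Average A over each bucket of B; pairs is an iterable of (A, B).

    Buckets are [edges[i], edges[i + 1]), with the last one closed.
    Empty buckets yield None (an assumption of this sketch).
    """
    n = len(edges) - 1
    sums = [0.0] * n
    counts = [0] * n
    for a, b in pairs:
        if not edges[0] <= b <= edges[-1]:
            raise ValueError(f"value {b} lies outside the selected interval")
        # min() folds b == edges[-1] into the last (closed) bucket.
        i = min(bisect_right(edges, b) - 1, n - 1)
        sums[i] += a
        counts[i] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Made-up sample: the bucket [3,4] receives no events.
pairs = [(10, 0.5), (5, 1.0), (2, 1.5), (8, 2.0)]
print(distribution_averages(pairs, [0, 1, 2, 3, 4]))  # [10.0, 3.5, 8.0, None]
```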

When there are several statistics (a1,…,ak) for the same set of intervals [bi-1,bi), for example several statistics accumulated over one-minute time intervals, they are presented as a series:

(b0)a11,a21, … ak1(b1)a12,a22, … ak2 …(bn-1)a1n,a2n, … akn(bn)
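
The series notation can be produced mechanically. The sketch below assumes the statistics are already computed and stored per interval; the function name format_series and the sample numbers are hypothetical.

```python
def format_series(edges, rows):
    """Render (b0)a11,a21,...,ak1(b1)...(bn-1)a1n,...,akn(bn).

    rows[i] holds the k statistics of the (i + 1)-th interval.
    """
    if len(rows) != len(edges) - 1:
        raise ValueError("need exactly one row of statistics per interval")
    parts = [f"({edges[0]})"]
    for edge, row in zip(edges[1:], rows):
        parts.append(",".join(str(value) for value in row))
        parts.append(f"({edge})")
    return "".join(parts)


# Made-up sample: two statistics (a sum and a count) for two intervals.
print(format_series([0, 1, 2], [(10, 1), (14, 3)]))  # (0)10,1(1)14,3(2)
```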

When the characteristic B is the time of an event, the same procedure defines Distribution Statistics over time.

When ti is the end of the i-th time interval, with t0 = “start time”, t1 = “start time” + “time interval” and tn = “end time”, a set of statistics a1,…,ak is presented in a compact form:

(t0)a11,a21, … ak1(t1)a12,a22, … ak2 … (tn-1)a1n,a2n, … akn(tn)

To normalize the presentation, instead of values ti, values

zi = ti - “start time”

are used:

(0)a11,a21, … ak1(z1)a12,a22, … ak2 … (zn-1)a1n,a2n, … akn(zn)
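
The time points and their normalized values can be sketched in Python as follows. The names time_edges and normalize are hypothetical, times are assumed to be plain numbers (for example, seconds since midnight), and letting the last interval be shorter when the time span is not a whole multiple of the time interval is an assumption of this sketch.

```python
def time_edges(start, end, step):
    """Return t0, ..., tn with t0 = start, t1 = start + step, tn = end."""
    if step <= 0 or end <= start:
        raise ValueError("need step > 0 and end > start")
    edges = [start]
    t = start + step
    while t < end:
        edges.append(t)
        t += step
    edges.append(end)  # the last interval may be shorter than step
    return edges


def normalize(edges):
    """Replace each ti by zi = ti - start time, so the series begins at 0."""
    return [t - edges[0] for t in edges]


# Made-up sample: 09:30:00 to 09:33:00 in seconds, one-minute intervals.
edges = time_edges(34200, 34380, 60)
print(edges)             # [34200, 34260, 34320, 34380]
print(normalize(edges))  # [0, 60, 120, 180]
```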