**Distribution Statistics**

Alexander Liss

Analysis of random values is supported by
histograms, which show the distribution of a random value.

Analysis of stochastic processes, especially
comparative analysis of stochastic processes with multiple parameters, could be
supported with a similar statistic, a Distribution Statistics, which is
defined here.

A Distribution Statistics shows the distribution of one
parameter of the stochastic process in relation to another parameter, or it
shows the distribution of a parameter over time.

These statistics are useful, for example, in
analysis of trading.

A Distribution Statistics allows
presentation of various uneven distributions in a compact and visual form.

When events have two characteristics, A and B,
and there is a set of N events, and hence a set of pairs

(A_{1},B_{1}), …, (A_{N},B_{N}),

a histogram of “A at B” is created as follows.

First, an interval [z_{0},z_{1}] is selected that
contains all the values

B_{1}, …, B_{N},

that is, z_{0} <= B_{j} <= z_{1} for every j from 1 to N.

Second, it is divided into sub-intervals
using points b_{i}:

z_{0} = b_{0} < b_{1} < … < b_{n} = z_{1}

For each of the n intervals [b_{i-1},b_{i}),
with i from 1 to n, one adds up the values A_{j} of the events whose
characteristic B_{j} belongs to the interval [b_{i-1},b_{i}).
The result is a_{i}.

Note that the very last interval has to be
inclusive, [b_{n-1},b_{n}], so that events with B_{j} = z_{1} are counted.
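
The bucketing procedure above can be sketched in Python. This is a minimal illustration under stated assumptions, not a definitive implementation: the function name `distribution_sums` and its argument names are hypothetical, and the points b_{0} < … < b_{n} are assumed to be supplied as a sorted list `edges`.

```python
from bisect import bisect_right


def distribution_sums(a_values, b_values, edges):
    """Sum A_j over the buckets [b_{i-1}, b_i); the last bucket is [b_{n-1}, b_n].

    `edges` is the sorted list b_0 < b_1 < ... < b_n (an assumed input format).
    """
    n = len(edges) - 1
    sums = [0.0] * n
    for a, b in zip(a_values, b_values):
        if not edges[0] <= b <= edges[-1]:
            raise ValueError(f"B value {b} lies outside [{edges[0]}, {edges[-1]}]")
        # bisect_right counts the edges that are <= b, so subtracting 1 gives
        # the bucket whose left end is <= b and whose right end is > b.
        # The min() folds b == b_n into the last bucket, making it inclusive.
        i = min(bisect_right(edges, b) - 1, n - 1)
        sums[i] += a
    return sums


# Hypothetical data: A could be traded volume and B the trade price.
example = distribution_sums([5, 3, 2, 4], [0, 10, 15, 30], [0, 10, 20, 30])
```

In this example the event with B = 10 falls into the second bucket, because the first bucket [0,10) excludes its right end, while the event with B = 30 is kept in the last bucket.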

Now, for each interval [b_{i-1},b_{i}),
there is a number a_{i}, where i=1,…,n. We denote this set:

(b_{0})a_{1}(b_{1})a_{2}(b_{2})…(b_{n-1})a_{n}(b_{n})

Instead of adding up the values A_{j}
of the events whose characteristic B_{j} belongs to the interval [b_{i-1},b_{i}), one
could average them. This creates a different type of statistic with averages
over “buckets”.
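
The averaging variant can be sketched the same way. The name `distribution_averages` is hypothetical, and returning `None` for a bucket that contains no events is an assumption of this sketch; the text does not say how empty buckets should be treated.

```python
from bisect import bisect_right


def distribution_averages(a_values, b_values, edges):
    """Average A_j over the buckets [b_{i-1}, b_i); the last bucket is inclusive.

    Buckets without events yield None (an assumption, not part of the text).
    """
    n = len(edges) - 1
    sums = [0.0] * n
    counts = [0] * n
    for a, b in zip(a_values, b_values):
        if not edges[0] <= b <= edges[-1]:
            raise ValueError(f"B value {b} lies outside [{edges[0]}, {edges[-1]}]")
        # Same bucket lookup as for sums: left-closed buckets, last one closed.
        i = min(bisect_right(edges, b) - 1, n - 1)
        sums[i] += a
        counts[i] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


averages = distribution_averages([4, 6, 9], [1, 2, 25], [0, 10, 20, 30])
```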

When there are a few statistics (a^{1},…,a^{k})
for the same set of intervals [b_{i-1},b_{i}), for example
a few statistics accumulated over one-minute time intervals, they are
presented as a series:

(b_{0})a^{1}_{1},a^{2}_{1},…,a^{k}_{1}(b_{1})a^{1}_{2},a^{2}_{2},…,a^{k}_{2}…(b_{n-1})a^{1}_{n},a^{2}_{n},…,a^{k}_{n}(b_{n})
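
As an illustration of this notation, the sketch below renders k statistics over a common set of intervals as a single string. The function name `format_series` and the choice of the `g` number format are assumptions made for the example; they are not prescribed by the definition.

```python
def format_series(edges, statistics):
    """Render (b_0)a^1_1,...,a^k_1(b_1)...(b_{n-1})a^1_n,...,a^k_n(b_n).

    `edges` is b_0..b_n; `statistics` is a list of k lists, each of length n.
    """
    n = len(edges) - 1
    if any(len(s) != n for s in statistics):
        raise ValueError("every statistic needs one value per interval")
    parts = []
    for i in range(n):
        parts.append(f"({edges[i]:g})")
        # All k values for interval i, separated by commas.
        parts.append(",".join(f"{s[i]:g}" for s in statistics))
    parts.append(f"({edges[-1]:g})")
    return "".join(parts)


# Two hypothetical statistics (for instance a sum and an event count).
line = format_series([0, 10, 20, 30], [[5, 5, 4], [1, 2, 1]])
```

With a single statistic (k = 1) the same function produces the simpler form (b_{0})a_{1}(b_{1})…a_{n}(b_{n}).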

When characteristic B is the time of an event, the
same procedure defines a Distribution Statistics over time.

When t_{0} = “start time” and t_{i} is the end of the i-th time interval,
starting with t_{1} = “start time” + “time interval” and ending with t_{n}
= “end time”, one presents a set of statistics a^{1},…,a^{k}
in a compact form:

(t_{0})a^{1}_{1},a^{2}_{1},…,a^{k}_{1}(t_{1})a^{1}_{2},a^{2}_{2},…,a^{k}_{2}…(t_{n-1})a^{1}_{n},a^{2}_{n},…,a^{k}_{n}(t_{n})

To normalize the presentation, instead of the values t_{i},
the values

z_{i} = t_{i} - “start time”

are used:

(0)a^{1}_{1},a^{2}_{1},…,a^{k}_{1}(z_{1})a^{1}_{2},a^{2}_{2},…,a^{k}_{2}…(z_{n-1})a^{1}_{n},a^{2}_{n},…,a^{k}_{n}(z_{n})
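
A possible sketch of the time-based case follows. The names `time_offsets` and `statistics_over_time` are hypothetical. Times are assumed to be plain numbers (for example, seconds since midnight), the two statistics computed (a sum and an event count) are chosen only for illustration, and shortening the last interval when the time span is not a multiple of the interval length is an assumption of the sketch.

```python
import math
from bisect import bisect_right


def time_offsets(start, end, step):
    """Return z_0 = 0, z_1, ..., z_n = end - start, the normalized interval ends."""
    span = end - start
    if step <= 0 or span <= 0:
        raise ValueError("need step > 0 and end > start")
    n = math.ceil(span / step)
    # The last interval is cut at `end` if the span is not a multiple of step.
    return [min(i * step, span) for i in range(n + 1)]


def statistics_over_time(times, values, start, end, step):
    """Return (offsets, sums, counts) for the events bucketed by time."""
    offsets = time_offsets(start, end, step)
    n = len(offsets) - 1
    sums = [0.0] * n
    counts = [0] * n
    for t, v in zip(times, values):
        z = t - start  # normalized time z = t - "start time"
        if not 0 <= z <= offsets[-1]:
            raise ValueError(f"time {t} lies outside [{start}, {end}]")
        # Left-closed intervals; the last one also includes the end time.
        i = min(bisect_right(offsets, z) - 1, n - 1)
        sums[i] += v
        counts[i] += 1
    return offsets, sums, counts


# Hypothetical events between 09:30:00 and 09:33:00, in one-minute intervals.
offsets, sums, counts = statistics_over_time(
    [34200, 34230, 34290, 34380], [100, 50, 25, 10],
    start=34200, end=34380, step=60,
)
```

Here `offsets` plays the role of 0, z_{1}, …, z_{n}, while `sums` and `counts` are two statistics a^{1} and a^{2} over the same intervals.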