Streaming statistics
T'ai Chi
25th March 2006, 07:13 AM
I wrote this today
http://www.statisticool.com/streamstat.htm
All well-known stuff.
My question is at the bottom of the page. Does anyone know the answer?
Seems fruitful, especially for Internet, genome, and satellite data.
Complexity
25th March 2006, 08:26 AM
Why would I want to read anything that you wrote, you loathsome troll?
Loathsome troll? Please be more polite.
Iamme
25th March 2006, 05:36 PM
What other statistics can we make streaming, you ask?
Well? How about the statistic of having to go to the bathroom, causing p-streaming to the power of number 1? :)
Zep
25th March 2006, 07:24 PM
If it's all well-known stuff, why are we bothering to rehash it here?
Angus McPresley
26th March 2006, 03:32 AM
I actually blogged about something very similar to this a while back (http://markandmarjorie.blogspot.com/archives/2004_10_10_markandmarjorie_archive.html#1097590023 16158723). I can't quite figure out if we're talking about the same thing...
homer
26th March 2006, 03:37 AM
I just had to visit your link, didn't I? My brain hurts.
T'ai Chi
26th March 2006, 05:16 AM
I actually blogged about something very similar to this a while back (http://markandmarjorie.blogspot.com/archives/2004_10_10_markandmarjorie_archive.html#1097590023 16158723). I can't quite figure out if we're talking about the same thing...
Yes, that's exactly the same, and I also do the same with the standard deviation.
I'm wondering if such things can be done for many other statistics.
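For readers who have not followed the links, here is a minimal sketch of what a "streaming" mean and standard deviation can look like. It assumes the simplest textbook approach of keeping only the count, the sum, and the sum of squares. The class name `RunningSums` is made up for illustration, and this is not necessarily the method used in either linked article.

```python
import math

class RunningSums:
    """Keeps only n, sum(x), and sum(x^2); no samples are stored."""

    def __init__(self):
        self.n = 0
        self.total = 0.0
        self.total_sq = 0.0

    def add(self, x):
        # Constant-time, constant-memory update for each new sample.
        self.n += 1
        self.total += x
        self.total_sq += x * x

    def mean(self):
        return self.total / self.n

    def stdev(self):
        # Sample standard deviation (n - 1 in the denominator).
        var = (self.total_sq - self.total ** 2 / self.n) / (self.n - 1)
        # Rounding can push a tiny variance slightly below zero.
        return math.sqrt(max(var, 0.0))

stats = RunningSums()
for x in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.add(x)
print(stats.mean(), stats.stdev())
```

This form is easy to read but can lose precision when the values are large relative to their spread, so treat it as an illustration rather than a recommendation.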
T'ai Chi
26th March 2006, 05:17 AM
(double post)
Ed
26th March 2006, 06:10 AM
Why would I want to read anything that you wrote, you loathsome troll?
Loathsome troll? Please be more polite.
That was polite.
Removed mod code - do not use the mod code again, that is for mod team use only.
T'ai Chi
26th March 2006, 06:44 AM
*sigh*
Ed
26th March 2006, 08:21 AM
That was polite.
Removed mod code - do not use the mod code again, that is for mod team use only.
Really?
Is that in the rules? Which one?
ImaginalDisc
26th March 2006, 08:40 AM
Really?
Is that in the rules? Which one?
Come on, Ed, that's like saying "But the police cars aren't only for the police." The mod tag is an enforcement tool.
ClusterBoy
26th March 2006, 11:22 AM
My question is at the bottom of the page. Does anyone know the answer?
Seems fruitful, especially for Internet, genome, and satellite data.
Specifically:
Anything that can be written as a function f(x)g(n), where f and g are polynomials
(and n is your number of samples)
In general:
Anything that can be considered as a Markov chain across your run, e.g. maximum value of an RNG over n iterations
Not the median (or indeed any non-trivial percentile). Not the mode.
Use WinBUGS. Some stats can be stored in summary mode (which uses what you term 'streaming'), some only in sample mode (where you have to store the whole lot). It's well known, and kinda useful, but hardly groundbreaking stuff...
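One way to read the "Markov chain" remark above: a statistic can be streamed when the next state depends only on the current fixed-size state and the newest sample. The sketch below illustrates that idea for a running maximum and a running count and sum. The helper names (`stream_stat`, `update_max`, `update_count_sum`) are hypothetical and have nothing to do with WinBUGS.

```python
def stream_stat(update, state, samples):
    """Fold samples into a fixed-size state, one sample at a time."""
    for x in samples:
        state = update(state, x)
    return state

def update_max(state, x):
    # State is the largest value seen so far (None before any sample).
    return x if state is None or x > state else state

def update_count_sum(state, x):
    # State is the pair (n, sum of x); the mean is sum / n at any point.
    n, total = state
    return (n + 1, total + x)

samples = [3, 1, 4, 1, 5, 9, 2, 6]
# iter(...) stands in for a stream that can only be read once.
running_max = stream_stat(update_max, None, iter(samples))
n, total = stream_stat(update_count_sum, (0, 0), iter(samples))
print(running_max, n, total)
```

The exact median does not fit this pattern in general, because no fixed-size state determines how a new sample shifts the middle value, which matches the exclusion noted in the post.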
jj
26th March 2006, 07:15 PM
Somebody who has more time than I do should put some effort into discussing the arithmetic accuracy that this kind of old thing needs to work decently in the real world.
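As a small illustration of the accuracy concern, the sketch below compares two streaming variance updates in ordinary double precision: the sum-of-squares form, and the update usually attributed to Welford. The function names and the data set are chosen for illustration only. The data has a small spread on top of a large offset, which is where the sum-of-squares form tends to break down.

```python
def naive_variance(samples):
    """Sample variance from running n, sum(x), and sum(x^2)."""
    n, total, total_sq = 0, 0.0, 0.0
    for x in samples:
        n += 1
        total += x
        total_sq += x * x
    # Subtracts two huge, nearly equal numbers: prone to cancellation.
    return (total_sq - total * total / n) / (n - 1)

def welford_variance(samples):
    """Sample variance from a running mean and a running sum of squared deviations."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in samples:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / (n - 1)

# Deviations from the mean are -6, -3, 3, 6, so the exact sample variance is 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
naive = naive_variance(data)
stable = welford_variance(data)
print(naive, stable)
```

On this input the second function returns 30.0, while the first is noticeably off, because the squares are near 1e18 and doubles cannot resolve differences of a few units at that magnitude.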
T'ai Chi
26th March 2006, 07:28 PM
Specifically:
Anything that can be written as a function f(x)g(n), where f and g are polynomials
(and n is your number of samples)
In general:
Anything that can be considered as a Markov chain across your run, e.g. maximum value of an RNG over n iterations
Not the median (or indeed any non-trivial percentile). Not the mode.
Use WinBUGS. Some stats can be stored in summary mode (which uses what you term 'streaming'), some only in sample mode (where you have to store the whole lot). It's well known, and kinda useful, but hardly groundbreaking stuff...
Thanks Cluster, I'll check it out.
Angus McPresley
27th March 2006, 01:39 AM
Somebody who has more time than I do should put some effort into discussing the arithmetic accuracy that this kind of old thing needs to work decently in the real world.
Well, when extreme accuracy is needed, you can always take my favorite approach: implement a Fraction class (with all the necessary arithmetic operators) that represents each value as a numerator and a denominator (reducing them as necessary), until you ask for a decimal result. Your results will then be EXACTLY accurate, to any arbitrary number of decimal places.
This trick doesn't work for every situation -- if there are roots involved, then you need to convert them to decimal (or create a Root class that does the same sort of thing :). But it should work for the averaging scenario...
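A rough sketch of the fraction idea for the averaging case, using Python's standard-library `fractions.Fraction` as a stand-in for a hand-written Fraction class. The function name `exact_running_mean` is invented for this example, and the inputs are given as decimal strings so that no binary rounding happens on the way in.

```python
from fractions import Fraction

def exact_running_mean(samples):
    """Yield the exact running mean after each sample, as a reduced fraction."""
    n = 0
    mean = Fraction(0)
    for x in samples:
        n += 1
        # Same incremental update as with floats, but with no rounding.
        mean += (Fraction(x) - mean) / n
        yield mean

means = list(exact_running_mean(["0.1", "0.2", "0.3"]))
print(means[-1], float(means[-1]))
```

The final mean is the fraction 1/5, and it is converted to a decimal only at the end. The cost is that numerators and denominators can grow without bound on long streams, so this trades speed and memory for exactness.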
T'ai Chi
4th May 2007, 09:20 PM
I updated the article somewhat:
http://www.statisticool.com/streamstat.htm
© 2001-2009, James Randi Educational Foundation. All Rights Reserved.