View Full Version : Flipped out (coins and probability)
T'ai Chi
5th May 2007, 06:01 PM
Check out this Excel file I made:
http://www.statisticool.com/CoinFlip.xls
It keeps track of the %Heads over time, the frequentist version.
(press F9 to refresh)
I'm trying to make a similar visual of probability based on subjective reasoning but haven't had any luck. Anyone have any ideas, or have seen any graph for subjective probability before?
It would help greatly for an educational tool.
Thabiguy
6th May 2007, 03:07 AM
I'm trying to make a similar visual of probability based on subjective reasoning but haven't had any luck. Anyone have any ideas, or have seen any graph for subjective probability before?
I assume that by subjective probability you mean Bayesian probability.
The value would vary depending on how much you want to admit you know about the source of the tosses.
1. If you know that the source is a pseudo-random number generator that eventually cycles through all possible values in a pseudo-random fashion, but you don't know the details of the actual algorithm used, then the Bayesian probability of the next toss being heads is always 1/2, regardless of previous tosses. You get the same result if you know that the source is not a pseudo-random number generator but some actual entropy (for example timing of the user's request to generate the next toss).
2. If you know that the source is a deterministic pseudo-random number generator and you know the actual algorithm, but don't know the initial state, then you will be able to use the previous tosses in Bayesian inference about the internal state of the generator. The exact results will vary depending on the specific algorithm and the generated tosses, but generally, you may expect the probability to remain more or less 1/2 until you generate enough tosses (about as many as the number of bits in the internal state) and afterwards it will always be either 1 or 0 (you narrowed down the possible internal state to a single value and the process ceases to be random).
3. Probably of most interest to you (but also the least realistic) is the situation where you know nothing about the source of the tosses and your only a priori knowledge is the sequence of tosses that have already been generated. In that case, the Bayesian probability of the next toss being heads is given by Laplace's rule of succession and will be p = (number of heads so far+1)/(number of tosses so far+2). This will be initially different from frequentist probability estimate but will approach it with increasing number of tosses.
Zep
6th May 2007, 06:04 AM
I extended the spreadsheet out to 1000 entries, and noted that the end point was not always very close to 50%. Sometimes it was over, but more frequently it was under. Why?
1) Because the random number generator on Excel is not truly random (http://support.microsoft.com/kb/86523).
2) The random number so generated is in the range: 0 <= n < 1 Note the inequality on the end - the generated number is always some digits of accuracy less than 1, but it may conceivably be zero. That is, there is a bias in the range - a very small one, I agree, but it is there.
The problem is that the mechanism you are using to then turn that random number into a 1/0 coin-flip is to round the random number up or down to the nearest integer. And since there is a slight bias towards zero in the randomness, I would expect a slight tendency in the long run to "undershoot" the ones, and "overshoot" the zeroes. Which reality tends to confirm!
Given that the algorithm for RAND() is public knowledge, and some arithmetical cleverness, there should be no reason why you cannot build an unbiased coin-flip spreadsheet.
Good luck!
Jekyll
6th May 2007, 10:37 AM
I extended the spreadsheet out to 1000 entries, and noted that the end point was not always very close to 50%. Sometimes it was over, but more frequently it was under. Why?
2) The random number so generated is in the range: 0 <= n < 1 Note the inequality on the end - the generated number is always some digits of accuracy less than 1, but it may conceivably be zero. That is, there is a bias in the range - a very small one, I agree, but it is there.
Are you sure this isn't selection bias on your part?
[0,0.5) (i.e. 0<=n<0.5) is exactly the same size as [0.5,1) even if we allow for binary rounding errors.
Jarom
6th May 2007, 11:55 AM
I extended the spreadsheet out to 1000 entries, and noted that the end point was not always very close to 50%. Sometimes it was over, but more frequently it was under. Why?
1) Because the random number generator on Excel is not truly random (http://support.microsoft.com/kb/86523).
2) The random number so generated is in the range: 0 <= n < 1 Note the inequality on the end - the generated number is always some digits of accuracy less than 1, but it may conceivably be zero. That is, there is a bias in the range - a very small one, I agree, but it is there.
The problem is that the mechanism you are using to then turn that random number into a 1/0 coin-flip is to round the random number up or down to the nearest integer. And since there is a slight bias towards zero in the randomness, I would expect a slight tendency in the long run to "undershoot" the ones, and "overshoot" the zeroes. Which reality tends to confirm!
Given that the algorithm for RAND() is public knowledge, and some arithmetical cleverness, there should be no reason why you cannot build an unbiased coin-flip spreadsheet.
Good luck!Neither your reason 1 nor your reason 2 explains the observation you made in the first sentence. Reason 1 fails to explain it because although the Excel random number function is not random, it is close enough to be indistinguishable from random by the test you've done. Reason 2 fails to explain it because it is irrelevant except in the case that one of the thousand generated values turned out to be 0. This is highly improbable, and even so only biases that one result.
The correct explanation is reason 3: Your lack of training with statistics causes your expectations of what the results "should" look like to be mistaken.
It is improbable (approximately (1/sqrt(2000*pi)) ~= 0.0126 probability) to get 500 heads in 1000 flips of an unbiased coin. On average, the square of the amount by which the number of heads differs from 500 will be 1000. There should be no bias towards a low number of heads or a high number of heads, and any claimed bias would need to be justified by looking at a very large number of 1000-toss trials and recording the results.
T'ai Chi
6th May 2007, 01:32 PM
"I extended the spreadsheet out to 1000 entries, and noted that the end point was not always very close to 50%. Sometimes it was over, but more frequently it was under. Why?"
P(Heads) = .5 is not required for this exercise. Whatever the frequency tends toward in the long run is what the probability is.
T'ai Chi
18th May 2007, 05:40 PM
bumper cars
Zep
18th May 2007, 11:39 PM
Are you sure this isn't selection bias on your part?
[0,0.5) (i.e. 0<=n<0.5) is exactly the same size as [0.5,1) even if we allow for binary rounding errors.Interestingly, I thought it was exactly that initially. Which is why I did the research.
You need to remember that this is not a function of mathematics - if it were, we would not be having this conversation, and I would agree fully with you. It is a matter of how the randomising function is implemented in a limited representation in a computer program.
The random number returned is actually a real number with a limited number of digits of accuracy. Therefore it has a finite set of possible specific values it can be. One of these values MAY be zero, but it can never be 1. And since TC's spreadsheet uses the under-or-over-0.5 method to generate heads-or-tails, you are actually putting this dividing point just very slightly too high.
Conceptually, think of it like a line of 100 fence-posts across a farm. You can pick one of the fence-posts randomly, but NOT the 100th one at one end. So your range of choices is from 1 to 99, not 1 to 100. If you decide that fence-posts 50 and under are "heads", and the rest "tails", you get ratios of 50/99 = heads, and 49/99 = tails. That is, a slight bias to one of side.
Remember, this is to do with computer representation of mathematics, not the mathematics itself. That I have no argument with.
Zep
18th May 2007, 11:44 PM
Neither your reason 1 nor your reason 2 explains the observation you made in the first sentence. Reason 1 fails to explain it because although the Excel random number function is not random, it is close enough to be indistinguishable from random by the test you've done. Reason 2 fails to explain it because it is irrelevant except in the case that one of the thousand generated values turned out to be 0. This is highly improbable, and even so only biases that one result.
The correct explanation is reason 3: Your lack of training with statistics causes your expectations of what the results "should" look like to be mistaken.
It is improbable (approximately (1/sqrt(2000*pi)) ~= 0.0126 probability) to get 500 heads in 1000 flips of an unbiased coin. On average, the square of the amount by which the number of heads differs from 500 will be 1000. There should be no bias towards a low number of heads or a high number of heads, and any claimed bias would need to be justified by looking at a very large number of 1000-toss trials and recording the results.
Sure. Whatever you say. I guess my 30+ years in computer programming, including designing and writing digital randomisation code, count for nothing.
Slimething
19th May 2007, 04:34 AM
A bias also exists in the rounding mechanism. The correct method of rounding numbers (http://www.pacific.edu/college/psychology/Statistics/Rounding.html)is to round up to the next integer at a value of 5 UNLESS the last digit will be odd. That is, 0.055 rounds to 0.06 but 0.045 rounds to 0.4.
Spreadsheets don't do this normally. On encountering a 5, spreadsheets round up which most of us accept because the bias introduced is fairly small. However, over a large number of trials, the error introduced may be significant.
The reason for this is because of precision. as Zep has pointed out the bias of a range that only asymptotically approaches 1. A final digit of 5 in a given value can itself represent the continuum of numbers from 4.5 to 5.49999999... so precise rounding must account for this possibility. That is why statisticians use the seemingly screwball method of rounding I described rather than the usual "5 is always up" rule.
69dodge
19th May 2007, 04:54 AM
Conceptually, think of it like a line of 100 fence-posts across a farm. You can pick one of the fence-posts randomly, but NOT the 100th one at one end. So your range of choices is from 1 to 99, not 1 to 100. If you decide that fence-posts 50 and under are "heads", and the rest "tails", you get ratios of 50/99 = heads, and 49/99 = tails. That is, a slight bias to one of side.
What if there are 101 fence-posts in all?
Paul C. Anagnostopoulos
19th May 2007, 09:46 AM
The random number returned is actually a real number with a limited number of digits of accuracy. Therefore it has a finite set of possible specific values it can be. One of these values MAY be zero, but it can never be 1. And since TC's spreadsheet uses the under-or-over-0.5 method to generate heads-or-tails, you are actually putting this dividing point just very slightly too high.
I don't think it matters that it can never be 1. The questions are: Are there the same number of discrete floating-point values in [0.0, 0.5) as there are in [0.5, 1.0)? And does the generator produce them all with equal probability?
I believe the answer to the first question is yes, since the fraction is represented as an n-bit binary number. There are the same number of values in $[0, 2^{n}/2)$ as there are in $[2^n/2, 2^n)$, and $2^n/2$ represents 0.5.
~~ Paul
Jorghnassen
19th May 2007, 09:54 AM
I never realised how crappy the RNG on Excel was.
/maybe because I don't use Excel...
rockoon
19th May 2007, 11:05 AM
In normal programming situations where the random number generator generates a number between 0.0 and 0.99999....
The proper method of getting a coin-flip out of this is:
Truncate(2 * RandomNumber)
A) This ensures no rounding bias
B) The multiplication is error free in the floating point domain because at least one of the factors is a power of two (note that overflows arent possible here)
C) Its clear and concise, obvious in its 2 distinct output values.
As for if excel has some form of Truncate() .. beats me.. in most languages this would be called trunc() .. in VisualBasic its called int()
Paul C. Anagnostopoulos
19th May 2007, 11:47 AM
The proper method of getting a coin-flip out of this is:
Truncate(2 * RandomNumber)
Are you sure? Let's say we have an 8-bit mantissa. The generator returns a number in the range:
0:0 through 0:255
When you multiply this by 2, you end up with:
0:0 through 0:255 through 1.0 through 1:254
There is a bias toward 0 after truncation.
~~ Paul
Edited to add: I'm just chatting here. I've never thought much about this before.
69dodge
19th May 2007, 10:21 PM
Are you sure? Let's say we have an 8-bit mantissa. The generator returns a number in the range:
0:0 through 0:255
When you multiply this by 2, you end up with:
0:0 through 0:255 through 1.0 through 1:254
There is a bias toward 0 after truncation.
The number has been multiplied by 2. So, 0:255 isn't possible. Only even numbers are possible.
rockoon's method is ok. So is rounding to the nearest integer without multiplying by 2, if you round 0.5 up to 1. They're the same here.
The multiply-then-truncate method generalizes to integers k greater than 2. Multiplying a random real in [0, 1) by k, then truncating, yields a random integer in [0, k). There are small biases, though, if the random real is a binary fraction and k isn't a power of 2. To eliminate the biases, multiply by $2^n / \lfloor 2^n/k \rfloor$ instead of by k before truncating, where n is the number of bits of precision of the random real. The result might be too big (i.e., >= k); if so, throw it away and generate another random number. To help avoid round-off errors, add $0.5 / \lfloor 2^n/k \rfloor$ after the multiplication and before the truncation.
Paul C. Anagnostopoulos
20th May 2007, 07:29 AM
The number has been multiplied by 2. So, 0:255 isn't possible. Only even numbers are possible.
Duuuhhhhh
:crazy:
:coal:
~~ Paul
69dodge
20th May 2007, 08:48 PM
To help avoid round-off errors, add $0.5 / \lfloor 2^n/k \rfloor$ after the multiplication and before the truncation.
My turn to say "duh"...
That number is wrong. It needs to be smaller by a factor of 2n. It should be $1/(2^{n+1}\lfloor 2^n/k \rfloor)$.
Probably, you should just ignore my whole idea for eliminating the biases. It appears to require more bits of precision than I'm assuming we have. I need to think about it some more. I suppose it would work if the random real has relatively few bits of precision, e.g., it's a single-precision floating-point number, but we can do the other calculations in double precision.
Zep
20th May 2007, 09:09 PM
What if there are 101 fence-posts in all?That is equivalent of varying (in this case, slightly increasing) the allowable range of finite selection points possible. Which is what you really want to do in this case. Or at least be aware that the precision of the computer representation MAY have a bearing on the "randomness" of your generator.
You are also getting into the issue of odds/evens in the bit representation of numbers, as noted above. Depending on your computer architecture, and also on the internal representation and handling of numbers in your OS in many cases, you MAY have situations where you can indeed find unexpected behaviour in this area.
A general guide (and by no means an authoritative answer) is to stick with simple integer representations, preferably in large binary formats. Most real numbers in computer format tend to lose precision the larger they are, unless special provisions are made (including software arithmetic operations).
Of course, there are other ways of generating random binary numbers. For example, taking a feed off an external "random" source, such as the internal computer clock, or some form of chaotic generator or scintillator (PEAR's EGGs were such a source).
Or a direct feed from inside Uri Geller's brain... ;)
rockoon
20th May 2007, 10:48 PM
Also of note is that the methods discussed here are preferable to methods that work off the other end (the least significant bits) of the stochastic variable because many random number generators (LCG's mainly) have very poor cycle lengths on the least significant bits.
In some LCG implimentations, the least signifcant bit alternates 0, 1, 0, 1, 0, 1, ....
Remember that the generation of random numbers is too important to be left to chance.
I cringe whenever I see C code of the form:
x = rand() % y;
I understand why its coded that way .. because the programmer doesn't want to deal with knowing the range of rand()'s output which is implimentation-specific. Double-ignorance doesnt make a right.
ThatSoundAgain
21st May 2007, 01:44 AM
Of course, there are other ways of generating random binary numbers. For example, taking a feed off an external "random" source, such as the internal computer clock, or some form of chaotic generator or scintillator (PEAR's EGGs were such a source).
Back when I was messing around with programming 8-bit computers (like AMOS Basic on the Amiga), you had the option to manually seed the random number generator. IIRC, it would still be a different number on each run even if you hardcoded the seed. I'm guessing the RNG used the system clock.
One common trick was to use user input as a seed, most commonly by timing mouse or keyboard input. How would this be viewed in respect to truly random numbers? The input, I think, could not be said to be truly random, but it is externally dependent and impossible to predict for the user, at it was measured in milliseconds.
Zep
21st May 2007, 03:00 AM
Back when I was messing around with programming 8-bit computers (like AMOS Basic on the Amiga), you had the option to manually seed the random number generator. IIRC, it would still be a different number on each run even if you hardcoded the seed. I'm guessing the RNG used the system clock.
One common trick was to use user input as a seed, most commonly by timing mouse or keyboard input. How would this be viewed in respect to truly random numbers? The input, I think, could not be said to be truly random, but it is externally dependent and impossible to predict for the user, at it was measured in milliseconds.Yep, the lower bits of the clock/timer were often used for random seed. Still is in many cases! For such computers, you were usually randomly choosing from only a small range (random location on a screen, how many enemy robots, etc), so precision was not an issue at all. Therefore this method was perfectly adequate, and very fast back in the day.
Zep
21st May 2007, 03:01 AM
Also of note is that the methods discussed here are preferable to methods that work off the other end (the least significant bits) of the stochastic variable because many random number generators (LCG's mainly) have very poor cycle lengths on the least significant bits.
In some LCG implimentations, the least signifcant bit alternates 0, 1, 0, 1, 0, 1, ....
Remember that the generation of random numbers is too important to be left to chance.
I cringe whenever I see C code of the form:
x = rand() % y;
I understand why its coded that way .. because the programmer doesn't want to deal with knowing the range of rand()'s output which is implimentation-specific. Double-ignorance doesnt make a right.I could not have said it better.
© 2001-2008, James Randi Educational Foundation. All Rights Reserved.
vBulletin® v3.7.3, Copyright ©2000-2008, Jelsoft Enterprises Ltd.