View Full Version : Computer Randomness?
ReFLeX
25th July 2005, 07:59 AM
I was attempting to explain Monte Carlo simulations to a guy who was telling me that "80% of all statistics are false" is not a joke. This because I said "statistically significant", to which he replied 'statistics is all wrong, anyway'...
However. He denied that computers are capable of randomness, on the basis that they are systematic and can only perform sequences of operations. Thereby if a computer was to generate an extremely large list of three digit numbers, there would be a discoverable sequence.
Now, I'm fairly confident that computers are more capable of randomness than humans. But are they perfectly random? How is this accomplished, on a basic level? Perhaps not on a basic level at all, I'm thinking.
roger
25th July 2005, 08:17 AM
The algorithms that your computer uses are called pseudo-random number generators. That's because they use a deterministic algorithm - if you know the previous number generated, you can figure out what the next number was. That's anything but random. What they do is give the appearance of randomness, while fairly uniformly distributing the values over the number space.
There ARE hardware devices that can be attached that produce true randomness, if necessary. Other schemes use techniques like reseeding the deterministic algorithm with some fairly random number, such as the low order bits of the time when the user presses a key. I'll let others quibble about the extent to which schemes like that might be called "random".
davidhorman
25th July 2005, 08:56 AM
there would be a discoverable sequence.
Define "discoverable". I think you could spend a long time trying to discover the sequence in a list of pseudo-random numbers from a computer.
Other schemes use techniques like reseeding the deterministic algorithm with some fairly random number, such as the low order bits of the time when the user presses a key.
My idea was to use the electrical noise on the soundcard's line in to generate random numbers. It seemed to work fairly well, in as much as there was a 50/50 distribution of parities on the sound samples, which is as far as I investigated.
David
DrMatt
25th July 2005, 10:13 AM
There are many schemes for creating random numbers on a computer that don't depend on computing, for instance taking input from a geiger counter. But even the "A pattern would emerge" is not a correct description of sequences which can be computed, such as the decimal expansion of Pi.
Edited to add: I'm extremely skeptical of the utility of trying to talk sense into people who argue that the findings of basic mathematics are wrong wrong wrong.
I once knew somebody who wanted to buy a house while she was sky high in credit card debt and getting higher and higher. The credit union referred her to an advisor who specialized in planning how to get out of debt. She reported to me that she'd been sent to an "evil lady", because she refused to accept the realities of compound interest--and that a credit line is not an infinite line of cash. Shortly thereafter, she got a bad credit rating. Years later, she reportedly got psychiatric help. But at the time, no amount of facts, arithmetics, or beating her head against reality would get through to her.
ReFLeX
25th July 2005, 11:51 AM
I try to stay optimistic by thinking that hopefully I can persuade at least one person to start looking things up, to get themselves educated. That is, after all, what the JREF is all about. It's not about throwing up your hands and saying, "No, you're wrong."
Stimpson J. Cat
25th July 2005, 01:29 PM
As Roger said, computer algorithms are deterministic. However, that does not mean that, as your friend said, "if a computer was to generate an extremely large list of three digit numbers, there would be a discoverable sequence.".
Extremely good pseudo-random number generators exist. So good that even the most sensitive tests we know of cannot distinguish between the output of these algorithms and a truly random process.
As to what any of this has to do with the claim that "statistics is all wrong anyway", I cannot even imagine. My suspicion is that he simply does not have any idea what he is talking about.
Dr. Stupid
gnome
25th July 2005, 01:48 PM
I think people misunderstand the problem with statistics--it's not so much that the information is invalid, as that it's so often misinterpreted, unintentionally or deliberately.
Paul C. Anagnostopoulos
25th July 2005, 03:58 PM
Sounds like the guy may have read too many syllogisms for his own good.
~~ Paul
|
|
|
V
Zep
25th July 2005, 04:17 PM
33.3% of all people surveyed agree that 27.6% of all statistical results are made up.
BillC
25th July 2005, 06:16 PM
"89.6 per cent of statistics are made up on the spot" -- Vic Reeves.
pmurray
25th July 2005, 08:34 PM
Without a source of noise provided by hardware, a random number generator will be deterministic. There are different algorithms for producing random numbers, and they have different "strengths", cryptographically speaking.
A secure random number generator (eg: as provided by the SecureRandom java class) has the following property: even if you know exactly how the generator works, you cannot deduce from it's output alone what state the generator is in.
In order to work at all, a random number generator must be given an initial source of entropy, or "seed". Without that, it will always produce the same sequence of numbers. A weak way to do this is to seed it with the current time in milliseconds. It is week because if you know roughly what time the generator was seeded, you can try seeding an identical generator untill you get a match.
The java "SecureRandom" seeds itself by starting up a number of threads and timing them. The time taken by the threads relies on the details of what the machine is doing, and is pretty safe. For a project I worked on I found that this initialisation was takinf too long (several seconds), so I used the current time, the current free memory, and the current contents of the system "temp" directory as a source of entropy. Good enough, I hope.
You might use thousands of 32-bit numbers generated psudeorandomly. All will depend deterministically on the initial (say) 128 bits - so there are far far fewer possible sequences than if the generator was truly random. But that far far smaller number is 2^128, which is pretty safe.
Generally speaking, 128 bits of entropy is is enough for any purpose, owing to the thermodynamic minimum amount of energy that a computing device would have to dissipate to attack it by brute force.
JR "BOB" Dobbs
25th July 2005, 09:56 PM
The random number generators in slot machines are checked out by an independent testing laboratory for "pure" randomness, in that they pass our conventional tests for randomness. If you flip a coin 100 times, it's possible to get 100 heads, and it would be just as random as getting 50 heads. Random number generators are typically geared towards coming up with statistically correct number sequences. The entropy of the system is maintained because the timing of the player pressing the "spin" button is a determining factor in the outcome. Needless to say, the player has no idea what state the processor is in.
LW
26th July 2005, 02:46 AM
Originally posted by roger
That's because they use a deterministic algorithm - if you know the previous number generated, you can figure out what the next number was
Small nitpick: if you know enough previous numbers, you can compute the next.
There are algorithms for pseudo-random number generation where the next number depends on more than one previous number.
ReFLeX
26th July 2005, 07:13 AM
Originally posted by Stimpson J. Cat
As to what any of this has to do with the claim that "statistics is all wrong anyway", I cannot even imagine. My suspicion is that he simply does not have any idea what he is talking about.
Dr. Stupid ...Monte Carlo simulations, right. The powerful computer alternative to manual hypothesis testing in statistics: http://en.wikipedia.org/wiki/Monte_Carlo_simulation
Interesting history in there that I didn't know about:
Perhaps the most famous early use was by Fermi in 1930, when he used a random method to calculate the properties of the newly-discovered neutron. Monte Carlo methods were central to the simulations required for the Manhattan Project, though were strongly limited by the computational tools at the time. However, it was only after electronic computers were first built (from 1945 on) that Monte Carlo methods began to be studied in depth. In the 1950s they were used at Los Alamos for early work relating to the development of the hydrogen bomb, and became popularized in the fields of physics and operations research. The Rand Corporation and the U.S. Air Force were two of the major organizations responsible for funding and disseminating information on Monte Carlo methods during this time, and they began to find a wide application in many different fields.
Uses of Monte Carlo methods require large amounts of random numbers, and it was their use that spurred the development of pseudorandom number generators, which were far quicker to use than the tables of random numbers which had been previously used for statistical sampling.
ReFLeX
26th July 2005, 07:16 AM
Originally posted by gnome
I think people misunderstand the problem with statistics--it's not so much that the information is invalid, as that it's so often misinterpreted, unintentionally or deliberately. This is exactly the point I needed to get across: that statistically significant results doesn't mean the results are wrong (wouldn't that be funny to suddenly learn) and that it isn't the fault of statistics when humans decide to exploit them.
MRC_Hans
26th July 2005, 07:34 AM
One smart thing about a PRNG is that you can have a sequence repeated. This is very useful while building and debugging programs that use random numbers, because you can make the program react predictably by using the same seed on each run.
Hans
ReFLeX
26th July 2005, 12:58 PM
Originally posted by MRC_Hans
One smart thing about a PRNG is that you can have a sequence repeated. This is very useful while building and debugging programs that use random numbers, because you can make the program react predictably by using the same seed on each run.
Hans Yeah, I'm always doing that... debugging those random number programs ... *shakes head*
DrDave
27th July 2005, 02:45 AM
For a project I worked on, one of the applications required a random binary number to be input as the seed to the encryption (or something like that!).
Anyway, it provided a random number generator in which you moved the mouse 'at random' to generate said number, or you could pick your own number.
However, one of the chief techie guys didn't trust the generator as it wasn't random enough for his liking. So he got a coin and flipped it 128 times and recorded heads as 1, and tails as 0.
The real daft bit came at the end when he looked at the number and decided it still wasn't random enough, and swapped a bunch of the digits around :confused:
Quite how he decided the first set of numbers wasn't random is beyond me!
Dave
Soapy Sam
27th July 2005, 04:05 AM
In the OP, ReFlex uses the expression "Perfectly random".
I wonder - does that term have a rigorously defined meaning?
There seems an inherent contradiction here. If a number is perfectly random, then no number can be "more" random. So the number should be, in principle, calculable in advance- hence not random at all. Only imperfectly random numbers would be wholly unpredictable.
"Random numbers", like "infinite numbers", leave me feeling that I was off school the day they were explained.
ReFLeX
27th July 2005, 04:44 AM
Originally posted by DrDave
...The real daft bit came at the end when he looked at the number and decided it still wasn't random enough, and swapped a bunch of the digits around... My psychology text has a special section on the non-intuitiveness of true random sequences. Supposing there is two possible outcomes, then a person is more likely to identify as random a sequence that alternates often than one with pairs and trios of the same outcome in a row.
A-B-A-B-B-A-B-A-B
B-A-A-B-A-B-B-A-A
Though the upper sequence is less likely to occur in an equal probability scenario, that is how people envision randomness, oddly. The same thing applies to lotteries in which you pick four numbers. Something like 40% of all possible numbers contain the same digit twice, yet almost everyone picks a number that has four unique digits, because it seems more probable. But, as we know, each sequence is just as likely to win as "9-9-9-9". In a perfectly random scenario, that is. Which brings me to:
Originally posted by Soapy Sam
In the OP, ReFlex uses the expression "Perfectly random".
I wonder - does that term have a rigorously defined meaning?Yes. Each possible outcome is exactly equally likely to occur.
There seems an inherent contradiction here. If a number is perfectly random, then no number can be "more" random.The number is not random on its own. It is how the number is selected that produces randomness.
So the number should be, in principle, calculable in advance- hence not random at all. Only imperfectly random numbers would be wholly unpredictable.Err, no. You're overthinking this. Random just means every outcome is just as probable as another. If a random number generator gives me "666" as a random number, there is nothing special about that number. What's perfectly random about it is how the number was chosen.
"Random numbers", like "infinite numbers", leave me feeling that I was off school the day they were explained.Understand infinity... if only...
MRC_Hans
27th July 2005, 05:00 AM
One thing people often fail to understand is the probability thing. Partly because they try to apply it after the fact. This is the reason some numbers or sequences are considered more "random looking" than others. However, we must realize that once we have a sequence, there is NO WAY to determine whether it is random or not. Only by trying to predict the number or sequence IN ADVANCE can we hope to determine whether it is random.
An example:
A
Is that random? We cannot say.
AB
Random now? Still cannot say, but if I now say: The next letter will be 'C', and we subsequently get
ABC
THEN I can say that it is probably NOT random.
Hans
Soapy Sam
27th July 2005, 07:31 AM
ReFLeX-
Gotcha. It's random number generation
rather than random number generation.
And a random number table is not a table of random numbers, but a random table of numbers.
This is probably stunningly obvious to you, but suddenly makes a lot more sense to me. :)
69dodge
27th July 2005, 10:40 AM
A Monte Carlo algorithm takes as input a sequence of numbers ("random numbers"), and produces as output an answer to whatever question it was designed for. It doesn't know or care about how the numbers were generated, i.e., whether they were generated via a random process or not; its answer just depends on what the numbers themselves are. Some number sequences will cause it to give the right answer, or very nearly the right answer, and other number sequences will cause it to give a wrong answer.
The important point, however, is that although we generally don't know which are the good sequences and which are the bad ones, we do know that the vast majority of them are good. Therefore, generating one randomly is very likely to result in a good one. So that's what we do.
69dodge
27th July 2005, 11:13 AM
Originally posted by Soapy Sam
In the OP, ReFlex uses the expression "Perfectly random".
I wonder - does that term have a rigorously defined meaning?
There seems an inherent contradiction here. If a number is perfectly random, then no number can be "more" random. So the number should be, in principle, calculable in advance- hence not random at all. Only imperfectly random numbers would be wholly unpredictable.
"Random numbers", like "infinite numbers", leave me feeling that I was off school the day they were explained.You could have a bunch of numbers that are all "equally random." :D
But, actually, you're very close to the truth. One can rigorously define "random" (Google for "algorithmic information complexity"), but it turns out that, for most random numbers, it's impossible to prove that they are random.
Roughly, the idea is that a number is not random if it can be compressed, e.g., by a file compression program like WinZip. But even if WinZip can't compress it, perhaps some other program can, in which case it's still not random. So, you can see why proving that a number is random might be harder that proving that it's not random.
ReFLeX
27th July 2005, 11:43 AM
Originally posted by 69dodge
A Monte Carlo algorithm takes as input a sequence of numbers ("random numbers"), and produces as output an answer to whatever question it was designed for. It doesn't know or care about how the numbers were generated, i.e., whether they were generated via a random process or not; its answer just depends on what the numbers themselves are.I'm only a third year student, and this came from my required stats class for psych. Thus I don't know a lot about this, but are you sure that applies to using the algorithm for hypothesis testing, that the numbers don't need to be random? Like in the quote above from wikipedia, it was the advent of Monte Carlo methods that led to the need for random number generation.
Some number sequences will cause it to give the right answer, or very nearly the right answer, and other number sequences will cause it to give a wrong answer.
The important point, however, is that although we generally don't know which are the good sequences and which are the bad ones, we do know that the vast majority of them are good. Therefore, generating one randomly is very likely to result in a good one. So that's what we do. Again, I'm not sure what a "bad" sequence is. If the vast majority of sequences are desirable, then generating one in any other way is also very likely to result in a good one. When I get home I'll get out the notes and give you the explanation we got of Monte Carlo simulations.
Roughly, the idea is that a number is not random if it can be compressed...?!? This is the same misconception as Soapy Sam's. Randomness is not a property of a number. It is a property of the method of arriving at that number. If a number exists, then there must be a way to arrive at that number randomly.
The idea
27th July 2005, 12:32 PM
Originally posted by 69dodge
Roughly, the idea is that a number is not random if it can be compressed, e.g., by a file compression program like WinZip. But even if WinZip can't compress it, perhaps some other program can, in which case it's still not random. So, you can see why proving that a number is random might be harder that proving that it's not random.
Are there some restrictions that a program must satisfy for it to be considered a legitimate compression program?
Suppose a program specifies that some particular big number (for example, a hundred decimal digits) is compressed to (or encoded as) some particular small number (for example, twenty decimal digits). For any particular big number and small number combination, there is such a program. Thus, it would seem that no big number is random.
Thumbo
27th July 2005, 01:20 PM
Originally posted by The idea
Suppose a program specifies that some particular big number (for example, a hundred decimal digits) is compressed to (or encoded as) some particular small number (for example, twenty decimal digits). For any particular big number and small number combination, there is such a program. Thus, it would seem that no big number is random.
A non-mathematicians answer, which will no doubt get shot down in flames by the experts here...
Although the output of the program is 20 digits, to re-create the original 100 digits you need more than just those 20 digits: you also need to program itself, which can also be expressed as a string of digits of some length.
So to refine the previous explanation, the sequence of 100 digits is apparently random (or at least, not shown to be non-random) if the 20 output digits plus the digits in the string representing the decompression program itself equals more than 100.
The program itself carries information which has to be included when deciding on the compressibility or otherwise of the digit string.
69dodge
27th July 2005, 02:05 PM
Originally posted by Thumbo
A non-mathematicians answer, which will no doubt get shot down in flames by the experts here...No, your answer is exactly right.
69dodge
27th July 2005, 02:41 PM
Originally posted by ReFLeX
are you sure that applies to using the algorithm for hypothesis testing, that the numbers don't need to be random?How would a computer know whether the numbers it's given are "really random" or not? It just does whatever it does, on whatever numbers it's given. If you give it the same numbers, it follows the same steps, and comes up with the same answer, regardless of where you got the numbers from originally.?!? This is the same misconception as Soapy Sam's. Randomness is not a property of a number. It is a property of the method of arriving at that number.We could give a different name, like "compressibility", to the property of the number itself, to avoid confusion with the concept of a random process. But compressibility corresponds pretty well to one's intuitive idea of a random-looking number. If a number's digits contain any sort of nonrandom pattern to them, the number will be more or less compressible, depending on exactly how nonrandom the pattern is. And the output of a random process will (probably) not be compressible. So all the concepts are closely related.
ReFLeX
27th July 2005, 05:56 PM
Ok, I can't find my stats binder, so I'm going to try to explain how I understand Monte Carlo simulations. It would help if you are familiar with the t-test, which is the standard statistical test to see if study results are significant. Anyway, what it tells you is how likely it is that an effect has been measured, in, say, a double-blind medication test. If the scores are 2, 2, and 2 in the control group and 3, 3, and 3 in the treated group, then it will render a high probability that there is an effect being measured. But, it has a limited power for a number of reasons I can't remember. I do know that it uses a fixed table of numbers. A better way to find out if your results are significant is the Monte Carlo method. That is when you use a computer to assign a label to each datum, (ostensibly, the label could be a number) and then randomly re-assign them all to either the treated or control group. Somehow, it takes note of the result of doing that, I'm blurry on the details. But then it repeats the re-assignment again and again and this is what renders whether the treatment has an effect or not. Right now it's all slipped away from me, but you do see why they need to be randomly assigned. With only 20 participants there are millions of possible re-assignments, which is why a computer is needed to do them.
...see what you make of that, because I believe we are talking at cross-purposes here. Sorry for the horrific explanation, I would have much rather looked it up, but maybe this way will clear up my understanding as well.
The idea
27th July 2005, 07:08 PM
Originally posted by Thumbo
So to refine the previous explanation, the sequence of 100 digits is apparently random (or at least, not shown to be non-random) if the 20 output digits plus the digits in the string representing the decompression program itself equals more than 100.
Of course, the program itself has to be expressed in some language, so I have to ask the following.
Are there some restrictions that the language must satisfy for it to be considered a legitimate language?
The concern is that if the language allows various particular, complicated programs to be expressed in very concise form, then the fact that those programs can be expressed in concise form may tell us more about the language than about the programs.
69dodge
27th July 2005, 09:49 PM
The Wikipedia article on algorithmic information theory (http://en.wikipedia.org/wiki/Algorithmic_information_theory) addresses the issue of different languages. The complexity of a string differs only by a constant between different languages, regardless of the length of the string. So, for sufficiently long strings, which language you choose hardly matters.
Whichever language you choose, you're supposed to stick with it for all strings.
frozenfred
29th July 2005, 09:45 AM
From what I remember, "Monte Carlo" method just means simply running the experiment over and over, and tabulating the results to determine the probabilities. if i wanted to figure out the statistical results of rolling two dice and adding them up (ala Craps), i could
a) do some math (not Monte Carlo)
b) physically roll the dice 10,000,000 times and plot the results and look at it (Monte Carlo method by definition)
c) write a computer program to SIMULATE the rolling of the dice 10million times, and look at the results (also Monte Carlo)
Now, as to whether computers can REALLY generate random numbers, I'd say no. If i knew everything the computer did before the generation of the number, then i could tell you 100% of the time what the next value would be.
No matter how much info i have about physical dice, the table, wind, humidity, dust particles in the air... i would NOT be able to accuratly predict the next roll.
Now, Computers can generate SIMULATED random numbers. What this is often defined as is that given enough samples, every possible outcome is represented equally.
If i write a program that generates the numbers 1-10, as the number of trials approaches infinity, the distribution of the 10 outcomes approaches a flat line.
Note that that doesn't mean the absolute number of outcomes for each value will all aproach 1/10 of the total, or that the outcomes will all become equal. In fact, it's possible that the values become further and further apart.
Take a coin flip. if i flip it 10 times, i may end up with 7 heads and 3 tails. if i flip it 1000 times, i may end up with 527 head and 473 tails. i've gone from a difference of 4 to a difference of 54. but the percentages actually get closer to even... from 70/30 to 52.7/47.3.
as the number of trials increase, i would expect the percentages to get closer and closer to 50/50, while the absolut difference keeps getting bigger.
Floyt
1st August 2005, 03:40 PM
Originally posted by ReFLeX
*snip*
A better way to find out if your results are significant is the Monte Carlo method. That is when you use a computer to assign a label to each datum, (ostensibly, the label could be a number) and then randomly re-assign them all to either the treated or control group.
*snip*
I think what you are referring to here is the process of bootstrapping. IIRC this is not a Monte Carlo approach in the strict sense, as it does not deal in probabilistic outcomes. The rationale is that if, due to a small sample size, you do not have a good idea of your data's distribution, the best indicator of that distribution is (duh) the data itself. So the algorithm re-samples the entire data set a number of times, re-distributing the data at random each time. Thus a larger data set than you actually have available is simulated, which might enable you to draw better conclusions.
pmurray
4th August 2005, 08:08 PM
Originally posted by DrDave
So he got a coin and flipped it 128 times and recorded heads as 1, and tails as 0.
The real daft bit came at the end when he looked at the number and decided it still wasn't random enough, and swapped a bunch of the digits
Doing that reduces the entropy of the sequence. It reduces the set of possible sequences from "all 128-bit sequences" to "all 128-bit sequences that also ook random".
Gestahl
9th August 2005, 05:05 PM
It's cute when non-computer scientists try to answer computer questions...
Most random numbers are generated these days with an entropy pool. This pool is derived from network packet timings, interrupt timings from keyboards and mice, and other non-deterministics (from the point of view of the computer). It only switches to the PRNG when there is not enough entropy data to use for generating the random numbers. In fact, the entropy pool data is used to seed the PRNG, and when there is no entropy, the previous few random numbers are used. In this way, most random number generators on a computer will pass every single statistical test for randomness you care to choose.
For even more randomness, CDs of random data are generated from nuclear decay and other *absolutely* random data. I think there is even a web service where you can do it, but I am too lazy to do a google search for you.
Neat experiment if you can find a UNIX box...
$cat /dev/urandom
You should see tons of data, and then it stops. Move your mouse, and new data can be generated. If you use /dev/random (non-garaunteed randomness), the output never stops.
Rasmus
9th August 2005, 05:26 PM
Originally posted by Gestahl
I think there is even a web service where you can do it, but I am too lazy to do a google search for you.
Since I looked for this recently and know what to google (http://www.google.com/search?q=random+numbers) for:
random.org (http://www.random.org/)
Rasmus.
© 2001-2009, James Randi Educational Foundation. All Rights Reserved.
vBulletin® v3.7.5, Copyright ©2000-2010, Jelsoft Enterprises Ltd.