JREF Challenge Statistics

T'ai Chi
5th February 2006, 08:43 AM
I've been thinking about this for quite some time, and finally put up a webpage on it

http://www.statisticool.com/jrefchallengestats.htm

Does anyone believe that after 1,000+ tests that are statistical in nature are carried out, that anyone will win by chance?

This applies not only to the JREF preliminary tests, but also similar tests by other skeptical organizations.

Mercutio
5th February 2006, 09:20 AM
"will"? no. "could"? yes.

T'ai Chi
5th February 2006, 10:01 AM
Hypothetically, how about after 10,000 tests?

At what point would the not winning by chance be 'odd' ?

Pyrrho
5th February 2006, 10:44 AM
Chance is only involved as it pertains to the actual tests, as in measuring results against those expected by chance. In uncontrolled tests we might expect someone to "win" by chance, but these are controlled tests presumably designed to eliminate chance. In these tests, "win" is equivalent to "demonstrating paranormal ability" as defined in each test. It's not like playing slot machines. Given that, I would not expect anyone to "demonstrate paranormal ability" by chance, especially if the test conditions are properly designed. We're talking about abilities that violate the laws of physics, after all, and there is no good reason, under proper, scientifically controlled conditions as is done with the JREF Challenge tests, to expect a chance observance of such a violation.

Mercutio
5th February 2006, 10:45 AM
Obviously, that depends on the alpha levels of each test, which will vary depending on claim. Remember, each test is the test of a specific claim, and to arbitrarily assign a different alpha level to your overall analysis is unfounded. If a claimant says he can do something 9 times out of 10, the test can end in failure with even an average of 8 out of 10; the test can thus end considerably earlier than it would if we were merely testing something against chance alone. Those data are not then appropriate to test against chance alone.

Combining many short runs is not the same as taking one long run--Rhine's lab made that mistake, and (at least in his casual talks) so does Sheldrake.

In sum, your proposed test is not a very good method of analyzing the test data.

ETA: Pyrrho beat me to it.

rwguinn
5th February 2006, 10:48 AM
Hypothetically, how about after 10,000 tests?

At what point would the not winning by chance be 'odd' ?

Each test is a "Stand Alone" event. Like throwing dice or flipping a coin, the current test is not dependent on the previous test. The probability of "Success by chance" does not change after any number of tests--unless the current testee has learned new tricks from the previous tests...

eri
5th February 2006, 11:18 AM
In addition, to win the million, they have to be able to pass the preliminary test AND the final test. So even if they managed to pull off 10,000/1 odds once, they'd have to do it again.

T'ai Chi
5th February 2006, 11:55 AM
In uncontrolled tests we might expect someone to "win" by chance, but these are controlled tests presumably designed to eliminate chance. In these tests, "win" is equivalent to "demonstrating paranormal ability" as defined in each test. It's not like playing slot machines. Given that, I would not expect anyone to "demonstrate paranormal ability" by chance, especially if the test conditions are properly designed. We're talking about abilities that violate the laws of physics, after all, and there is no good reason, under proper, scientifically controlled conditions as is done with the JREF Challenge tests, to expect a chance observance of such a violation.

I think you're misunderstanding something. In controlled tests one would still expect someone to win by chance. Ask yourself what alpha means. It involves probability.

Crispy Duck
5th February 2006, 12:33 PM
I think you're misunderstanding something. In controlled tests one would still expect someone to win by chance. Ask yourself what alpha means. It involves probability.

Most of the test protocols I've read here are specifically designed to ensure that the odds of winning by chance are around 1 in a million. This is easily achieved, given that most tests are also specifically designed to consist of a series of judgement-free hit-or-miss actions by the testee. For example, if a dowser is trying to identify one pot containing water from twenty, the odds of getting it right by guessing are 1 in 20. That's too easy, so one could ask him to repeat the task to give 1 in 400 odds, then again three more times for total odds of 1 in 3.2 million.

So, assuming odds of 1 in a million for an average preliminary protocol, it would still be 100 to 1 against someone winning by chance after 10,000 tests have been done.
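Crispy Duck's arithmetic can be checked in a few lines. This is a minimal Python sketch, assuming every round is an independent 1-in-20 guess and every protocol has a chance pass rate of exactly 1 in a million. The helper names are hypothetical, chosen for illustration only.

```python
from fractions import Fraction

def repeated_guess_odds(choices: int, repetitions: int) -> Fraction:
    """Chance of guessing right on every one of `repetitions` independent
    trials, each a pick of one item out of `choices` (hypothetical helper)."""
    return Fraction(1, choices) ** repetitions

def chance_of_any_pass(p: float, tests: int) -> float:
    """Probability that at least one of `tests` independent chance-only
    claimants passes, if each passes with probability `p` (hypothetical helper)."""
    return 1 - (1 - p) ** tests

# Find the water among 20 pots, five times in a row.
odds = repeated_guess_odds(20, 5)
print(odds)  # 1/3200000

# One-in-a-million protocols, run 10,000 times.
risk = chance_of_any_pass(1e-6, 10_000)
print(risk)  # roughly 0.00995, i.e. about 100 to 1 against
```

The exact `Fraction` keeps the 1 in 3.2 million figure free of rounding; the second figure is only approximately 1 in 100 because the passes are not mutually exclusive.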

One would hope that, long before 10,000 tests have been done, society will have given up this sort of superstitions anyway... :)

(My glass is half-full)

T'ai Chi
5th February 2006, 12:44 PM
Most of the test protocols I've read here are specifically designed to ensure that the odds of winning by chance are around 1 in a million.


I'm specifically talking about the preliminary tests. I'ev been informed that the alpha for these is typically .001.

Mercutio
5th February 2006, 01:00 PM
What percentage of the applicants have a claim that is statistical in nature?

T'ai Chi
5th February 2006, 01:04 PM
What percentage of the applicants have a claim that is statistical in nature?

Great question. The numbers of tests that are statistical in nature aren't easy to get at, so who knows.

Hitch
5th February 2006, 01:21 PM
How many preliminary tests were conducted last year (2005)? I may have missed or forgotten some, but I think it was three. Achau Nguyen, Angela Patel, and Deja Gateward. (Somebody correct me if I missed any.)

Given the rate that applicants devise a protocol that can be agreed upon, it will take a very long time before 10,000 tests are conducted and somebody beats the challenge on a fluke.

T'ai Chi
5th February 2006, 01:26 PM
it will take a very long time before 10,000 tests are conducted and somebody beats the challenge on a fluke.

I agree. 10000 was just a hypothetical number to illustrate, because one would expect, on average, someone to pass by chance after 1000, if alpha is typically .001 for such tests.
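"On average" here means an expected count of one chance pass per 1,000 tests, which is not the same as a certainty of seeing one. A minimal Python sketch, assuming every test has a chance pass rate of exactly 0.001 and the tests are independent (both assumptions are disputed later in the thread):

```python
alpha = 0.001   # assumed chance pass rate per test
tests = 1000    # hypothetical number of tests

expected_passes = alpha * tests              # mean number of chance passes
p_at_least_one = 1 - (1 - alpha) ** tests    # chance of one or more passes

print(expected_passes)           # 1.0
print(round(p_at_least_one, 3))  # 0.632
```

So even with an expected count of one, roughly a third of the time 1,000 chance-only tests would produce no pass at all.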

I started by talking about the JREF preliminary tests that are statistical in nature because the JREF tests are the best known, but my argument also covers similar tests done by many skeptical organizations.

Pyrrho
5th February 2006, 01:31 PM
I think you're misunderstanding something. In controlled tests one would still expect someone to win by chance. Ask yourself what alpha means. It involves probability.
Perhaps you could explain what alpha means, for those who do not know, and how that definition supports the expectation that someone would "win" by chance. I don't accept the chore of explaining alpha. You're the statistician; educate us, please.

In this example:

http://www.randi.org/jr/032902.html


We numbered ten JREF coffee-mugs from 1 to 10 on the outside bottoms. For the baseline part of the test (20 "open" trials in which all those present would know in which cup the target had been placed) Mike was first asked to choose one of ten face-down shuffled cards bearing numerals from 1 to 10, and that choice would designate where the target would be placed, each time. I had asked him to carefully "scan" the floor area of our library in advance to make sure there were no distracting elements present, and he himself carefully chose the positions of each of the ten cups on the floor. He was encouraged by me to adjust the placement of the cups as many times as he needed to, during this phase. He'd told us, first, that at least five feet of separation was required between each cup, but that he could work with just three feet between them. I immediately insisted that he must use at least five feet, since I did not want to allow an excuse later on that the spacing had been inadequate. As it turned out, he chose to have some cups within a foot of one another. But we could not interfere with his choice, since he assured us that all was sufficient for his needs.

Mike also asked that several metallic objects (trophy cups, plaques, steel devices) be removed from the bookshelves nearby. At his request, a teaspoon was taken to the next room because he said that the silver could also attract his stick; that spoon was made of aluminum. But, again, we did not correct his statements.

For the "open" phase of the preliminary test procedure, the target package was placed in the designated cup, which was then openly placed in the spot Mike had chosen for it, mouth-down. He then scanned all ten cups, and declared — both by pointing and verbally — where he believed that his stick had detected the target. Another number was then selected, and the procedure was repeated, twenty times in all. His score was 100% in these "open" tests.

Pause. Let me explain here the purpose of the baseline test of twenty "open" detections, in which the location of the target is known in advance. It served five distinct purposes, which is why we always use such a procedure:

(1) The performer has the opportunity to try out the setup, and make any necessary changes, adjustments, or re-locations that he thinks are needed. Mr. G. changed the location of the ten cups on the floor many times before the "open" detection trials were completed, and finally declared his total satisfaction with the placements, and with the conditions.

(2) The process of randomizing numbers, etc., which is sometimes unfamiliar or unknown to the performer, becomes clear. For Mike, we prepared ten cards bearing numbers from one to ten, shuffled them face-down, and asked him to choose one for each test.

(3) The performer becomes familiar with the sequences and rules of the test. With Mike, we changed only one factor: we began with plastic cups, but because of the bulk of the target package, we switched to using the JREF coffee mugs.

(4) The performer has the opportunity of deciding for himself — in the "open" tests — whether it's his powers, or just his foreknowledge of the answer, that is actually at work. Mike was convinced of the former.

(5) After the "blind" test is done, following the "open" series, the performer cannot offer the excuse that his powers were not working at this time. Mike obtained 100% results during the "open" test, quickly and positively, showing that he was quite able to use his powers.

Following the "open" sequence, for each of the "blind" tests, Mr. G. and I stepped out of the library area, and two other persons randomly (by choosing a face-down card, as before) placed the target package in position, then they left the area and informed us that the target was in place. Mike and I re-entered, alone, and he made his determination while I watched carefully to be sure that he did not nudge any cups, or otherwise attempt to use any means but the movements of his forked stick, to make his guess; at no time was any such procedure observed. After Mike made his guess on each trial, the other two persons were invited back in, and we recorded the results. That procedure was repeated ten times.

...

The results were that when Mike G. knew the location of the concealed target (the "open" tests), he obtained 100% results. When the test procedure was double-blinded, he obtained exactly what chance alone would call for: one out of ten correct.


What's the alpha for these tests, and how is it calculated? I think there's enough information there for a calculation to be made.

T'ai Chi
5th February 2006, 01:43 PM
What's the alpha for these tests, and how is it calculated? I think there's enough information there for a calculation to be made.

One doesn't calculate alpha. You set alpha before the test is done.

It is the probability of making a type 1 error, that is, the probability of rejecting the hypothesis tested when the hypothesis is in fact true.

Typically it is .05, but for more extreme claims it is typically set lower, to .01 or lower. For the preliminary tests it is typically .001.
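Once alpha is fixed, the pass mark follows from the structure of the test. This is a minimal Python sketch for a hypothetical protocol of 10 independent trials with a 1-in-10 chance of a hit per trial and a nominal alpha of .001; these numbers are assumptions for illustration, not a documented JREF protocol.

```python
from fractions import Fraction
from math import comb

def upper_tail(n: int, p: Fraction, k: int) -> Fraction:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def pass_mark(n: int, p: Fraction, alpha: Fraction):
    """Smallest hit count k with P(X >= k) <= alpha under pure guessing,
    returned together with that attained probability."""
    for k in range(n + 1):
        tail = upper_tail(n, p, k)
        if tail <= alpha:
            return k, tail
    raise ValueError("no pass mark in this design meets alpha")

k, size = pass_mark(10, Fraction(1, 10), Fraction(1, 1000))
print(k, float(size))  # 6 hits needed; chance of passing is about 0.000147
```

Because hit counts are whole numbers, the chance of passing by guessing (about 0.00015 here) usually lands below the nominal alpha rather than exactly on it.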

As for what alpha was set to in that specific preliminary test, you'd have to ask the JREF. They don't make such numbers readily available, so I don't know.

eri
5th February 2006, 02:17 PM
rwguinn made the best point, I think. The tests are separate events. Someone losing one does not increase the chance of someone else winning. If you flip a coin 100 times and get all heads, the probability of you getting heads the next time is still 1/2.
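The coin-flip point can be checked by simulation. This is a minimal Python sketch, assuming a fair coin simulated with Python's `random` module; the function name, streak length, and seed are arbitrary choices for illustration.

```python
import random

def heads_rate_after_streak(flips: int, streak: int, seed: int = 1) -> float:
    """Simulate fair coin flips and return the fraction of heads among the
    flips that come right after at least `streak` consecutive heads."""
    rng = random.Random(seed)
    run = 0          # current number of consecutive heads
    followers = 0    # flips observed right after a qualifying streak
    heads_after = 0  # how many of those flips were heads
    for _ in range(flips):
        heads = rng.random() < 0.5
        if run >= streak:
            followers += 1
            heads_after += heads
        run = run + 1 if heads else 0
    return heads_after / followers

rate = heads_rate_after_streak(200_000, 5)
print(rate)  # close to 0.5: a streak does not make tails "due"
```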

T'ai Chi
5th February 2006, 02:23 PM
If alpha = .001 for each test, and the tests are sufficiently similar, you'd expect someone who doesn't have paranormal powers to pass the test 1 in 1000 times on average.

rwguinn
5th February 2006, 08:33 PM
If alpha = .001 for each test, and the tests are sufficiently similar, you'd expect someone who doesn't have paranormal powers to pass the test 1 in 1000 times on average.
That ASS-U-MEs that they all take exactly the same test.

T'ai Chi
5th February 2006, 09:46 PM
That ASS-U-MEs that they all take exactly the same test.

You're incorrect. "Similar" does not necessarily mean "exactly the same".

TV's Frank
5th February 2006, 10:05 PM
Completely unrelated, but does T'ai Chi's avatar give anyone else a headache?

Pope130
5th February 2006, 10:43 PM
Completely unrelated, but does T'ai Chi's avatar give anyone else a headache?
More of an eyeache than a headache in my case, but I do find it distracting and annoying. I suppose if you choose an avatar to reflect your personality this sort of thing is bound to happen.

Robert

T'ai Chi
5th February 2006, 10:47 PM
I suppose if you choose an avatar to reflect your personality this sort of thing is bound to happen.


Please try to stay on topic and not resort to veiled insults.

I forgive you.

SixSixSix
5th February 2006, 11:15 PM
It does seem theoretically possible.

Let's take a simple case where someone says that they can predict whether a coin will come up heads or tails. A test is designed whereby they don't get to touch or even see the coin (they assure us that this will not affect their powers). Assume you require 10 coin tosses, and they need to get all of them right to pass the preliminary test. As any computer programmer could tell you, that's a 1/1024 chance by chance alone - not by any means impossible.

For the actual test, let's say that they double the number of coin tosses. That's a 1/1048576 chance (and yes, I'm enough of a geek to type that from memory) of luck coming to the rescue, and 1/1024 * 1/1048576 is slightly less probable than 1 in a billion.

I have no idea what the tests look like, but I'm guessing that 1 in a billion chances are a fair bit less than what would be accepted as some sort of positive demonstration. While it is not likely that even 10000 people or more will pull this off... it's not impossible by any means.
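SixSixSix's figures can be verified exactly. A minimal Python sketch, assuming a fair coin, independent tosses, and the hypothetical 10-toss preliminary and 20-toss final described above:

```python
from fractions import Fraction

prelim = Fraction(1, 2) ** 10   # all 10 calls right in the preliminary
final = Fraction(1, 2) ** 20    # all 20 calls right in the final
both = prelim * final           # passing both stages by luck alone

print(prelim, final)                     # 1/1024 1/1048576
print(both, both < Fraction(1, 10**9))   # 1/1073741824 True
```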

Mercutio
6th February 2006, 04:54 AM
The trick is, 666, the test will depend on the claim. If the claim is that a person can predict 90% of coin tosses, it can be falsified much more easily than a claim of 55%. But the trick is, a test that fails to achieve a claimed 90% may be ended before sufficient data are collected to answer the 55% claim; as such, it would be inappropriate to include those data in a simple "against chance" test. This would have nothing to do with whether the person had any powers at all, but simply a practical consideration of test design.
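Mercutio's contrast between a 90% claim and a 55% claim is a question of how many trials each test needs. This is a rough Python sketch using the normal approximation to the binomial, assuming a one-sided test against 50% guessing, alpha of 0.001, and a desired power of 0.9; all three values are assumptions for illustration, and the approximation is crude for the small 90% case.

```python
import math
from statistics import NormalDist

def trials_needed(claimed: float, alpha: float = 0.001, power: float = 0.9,
                  chance: float = 0.5) -> int:
    """Approximate trials needed to distinguish a claimed hit rate from
    chance with a one-sided test (normal approximation to the binomial)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # critical value under chance
    z_beta = NormalDist().inv_cdf(power)       # margin needed for the power
    spread = (z_alpha * math.sqrt(chance * (1 - chance))
              + z_beta * math.sqrt(claimed * (1 - claimed)))
    return math.ceil((spread / (claimed - chance)) ** 2)

print(trials_needed(0.90))  # roughly two dozen trials
print(trials_needed(0.55))  # roughly 1,900 trials
```

A test sized to check a 90% claim therefore stops long before it has enough data to say anything about a 55% effect, which is why pooling such runs "against chance" is questionable.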

drkitten
6th February 2006, 07:18 AM
I'm specifically talking about the preliminary tests. I've been informed that the alpha for these is typically .001.

Apparently you were (ahem) misinformed.

The standardized alpha cutoff of 0.001 for a preliminary test is JREF's nominal maximum that they will accept for a preliminary test, when it is practical to calculate.

Depending upon what the claim is, the claimant may suggest something that is much less probable than 0.001, or even something for which calculating an alpha cutoff is impractical because we can't even determine a baseline situation.

As a simple example, a recently accepted protocol involved the claimant suggesting he could summon UFOs. Offhand, I don't know how to estimate the a priori probability of something that has never been reliably seen in human history, but I suspect it's much less than 0.001. Similarly, if I claim to be able to levitate for thirty seconds without any physical support, that would certainly be a paranormal claim, almost certainly be accepted by the JREF, and much less likely than the nominal alpha cutoff. On the other hand, if I claim to be able to detect whether a given person (perhaps by being given a personal article and using some form of psychperception), we can directly calculate the probability of my getting N correct answers simply by guessing, and set N to be "high enough" to give us the desired cutoff.

Until and unless we can calculate actual alpha cutoffs for each test as it is performed, we will not be able to assess the overall probability that the JREF challenge will be met "by chance alone."

drkitten
6th February 2006, 07:25 AM
In this example:

http://www.randi.org/jr/032902.html



What's the alpha for these tests, and how is it calculated? I think there's enough information there for a calculation to be made.

There is not enough information there for us to calculate an alpha value.

The test description makes it very clear what "chance" performance is: random guessing will result in "Mike" finding one item out of ten correctly, a 10% chance (or more formally, p = 0.10 per trial). Since the trials are independent, the chance of him getting two out of two correctly would be 0.01 (0.10^2), and more generally, the chance of him getting N out of N correctly would be (0.10^N).

However, we are not told how many he would have needed to get correctly to succeed on this test. If he were required to correctly find all ten items, the alpha cutoff would be 0.10^10, or 0.0000000001, one in ten b-for-billion. If he were only required to get eight out of ten, the mathematics gets a little more complicated and I'd have to pull out the binomial distribution to answer it. (Please don't make me do math. You wouldn't like me when I do math.)
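The binomial calculation drkitten mentions is short. A minimal Python sketch, assuming ten independent blind trials with a 1-in-10 chance of a hit each; the pass marks of 10/10 and 8/10 are the hypotheticals from the post, not the actual protocol.

```python
from fractions import Fraction
from math import comb

def chance_of_at_least(k: int, n: int, p: Fraction) -> Fraction:
    """Exact probability of k or more hits in n independent trials when
    each guess succeeds with probability p (binomial upper tail)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p = Fraction(1, 10)  # one cup in ten
print(chance_of_at_least(10, 10, p))        # 1/10000000000
print(float(chance_of_at_least(8, 10, p)))  # about 3.7e-07
```

Requiring 8 of 10 instead of 10 of 10 raises the chance of passing by luck by a factor of a few thousand, but it still sits far below 0.001.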

ChristineR
6th February 2006, 07:29 AM
Apparently you were (ahem) misinformed.

The standardized alpha cutoff of 0.001 for a preliminary test is JREF's nominal maximum that they will accept for a preliminary test, when it is practical to calculate.

Depending upon what the claim is, the claimant may suggest something that is much less probable than 0.001, or even something for which calculating an alpha cutoff is impractical because we can't even determine a baseline situation.

As a simple example, a recently accepted protocol involved the claimant suggesting he could summon UFOs. Offhand, I don't know how to estimate the a priori probability of something that has never been reliably seen in human history, but I suspect it's much less than 0.001. Similarly, if I claim to be able to levitate for thirty seconds without any physical support, that would certainly be a paranormal claim, almost certainly be accepted by the JREF, and much less likely than the nominal alpha cutoff. On the other hand, if I claim to be able to detect whether a given person (perhaps by being given a personal article and using some form of psychperception), we can directly calculate the probability of my getting N correct answers simply by guessing, and set N to be "high enough" to give us the desired cutoff.

Until and unless we can calculate actual alpha cutoffs for each test as it is performed, we will not be able to assess the overall probability that the JREF challenge will be met "by chance alone."


Excellent point, and I was going to add that you cannot really calculate the statistical odds of a success by deception, or by delusion. That is, there is no reason to assume that all of the candidates are using random guessing as their strategies, so there is no reason to believe that their results will be randomly distributed. Of course the JREF works to eliminate any correlation between these other factors and actual results, so over a large sample the results will be random. But the samples in question are not really large because every test is different.

CFLarsen
6th February 2006, 07:31 AM
I've been thinking about this for quite some time, and finally put up a webpage on it

You write on your page:

I think it would be interesting if they made the data more accessible. Not everyone can afford to fly to Florida, forget about their job and etc., and spend what most likely would be weeks searching through paper files.

Making the data more accessible will require money. Are you a paying member of JREF?

Each test is a "Stand Alone" event. Like throwing dice or flipping a coin, the current test is not dependent on the previous test. The probability of "Success by chance" does not change after any number of tests--unless the current testee has learned new tricks from the previous tests...

rwguinn made the best point, I think. The tests are separate events. Someone losing one does not increase the chance of someone else winning. If you flip a coin 100 times and get all heads, the probability of you getting heads the next time is still 1/2.

Correct. The page is based on the flawed assumption that after a string of "heads", there will be a bigger chance of "tails".

A rookie error only someone totally ignorant of statistics would make.

drkitten
6th February 2006, 07:45 AM
Correct. The page is based on the flawed assumption that after a string of "heads", there will be a bigger chance of "tails".

I disagree. T'ai Chi is fairly explicit about the hypothesis that he is testing, and it's not related to the gambler's fallacy at all:


This information could allow one to test the incredible notion that Randi, or skeptics in general, exert a "negative energy" on those they are testing, and cause the results to be worse than what one would expect.

My reading is that this is yet another thinly-disguised accusation of cheating on the part of the JREF. The idea, of course, being that if we had seen 10,000 preliminary tests at a nominal alpha cutoff of 0.001 without a single success, then the results are being biased against success, or alternatively that Randi & Co. are not giving claimants a fair shot at the million. In principle, this is no different than my noticing that almost no one manages to find the lady at the three-card Monte game down on the corner, and that therefore it's probably rigged.

Unfortunately, we don't yet have a sufficient sample size to be able to make any meaningful determinations, and at the current rate of three or four preliminary tests per year, I don't expect to have enough data within my lifetime. Nor should T'ai, unless he expects his martial arts practice to grant him a supernaturally prolonged life.
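How surprising a run of failures would be depends heavily on how many tests there have been. A minimal Python sketch, assuming every test has a chance pass rate of exactly 0.001 and the tests are independent; the test counts are hypothetical. If the real cutoffs are stricter than the nominal 0.001, a run of failures is even less surprising than these numbers suggest.

```python
def chance_of_no_pass(alpha: float, tests: int) -> float:
    """Probability that `tests` independent chance-only claimants all fail
    a protocol whose chance pass rate is exactly `alpha`."""
    return (1 - alpha) ** tests

for tests in (30, 1_000, 10_000):
    print(tests, chance_of_no_pass(0.001, tests))
# 30 tests: about 0.97, no information at all
# 1,000 tests: about 0.37, still unremarkable
# 10,000 tests: about 0.000045, which would be suspicious
```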

ChristineR
6th February 2006, 07:48 AM
You write on your page:



Making the data more accessible will require money. Are you a paying member of JREF?





Correct. The page is based on the flawed assumption that after a string of "heads", there will be a bigger chance of "tails".

A rookie error only someone totally ignorant of statistics would make.


Except that the tests are NOT independent. Each claimant has the opportunity to learn from previous claimants. This could be significant if the claimant plans to cheat, or if the claimant really has psychic powers but they happen to be temperamental.

Jekyll
6th February 2006, 07:49 AM
Correct. The page is based on the flawed assumption that after a string of "heads", there will be a bigger chance of "tails".

A rookie error only someone totally ignorant of statistics would make.
To be fair, this does mean that the probability of someone passing one of the first 10,000 preliminary tests is not just based on an estimation of the pass rate but our knowledge of existing failures.

I can see how this could be confusing for the poor chap.

eri
6th February 2006, 08:00 AM
Except that the tests are NOT independent. Each claimant has the opportunity to learn from previous claimants. This could be significant if the claimant plans to cheat, or if the claimant really has psychic powers but they happen to be temperamental.

I can see how it might be useful if they plan to cheat, but the test will most likely be given by different people and have an at least slightly different protocol, so I'm not sure knowing that other people failed will help all that much.

When I think of results that depend on previous results, I picture pulling numbers out of a hat - for each one you pull out, the chance of getting a specific number increases.
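eri's hat example is sampling without replacement, where earlier draws really do change later odds; a refilled hat (or a coin) behaves like sampling with replacement. A minimal Python sketch, assuming a hat of 10 tickets and one particular target ticket; the function name is hypothetical.

```python
from fractions import Fraction

def next_draw_chance(tickets: int, drawn_without_target: int,
                     replace: bool) -> Fraction:
    """Chance the next draw is one particular ticket, given that it has not
    come up in the previous `drawn_without_target` draws."""
    if replace:
        return Fraction(1, tickets)  # hat is refilled, so nothing changes
    return Fraction(1, tickets - drawn_without_target)  # fewer tickets left

for k in range(4):
    print(k, next_draw_chance(10, k, True), next_draw_chance(10, k, False))
# with replacement the chance stays at 1/10; without it, it climbs 1/10, 1/9, 1/8, 1/7
```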

T'ai Chi
6th February 2006, 08:06 AM
The standardized alpha cutoff of 0.001 for a preliminary test is JREF's nominal maximum that they will accept for for a preliminary test, when it is practical to calculate.


First, one doesn't calculate alpha, one sets it before the experiment. Second, ... OK? I'm not sure how this goes against what I've been saying, that .001 is the typical alpha for a preliminary test.

For example, from http://www.randi.org/jr/08-24-01.html, Randi writes


As always, as described in the rules, a preliminary test for the JREF prize would be performed. That test would have odds of only 1 in 1,000 against the results being positive by chance alone. Should your product pass this preliminary test, we would be prepared, as outlined in our published rules, to go to the second and final test for the million-dollar prize.



Similarly, if I claim to be able to levitate for thirty seconds without any physical support,...


I'm not talking about all possible preliminary tests, but only those that are statistical in nature, and I make this very clear. The claims you describe above are not statistical in nature, they're not like having 10 cups, with gold under one of them, and the person gets 20 trials, etc.

T'ai Chi
6th February 2006, 08:08 AM
Are you a paying member of JREF?


What I spend my money on is none of your business.

T'ai Chi
6th February 2006, 08:14 AM
My reading is that this is yet another thinly-disguised accusation of cheating on the part of the JREF.


I'm not sure why you have the need to read something sinister into it.

The 2 things that one would hope to get out of seeing such data are:

1) the actual data! It would be nice to actually see some

2) testing the incredible claim that skeptics make results lower than expected by chance

These are scientific matters, not pot-shots at JREF.

CFLarsen
6th February 2006, 08:22 AM
What I spend my money on is none of your business.

Indeed. But if you request something that will cost JREF money, the very least you could do is to support JREF with money.

If not, you insist that others pay for what you want.

CFLarsen
6th February 2006, 08:24 AM
Except that the tests are NOT independent. Each claimant has the opportunity to learn from previous claimants. This could be significant if the claimant plans to cheat, or if the claimant really has psychic powers but they happen to be temperamental.
You are wrong. The tests are independent. Each claimant has a test designed to test the specific claim.

ChristineR
6th February 2006, 08:40 AM
You are wrong. The tests are independent. Each claimant has a test designed to test the specific claim.

The fact remains that claimants have an opportunity to learn from other claimants' tests, or even from their own previous tests. The tests are independent only if the only factor in their outcomes is chance. Clearly chance is a big part of most tests, but the fact is that most claimants are NOT using random choice as their strategy.

One obvious example would be a person who looks at a previous claimant's test and sees a way to cheat not anticipated by the JREF. This might inspire the new claimant to practice and then apply with the same protocol as the previous claimant.

drkitten
6th February 2006, 08:45 AM
First, one doesn't calculate alpha, one sets it before the experiment.


Wrong.

Second, ... OK? I'm not sure how this goes against what I've been saying, that .001 is the typical alpha for a preliminary test.

Because 0.001 is not the "typical" alpha, but the nominal maximum alpha.

Case in point, the experiment with Mike cited earlier. If Mike were required to get 10/10 correct in the blind trials to pass the preliminary test, the alpha cutoff would not be 0.001, but 0.0000000001, and we have no knowledge about whether Mike is "typical."

T'ai Chi
6th February 2006, 08:46 AM
Indeed. But if you request something that will cost JREF money, the very least you could do is to support JREF with money.

If not, you insist that others pay for what you want.

I'm merely raising the idea.

Speculating about possible costs is rather moot. You have no idea how much it would cost, neither do I.

T'ai Chi
6th February 2006, 08:51 AM
Wrong.


I have about 20 stat books that say that one doesn't calculate alpha, one sets it before the experiment. One calculates a p-value, but not alpha.


Case in point, the experiment with Mike cited earlier. If Mike were required to get 10/10 correct in the blind trials to pass the preliminary test, the alpha cutoff would not be 0.001, but 0.0000000001, and we have no knowledge about whether Mike is "typical."

If we had all the numerical details of this preliminary test that is statistical in nature, we wouldn't have to speculate.

Harlequin
6th February 2006, 08:55 AM
I think it's an interesting idea and I'd support the notion of publishing the actual results from every test, but it certainly would require a lot of work to go back through years of tests - work and money.
I've noticed a few times where Randi has referred to past tests and how the records have not been kept, so I suspect it would not be easy to compile a representative sample. You might find that only the really strange results (obviously not strange enough to pass...) were kept and not the ordinary, chance-level results.
I'd suggest that it would be less work to just start recording the results now for all future tests. If someone is keen and wants to go through the past 2-3 years worth of tests and take that as a starting set of data, it might also be worthwhile. That, of course, also takes money or a suitably-qualified volunteer.

How difficult would it be to start recording this type of information from now on?
Analysis will either show something interesting or it will confirm that there is nothing strange about the test results. Either way, it seems useful.

JMA
6th February 2006, 08:58 AM
Combining many short runs is not the same as taking one long run--Rhine's lab made that mistake, and (at least in his casual talks) so does Sheldrake.

Dumb question, but: why is it not the same?

Harlequin
6th February 2006, 08:59 AM
I have about 20 stat books that say that one doesn't calculate alpha, one sets it before the experiment. One calculates a p-value, but not alpha.
If you are designing an experiment specifically for statistical analysis, you should base it on a pre-determined alpha.

If you have already performed an experiment without bothering to think about this ahead of time, you can calculate what exactly the alpha was for that experiment. It doesn't depend on results, but only on the structure of the experiment - probability of individual success, number of repetitions.

drkitten
6th February 2006, 09:00 AM
I have about 20 stat books that say that one doesn't calculate alpha, one sets it before the experiment. One calculates a p-value, but not alpha.

Then you didn't read the textbooks properly. I just calculated the alpha cutoff for Mike's experiment above, to the degree that it was possible with the information given.

Harlequin
6th February 2006, 09:03 AM
Dumb question, but: why is it not the same?
One reason is: you could think of it sort of like adding up all the cumulative errors of each test.
It's sort of like measuring something really big with a small ruler. If your small ruler is a little bit off (say one mm), it will result in a larger error than if you use a large measuring tape, even if it has a slightly larger error. Especially when you include the fact that you will screw up a little bit every time you take a measurement.

There are some other reasons why it is not the same, but they are sort of similar in concept.

ChristineR
6th February 2006, 09:03 AM
I have about 20 stat books that say that one doesn't calculate alpha, one sets it before the experiment. One calculates a p-value, but not alpha.



If we had all the numerical details of this preliminary test that is statistical in nature, we wouldn't have to speculate.

In practice with these oddball tests I think it's a combination of both. With the CSICOP test of Natasha Demkina for example, they started by finding six volunteers that had missing and/or extra body parts, then determined the alpha of 4 of 7 and 5 of 7, and set the bar at 5 of 7.

drkitten
6th February 2006, 09:06 AM
How difficult would it be to start recording this type of information from now on?

Already being done -- check the "Challenge Applications" thread.


Analysis will either show something interesting or it will confirm that there is nothing strange about the test results. Either way, it seems useful.[...]

To our great-great-great-grandchildren, perhaps.

The current rate of testing is fewer than five preliminary tests per year. Assuming first that all tests actually achieve the maximally permissive 0.001 nominal alpha cutoff, second, that this rate continues, and third that the data storage lasts long enough, we would expect "by chance" to see one success at the preliminary test sometime in the late 22nd or early 23rd century, although it wouldn't be surprising if we didn't see a success "by chance" for several centuries after that. If the JREF lasts that long, it will a) be extremely surprising, and b) provide excellent material for a senior thesis for a statistics major at Starfleet Academy.
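A rough check of that arithmetic, as a Python sketch. It assumes, as drkitten does, five independent tests per year, each with a false-positive probability of exactly 0.001; the helper name `chance_pass_outlook` is hypothetical. The number of tests until the first chance pass is geometric, so its mean is 1/alpha tests.

```python
def chance_pass_outlook(alpha: float, tests_per_year: float, horizon_years: float):
    """Return (expected years until the first chance pass,
    probability of at least one chance pass within the horizon),
    assuming every test has exactly `alpha` false-positive probability
    and the tests are independent."""
    expected_tests = 1 / alpha  # mean of a geometric distribution
    expected_years = expected_tests / tests_per_year
    n_tests = int(tests_per_year * horizon_years)
    p_at_least_one = 1 - (1 - alpha) ** n_tests
    return expected_years, p_at_least_one


years, p100 = chance_pass_outlook(alpha=0.001, tests_per_year=5, horizon_years=100)
print(f"expected wait: {years:.0f} years; P(chance pass within 100 years) = {p100:.3f}")
```

The mean wait is about 200 years, yet the chance of a pass in the first century is only about 39%; the long tail of the geometric distribution is why a chance pass could just as easily come centuries later.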

Harlequin
6th February 2006, 09:07 AM
In practice with these oddball tests I think it's a combination of both. With the CSICOP test of Natasha Demkina for example, they started by finding six volunteers that had missing and/or extra body parts, then determined the alpha of 4 of 7 and 5 of 7, and set the bar at 5 of 7.
Actually, it was based on a decision that to get acceptable levels of alpha-risk (probability of a false positive), you needed 5 correct out of 7. Whatever the probability of that is. I can't be bothered to calculate, but I assume the probability of having a false positive is over 1 in 1000 for this.

Harlequin
6th February 2006, 09:09 AM
Already being done -- check the "Challenge Applications" thread.



To our great-great-great-grandchildren, perhaps.



Actually, I don't care so much about someone winning by chance. I believe it will never happen - unless they also cheat.

However, useful data could be obtained with something like 20 tests. This would tell us if the results overall are statistically significant - which would indicate either cheating on the part of the applicants, or the Evil Randi-rays if the results are abnormally low. Either one would be nice to know.

drkitten
6th February 2006, 09:12 AM
Actually, it was based on a decision that to get acceptable levels of alpha-risk (probability of a false positive), you needed 5 correct out of 7. Whatever the probability of that is. I can't be bothered to calculate, but I assume the probability of having a false positive is over 1 in 1000 for this.

Actually (my tables aren't that fine-grained, and I can't be bothered to do the calculations), it looks like getting 5/7 correct with an a priori probability of getting an individual trial right of 1/7 is about right on the money for an alpha cutoff of 0.001.
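The tail probabilities drkitten is estimating can be computed exactly with Python's `fractions` module. This sketch follows his model: seven independent calls, each correct by chance with probability 1/7. The actual Demkina protocol matched seven conditions to seven people, which is not strictly a set of independent trials, so treat these figures as an approximation.

```python
from fractions import Fraction
from math import comb


def binomial_tail(n: int, k: int, p: Fraction) -> Fraction:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


p_guess = Fraction(1, 7)  # assumed chance of a correct call on any one subject
for cutoff in (4, 5):
    tail = binomial_tail(7, cutoff, p_guess)
    print(f"P(at least {cutoff}/7 by chance) = {tail} ~ {float(tail):.5f}")
```

Under this model, 5 of 7 works out to 799/823543, just under 1 in 1000, and 4 of 7 to roughly 1 in 100.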

CFLarsen
6th February 2006, 09:13 AM
The fact remains that claimants have an opportunity to learn from other claimants' tests, or even their own previous tests. The tests are independent only if the only factor in the outcomes of the tests is chance. Clearly chance is a big part of most tests, but the fact is that most claimants are NOT using random choice as their strategy.

How do you know this?

One obvious example would be a person who looks at a test by a previous claimant and sees a way to cheat not anticipated by the JREF. This might inspire the new claimant to practice and then apply with the same protocol as the previous claimant.

Can you give an example?




I'm merely raising the idea.

No, you are not merely raising the idea. You have put so much effort into this that you have put up a webpage about it.

Speculating about possible costs is rather moot. You have no idea how much it would cost, neither do I.

It would definitely be outside the current budget. You want something, but refuse to pay for it. When you don't get what you want, you can continue to criticize JREF.

Which, I believe, is the sole reason for this thread and your page.

drkitten
6th February 2006, 09:14 AM
However, useful data could be obtained with something like 20 tests.

Huh? The expected number of successes in 20 tests with alpha cutoff of 0.001 is 0.020 -- one fiftieth of a success.

In practical terms, this means no successes are expected in the next twenty tests.

How would you detect the "Evil Randi-rays" lowering the number of expected successes below zero?

drkitten
6th February 2006, 09:18 AM
Can you give an example?
.

Be serious. Nadia what's her name got caught peering around a blindfold. Now I know that when I apply to take the test, I can't use the standard peering around a blindfold test, so I'll have to use something else like the concealed earbug trick.

Uri got caught using sleight of hand on cameras. Therefore I need to make sure that whatever I do will not show up on film.

Harlequin
6th February 2006, 09:22 AM
Huh? The expected number of successes in 20 tests with alpha cutoff of 0.001 is 0.020 -- one fiftieth of a success.

In practical terms, this means no successes are expected in the next twenty tests.

How would you detect the "Evil Randi-rays" lowering the number of expected successes below zero?

You need to actually use the results of the tests. If you know that one applicant got 6 correct out of 10, it tells you a lot more than just knowing that they failed to get 8 out of 10 (or whatever the agreed performance should have been).

Finding out that, in total, applicants are significantly more successful than chance would be its own form of test. Of course it could just indicate that there is a lot of cheating in these tests...

ETA: this boils down to the difference between using discrete data vs. continuous data for analysis. Just counting pass/fail means you need a huge number of trials to get useful results.
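Harlequin's pooling idea, as a minimal Python sketch using the normal approximation to a sum of independent binomials. It assumes each test used a number of trials fixed in advance (see Mercutio's later point about combining short runs) and a known chance rate per trial; the function name `pooled_z` and the applicant scores below are invented for illustration, not real JREF data.

```python
from math import sqrt


def pooled_z(results):
    """results: iterable of (hits, trials, chance_p), one tuple per test.
    Returns the z-score of total hits against the total expected by chance,
    using the normal approximation to a sum of independent binomials."""
    results = list(results)
    observed = sum(h for h, _, _ in results)
    expected = sum(n * p for _, n, p in results)
    variance = sum(n * p * (1 - p) for _, n, p in results)
    return (observed - expected) / sqrt(variance)


# Hypothetical applicants: none reached a pass mark, but their raw
# scores can still be pooled into one overall comparison with chance.
hypothetical = [(6, 10, 0.5), (2, 7, 1 / 7), (1, 20, 0.05)]
print(f"pooled z = {pooled_z(hypothetical):.2f}")
```

A pooled z well above zero across many tests would suggest leakage or cheating slipping through; one well below zero would be the "Randi-rays" case.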

ChristineR
6th February 2006, 09:27 AM
How do I know what? As for learning, a great deal of information about the tests is made public. I think we can assume that many claimants have privately practiced tests that have been done in the past. I also assume that this has discouraged many claims. Seems like there used to be more.

As for not using random chance as a strategy, there are basically two sorts of claimants: those who are trying to cheat, and those who think they have paranormal power. Clearly cheaters are not using guessing as a strategy.

Those who think they have powers are using a variety of strategies, some wholly internal (I got a feeling...) and some external. The JREF strives to eliminate any correlation between external clues and the outcomes, but it's clear that the guessing itself is NOT random in most cases. It's based on psychological factors that are hard to pin down.

If I could give an example of an existing protocol that I knew how to cheat, I might decide to earn a million dollars BEFORE I made the flaw public. That was a hypothetical case to show that claimants do in fact vary their strategies based on the experiences of previous claimants. Apparently some of the recent dowsing tests involved someone trying to cheat, but they were caught. Over and over again we hear claimants say "I know now why I failed, I'm going to practice so I can get it right next time."

Harlequin
6th February 2006, 09:27 AM
I personally would like to know how much cheating actually goes on in these tests. If we find that there is not statistically significant variance from chance in the overall results of all applicants, then we can probably say we're doing a good job on stopping the cheaters.

Results like that of Natasha's show us that cheating is still slipping through (I suspect), so I'd like to know how bad the problem is.

This kind of analysis is a good way to tell.

Mercutio
6th February 2006, 10:36 AM
Dumb question, but why is it not the same?
Very good question, actually, given that it has led to problems in the parapsychology research field.

Short answer for now, because I have a class to get to:

Flipping 10 coins, you expect 5 H, 5 T. But, with only 10 coins, 6, 7, or 8 H is not that rare at all, and 9 or 10, although rare, is certainly something you might find if you spent just one day flipping coins.

Flipping 100 coins, you expect 50 H...but it is much more difficult to get the same percentage of H as in the smaller sample. 60H you might find, but 70 is already very rare, 80 you probably won't find in several days' attempts at flipping 100 coins in a row. 90 or 100 could take you weeks. (Basically, with just 10 flips, you only need to be off by 4 from a priori probability in order to get 90%; if you flip 100, you need to be off by 40 flips to get the same percentage. A much more difficult task.)

The same discrepancy exists between any 2 sample sizes. If we are trying to demonstrate our ability based on flipping 100 coins, we cannot compare our result simply to .5, or even to the distribution arrived at by flipping 1000 coins a sufficient number of times to generate an empirical sampling distribution (or by mathematically deriving the same sampling distribution, using s/√N). Rhine's lab originally allowed subjects to end trials when they chose; by always ending after a run of successes, the accumulated data could (if compared to the large-N null probability of the accumulated scores) achieve statistical significance.

Bottom line--compare small runs to small runs, large runs to large runs. A bunch of small runs put together must be compared to the more varied distribution that is appropriate.
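Mercutio's comparison can be checked exactly. This Python sketch computes the chance of at least 90% heads in 10 and in 100 fair flips, plus the standard error of the proportion of heads, sqrt(p(1-p)/N) with p = 0.5 (the s/√N he mentions, applied to a proportion). The helper name `upper_tail` is just for illustration.

```python
from math import comb, sqrt


def upper_tail(n: int, k: int) -> float:
    """P(at least k heads in n fair coin flips), computed exactly."""
    return sum(comb(n, j) for j in range(k, n + 1)) / 2**n


for n in (10, 100):
    k = (9 * n) // 10      # 90% heads
    se = sqrt(0.25 / n)    # standard error of the proportion of heads
    print(f"n={n:3d}: P(>=90% heads) = {upper_tail(n, k):.3g}, "
          f"SE of proportion = {se:.3f}")
```

The 10-flip figure is 11/1024, about 1%; the 100-flip figure is on the order of 10^-17, and the standard error shrinks from about 0.16 to 0.05.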

CFLarsen
6th February 2006, 10:46 AM
How do I know what?

That most claimants are NOT using random choice as their strategy.

As for learning, a great deal of information about the tests is made public. I think we can assume that many claimants have privately practiced tests that have been done in the past. I also assume that this has discouraged many claims. Seems like there used to be more.

As for not using random chance as a strategy, there are basically two sorts of claimants: those who are trying to cheat, and those who think they have paranormal power. Clearly cheaters are not using guessing as a strategy.

Those who think they have powers are using a variety of strategies, some wholly internal (I got a feeling...) and some external. The JREF strives to eliminate any correlation between external clues and the outcomes, but it's clear that the guessing itself is NOT random in most cases. It's based on psychological factors that are hard to pin down.

Neither group is using guessing as a strategy.

If I could give an example of an existing protocol that I knew how to cheat, I might decide to earn a million dollars BEFORE I made the flaw public. That was a hypothetical case to show that claimants do in fact vary their strategies based on the experiences of previous claimants. Apparently some of the recent dowsing tests involved someone trying to cheat, but they were caught. Over and over again we hear claimants say "I know now why I failed, I'm going to practice so I can get it right next time."

No examples, then.

T'ai Chi
6th February 2006, 12:59 PM
Then you didn't read the textbooks properly. I just calculated the alpha cutoff for Mike's experiment above, to the degree that it was possible with the information given.

Find one cite that says one calculates alpha.

T'ai Chi
6th February 2006, 01:01 PM
No, you are not merely raising the idea.


Yes, I am merely raising the idea.


It would definitely be outside the current budget. You want something, but refuse to pay for it. When you don't get what you want, you can continue to criticize JREF.


I am not "refusing to pay for it". Not sure where you got that absurd idea.


Which, I believe, is the sole reason for this thread and your page.

Everyone is entitled to their silly beliefs. :D

69dodge
6th February 2006, 01:07 PM
Rhine's lab originally allowed subjects to end trials when they chose; by always ending after a run of successes, the accumulated data could (if compared to the large-N null probability of the accumulated scores) achieve statistical significance.

I don't see how that would work. The coin doesn't know when we've decided to end a trial; it just does what it does. How can we get it to come up heads more often than it otherwise would, just by saying "ok, that was the end of that trial, now we're starting a new trial"?

Or, to put it another way (which is really the same way even though it doesn't sound like it), if we don't stop a trial until it has more than the expected number of successes, some of our trials will turn out to be very long, which will wash out the statistical significance of the excess of successes.

I think.

Maybe.

No?

drkitten
6th February 2006, 01:09 PM
Find one cite that says one calculates alpha.

Me, with an example, a half-dozen posts upthread.

drkitten
6th February 2006, 01:12 PM
I don't see how that would work. The coin doesn't know when we've decided to end a trial; it just does what it does. How can we get it to come up heads more often than it otherwise would, just by saying "ok, that was the end of that trial, now we're starting a new trial"?

The coin doesn't know, but the person performing the experiment does.


Or, to put it another way (which is really the same way even though it doesn't sound like it), if we don't stop a trial until it has more than the expected number of successes, some of our trials will turn out to be very long, which will wash out the statistical significance of the excess of successes.

I think.

Maybe.

No?

No. Because you simply have to wait for a long enough run, which we know will happen eventually (by the Drunkard's Walk theorem). Even if you don't wait that long on this trial, the cumulative effect of a half-dozen positive-but-not-significant experiments might be enough to produce an overall finding of significance in the hands of a sufficiently corrupt statistician.
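A small simulation of the optional-stopping problem drkitten and Mercutio describe, as a Python sketch. The coin is fair throughout; the only thing that changes is the experimenter's rule of checking a one-sided z-test (nominal alpha 0.05) after every flip and stopping as soon as it looks significant. The function name, the session count, the 200-flip cap, and the 10-flip minimum are arbitrary choices for illustration.

```python
import random
from math import sqrt


def optional_stopping_rate(sessions: int, max_flips: int, min_flips: int = 10,
                           z_crit: float = 1.645, seed: int = 1) -> float:
    """Simulate fair-coin sessions in which the experimenter checks the
    running one-sided z-score after every flip (from `min_flips` on) and
    stops the session as soon as it looks 'significant'.
    Returns the fraction of sessions that end up declared significant."""
    rng = random.Random(seed)
    declared = 0
    for _ in range(sessions):
        heads = 0
        for n in range(1, max_flips + 1):
            if rng.random() < 0.5:
                heads += 1
            z = (heads - n / 2) / sqrt(n / 4)  # normal approximation, p = 1/2
            if n >= min_flips and z > z_crit:
                declared += 1
                break
    return declared / sessions


rate = optional_stopping_rate(sessions=2000, max_flips=200)
print(f"nominal alpha 0.05, actual false-positive rate with peeking: {rate:.3f}")
```

With a cap on session length, 69dodge is right that the coin itself stays fair; what breaks is the decision rule, whose false-positive rate ends up well above the nominal 0.05.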

CFLarsen
6th February 2006, 01:20 PM
Yes, I am merely raising the idea.

Bull.

I am not "refusing to pay for it". Not sure where you got that absurd idea.

I have yet to see you state that you are willing to pay for it.

Everyone is entitled to their silly beliefs. :D

Time will tell. You have tried, again and again, to chip away at skeptics and Randi in particular. I have yet to see any indication that you have honest intentions.

drkitten
6th February 2006, 01:24 PM
Yes, I am merely raising the idea.

Your previous posting history argues strongly against this.

CFLarsen
6th February 2006, 01:26 PM
Your previous posting history argues strongly against this.
I would call it evidence.

T'ai Chi
6th February 2006, 01:27 PM
Me, with an example, a half-dozen posts upthread.

No dude, a stat book.

drkitten
6th February 2006, 01:28 PM
No dude, a stat book.

No, dude. An example.

T'ai Chi
6th February 2006, 01:30 PM
Bull.


We disagree.


I have yet to see you state that you are willing to pay for it.


I have yet to state that I'm buying red shoes tomorrow, therefore I am apparently buying red shoes tomorrow. :D

As mentioned to you, we have no idea of costs. Talking about costs is rather moot.


Time will tell. You have tried, again and again, to chip away at skeptics and Randi in particular. I have yet to see any indication that you have honest intentions.

I'm not interested in your personal beefs and dredging of bulletin board soap operas from the past. If you are threatened and/or feel that asking for stats is "chipping away", there's not much one can do about that.

T'ai Chi
6th February 2006, 01:31 PM
Your previous posting history argues strongly against this.

Past history says nothing about if I am raising an idea or not here, which I am.

T'ai Chi
6th February 2006, 01:33 PM
No, dude. An example.

Show me one textbook that says one calculates alpha.

drkitten
6th February 2006, 01:35 PM
Past history says nothing about if I am raising an idea or not here, which I am.

As the convicted burglar said when they found him in the back garden late at night, with a set of lockpicks and a jemmy in his hand, "Past history says nothing about if I am just ducking out of the wind for a quick smoke or not here, which I am."

CFLarsen
6th February 2006, 01:37 PM
We disagree.

And you're wrong.

I sincerely hope so. I doubt I am. The evidence all speaks in favor that I am right.

I have yet to state that I'm buying red shoes tomorrow, therefore I am.

As mentioned to you, we have no idea of costs. Talking about costs is rather moot.

I wasn't suggesting that you paid for all of it, just some of it, e.g., by supporting JREF. This, you won't do.

You want others to pay for what you demand. I predict that later, you will complain that JREF didn't do what you wanted, in your continuous efforts to belittle JREF.

Zzz. I'm not interested in your personal beefs. If you are threatened, and feel that asking for stats is "chipping away", I suggest you grow up.

I am not "threatened", Justin, and I sincerely doubt that Randi is, too. If you think your feeble attacks diminish the work and value of skeptics, you are severely deluded.

If you want to "threaten" skeptics, I suggest you start playing honest and try to find some worthwhile issues with some substance.

This seems rather impossible for you. Go figure.

drkitten
6th February 2006, 01:38 PM
Show me one textbook that says one calculates alpha.

I'm really sorry that your reading comprehension is so poor, but I will not read you a bedtime story.

If you really think that one cannot calculate alpha values, then what did I just calculate for Mike's experiment above?

T'ai Chi
6th February 2006, 01:46 PM
The evidence all speaks in favor that I am right.


That's an interesting opinion.


This, you won't do.


Says you. I've never said I would or wouldn't.


You want others to pay for what you demand.


I have not "demanded" anything.


I predict that later, you will complain that JREF didn't do what you wanted, in your continuous efforts to belittle JREF.


You should apply for the challenge and see if you can will a million. Really.


I am not "threatened",


You just have some odd need to say I've "demanded" things, that I won't pay for things, and other fairy tales.

T'ai Chi
6th February 2006, 01:49 PM
I'm really sorry that your reading comprehension is so poor, but I will not read you a bedtime story.

If you really think that one cannot calculate alpha values, then what did I just calculate for Mike's experiment above?

I'm not sure where you are getting lost here dr. I'm not interested in your math. I've been asking to see even one textbook that says one calculates alpha.

T'ai Chi
6th February 2006, 01:55 PM
As the convicted burglar said when they found him in the back garden late at night, with a set of lockpicks and a jemmy in his hand, "Past history says nothing about if I am just ducking out of the wind for a quick smoke or not here, which I am."

Of course, someone like you might convict him, not even asking if he is in his own backyard. :D

drkitten
6th February 2006, 01:56 PM
I've been asking to see even one textbook that says one calculates alpha.

And I'm telling you that I'm not going to provide you with one, because I provided you with an example of such a calculation instead. I don't care a half cup of warm spit what you think your textbooks say, because it's obvious that you don't understand them, so it would not be a useful way to spend my time tracking down citations.

If you want to convince me that you're worth educating in elementary statistics and logical thinking, first convince me that you're educatable.

CFLarsen
6th February 2006, 01:59 PM
That's an interesting opinion.

It's not an opinion, Justin. Your posting history here, and elsewhere, shows only one thing: You want to bring down skeptics and JREF in particular.

Says you. I've never said I would or wouldn't.

Will you, yes or no?

I have not "demanded" anything.

Bull.

You should apply for the challenge and see if you can will a million. Really.

Let's see what happens.

You just have some odd need to say I've "demanded" things, that I won't pay for things, and other fairy tales. You're very threatened to do that, IMO.

If that suits your little fantasy world. :rolleyes:

Why do you hate skeptics so much, Justin? Don't say that you don't, because your posting history is evidence that you do.

T'ai Chi
6th February 2006, 02:01 PM
I don't care a half cup of warm spit what you think your textbooks say, because it's obvious that you don't understand them, so it would not be a useful way to spend my time tracking down citations.


Here's an exact quote from one book I got:


In good statistical practice, alpha is specified in advance before any samples are drawn so that results will not influence the choice for the level of significance.
Understanding Basic Statistics, Brase and Brase, 1997


They don't talk about calculating alpha as you're claiming.

Looks like you're wrong here.

T'ai Chi
6th February 2006, 02:02 PM
Your posting history here, and elsewhere, shows only one thing: You want to bring down skeptics and JREF in particular.


Your opinion is interesting. Wrong, but interesting.


Will you, yes or no?


It all depends on specifics.


Bull.


We disagree.


Why do you hate skeptics so much, Justin? Don't say that you don't, because your posting history is evidence that you do.

You have a very active imagination. :D Gotta wonder about those who see hate and attacks in everything, and why they get so threatened.

rwguinn
6th February 2006, 02:05 PM
I'm not sure where you are getting lost here dr. I'm not interested in your math. I've been asking to see even one textbook that says one calculates alpha.
A numerical value that cannot be calculated. A mathematical result that cannot be determined mathematically. Amazing.
If it cannot be calculated, it isn't real anyway.
So why pick nits?

drkitten
6th February 2006, 02:10 PM
Here's an exact quote from one book I got:

Like I said, you don't understand the textbook.

Here's your quote again, suitably emphasized.


In good statistical practice, alpha is specified in advance before any samples are drawn so that results will not influence the choice for the level of significance.

Obviously, this raises the possibility that alpha is not specified in advance (which in this description -- and both I and Harlequin agree -- is poor practice, but poor practice is not the same as impossibility.) In the event that alpha is not specified in advance, then, as Harlequin put it,


If you have already performed an experiment without bothering to think about this ahead of time, you can calculate what exactly the alpha was for that experiment.

... but this is generally poor practice, because it permits the obtained "results [to] influence the choice of the level of significance," or in other words is an open invitation to cherry-picking.

The most common time that you will see this mistake made (other than in the hands of exceedingly incompetent researchers) is in so-called "post hoc" analysis where the alpha value is adjusted in secondary (unplanned) analyses subsequent to a main analysis, usually an ANOVA.

So even your own preferred citation implies that you can calculate alpha, but that it's preferable not to do so.
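As an illustration of why post hoc analyses adjust alpha, here is a Python sketch of the familywise error rate for m independent tests and of the Bonferroni correction, one common adjustment (named here as an example; the post does not say which adjustment it has in mind). The value m = 10 is hypothetical.

```python
def familywise_error(alpha: float, m: int) -> float:
    """Chance of at least one false positive among m independent tests,
    each run at significance level alpha."""
    return 1 - (1 - alpha) ** m


def bonferroni_alpha(family_alpha: float, m: int) -> float:
    """Per-test level that keeps the familywise error at or below
    family_alpha (Bonferroni bound; holds even for dependent tests)."""
    return family_alpha / m


m = 10
print(f"{m} unplanned comparisons at 0.05: "
      f"P(any false positive) = {familywise_error(0.05, m):.3f}")
print(f"Bonferroni per-comparison alpha: {bonferroni_alpha(0.05, m):.4f}")
```

Ten comparisons at 0.05 give about a 40% chance of at least one spurious "hit", which is the same arithmetic behind the original question about 1,000+ Challenge tests.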

ChristineR
6th February 2006, 02:13 PM
At the risk of making things worse, I believe that the way it works is the researcher chooses an alpha value to fit the circumstances, then calculates p-values for the various outcomes. Then one can match possible outcomes to the chosen alpha values. You could describe what you've done as "calculating the alpha values" that go with the various outcomes.

Conversely, if you have a very restricted experiment (e.g., Natasha Demkina) you might first calculate all the p-values (for seven possible outcomes), and then pick a de facto range of alpha.

CFLarsen
6th February 2006, 02:15 PM
It all depends on specifics.

What specifics?

You have a very active imagination. :D Gotta wonder about those who see hate and attacks in everything, and why they get so threatened.

Just answer the question: Why do you hate skeptics so much?

T'ai Chi
6th February 2006, 02:17 PM
Like I said, you don't understand the textbook.


So you say.

Say an average is theoretically distributed normally with a mean of 5 and a standard deviation of 1.

We take 20 samples and observe a mean of 4.7.

Test the hypothesis that mu = 5.

This book just sets alpha = .05. Then it goes on to calculate a test statistic, and a p-value, then compares the p-value to alpha, and ends up not rejecting the null hypothesis that mu = 5.

But you're saying you can calculate alpha. Can you do it here please to shut me up?
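The textbook example, worked in Python as a sketch. It assumes sigma = 1 is known (so a z-test) and a two-sided alternative. The p-value comes from the data; the alpha of a decision rule comes only from the rule and the null distribution. The 0.4 cutoff is a hypothetical rule chosen for illustration; the book's alpha = .05 corresponds to rejecting when |xbar - 5| > 1.96 × 1/√20, about 0.44.

```python
from math import erfc, sqrt


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the complementary error function."""
    return 0.5 * erfc(-x / sqrt(2))


mu0, sigma, n, xbar = 5.0, 1.0, 20, 4.7
se = sigma / sqrt(n)  # standard error of the sample mean

# p-value: computed from the observed data.
z = (xbar - mu0) / se
p_value = 2 * normal_cdf(-abs(z))

# alpha: the probability, under H0, that the decision rule rejects.
# Hypothetical rule: reject whenever the sample mean is more than 0.4 from 5.
half_width = 0.4
alpha_of_rule = 2 * normal_cdf(-half_width / se)

print(f"z = {z:.3f}, two-sided p = {p_value:.3f}")
print(f"alpha implied by 'reject if |xbar - 5| > 0.4': {alpha_of_rule:.4f}")
```

Here z is about -1.34 and p about 0.18, so H0 is not rejected at .05, matching the book; the hypothetical 0.4 rule has an alpha of roughly 0.074.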

drkitten
6th February 2006, 02:19 PM
At the risk of making things worse, I believe that the way it works is the researcher chooses an alpha value to fit the circumstances, then calculates p-values for the various outcomes.

That's standard practice and the preferred method of doing it, certainly. But in many cases -- usually involving either post hoc analysis or in cleaning up the mess left by a semi-competent researcher -- it's necessary to recreate alpha values from the experimental protocol in order to determine what the implicit rejection threshold would have been.


Conversely, if you have a very restricted experiment (e.g., Natasha Demkina) you might first calculate all the p-values (for seven possible outcomes), and then pick a de facto range of alpha.

It doesn't even need to be that restricted an experiment; this kind of argument comes up all the time in various reports of paranormal coincidences. Calculating the probability of a "Type I error" (which is essentially calculating an alpha value) is a routine assignment for first-term statistics students.

T'ai Chi
6th February 2006, 02:20 PM
What specifics?


It depends on what skeptical organizations do if anything.


Just answer the question: Why do you hate skeptics so much?

I answered your "question" with about as much respect as it warranted. You may not have liked my answer though. :D

drkitten
6th February 2006, 02:26 PM
Say an average is theoretically distributed normally with a mean of 5 and a standard deviation of 1.

We take 20 samples and observe a mean of 4.7.

Insufficient information given. What's the distribution of the samples? Was the null hypothesis accepted or rejected?


But you're saying you can calculate alpha. Can you do it here please to shut me up?

With sufficient information given, I can.

As a matter of fact, it's a routine problem. (See this test, problem 5 (www.cambridgestudents.org.uk/subjectpages/maths/asalmaths/pastpapers/9709_s03_qp_7.pdf) for an example, although they phrase it as "calculate the probability of a type I error.")

Similarly, this page here, problem 15 (http://www.csulb.edu/~rodrigue/geog200/lab7.html) asks students to "calculate the probability that you could have gotten results as extreme as yours," again asking for them to calculate the alpha value of the described experiment.

As I said, I'm really sorry you don't understand your own textbooks.

CFLarsen
6th February 2006, 02:30 PM
It depends on what skeptical organizations do if anything.

But in this case, you already know what JREF does. Will you support it financially, yes or no?

I answered your "question" with about as much respect as it warranted.. You may not have liked my answer though. :D

Whether I like it or not has nothing to do with it. What I am wondering is if you think that you are making progress here.

Do you honestly think that you are successful in getting your arguments through? This constant refusal to back up your own claims with evidence, and refusal to clarify points in your argumentation - what possible good can that do to whatever point you are trying to make?

Does your refusal to answer questions about your claims mean that you insist that we merely take your word for it?

Or is it because you think we are too stupid to understand your explanations?

Or perhaps you have no answers at all? You are nothing but a troll, eager for whatever attention you can get, be it ever so vapid as all attention is on Internet forums?

I really can't think of any other alternatives. If you feel like clarifying, it would be most helpful.

To your own argument, that is. You are not scoring any points by not clarifying. I hope you can see that.

strathmeyer
6th February 2006, 02:32 PM
...although they phrase it as "calculate the probability of a type I error.")

Either provide an example, or admit that you were wrong. When you dance around and pretend to not be an idiot, everyone can still tell that you are an idiot.

T'ai Chi
6th February 2006, 02:35 PM
As a matter of fact, it's a routine problem. (See this test, problem 5 (www.cambridgestudents.org.uk/subjectpages/maths/asalmaths/pastpapers/9709_s03_qp_7.pdf) for an example, although they phrase it as "calculate the probability of a type I error.")


Dr, alpha is still set in that problem. You've just been asked to solve backwards to find what it was set at.


As I said, I'm really sorry you don't understand your own textbooks.

Yes, you've mentioned that. It still reeks of ad hom and adds nothing to your argument.

Here's another book outlining the steps of testing hypotheses


1. state Ho and Ha just as in a test of significance
2. Think of the problem as a decision problem, so that the probabilities of Type I and Type II errors are relevant
3. Because of Step 1, Type I errors are more serious. So choose an alpha (significance level) and consider only tests with probability of Type I error no greater than alpha.
4. etc.
Introduction to the Practice of Statistics, Moore and McCabe, 2003


3. you choose an alpha, you don't calculate an alpha.
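Both positions fit in one short Python sketch: a nominal alpha is chosen first (step 3), and then, for a discrete test, the actual probability of a Type I error is calculated from the resulting pass mark. The function names and the 10-trial protocol with a 1/2 chance rate are hypothetical.

```python
from fractions import Fraction
from math import comb


def tail(n: int, k: int, p: Fraction) -> Fraction:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def cutoff_for(nominal_alpha: Fraction, n: int, p: Fraction):
    """Smallest pass mark whose chance-pass probability does not exceed
    the chosen nominal alpha, together with that probability (the
    test's actual Type I error rate)."""
    for k in range(n + 1):
        size = tail(n, k, p)
        if size <= nominal_alpha:
            return k, size
    return n + 1, Fraction(0)  # no achievable pass mark is strict enough


k, size = cutoff_for(Fraction(5, 100), n=10, p=Fraction(1, 2))
print(f"pass mark: {k}/10, actual Type I error: {size} ~ {float(size):.4f}")
```

With nominal alpha .05 the pass mark is 9 of 10 and the actual Type I error rate is 11/1024, about 0.011, because 8 of 10 (56/1024, about 0.055) would exceed the chosen level.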

T'ai Chi
6th February 2006, 02:37 PM
But in this case, you already know what JREF does. Will you support it financially, yes or no?


It all depends.


Do you honestly think that you are successful in getting your arguments through? This constant refusal to back up your own claims with evidence, and refusal to clarify points in your argumentation - what possible good can that do to whatever point you are trying to make?

Does your refusal to answer questions about your claims mean that you insist that we merely take your word for it?

Or is it because you think we are too stupid to understand your explanations?

Or perhaps you have no answers at all? You are nothing but a troll, eager for whatever attention you can get, be it ever so vapid as all attention is on Internet forums?


I'm not interested in your personal beefs. It is OK, I forgive you.

CFLarsen
6th February 2006, 02:50 PM
It all depends.

You know what JREF does. Will you support them financially, yes or no?

I'm not interested in your personal beefs. It is OK, I forgive you.

I think it is a bit disturbing that you have taken on this air of supreme superiority. You do actually believe that you are above everyone else, even to the point where you can regally - or even divinely - "forgive" people for insisting that you back up your claims with evidence.

petre
6th February 2006, 02:54 PM
Dr, alpha is still set in that problem. You've just been asked to solve backwards to find what it was set at.



Yes, you've mentioned that. It still reeks of ad hom and adds nothing to your argument.

Here's another book outlining the steps of testing hypotheses



3. you choose an alpha, you don't calculate an alpha.

You cannot calculate the value of a variable either, you choose the value of a variable.

T'ai Chi
6th February 2006, 02:56 PM
You know what JREF does. Will you support them financially, yes or no?


It all depends.


I think it is a bit disturbing that you have taken on this air of supreme superiority. You do actually believe that you are above everyone else, even to the point where you can regally - or even divinely - "forgive" people for insisting that you back up your claims with evidence.

I'm sorry you feel (mistakenly) that way.

Mercutio
6th February 2006, 03:01 PM
Wow. Tai, for someone who claims to love stats, you really do not understand them.

Do you have a copy of Dunn (2001)? He takes the time that most other stats authors do not, to explain how alpha is determined from the relative costs of type I and II errors. Unlike most authors, he takes pains to show that the conventions of .05, .01, or .001 are arbitrary (but traditional, and avoided "because authors would not want to appear capricious rather than careful.")

Glass & Hopkins (1984 is the version I have) treats alpha as the result of a bayesian process, as does Rosenberg (1990).

T'ai Chi
6th February 2006, 03:06 PM
Wow. Tai, for someone who claims to love stats, you really do not understand them.


I understand the quotes from books I've presented perfectly well. If I don't, you're free to demonstrate where I don't.


Do you have a copy of Dunn (2001)?


No I don't. I'll check it out though. Thanks.


, to explain how alpha is determined from the relative costs of type I and II errors. Unlike most authors, he takes pains to show that the conventions of .05, .01, or .001 are arbitrary (but traditional, and avoided "because authors would not want to appear capricious rather than careful.")


I've seen discussions about the costs of making type 1 and type 2 errors before. But does that book have a formula that one can plug in cost(type I) and cost(type II), and possibly other things, and out comes an alpha?

Mercutio
6th February 2006, 03:09 PM
So you say.

Say an average is theoretically distributed normally with a mean of 5 and a standard deviation of 1.

We take 20 samples and observe a mean of 4.7.

Test the hypothesis that mu = 5.

This book just sets alpha = .05. Then it goes on to calculate a test statistic, and a p-value, then compares the p-value to alpha, and ends up not rejecting the null hypothesis that mu = 5.

But you're saying you can calculate alpha. Can you do it here please to shut me up?

Ah! I see the problem (maybe). You are conflating two different sets of calculations. Of course we cannot take the numbers you give here and calculate alpha. What we need to know is what this theoretical distribution of numbers is measuring. Consider medical research, for instance, where a drug is very costly and a false alarm would be expensive...or where an illness is devastating and a miss would be intolerable. These things, which can be quantified in terms of dollar costs or person-hours or other numbers, are the relative costs of type I and type II errors in the real world (not in some hypothetical normal distribution). The relative costs of a drug, the prevalence of a disease in the population, those sorts of things are part of the analysis that goes into determining alpha.

And yes, one could determine that the ideal balance of type I and II error was an alpha of .07 (HIV drugs have been approved at that level, before the sample had sufficient power to have reached .05), although in such cases, social reasons will likely push researchers toward the .05 or .01 because everybody else uses them.

My grad stats courses spent enough time on this topic that I cannot simply think alpha is chosen out of thin air.
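For reference, the textbook calculation quoted above (standard deviation 1, 20 samples, observed mean 4.7, H0: mu = 5, alpha = .05) can be sketched as a z-test. The sketch assumes a two-sided alternative and a known standard deviation; the quoted problem implies these but does not spell them out. Alpha enters only as an input.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided z-test of H0: mu = mu0, with sigma assumed known.

    alpha is chosen before looking at the data; nothing here computes it.
    """
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value, p_value < alpha


z, p_value, reject = z_test(sample_mean=4.7, mu0=5.0, sigma=1.0, n=20)
print(f"z = {z:.2f}, p = {p_value:.2f}, reject H0: {reject}")  # z = -1.34, p = 0.18, reject H0: False
```

With these inputs H0 is not rejected at .05, matching the book's conclusion as described. The cost and base-rate reasoning that Mercutio describes would happen before this step, when the alpha passed in is decided.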

JMA
6th February 2006, 03:11 PM
One reason is: you could think of it sort of like adding up all the cumulative errors of each test.

And

Flipping 10 coins, you expect 5 H, 5 T. But, with only 10 coins, 6, 7, or 8 H is not that rare at all, and 9 or 10, although rare, certainly something you might find if you spent just one day flipping coins.

Flipping 100 coins, you expect 50 H...but it is much more difficult to get the same percentage of H as in the smaller sample. 60H you might find, but 70 is already very rare, 80 you probably won't find in several days' attempts at flipping 100 coins in a row. 90 or 100 could take you weeks. (Basically, with just 10 flips, you only need to be off by 4 from a priori probability in order to get 90%; if you flip 100, you need to be off by 40 flips to get the same percentage. A much more difficult task.)

OK. Thanks to both of you for the explanation. I understand better why now...
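The coin-flipping explanation quoted above can be checked with exact binomial tail probabilities for a fair coin. A minimal sketch using only the standard library:

```python
from math import comb


def prob_at_least(k, n):
    """P(at least k heads in n flips of a fair coin), computed exactly."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n


for k in (6, 7, 8, 9, 10):
    print(f"{k:>3} or more of 10:  {prob_at_least(k, 10):.3g}")
for k in (60, 70, 80, 90, 100):
    print(f"{k:>3} or more of 100: {prob_at_least(k, 100):.3g}")
```

Nine or more heads in 10 flips has probability 11/1024 (about 0.011), while 90 or more in 100 is on the order of 1e-17. That gap is why the same percentage is so much harder to reach with the larger sample.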

T'ai Chi
6th February 2006, 03:13 PM
Of course we cannot take the numbers you give here and calculate alpha.


Ok..


What we need to know is what this theoretical distribution of numbers is measuring.


Let's say weight in grams of a variety of beetle.

I'll even make up that cost(making Type I error) = $1, and cost(making Type II error) = $2, if that will help get things moving.
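One standard way to get alpha out of error costs is to pick the rejection cutoff that minimises expected cost. This is not necessarily the method in Dunn or Kirk, which the thread does not quote. The sketch uses the made-up costs above ($1 for a Type I error, $2 for a Type II error), but it needs assumptions the thread does not supply. These are all hypothetical: a specific alternative mean (4.5 g against a null of 5 g), a known standard deviation of 1 g, 20 beetles, a one-sided test that rejects for small means, and equal prior weight on the two hypotheses.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def expected_cost(c, mu0, mu1, se, prior_h0, cost_type1, cost_type2):
    """Expected loss of the rule 'reject H0 if the sample mean < c' (assumes mu1 < mu0)."""
    alpha = STD_NORMAL.cdf((c - mu0) / se)      # P(reject H0 | H0 true)
    beta = 1 - STD_NORMAL.cdf((c - mu1) / se)   # P(retain H0 | H1 true)
    return prior_h0 * cost_type1 * alpha + (1 - prior_h0) * cost_type2 * beta


def cost_optimal_alpha(mu0, mu1, sigma, n, prior_h0, cost_type1, cost_type2, steps=20_000):
    """Grid-search the cutoff c that minimises expected cost; return (c, implied alpha)."""
    se = sigma / sqrt(n)
    lo, hi = mu1 - 4 * se, mu0 + 4 * se
    grid = (lo + (hi - lo) * i / steps for i in range(steps + 1))
    best_c = min(grid, key=lambda c: expected_cost(
        c, mu0, mu1, se, prior_h0, cost_type1, cost_type2))
    return best_c, STD_NORMAL.cdf((best_c - mu0) / se)


# Hypothetical beetle example: H0 mean 5 g vs H1 mean 4.5 g, sigma = 1 g, n = 20,
# equal prior weight on H0 and H1, cost(Type I) = $1, cost(Type II) = $2.
c, alpha = cost_optimal_alpha(5.0, 4.5, 1.0, 20, prior_h0=0.5, cost_type1=1.0, cost_type2=2.0)
print(f"cutoff = {c:.3f} g, implied alpha = {alpha:.3f}")
```

Under these assumptions the cutoff lands near 4.82 g and the implied alpha is roughly 0.21, far from .05. Change the priors, the costs, or the alternative and alpha moves with them. That is why the choice of priors comes up later in the thread.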

Mercutio
6th February 2006, 03:14 PM
I've seen discussions about the costs of making type 1 and type 2 errors before. But does that book have a formula that one can plug in cost(type I) and cost(type II), and possibly other things, and out comes an alpha?
Check out Kirk (2nd ed is 1984; my guess is there is a newer), chapter 1.

Yes, most researchers choose .05 or .01 out of ignorance, but that does not mean it is the only way.

CFLarsen
6th February 2006, 03:25 PM
Wow. Tai, for someone who claims to love stats, you really do not understand them.

He doesn't just claim to love statistics. He claims that he has a degree in the field.

Yet, he keeps making one rookie error after another.

69dodge
6th February 2006, 03:32 PM
No. Because you simply have to wait for a long enough run, which we know will happen eventually (by the Drunkard's Walk theorem). Even if you don't wait that long on this trial, the cumulative effect of a half-dozen positive-but-not-significant experiments might be enough to produce an overall finding of significance in the hands of a sufficiently corrupt statistician.

I understand the general idea of how it's supposed to work, but I don't think that it does work in the end when you look at the details. If the trials are long enough to get an excess of successes purely by chance, then they're also long enough that the small excess is not statistically significant. How could it be otherwise? The coin is fair.

I'm not sure of this, but that's how it seems to me, right now. If you or Mercutio could fill in some of the details, I'd be happy to learn.
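The intuition above holds for a single test whose sample size is fixed in advance. The "wait for a long enough run" scenario is different: if the experimenter checks after every flip and stops as soon as the result looks significant, the overall false-positive rate climbs above the nominal 5%. A minimal exact calculation follows. It assumes a fair coin, a two-sided z cutoff of 1.96, and a look after every flip from flip 10 to flip 100; these settings are hypothetical choices.

```python
from math import sqrt


def prob_significant(max_flips, first_look, z_crit=1.96):
    """Exact probability that a fair coin ever gives |z| > z_crit when the
    experimenter looks after every flip from first_look to max_flips and
    stops at the first 'significant' look.

    With first_look == max_flips this is an ordinary fixed-sample test.
    """
    alive = {0: 1.0}  # heads count -> probability, for paths not yet stopped
    stopped = 0.0
    for n in range(1, max_flips + 1):
        nxt = {}
        for heads, pr in alive.items():
            nxt[heads] = nxt.get(heads, 0.0) + 0.5 * pr          # tails
            nxt[heads + 1] = nxt.get(heads + 1, 0.0) + 0.5 * pr  # heads
        alive = nxt
        if n >= first_look:
            for heads in list(alive):
                z = (heads - n / 2) / sqrt(n / 4)
                if abs(z) > z_crit:
                    stopped += alive.pop(heads)  # experimenter stops and declares success
    return stopped


fixed = prob_significant(100, first_look=100)   # one look at n = 100
peeking = prob_significant(100, first_look=10)  # a look after every flip from 10 on
print(f"fixed n: {fixed:.3f}   optional stopping: {peeking:.3f}")
```

The fixed-sample rate comes out near 0.057 (binomial discreteness keeps it from being exactly 0.05). The optional-stopping rate is noticeably higher, and it can only grow as the maximum number of flips is raised.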

Mercutio
6th February 2006, 04:04 PM
Ok..

Let's say weight in grams of a variety of beetle.

I'll even make up that cost(making Type I error) = $1, and cost(making Type II error) = $2, if that will help get things moving.
First things first. Was I right?

Were you, in fact, complaining about the use of bayesian statistics allegedly to arrive at the same numbers you did through non-bayesian means...when in fact your calculations began with an assumed alpha level, whereas the bayesian inference in question was used to come up with an alpha level in the first place? It seems very like your complaint is based on a misunderstanding.

Your choice of .01 as seeming more reasonable than .05--was that essentially an intuitive use of bayesian statistics, yourself? Or did you use some other means of determining whether it seemed right?

T'ai Chi
6th February 2006, 04:56 PM
First things first. Was I right?


Don't know. You haven't shown any calculation yet.



, whereas the bayesian inference in question was used to come up with an alpha level in the first place?


By replacing the subjectiveness of choosing alpha with the subjectiveness of choosing prior odds?

Mercutio
6th February 2006, 05:36 PM
Don't know. You haven't shown any calculation yet.

You have a tough time picking up on context, don't you? Was I right about your confusing the two?


By replacing the subjectiveness of choosing alpha with the subjectiveness of choosing prior odds?
I'll take that as a yes.

T'ai Chi
6th February 2006, 08:38 PM
I'll take that as a yes.

You can interpret however you'd like. Not sure how you interpret me asking a question as a "Yes" though.

You said Dunn (2001), I believe. Is this a book, an article, what? And what is the title?

CFLarsen
7th February 2006, 01:30 AM
You can interpret however you'd like. Not sure how you interpret me asking a question as a "Yes" though.

You said Dunn (2001), I believe. Is this a book, an article, what? And what is the title?

All you need to do is answer "yes" or "no" to Mercutio's question.

What you don't need is to try to stir the discussion in another direction.

You are very quick to demand answers from others, but you shy away when the onus is on you.

drkitten
7th February 2006, 07:46 AM
Dr, alpha is still set in that problem. You've just been asked to solve backwards to find what it was set at.

Which is exactly the point I have been making for the past dozen or so posts.

What has been set is not an alpha value, but an acceptance criterion. To quote from the problem,
"It is decided to reject the null hypothesis [...] if the mean withdrawal time is less than 1.7 minutes." That's not setting "an alpha value." That's setting an acceptance criterion.

You are then asked to solve for "the probability of a Type I error," which is the definition of an alpha value.

In other words, given an ad hoc acceptance criterion and a testing scheme, solve for the alpha value.


Yes, you've mentioned that.

And you've just proven it -- again.
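The point about the withdrawal-time problem, and the "vice versa" that follows it, can be made concrete. With a normal model for the sample mean, the cutoff fixes alpha and alpha fixes the cutoff. The thread does not quote the problem's null mean, standard deviation, or sample size, so the values below (mu0 = 2.0 minutes, sigma = 0.6, n = 16) are purely hypothetical, as is the assumption of a known sigma.

```python
from math import sqrt
from statistics import NormalDist


def alpha_from_criterion(cutoff, mu0, sigma, n):
    """P(Type I error) for 'reject H0 if the sample mean < cutoff' when H0: mean = mu0."""
    return NormalDist().cdf((cutoff - mu0) / (sigma / sqrt(n)))


def criterion_from_alpha(alpha, mu0, sigma, n):
    """Cutoff c such that P(sample mean < c | H0) = alpha."""
    return mu0 + (sigma / sqrt(n)) * NormalDist().inv_cdf(alpha)


# Hypothetical parameters: the thread does not give mu0, sigma or n.
alpha = alpha_from_criterion(1.7, mu0=2.0, sigma=0.6, n=16)    # about 0.023
cutoff = criterion_from_alpha(alpha, mu0=2.0, sigma=0.6, n=16)  # back to 1.7
print(f"alpha = {alpha:.4f}, recovered cutoff = {cutoff:.4f}")
```

Which direction you compute depends on which quantity was chosen first; the arithmetic is the same inversion either way.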

drkitten
7th February 2006, 07:49 AM
I understand the general idea of how it's supposed to work, but I don't think that it does work in the end when you look at the details. If the trials are long enough to get an excess of successes purely by chance, then they're also long enough that the small excess is not statistically significant. How could it be otherwise?

Not quite.

Remember that even a "fair" coin will yield a statistically significant result 5% of the time.
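This connects back to the original question about 1,000+ tests. Suppose each test were independent and a claimant with no ability passed with probability exactly alpha. Then the chance that at least one of k tests is passed by luck is 1 - (1 - alpha)^k. This is a simplification: as noted earlier in the thread, real preliminary tests use claim-specific criteria, so their per-test alphas differ.

```python
def prob_any_pass(num_tests, alpha):
    """P(at least one of num_tests independent tests is passed by chance alone)."""
    return 1 - (1 - alpha) ** num_tests


for alpha in (0.05, 0.001):
    for k in (1, 20, 1000):
        print(f"alpha={alpha}, tests={k}: P(any pass) = {prob_any_pass(k, alpha):.3f}, "
              f"expected chance passes = {k * alpha:g}")
```

Even at alpha = .001, 1,000 independent tests give roughly a 63% chance of at least one chance pass, with one pass expected on average.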

T'ai Chi
7th February 2006, 05:01 PM
In other words, given an ad hoc acceptance criterion and a testing scheme, solve for the alpha value.


Alpha determines the acceptance criterion, and vice versa.

Try again.

Steven Howard
7th February 2006, 09:56 PM
Alpha determines the acceptance criterion, and vice versa.

Try again.

Dude. Seriously. Read what you just wrote.

Alpha determines the acceptance criterion -- so if you know what alpha is, you can calculate the value of the acceptance criterion, yeah?

And vice versa. So if you know the acceptance criterion, you can ... calculate the value of alpha.

drkitten
8th February 2006, 07:14 AM
Alpha determines the acceptance criterion, and vice versa.


Which means that given an ad hoc acceptance criterion, you can calculate the alpha value of the relevant experiment and the probability of a Type I error.

What's the conceptual difficulty?

T'ai Chi
8th February 2006, 03:59 PM
Which means that given an ad hoc acceptance criterion, you can calculate the alpha value of the relevant experiment and the probability of a Type I error.


You don't do a hypothesis test saying 'I'll set the criterion at 5, then I'll solve for alpha'. You set the alpha at .05, then get the criterion.

Either way, subjectively setting the criterion and alpha are the exact same information. The point is that you're not calculating alpha from thin air, so to speak. You're just setting one subjectively, and inverting to find the other.

Mercutio
8th February 2006, 05:03 PM
You don't do a hypothesis test saying 'I'll set the criterion at 5, then I'll solve for alpha'. You set the alpha at .05, then get the criterion.

Either way, subjectively setting the criterion and alpha are the exact same information. The point is that you're not calculating alpha from thin air, so to speak. You're just setting one subjectively, and inverting to find the other.
I actually agree with you here. The bayesian inference process happens before this. You, in your analysis, appear to have pulled alpha out of...thin air. The authors do not. They weigh the costs and benefits of type I and II errors (admittedly there may be some subjectivity in this--hard costs are not always available, and trying to determine which factors are worth adding into the cost-benefit analysis can be a lot of work; still, the ability to specify even a range of values for, say, the base rate of an event, based on a posteriori numbers in the population, means that the bayesian process is a much more honest and useful method of determining alpha than the simple "it seemed reasonable" approach), and explain the process by which they determined the appropriate alpha AND criterion level.

Those who were criticising you for equating alpha and criterion level have been missing the same point you have. The bayesian step is not this one, but rather the bayesian step replaces your "out of thin air"/"reasonable"/"sensible" choice of alpha.

T'ai Chi
8th February 2006, 05:36 PM
You, in your analysis, appear to have pulled alpha out of...thin air. The authors do not. They weigh the costs and benefits of type I and II errors (admittedly there may be some subjectivity in this...


I mentioned that .05 and .01 are typically used in medical studies, and this is a medical study, and that extraordinary claims require stronger evidence therefore lower alpha. This is not exactly out of thin air, but using standard statistical methods that have worked for us well for over 100 years in a huge variety of fields.

In regard to setting the prior odds, Hyman said


Since then, and continuing into our time, thousands of individuals have made these claims. Yet, not one of these claims has withstood a scientific test.


I agree, therefore I'd choose to set alpha low.


Indeed, given that not one of these claimants have produced scientific evidence in support of their ability, it would be reasonable to assign odds of several thousand to one against the truth of the claim.


"reasonable". Quick, where is Andy to say one is foolish for saying this word, but ignore that Hyman said it? :D


I decided to assume that the prior odds in favor of the null hypothesis were 99:1.


Assume.

Why not 95:1? Why not 90:1? 75:1? Etc. This approach just replaces subjectively choosing alpha with subjectively choosing the prior odds. Different people choosing different priors could lead to different results. Setting alpha, by contrast, is pretty much standardized, so different people pretty much choose the same alpha of .05, .01, or .001.

If I ask a person to choose alpha, they'll typically say .05, .01, or .001. Ask a person to set prior odds, and see what range of numbers you get.


This means that I was also assuming that the prior odds against the alternative hypothesis are also 99:1.


Assuming.


The null hypothesis in our test is that the average number of correct matches will be one.


No disagreement there. That follows from mathematical expectation given the set up of the matching problem.


Even with my setting the prior odds at this modest level, the evidence provided by the outcome still fell far short of swinging the odds in her favor.


Note "setting", not calculating.


Those who were criticising you for equating alpha and criterion level have been missing the same point you have. The bayesian step is not this one, but rather the bayesian step replaces your "out of thin air"/"reasonable"/"sensible" choice of alpha.


Discussions of costs of making various errors are abundant in frequentist literature. The Bayesian step occurs when they set prior odds.
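The claim that a purely random matching averages one correct match, whatever the number of items, follows from linearity of expectation. Each of the n items lands on its own target with probability 1/n, so the expected total is n × 1/n = 1. A brute-force check over every ordering for small n (the values of n are arbitrary; the size of Hyman's test is not quoted here):

```python
from fractions import Fraction
from itertools import permutations


def mean_correct_matches(n):
    """Average number of correct matches (fixed points) over all n! orderings."""
    orderings = list(permutations(range(n)))
    total = sum(sum(1 for position, item in enumerate(order) if position == item)
                for order in orderings)
    return Fraction(total, len(orderings))


for n in range(1, 8):
    print(n, mean_correct_matches(n))  # 1 for every n
```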

Mercutio
8th February 2006, 05:48 PM
I mentioned that .05 and .01 are typically used in medical studies, and this is a medical study, and that extraordinary claims require stronger evidence therefore lower alpha. This is not exactly out of thin air, but using standard statistical methods that have worked for us well for over 100 years in a huge variety of fields.
And when you realize why the alpha levels are set as they have been, you will see that it is (tacit or expressed) bayesian inference at work.

I snipped the rest of your post--in it, you point out that certain inputs are "assum[ed]". Um...yeah, assumed based on a posteriori observation of prior occurrence in the population...not pulled out of thin air, as your "assuming" comment leads one to believe. Other than that spin, you appear to accept that their choice of alpha (explicitly) and your own (implicitly) was based on fairly similar considerations of prior odds and costs of error.

I am a bit baffled, then, as to your disagreement with them. You go through the same process, but simply do not disclose what factors influenced your choice of alpha. They do, and you criticise them for "assuming" the same things that you simply sweep under the rug. Their process is much more honest and open than yours, but it appears that you have also engaged in your own form of bayesian inference at the same point they did. You just don't label it thus.

Is it only the wrong thing to do when someone else does it? Is it only the wrong thing to do when those who do it actually disclose the factors that they took into account?

Do you, at this point, acknowledge that your non-bayesian calculations address a different part of the problem than their bayesian inference? Or am I assuming too much understanding on your part?

drkitten
8th February 2006, 05:52 PM
You don't do a hypothesis test saying 'I'll set the criterion at 5, then I'll solve for alpha'.

You're not supposed to (in "real" research), no.

In practice, it happens all the time, especially in parapsychology.

Just as a sample case, consider the claim of T.C. Albin, "California Weatherman," as documented by KRAMER here on this forum. Quoting briefly from the claim and acceptance:


I happily submit that it will snow July 27th of this year [2005] in Oakland. To be exact some time between 12:00 am and 11:59 pm on July 27th of this year it will snow in Oakland California.


We accept your claim in which you state that you will "CAUSE IT TO SNOW IN OALKLAND CALIFORNIA on July 27th, 2005", providing that we agree that the Oakland, California you are citing here is the one across the bay from San Francisco, and that the snow will fall from the sky above, as if on a mid-winter Colorado Mountain ski slope, and that it be of meteorological origin, and NOT made artifically by some kind of snow-making machine. The snow must be as a result of the weather, and NOT man-made.

The claim is clear, as is the ad hoc acceptance criterion. (Perhaps needless to say, it didn't snow in Oakland.) But what's the alpha cutoff of this test?

69dodge
8th February 2006, 06:36 PM
Why not 95:1? Why not 90:1? 75:1? Etc. This approach just replaces subjectively choosing alpha with subjectively choosing the prior odds. Different people choosing different priors could lead to different results. Setting alpha, by contrast, is pretty much standardized, so different people pretty much choose the same alpha of .05, .01, or .001.

If I ask a person to choose alpha, they'll typically say .05, .01, or .001. Ask a person to set prior odds, and see what range of numbers you get.

A couple of points:

1) Why is it bad if different people get different results? If two people have different beliefs before seeing the results of some experiment, it makes perfect sense for them to have different beliefs after seeing the results. Why should a single experiment cause anyone to throw away entirely the lifetime of experience that led him to his prior beliefs?

2) It matters where the subjectivity is introduced. Otherwise, why do any statistical calculations at all? Why not just eyeball the experimental results, and subjectively decide what we think they indicate? We should introduce our prior beliefs where they belong: in the prior probability distribution, whose purpose is exactly to represent those beliefs. Then, from that point on, everything can be calculated objectively.

It might not always be easy to decide whether we believe that the odds are, for example, 95:1 or 75:1, but we certainly can't expect a statistics handbook to decide for us. They're our beliefs, after all. If we don't know what we believe before an experiment is done, the results of the experiment can't tell us that. They can only tell us how we ought to change whatever prior beliefs we may have had.
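The division of labour described here (a subjective prior, then objective updating) is Bayes' rule in odds form: posterior odds = prior odds × Bayes factor. The sketch below uses hypothetical data (14 hits in 20 trials) and a hypothetical simple alternative (a 75% hit rate against a 50% chance rate). Only the prior odds 99:1, 95:1 and 75:1 come from the thread.

```python
def bayes_factor(hits, trials, p_alt, p_null):
    """Likelihood ratio P(data | p_alt) / P(data | p_null) for binomial data.

    The binomial coefficient is the same under both hypotheses and cancels.
    """
    return (p_alt / p_null) ** hits * ((1 - p_alt) / (1 - p_null)) ** (trials - hits)


def posterior_odds_against(prior_odds_against, bf):
    """Odds form of Bayes' rule, with odds expressed against H1 and bf favouring H1."""
    return prior_odds_against / bf


bf = bayes_factor(hits=14, trials=20, p_alt=0.75, p_null=0.5)  # about 4.56
for prior in (99, 95, 75):
    print(f"prior {prior}:1 against -> posterior {posterior_odds_against(prior, bf):.1f}:1 against")
```

The data enter only through the Bayes factor, which is the same for everyone. People who start from different prior odds end with different posterior odds, but they all update by the same factor.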

Mercutio
8th February 2006, 08:25 PM
Nice post, 'dodge...can I steal some of your language for my stats classes?

69dodge
8th February 2006, 09:30 PM
Sure. Thanks.

T'ai Chi
9th February 2006, 03:53 PM
1) Why is it bad if different people get different results?


Not sure anyone said it is bad.


It might not always be easy to decide whether we believe that the odds are, for example, 95:1 or 75:1, but we certainly can't expect a statistics handbook to decide for us.


What analysis do you choose to accept when your priors lead to the opposite conclusion from someone who used different priors?

69dodge
10th February 2006, 12:08 AM
Not sure anyone said it is bad.

I thought you mentioned it as an argument against Bayesian analysis. If you don't consider it to be an argument against Bayesian analysis, I'm glad, because I don't believe it is one. But then I don't understand why you did mention it.

What analysis do you choose to accept when your priors lead to the opposite conclusion from someone who used different priors?

Mine, of course. And the someone should accept his.

CFLarsen
10th February 2006, 12:48 AM
Not sure anyone said it is bad.

Not sure you answered the pertinent question.

Why is it "reasonable" to set alpha to what you did?

T'ai Chi
10th February 2006, 02:33 PM
I thought you mentioned it as an argument against Bayesian analysis. If you don't consider it to be an argument against Bayesian analysis, I'm glad, because I don't believe it is one. But then I don't understand why you did mention it.


No, I don't think it is bad if people reach different conclusions.

It is somewhat confusing if people analyze the same data, but end up reaching different conclusions because their priors were different.

CFLarsen
10th February 2006, 02:36 PM
No, I don't think it is bad if people reach different conclusions.

It is somewhat confusing if people analyze the same data, but end up reaching different conclusions because their priors were different.
Exactly. Man, you nailed it. Cut straight to the core.

That's why we need to know why you think it is "reasonable" to set alpha to what you did.

Do you have an answer?

Mongrel
12th February 2006, 05:45 PM
Sorry to butt in here, but from someone for whom maths makes my head hurt :p

Could someone help me, I've interpreted this and that other thread (http://forums.randi.org/showthread.php?t=45357) and from what I can gather the differences are;

1) We try to work out the best odds we can, before the experiment, taking as many factors as we can into account

2) We run the experiments, then we work out the odds. If we want to, we can massage the odds due to hindsight. If peer-reviewed stuff uses this method, it will play it safe because peer reviewers are happy to pull it apart

T'ai Chi
15th February 2006, 04:23 AM
Here again is the webpage of what I've been thinking of lately

http://www.statisticool.com/jrefchallengestats.htm

This applies not only to the JREF preliminary tests, but also similar tests by other skeptical organizations.

It would be nice if we could finally learn needed experimental tallies, p-values, & other stuff, to get an idea of if the scores are in line with what one would expect by chance.

CFLarsen
15th February 2006, 04:42 AM
Here again is the webpage of what I've been thinking of lately

http://www.statisticool.com/jrefchallengestats.htm

This applies not only to the JREF preliminary tests, but also similar tests by other skeptical organizations.

It would be nice if we could finally learn needed experimental tallies, p-values, & other stuff, to get an idea of if the scores are in line with what one would expect by chance.
I see that you don't set alpha. Why not?

T'ai Chi
15th February 2006, 04:15 PM
Originally Posted by T'ai Chi


Here again is the webpage of what I've been thinking of lately

http://www.statisticool.com/jrefchallengestats.htm

This applies not only to the JREF preliminary tests, but also similar tests by other skeptical organizations.

It would be nice if we could finally learn needed experimental tallies, p-values, & other stuff, to get an idea of if the scores are in line with what one would expect by chance.


Claus wrote


I see that you don't set alpha. Why not?


It is something that JREF and other skeptical organizations set prior to the experiment.

Oh, I just noticed, I wrote


It would be nice if we could finally learn needed experimental tallies, p-values, & other stuff, to get an idea of if the scores are in line with what one would expect by chance.


or "cfl next post". And then Claus replied the next post.

I must be psychic. ;)

Mercutio
15th February 2006, 05:41 PM
or "cfl next post". And then Claus replied the next post.

I must be psychic. ;)
Or you can count. I am in the top ten posters, and CFL has more than twice my posts. In threads like this, he is even more present. In this thread, only you and one other poster have more posts than CFL, you can safely learn to hold your own tongue, and drkitten has posted mainly in the earlier pages.

Of course, what this means is that your naive bayesian analysis told you that this was an event with a high prior probability. Once again, you agree with a more formalized mathematical approach.


:D

Jeff Corey
15th February 2006, 07:28 PM
Odds are you're right, Behaviorman.

CFLarsen
16th February 2006, 01:26 AM
It is something that JREF and other skeptical organizations set prior to the experiment.

But that's different from the alpha you set. Which leads us back to your own value. "Reasonable", yes.

So, why is that "reasonable", if JREF and others set it differently? Do you know something they don't? Something that they should know?

Oh, I just noticed, I wrote

or "cfl next post". And then Claus replied the next post.

I must be psychic. ;)

Only you left out the ampersand, meaning "and", as well as choosing letters out of sequence. Your pathetic attempt at Bible-code post-hoc reasoning is pure woo: Leaving out what doesn't fit.

Let's see if we can use the same methods to find a universal truth:

It would be nice if we could finally learn needed experimental tallies, p-values, & other stuff, to get an idea of if the scores are in line with what one would expect by chance.

"Tai sucx."

Gee, I'm a goddamn oracle!!

CFLarsen
18th February 2006, 08:02 AM
T'ai Chi,

Why is your own alpha value "reasonable", if JREF and others set it differently? Do you know something they don't? Something that they should know?

T'ai Chi
18th February 2006, 08:25 AM
T'ai Chi,
Why is your own alpha value "reasonable", if JREF and others set it differently?


Nowhere on my page

http://www.statisticool.com/jrefchallengestats.htm

did I mention "my alpha".

Let's go with the alphas skeptical organizations set, not what was talked about in a different thread.

CFLarsen
18th February 2006, 08:34 AM
Nowhere on my page

http://www.statisticool.com/jrefchallengestats.htm

did I mention "my alpha".

Let's go with the alphas skeptical organizations set, not what was talked about in a different thread.

OK, you refuse to answer the question.

T'ai Chi
18th February 2006, 08:35 AM
OK, you refuse to answer the question.

Nowhere on my page

http://www.statisticool.com/jrefchallengestats.htm

did I mention "my alpha".

Let's go with the alphas skeptical organizations set, not what was talked about in a different thread.

T'ai Chi
19th February 2006, 04:33 PM
So are people for seeing actual data in a summarized format from skeptical organizations that do such tests?

If not, why not?

drkitten
20th February 2006, 09:14 AM
So are people for seeing actual data in a summarized format from skeptical organizations that do such tests?


No, because the inevitable (over)simplifications necessary to put the test data into summarized format will result in the summaries being useless and actively misleading.

T'ai Chi
20th February 2006, 12:53 PM
No, because the inevitable (over)simplifications necessary to put the test data into summarized format will result in the summaries being useless and actively misleading.

So how would you analyze the data then without summarizing it?
(that is, if anybody had any)

Mercutio
20th February 2006, 01:34 PM
Um....individually. Which they already were. DrKitten is quite right, the attempt to combine such disparate studies is likely to be worse than useless.

One could argue that the preliminary nature of the tests, with their purpose as much "demonstrate that you can do what you say you can" as strict test, in and of itself precludes these data from being appropriate for a meta-analysis. The very same excuses that are used to dismiss the results post-hoc (if they are to be attended to) would invalidate the results from inclusion in a meta-analysis.

T'ai Chi
20th February 2006, 01:42 PM
DrKitten is quite right, the attempt to combine such disparate studies is likely to be worse than useless.


Why though?

If you have 20 dowsing experiments done similarly (choosing gold under a cup, etc.), it seems very reasonable to combine the results.

Seems to work in every other field, why not with skeptical organizations?

Gr8wight
20th February 2006, 02:30 PM
Why though?

If you have 20 dowsing experiments done similarly (choosing gold under a cup, etc.), it seems very reasonable to combine the results.

Seems to work in every other field, why not with skeptical organizations?

Because we don't have 20 dowsing experiments done similarly. We have one dowsing experiment finding gold under a cup, one dowsing experiment finding addresses with a pendulum, one telepathy experiment sending thoughts to another person, one martial arts experiment attempting to stop an attacker without touching him... How do you combine those results?

By the way, can you tell me what element all of those tests had in common?

Mercutio
20th February 2006, 02:49 PM
Because we don't have 20 dowsing experiments done similarly. We have one dowsing experiment finding gold under a cup, one dowsing experiment finding addresses with a pendulum, one telepathy experiment sending thoughts to another person, one martial arts experiment attempting to stop an attacker without touching him... How do you combine those results?

By the way, can you tell me what element all of those tests had in common?
In addition, we may have one tested against a claim of 100% accuracy, another tested against a claim of 90%, another against a claim of 60%... Each of these may (depending on the deal agreed to by both parties) result in a different cutoff level, which cannot be combined in a meaningful manner.

T'ai Chi
20th February 2006, 04:37 PM
Because we don't have 20 dowsing experiments done similarly. We have one dowsing experiment finding gold under a cup, one dowsing experiment finding addresses with a pendulum, one telepathy experiment sending thoughts to another person, one martial arts experiment attempting to stop an attacker without touching him... How do you combine those results?


Tests of the 'how many did you get out of n trials, where each trial had a p probability of success' variety are done in a similar fashion; a binomial experiment.

Mercutio wrote


In addition, we may have one tested against a claim of 100% accuracy, another tested against a claim of 90%, another against a claim of 60%...


What a claimant believes about their performance doesn't interest me; how they actually perform does.

The issue of combining experiments aside, wouldn't it still be nice to see a list of such statistical results from the preliminary experiments, all in one place, from various skeptical organizations, without having to fly to the organizations to read through papers, made available to all interested parties, say, over the internet?

Mercutio
20th February 2006, 04:53 PM
What a claimant believes about their performance doesn't interest me; how they actually perform does.
What the claimant claims has a direct bearing on the test; it may determine that a test ends as a failure in one case with results that were a small fraction of the required attempts of another case. Thus what a claimant believes about their performance has a direct bearing on how they will be allowed to perform in the test (that is, a person claiming 90% accuracy, who agrees to a 20-trial test, will fail a preliminary even if they score slightly above chance. Let us suppose that they score at a percentage rate that would make their performance statistically significant if they maintained it for only 100 trials; the problem is, their test ended after 20 trials. It is impossible to know whether they would continue, or whether they would regress to the mean.)
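To put numbers on the 20-trial case, the sketch below finds the smallest hit count whose probability under chance alone is at or below a chosen alpha. It assumes a coin-style task with a chance rate of 0.5 (the post does not say what the task is) and alpha = 0.001 (the figure T'ai Chi gives elsewhere in the thread as typical for a preliminary test); the function names are hypothetical, and this is one way such a cutoff could be derived, not the JREF's actual procedure.

```python
from math import comb

def upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def chance_cutoff(n, p0, alpha):
    """Smallest hit count k with P(X >= k) <= alpha under chance rate p0."""
    for k in range(n + 1):
        if upper_tail(k, n, p0) <= alpha:
            return k
    return None  # no achievable score out of n reaches this alpha

n, p0, alpha = 20, 0.5, 0.001
k = chance_cutoff(n, p0, alpha)
print(f"{k} of {n} hits needed; a 90% claimant expects about {0.9 * n:.0f}")
# A modestly above-chance score, e.g. 13 of 20, is far from that cutoff:
print(f"P(X >= 13 | n=20, p=0.5) = {upper_tail(13, n, p0):.4f}")
```

Under these assumptions the chance-based cutoff and the claimed 90% rate land in the same place for 20 trials, which is why a short, claim-sized test says little about someone performing only slightly above chance.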

More, since the tests are against a claimed level of performance, it is inappropriate to use a measure of effect size compared with chance performance (for similar reasons as above); without this, a traditional meta-analysis is impossible.

T'ai Chi
20th February 2006, 06:45 PM
(that is, a person claiming 90% accuracy, who agrees to a 20-trial test, will fail a preliminary even if they score slightly above chance.


I'm not interested in what a person believes about their performance (they could be mistaken), but in how they actually perform, much like I'm not interested in what a doctor thinks about a drug, but in how the drug actually performs.

Mercutio
20th February 2006, 07:07 PM
I'm not interested in what a person believes about their performance (they could be mistaken), but in how they actually perform, much like I'm not interested in what a doctor thinks about a drug, but in how the drug actually performs.
You missed my point.

Their performance is halted earlier when they are compared to a higher standard. This can throw a bias into the combined results. To throw them all onto the pile is completely inappropriate, statistically. If you wanted to test these folks against chance to begin with, with final test protocol conditions, on sample sizes sufficient to have the needed power, and then threw them into a meta-analysis, that would be just fine. But that is not what the initial challenge tests are, and you cannot pretend that they can be combined as if they were.

They served their purpose. They were not designed to serve yours.

T'ai Chi
20th February 2006, 07:47 PM
They served their purpose. They were not designed to serve yours.

Glad it was never claimed they were designed to serve my purposes...

In any case, in these tests, one compares what one expects to what the claimant actually does. You then measure the difference numerically to see if it is significantly far away.

Again, the issue of combining experiments aside, wouldn't it still be nice to see a list of such statistical results from the preliminary experiments, all in one place, from various skeptical organizations, without having to fly to the organizations to read through papers, made available to all interested parties, say, over the internet? Even something absurdly simple, like how many of the preliminary tests were on dowsers? Out of those, how many tested higher than what one would expect? Etc. Basic info interested parties would hope to find.

Mercutio
20th February 2006, 08:25 PM
Glad it was never claimed they were designed to serve my purposes...
Oh, heavens, let's not ever make claims...glad it was never claimed that they were claimed that they were designed to serve your purposes. Rather, you asked about combining data, and I did my best to try to help you understand why. That's all. No "claims" were made, so you can be safe.

In any case, in these tests, one compares what one expects to what the claimant actually does. You then measure the difference numerically to see if it is significantly far away.
No. We compare what the actual claim is to what the claimant actually does. We do not compare what we expect to see. There is a world of difference. We could, very easily, do the latter. The former has turned out to be considerably easier and quicker to do.

Again, the issue of combining experiments aside, wouldn't it still be nice to see a list of such statistical results from the preliminary experiments, all in one place, from various skeptical organizations, without having to fly to the organizations to read through papers, made available to all interested parties, say, over the internet? Even something absurdly simple, like how many of the preliminary tests were on dowsers? Out of those, how many tested higher than what one would expect? Etc. Basic info interested parties would hope to find.
Perhaps. It would certainly be helpful to classes like mine. Even more helpful would be access of this sort to the raw data from the parapsychologists' labs. Doesn't Schwartz have some? (Maybe my memory is playing tricks.) I know the most recent Bem precognition data would be a great set to do a time-series analysis on, to see if inadequate randomization predicts performance through a classical conditioning mechanism. If I am not mistaken, this database would be significantly larger and better controlled, since (in theory) the tests are not against claims but against chance. Of course, I would even like to see such experiments video-recorded (no, I am not holding them to a higher standard than, say, psych experiments; I would like to see a video archive for psych experiments as well) so that experimental methodology can be examined in a bit more detail than an article's methods section can manage.

Compared to such a database, the skeptics' data are small change.

CFLarsen
21st February 2006, 02:19 AM
No. We compare what the actual claim is to what the claimant actually does. We do not compare what we expect to see. There is a world of difference. We could, very easily, do the latter.

Precisely.

We cannot enforce our own views on what the claimant's abilities are. It is not just rude and condescending, it is also unscientific.

Mercutio
21st February 2006, 07:27 AM
A brief example:

Suppose we have a number of people claiming to be able to influence the outcome of a coin toss (or to predict it, if you prefer). If we were testing this claim against chance, we would have a simple binomial problem, testing against P = .50, and we would set up a suitable number of trials. If, though, we listen to the claimants, and adjust our tests accordingly, we can save time. Let us take the extreme example in which claimants say they have complete control, and will always be able to determine the coin's face. We can test this very easily--just start flipping. There is a .5 probability that (by chance alone) any given person will fail after one toss. But that person can stop then. The trial is over. If, on the other hand, the person got the first one right, then there is a .5 probability on the next trial (again, by chance alone). With enough claimants, we may have some people who are getting 5, or 10, or more coins called correctly before making a mistake (this all by chance alone--of course, if they *can* influence the outcome perfectly, they will never make the mistake. And yes, I recall that I am taking the extreme 100% position here, but it extrapolates to lesser claims). Now...if we take these data and combine them, they will very likely be significant. Why? Because we quit the trials earlier when they failed earlier, artificially boosting the number of successes.
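The design described above is easy to simulate. The sketch below assumes fair coins (hit probability 0.5), 20,000 hypothetical claimants who each claim perfect control, so each test stops at the first miss, and an arbitrary safety cap of 1,000 tosses; the function name and all parameters are illustrative choices, not anything from an actual challenge protocol. It prints two different ways of aggregating the same data so readers can check for themselves how the stopping rule interacts with the method of combining.

```python
import random

def simulate_stop_at_first_miss(n_claimants, p_hit=0.5, max_trials=1000, seed=None):
    """Each hypothetical claimant claims 100% accuracy, so testing stops at the
    first miss (or after max_trials). Every toss is pure chance with p_hit.
    Returns a list of (hits, trials) pairs, one per claimant."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_claimants):
        hits = trials = 0
        while trials < max_trials:
            trials += 1
            if rng.random() < p_hit:
                hits += 1
            else:
                break  # first miss: the claim of perfect control has failed
        results.append((hits, trials))
    return results

results = simulate_stop_at_first_miss(20_000, seed=1)
total_hits = sum(h for h, _ in results)
total_trials = sum(t for _, t in results)
pooled_rate = total_hits / total_trials                        # all tosses thrown on one pile
mean_rate = sum(h / t for h, t in results) / len(results)      # average of each claimant's rate
longest_run = max(h for h, _ in results)                       # best streak produced by chance
print(f"pooled hit rate:              {pooled_rate:.3f}")
print(f"mean of per-claimant rates:   {mean_rate:.3f}")
print(f"longest streak before a miss: {longest_run}")
```

With a stopping rule like this, the two summaries answer different questions and need not agree, which is one concrete way in which the choice of how to combine sequentially stopped tests matters. The longest streak also shows how long a purely chance run can get once there are enough claimants.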

In order to properly combine the data, they have to have been collected in a manner that is not subject to such a bias. These preliminary tests are subject to that bias, and thus will show, to the statistically naive observer, the illusion of an effect where there is none.

T'ai Chi
21st February 2006, 08:04 AM
We compare what the actual claim is to what the claimant actually does. We do not compare what we expect to see.


At the end of the day, you get numbers out of it that would be interesting to see and analyze.

T'ai Chi
21st February 2006, 08:10 AM
Let us take the extreme example in which claimants say they have complete control, and will always be able to determine the coin's face. We can test this very easily--just start flipping. There is a .5 probability that (by chance alone) any given person will fail after one toss. But that person can stop then. The trial is over. If, on the other hand, the person got the first one right, then there is a .5 probability on the next trial (again, by chance alone). With enough claimants, we may have some people who are getting 5, or 10, or more coins called correctly before making a mistake (this all by chance alone--of course, if they *can* influence the outcome perfectly, they will never make the mistake. And yes, I recall that I am taking the extreme 100% position here, but it extrapolates to lesser claims).


In this scenario, in order to do the test one assumes that the person making the claim is truthful about their claimed abilities. Unfortunately, that is the very thing one is trying to ascertain by the test in the first place. Perhaps they really only perform at the 90% level, or at the 70% level, or some other level.

It doesn't make much sense to say 'OK, since you're saying you perform at the K% level, we'll test you at that level, and if you don't perform at it, you're wrong.' It makes sense to say 'We know with regular coins we'd expect you to perform at the 50% level, and if you don't perform significantly away from this, you're wrong.'

CFLarsen
21st February 2006, 08:17 AM
In this scenario, in order to do the test one assumes that the person making the claim is truthful about their claimed abilities. Unfortunately, that is the very thing one is trying to ascertain by the test in the first place. Perhaps they really only perform at the 90% level, or at the 70% level, or some other level.

It doesn't make much sense to say 'OK, since you're saying you perform at the K% level, we'll test you at that level, and if you don't perform at it, you're wrong.' It makes sense to say 'We know with regular coins we'd expect you to perform at the 50% level, and if you don't perform significantly away from this, you're wrong.'

How do you expect a psychic to perform?

Jekyll
21st February 2006, 08:19 AM
It doesn't make much sense to say 'OK, since you're saying you perform at the K% level, we'll test you at that level, and if you don't perform at it, you're wrong.' It makes sense to say 'We know with regular coins we'd expect you to perform at the 50% level, and if you don't perform significantly away from this, you're wrong.'
Doing so makes the test longer, harder, and more expensive by requiring more trials.

It wastes the claimant's time and the examiners'.

It doesn't make allowance for results that we expect to be better than chance; see the girl with x-ray eyes, etc.

What possible reason is there for doing it?

drkitten
21st February 2006, 08:48 AM
In this scenario, in order to do the test one assumes that the person making the claim is truthful about their claimed abilities. Unfortunately, that is the very thing one is trying to ascertain by the test in the first place.


And that's exactly what the test does. It's a simple binary test: either the claimant succeeds or (more likely) they fail.

They either were truthful, or they weren't.

Perhaps they really only perform at the 90% level, or at the 70% level, or some other level.

Perhaps. But that's a different research question, one that would require substantially more time and resources to collect and analyze.

And one that cannot be answered in retrospect given the information we have available.

It doesn't make much sense to say 'OK, since you're saying you perform at the K% level, we'll test you at that level, and if you don't perform at it, you're wrong.'

Why not? It makes perfect sense to allow each claimant to define exactly what they feel they can do, and then test to see if they can perform as claimed.

Mercutio
21st February 2006, 09:19 AM
At the end of the day, you get numbers out of it that would be interesting to see and analyze.
No. As explained above, you get numbers out of it that are necessarily biased, due to the nature of the challenge tests. The only interesting analysis would be a demonstration to a stats or methods class as to why this would be a flawed use of data.

Mercutio
21st February 2006, 09:25 AM
In this scenario, in order to do the test one assumes that the person making the claim is truthful about their claimed abilities. Unfortunately, that is the very thing one is trying to ascertain by the test in the first place. Perhaps they really only perform at the 90% level, or at the 70% level, or some other level.
And in practice, the JREF representatives have allowed for some wiggle-room (x-ray girl was allowed mistakes, even though her claim could easily have meant that she would not have made mistakes). The cutoff performance is mutually agreed upon. JREF strongly suggests that claimants test their own abilities first. Again, the challenge is not the time to be ascertaining what their abilities are, it is the time to demonstrate their abilities as claimed.

It doesn't make much sense to say 'OK, since you're saying you perform at the K% level, we'll test you at that level, and if you don't perform at it, you're wrong.' It makes sense to say 'We know with regular coins we'd expect you to perform at the 50% level, and if you don't perform significantly away from this, you're wrong.'
It makes perfect sense if you are interested in testing that question, which is a completely different question. The question we are testing is whether they can perform as they claim to. Period. Because of this, their data are useless for the type of post-hoc combinational analysis you were initially suggesting.

I see that three people beat me to this...well, tough. I am posting it anyway.

T'ai Chi
21st February 2006, 01:59 PM
Mercutio wrote:


The question we are testing is whether they can perform as they claim to. Period.


You aren't testing anything. Neither am I. Various organizations are.

That aside, their performance is measured by performance, not by what they believe about how they'll perform. The latter helps to set various cutoffs and help design the test though.


Because of this, their data are useless for the type of post-hoc combinational analysis you were initially suggesting.


From the reasoning I've seen, it is not convincing that meta-analysis does not apply to tests done by skeptical organizations.

And again, this issue of combining aside, what about seeing a list of the tests with their statistical results? Why is something so basic, so obviously interesting, so difficult to see?

drkitten
21st February 2006, 02:22 PM
And again, this issue of combining aside, what about seeing a list of the tests with their statistical results?

What about it? The list of the tests is easy enough to obtain. The statistical results may not be available for the reasons Mercutio has already discussed.

CFLarsen
21st February 2006, 02:42 PM
T'ai Chi,

How do you expect a psychic to perform?

Or, in fact, any kind of paranormal claimant to perform?

Mercutio
21st February 2006, 03:24 PM
You aren't testing anything. Neither am I. Various organizations are.
"We" = anyone interested in this topic. You have suggested using their data in a secondary analysis; from that, I gathered that you are interested in finding the answers. My apologies if that was a mistake.

That aside, their performance is measured by performance, not by what they believe about how they'll perform. The latter helps to set various cutoffs and help design the test though.
You did not understand my reasoning, nor my example, then. Their performance in these tests is subject to a systematic bias, not because of the test design, but because the tests are not intended to provide data which would be appropriate for meta-analysis.

From the reasoning I've seen, it is not convincing that meta-analysis does not apply to tests done by skeptical organizations.
Then one of us is wrong. Please explain your reasoning; I have already explained why such data is inappropriate for meta-analysis. What is it that I have missed? Or, what is it that you do not understand?

And again, this issue of combining aside, what about seeing a list of the tests with their statistical results? Why is something so basic, so obviously interesting, so difficult to see?
I have already given you my answer to this.

CFLarsen
22nd February 2006, 06:52 AM
You aren't testing anything. Neither am I. Various organizations are.

Have you ever taken any responsibility for anything you have ever done in your entire life?

T'ai Chi
22nd February 2006, 03:33 PM
Their performance in these tests is subject to a systematic bias,


We disagree.

And, again, the issue of meta-analysis aside, wouldn't the summarized data from all of these individual tests be nice to see? :)

drkitten
22nd February 2006, 03:47 PM
We disagree.

Which doesn't mean that your opinions are of equal validity.

What's wrong with Mercutio's coin-flipping experiment as an example of a valid test that is nonetheless too biased to use for meta-analysis?

Mercutio
22nd February 2006, 03:49 PM
We disagree.
We have established this. I have explained why I believe you are wrong, and invited you to explain why you believe I am. "We disagree" leaves one of us wrong; if it is me, I want to know.

And, again, the issue of meta-analysis aside, wouldn't the summarized data from all of these individual tests be nice to see? :)
No. The summarized data would be misleading. The summarized data from parapsychologists' experiments, not subject to the same bias described earlier, would be very nice to see.

CFLarsen
22nd February 2006, 04:17 PM
T'ai Chi,

How do you expect a psychic to perform?

Or, in fact, any kind of paranormal claimant to perform?

T'ai Chi
24th February 2006, 02:49 PM
No. The summarized data would be misleading.


Not at all.

For example, if a dowser got 5 right out of 20 tries, a table showing 5 out of 20, along with the probability of being correct on each try (which is the same from try to try), would be useful information.

How about something even simpler, like the number of preliminary challenges per year?
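A row in such a table could carry the exact binomial tail probability alongside the raw count. The post does not say what the per-try probability of success was, so the sketch below assumes 0.1 (for instance, one target among ten containers) purely for illustration; the function name is hypothetical.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """Exact one-sided p-value P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical numbers: 5 hits in 20 tries, with an assumed per-try
# chance probability of 0.1 (e.g., one target among ten containers).
hits, n, p_chance = 5, 20, 0.1
print(f"expected by chance: {n * p_chance:.1f}")
print(f"P(X >= {hits}) = {binomial_upper_tail(hits, n, p_chance):.4f}")
```

Reporting the exact tail rather than a normal approximation matters here, because with 20 trials and a low per-try probability the normal approximation is poor.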

Or the gender distribution of applicants?

Or the % of different types of claims?

And many more.



The summarized data from parapsychologists' experiments, not subject to the same bias described earlier, would be very nice to see.

Sure.

But this is a thread on the statistics from the JREF challenge and similar stats from similar tests done by similar skeptical organizations.

CFLarsen
24th February 2006, 03:51 PM
T'ai Chi,

How do you expect a psychic to perform?

Or, in fact, any kind of paranormal claimant to perform?

Mercutio
24th February 2006, 04:06 PM
Not at all.
Yes, at all. Quite, in fact. And subtly so--enough so that you have not, it seems, grasped it yet.

For example, if a dowser got 5 right out of 20 tries, a table showing 5 out of 20, along with the probability of being correct on each try (which is the same from try to try), would be useful information.
Please go back and read my coin-flip example. It explains how an accumulation of fair tests can lead to a bias if taken in the collective. Your example here looks (and probably is) perfectly fair for a single test, but does nothing to alleviate the cumulative bias. Either you do not yet understand, or you are being dishonest, or I am missing something that you are unwilling or unable to show me.

How about something even simpler, like the number of preliminary challenges per year?

Or the gender distribution of applicants?

Or the % of different types of claims?

And many more.
My former answer still applies; I think this would be far more interesting and useful as a summation of the parapsychologists' data. The challenge data are self-selected, and there really is no statistically sound reason to combine them. They are as available as most experimental data already, and of interest to fewer people, I would think.

Sure.

But this is a thread on the statistics from the JREF challenge and similar stats from similar tests done by similar skeptical organizations.
So, all tests subject to that bias.

Ok, then it is a bad idea.

T'ai Chi
25th February 2006, 07:41 AM
Either you do not yet understand, or you are being dishonest, or I am missing something that you are unwilling or unable to show me.


Or you are incorrect in your reasoning of why you believe data from a skeptical organization is exempt from being scrutinized.


...there really is no statistically sound reason to combine them.


That remains to be seen. In fact, the data remains to be seen. :D

Again, let's ignore the issue of combining for now (something it seems you are having a very hard time doing). We wouldn't know if combining is even applicable until delving into the specifics of the tests. Ignoring the combining, it would still be nice to see the data to answer basic questions like

-what % of those taking the preliminary test are male?
-what % were testing dowsing? card guessing?
-how many preliminary tests per year?
-what is the closest someone has got to passing?
-where geographically do the people being tested come from?
-what % of those being tested get retested?
-how much does it cost, on average, to get tested?

and many others, this for each skeptical organization that does such tests.


Ok, then it is a bad idea.

You are entitled to your opinion, sure.


http://www.statisticool.com/jrefchallengestats.htm

CFLarsen
25th February 2006, 07:43 AM
T'ai Chi,

And you are entitled to answer the questions or not:

How do you expect a psychic to perform?

Or, in fact, any kind of paranormal claimant to perform?

Mercutio
25th February 2006, 07:52 AM
Or you are incorrect in your reasoning of why you believe data from a skeptical organization is exempt from being scrutinized.
For the third time, then, I invite you to explain the error of my reasoning. I don't see it. I have explained it in sufficient detail that you should be able to point to where I have made my alleged mistake.


That remains to be seen. In fact, the data remains to be seen. :D
No, it does not remain to be seen; it has, if my reasoning is correct, been explained. I have asked you to explain why you think I am wrong. Feel free to run a simulation of my coin-flip experiment and empirically demonstrate to yourself the soundness of my argument.

The data remain to be seen because there is no good reason to knowingly present data which would be misleading in the aggregate.

Again, let's ignore the issue of combining for now (something it seems you are having a very hard time doing).
I can understand why you would want to ignore it, but it won't go away. Address the issue, and then we won't have to ignore it. We can put it to rest.
We wouldn't know if combining is even applicable until delving into the specifics of the tests. Ignoring the combining, it would still be nice to see the data to answer basic questions like

-what % of those taking the preliminary test are male?
-what % were testing dowsing? card guessing?
-how many preliminary tests per year?
-what is the closest someone has got to passing?
-where geographically do the people being tested come from?
-what % of those being tested get retested?
-how much does it cost, on average, to get tested?

and many others, this for each skeptical organization that does such tests.
Why? What use do you see for these data?

You are entitled to your opinion, sure.
And I have explained my reasoning, and invited you to do the same. All opinions are not equal. Some are held for good reason, some are held out of ignorance. If mine is the latter, I want to know it.

CFLarsen
25th February 2006, 07:57 AM
Mercutio,

Better make a list.... ;)

Mercutio
25th February 2006, 08:06 AM
Mercutio,

Better make a list.... ;)
I don't do lists.

I could put my request in limerick form...

Is the problem with you, or with me?
I've asked you, times one, two, and three--
Please show my mistake--
That's all it would take--
So put up or shut up, T'ai Chi.

CFLarsen
25th February 2006, 08:22 AM
I do.


T'ai Chi,


Can you please explain the error of Mercutio's reasoning, instead of merely declaring that he is in error?


Will you run a simulation of Mercutio's coin-flip experiment and empirically demonstrate to yourself the soundness of his argument?


Can you tell us what element all of those tests mentioned by Gr8wight in post # 146 had in common?


What use do you see for these data you listed in #175?


Why is it "reasonable" to set alpha to what you did?


Why is your own alpha value "reasonable", if JREF and others set it differently?
Status: Refused to answer.

T'ai Chi
25th February 2006, 08:34 AM
For the third time, then, I invite you to explain the error of my reasoning.


Your reasoning against a possible meta analysis and seeing statistics (http://www.statisticool.com/jrefchallengestats.htm) is


Suppose we have a number of people claiming to be able to influence the outcome of a coin toss (or to predict it, if you prefer). If we were testing this claim against chance, we would have a simple binomial problem, testing against P = .50, and we would set up a suitable number of trials. If, though, we listen to the claimants, and adjust our tests accordingly, ...


One always listens to claimants to help design the test, sure. But when it comes down to analyzing the observed data, for any test, one does something like

z_taichi = (actual hits - hits expected by chance) / stuff

and not

z_mercutio = (actual hits - hits expected by what claimant says) / stuff

and one compares z_taichi, not z_mercutio, to a standard normal distribution, for example, to get a probability that will help us assess the claimant's performance.
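The two formulas above differ only in the null success probability plugged in. The sketch below assumes that "stuff" is the usual binomial standard deviation, sqrt(n * p0 * (1 - p0)), and uses the normal approximation; the function names and the 60-out-of-100 result are hypothetical, and the 90% figure stands in for a claimed rate.

```python
import math

def binomial_z(hits, n, p0):
    """Normal-approximation z-score for `hits` out of `n` trials against a
    null success probability p0. The denominator ("stuff") is assumed to be
    the binomial standard deviation sqrt(n * p0 * (1 - p0))."""
    expected = n * p0
    sd = math.sqrt(n * p0 * (1 - p0))
    return (hits - expected) / sd

def upper_p_value(z):
    """One-sided p-value P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical result: 60 hits in 100 trials.
z_chance = binomial_z(60, 100, 0.5)   # scored against chance
z_claim = binomial_z(60, 100, 0.9)    # scored against a claimed 90% rate
print(f"z vs chance: {z_chance:.2f}, one-sided p = {upper_p_value(z_chance):.4f}")
print(f"z vs claim:  {z_claim:.2f}")
```

The same data can look above chance against p0 = 0.5 and far below the claim against p0 = 0.9, which is the distinction the two formulas are drawing.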


...we can save time. Let us take the extreme example in which claimants say they have complete control, and will always be able to determine the coin's face. We can test this very easily--just start flipping. There is a .5 probability that (by chance alone) any given person will fail after one toss. But that person can stop then. ...


Besides the error above, your error here is saying the person can just stop then. Surely rules will be built into such a hypothetical preliminary test to prohibit optional stopping. Both claimant and tester agree on the number of trials beforehand, this is well known. Are you really claiming that JREF or another skeptical organization, when testing a claim of a statistical nature, will agree to test someone with only one or two, or even a few trials? That is doubtful to the extreme.

Your argument seems to be with the design of the test, not meta analysis.


... The trial is over. If, on the other hand, the person got the first one right, then there is a .5 probability on the next trial (again, by chance alone). With enough claimants, we may have some people who are getting 5, or 10, or more coins called correctly before making a mistake (this all by chance alone--of course, if they *can* influence the outcome perfectly, they will never make the mistake. And yes, I recall that I am taking the extreme 100% position here, but it extrapolates to lesser claims). Now...if we take these data and combine them, they will very likely be significant. Why? Because we quit the trials earlier when they failed earlier, artificially boosting the number of successes.


I'm really not interested in "if"s and "may"s from hypothetical data from hypothetical tests being "very likely significant". Without seeing the actual data, one doesn't know if the combined data will be significant or not. That is one of the points of the exercise. One can always dream up situations for anything where something can possibly go wrong, that is not impressive. Actually seeing the data is another story.

Statistics on the table, please?

Let's start out easy: How many preliminary tests are done each year? How many of these tests are statistical in nature? How many of those taking the preliminary test have been tested for dowsing?

T'ai Chi
25th February 2006, 08:45 AM
For example

on http://www.skepticreport.com/psychics/oz.htm

the expected number was calculated using the model of .5, what one would expect by chance, not by using what the claimant expected to get.

And there was no optional stopping. Both parties agreed to 100 trials beforehand.

Mercutio
25th February 2006, 08:58 AM
Thank you.

I am happy now; I know that it is not my reasoning that is wrong.

Your error is that you are continuing to look at each individual test, rather than at the cumulative nature of combining them in the meta-analysis. You also continue your misunderstanding of the reasoning behind alpha, but that is ok. The two z tests you compare are precisely the difference between laboratory parapsychology work (where the tc z reasoning determines alpha) and the challenge (where the merc z reasoning determines alpha and the cutoff). My hypothetical coin-flip example was merely a demonstration which combined the multiple challenge tests into one test, for ease of understanding. Your analysis here confirms that it is a sound argument.

My argument is with the test design, certainly; it is a design that is appropriate for the challenge, but inappropriate for a meta-analysis.

I do thank you for finally putting your reasoning on the table. My mind is at ease now.

CFLarsen
25th February 2006, 09:14 AM
For example

on http://www.skepticreport.com/psychics/oz.htm

the expected number was calculated using the model of .5, what one would expect by chance, not by using what the claimant expected to get.

Where does the claimant say anything about what he expected to get?

T'ai Chi
25th February 2006, 09:44 AM
Your error is that you are continuing to look at each individual test, rather than at the cumulative nature of combining them in the meta-analysis.


I'm interested in a possible meta analysis, as well as looking at results from individual tests (http://www.statisticool.com/jrefchallengestats.htm). Your reasoning why a meta analysis would be inappropriate on hypothetical tests is not convincing IMO for reasons discussed.

You appear to believe I'm saying let's get the data and do a meta analysis no matter what. I'm not. I'm saying let's possibly consider it to test the incredible notion of skeptics making results negative, but regardless, let's see the data being made more easily available because that would be of interest. The latter is the main point here.


The two z tests you compare are precisely the difference between laboratory parapsychology work (where the tc z reasoning determines alpha) and the challenge (where the merc z reasoning determines alpha and the cutoff).


I didn't say anything about determining alpha. Typically alpha is set to .001 for the JREF preliminary test. The z's I posted have to do with p-values, not alpha.
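The distinction between alpha and a p-value can be shown in a few lines: alpha is fixed before the test, while the p-value is computed from the observed z afterwards, and a result counts as significant when the p-value is at or below alpha. This sketch assumes a one-sided test under the normal approximation and uses alpha = 0.001, the value stated above as typical for a JREF preliminary test; the function name and the two example z-scores are hypothetical.

```python
from statistics import NormalDist

ALPHA = 0.001  # chosen before any data are collected
z_critical = NormalDist().inv_cdf(1 - ALPHA)  # one-sided threshold on the z scale

def decide(z, alpha=ALPHA):
    """Return the one-sided p-value for an observed z and whether it is <= alpha."""
    p_value = 1 - NormalDist().cdf(z)
    return p_value, p_value <= alpha

print(f"one-sided critical z for alpha={ALPHA}: {z_critical:.3f}")
for z in (2.0, 3.5):
    p, significant = decide(z)
    print(f"z={z}: p={p:.5f}, significant at alpha: {significant}")
```

Comparing the p-value with alpha and comparing z with the critical z give the same decision; they are two views of one fixed threshold.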

-What % of claimants are dowsers? It is said that this is the most common claim, but it would be best to get actual numbers if possible, for example.

-How much does it cost, on average, to get tested, for both the skeptical organization and for the person being tested? This is an interesting question too I believe. Don't care about specific numbers here, just ballparks for an average dowsing test for example.

Not really sure why there is even any debate about such an approach. But I'm sure I'll find out the grave errors I've made in the next few replies! :D

CFLarsen
25th February 2006, 09:58 AM
For example

on http://www.skepticreport.com/psychics/oz.htm

the expected number was calculated using the model of .5, what one would expect by chance, not by using what the claimant expected to get.

Where does the claimant say anything about what he expected to get?

Mercutio
25th February 2006, 10:18 AM
Not really sure why there is even any debate about such an approach. But I'm sure I'll find out the grave errors I've made in the next few replies! :D
Not from me; my questions are answered.

T'ai Chi
25th February 2006, 10:25 AM
From page 1, Mercutio wrote:


What percentage of the applicants have a claim that is statistical in nature?


Great question! I'd like to know that and other details. Maybe someday we will. :D

Mercutio
25th February 2006, 10:31 AM
Look through KRAMER's threads--perhaps the first question should be "what percentage of applicants have a claim that is comprehensible?". After all, if we limit ourselves to those who can put a meaningful sentence together, we are artificially narrowing the population!

CFLarsen
25th February 2006, 10:41 AM
For example

on http://www.skepticreport.com/psychics/oz.htm

the expected number was calculated using the model of .5, what one would expect by chance, not by using what the claimant expected to get.

Where does the claimant say anything about what he expected to get?

I can't find where he says this. Perhaps you could show us where?

T'ai Chi
25th February 2006, 10:59 AM
Look through KRAMER's threads


I looked through Kramer's threads, and didn't find the answer. Maybe I missed it, or maybe it wasn't addressed.

As I opined (http://www.statisticool.com/jrefchallengestats.htm)


The JREF has been making strides in making information on "the claims received, the correspondences exchanged between the JREF and the applicant, and subsequent protocol negotiations and test results" electronic, but not, as far as I can tell, the numerical results of past preliminary tests, which in my opinion are just as, if not more, interesting and relate more to the science.


It would be nice, ideal perhaps, if in the future there was a webpage one could go to and actually see all of this information in an organized format (as far as I'm concerned, names of applicants can be omitted; I just want to see what matters in terms of the design and results), so one wouldn't have to stop life and fly to Florida, or search hundreds of posts (of only recent tests it must be noted) for something that may or may not be there.

CFLarsen
25th February 2006, 11:12 AM
It would be nice, ideal perhaps, if in the future there was a webpage one could go to and actually see all of this information in an organized format (as far as I'm concerned, names of applicants can be omitted; I just want to see what matters in terms of the design and results), so one wouldn't have to stop life and fly to Florida, or search hundreds of posts (of only recent tests it must be noted) for something that may or may not be there.

We've already been through this. That costs money. Are you willing to contribute to make this possible, or do you just want to sit on your ass, making demands?

Where does the claimant say anything about what he expected to get?

I can't find where he says this. Perhaps you could show us where?

The Central Scrutinizer
25th February 2006, 12:51 PM
I don't do lists.

I could put my request in limerick form...

Is the problem with you, or with me?
I've asked you, times one, two, and three--
Please show my mistake--
That's all it would take--
So put up or shut up, T'ai Chi.

:dl:

You are the master! I bow before you.

Mercutio
25th February 2006, 01:14 PM
:dl:

You are the master! I bow before you.
Oh, come on! If it's not in person, it's just not worth it...

Gr8wight
25th February 2006, 09:17 PM
I don't remember if anyone has specifically mentioned this in this thread, but I think the thing that is hanging T'ai Chi up is the fact that the JREF tests are not conducted as academic studies might be. If he is really looking for a more in-depth understanding of the academic study of the paranormal, his queries would be much better directed at the parapsychology department of a major university. It is important that he understand that the JREF is not in any way engaging in studies of paranormal phenomena, nor are they interested in doing so. There are people already doing that (some actually responsibly).

The JREF's mission statement is clearly laid out on this website:
"Its aim is to promote critical thinking by reaching out to the public and media with reliable information about paranormal and supernatural ideas so widespread in our society today."

The $1,000,000 challenge is simply a tool to promote interest in the foundation. Its adversarial nature is designed to attract controversy so as to better publicise the foundation and its work. While the challenge is the most visible part of the JREF's activities, and the one we talk most about on these forums, I suspect Randi would tell you that the work he does at seminars, in classrooms, and as a consultant (the stuff that is invisible to the majority of the general public) is far more important than demonstrating that Achau Nguyen cannot actually send his thoughts to another person.

CFLarsen
26th February 2006, 01:42 AM
I don't remember if anyone has specifically mentioned this in this thread, but I think the thing that is hanging T'ai Chi up is the fact that the JREF tests are not conducted as academic studies might be. If he is really looking for a more in-depth understanding of the academic study of the paranormal, his queries would be much better directed at the parapsychology department of a major university. It is important that he understand that the JREF is not in any way engaging in studies of paranormal phenomena, nor are they interested in doing so. There are people already doing that (some actually responsibly).

Consider the fact that T'ai Chi has repeatedly derided JREF and Randi for not doing science at all.

Consider the fact that T'ai Chi claims that science can't be done outside the lab.

Consider the fact that T'ai Chi shows no interest in paying for the extra work it would take to meet his demand of access to the files.

You take a guess what T'ai Chi is up to.

CFLarsen
26th February 2006, 02:13 AM
T'ai Chi,

Are you willing to contribute to make this possible, or do you just want to sit on your ass, making demands?

Where does the claimant say anything about what he expected to get?

I can't find where he says this. Perhaps you could show us where?

Gr8wight
26th February 2006, 07:07 AM
Consider the fact that T'ai Chi has repeatedly derided JREF and Randi for not doing science at all.

Consider the fact that T'ai Chi claims that science can't be done outside the lab.

Consider the fact that T'ai Chi shows no interest in paying for the extra work it would take to meet his demand of access to the files.

You take a guess what T'ai Chi is up to.

Oh, I have never had a doubt. The T' is short for troll.

CFLarsen
26th February 2006, 07:26 AM
Oh, I have never had a doubt. The T' is short for troll.
Perhaps. But T'ai Chi does not merely want to troll. Although there is definitely an issue of clamoring for attention, his primary goal is to cast as much doubt about skepticism and skeptics as possible.

Not valid criticism, which would be admirable, worthwhile and welcome, but an ongoing campaign to smear. Cheap pot shots and snide remarks, after which he runs away from real debate are his trademarks.

CFLarsen
27th February 2006, 01:50 AM
More evidence here of T'ai Chi's deceit. (http://forums.randi.org/showthread.php?t=52785)

De_Bunk
27th February 2006, 05:20 AM
I'd just like to see T'ai Chi answer any one of the questions Claus asked......

But WE know that he will just continue to ignore...

DB

Jekyll
27th February 2006, 08:03 AM
Thank you.

I am happy now; I know that it is not my reasoning that is wrong.

Your error is that you are continuing to look at each individual test, rather than at the cumulative nature of combining them in the meta-analysis. You also continue your misunderstanding of the reasoning behind alpha, but that is ok. The two z tests you compare are precisely the difference between laboratory parapsychology work (where the tc z reasoning determines alpha) and the challenge (where the merc z reasoning determines alpha and the cutoff). My hypothetical coin-flip example was merely a demonstration which combined the multiple challenge tests into one test, for ease of understanding. Your analysis here confirms that it is a sound argument.

My argument is with the test design, certainly; it is a design that is appropriate for the challenge, but inappropriate for a meta-analysis.

I do thank you for finally putting your reasoning on the table. My mind is at ease now.
I don't want to give T.C. ammunition, but:

Wouldn't the example you gave about the strings of heads be analysable as a geometric distribution?
We've lost a massive amount of information in throwing away data, so our certainty about the heads-to-tails ratio will be lower, but I think we can create an unbiased estimator of E(heads) as (Occurrences of a string of heads of length L+1)/(Occurrences of a string of length L).
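A quick Monte Carlo check of this ratio is easy to sketch. The Python below is a minimal sketch that assumes each session flips a coin until the first tail, so only the length of the run of heads survives, and it reads "occurrences of a string of length L" as "runs that reached at least L heads". Both readings are interpretations, and the function names are hypothetical. The sketch only checks that the ratio settles near the true probability of heads; whether it is strictly unbiased in small samples is a separate question.

```python
import random

def run_lengths(n_sessions: int, p_heads: float, seed: int = 0) -> list[int]:
    """Each session flips until the first tail; only the run of heads is kept."""
    rng = random.Random(seed)
    lengths = []
    for _ in range(n_sessions):
        heads = 0
        while rng.random() < p_heads:
            heads += 1
        lengths.append(heads)
    return lengths

def ratio_estimate(lengths: list[int], L: int) -> float:
    """(# runs reaching at least L+1 heads) / (# runs reaching at least L heads)."""
    reached_L = sum(1 for h in lengths if h >= L)
    reached_L_plus_1 = sum(1 for h in lengths if h >= L + 1)
    return reached_L_plus_1 / reached_L

lengths = run_lengths(200_000, p_heads=0.5, seed=1)
for L in range(4):
    print(L, round(ratio_estimate(lengths, L), 3))
```

Each value of L gives its own estimate; larger L uses fewer runs, so those estimates are noisier.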

In a similar way the data from a repeated JREF experiment of the form 2 failures for 20 trials could be analysed as a tail-truncated sum of two geometric distributions.
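One reading of "2 failures for 20 trials" is this: each trial is a hit with probability p, and the test stops at the second miss or after 20 trials, whichever comes first. The number of hits before the second miss is then a sum of two geometric variables (a negative binomial), and the 20-trial cap truncates its tail. The Python sketch below computes that exact distribution under this assumed stopping rule; `hits_distribution` is a hypothetical name.

```python
import math

def hits_distribution(p_hit: float, max_trials: int = 20,
                      max_misses: int = 2) -> dict[int, float]:
    """P(number of hits) for a test that stops at the max_misses-th miss
    or after max_trials trials, whichever comes first."""
    q = 1.0 - p_hit
    dist: dict[int, float] = {}
    # Stopped by the final allowed miss at trial n (n <= max_trials):
    # negative binomial, i.e. a sum of max_misses geometric waits.
    for n in range(max_misses, max_trials + 1):
        prob = math.comb(n - 1, max_misses - 1) * q**max_misses * p_hit**(n - max_misses)
        hits = n - max_misses
        dist[hits] = dist.get(hits, 0.0) + prob
    # Completed every trial with fewer than max_misses misses: the truncated tail.
    for m in range(max_misses):
        prob = math.comb(max_trials, m) * q**m * p_hit**(max_trials - m)
        hits = max_trials - m
        dist[hits] = dist.get(hits, 0.0) + prob
    return dist

chance = hits_distribution(0.5)
print(round(sum(h * pr for h, pr in chance.items()), 3))  # mean hits at chance
```

Because the two cases partition every possible outcome, the probabilities sum to 1. That makes the function a convenient baseline for comparing observed hit counts from tests run under this kind of stopping rule.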

I'm mentioning this because if I'm right, the strings of successes generated by paranormal 'researchers' could be re-analysed as a head-truncated geometric distribution, to generate an unbiased estimator.

Thoughts anyone?

Mercutio
27th February 2006, 08:14 AM
I don't want to give T.C. ammunition, but:
Don't worry about that--first off, he was comparing to .5, so this is a different animal altogether. Secondly, if it turns out he's right, that is what is important, not winning some argument.

Thoughts anyone?
At first glance, that looks really neat--I would certainly defer to someone who knows more about math than I do (drkitten?), but I do think that addresses the systematic bias that I was talking about.

drkitten
27th February 2006, 09:09 AM
I would hardly claim to know more than Mercutio about statistics, but as far as I can tell, the question you asked is a legitimate one that T'ai Chi should have asked.


Wouldn't the example you gave about the strings of heads be analysable as a geometric distribution?
We've lost a massive amount of information in throwing away data, so our certainty about the heads-to-tails ratio will be lower, but I think we can create an unbiased estimator of E(heads) as (Occurrences of a string of heads of length L+1)/(Occurrences of a string of length L).

Yes, we could do this analysis. But we would need essentially to build our statistics, our estimator, and our tables of significance from scratch, and to perform meta-analysis on this kind of data under field conditions would be a nightmare.

In particular, building our estimator hinges crucially on the idea that there are lots of people, all running near-identical coin-flip experiments. Under field conditions, this is exactly what we don't get -- instead, we get one person who can "influence" coin flips, another person who can "predict" the fall of a pair of dice, a third who can clairvoy (is that a word?) the cards drawn from a conventional deck, all to different claimed thresholds of accuracy. And that's not counting the nutcases who believe that they can summon UFOs.

Again, the people who are in the greatest need of this kind of analysis are not the JREF, but the field researchers at the parapsychology department of Redbrick Uni; the JREF has neither the facilities, the interest, the mission, nor the capacity for doing this kind of meta-analysis.

Mercutio
27th February 2006, 10:23 AM
Yes, we could do this analysis. But we would need essentially to build our statistics, our estimator, and our tables of significance from scratch, and to perform meta-analysis on this kind of data under field conditions would be a nightmare.
Agreed. It would be one fun Monte Carlo simulation, though, no?

In particular, building our estimator hinges crucially on the idea that there are lots of people, all running near-identical coin-flip experiments. Under field conditions, this is exactly what we don't get -- instead, we get one person who can "influence" coin flips, another person who can "predict" the fall of a pair of dice, a third who can clairvoy (is that a word?) the cards drawn from a conventional deck, all to different claimed thresholds of accuracy. And that's not counting the nutcases who believe that they can summon UFOs.
Correct me if I am thinking fuzzy on this...it seems to me that the type of test and claimed level of accuracy are much less important in this analysis than on the sort that TC was asking for. Or maybe I am just looking at the bias I saw, and am ignoring some other source of bias.
Again, the people who are in the greatest need of this kind of analysis are not the JREF, but the field researchers at the parapsychology department of Redbrick Uni; the JREF has neither the facilities, the interest, the mission, nor the capacity for doing this kind of meta-analysis.
Agreed wholeheartedly. Although it might (might, I say, I am speaking out of ignorance here) be a fun project for someone pursuing a math degree!

Jekyll
27th February 2006, 10:45 AM
Correct me if I am thinking fuzzy on this...it seems to me that the type of test and claimed level of accuracy are much less important in this analysis than on the sort that TC was asking for. Or maybe I am just looking at the bias I saw, and am ignoring some other source of bias.
It's still not really applicable to the JREF stats, just because of the pick 'n' mix nature of their tests, which is necessary if they are to test every applicant.
Agreed wholeheartedly. Although it might (might, I say, I am speaking out of ignorance here) be a fun project for someone pursuing a math degree!
*Sigh*.
If someone could point me towards an electronic copy of the data recorded by paranormal researchers, and could talk me through the relevant stats, I'd be prepared to spend an hour or two MATLABing the data.

For a sufficiently large body of data, exponential decay should be a valid approximation for the initial stages. Does anyone in the know about radioactive physics want to fill us in on significance values etc.?
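The exponential-decay idea can be sketched directly. If every run continues with probability p, the number of runs reaching at least L heads should fall off roughly as N0·p^L, so ln(count) is a straight line in L with slope ln p. The Python below fits that line by ordinary least squares. This is a simplification, assuming unweighted fitting is acceptable: small counts are noisy, and a weighted or maximum-likelihood fit would be more careful. The counts are made up for illustration, and the function name is hypothetical.

```python
import math

def fit_decay_rate(counts):
    """Least-squares fit of ln(counts[L]) = a + b*L; returns p = exp(b).

    counts[L] is the number of runs that reached at least L heads.
    Zero counts are skipped because log(0) is undefined.
    """
    points = [(L, math.log(c)) for L, c in enumerate(counts) if c > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return math.exp(sxy / sxx)

# Hypothetical counts for illustration only, not real data.
counts = [1000, 520, 245, 130, 60, 33]
print(round(fit_decay_rate(counts), 3))
```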

drkitten
27th February 2006, 10:59 AM
Agreed. It would be one fun Monte Carlo simulation, though, no?

For a sufficiently limited and personal definition of "fun," perhaps.


Correct me if I am thinking fuzzy on this...it seems to me that the type of test and claimed level of accuracy are much less important in this analysis than on the sort that TC was asking for.

Well, the type of test would be crucial because it would establish what our baseline for "chance" is, and it would need to be assessed, in detail, for each claim. (For example, if I claim to be able to predict the next card drawn from a poker deck : are the cards drawn with or without replacement? What kind of feedback do I get?)
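The card example shows how much "chance" depends on these details. The Python sketch below assumes a 52-card deck and a guesser who names a card for every draw. It compares three set-ups: drawing with replacement, dealing the deck once with no feedback, and dealing it once while showing the guesser each card, with the guesser always naming a card not yet seen. The function names are illustrative.

```python
from fractions import Fraction

DECK = 52

def expected_hits_with_replacement(deck=DECK, draws=DECK):
    """Card replaced and reshuffled each time: every guess has a 1/deck chance."""
    return Fraction(draws, deck)

def expected_hits_without_replacement_no_feedback(deck=DECK):
    """Deck dealt once, guesser never told what was drawn: each position is
    equally likely to hold any card, so each guess still has a 1/deck chance."""
    return Fraction(deck, deck)

def expected_hits_without_replacement_full_feedback(deck=DECK):
    """Deck dealt once, each card shown after the guess, and the guesser always
    names an unseen card: the guess with r cards left hits with chance 1/r."""
    return sum(Fraction(1, remaining) for remaining in range(1, deck + 1))

print(float(expected_hits_with_replacement()))
print(float(expected_hits_without_replacement_no_feedback()))
print(float(expected_hits_without_replacement_full_feedback()))
```

With feedback and no replacement, pure guessing is expected to score about 4.5 hits instead of 1. A null hypothesis of 1/52 per card would be badly wrong for that design.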

The claimed level of performance is less relevant, but I think that would need to be assessed on a case-by-case basis.

I still think it would be too difficult to do well, and too easy to do badly -- and therefore shouldn't be done at all.

drkitten
27th February 2006, 12:08 PM
The claimed level of performance is less relevant, but I think that would need to be assessed on a case-by-case basis.

I know, quoting my own post is a sure sign of narcissism or something. But I just realized something relevant regarding bias.

The claimed level of performance is important, because in some regards, it determines the stringency of the controls that Randi & Co. would need to apply in testing conditions.

Just as an example -- if I claimed to be able to predict with 100% accuracy whether or not a pregnant woman was carrying a male or a female (a traditional question of divination, btw), that would indeed be paranormal. If I claimed to be able to predict with 51% accuracy -- that's not that hard, since the ratio of males to females at birth is approximately 51.5% males for most First World countries. That 1.5% might make all the difference between a successful and an unsuccessful test, depending upon how Randi set up the test (and calculated the probability of the null hypothesis).

In particular, if I claimed 100% accuracy, Randi could safely ignore this little demographic factoid in designing the test. Even if I scored slightly above "chance" (as defined by a 50/50 null hypothesis), his million would still be safe.
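How much the 1.5% matters depends mostly on the number of trials. The Python sketch below is built on several assumptions. The test is scored as a one-sided exact binomial test against a 50/50 null at alpha = 0.001. The "claimant" simply says "boy" every time, and 51.5% of births are male (the figure from the post). None of this describes how any real JREF test was set up, and the function names are hypothetical.

```python
import math

def binom_pmf(k, n, p):
    """Binomial probability, computed in log space so large n doesn't overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

def critical_value(n, alpha, p0=0.5):
    """Smallest k with P(X >= k | p0) <= alpha (one-sided exact test)."""
    tail = 0.0
    for k in range(n, -1, -1):
        tail += binom_pmf(k, n, p0)  # tail is now P(X >= k)
        if tail > alpha:
            return k + 1
    return 0

def pass_probability(n, alpha=0.001, p_true=0.515, p0=0.5):
    """Chance that always guessing 'boy' beats the 50/50 null in n trials."""
    return upper_tail(critical_value(n, alpha, p0), n, p_true)

for n in (100, 1_000, 10_000, 20_000):
    print(n, round(pass_probability(n), 3))
```

With a short test the demographic edge is negligible, but over many thousands of predictions it becomes a likely "pass". That is why the null hypothesis has to reflect the real base rate rather than a flat 50/50.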

Number Six
28th February 2006, 09:15 AM
I don't know what the alpha level is on the JREF preliminary test (the ones that are statistical in nature that is) but if it's 0.001 then it's too large IMO. To prevent fraud you have to have it such that a large number of people can't take the test in a short time and have one win by chance and then go "A-ha!"

Yes, it's the claimant's responsibility to pay for the trip to the JREF, and it'd be hard to get a large number of people from all over the world to do that, but there are plenty of people that already live close to the JREF and it would cost them almost nothing to take the challenge. All someone would have to do is organize a large number of them to take the test. Have them all make the same claim so it'd be easy to train them at once. March several of them into the JREF every day. You'd only need enough to cover a year because after a year a person is eligible to take the challenge again.

If all the tests were the same and the alpha level were 0.001, then the chance that someone wouldn't win by chance on a single test would be 0.999, which means it would take only 693 people taking the test to have a greater than 50% chance that someone would pass by luck alone.
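The 693 figure follows from solving 1 - 0.999^n > 0.5 for n. The Python sketch below assumes, as the post does, that every test has the same false-positive rate alpha and that tests are independent. The function names are illustrative.

```python
import math

def prob_at_least_one_pass(n_people, alpha=0.001):
    """P(at least one of n independent pure guessers passes a test with false-positive rate alpha)."""
    return 1.0 - (1.0 - alpha) ** n_people

def people_needed(target=0.5, alpha=0.001):
    """Smallest n with P(at least one chance pass) >= target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - alpha))

n = people_needed()
print(n, round(prob_at_least_one_pass(n), 4))
```

Since ln(0.5)/ln(0.999) is about 692.8, the threshold is crossed at the 693rd test.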

If someone did this the JREF would be able to figure out their trick and they'd know they were being scammed but they'd be in a position where backing out would make them look bad.

Why would each person take the challenge? Consider the cost, chance of winning and the reward. The cost is a few hours of your time. The chance of winning is 1 in 1000 (if we have alpha=0.001). The reward is to be able to say you beat James Randi in his challenge. The woos worldwide would eat this up and the winner of the preliminary challenge would make a lot of money from them as a result. S/He'd be a cult hero to them. You'd make, at a minimum, tens of thousands of dollars, if not millions, all for a few hours of your time at a 1 in 1000 shot.

The fact that you passed the preliminary challenge instead of the real challenge would be lost in the shuffle. The fact that you failed when the real challenge came along would get lost too. Or, if someone passed the preliminary challenge, they'd likely just not take the real challenge, because they'd have more to lose than to gain at that point.

I hope the alpha is generally smaller than 0.001. Tests can be designed so that there is both a good chance the person will pass the challenge if they have the abilities they claim and a small alpha.

CFLarsen
28th February 2006, 09:18 AM
Why would each person take the challenge? Consider the cost, chance of winning and the reward. The cost is a few hours of your time. The chance of winning is 1 in 1000 (if we have alpha=0.001). The reward is to be able to say you beat James Randi in his challenge. The woos worldwide would eat this up and the winner of the preliminary challenge would make a lot of money from them as a result. S/He'd be a cult hero to them. You'd make, at a minimum, tens of thousands of dollars, if not millions, all for a few hours of your time at a 1 in 1000 shot.

You can't get those good odds in Vegas. You risk nothing, you win all.

drkitten
28th February 2006, 09:43 AM
I don't know what the alpha level is on the JREF preliminary test (the ones that are statistical in nature that is) but if it's 0.001 then it's too large IMO. To prevent fraud you have to have it such that a large number of people can't take the test in a short time and have one win by chance and then go "A-ha!"

[snip]

If all the tests were the same and the alpha level were 0.001, then the chance that someone wouldn't win by chance on a single test would be 0.999, which means it would take only 693 people taking the test to have a greater than 50% chance that someone would pass by luck alone.

It hasn't been a problem so far; at present rates, 693 people taking the test would represent something like a century of testing. Not only will Randi be long-dead by then, but probably so will you, I, and the person selected to replace Kramer as the challenge coordinator.

I think that the 0.001 alpha level does a good job serving the Educational part of the JREF mission. If hundreds of people were to apply in a very short time with identical or near-identical claims, especially with something that could obviously be related to simply getting lucky with wild guesses, I think that would actually help the cause of skeptical reasoning. Very few people, for example, think there's anything supernatural involved when they read about someone winning the lottery -- there are millions of tickets sold, and someone has to be a winner. This applies on a smaller scale, too... if I win a new car at the firehouse raffle, it's not because I'm magic, but because there were only 1000 tickets sold, and someone had to get it. If it can be made obvious -- and people like Penn and Teller, Michael Shermer, and the Mythbusters are very good at making things obvious -- that of the hundreds of people who entered the JREF raffle, one person finally drew a winning ticket, that's actually a strong argument against the paranormal.

Especially when they then fail the final test with results not much better than chance.

Hellbound
28th February 2006, 01:30 PM
My understanding is that the .001 level is for preliminary testing only, with results expected to meet the .000001 level for the actual MDC.
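The two-stage structure changes the arithmetic considerably. The Python sketch below takes the .001 preliminary and .000001 final levels as stated here and treats claimants as pure guessers. It also assumes the two stages are independent, so that passing both by luck has probability .001 × .000001. `p_any_pass` is a hypothetical helper that uses `log1p`/`expm1` so tiny probabilities don't vanish in floating-point rounding.

```python
import math

def p_any_pass(n_people, p_single):
    """P(at least one of n independent chance-level claimants passes)."""
    return -math.expm1(n_people * math.log1p(-p_single))

PRELIM_ALPHA = 1e-3
FINAL_ALPHA = 1e-6
both_stages = PRELIM_ALPHA * FINAL_ALPHA  # assumes independent stages

n = 693
print(f"someone passes the preliminary: {p_any_pass(n, PRELIM_ALPHA):.3f}")
print(f"someone passes both stages:     {p_any_pass(n, both_stages):.2e}")
```

Under these assumptions, a crowd large enough to make a lucky preliminary pass more likely than not still has well under a one-in-a-million chance of getting anyone through the final test.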

Course, I could be misremembering :)

drkitten
28th February 2006, 01:49 PM
My understanding is that the .001 level is for preliminary testing only, with results expected to meet the .000001 level for the actual MDC.

That's correct. I think 6's point is that even passing the preliminary test (irrespective of one's performance on the final) could and would be spun as a tremendous step forward for paranormalism. "Even Randi admits that this kind of performance merits further investigation!" Et cetera, et cetera, and so forth.

And to some extent he's right, because "against stupidity, the Gods themselves contend in vain," and there are demonstrably still people out there who believe all sorts of dumb things. But I also think that he's wrong, because the percentage of those people is slowly getting smaller and smaller. Even homeopaths will go in for surgery if they get appendicitis.

Hellbound
28th February 2006, 02:25 PM
And to some extent he's right, because "against stupidity, the Gods themselves contend in vain," and there are demonstrably still people out there who believe all sorts of dumb things. But I also think that he's wrong, because the percentage of those people is slowly getting smaller and smaller. Even homeopaths will go in for surgery if they get appendicitis.

Ah, I see now.

As to the stupidity quote, that only makes sense. After all, you wouldn't expect a god to be more powerful than that which created it...

*Huntsman ducks and runs*

:D

Thing
28th February 2006, 04:34 PM
If I were a cheerleader for the paranormal I'd be very keen to see these statistics compiled, so that I could claim that paranormal activities are observed, just not at a level required to win the challenge. Think about Brian Josephson's hysterical response to the testing of 'The Girl Without X-Ray Vision' (as she's now known), namely that since she performed better than chance there must be something there even though she didn't meet her own agreed criterion for success. The cry for a statistical analysis sounds awfully like a prelude to an attempt to claim the same thing for challenge applicants en masse.

T'ai Chi
28th February 2006, 04:54 PM
If I were a cheerleader for the paranormal I'd be very keen to see these statistics compiled,


Or just any person who is curious about seeing the actual data from interesting tests.


so that I could claim that paranormal activities are observed, just not at a level required to win the challenge.


If there's nothing there, what does one have to be afraid of?

Mercutio
28th February 2006, 05:47 PM
Or just any person who is curious about seeing the actual data from interesting tests.
Especially those who are ignorant of statistics.
If there's nothing there, what does one have to be afraid of?
Misinterpretation of statistical artifacts as effects.

Hey, it happens! Regression to the mean gets misinterpreted quite often. I'd hate to see what sort of spin a less obvious artifact gets!
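Regression to the mean is easy to demonstrate with a simulation. The Python sketch below assumes every "claimant" is guessing at pure chance (p = 0.5) on 100 binary trials per round. It picks the top 2% of round-one scorers and retests them. All parameters are hypothetical choices made for illustration.

```python
import random

def simulate_retest(n_guessers=5_000, trials=100, top_fraction=0.02, seed=42):
    """Every claimant guesses at chance in two rounds. Select the best
    round-1 scorers and compare their mean score in each round."""
    rng = random.Random(seed)

    def score():
        # Fraction of correct guesses in one round of pure-chance guessing.
        return sum(rng.random() < 0.5 for _ in range(trials)) / trials

    round1 = [score() for _ in range(n_guessers)]
    n_top = max(1, int(n_guessers * top_fraction))
    top = sorted(range(n_guessers), key=lambda i: round1[i], reverse=True)[:n_top]
    round2 = [score() for _ in top]  # fresh, independent round for each selected claimant
    mean_top_round1 = sum(round1[i] for i in top) / n_top
    mean_top_round2 = sum(round2) / n_top
    return mean_top_round1, mean_top_round2

first, second = simulate_retest()
print(f"top scorers, round 1: {first:.3f}; same people, round 2: {second:.3f}")
```

The selected group looks well above chance in round one purely because it was selected on that score, then drops back to about 50% on retest. Read naively, that drop could be mistaken for a "decline effect".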

T'ai Chi
28th February 2006, 05:55 PM
Especially those who are ignorant of statistics.


But we're all ignorant of the statistics if we're not able to see any actual statistics.

What % of the applicants have been female? Interesting question. Seems unnecessarily difficult to get a numeric answer.

Mercutio
28th February 2006, 06:10 PM
But we're all ignorant of the statistics if we're not able to see any actual statistics.
Not ignorant of the statistics. Ignorant of statistics. Unable to understand a bias inherent in the accumulation of trial data, for one example.

What % of the applicants have been female? Interesting question. Seems unnecessarily difficult to get a numeric answer.
It is a self-selected sample. Suppose you found a particular percentage female; what possible interpretation could you give it? Is that the percentage in the population? Are there pressures that might lead a greater proportion, or a smaller proportion, of women to apply?

The data cannot answer those questions, even if they were available; for the purposes of the challenge, there is no reason to make any inquiries of this sort. They are meaningless data.

What sort of things do you think you could learn from the gender percentages of these data? Why do you think they are appropriate?

T'ai Chi
28th February 2006, 06:31 PM
Unable to understand a bias inherent in the accumulation of trial data, for one example.


The only thing presented on that note was a poor argument based on optional stopping occurring, something which does not actually happen in JREF's well-designed tests, and based on observed data being tested against what the claimant claims, something which also does not occur since z-scores are in the form

z = (observed-expected by chance)/stuff
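For a simple hit-or-miss test, the "stuff" in the denominator is usually the standard deviation of the hit count under chance. The Python sketch below assumes a fixed number of independent trials with a known chance rate and uses the normal approximation to the binomial. The numbers in the usage line are hypothetical.

```python
import math

def z_against_chance(hits, trials, p_chance=0.5):
    """z = (observed - expected by chance) / standard deviation under chance,
    using the normal approximation to the binomial."""
    expected = trials * p_chance
    sd = math.sqrt(trials * p_chance * (1.0 - p_chance))
    return (hits - expected) / sd

print(z_against_chance(60, 100))  # hypothetical: 60 hits in 100 trials -> 2.0
```

Only the chance rate enters this formula; a claimant's stated hit rate plays no part in the z-score itself.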


Suppose you found a particular percentage female; what possible interpretation could you give it? Is that the percentage in the population? Are there pressures that might lead a greater proportion, or a smaller proportion, of women to apply?


The hang-up is thinking of doing inference based on the %. Viewing the % as a descriptive statistic, there are no problems whatsoever.


They are meaningless data.


You do not want to learn numerical results about the test?


What sort of things do you think you could learn from the gender percentages of these data?


The characteristics of the applicants. That seems interesting to know what type of people took the test, gender, where they are from, age, so on.

As do the categories of claims tested.

As do the scores from the tests, for reasons already explained (http://www.statisticool.com/jrefchallengestats.htm).

Mercutio
28th February 2006, 06:38 PM
The only thing presented on that note was a poor argument based on optional stopping occurring, something which does not actually happen in JREF's well-designed tests, and based on observed data being tested against what the claimant claims, something which also does not occur since z-scores are in the form

z = (observed-expected by chance)/stuff

Your hang-up is thinking of doing inference based on the %. Viewing the % as a descriptive statistic, there are no problems whatsoever.
The example I gave was a simplification of the problem of combining tests with decision levels based on different claims. It was intended purely to put the very real problem into a more concrete form so that you could understand it. Now the only thing missing is your ability or willingness to extrapolate from that example to the challenge situation.

Why is it meaningless if one wants to know the % of females that have applied for the test? You do not want to learn numerical results about the test?
What would a given percentage mean? What inference could you draw? I have shown you why I think it meaningless.

The characteristics of the applicants. That seems interesting.

As do the categories of claims tested.

For example.

As do the scores from the tests, for reasons already explained to you.
But you would be unable to make any inferences at all about the greater population from this sample. Why not examine the far more useful data gathered by parapsychologists?

T'ai Chi
28th February 2006, 06:51 PM
It was intended purely to put the very real problem into a more concrete form so that you could understand it.


Like I said, it was not dealing with real data nor a real situation (optional stopping and testing against what the claimant expects do not occur in the JREF tests), so understandably it is not a persuasive way to argue.


What would a given percentage mean? What inference could you draw? I have shown you why I think it meaningless.


This is descriptive statistics, as already mentioned, not inferential. It tells you what % of something occurred in the sample.


Why not examine the far more useful data gathered by parapsychologists?

No one is stopping anybody from doing that. It is a good idea too, but a different topic. If you'd like to talk about that different topic, since you keep coming back to that, why not start up a new thread on that topic and leave this thread to talking about the statistics from tests done by skeptical organizations.

Mercutio
28th February 2006, 06:59 PM
Like I said, it was not dealing with real data nor a real situation (optional stopping and testing against what the claimant expects do not occur in the JREF tests), so understandably it is not a persuasive way to argue.
You were unable to see the bias. I tried this example to help you. Perhaps it succeeded better with other readers.


This is descriptive statistics, as already mentioned, not inferential. It tells you what % of something occurred in the sample.
The last paragraph of your web page you linked a couple of posts back says that you intend to use the information, in part, to "help better understand what people believe in, who believes in them, where they are from..." This implies inference from the sample to the population.

No one is stopping anybody from doing that. It is a good idea too, but a different topic. If you'd like to talk about that different topic, since you keep coming back to that, why not start up a new thread on that topic and leave this thread to talking about the statistics from tests done by skeptical organizations.
Again from your page, you advocate "learning about how skeptical organizations test, and ways to improve the testing." I am simply suggesting ways to improve your own investigation. If you wish to answer the questions you ask on the face of it, the better data set is the parapsychologists'. If you have some other motive for looking at a data set that is not at all ideal for answering your question, by all means continue.

T'ai Chi
28th February 2006, 08:25 PM
You were unable to see the bias.


I saw what you tried to do, it is just irrelevant for reasons stated, namely that there is no optional stopping in the JREF tests, and the observed data is not compared to what the claimant expects but to chance.

Let's break this down; just show me one example of an actual, not hypothetical, test by JREF where optional stopping was agreed upon and moreover, actually occurred. Just one. Would shut me up, and would prove your argument has some worth.


The last paragraph of your web page you linked a couple of posts back says that you intend to use the information, in part, to "help better understand what people believe in, who believes in them, where they are from..." This implies inference from the sample to the population.


No, it does not necessarily imply that. It tells us the characteristics of that sample of people.


Again from your page, you advocate "learning about how skeptical organizations test, .."


Yes, that is correct. Without data we can't say much.


If you wish to answer the questions you ask on the face of it, the better data set is the parapsychologists'.


A parapsychologists' data set is not data from preliminary tests done by skeptical organizations. If you wish to look at parapsychologists' data sets, which I agree is fascinating, but off the topic of looking at data from tests done by skeptical organizations, you are welcome to do so.


If you have some other motive for looking at a data set that is not at all ideal for answering your question, by all means continue.


Looking at data from a sample is ideal for telling us about that sample.

If one has some motive for suggesting that data from skeptical organizations cannot possibly be analyzed, they can, by all means, continue.

Mercutio
1st March 2006, 06:03 AM
I saw what you tried to do, it is just irrelevant for reasons stated, namely that there is no optional stopping in the JREF tests, and the observed data is not compared to what the claimant expects but to chance.

Let's break this down; just show me one example of an actual, not hypothetical, test by JREF where optional stopping was agreed upon and moreover, actually occurred. Just one. Would shut me up, and would prove your argument has some worth.
Once again, and slowly: That was not the point. The point was that the challenge tests were smaller runs because the cutoffs were chosen based on the claim, not based on an assumption of chance performance. That is, if you understand it, sufficient to create the bias. My example was much more blatant, and distilled the problem into one easy-to-see example.

The fact that you still do not see it, though, tells me that the argument failed in its purpose. Oh, well.


No, it does not necessarily imply that. It tells us the characteristics of that sample of people.
Perhaps you had better correct your web page then. It speaks of "what people believe in", not "what that small, self-selected sample of people believe in".


Yes, that is correct. Without data we can't say much.
Does the "yes" go so far as to understand the point I was making? In the part of my post you chose not to quote, I suggest that your own questions are better answered using other methods.


A parapsychologists' data set is not data from preliminary tests done by skeptical organizations. If you wish to look at parapsychologists' data sets, which I agree is fascinating, but off the topic of looking at data from tests done by skeptical organizations, you are welcome to do so.
Well, then...given that the data are unable to answer the questions you have about "what people believe in...", what exactly is your motivation for focusing on the poorer data set?


Looking at data from a sample is ideal for telling us about that sample.
Yes. We have already looked at that sample, in the context of answering the questions that sample was intended to answer. Any more is bad statistics, and bad methodology.

If one has some motive for suggesting that data from skeptical organizations cannot possibly be analyzed, they can, by all means, continue.
Cannot? No. Should not? Many. No ulterior motive, though, simply enough experience with statistics and methodology that their misapplication is irritating. Does one need more motivation to advocate not using flawed methods to try to draw conclusions? I thought we had a common goal of understanding the world, understanding these phenomena. If I pointed out that there was one microscope in the lab that had a cracked lens, and noticed that you advocated using that scope, do you need to suggest "some motive" for my actions?

Your suggestion is flawed. Drop it and walk away.

Gr8wight
1st March 2006, 07:17 AM
What sort of things do you think you could learn from the gender percentages of these data?

The characteristics of the applicants. That seems interesting to know what type of people took the test, gender, where they are from, age, so on.


It is becoming clear that T'ai Chi's interest is merely one of general curiosity, which is nothing to be ashamed of. I, too, have been very curious to find out more information about past tests, and was elated when Kramer began posting information about past challenge applicants on the forum.

Why he feels it necessary to hide his personal curiosity behind clumsy protestations of desire for statistical analysis is beyond me. Is he just trying to feel all academic-like? Does he somehow feel embarrassed to be doing nothing more than simply sticking his nose in and sniffing about?

Don't worry, T'ai Chi. None of us will think less of you if you admit the truth. In fact, many of us will think very much more of you if you abandon this charade and just deal honestly with us.

petre
1st March 2006, 07:25 AM
I think it would be interesting to know what percentage of paranormal subjects in skeptic tests have a mole on their left hand. Such a pity that they do not recognize the value such data could have to humanity.

CFLarsen
1st March 2006, 08:59 AM
T'ai Chi,

Answers to these questions, made possible by the data being easily available, would be of general interest to the skeptical community, and could help better understand what people believe in, who believes in them, where they are from, and learning about how skeptical organizations test, and ways to improve the testing.
Source (http://www.statisticool.com/jrefchallengestats.htm)

Yet, you merely pick those you can find, namely those who take the JREF challenge.

This is exactly what you have accused me of doing in this article: That I include all astrologers (although I don't). And here you do the very same: You take a small sample of people and extrapolate that to the general population.

Better change your webpage, T'ai.

nathan
1st March 2006, 10:24 AM
What % of the applicants have been female? Interesting question. Seems unnecessarily difficult to get a numeric answer.
Doesn't seem an interesting question to me at all. What would it tell you?

T'ai Chi
2nd March 2006, 04:14 PM
Doesn't seem an interesting question to me at all. What would it tell you?

You might not be interested in such things as, for example, the number of beds in a hospital, the number of car accidents in a city, % of crime by type of crime, number of a certain product sold, and other descriptive statistics either, but some are.

Skeptical organizations doing tests should expect people to be interested in seeing the numbers.

T'ai Chi
2nd March 2006, 04:28 PM
The point was that the challenge tests were smaller runs because the cutoffs were chosen based on the claim, not based on an assumption of chance performance.


That necessarily doesn't mean there is bias.

If there is, you have an issue with the test design, not people asking to see the statistics.

Mercutio
2nd March 2006, 06:39 PM
That necessarily doesn't mean there is bias.
Necessarily doesn't? or doesn't necessarily? The former, I would disagree with strongly. The latter, I would ask why one would plan for a test knowing that there is a known possibility of bias, instead of looking for better data sets with which to answer the question? If you have the choice between test tubes you know are clean, and those that might be dirty, it should be an easy choice.

If there is, you have an issue with the test design, not people asking to see the statistics.
There is not a bias in the tests, when they are used for what they are designed for; the designs are perfectly sound for their purpose. The bias emerges when the results are combined to look for deviations from chance. Jekyll's analysis might control for that bias, but your first idea would not.
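A minimal sketch of one way such a bias can arise, under loudly hypothetical assumptions: every test uses a claim-based pass mark (say, 9 hits out of 10), ends as soon as that mark can no longer be reached, and the chance hit rate is 0.5. The function names and numbers are illustrative only, not any organization's actual protocol.

```python
import random


def run_curtailed_test(p_hit, max_trials, required_hits, rng):
    """Simulate one hypothetical test that stops once passing is impossible.

    The claimant passes by scoring at least `required_hits` out of `max_trials`.
    As soon as the misses exceed `max_trials - required_hits`, no pass is
    possible, so the test ends early. Returns (hits, trials_actually_run).
    """
    allowed_misses = max_trials - required_hits
    hits = misses = 0
    while hits + misses < max_trials:
        if rng.random() < p_hit:
            hits += 1
        else:
            misses += 1
        if misses > allowed_misses:
            break  # the claim-based pass mark can no longer be reached
    return hits, hits + misses


def compare_pooling(n_tests=100_000, p_hit=0.5, max_trials=10, required_hits=9, seed=1):
    """Combine many chance-only tests in two ways and return both estimates."""
    rng = random.Random(seed)
    results = [run_curtailed_test(p_hit, max_trials, required_hits, rng)
               for _ in range(n_tests)]
    # Naive combination: average each test's own hit rate.
    mean_of_rates = sum(h / n for h, n in results) / n_tests
    # Alternative: total hits divided by total trials across all tests.
    pooled_rate = sum(h for h, _ in results) / sum(n for _, n in results)
    return mean_of_rates, pooled_rate


if __name__ == "__main__":
    naive, pooled = compare_pooling()
    print(f"mean of per-test hit rates: {naive:.3f}")
    print(f"total hits / total trials:  {pooled:.3f}")
```

Under these assumptions the average of the per-test hit rates comes out near 0.39 even though every simulated claimant is guessing at 0.5, because tests that start badly end after two or three trials yet each counts as a whole test. Total hits over total trials stays near 0.5, so whether combined results look biased depends on how they are combined.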

T'ai Chi
2nd March 2006, 06:49 PM
The bias emerges when the results are combined to look for deviations from chance.


That remains to be known. An argument by appealing to hypotheticals is not convincing. In fact, it is somewhat doubtful since meta analyses have been very useful in many other areas of study.

In any case, the idea of combining aside, it would be nice to see an organized presentation of data from individual tests from skeptical organizations that conduct tests. Is that objected to as well?

Mercutio
2nd March 2006, 07:00 PM
That remains to be known. An argument by appealing to hypotheticals is not convincing. In fact, it is somewhat doubtful since meta analyses have been very useful in many other areas of study.
Meta analysis is a wonderful tool, but only as good as the original studies. If they are not appropriate for the meta-analysis, no amount of massaging the data will make it worthwhile.

In any case, the idea of combining aside, it would be nice to see an organized presentation of data from individual tests from skeptical organizations that conduct tests. Is that objected to as well?
Objected to? Questioned. You seem to want, although you deny, to infer from the small, self-selected sample to the greater population. If that is at all implied, then that is worth objecting to. If it is not what you (or anyone) are after, then what is being examined is the characteristics of a self selected sample, for the sake of looking at the characteristics of a self selected sample. It can't tell us anything about the population at large, it cannot generalize to something about human nature...it seems a very trivial question. Not objected to...but about thisclose to useless.

T'ai Chi
2nd March 2006, 08:47 PM
If it is not what you (or anyone) are after, then what is being examined is the characteristics of a self selected sample, for the sake of looking at the characteristics of a self selected sample.


If one is interested in the % of the types of tests that skeptical organizations have done, for example, then seeing these numbers is not useless, but in fact answers the question of what % of types of tests there have been.

I believe (but may be wrong) that it has been said that for the JREF, for example, most of the tests are dowsing. It would be nice to have an actual number, instead of "most". Is most 51%? 90%? What? If you have these %s for the general types of claims, you can sort the list from largest to smallest. This type of stuff is called 'understanding a topic better'. :) Would think that people calling themselves skeptics and others would be interested in seeing the test data from skeptical organizations for a variety of reasons.

If one protests because the claimants are "self selected", one needs to ask themself if there is any way to test people from the population randomly for a paranormal claim for a million dollar challenge. Sometimes, in reality, one has to go with what data one has, even if it is not under ideal conditions. If no inference is being done from such data, there is not really any issue.

Let's start with something simple; how many preliminary tests are conducted by year, for each year the preliminary test has been done? I doubt the "self selection" makes the number from the answer to this question "useless".

Mercutio
2nd March 2006, 09:20 PM
If one protests because the claimants are "self selected", one needs to ask themself if there is any way to test people from the population randomly for a paranormal claim for a million dollar challenge.
Thank you. Yes, this points out the problem nicely. The challenge has a very specific aim, and there are exceedingly few questions that can be asked of data collected in accordance with that aim. It is not a "protest" that the claimants are self-selected; it is a fact. A fact which limits the applicability of these data to any other use. The question is not whether there is another way to test the population for the million; the challenge does a very good job of that. The question is, why would anyone want to take those data, which have accomplished their task, and try to force them to answer questions they are not equipped to answer?

The short answer to the question you pose here is that it needlessly combines two problems. If you want to ask the questions you want to ask, you need to test people randomly selected from the population. If you want to test a paranormal claim for a million dollar challenge, you need to do what the challenge has been doing. The data set from the latter is not appropriate to address the former. If you are truly interested in the questions you pose, you should avoid the challenge data.

CFLarsen
3rd March 2006, 12:56 AM
Take a look at how hard T'ai Chi evades the questions put to him.

First, he thinks that it would be "interesting" to find out what percentage of the applicants have been female - and then indicates that this is a problem that JREF should solve.

Attempts to make him explain why it would be "interesting" only result in T'ai Chi's assurance that the statistics are merely "descriptive", despite the fact that he clearly indicates on his webpage that the statistics will be interpreted to help people better understand.

True to form, T'ai Chi has evaded this question:

Well, then...given that the data are unable to answer the questions you have about "what people believe in...", what exactly is your motivation for focusing on the poorer data set?

Taking potshots at skeptical organizations. By pointing to what he feels is poor data, he wants to portray skeptical organizations (and JREF in particular) as sloppy and poorly equipped to test paranormal applicants.

It is interesting to note that, despite T'ai Chi's insistence that only laboratory data is valid, he completely refuses to look at such data from the parapsychologists.

nathan
3rd March 2006, 04:26 AM
You might not be interested in such things as, for example, the number of beds in a hospital, the number of car accidents in a city, % of crime by type of crime, number of a certain product sold, and other descriptive statistics either, but some are.


as it seems so important to you to answer this question, why don't you just go to the challenge forum and count the number of male and female applicants YOURSELF?

CFLarsen
3rd March 2006, 05:49 AM
as it seems so important to you to answer this question, why don't you just go to the challenge forum and count the number of male and female applicants YOURSELF?

Because T'ai Chi has a long history of demanding that others do his work for him. Here's how it goes:

T'ai Chi: "It could be interesting to look at X".

Others: "Well, go look at X and tell us what you find."

T'ai Chi: "If you want to see it, you should find it yourself."

Others: "But you are the one who expressed interest in X."

T'ai Chi: "See how lazy those skeptics are..."

Take his little AURA "study" of transcripts of cold-readers (psychics and mentalists): He insisted that others collect the transcripts for him. He invited people to come up with suggestions and critiques, but refused to listen to those who were critical, even excluding them from seeing the study. People had to either state that they believed in spirits or that they believed that spirits were an impossibility - he would not allow the skeptical POV, namely that spirits may be possible, but no evidence has yet been found. He himself took that stance, of course, and saw no problem whatsoever.

For some unknown reason, T'ai Chi has removed the "study" from his website. Copies are, however, available if you send me an email.

T'ai Chi
3rd March 2006, 03:32 PM
A fact which limits the applicability of these data to any other use.


So every once in a while we see data from these tests, right?

Why not put it all in an easy to get, easy to read, format for everyone who is interested in such results from each test?

Seems reasonable.


If you are truly interested in the questions you pose, you should avoid the challenge data.

Is "truly interested" the same as a "true Scotsman"? :D

Mercutio
3rd March 2006, 06:08 PM
So every once in a while we see data from these tests, right?

Why not put it all in an easy to get, easy to read, format for everyone who is interested in such results from each test?
A) You and who else constitute this "everyone"? B) As I tell my kids, the question is not "why not?"; the question is "why?". These data cannot answer the questions you have wanted them to; what purpose will these data serve?
Seems reasonable.
To whom?
Is "truly interested" the same as a "true Scotsman"? :D
Very good! You are quite right; I really have no reason to suspect that you have any interest in the questions at all, let alone "true interest". Consider the comment retracted.

CFLarsen
4th March 2006, 01:12 AM
Seems reasonable.

Here we go again...."reasonable". That's when we know T'ai Chi has run out of arguments.... :rolleyes:

T'ai Chi
4th March 2006, 07:43 AM
These data cannot answer the questions you have wanted them to; what purpose will these data serve? To whom?


The data can answer questions people are interested in, descriptive ones about how many tests per year, what type, claimant characteristics, and scores.

You seem to be interested in individual results, but not a list of all the individual results. Who knows why.


I really have no reason to suspect that you have any interest in the questions at all, let alone "true interest".


What you "suspect" doesn't really matter.

Mercutio
4th March 2006, 07:54 AM
The data can answer questions people are interested in, descriptive ones about how many tests per year, what type, claimant characteristics, and scores.
Who are these "people"?

The data can answer these questions only about the small, self-selected sample itself. The data cannot tell us anything about the greater population, about which the questions are far more interesting.

You seem to be interested in individual results, but not a list of all the individual results. Who knows why.
Because that is all the data are good for. If I am interested in the questions you think are important, I would look for a sample which can answer them. This list cannot answer those questions.

You seem to be interested in asking questions of a database that is not designed to answer them, but not looking for a database that can. Who knows why.
What you "suspect" doesn't really matter.
Hey, I was only agreeing that my previous comment was a "No True Scotsman". Which would you prefer, that I assume you are interested, or that I assume you are not? It doesn't matter to me. If you are interested, there are better ways of answering the questions. If you are not, there is no reason to be asking them in the first place.

T'ai Chi
4th March 2006, 08:04 AM
Who are these "people"?


Many skeptics and non-skeptics, some people who read the commentary, some people who read SI, Skeptic, Fortean Times, some people who are on science and skeptic movement bulletin boards; in short, anyone who is interested in seeing data from tests done by skeptical organizations.

You seem to be interested in them, at least when they are listed singly and sporadically.


The data can answer these questions only about the small, self-selected sample itself. The data cannot tell us anything about the greater population, about which the questions are far more interesting.


This minor point was already addressed.

Mercutio
4th March 2006, 08:45 AM
Many skeptics and non-skeptics, some people who read the commentary, some people who read SI, Skeptic, Fortean Times, some people who are on science and skeptic movement bulletin boards; in short, anyone who is interested in seeing data from tests done by skeptical organizations.
Hmmm...perhaps I missed where these people were clamoring for this data set to be mined; if you could point me to where the request is being made by anyone other than you, I will gladly concede the point. Might I suggest, though, that people who are interested in the sort of questions you ask might already realize that this data set cannot answer them, and are not asking the same question you are.

T'ai Chi
4th March 2006, 09:29 AM
If one doesn't feel statistics from tests done by skeptical organizations are interesting or important, one is entitled to such an unimaginative opinion.

But there seems to be really no other way to answer questions like "What % of tests are dowsing?" other than to see actual data, for example.

Test information and results listed singly is OK, but putting all of these single results in a table and not even combining them in a meta-analysis is taboo, according to some, for some odd reason.

CFLarsen
4th March 2006, 10:24 AM
If one doesn't feel statistics from tests done by skeptical organizations are interesting or important, one is entitled to such an unimaginative opinion.

But there seems to be really no other way to answer questions like "What % of tests are dowsing?" other than to see actual data, for example.

Test information and results listed singly is OK, but putting all of these single results in a table and not even combining them in a meta-analysis is taboo, according to some, for some odd reason.
Can you point out where the request is being made by anyone other than you or not?

Mercutio
4th March 2006, 10:37 AM
If one doesn't feel statistics from tests done by skeptical organizations are interesting or important, one is entitled to such an unimaginative opinion.
I agree. Fortunately, no one has suggested such a thing. What is suggested is that trying to make the data serve a purpose for which they were not designed is not productive.

But there seems to be really no other way to answer questions like "What % of tests are dowsing?" other than to see actual data, for example.
What purpose would the answer to that question serve? It answers only the narrowest of questions about a self-selected sample. It tells nothing about the beliefs about dowsing in the general population, nothing about belief in one's own ability to dowse in the general population...I do not understand why one would intentionally choose a biased sample to examine meaningful questions. If you agree with that, and claim only to be interested in examining the narrow, self-selected sample, I sincerely ask "why?". It is an absolutely useless question.

Test information and results listed singly is OK, but putting all of these single results in a table and not even combining them in a meta-analysis is taboo, according to some, for some odd reason.
Exactly. It is an "odd" reason; a reason which only becomes clear with an advanced understanding of statistics. This is why combining them is such a bad idea.

Without a good understanding of statistics and methods, all sorts of investigations are seemingly "reasonable". My students have to be taught, with some care, why a self-selected sample is subject to bias; why a sample of convenience may show artifacts; why a particular alpha is used. These things are not intuitively obvious.

Taking a sample that is perfectly adequate for its original purposes but unfit for meta-analysis or inferential study and trying to do more with it, is just plain irresponsible. Anyone who cares about statistics and methodology should know that. There is enough misinformation available without intentionally adding to the pile.

CFLarsen
4th March 2006, 11:02 AM
I do not understand why one would intentionally choose a biased sample to examine meaningful questions. If you agree with that, and claim only to be interested in examining the narrow, self-selected sample, I sincerely ask "why?".

Because it would produce something that would muddy the waters.

Which is all that T'ai Chi is here for.