James Randi Educational Foundation JREF Forum
Old 10th November 2012, 05:32 PM   #1
FluffyPersian
Critical Thinker
 
Join Date: Aug 2011
Posts: 258
What's the required p-value to beat?

From an earlier thread:

Originally Posted by Lenoxus View Post
3. Some folks have argued that the Challenge is unfair because the p-values and effect sizes called for are too extreme. Have any applicants claimed to have paranormal abilities with only a small effect size?
What is the required p-value? I tried searching around, but no number came up.

More importantly, are there any negotiations in which there were debates over this (even if they don't include the word "p-value")? Another poster mentioned that there were some negotiations over this with Ziborov, but I found little to that effect.

I ask because I teach a subject that involves applied statistics. I'd love to use an attempted demonstration of the supernatural as an example, because the meaning of "happened by chance" really stands out in this context.
Old 10th November 2012, 07:49 PM   #2
xterra
So far, so good...
 
Join Date: Apr 2012
Location: On the outskirts of Nowhere; the middle was too crowded
Posts: 1,290
Generally p < .01 for the preliminary challenge.
Old 10th November 2012, 09:44 PM   #3
rjh01
Gentleman of leisure
Tagger
 
Join Date: May 2005
Location: Flying around in the sky
Posts: 19,641
I think the answer is 0.001 for the first test. Though sometimes this cannot be calculated. For example if I claimed I could defy gravity by rising from the ground then that would be impossible. So if I could demonstrate it then I win.
Old 10th November 2012, 10:09 PM   #4
rjh01
Gentleman of leisure
Tagger
 
Join Date: May 2005
Location: Flying around in the sky
Posts: 19,641
Here is a challenge that did have a p-value. I leave it up to you to work out what the value was: CONNIE SONNE, Dowser

She failed.
Old 10th November 2012, 10:24 PM   #5
FluffyPersian
Critical Thinker
 
Join Date: Aug 2011
Posts: 258
Thanks, xterra and rjh. RJ, the example you gave inspired me to come up with a hybrid example.

.001 is indeed stringent. In most research, 0.05 is the significance level used for a two-tailed test, i.e. 0.025 in each tail (can't be different from...). My guess is that this is to safeguard the one million in case someone without legitimate supernatural powers (er, everyone) kept going in for repeated challenges.

Last edited by FluffyPersian; 10th November 2012 at 10:32 PM.
Old 10th November 2012, 11:40 PM   #6
Pixel42
Schrödinger's cat
 
Join Date: May 2004
Location: Wiltshire, UK
Posts: 5,908
1:1000 seems to be a general rule of thumb for the preliminary test, but because claims (and therefore test protocols) vary so wildly JREF seem to be reluctant to state that officially.

Most people assume that the success criteria would be higher for the final test, though a simple repetition of the preliminary test would produce combined odds of 1:1,000,000 which seems adequate to me. But until and unless someone passes the preliminary test, that question is obviously moot.
__________________
"The correct scientific response to anything that is not understood is always to look harder for the explanation, not give up and assume a supernatural cause". David Attenborough.
Old 11th November 2012, 12:34 AM   #7
rjh01
Gentleman of leisure
Tagger
 
Join Date: May 2005
Location: Flying around in the sky
Posts: 19,641
Originally Posted by FluffyPersian View Post
Thanks, xterra and rjh. RJ, the example you gave inspired me to come up with a hybrid example.

.001 is indeed stringent. In most research, 0.05 is the significance level used for a two-tailed test, i.e. 0.025 in each tail (can't be different from...). My guess is that this is to safeguard the one million in case someone without legitimate supernatural powers (er, everyone) kept going in for repeated challenges.
The difference is that it does not really matter if a piece of research is wrong. The research would be repeated and found to be wrong. In fact a lot of it is incorrect. That is why you have meta-analyses in research. In the MDC it does matter if the result is incorrect. JREF could lose $1m.
Old 11th November 2012, 09:00 AM   #8
FluffyPersian
Critical Thinker
 
Join Date: Aug 2011
Posts: 258
Originally Posted by rjh01 View Post
The difference is that it does not really matter if a piece of research is wrong. The research would be repeated and found to be wrong. In fact a lot of it is incorrect. That is why you have meta-analyses in research. In the MDC it does matter if the result is incorrect. JREF could lose $1m.
Yep, precisely what I said above. Given a high enough number of claimants, and enough repeated trials for individual claimants, sheer chance would allow someone to claim the prize if the p-value threshold was high enough. But I think the existing initial obstacles (the need for a recommendation letter from a professor) vastly reduce the number of preliminary trials, and a limit on the number of attempts (if it doesn't exist already) would take care of the problem altogether.

In a field like medicine, it CAN matter if a piece of research is wrong. In practice, since it often takes years for meta-analyses to appear, treatment decisions are often made on the newest research.
Old 11th November 2012, 11:22 AM   #9
xtifr
Graduate Poster
 
Join Date: Apr 2012
Location: Sol III
Posts: 1,256
As I said in the last thread, this is not intended as a lottery. Ideally, what is being aimed for is certainty. In practice, that's not always possible, but discussing the p-value certainly seems like a red flag. If you believe you have a power, then you should be confident enough not to care how small the p-value is, because you should believe you'll win handily no matter how small it is.
__________________
"Those who learn from history are doomed to watch others repeat it."
-- Anonymous Slashdot poster
"The problem with defending the purity of the English language is that English is about as pure as a cribhouse whore."
-- James Nicoll
Old 11th November 2012, 12:27 PM   #10
FluffyPersian
Critical Thinker
 
Join Date: Aug 2011
Posts: 258
Originally Posted by xtifr View Post
As I said in the last thread, this is not intended as a lottery. Ideally, what is being aimed for is certainty. If you believe you have a power, then you should be confident enough not to care how small the p-value is, because you should believe you'll win handily no matter how small it is.


I think the question of what counts as a supernatural power of prophecy is a perfectly fair one. The answer could be anything from beating chance to 100% accuracy in any given trial. Given my lack of supernatural powers, I have little stake in the answer, but I don't see why bringing it up is a red flag.

And as you've implied above, if the trial involves any element of chance (e.g. sensing what integer within a range is on a hidden sheet of paper), then it's always a lottery of sorts.

Last edited by FluffyPersian; 11th November 2012 at 12:28 PM.
Old 11th November 2012, 03:23 PM   #11
xtifr
Graduate Poster
 
Join Date: Apr 2012
Location: Sol III
Posts: 1,256
Originally Posted by FluffyPersian View Post
I think the question of what counts as a supernatural power of prophecy is a perfectly fair one. The answer could be anything from beating chance to 100% accuracy in any given trial. Given my lack of supernatural powers, I have little stake in the answer, but I don't see why bringing it up is a red flag.
I think it's a red flag because it starts off by asking what the odds are. In other words, it's treating the million dollars as a lottery instead of a prize for a successful demonstration.

The only reasonable answer is: small enough to be convincing. Otherwise you're inviting people to try to game the system.
__________________
"Those who learn from history are doomed to watch others repeat it."
-- Anonymous Slashdot poster
"The problem with defending the purity of the English language is that English is about as pure as a cribhouse whore."
-- James Nicoll
Old 11th November 2012, 04:12 PM   #12
Beerina
Sarcastic Conqueror of Notions
 
Join Date: Mar 2004
Location: A floating island above the clouds
Posts: 24,250
Originally Posted by Pixel42 View Post
1:1000 seems to be a general rule of thumb for the preliminary test, but because claims (and therefore test protocols) vary so wildly JREF seem to be reluctant to state that officially.

Most people assume that the success criteria would be higher for the final test, though a simple repetition of the preliminary test would produce combined odds of 1:1,000,000 which seems adequate to me. But until and unless someone passes the preliminary test, that question is obviously moot.
The problem with that argument is it assumes there isn't a fatal flaw in the design.

I submit the real reason for a double-layer test is so that, should someone pass the preliminary by some manner, it will allow experts to double-check where fraud could have crept in undetected and tighten their observation for the second round.

Stats is the error most scientists make in studying the paranormal. It's about magician sleight-of-hand. There are no real odds going on (and if there is something real, a million tests in a row will succeed.)
__________________
"Great innovations should not be forced [by way of] slender majorities." - Thomas Jefferson

The government should nationalize it! Socialized, single-payer video game development and sales now! More, cheaper, better games, right? Right?

Last edited by Beerina; 11th November 2012 at 04:15 PM.
Old 11th November 2012, 04:18 PM   #13
xterra
So far, so good...
 
Join Date: Apr 2012
Location: On the outskirts of Nowhere; the middle was too crowded
Posts: 1,290
Originally Posted by FluffyPersian View Post
I think the question of what counts as a supernatural power of prophecy is a perfectly fair one. The answer could be anything from beating chance to 100% accuracy in any given trial. Given my lack of supernatural powers, I have little stake in the answer, but I don't see why bringing it up is a red flag.

And as you've implied above, if the trial involves any element of chance (e.g. sensing what integer within a range is on a hidden sheet of paper), then it's always a lottery of sorts.
FluffyPersian, Take a look at my post from the thread entitled "How are MDC protocols designed and carried out?"

http://forums.randi.org/showpost.php...8&postcount=58

Post #67 is the answer to what I asked in #58; post #77 is my response to #67.

Does this help explain why people here are not concerned with p-values?

-----------

xtifr, here is the last sentence in the original post in this thread:

"I ask because I teach a subject that involves applied statistics. I'd love to use an attempted demonstration of the supernatural as an example, because the meaning of "happened by chance" really stands out in this context."

I take this to mean that FluffyPersian is not going to become a claimant, and thus he/she* does not think there is a red flag.

As usual, if I have misconstrued, misinterpreted, or misunderstood either of you, I ask for correction so we can continue the discussion.

*FluffyPersian, for clarity, please tell us which pronoun to use.

Last edited by xterra; 11th November 2012 at 04:23 PM. Reason: footnoted item
Old 11th November 2012, 07:55 PM   #14
FluffyPersian
Critical Thinker
 
Join Date: Aug 2011
Posts: 258
xterra, I'm female. Thanks for the link.
Old 12th November 2012, 01:17 PM   #15
xterra
So far, so good...
 
Join Date: Apr 2012
Location: On the outskirts of Nowhere; the middle was too crowded
Posts: 1,290
FluffyPersian,

My error. I have no idea what went awry in the link I posted previously. My post in that thread was number 58, but the link showed it incorrectly.

Try this:

http://forums.randi.org/showthread.php?t=238290

Then go to page 2, and look for my username -- the easiest way is to use the find feature on your browser. From there, follow down as indicated in my previous post.

I think this will work....
Old 14th November 2012, 05:29 PM   #16
LashL
Goddess of Legaltainment™
Administrator
 
Join Date: Aug 2006
Posts: 33,178
Originally Posted by xterra View Post
FluffyPersian,

My error. I have no idea what went awry in the link I posted previously. My post in that thread was number 58, but the link showed it incorrectly.

...
Xterra, your post is still number 58, but you might instead wish to link via the little "link" button at the bottom right of the posts you wish to cite.

58: http://forums.randi.org/showthread.p...76#post8389976

67: http://forums.randi.org/showthread.p...70#post8391270

77: http://forums.randi.org/showthread.p...67#post8392067

Hope that helps.
Old 14th November 2012, 05:57 PM   #17
xterra
So far, so good...
 
Join Date: Apr 2012
Location: On the outskirts of Nowhere; the middle was too crowded
Posts: 1,290
Thanks. I'll keep that in mind.
Old 15th November 2012, 09:23 AM   #18
Musibrique
Scholar
 
Join Date: Apr 2012
Posts: 52
Originally Posted by Pixel42 View Post
Most people assume that the success criteria would be higher for the final test, though a simple repetition of the preliminary test would produce combined odds of 1:1,000,000 which seems adequate to me. But until and unless someone passes the preliminary test, that question is obviously moot.
The probability of getting a p-value of 0.001 assuming the claim is true depends on the statistical power of the test. The power of the test depends on the sample size, the alpha level, and the effect size of the claim. Unfortunately, small effect sizes are generally hard (not impossible) to detect at the 0.001 significance level when the sample size is small. Otherwise, there's a good chance the test will detect the effect. Since I seriously doubt that claimants know how strong or weak their paranormal claims are, chances are they are being tested under inappropriate conditions.

I agree that an alpha of 0.001 is standard in the preliminary test. However, if what Pixel said is true, that a single replication of the preliminary test yields a combined p-value of 0.000001, then the claimant had better be good at whatever he claims. If they don't combine them, then I guess the claimant might as well have to beat odds of a billion to one in order to pass the formal test.
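The dependence of power on effect size can be sketched numerically. This is a minimal illustration, not the JREF's method: it assumes a hypothetical protocol of 20 two-way guesses with a pass mark of 18 hits (both numbers are invented for the example) and computes how often claimants with different true hit rates would pass.

```python
from math import comb

def tail_prob(n, k, p):
    """P(X >= k) when X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

N_TRIALS = 20        # hypothetical protocol length
REQUIRED_HITS = 18   # hypothetical pass mark
CHANCE = 0.5         # hit rate of pure guessing on a two-way choice

# False-positive rate: probability that a pure guesser passes.
alpha = tail_prob(N_TRIALS, REQUIRED_HITS, CHANCE)
print(f"guessing passes with probability {alpha:.6f}")

# Power: probability that a claimant with a given true hit rate passes.
for true_rate in (0.6, 0.75, 0.9):
    power = tail_prob(N_TRIALS, REQUIRED_HITS, true_rate)
    print(f"true hit rate {true_rate:.2f}: power = {power:.3f}")
```

With the protocol held fixed, the only thing that changes between the lines of output is the assumed true hit rate, which is the effect-size dependence described above.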
Old 15th November 2012, 09:33 AM   #19
Musibrique
Scholar
 
Join Date: Apr 2012
Posts: 52
Originally Posted by Beerina View Post
There are no real odds going on (and if there is something real, a million tests in a row will succeed.)
A million tests will succeed in a row? That is so unrealistic in practical terms, even via conventional research. So, if a study found a p-value of 0.001, what is the probability of getting five 0.001 p-values in a row? Simple! 1 × 10^-15

Not even conventional research has reached those kinds of odds.

Last edited by Musibrique; 15th November 2012 at 09:34 AM.
Old 15th November 2012, 11:02 AM   #20
GnaGnaMan
Critical Thinker
 
Join Date: Sep 2008
Posts: 399
Originally Posted by FluffyPersian View Post
Yep, precisely what I said above. Given a high enough number of claimants, and enough repeated trials for individual claimants, sheer chance would allow someone to claim the prize if the p-value threshold was high enough. But I think the existing initial obstacles (the need for a recommendation letter from a professor) vastly reduce the number of preliminary trials, and a limit on the number of attempts (if it doesn't exist already) would take care of the problem altogether.
Obviously, the question is how many trials you expect to run in total. To safeguard the million, you'd want the chance that the million is paid out to be low, even after all of them are done. And by low I mean a fraction of a percent.
If we expect a thousand trials then a million to one is the least that will do.

Given the rather large population of professional psychics (i.e. potential claimants at whom the challenge is actually aimed), expecting thousands of applicants seems reasonable.
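The arithmetic behind this estimate can be sketched as follows, assuming every test is independent and every claimant has no real ability, so that each test can only be passed by chance. The function name and the test counts are illustrative.

```python
def payout_risk(per_test_odds, n_tests):
    """Chance that at least one of n independent chance-only tests is passed."""
    return 1 - (1 - per_test_odds) ** n_tests

for odds in (1e-3, 1e-6):
    for n in (100, 1000, 10000):
        print(f"per-test odds {odds:g}, {n} tests: risk = {payout_risk(odds, n):.4%}")
```

At a per-test threshold of one in a million, a thousand tests leave roughly a 0.1% chance of an accidental payout, whereas at one in a thousand the same number of tests makes a payout more likely than not.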
__________________
"I don't think it's quite fair to condemn the whole program because of a single slip-up."
Old 15th November 2012, 12:05 PM   #21
Pixel42
Schrödinger's cat
 
Join Date: May 2004
Location: Wiltshire, UK
Posts: 5,908
Originally Posted by Musibrique View Post
the claimant better be good on whatever he claims.
If the claimant is any good at all then they will do consistently better than chance, and their ability will become more and more obvious with each test as the probability of their success being due solely to chance steadily decreases.
__________________
"The correct scientific response to anything that is not understood is always to look harder for the explanation, not give up and assume a supernatural cause". David Attenborough.
Old 15th November 2012, 01:24 PM   #22
EdG
Scholar
 
Join Date: Apr 2012
Location: Arizona, USA
Posts: 108
Originally Posted by Beerina View Post
if there is something real, a million tests in a row will succeed
Ah, so that's why every baseball player bats 1.000. And every chess grand master has never lost a game. And every astrophysicist has never made a math error. Oh, wait a minute.....
Old 15th November 2012, 01:45 PM   #23
Musibrique
Scholar
 
Join Date: Apr 2012
Posts: 52
Originally Posted by Pixel42 View Post
If the claimant is any good at all then they will do consistently better than chance, and their ability will become more and more obvious with each test as the probability of their success being due solely to chance steadily decreases.
True, but the question here is the sample-size and the power of the study. Is the sample-size/power appropriate enough for the test to detect the claimant's claim?
Old 15th November 2012, 01:50 PM   #24
Pixel42
Schrödinger's cat
 
Join Date: May 2004
Location: Wiltshire, UK
Posts: 5,908
Originally Posted by EdG View Post
Ah, so that's why every baseball player bats 1.000. And every chess grand master has never lost a game. And every astrophysicist has never made a math error. Oh, wait a minute.....
But in all those cases it would become clear very quickly that those individuals were doing considerably better than they would be expected to do if they were just swinging the bat/making moves/writing down figures at random.
__________________
"The correct scientific response to anything that is not understood is always to look harder for the explanation, not give up and assume a supernatural cause". David Attenborough.
Old 15th November 2012, 02:01 PM   #25
Pixel42
Schrödinger's cat
 
Join Date: May 2004
Location: Wiltshire, UK
Posts: 5,908
Originally Posted by Musibrique View Post
True, but the question here is the sample-size and the power of the study. Is the sample-size/power appropriate enough for the test to detect the claimant's claim?
That depends on what the claimant's claim is. Most claimants claim a considerably higher success rate than they need to achieve to reach the sort of success criteria JREF usually set. For example dowsers usually expect to be able to tell the difference between a buried barrel of water and a buried barrel of sand every time, so the 70% or 80% success rate that's actually needed should be a doddle.

What needs to be remembered is that the applicants never actually do any better than chance. It's not that they do a little bit better, but not well enough to meet the JREF success criteria - their results are always well within that which would be expected by chance alone.
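How the required success rate depends on the number of trials can be sketched as below. This assumes a two-way choice (chance hit rate 0.5), a one-sided exact binomial test, and the informal 0.001 threshold mentioned earlier in the thread. The function names are made up for the example, and this is not an official JREF protocol.

```python
from math import comb

def p_value(hits, trials, chance=0.5):
    """One-sided p-value: probability of at least `hits` successes by guessing."""
    return sum(comb(trials, i) * chance**i * (1 - chance)**(trials - i)
               for i in range(hits, trials + 1))

def hits_needed(trials, alpha=0.001, chance=0.5):
    """Smallest hit count whose p-value is below alpha, or None if unreachable."""
    for hits in range(trials + 1):
        if p_value(hits, trials, chance) < alpha:
            return hits
    return None  # too few trials for any result to reach alpha

for trials in (10, 20, 50, 100):
    k = hits_needed(trials)
    print(f"{trials} trials: need {k} hits ({k / trials:.0%})")
```

The required percentage falls as the number of trials grows, so a fixed threshold such as 0.001 translates into very different pass marks depending on how long the test is.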
__________________
"The correct scientific response to anything that is not understood is always to look harder for the explanation, not give up and assume a supernatural cause". David Attenborough.
Old 15th November 2012, 07:57 PM   #26
gnome
Penultimate Amazing
 
Join Date: Aug 2001
Posts: 10,321
Even if someone only claims a minimal success rate above chance, sufficient repetition could make achieving the required p-level not difficult at all...
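How much repetition a marginal hit rate would take can be estimated roughly. The sketch below uses the standard normal-approximation sample-size formula for a one-sided test of a proportion. The target power of 0.9 is a choice made for this illustration and does not come from the thread, and the approximation is crude when the resulting number of trials is small.

```python
from math import ceil, sqrt
from statistics import NormalDist

def trials_needed(true_rate, chance=0.5, alpha=0.001, power=0.9):
    """Approximate trials for a one-sided binomial test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # cutoff under pure guessing
    z_power = NormalDist().inv_cdf(power)       # margin needed to reach the target power
    numerator = (z_alpha * sqrt(chance * (1 - chance))
                 + z_power * sqrt(true_rate * (1 - true_rate)))
    return ceil((numerator / (true_rate - chance)) ** 2)

for rate in (0.9, 0.7, 0.55, 0.51):
    print(f"true hit rate {rate:.2f}: about {trials_needed(rate)} trials")
```

The required number of trials grows roughly with the inverse square of the gap between the true hit rate and chance, so halving the effect size about quadruples the length of the test.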
__________________

Old 16th November 2012, 07:50 AM   #27
edd
Master Poster
 
Join Date: Nov 2007
Posts: 2,110
Originally Posted by Pixel42 View Post
That depends on what the claimant's claim is. Most claimants claim a considerably higher success rate than they need to achieve to reach the sort of success criteria JREF usually set. For example dowsers usually expect to be able to tell the difference between a buried barrel of water and a buried barrel of sand every time, so the 70% or 80% success rate that's actually needed should be a doddle.

What needs to be remembered is that the applicants never actually do any better than chance. It's not that they do a little bit better, but not well enough to meet the JREF success criteria - their results are always well within that which would be expected by chance alone.
I disapprove of p-values, particularly when applied to hypothesis testing for deeply implausible situations such as the JREF tests.

A p-value is usually giving an estimate of the result occurring by chance. This isn't what we're interested in - we want to know the chance the person has paranormal abilities. A p-value of 0.001 is not useful if someone is claiming an ability that you a priori consider much less likely than that.

I'd therefore naturally argue that you want to do a Bayesian model comparison. In practice I'd be prepared to admit that sufficiently strong tests are going to reach the same conclusion whichever approach you take.

However, I think that there's also some educational value in the fact that this approach should encourage applicants to make strong claims about their ability. If a dowser thinks they can perform right 70-80% of the time they should be encouraged to go for that and be tested on that, and if they don't want to then they can broaden their claim at the expense of having to work harder to demonstrate it by needing a larger sample size.

(It's also the sort of approach that is more likely to lead you to a correct conclusion when yet another homeopath claims p < 0.01 results or something, so I think it's considerably more useful when you're at risk of seeing publishing biases)
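The point about priors can be made concrete with Bayes' rule. This is a minimal sketch, assuming a prior of one in a million (an arbitrary number picked for illustration) and, generously, that a genuine ability would always pass the test.

```python
def posterior_after_pass(prior, alpha, power=1.0):
    """P(ability is real | test passed), by Bayes' rule."""
    true_pass = prior * power            # real ability and the test is passed
    false_pass = (1 - prior) * alpha     # no ability, test passed by chance
    return true_pass / (true_pass + false_pass)

PRIOR = 1e-6  # assumed prior probability of a real ability (illustrative only)

for alpha in (1e-3, 1e-6, 1e-9):
    post = posterior_after_pass(PRIOR, alpha)
    print(f"alpha = {alpha:g}: posterior = {post:.6f}")
```

Under these assumptions, passing a single 0.001 test leaves the posterior at roughly 0.1%, while a 0.000001 threshold brings it to about 50%. A different prior shifts these numbers, which is the disagreement a Bayesian protocol would have to settle first.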
__________________
When I look up at the night sky and think about the billions of stars out there, I think to myself: I'm amazing. - Peter Serafinowicz
Old 16th November 2012, 08:58 AM   #28
Startz
Critical Thinker
 
Join Date: Nov 2004
Location: Santa Barbara, CA
Posts: 428
Originally Posted by edd View Post
I disapprove of p-values, particularly when applied to hypothesis testing for deeply implausible situations such as the JREF tests.

A p-value is usually giving an estimate of the result occurring by chance. This isn't what we're interested in - we want to know the chance the person has paranormal abilities. A p-value of 0.001 is not useful if someone is claiming an ability that you a priori consider much less likely than that.

I'd therefore naturally argue that you want to do a Bayesian model comparison. In practice I'd be prepared to admit that sufficiently strong tests are going to reach the same conclusion whichever approach you take.

snip...
While I agree in general principle, in this case a Bayesian model comparison is problematic precisely because JREF and challengers disagree on the model priors.

More to the point probably, JREF is pretty clear that this is not a scientific investigation to uncover the truth. It's

a) a chance for a challenger to prove JREF wrong (in which case a classical test is probably reasonable).

b) a publicity stunt...so the statistical stuff is just a safeguard against something going wrong accidentally.

In my one experience trying to help an applicant negotiate a protocol with JREF there was indeed an issue of a small effect size requiring a somewhat lengthy test. Basically, JREF was unwilling/unable to deal with it. This makes me suspect that item (b) is what governs. (Which I don't have a problem with.)

Last edited by Startz; 16th November 2012 at 09:22 AM.
Old 16th November 2012, 08:59 AM   #29
Musibrique
Scholar
 
Join Date: Apr 2012
Posts: 52
Originally Posted by Pixel42 View Post
That depends on what the claimant's claim is. Most claimants claim a considerably higher success rate than they need to achieve to reach the sort of success criteria JREF usually set. For example dowsers usually expect to be able to tell the difference between a buried barrel of water and a buried barrel of sand every time, so the 70% or 80% success rate that's actually needed should be a doddle.

What needs to be remembered is that the applicants never actually do any better than chance. It's not that they do a little bit better, but not well enough to meet the JREF success criteria - their results are always well within that which would be expected by chance alone.
Hmm, I can't argue with that for higher hit rates, since what you're saying seems reasonable. The only problem I have with the challenge is the marginal hit rates, since those would require a larger sample size than higher hit rates.
Old 16th November 2012, 09:08 AM   #30
Musibrique
Scholar
 
Join Date: Apr 2012
Posts: 52
Originally Posted by edd View Post
A p-value is usually giving an estimate of the result occurring by chance. This isn't what we're interested in - we want to know the chance the person has paranormal abilities. A p-value of 0.001 is not useful if someone is claiming an ability that you a priori consider much less likely than that.
P-values are actually quite useful. The p-value basically tells you how likely it is to get an observation as extreme or more extreme if the null-hypothesis is true. The p-value basically measures the evidence for the null-hypothesis. If the p-value is greater than the standard 0.05, then it can't be argued that the null-hypothesis should be rejected. If, on the other hand, it is less than 0.05, then it can be said that the null should be rejected. Keep in mind that the p-value tells you the probability of the result occurring by chance, not the alternative hypothesis. If P=0.05, then there is a 0.95 chance that the alternative is correct.

Quote:
I'd therefore naturally argue that you want to do a Bayesian model comparison. In practice I'd be prepared to admit that sufficiently strong tests are going to reach the same conclusion whichever approach you take.
Bayesian Statistics is generally quite controversial in the statistical community. Stick with point estimates and confidence intervals.

Quote:
However, I think that there's also some educational value in the fact that this approach should encourage applicants to make strong claims about their ability. If a dowser thinks they can perform right 70-80% of the time they should be encouraged to go for that and be tested on that, and if they don't want to then they can broaden their claim at the expense of having to work harder to demonstrate it by needing a larger sample size.
Agree.

Quote:
(It's also the sort of approach that is more likely to lead you to a correct conclusion when yet another homeopath claims p < 0.01 results or something, so I think it's considerably more useful when you're at risk of seeing publishing biases)
Publication bias is one thing; multiple analyses are another.

Last edited by Musibrique; 16th November 2012 at 09:11 AM.
Old 16th November 2012, 09:26 AM   #31
Startz
Critical Thinker
 
Join Date: Nov 2004
Location: Santa Barbara, CA
Posts: 428
Originally Posted by Musibrique View Post
P-values are actually quite useful. The p-value basically tells you how likely you are to get an observation as extreme or more extreme if the null hypothesis is true.

... If P=0.05, then there is a 0.95 chance that the alternative is correct.
That last quoted sentence is not what a P-value means, which I imagine is why edd was suggesting a Bayesian analysis.
Old 16th November 2012, 09:30 AM   #32
Musibrique
Scholar
 
Join Date: Apr 2012
Posts: 52
Originally Posted by Startz View Post
That last quoted sentence is not what a P-value means, which I imagine is why edd was suggesting a Bayesian analysis.
Why not? Aren't p-values and confidence intervals connected? P=0.05, hence you can be 95% confident that the observed result is due to the alternative hypothesis whereas there's a 5% chance that the observed result is a Type I Error.

Also, I don't agree with his Bayesian approach. Bayesian Statistics is quite controversial and problematic in the statistical community. That's why I said stick with point estimates and confidence intervals.

Last edited by Musibrique; 16th November 2012 at 09:33 AM.
Old 16th November 2012, 09:56 AM   #33
Startz
Critical Thinker
 
Join Date: Nov 2004
Location: Santa Barbara, CA
Posts: 428
Originally Posted by Musibrique View Post
Why not? Aren't p-values and confidence intervals connected? P=0.05, hence you can be 95% confident that the observed result is due to the alternative hypothesis whereas there's a 5% chance that the observed result is a Type I Error.

Also, I don't agree with his Bayesian approach. Bayesian Statistics is quite controversial and problematic in the statistical community. That's why I said stick with point estimates and confidence intervals.
Sure, p-values and confidence intervals are connected. The right statement is that 95% of the time the confidence interval includes the true value of the parameter. The confidence interval is not a posterior distribution for the true value, although it may be approximately so... if you're a Bayesian.

Loosely speaking, the problem is that Bayes' law (nothing to do with being a Bayesian) requires paying attention to Type II error as well as Type I error.
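A minimal sketch of that Bayes' law point, under stated assumptions: the test rejects the null with probability `alpha` when the null is true (Type I error) and with probability `power` (one minus the Type II error) when the alternative is true. The prior values and the helper name `posterior_given_rejection` are hypothetical and chosen only for illustration. Note that this conditions on the event "the test rejected", not on observing a p-value of exactly 0.05.

```python
def posterior_given_rejection(prior, alpha, power):
    """P(alternative is true | test rejected the null), by Bayes' law.

    prior: assumed prior probability that the alternative is true
    alpha: Type I error rate, P(reject | null true)
    power: 1 - Type II error rate, P(reject | alternative true)
    """
    true_positive = power * prior
    false_positive = alpha * (1 - prior)
    return true_positive / (true_positive + false_positive)


# Same test (alpha = 0.05, power = 0.8), two very different assumed priors.
for prior in (0.5, 1e-6):
    post = posterior_given_rejection(prior, alpha=0.05, power=0.8)
    print(f"prior {prior:g}: posterior after a significant result = {post:.6f}")
```

With an even-odds prior the posterior is high, but with a one-in-a-million prior a significant result at the 0.05 level still leaves the alternative very unlikely, so the answer is not fixed at 0.95.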

And my reading is that Bayesian statistics is much less controversial than it once was, although there remain skeptics on both sides.

[Note to mods: I assume if this drifts too far you'll move it.]
Old 16th November 2012, 10:20 AM   #34
edd
Master Poster
 
Join Date: Nov 2007
Posts: 2,110
Originally Posted by Startz View Post
While I agree in general principle, in this case a Bayesian model comparison is problematic precisely because JREF and challengers disagree on the model priors.
Absolutely agree (and with the stuff I've trimmed).
__________________
When I look up at the night sky and think about the billions of stars out there, I think to myself: I'm amazing. - Peter Serafinowicz
Old 16th November 2012, 01:24 PM   #35
EdG
Scholar
 
Join Date: Apr 2012
Location: Arizona, USA
Posts: 108
Originally Posted by Pixel42 View Post
But in all those cases it would become clear very quickly that those individuals were doing considerably better than they would be expected to do if they were just swinging the bat/making moves/writing down figures at random.
I don't understand what your comment has to do with mine. I was responding to Beerina's claim that a million tests in a row need to succeed. Why should psychic abilities require 100% accuracy? If they exist, they likely operate the same way other human abilities do, subject to constraints, good days/bad days, and external stressors. The very best batters only hit about 10% of the pitches thrown their way. Why does Beerina think psychics could successfully perform a million tests in a row when no other human endeavor can?
Old 17th November 2012, 01:27 AM   #36
GzuzKryzt
Philosopher
 
Join Date: Aug 2005
Posts: 6,367
Originally Posted by EdG View Post
I don't understand what your comment has to do with mine. I was responding to Beerina's claim that a million tests in a row need to succeed. Why should psychic abilities require 100% accuracy? If they exist, they likely operate the same way other human abilities do, subject to constraints, good days/bad days, and external stressors. The very best batters only hit about 10% of the pitches thrown their way. Why does Beerina think psychics could successfully perform a million tests in a row when no other human endeavor can?
First, I think Beerina was speaking metaphorically.

Second, without sufficient data I would try to refrain from speculating about what psychic abilities - should they exist - can and cannot do, how they are influenced, etc.

Third, picking baseball hitters is a clever ploy because in baseball success for a hitter is (roughly) defined by a .300 batting average. One could as easily have chosen baseball pitchers, even better relievers, and seen the success rate jump significantly. But that would have weakened one's argument, would it not?

Conclusion: What people like Beerina, Pixel42 and myself are trying to convey is that, e.g., a spoonbender sitting in a comfortable kitchen should have a blow-us-all-away success rate, easily clarifying that something "paranormal" or "supernatural" is going on.
Under controlled conditions absolutely eliminating manipulation from both sides, this success rate would be one in a million.

Furthermore, that would be a noodle-scratcher for both sides, would it not?
Old 17th November 2012, 01:28 AM   #37
Pixel42
Schrödinger's cat
 
Join Date: May 2004
Location: Wiltshire, UK
Posts: 5,908
Originally Posted by EdG View Post
I don't understand what your comment has to do with mine.
I was just pointing out that even if we concede your point that we shouldn't expect these abilities to be any more consistent than those of talented batsmen, chess players etc, we would still expect that they would (as with such abilities) produce results that are significantly better than random chance. And they don't.
__________________
"The correct scientific response to anything that is not understood is always to look harder for the explanation, not give up and assume a supernatural cause". David Attenborough.
Old 17th November 2012, 07:44 AM   #38
Musibrique
Scholar
 
Join Date: Apr 2012
Posts: 52
Originally Posted by Pixel42 View Post
I was just pointing out that even if we concede your point that we shouldn't expect these abilities to be any more consistent than those of talented batsmen, chess players etc, we would still expect that they would (as with such abilities) produce results that are significantly better than random chance. And they don't.
That's why in statistics we calculate the Type I error probability before doing a one- or two-tailed t-test. Since the Type I error rate for the preliminary test is 0.001, we would expect on average one in a thousand applicants to pass by dumb luck. If the rate of significant results were significantly better than that one-in-a-thousand rate, we could regard those results as evidence for the paranormal. This can be determined by calculating the p-value for the number of significant studies out of all studies.

Unless the JREF decided to combine the p-values, the overall Type I error probability of the claimant passing both tests is one in a billion.
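The arithmetic behind those figures can be sketched as follows. The 0.001 preliminary rate is the figure quoted above; the pool of 1,000 applicants and the one-in-a-million final-test rate are assumptions (the latter is only what the "one in a billion" total implies if the two stages are independent), and the function names are hypothetical.

```python
ALPHA_PRELIMINARY = 1e-3   # Type I error rate quoted for the preliminary test
ALPHA_FINAL = 1e-6         # assumption: implied by the one-in-a-billion total


def expected_lucky_passes(applicants, alpha):
    """Expected number of applicants who pass purely by chance."""
    return applicants * alpha


def chance_of_any_lucky_pass(applicants, alpha):
    """Probability that at least one of the independent applicants passes by chance."""
    return 1 - (1 - alpha) ** applicants


def chance_of_passing_both(alpha_preliminary, alpha_final):
    """Overall Type I error rate for two independent stages."""
    return alpha_preliminary * alpha_final


applicants = 1000  # assumed pool size, for illustration only
print("expected lucky passes:", expected_lucky_passes(applicants, ALPHA_PRELIMINARY))
print("chance of at least one:", chance_of_any_lucky_pass(applicants, ALPHA_PRELIMINARY))
print("chance of passing both stages:", chance_of_passing_both(ALPHA_PRELIMINARY, ALPHA_FINAL))
```

So an expectation of one lucky pass per thousand applicants does not mean a lucky pass is guaranteed among any given thousand.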

Expecting exact or near-100% replication is ridiculous and extremely conservative. Telling a psychic to pass 100 tests in a row is like telling a famous basketball player, Brian, to never miss a basket.

Last edited by Musibrique; 17th November 2012 at 07:47 AM.
Old 17th November 2012, 10:47 AM   #39
GzuzKryzt
Philosopher
 
Join Date: Aug 2005
Posts: 6,367
Originally Posted by Musibrique View Post
...
Telling a psychic to pass 100 tests in a row is like telling a famous basketball player, Brian, to never miss a basket.
Where and when has a "psychic" been asked to pass 100 tests in a row?
Old 17th November 2012, 11:36 AM   #40
EdG
Scholar
 
Join Date: Apr 2012
Location: Arizona, USA
Posts: 108
Originally Posted by William Smith View Post
Second, without sufficient data I would try to refrain from speculating about what psychic abilities - should they exist - can and cannot do, how they are influenced, etc.
Why? Any human ability should fall within normal parameters compared to other human abilities. Anyone can play piano after a few lessons, but only some people will reach virtuoso level after many years of study and practice.

Originally Posted by William Smith View Post
Third, picking baseball hitters is a clever ploy because in baseball success for a hitter is (roughly) defined by a .300 batting average. One could as easily have chosen baseball pitchers, even better relievers, and seen the success rate jump significantly. But that would have weakened one's argument, would it not?
An exceptional baseball pitcher may be defined as one who pitches a no-hitter game. There have only been 236 no-hitters in the past 111 years, so the success rate does not exactly jump significantly.

And no, despite your bizarre claim about my presumed motive, choosing pitchers or any other skilled human would not weaken my argument. Education and practice are the keys to acquiring skill in any field. If psychic skills exist, why should they be any different? Because you say so?
All times are GMT -7. The time now is 12:49 AM.
Powered by vBulletin. Copyright ©2000 - 2014, Jelsoft Enterprises Ltd.
© 2001-2013, James Randi Educational Foundation. All Rights Reserved.

Disclaimer: Messages posted in the Forum are solely the opinion of their authors.