JREF Forum
#1 |
|
Thinker
Join Date: Aug 2011
Posts: 180
|
What's the required p-value to beat?
From an earlier thread:
What is the required p-value? I tried searching around, but no number came up. More importantly, were there any negotiations in which this was debated (even if they didn't use the word "p-value")? Another poster mentioned that there were some negotiations over this with Ziborov, but I found little to that effect. I ask because I teach a subject that involves applied statistics. I'd love to use an attempted demonstration of the supernatural as an example, because the meaning of "happened by chance" really stands out in this context. |
|
|
|
|
#2 |
|
Muse
Join Date: Apr 2012
Location: On the outskirts of Nowhere; the middle was too crowded
Posts: 625
|
Generally p < .01 for the preliminary challenge.
|
|
|
|
|
#3 |
|
Gentleman of leisure
Tagger
Join Date: May 2005
Location: Planet Earth
Posts: 17,181
|
I think the answer is 0.001 for the first test, though sometimes it cannot be calculated. For example, if I claimed I could defy gravity by rising from the ground, that would simply be impossible, so if I could demonstrate it, I would win.
|
|
__________________
dddffffpppqqqq Want to use your computer for something that will make society better? See this thread for details Folding@home |
|
|
|
|
|
#4 |
|
Gentleman of leisure
Tagger
Join Date: May 2005
Location: Planet Earth
Posts: 17,181
|
Here is a challenge that did have a p-value. I leave it up to you to work out what the value was: CONNIE SONNE, Dowser
She failed. |
|
|
|
|
|
|
#5 |
|
Thinker
Join Date: Aug 2011
Posts: 180
|
Thanks, xterra and rjh. RJ, the example you gave inspired me to come up with a hybrid example.
A threshold of .001 is indeed a high bar. In most research, the conventional threshold for a two-tailed test (can't be different from...) is 0.05, i.e. 0.025 in each tail. My guess is that the stricter level is there to safeguard the million in case someone without legitimate supernatural powers (er, everyone) kept coming back for repeated challenges. |
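A minimal sketch of the repeated-challenge worry, assuming every attempt is independent and that someone with no ability passes any single attempt with probability exactly alpha (the function name is illustrative, not anything JREF uses):

```python
def prob_any_lucky_pass(alpha: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent tries is passed by luck,
    when each try is passed by luck with probability `alpha`."""
    # P(at least one pass) = 1 - P(every attempt fails)
    return 1.0 - (1.0 - alpha) ** attempts


for alpha in (0.05, 0.01, 0.001):
    for attempts in (10, 100, 1000):
        risk = prob_any_lucky_pass(alpha, attempts)
        print(f"alpha={alpha:<6} attempts={attempts:<5} P(lucky pass)={risk:.3f}")
```

Under these assumptions, even at alpha = 0.001 a thousand attempts give roughly a 63% chance that somebody passes by luck alone.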
|
|
|
|
#6 |
|
Schrödinger's cat
Join Date: May 2004
Location: Wiltshire, UK
Posts: 4,237
|
1:1000 seems to be a general rule of thumb for the preliminary test, but because claims (and therefore test protocols) vary so wildly JREF seem to be reluctant to state that officially.
Most people assume that the success criteria would be higher for the final test, though a simple repetition of the preliminary test would produce combined odds of 1:1,000,000 which seems adequate to me. But until and unless someone passes the preliminary test, that question is obviously moot. |
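A minimal sketch of the combined-odds arithmetic, assuming the preliminary and final tests are independent and each is passed by chance with probability 0.001 (the helper name is illustrative):

```python
import math


def combined_chance(*alphas: float) -> float:
    """Chance that someone with no ability passes every stage in sequence.

    Assumes the stages are independent, so the per-stage chances multiply.
    """
    return math.prod(alphas)


p = combined_chance(0.001, 0.001)  # preliminary, then a repeat of the same test
print(f"combined chance: {p:.0e} (about 1 in {round(1 / p):,})")
```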
|
__________________
"The correct scientific response to anything that is not understood is always to look harder for the explanation, not give up and assume a supernatural cause". David Attenborough. |
|
|
|
|
|
#7 |
|
Gentleman of leisure
Tagger
Join Date: May 2005
Location: Planet Earth
Posts: 17,181
|
The difference is that it does not really matter if a piece of research is wrong: the research would be repeated and found to be wrong. In fact, a lot of it is incorrect. That is why you have meta-analyses in research. In the MDC it does matter if the result is incorrect: JREF could lose $1m.
|
|
|
|
|
|
|
#8 |
|
Thinker
Join Date: Aug 2011
Posts: 180
|
Yep, precisely what I said above. Given a high enough number of claimants, and enough repeated trials for individual claimants, sheer chance would allow someone to claim the prize if the p-value threshold were high enough. But I think the existing initial obstacles (the need for a recommendation letter from a professor) vastly reduce the number of preliminary trials, and a limit on the number of attempts (if one doesn't exist already) would take care of the problem altogether.
In a field like medicine, it CAN matter if a piece of research is wrong. In practice, since it often takes years for meta-analyses to appear, treatment decisions are often made on the newest research. |
|
|
|
|
#9 |
|
Muse
Join Date: Apr 2012
Location: Sol III
Posts: 563
|
As I said in the last thread, this is not intended as a lottery. Ideally, what is being aimed for is certainty. In practice, that's not always possible, but discussing the p-value certainly seems like a red flag. If you believe you have a power, then you should be confident enough not to care how small the p-value is, because you should believe you'll win handily no matter how small it is.
|
|
__________________
"Those who learn from history are doomed to watch others repeat it." -- Anonymous Slashdot poster "The problem with defending the purity of the English language is that English is about as pure as a cribhouse whore." -- James Nicoll |
|
|
|
|
|
#10 |
|
Thinker
Join Date: Aug 2011
Posts: 180
|
I think the question of what counts as a supernatural power of prophecy is a perfectly fair one. The answer could be anything from beating chance to 100% accuracy in any given trial. Given my lack of supernatural powers, I have little stake in the answer, but I don't see why bringing it up is a red flag. And as you've implied above, if the trial involves any element of chance (e.g. sensing what integer within a range is on a hidden sheet of paper), then it's always a lottery of sorts. |
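For the teaching example, here is a sketch of how "happened by chance" turns into a number for the hidden-integer task. It assumes, purely for illustration, 20 independent trials, a target drawn uniformly from 1 to 10 (so the chance hit rate is 0.1), and a one-sided exact binomial test; the function names are hypothetical.

```python
from math import comb


def upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the one-sided p-value for
    scoring k or more hits in n trials when each guess hits with chance p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_hits_needed(n: int, p: float, alpha: float) -> int:
    """Smallest number of hits whose one-sided p-value is at most alpha."""
    for k in range(n + 1):
        if upper_tail(k, n, p) <= alpha:
            return k
    return n + 1  # no achievable score is significant with this many trials


N_TRIALS, CHANCE = 20, 1 / 10  # illustrative protocol, not a real JREF one
print("p-value for 6 hits:", round(upper_tail(6, N_TRIALS, CHANCE), 4))
print("hits needed for p <= 0.001:", min_hits_needed(N_TRIALS, CHANCE, 0.001))
```

Six correct out of 20 sounds impressive against a chance expectation of 2, yet its p-value of about 0.011 would not meet a 0.001 criterion; under these assumptions the claimant would need 8 hits.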
|
|
|
|
#11 |
|
Muse
Join Date: Apr 2012
Location: Sol III
Posts: 563
|
I think it's a red flag because it starts off by asking what the odds are. In other words, it's treating the million dollars as a lottery instead of a prize for a successful demonstration.
The only reasonable answer is: small enough to be convincing. Otherwise you're inviting people to try to game the system. |
|
|
|
|
|
|
#12 |
|
Sarcastic Conqueror of Notions
Join Date: Mar 2004
Location: A floating island above the clouds
Posts: 23,835
|
The problem with that argument is that it assumes there isn't a fatal flaw in the design.
I submit the real reason for a two-stage test is that, should someone pass the preliminary by some means, it allows experts to double-check where fraud could have crept in undetected and to tighten their observation for the second round. Focusing on stats is the error most scientists make in studying the paranormal; it's really about magicians' sleight of hand. There are no real odds going on (and if there is something real, a million tests in a row will succeed). |
|
__________________
"Great innovations should not be forced [by way of] slender majorities." - Thomas Jefferson The government should nationalize it! Socialized, single-payer video game development and sales now! More, cheaper, better games, right? Right? |
|
|
|
|
|
#13 |
|
Muse
Join Date: Apr 2012
Location: On the outskirts of Nowhere; the middle was too crowded
Posts: 625
|
FluffyPersian, take a look at my post in the thread entitled "How are MDC protocols designed and carried out?"
http://forums.randi.org/showpost.php...8&postcount=58
Post #67 is the answer to what I asked in #58; post #77 is my response to #67. Does this help explain why people here are not concerned with p-values?
-----------
xtifr, here are the last sentences of the original post in this thread: "I ask because I teach a subject that involves applied statistics. I'd love to use an attempted demonstration of the supernatural as an example, because the meaning of "happened by chance" really stands out in this context." I take this to mean that FluffyPersian is not going to become a claimant, and thus he/she* does not think there is a red flag.
As usual, if I have misconstrued, misinterpreted, or misunderstood either of you, I ask for correction so we can continue the discussion.
*FluffyPersian, for clarity, please tell us which pronoun to use. |
|
|
|
|
#14 |
|
Thinker
Join Date: Aug 2011
Posts: 180
|
xterra, I'm female. Thanks for the link.
|
|
|
|
|
#15 |
|
Muse
Join Date: Apr 2012
Location: On the outskirts of Nowhere; the middle was too crowded
Posts: 625
|
FluffyPersian,
My error. I have no idea what went awry in the link I posted previously. My post in that thread was number 58, but the link showed it incorrectly. Try this: http://forums.randi.org/showthread.php?t=238290 Then go to page 2, and look for my username -- the easiest way is to use the find feature on your browser. From there, follow down as indicated in my previous post. I think this will work.... |
|
|
|
|
#16 |
|
Goddess of Legaltainment™
Administrator
Join Date: Aug 2006
Posts: 26,221
|
Xterra, your post is still number 58, but you might instead wish to link via the little "link" button at the bottom right of the posts you wish to cite.
58: http://forums.randi.org/showthread.p...76#post8389976
67: http://forums.randi.org/showthread.p...70#post8391270
77: http://forums.randi.org/showthread.p...67#post8392067
Hope that helps. |
|
|
|
|
#17 |
|
Muse
Join Date: Apr 2012
Location: On the outskirts of Nowhere; the middle was too crowded
Posts: 625
|
Thanks. I'll keep that in mind.
|
|
|
|
|
#18 |
|
Scholar
Join Date: Apr 2012
Posts: 52
|
The probability of getting a p-value of 0.001 or less, assuming the claim is true, depends on the statistical power of the test. The power of the test depends on the sample size, the alpha level, and the effect size of the claim. Unfortunately, small effect sizes are generally hard (though not impossible) to detect at the 0.001 significance level when the sample size is small; with a larger sample or a larger effect, there's a good chance the test will detect it. Since I seriously doubt that claimants know how strong or weak their paranormal claims are, chances are they are being tested under inappropriate conditions.
I agree that an alpha of 0.001 is standard for the preliminary test. However, if Pixel is right that a single replication of the preliminary gives combined odds of 0.000001, then the claimant had better be good at whatever he claims. If they don't combine them, then I guess the claimant must also beat odds of a billion to one in order to pass the formal test. |
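A sketch of the power calculation described above, assuming independent trials on a two-choice task (chance rate 0.5), a one-sided exact binomial test at alpha = 0.001, and illustrative hit rates and trial counts; all names are hypothetical.

```python
from math import comb


def upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_hits(n: int, p0: float, alpha: float) -> int:
    """Smallest hit count whose one-sided p-value under chance (rate p0) is <= alpha.
    Returns n + 1 when even a perfect score is not significant."""
    k = n + 1
    while k > 0 and upper_tail(k - 1, n, p0) <= alpha:
        k -= 1
    return k


def power(n: int, p0: float, p_true: float, alpha: float) -> float:
    """Chance that a claimant whose real hit rate is p_true reaches the pass mark."""
    return upper_tail(critical_hits(n, p0, alpha), n, p_true)


ALPHA, P0 = 0.001, 0.5  # illustrative: a two-choice task judged at alpha = 0.001
for n in (20, 50, 200):
    row = ", ".join(f"p_true={pt}: {power(n, P0, pt, ALPHA):.2f}" for pt in (0.55, 0.6, 0.7, 0.9))
    print(f"n={n:>3}  pass mark={critical_hits(n, P0, ALPHA)}  {row}")
```

Under these assumptions, a claimant who is genuinely right 60% of the time would usually fail a 50-trial test, while one who is right 90% of the time would almost always pass.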
|
|
|
|
#19 |
|
Scholar
Join Date: Apr 2012
Posts: 52
|
A million tests will succeed in a row? That is so unrealistic in practical terms, even in conventional research. If a study found a p-value of 0.001, what is the probability of getting five 0.001 p-values in a row? Simple: (0.001)^5 = 1 × 10^-15.
Not even conventional research has reached those kinds of odds. |
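Spelled out, assuming the five results are independent and each sits exactly at the 0.001 level:

$$
(0.001)^5 = \left(10^{-3}\right)^5 = 10^{-15}
$$

that is, odds of one in a quadrillion.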
|
|
|
|
#20 |
|
Thinker
Join Date: Sep 2008
Posts: 192
|
Obviously, the question is how many trials you expect to run in total. To safeguard the million, you'd want the chance that the million is paid out to be low even after all of them are done, and by low I mean a fraction of a percent.
If we expect a thousand trials, then a million to one is the least that will do. Given the rather large population of professional psychics (i.e. the potential claimants at whom the challenge is actually aimed), expecting thousands of applicants seems reasonable. |
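A sketch of that budget, assuming independent trials and treating the overall payout risk as a target to solve for (the function name is illustrative): inverting 1 - (1 - alpha)^N = risk gives the per-trial alpha.

```python
import math


def required_alpha(trials: int, total_risk: float) -> float:
    """Per-trial pass probability (for someone with no ability) that keeps the chance
    of ANY payout across `trials` independent trials at `total_risk`.

    Solves 1 - (1 - alpha) ** trials == total_risk for alpha; log1p/expm1 keep
    the result accurate when alpha is tiny.
    """
    return -math.expm1(math.log1p(-total_risk) / trials)


for trials in (100, 1000, 10_000):
    a = required_alpha(trials, 0.001)  # keep the overall risk at 0.1%
    print(f"{trials:>6} trials -> per-trial alpha {a:.2e} (about 1 in {round(1 / a):,})")
```

With 1,000 trials and a 0.1% overall risk this lands at roughly one in a million per trial; the simpler Bonferroni-style rule alpha = risk / N gives almost the same number.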
|
__________________
"Any sufficiently analyzed magic is indistinguishable from science!" |
|
|
|
|
|
#21 |
|
Schrödinger's cat
Join Date: May 2004
Location: Wiltshire, UK
Posts: 4,237
|
|
|
|
|
|
|
|
#22 |
|
Scholar
Join Date: Apr 2012
Location: Arizona, USA
Posts: 108
|
|
|
|
|
|
#23 |
|
Scholar
Join Date: Apr 2012
Posts: 52
|
|
|
|
|
|
#24 |
|
Schrödinger's cat
Join Date: May 2004
Location: Wiltshire, UK
Posts: 4,237
|
|
|
|
|
|
|
|
#25 |
|
Schrödinger's cat
Join Date: May 2004
Location: Wiltshire, UK
Posts: 4,237
|
That depends on what the claimant's claim is. Most claimants claim a considerably higher success rate than they need to achieve to reach the sort of success criteria JREF usually set. For example dowsers usually expect to be able to tell the difference between a buried barrel of water and a buried barrel of sand every time, so the 70% or 80% success rate that's actually needed should be a doddle.
What needs to be remembered is that the applicants never actually do any better than chance. It's not that they do a little bit better, but not well enough to meet the JREF success criteria - their results are always well within that which would be expected by chance alone. |
|
|
|
|
|
|
#26 |
|
Philosopher
Join Date: Aug 2001
Posts: 9,869
|
Even if someone only claims a minimal success rate above chance, sufficient repetition could make achieving the required p-level not difficult at all...
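To put a rough number on "sufficient repetition", here is a sketch using the standard normal approximation to the binomial (not an exact calculation), assuming a two-choice task with chance rate 0.5, a one-sided test at alpha = 0.001, and a target power of 90%; the function name and claimed rates are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def trials_needed(p0: float, p1: float, alpha: float, power: float) -> int:
    """Approximate number of trials for a one-sided test of hit rate p0 (chance)
    against p1 (claimed) to reach the given power at level alpha.

    Normal approximation to the binomial; an exact calculation would shift the
    answer slightly.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    numerator = z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1))
    return ceil((numerator / (p1 - p0)) ** 2)


for claimed in (0.55, 0.6, 0.7, 0.9):
    print(f"claimed hit rate {claimed}: ~{trials_needed(0.5, claimed, 0.001, 0.9)} trials")
```

Under these assumptions, a claim of 55% on a two-choice task needs on the order of two thousand trials, while a claim of 90% needs only a couple of dozen.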
|
|
|
|
|
|
|
|
#27 |
|
Graduate Poster
Join Date: Nov 2007
Posts: 1,554
|
I disapprove of p-values, particularly when applied to hypothesis testing in deeply implausible situations such as the JREF tests.
A p-value usually gives an estimate of the probability of the result occurring by chance. This isn't what we're interested in; we want to know the chance that the person has paranormal abilities. A p-value of 0.001 is not useful if someone is claiming an ability that you a priori consider much less likely than that. I'd therefore naturally argue that you want to do a Bayesian model comparison. In practice, I'd be prepared to admit that sufficiently strong tests are going to reach the same conclusion whichever approach you take.
However, I think there's also some educational value in the fact that this approach should encourage applicants to make strong claims about their ability. If a dowser thinks they can perform correctly 70-80% of the time, they should be encouraged to go for that and be tested on that; if they don't want to, they can broaden their claim at the expense of having to work harder to demonstrate it, by needing a larger sample size. (It's also the sort of approach that is more likely to lead you to a correct conclusion when yet another homeopath claims p < 0.01 results or something, so I think it's considerably more useful when you're at risk of seeing publication bias.) |
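A minimal sketch of the model comparison described in this post, assuming two point hypotheses (the dowser's claimed hit rate versus chance), independent trials on a two-choice task, and prior probabilities chosen purely for illustration; the names are hypothetical. It also shows how much the prior matters: the same Bayes factor can leave a skeptic and a believer far apart.

```python
from math import exp, log


def bayes_factor(hits: int, n: int, p_claim: float, p_chance: float) -> float:
    """Likelihood ratio P(data | claimed rate) / P(data | chance) for n binary trials.

    Both hypotheses are single (point) hit rates, so the binomial coefficient
    cancels; computed in log space to avoid underflow for long tests.
    """
    misses = n - hits
    log_bf = hits * log(p_claim / p_chance) + misses * log((1 - p_claim) / (1 - p_chance))
    return exp(log_bf)


def posterior_probability(prior_prob: float, bf: float) -> float:
    """Posterior probability of the claim: posterior odds = prior odds * Bayes factor."""
    post_odds = prior_prob / (1 - prior_prob) * bf
    return post_odds / (1 + post_odds)


# Hypothetical dowser who claims 80% on a two-choice task (chance = 50%), 20 trials.
for hits in (10, 16, 20):
    bf = bayes_factor(hits, 20, 0.8, 0.5)
    for prior in (1e-9, 0.5):  # a skeptic's prior vs. a believer's prior
        print(f"{hits}/20 hits  BF={bf:10.3g}  prior={prior:g}  "
              f"posterior={posterior_probability(prior, bf):.3g}")
```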
|
__________________
When I look up at the night sky and think about the billions of stars out there, I think to myself: I'm amazing. - Peter Serafinowicz |
|
|
|
|
|
#28 |
|
Critical Thinker
Join Date: Nov 2004
Location: Santa Barbara, CA
Posts: 388
|
While I agree in general principle, in this case a Bayesian model comparison is problematic precisely because JREF and challengers disagree on the model priors.
More to the point, probably, JREF is pretty clear that this is not a scientific investigation to uncover the truth. It's (a) a chance for a challenger to prove JREF wrong (in which case a classical test is probably reasonable) and (b) a publicity stunt, so the statistical stuff is just a safeguard against something going wrong accidentally. In my one experience trying to help an applicant negotiate a protocol with JREF, there was indeed an issue of a small effect size requiring a somewhat lengthy test. Basically, JREF was unwilling or unable to deal with it. This makes me suspect that item (b) is what governs. (Which I don't have a problem with.) |
|
|
|
|
#29 |
|
Scholar
Join Date: Apr 2012
Posts: 52
|
|
|
|
|
|
#30 |
|
Scholar
Join Date: Apr 2012
Posts: 52
|
P-values are actually quite useful. The p-value basically tells you how likely it is to get an observation as extreme as, or more extreme than, the one observed if the null hypothesis is true. The p-value basically measures the evidence for the null hypothesis. If the p-value is greater than the standard 0.05, then it can't be argued that the null hypothesis should be rejected. If, on the other hand, it is less than 0.05, then it can be said that the null should be rejected. Keep in mind that the p-value tells you the probability of the result occurring by chance, not the probability of the alternative hypothesis. If P=0.05, then there is a 0.95 chance that the alternative is correct.
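A quick simulation of the definition given at the start of this post, assuming a two-sided z-test on normally distributed data with known unit variance (sample size, simulation count and seed are illustrative): when the null hypothesis is true, studies reach p <= 0.05 about 5% of the time. That rate describes how the test behaves when the null is true; on its own it does not say how likely the alternative is.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def two_sided_p(sample: list[float]) -> float:
    """Two-sided z-test p-value for H0: mean = 0, assuming known SD = 1."""
    z = fmean(sample) * sqrt(len(sample))
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def null_rejection_rate(alpha: float, n: int = 30, sims: int = 20_000, seed: int = 1) -> float:
    """Fraction of simulated studies, with H0 true, whose p-value is <= alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # H0 is true by construction
        if two_sided_p(sample) <= alpha:
            hits += 1
    return hits / sims


print("share of null studies with p <= 0.05:", null_rejection_rate(0.05))
```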
|
|
|
|
|
#31 |
|
Critical Thinker
Join Date: Nov 2004
Location: Santa Barbara, CA
Posts: 388
|
|
|
|
|
|
#32 |
|
Scholar
Join Date: Apr 2012
Posts: 52
|
Why not? Aren't p-values and confidence intervals connected? P=0.05, hence you can be 95% confident that the observed result is due to the alternative hypothesis whereas there's a 5% chance that the observed result is a Type I Error.
Also, I don't agree with his Bayesian approach. Bayesian Statistics is quite controversial and problematic in the statistical community. That's why I said stick with point estimates and confidence intervals. |
|
|
|
|
#33 |
|
Critical Thinker
Join Date: Nov 2004
Location: Santa Barbara, CA
Posts: 388
|
Sure, p-values and confidence intervals are connected. The right statement is that 95% of the time the confidence interval includes the true value of the parameter. The confidence interval is not a posterior distribution for the true value, although it may be approximately so... if you're a Bayesian.
Loosely speaking, the problem is that Bayes' law (nothing to do with being a Bayesian) requires paying attention to Type II error as well as Type I error. And my reading is that Bayesian statistics is much less controversial than it once was, although there remain skeptics on both sides. [Note to mods: I assume if this drifts too far you'll move it.] |
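A sketch of the frequentist reading of a confidence interval, assuming normally distributed data with known unit SD and an arbitrary true mean of 0.3 (all values illustrative): across many repeated experiments, about 95% of the intervals contain the true value. The 95% is a property of the procedure, not a probability attached to any single interval.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def ci_coverage(true_mean: float, n: int = 25, level: float = 0.95,
                sims: int = 20_000, seed: int = 2) -> float:
    """Fraction of simulated z-intervals (known SD = 1) that contain true_mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    half_width = z / sqrt(n)
    covered = 0
    for _ in range(sims):
        m = fmean([rng.gauss(true_mean, 1.0) for _ in range(n)])
        if m - half_width <= true_mean <= m + half_width:
            covered += 1
    return covered / sims


print("coverage of nominal 95% intervals:", ci_coverage(true_mean=0.3))
```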
|
|
|
|
#34 |
|
Graduate Poster
Join Date: Nov 2007
Posts: 1,554
|
|
|
|
|
|
|
|
#35 |
|
Scholar
Join Date: Apr 2012
Location: Arizona, USA
Posts: 108
|
I don't understand what your comment has to do with mine. I was responding to Beerina's claim that a million tests in a row need to succeed. Why should psychic abilities require 100% accuracy? If they exist, they likely operate the same way other human abilities do, subject to constraints, good days/bad days, and external stressors. The very best batters only hit about 10% of the pitches thrown their way. Why does Beerina think psychics could successfully perform a million tests in a row when no other human endeavor can?
|
|
|
|
|
#36 |
|
Philosopher
Join Date: Aug 2005
Posts: 6,367
|
First, I think Beerina was speaking metaphorically.
Second, without sufficient data I would try to refrain from speculating about what psychic abilities, should they exist, can and cannot do, how they are influenced, etc.
Third, picking baseball hitters is a clever ploy, because in baseball success for a hitter is (roughly) defined by a .300 batting average. One could just as easily have chosen baseball pitchers, or better still relievers, and seen the success rate jump significantly. But that would have weakened one's argument, would it not?
Conclusion: what people like Beerina, Pixel42 and I are trying to convey is that, e.g., a spoonbender sitting in a comfortable kitchen should have a blow-us-all-away success rate, clearly showing that something "paranormal" or "supernatural" is going on. Under controlled conditions absolutely eliminating manipulation from both sides, this success rate would be one in a million. Furthermore, that would be a noodle-scratcher for both sides, would it not? |
|
|
|
|
#37 |
|
Schrödinger's cat
Join Date: May 2004
Location: Wiltshire, UK
Posts: 4,237
|
I was just pointing out that even if we concede your point that we shouldn't expect these abilities to be any more consistent than those of talented batsmen, chess players etc, we would still expect that they would (as with such abilities) produce results that are significantly better than random chance. And they don't.
|
|
|
|
|
|
|
#38 |
|
Scholar
Join Date: Apr 2012
Posts: 52
|
That's why in statistics we set the Type I error probability before doing a one- or two-tailed t-test. Since the Type I error rate for the preliminary is 0.001, we would expect on average one in a thousand applicants to pass by dumb luck. If the number of significant results were significantly higher than that one-in-a-thousand rate, we could treat these results as evidence for the paranormal. This can be determined by calculating a p-value for the number of significant studies out of the total.
Unless the JREF decided to combine the p-values, the overall Type I error probability of the claimant passing both tests is a billion to one. Expecting an exact 100% or near-100% replication is ridiculous and extremely conservative. Telling a psychic to pass 100 tests in a row is like telling the famous basketball player, Brian, never to miss a basket. |
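A sketch of the "one in a thousand by dumb luck" reasoning as a binomial test on the number of passes, assuming (hypothetically) 1,000 independent applicants, none with any ability, each passing the preliminary with probability exactly 0.001. These numbers are illustrative, not JREF records.

```python
from math import comb


def prob_at_least(k: int, applicants: int, alpha: float) -> float:
    """P(at least k of `applicants` pass by luck), each passing independently with prob alpha.

    Computed as 1 - P(fewer than k) so only small binomial terms are needed.
    """
    below = sum(comb(applicants, i) * alpha**i * (1 - alpha) ** (applicants - i)
                for i in range(k))
    return max(0.0, 1.0 - below)


APPLICANTS, ALPHA = 1000, 0.001  # hypothetical numbers, not JREF records
print("expected lucky passes:", APPLICANTS * ALPHA)
for k in range(1, 6):
    print(f"P(at least {k} lucky passes) = {prob_at_least(k, APPLICANTS, ALPHA):.4f}")
```

Under these assumptions about one lucky pass is expected, and seeing five or more would itself have a p-value of roughly 0.004.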
|
|
|
|
#39 |
|
Philosopher
Join Date: Aug 2005
Posts: 6,367
|
|
|
|
|
|
#40 |
|
Scholar
Join Date: Apr 2012
Location: Arizona, USA
Posts: 108
|
Why? Any human ability should fall within normal parameters compared to other human abilities. Anyone can play piano after a few lessons, but only some people will reach virtuoso level after many years of study and practice.
An exceptional baseball pitcher may be defined as one who pitches a no-hitter game. There have only been 236 no-hitters in the past 111 years, so the success rate does not exactly jump significantly. And no, despite your bizarre claim about my presumed motive, choosing pitchers or any other skilled human would not weaken my argument. Education and practice are the keys to acquiring skill in any field. If psychic skills exist, why should they be any different? Because you say so? |
|
|