Check my methodology - prayer study

saizai
22nd August 2006, 09:55 PM
See http://www.prayermatch.org/ . It should be a complete description - methodology, goals, my intent / opinion, etc.

The backend programming isn't ready yet but the basics (i.e. user accounts and the public pages) are there. I intend to begin once a sufficient number of participants are signed up; the backend will be ready by then.

If you have a critique, please make sure that:
* you've read all the pages linked from the main page
* you can explain why your perceived flaw in my design would cause a false positive result, i.e. a statistically significant difference between the active and control groups of Recipients in the second and/or third round

I am aware that I have put limits on it that may cause false negatives, and am quite okay with that; my problem not yours. ;)

If I have left out anything it is probably by mistake (I only just finished writing the content); point it out and I'll correct it.

BTW, I have previously suggested this as a bona fide MDC, but the understanding I reached with Randi's representative was that they are only interested in things that can be proven in a small-scale, one-person fashion. I do not claim any such power or effect; if there is an effect, I only expect a small but statistically significant difference between the active and control groups.

Thanks!

P.S. Yes, I've read the rules and FAQ.

Flange Desire
23rd August 2006, 12:15 AM
snip
I do not claim any such power or effect; if there is an effect, I only expect a small but statistically significant difference between the active and control groups.

This is just a little unclear to me.

Surely, by definition, an 'effect' is a 'statistically significant difference between the active and control groups'.

What is a little unclear is just how 'small' the effect can be, whilst still being considered 'statistically significant'.

saizai
23rd August 2006, 12:58 AM
FD - It's intended in contrast to the more standard "I can heal anyone I want whenever I want!" sort of claim.

As for size of effect: plot the Scores from both groups as distributions (bell curves). If they differ significantly (p<.05), the result is positive. Very small effects would obviously be harder to detect with that degree of certainty than a dramatic difference, but again, that's my problem; it is bounded mainly by how many participants I have and need not affect the criteria.
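
A minimal sketch of how the comparison of the two groups' Scores might be run, assuming each Recipient ends a round with one numeric Score. It uses a two-sided permutation test. That is a swapped-in choice (the thread has not fixed a test yet), picked because it needs only Python's standard library and makes no bell-curve assumption. The function name and the sample data are hypothetical.

```python
import random
from statistics import fmean

def permutation_p_value(active, control, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in mean Score.

    If group assignment makes no difference, the labels are exchangeable,
    so we shuffle them and count how often a difference at least as large
    as the observed one appears by chance.
    """
    rng = random.Random(seed)
    observed = abs(fmean(active) - fmean(control))
    pooled = list(active) + list(control)
    n_active = len(active)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:n_active]) - fmean(pooled[n_active:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_perm + 1)

if __name__ == "__main__":
    active = [3, 4, 2, 5, 3, 4, 2, 3]   # hypothetical pain Scores (1-10)
    control = [4, 5, 3, 6, 4, 5, 3, 4]
    p = permutation_p_value(active, control)
    print(f"p = {p:.4f}", "significant at 0.05" if p < 0.05 else "not significant")
```

If assignment truly makes no difference, shuffled labels produce differences as large as the observed one fairly often, and the p-value stays large.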

William Smith
23rd August 2006, 02:38 AM
Saizai, what will happen once you have finished this "study"?
An application seems unlikely because you have already pointed out that the JREF is "only interested in things that can be proven in a small-scale, one-person fashion". With which of "Randi's representatives" did you communicate and when?

My point being: How does your "study" relate to the Challenge, since an application on your part seems very unlikely?

William Smith
23rd August 2006, 03:12 AM
Saizai, from your "Methodology" page: "As the Participants are treated exactly alike by completely automated processes, and there is no direct contact between them and the Experimenters, there is no potential for a violation of blind through that means either."

Since you logically have an interest in a positive result ("a statistically significant difference in the average Scores, in the second and/or third round, between the active and control groups of Recipients") - you do, don't you? - you could "disautomate" the "processes" any time you want, right? I am not implying that this is your motive.
The assignment process before each "round" needs to be done by someone other than you. You can't know any data about who is selected to which group at any time during the "study".
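
A sketch of an automated assignment step that no person steers, assuming the Recipient IDs are known before the round starts and the resulting table is handed to someone other than the administrator. `assign_groups` is a hypothetical name; `random.SystemRandom` draws from the operating system's entropy source, so the split cannot be reproduced from a seed.

```python
import random

def assign_groups(recipient_ids, rng=None):
    """Randomly split Recipients into equal-sized 'active' and 'control' groups.

    With an odd count, the extra Recipient goes to 'control'.
    """
    rng = rng or random.SystemRandom()  # OS entropy, no reproducible seed
    ids = list(recipient_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    assignment = {rid: "active" for rid in ids[:half]}
    assignment.update({rid: "control" for rid in ids[half:]})
    return assignment
```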

Pup
23rd August 2006, 06:16 AM
All Participants will, on a monthly basis, update us on their status. Healers will tell us about their experience, their continuing willingness to participate in the study, etc. Recipients will tell us about their mental and physical health, any unusual events they believe to be related to their participation, etc. This will be submitted online, but a duplicate also signed and mailed to us to keep everything above board.

At the end of the first round, to last at least one year, we will analyze the data we have. Through this, we will try to determine what difference (if any) there is between the two sets of Recipients. On that basis, we will publicly choose a numeric equation directly and unambiguously derivable from the data collected, and that equation will serve as the "score" for the second round.

As a simple example, the "score" could consist solely of:

S = Recipient's average reported pain [scale of 1-10]
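
A sketch of that example Score as code, assuming each Recipient files one pain rating per month on the 1-10 scale; `pain_score` is a hypothetical name. Rejecting out-of-range ratings is one way to keep the equation "directly and unambiguously derivable from the data collected".

```python
from statistics import fmean

def pain_score(monthly_reports):
    """S = Recipient's average reported pain over the round (scale 1-10)."""
    if not monthly_reports:
        raise ValueError("no reports for this round")
    for rating in monthly_reports:
        if not 1 <= rating <= 10:
            raise ValueError(f"pain rating {rating} is outside the 1-10 scale")
    return fmean(monthly_reports)

print(pain_score([7, 6, 6, 5]))  # 6.0
```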

What are you saying above? It sounds like the recipients will give narrative accounts of their condition and you (someone?) will reduce those accounts to a numerical score. Surely you mean that the recipients themselves will give numerical grades to various things like pain, fatigue, etc. which will be combined into an overall score.

Even if the recipients themselves give the numbers along with anecdotes, it will be tempting to report that the numbers weren't significantly different, but every person who was prayed for told about having a better attitude, or feeling more loved, or having their medical bills miraculously paid for, or some other thing produced by data mining that wasn't anticipated in the numerical scores.

It seems that having the recipients report anything other than the numbers is unnecessary. Though you could collect all the positive anecdotes only from those being prayed for and fund the whole experiment by selling them in a book showing what the power of prayer can do. :)

saizai
23rd August 2006, 08:10 AM
GK - You are correct that, were I to use the data I have access to and treat participants differently based on their status, that would be a flaw. However, I do not intend to do any such thing. Any questions would be addressed in a general FAQ rather than personal communication with me - or answered by someone who does *not* know the participant's status, but is a delegate of me for other purposes.

I expect that this should be sufficient for a first/second round version. If this were a MDC, then I also expect that it would need to be hosted through a neutral third party or some other mechanism (such as you suggest) for the third round (equivalent to the "final test"; the second round would be the "preliminary test"; the first round is merely investigative) to completely ensure an inability on my part to try to cheat.

I would like it to be a MDC; my understanding is simply that they won't accept this type of challenge though. This is from a discussion by email with Kramer, about a year or two ago.

As for what after - well, that'd depend on the results of the study eh? ;)

What are you saying above? It sounds like the recipients will give narrative accounts of their condition and you (someone?) will reduce those accounts to a numerical score. Surely you mean that the recipients themselves will give numerical grades to various things like pain, fatigue, etc. which will be combined into an overall score.

Of course. All the things you listed would be given in numerical terms using the usual scales for such things (e.g. pain: 1-10; total $ medical bills; total $ paid by you; etc). I intend to use narrative accounts only for annotative purposes, not for analysis proper.

Though you could collect all the positive anecdotes only from those being prayed for and fund the whole experiment by selling them in a book showing what the power of prayer can do. :)

That might also be an interesting use. ;)

William Smith
23rd August 2006, 08:31 AM
...
I would like it to be a MDC; my understanding is simply that they won't accept this type of challenge though. This is from a discussion by email with Kramer, about a year or two ago.
...


Have you asked the current Challenge Facilitator, Mr. Jeff Wagg (challenge@randi.org), about it?

Perhaps a shorter "study" (one round) with a more rigid protocol will meet JREF criteria. Since you seem to value the JREF and its mission, you could make the necessary adjustments to have your application accepted.

Religious claims and tests for deities do not qualify for the Challenge. Your "study" suggests the existence of a deity (or deities) since it involves praying, right?
Perhaps you could call it something else: instead of "praying" you could say e.g. "talking". Could you eliminate the alleged involvement of an alleged being and still conduct the "study"?

saizai
23rd August 2006, 09:15 AM
GK - I make no claim whatsoever about the mode of effect, nor that I am testing the existence of any deity. I am not interested in even trying to test that at this point. Just because it is potentially religious in connotation, does not make it a religious claim.

I specifically say that "prayer" is intended to be a generic word of convenience, and that it may take on multiple forms for different people, including ones that are not Judeo-Christian or theological (e.g. Buddhist meditation). So no, I won't change that to "talking" (which assumes a very specific view of prayer on your part btw - that of "talking" to the Judeo-Christian God).

A shorter study could not be made sufficiently rigorous for my taste. The first round vs. second/third round split is also absolutely necessary for the design.

I have already emailed Jeff about this; he has yet to respond to my question about the acceptability of this general type of application. (I.e., one involving multiple people over a long period of time, rather than just me over a short period.)

saizai
23rd August 2006, 09:45 AM
P.S. As for why I post here: Even if I'm not allowed to do the MDC, if there's a flaw in my methodology (though I don't think there is), I'd like to know. Hopefully a bunch of skeptics should be able to find any.

jmercer
23rd August 2006, 10:45 AM
Cool! I always wondered about that.

A murder of crows.
A pride of lions.
A herd of sheep.

And now... finally...

A bunch of skeptics.

My mind is totally at ease due to this, although I'm not quite sure what we have in common with bananas... :)

Startz
23rd August 2006, 11:00 AM
The real problem with such tests is the possibility of "cheating" - unintentional or intentional. I'll leave that to those with some expertise in the area (skeptics/magicians/Amazing) and just comment on the statistics.

First, since you say a number will be assigned to each experimental subject, it would be good to announce in advance what statistical test you're going to use. There are lots of standard tests for the difference between two means. Sometimes they give different results.
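
A sketch of what announcing the test in advance could look like, assuming the plan is frozen before anyone sees which Recipient is in which group. `AnalysisPlan` and the two candidate statistics are hypothetical illustrations. On the toy data at the bottom, a difference in means and a difference in medians disagree even on direction, which is why the choice has to be fixed beforehand.

```python
from dataclasses import dataclass
from statistics import fmean, median

# Candidate statistics; a plan must name exactly one of them in advance.
STATISTICS = {
    "mean_difference": lambda active, control: fmean(active) - fmean(control),
    "median_difference": lambda active, control: median(active) - median(control),
}

@dataclass(frozen=True)
class AnalysisPlan:
    """Hypothetical pre-registered plan; frozen so it cannot be edited later."""
    statistic: str
    alpha: float
    two_sided: bool = True

    def __post_init__(self):
        if self.statistic not in STATISTICS:
            raise ValueError(f"unknown statistic {self.statistic!r}")
        if not 0 < self.alpha < 1:
            raise ValueError("alpha must be between 0 and 1")

def observed_effect(plan, active, control):
    return STATISTICS[plan.statistic](active, control)

active = [1, 1, 1, 1, 10]   # hypothetical pain Scores
control = [2, 2, 2, 2, 2]
print(observed_effect(AnalysisPlan("mean_difference", 0.05), active, control))    # about 0.8
print(observed_effect(AnalysisPlan("median_difference", 0.05), active, control))  # -1
```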

Second, while a significance test at the five percent level is certainly standard in science, the informal rule for the Challenge preliminary is 1/1000. Perhaps that would be better?

Third, I don't understand why three rounds are necessary nor the reference to selection bias or data mining. So long as assignment and coding are truly random and all submitted data is used in the same way, one round ought to be sufficient.

saizai
23rd August 2006, 11:22 AM
Startz - Can you suggest how anyone involved could cheat? What would be the (mundane) mechanism thereof?

re #1: I will need to consult statistician friends of mine for that, but since we have until the end of the first round (>1 yr away), I don't think that's a problem. We can at least agree on the basic idea.

re #2: I think 5% should be adequate for the preliminary test. A stricter standard would be acceptable for the final test.

re #3: One round is not sufficient, because the method of Score calculation is done after the first round is complete (to best optimize the effects - I don't claim in advance what exactly will be affected). The second and third rounds are identical; both are run after the Score calculation is made final, so are not affected by the possibility of selection bias.

If you're not familiar with the term, Wikipedia should have a good article on the subject.

Yoink
23rd August 2006, 11:48 AM
I'm no expert in this stuff, but this looks pretty well designed to me. The only thing I'd say is to second the point that you must specify all the statistical tests you intend to do on the data and exactly what forms the data collection will take before you run the experiment. One can always mine any data set for something out of the ordinary ("we found that if we looked only at those who reported liver problems, a whopping 80% showed marked improvement as compared to the control group!").

The only other thing I will say is that when you get a negative result (yes, I'll take that bet) proponents of healing-through-prayer will simply say "but how can prayers to someone identified only via first name and a number on a website possibly get through?" Prayer is one of those things that is probably impossible to test in a rigorous double-blind experiment without doing things that no ethics review board would ever allow (lying to patients about whether or not they were being prayed for, and lying to "pray teams" about whether or not the people they're praying for are really sick, etc.).

Startz
23rd August 2006, 11:56 AM
re 1: I haven't the vaguest idea of how one would cheat. And I certainly didn't mean to imply that you would. I only meant that it's an area where *I* don't know a huge amount.

re 2: I guess that so long as you're not applying for the Challenge, you're not bound by its guidelines. But the JREF standard does seem to be 1/1000.

Here's one reason that's connected to "cheating" (there should be a less pejorative term). With a five percent standard, someone could set up 14 different web sites, run the tests independently, and expect to find a significant result. Or 14 different people could do the same thing independently. Many people would like to see a tougher standard.
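
The arithmetic behind the 14-site example, assuming the 14 runs are independent and no real effect exists anywhere: the chance that at least one run crosses the threshold is 1 - (1 - alpha)^n. At 5% that is about 51%; at 1/1000 it drops to about 1.4%.

```python
def chance_of_any_false_positive(n_tests, alpha):
    """P(at least one result below alpha) across n independent null tests."""
    return 1 - (1 - alpha) ** n_tests

print(f"{chance_of_any_false_positive(14, 0.05):.3f}")   # 0.512
print(f"{chance_of_any_false_positive(14, 0.001):.3f}")  # 0.014
```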

re 3: You're right. Wikipedia has quite a nice entry. It begins "Selection bias, sometimes referred to as the selection effect, is the error of distorting a statistical analysis due to the methodology of how the samples are collected." So long as your data is collected randomly, there isn't any selection bias by the classic definition.

It sounds like you're trying to guard against bias from choosing the method of analysis. That's probably a good thing to do. But it can be done more simply by letting you see the data and choose a method of analysis *before* you find out which observation is in which group.
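
A sketch of the blind-analysis step Startz describes, assuming the assignment table is a dict from Recipient ID to "active"/"control". The analyst receives only neutral codes; the key stays with a third party until the method of analysis is fixed. `blind_labels` is a hypothetical helper.

```python
import random

def blind_labels(assignment, rng=None):
    """Replace 'active'/'control' with neutral codes 'X'/'Y' in random order.

    Returns (blinded_assignment, key). Only the blinded table goes to the
    analyst; the key is withheld until the analysis method is final.
    """
    rng = rng or random.SystemRandom()
    codes = ["X", "Y"]
    rng.shuffle(codes)  # which code means 'active' is itself random
    key = {"active": codes[0], "control": codes[1]}
    blinded = {rid: key[group] for rid, group in assignment.items()}
    return blinded, key
```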

If you want to choose a method that finds the largest possible effect, then your suggestion of doing it in the first round and then applying the same method in a subsequent round is sensible. (I'm still not sure why a third round is needed.)

saizai
23rd August 2006, 12:47 PM
Yoink - re p1, that's the reason for the first round. Any such subpopulation choices or measure choices would be made then, before the second/third rounds.

I agree with p2; one can't prove the negative. I try to make it as open as possible though - my ideal would be first name, recent headshot, basic description of the illness, and state/country of residence. I think that should be sufficient for most faiths while still being unidentifiable. (One can also have participants swear that they haven't had any contact with other participants during the course of the study.)

Startz - Agreed re tougher standards; I would leave that to the third round though. Getting p<.001 would probably require a very large sample size - one that could easily be acquired with publicity from a success in the p<.05 second round.
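
A rough check on the sample-size point, using the standard normal-approximation formula for comparing two means, n = 2 * (z_alpha + z_power)^2 / d^2 per group, where d is the difference in mean Score divided by its standard deviation. The effect size d = 0.2 and 80% power are assumptions for illustration only. With those inputs it gives roughly 390 Recipients per group at p<.05 and roughly 850 per group at p<.001.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha, power=0.8):
    """Approximate Recipients needed per group for a two-sided two-sample test."""
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)

for alpha in (0.05, 0.001):
    print(alpha, n_per_group(0.2, alpha))
```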

Another possible way to do the analysis is to simply say that any significant effect on any measure tracked (e.g. pain alone, $ spent alone, etc) would count, but that no subpopulation sampling would be allowed. This prevents you from doing what Yoink described. After all, *any* significant difference on *any* (numeric) response value would be necessarily paranormal.

Especially if you require a low p-value (enough so that you don't get a false positive simply by virtue of a large number of measures), I think this would be fair.
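
A sketch of the "any measure counts" rule with a Bonferroni correction: each of m measures is tested at alpha / m, so the chance of any false positive across all measures stays at or below alpha. Bonferroni is one standard choice, swapped in here rather than proposed in the thread. The measure names and p-values are hypothetical.

```python
def significant_measures(p_values, family_alpha=0.05):
    """Return measures whose p-value beats the Bonferroni threshold.

    p_values maps measure name -> p-value for the active-vs-control test.
    """
    threshold = family_alpha / len(p_values)
    return sorted(name for name, p in p_values.items() if p < threshold)

p = {"pain": 0.03, "medical_spend": 0.20, "physician_rating": 0.004}
print(significant_measures(p))  # ['physician_rating']
```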

digithead
23rd August 2006, 01:49 PM
From your faq:

"It is possible - probable even - that recipients will be prayed for by people other than their assigned healers. Couldn't this affect the results?

No. This variable - and all other otherwise uncontrolled variables - is controlled by the process of randomization; there is no reason why it would be different between the control and test groups. It will, however, be tracked - by polling the recipients as to whether they, or people they know of, will be praying for their betterment. It is expected that this measure will be statistically equal between groups, as with other statistics such as average age, gender distribution, etc."

You're wrong, randomization will not always take care of this because you will not know if someone prayed or not until you do the test so you are going to have to test for this rather than dismissing it outright.

What about their own prayers?

What about their current medical treatment and its effectiveness? Certain cancers have better treatments than others.

What about severity of illness? Different types of cancer matter in outcome. Someone with localized skin cancer will always respond better than those in stage 4 pancreatic cancer regardless of whether someone prays for them or not.

Are there any other confounding factors you're missing such as culture, geography, and family history? Any possible interactions?

You really need to sit with a Ph.D. biostatistician to design a better study because it seems like you're going to have to do a much more complicated randomization procedure and statistical analysis (e.g. mixed effects GLM) than you originally intended if you are going to be able to adjust for all the possible covariates, interactions, and covariance structures...

Startz
23rd August 2006, 02:18 PM
I think that so long as prayer is assigned randomly, one can expect the covariates to be distributed randomly across the two groups. In this case, the covariates can be safely ignored.

saizai
23rd August 2006, 02:50 PM
Digithead - What Startz said.

Or to quote myself in the OP:
"If you have a critique, please make sure that:
* you can explain why your perceived flaw in my design would cause a false positive result, i.e. a statistically significant difference between the active and control groups of Recipients in the second and/or third round"

Can you give that explanation?

Dumb All Over
23rd August 2006, 02:55 PM
Hi Sai,

Will you require recipients to forgo all other types of treatments (including medical) during the test period?

Through what method will recipients report? Interview? Questionnaire?

saizai
23rd August 2006, 03:11 PM
DAO - Absolutely not. If you read the Recipients page, at the bottom it explicitly says that we recommend they continue their normal treatment.

Recipients report purely through a questionnaire, most of which will consist of t/f, numeric, or quantitative-scale measures. There will be an opportunity for them to give anecdotal text responses as well, but those are not used for the final analysis.

Dumb All Over
23rd August 2006, 03:15 PM
Thanks Sai,

I'm confused (as usual). How would you be able to discern the difference between prayer and medical treatment as a cause for those who report improvement?

saizai
23rd August 2006, 03:17 PM
Update: Excerpted quote from email w/ Jeff:

Yes, if you could work out a double-blinded protocol, it could work.
Jeff Wagg

On 8/23/06, Sai wrote:
[...]
However, I would like you to answer your implied assertion directly:
if we can work out a double-blind protocol, would this type of
challenge be acceptable?

So it looks like we're on, at least in theory.

saizai
23rd August 2006, 03:20 PM
DAO: Simple: the difference between active and control groups of recipients.

Active group gets prayed for by our people, control doesn't. They are otherwise purely random assignment, and they don't know what their status is.

So if prayer works (or more specifically, if it makes a difference whether we assign people to pray for them or not), then there should be a difference between those two groups in terms of their outcomes (e.g. pain, $ spent, whatever).

Dumb All Over
23rd August 2006, 03:35 PM
Thanks again, Sai,
I must say this is one of the more interesting proposals I've seen around here in a long time.
Theoretically, after the first year, isn't it possible that a prayed-for recipient's condition could actually be worse than it was at the beginning of the experiment, yet they report that they feel better than ever? Wouldn't an actual physical examination by a medical team be far more informative than the perceptions of the recipient?

saizai
23rd August 2006, 03:38 PM
DAO - That is the purpose of having their physician fill out a short survey about the patient once per quarter. ;)

saizai
23rd August 2006, 03:45 PM
P.S. If they report that they feel better, that still counts as an effect. Pain relief, y'know.

Dumb All Over
23rd August 2006, 04:13 PM
Sai,

So far, you seem genuinely forthright. I detect no B.S. It does not sound like you are trying to snag Randi's million through fraud. I'm a skeptic to my core. I fully anticipate that if this project is nurtured to full fruition, it will clearly show that prayer has absolutely no effect. Nevertheless, your proposal is interesting.

The other posters in this thread will offer all kinds of advice on your protocol. Please give all advice due consideration. I'm sure it will be offered in the spirit of conducting a proper, double-blind test. If you fail to heed their suggestions and demonstrate you are unwilling to close loopholes within the protocol, my B.S. meter will begin to rise and I'll urge that further negotiations cease. But, so far, so good. Good luck.

saizai
23rd August 2006, 04:31 PM
DAO - Thanks.

I am indeed quite earnest. And, evidently unlike some portion of applicants, I'm sentient too. ;) It may surprise you to know that I'm also a skeptic.

FWIW you may consider me completely agnostic as to whether the test will be successful or not; I simply want to conduct it. While I'm at it I may as well get the JREF MDC involved, so nobody can say that my methodology was flawed or I was faking it or anything (as I have no intention of doing so). Either outcome is quite acceptable to me; my religious/spiritual beliefs are in no way harmed either way.

I encourage anyone who can point out a genuine flaw in the protocol to point it out, but I do need to require that they be able to explain how their perceived flaw would cause a false positive. I contend that the methodology is flawless (i.e. that it will not cause a false positive; I am willing to accept false negatives); prove me wrong. :)

digithead
23rd August 2006, 05:18 PM
Digithead - What Startz said.

Or to quote myself in the OP:
"If you have a critique, please make sure that:
* you can explain why your perceived flaw in my design would cause a false positive result, i.e. a statistically significant difference between the active and control groups of Recipients in the second and/or third round"

Can you give that explanation?


You can't just assume that your randomization scheme worked; you have to demonstrate that it worked. You also need to check this after you assign them to their respective groups but before you perform the treatment. You'd be surprised how often in clinical trials the randomization schemes didn't work out as intended...

And you can get a false positive by loading up one of the groups with people who had a less severe form of cancer who thereby benefitted most from their medical treatment rather than prayer but it looks like it was prayer that did it. Hence why you have to risk-adjust in your statistical tests...
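
A sketch of the balance check digithead asks for, run after assignment and before the round starts. It reports covariates whose standardized mean difference between groups exceeds 0.1, a commonly used rule of thumb (an assumption here, not a protocol requirement). The covariate names, the data, and both helper functions are hypothetical.

```python
import math
from statistics import fmean, variance

def standardized_difference(active_values, control_values):
    """(mean_active - mean_control) / pooled SD, using sample variances."""
    diff = fmean(active_values) - fmean(control_values)
    pooled_sd = math.sqrt((variance(active_values) + variance(control_values)) / 2)
    if pooled_sd == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / pooled_sd

def imbalanced_covariates(active_rows, control_rows, covariates, threshold=0.1):
    """Return {covariate: standardized difference} for covariates above threshold."""
    flagged = {}
    for name in covariates:
        d = standardized_difference(
            [row[name] for row in active_rows],
            [row[name] for row in control_rows],
        )
        if abs(d) > threshold:
            flagged[name] = d
    return flagged

active = [{"age": 60, "severity": 1}, {"age": 45, "severity": 2}, {"age": 52, "severity": 3}]
control = [{"age": 58, "severity": 3}, {"age": 47, "severity": 4}, {"age": 52, "severity": 5}]
print(imbalanced_covariates(active, control, ["age", "severity"]))
```

In this toy example age is balanced but severity is flagged, which is the kind of imbalance digithead warns about.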

digithead
23rd August 2006, 05:20 PM
I think that so long as prayer is assigned randomly, one can expect the covariates to be distributed randomly across the two groups. In this case, the covariates can be safely ignored.

Not if they are confounders or endogenous to the matter at hand...

You always need to verify your assumptions...

saizai
23rd August 2006, 05:25 PM
Digithead - Please explain how the control and active groups would differ, based on what you have said.

And you can get a false positive by loading up one of the groups with people who had a less severe form of cancer who thereby benefitted most from their medical treatment rather than prayer but it looks like it was prayer that did it. Hence why you have to risk-adjust in your statistical tests...

This is obviously spurious, as there is no way to "load up" the groups in such a manner when the assignment is done programmatically at random.
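
Random assignment rules out deliberate loading, but in a small sample chance alone can still leave one group with more severe cases. Stratified (blocked) randomization is a standard refinement, offered here as a suggestion rather than as part of the posted protocol: Recipients are randomized separately within each severity stratum, so per stratum the two groups differ in size by at most one. `stratified_assign` and the stratum labels are hypothetical.

```python
import random
from collections import defaultdict

def stratified_assign(recipients, stratum_of, rng=None):
    """Randomize separately within each stratum (e.g. illness severity).

    recipients: iterable of IDs; stratum_of: function mapping ID -> stratum.
    In each stratum half go to 'active', the rest (including any odd one
    out) to 'control'.
    """
    rng = rng or random.SystemRandom()
    strata = defaultdict(list)
    for rid in recipients:
        strata[stratum_of(rid)].append(rid)
    assignment = {}
    for members in strata.values():
        rng.shuffle(members)
        half = len(members) // 2
        for rid in members[:half]:
            assignment[rid] = "active"
        for rid in members[half:]:
            assignment[rid] = "control"
    return assignment
```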

William Smith
23rd August 2006, 05:32 PM
...
So it looks like we're on, at least in theory.

Saizai, I'm looking forward to your application and your revised protocol proposal. We will assist you as good as possible.

saizai
23rd August 2006, 05:48 PM
GK - My protocol proposal is as you see it here; no revisions needed yet. (Except perhaps matters of implementation such as hosting the db with group assignments in a secure neutral location - but those don't really have to do with the protocol per se.)

If you have any suggestions for what should be revised, please tell me.

P.S. "As good as possible"? :P

Yoink
23rd August 2006, 06:02 PM
Digithead, wouldn't the problems you describe only emerge if the sample size was too small?

Also, couldn't you protect against such "outlier" problems by, say, discarding some certain percentage (or fixed number) of "top" and "bottom" scorers from each group?

That would also be a way of protecting against the risk that some "prayers" may find a way to contact their "prayees"--or vice versa. (P.S., saizai, do you have a way around this? It's not hard to imagine that someone madly praying for some sick person would be curious to know how that person is doing, and both sick person and praying person know what website they visited to enroll in the study; it wouldn't be too hard to imagine Googling around on the study to see if anybody is blogging about participating, etc.)
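
A sketch of the trimming Yoink proposes, assuming it would be applied identically to both groups and fixed in advance: a trimmed mean drops the lowest and highest fraction of Scores before averaging. `trimmed_mean` and its 10% default are hypothetical.

```python
from statistics import fmean

def trimmed_mean(values, proportion=0.1):
    """Mean after dropping the lowest and highest `proportion` of values."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    data = sorted(values)
    k = int(len(data) * proportion)  # number dropped from each end
    return fmean(data[k:len(data) - k])

print(trimmed_mean([1, 2, 3, 4, 100], proportion=0.2))  # 3.0
```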

William Smith
23rd August 2006, 06:11 PM
I do consider proper double blinding a vital part of the protocol.

I volunteer to serve as a third party. However, this may pose logistical problems since I live in Germany. If you (we) can work out a protocol which allows for electronic data transfer, I'm in the game.

For specific suggestions to the protocol - let me get some sleep first. I'm wiped, I start to see yellow flags where there were none before. "Nightnight, Bingaling."

saizai
23rd August 2006, 06:25 PM
Also BTW, to address what Jeff mischaracterized as well:

I am very explicitly NOT claiming or testing (and have no desire to debate):
* the existence of any deity
* what the mechanism of prayer is, if it works
* any difference between different styles of prayer
* whether any particular Recipient has been prayed for or not, or how much
* the ability to heal any particular Recipient
* any personal ability of mine (I won't be participating except as administrator)
* the effectiveness of any particular treatment other than the prayer gotten through being in the active group

I reserve the right to pick subgroups at my discretion BEFORE the onset of the second/third round, so long as the criteria are made completely explicit. As a hypothetical example, if people who self-identify as Born-Again Christians do amazingly well as Healers in the first round, I may choose to require that Healers in the 2nd and 3rd rounds be BACs as well.

This goes together with the score method choice; in any case, both precede the test and could not possibly (by mundane explanation) cause a false positive.

saizai
23rd August 2006, 06:30 PM
Yoink - I think it would be relatively hard to google someone by first name alone.

However, I would be asking both parties to sign a certification stating that they will not, and at the end that they have not, contact any other participant before the end of the round.

GK - I agree wrt double blinding. Would you agree that, if the database containing assignments is both filled and accessed programmatically alone until the end of the experiment (kept encrypted and locked otherwise), and you have source code read access that shows there is no "phone home" feature, that would be sufficient to ensure a complete double blind?
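
A sketch of one way to make the sealed assignment database verifiable, assuming Recipient IDs are strings: publish a salted SHA-256 commitment to the assignment table at the start of a round, then reveal the table and salt at the end so anyone can confirm it was not changed after the fact. This complements, and does not replace, the encryption and source-code review described above; `commit` and `verify` are hypothetical helpers.

```python
import hashlib
import json
import secrets

def _digest(assignment, salt):
    # Canonical JSON (sorted keys) so the same table always hashes the same way.
    payload = json.dumps({"salt": salt, "assignment": assignment}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def commit(assignment):
    """Return (digest, salt). Publish the digest; keep table and salt sealed."""
    salt = secrets.token_hex(16)  # random salt stops guessing small tables
    return _digest(assignment, salt), salt

def verify(assignment, salt, digest):
    """True if the revealed table and salt match the published digest."""
    return _digest(assignment, salt) == digest
```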

By the start of the second round I would be completely prepared to hand over the website to a trusted third party, to run purely on its own, and require read/write access only to public pages (e.g. to post general news).

Yoink
23rd August 2006, 06:37 PM
Yoink - I think it would be relatively hard to google someone by first name alone.

True, but that wasn't my point. I meant that someone could google "prayer efficacy study" or "prayermatch.org" or "praying from photo alone" etc. etc. and find someone who has left an online record of being involved in the study. You could go onto any chat-type site and say "anyone know anyone involved in a prayer efficacy study"? If you floated this around support-group websites for people with inoperable cancer (the kinds of people who will be drawn to your study, I imagine) you may not have to work as hard as you're thinking to start unblinding the experiment.

saizai
23rd August 2006, 06:47 PM
Yoink - That would take a considerable amount of effort. Given that they are swearing not to do so, swearing that they have NOT done so, and that we will be putting them in contact with each other afterwards if they so desire, I don't see that this is a major flaw. We can eliminate any specific participants who upon investigation (or self-admission) have attempted or succeeded in breaching the double blind. I see no reason to apply an a priori culling of the results however, and would not agree to such a requirement.

Plasmadog
23rd August 2006, 07:32 PM
I have some concerns about the idea of trying to maximize the difference in the first round. Is this a common practice in statistical studies? I'm no statistician, but it seems wrong to me. It assumes that an effect will be seen in the first round, and presumably that the effect will be positive. But what if there is no effect? You would merely be amplifying noise instead of signal. Using the same formula, the second round might indicate a huge positive effect, and the third might indicate a very negative effect, simply because your formula is amplifying random variations. How would you interpret that result?

Startz
23rd August 2006, 07:47 PM
I have some concerns about the idea of trying to maximize the difference in the first round. Is this a common practice in statistical studies? I'm no statistician, but it seems wrong to me. It assumes that an effect will be seen in the first round, and presumably that the effect will be positive. But what if there is no effect? You would merely be amplifying noise instead of signal. Using the same formula, the second round might indicate a huge positive effect, and the third might indicate a very negative effect, simply because your formula is amplifying random variations. How would you interpret that result?

It's not common practice. At least not to be this explicit.
But it's not unfair in this application. Suppose you think that prayer might have an effect, but maybe you suspect that it's not a general effect. It might work if you pray to Zeus. Or maybe to Odin. So you do a first round and find evidence that prayer to Zeus works and that prayer to Odin doesn't.

Now in a completely separate and completely random second round you only check for the efficacy of prayers to Zeus. Folks who believe that neither Zeus nor Odin intervenes on request shouldn't object to this procedure.

digithead
23rd August 2006, 09:02 PM
This is obviously spurious, as there is no way to "load up" the groups in such a manner when the assignment is done programmatically at random.

It is not spurious...

First, you have to determine what is the scientifically important difference (which is different from statistically significant difference), which your FAQ does not specify.

Then you have to figure out what confounders you might encounter. I've listed some for you, but you somehow think simple randomization will miraculously make them disappear, which it won't.

Then you have to "guesstimate" a population variance, because you don't have pilot data to help you. Assume a larger one, as this will prevent false positives; if there really is a difference, it will still show up.

With this variance and your scientifically important difference, you then estimate the necessary sample size for whichever statistical test you will use based on a certain power and significance level (the FDA uses 80% and 5% respectively).

Then based on that sample size estimate, you have to recruit enough people to your study.

Then you have to determine the proportion of each confounder in the total sample and based on that proportion, you have to do a stratified random sample based on these confounders.

Then you have to verify that your randomization "worked" and each group contains the same proportion of disease, demographics, etc. so that you aren't "spuriously" loading one group in favor of your hypothesis...

Is that clear enough for you?
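The sample-size step in the list above can be made concrete. This is a minimal Python sketch, assuming a continuous score compared between two equal-sized groups with a two-sided test and the usual normal approximation; the standard deviation of 20 and the differences tried below are invented for illustration and do not come from the study.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate number of Recipients needed in EACH group to detect a true
    mean difference `delta` with a two-sided test at level `alpha`.

    Normal-approximation formula for two equal-sized groups with a common
    standard deviation `sigma`:
        n = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta) ** 2
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)


# Hypothetical inputs: a 0-100 score with an assumed (guesstimated) SD of 20.
for delta in (10, 5, 1):
    print(f"difference {delta:>2}: {n_per_group(delta, sigma=20)} per group at alpha=0.05")
print(f"difference  5: {n_per_group(5, sigma=20, alpha=0.001)} per group at alpha=0.001")
```

The required sample grows with the square of sigma/delta, so halving the difference of interest roughly quadruples the number of Recipients, and tightening alpha from 0.05 to 0.001 raises it further.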

Startz
23rd August 2006, 09:19 PM
It is not spurious...

First, you have to determine what is the scientifically important difference (which is different from statistically significant difference), which your FAQ does not specify.

Then you have to figure out what confounders you might encounter. I've listed some for you, but you somehow think simple randomization will miraculously make them disappear, which it won't.

Then you have to "guesstimate" a population variance, because you don't have pilot data to help you. Assume a larger one, as this will prevent false positives; if there really is a difference, it will still show up.

With this variance and your scientifically important difference, you then estimate the necessary sample size for whichever statistical test you will use based on a certain power and significance level (the FDA uses 80% and 5% respectively).

Then based on that sample size estimate, you have to recruit enough people to your study.

Then you have to determine the proportion of each confounder in the total sample and based on that proportion, you have to do a stratified random sample based on these confounders.

Then you have to verify that your randomization "worked" and each group contains the same proportion of disease, demographics, etc. so that you aren't "spuriously" loading one group in favor of your hypothesis...

Is that clear enough for you?

Digithead:

The OP has a perfectly well specified null hypothesis (in the second round). He doesn't need to worry about power. All he needs to do is test the difference in sample means. If he can reject the null at a specified size, then he has significant evidence that prayer works.

Of course if prayer has a very small effect, he's going to need a large sample to detect it. So he might be well advised to use the information from the first round along the lines you suggest to decide on a sample size for later rounds.

I would worry much more about how to verify that sampling is truly random than about the difficulties of a statistical test.
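Startz's "test the difference in sample means" can be sketched as follows. This assumes one numeric score per Recipient and fairly large groups, and uses an unpooled (Welch-style) standard error with a normal approximation; the thread does not fix a particular test, and the scores below are made up.

```python
import math
from statistics import NormalDist, mean, variance


def diff_in_means_test(active, control):
    """Two-sided large-sample test of H0: the Active and Control groups have
    the same mean score. Uses the unpooled (Welch-style) standard error and a
    normal approximation, so it is only reasonable for fairly large groups."""
    diff = mean(active) - mean(control)
    se = math.sqrt(variance(active) / len(active) + variance(control) / len(control))
    z = diff / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, z, p_two_sided


# Illustrative, made-up scores.
control = [1, 2, 3, 4, 5] * 20
active = [x + 10 for x in control]
print(diff_in_means_test(active, control))
print(diff_in_means_test(control, control))
```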

Plasmadog
23rd August 2006, 09:23 PM
It's not common practice. At least not to be this explicit.
But it's not unfair in this application. Suppose you think that prayer might have an effect, but maybe you suspect that it's not a general effect. It might work if you pray to Zeus. Or maybe to Odin. So you do a first round and find evidence that prayer to Zeus works and that prayer to Odin doesn't.

Now in a completely separate and completely random second round you only check for the efficacy of prayers to Zeus. Folks who believe that neither Zeus nor Odin intervenes on request shouldn't object to this procedure.

That doesn't really address my concern though. What happens if there is actually zero effect, and any variance between the two groups in the first round is therefore just random noise? You'll have no way of knowing that it's just noise, and you will amplify it as if it is evidence of a positive effect. If it is just random noise though, then won't that amplification itself be random, and therefore meaningless when applied to a second set of data?

digithead
23rd August 2006, 09:40 PM
Digithead:

The OP has a perfectly well specified null hypothesis (in the second round). He doesn't need to worry about power. All he needs to do is test the difference in sample means. If he can reject the null at a specified size, then he has significant evidence that prayer works.

Of course if prayer has a very small effect, he's going to need a large sample to detect it. So he might be well advised to use the information from the first round along the lines you suggest to decide on a sample size for later rounds.

I would worry much more about how to verify that sampling is truly random than about the difficulties of a statistical test.

His hypothesis is that they're different, he doesn't say "how different". So he absolutely needs to tell us what the scientifically important difference is, because statistically significant difference is a function of sample size. So let's say the scientifically important difference is 5% and he observes a 1% difference. If he has enough sample size, this will be statistically significant rather than practically significant and he'll declare success.

Also, specifying the scientifically important difference, variance estimate, and sample size required for his test will allow others to determine if his results are reliable or not and also make it less likely for him to claim that he didn't achieve sufficient power to find the true difference.

It will also force him into thinking about what really constitutes a successful outcome from prayer rather than relying simply on statistical formulas.

And your last statement is what I've been trying to get him to address, that you need to verify that your randomization is truly random and you've tried to account for all of the possible confounders that you think might skew your results.

I'm also assuming that he will be doing a one-sided test as it's only important that prayer improves the treatment over control. What happens if the prayer group does worse? Will he still claim they're statistically significant?
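Both of digithead's points, that significance depends on sample size and that a one-sided test ignores a prayer group doing worse, can be seen numerically. A minimal sketch, assuming two equal-sized groups, a known common standard deviation and a normal approximation; the 1-point difference and SD of 20 are hypothetical.

```python
import math
from statistics import NormalDist


def p_values(delta, sigma, n_per_group):
    """z statistic, one-sided p-value (H1: Active better) and two-sided p-value
    for an observed mean difference `delta` (Active minus Control) between two
    equal-sized groups with common standard deviation `sigma`."""
    z = delta / (sigma * math.sqrt(2 / n_per_group))
    cdf = NormalDist().cdf
    one_sided = 1 - cdf(z)             # only "Active does better" counts
    two_sided = 2 * (1 - cdf(abs(z)))  # a difference in either direction counts
    return z, one_sided, two_sided


# Hypothetical: a 1-point difference on a 0-100 score with SD 20.
for n in (100, 1_000, 10_000):
    z, p1, p2 = p_values(1, 20, n)
    print(f"n={n:>6} per group: z={z:.2f}  one-sided p={p1:.4f}  two-sided p={p2:.4f}")
```

The same small difference moves from clearly non-significant to p < .001 purely by increasing n, and a negative difference can be highly significant two-sided while the one-sided p-value is close to 1.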

Startz
23rd August 2006, 09:49 PM
His hypothesis is that they're different, he doesn't say "how different". So he absolutely needs to tell us what the scientifically important difference is, because statistically significant difference is a function of sample size. So let's say the scientifically important difference is 5% and he observes a 1% difference. If he has enough sample size, this will be statistically significant rather than practically significant and he'll declare success.

>some stuff snipped
I completely agree that "scientifically important" is what matters. But I think that in this case 1e-308 is "scientifically important." [Of course, detecting an effect of that size would take a *really* large sample. :)]

Startz
23rd August 2006, 09:50 PM
That doesn't really address my concern though. What happens if there is actually zero effect, and any variance between the two groups in the first round is therefore just random noise? You'll have no way of knowing that it's just noise, and you will amplify it as if it is evidence of a positive effect. If it is just random noise though, then won't that amplification itself be random, and therefore meaningless when applied to a second set of data?

You're quite right. The effect will be that nothing significant is found in the second set of data. That's why it's harmless to allow it.
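Startz's answer can be checked by simulation. This is a minimal Python sketch, assuming there is no prayer effect at all and modeling "choosing a Score Equation in round 1" as picking whichever of 20 hypothetical noise measures best separates the groups; the group size, number of measures and seed are arbitrary choices for illustration.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def two_sided_p(a, b):
    """Large-sample two-sided p-value for a difference in means."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def null_round(rng, n_per_group, n_measures):
    """One simulated round with NO effect: every candidate measure is pure
    noise, drawn identically for Active and Control Recipients."""
    def group():
        return [[rng.gauss(0, 1) for _ in range(n_measures)] for _ in range(n_per_group)]
    return group(), group()


def column(rows, j):
    return [row[j] for row in rows]


def noise_amplification_demo(trials=200, n_per_group=50, n_measures=20,
                             alpha=0.05, seed=1):
    rng = random.Random(seed)
    hits_round1 = hits_round2 = 0
    for _ in range(trials):
        # Round 1: choose whichever measure happens to separate the groups best.
        a1, c1 = null_round(rng, n_per_group, n_measures)
        pvals = [two_sided_p(column(a1, j), column(c1, j)) for j in range(n_measures)]
        best = min(range(n_measures), key=pvals.__getitem__)
        hits_round1 += pvals[best] < alpha
        # Round 2: fresh Recipients, with the chosen measure frozen in advance.
        a2, c2 = null_round(rng, n_per_group, n_measures)
        hits_round2 += two_sided_p(column(a2, best), column(c2, best)) < alpha
    return hits_round1 / trials, hits_round2 / trials


r1, r2 = noise_amplification_demo()
print(f"round 1, best of 20 noise measures 'significant': {r1:.0%} of trials")
print(f"round 2, same measure on fresh data significant:  {r2:.0%} of trials")
```

The first rate should typically come out far above the nominal 5%, because the best-looking measure was selected after the fact; the second should land near 5%, because the frozen measure is tested on independent data.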

saizai
23rd August 2006, 11:05 PM
Plasmadog - Amplifying noise to perceived signal only works one time. If it is indeed noise, then the next time it will be just noise and any pattern you thought was there ought to evaporate. You agree with this in your second post about it.

Startz - I agree with pretty much everything you've said.

I admire your optimism about the frequency of "data dredging" in the wider scientific literature. ;)

I reserve the right to add additional selection criteria for eligible Healers *and* Recipients before the commencement of the second and third rounds. If I find in the first round that, e.g., God hates atheists when it comes to whether they get better when prayed for, then I won't allow atheists as Recipients. :p Or if as you suggest, Odin is deaf as well as blind, then no Odin-worshippers as Healers.

As you say, all the data mining / cross-reference techniques that digithead has suggested would be useful for me to determine the optimum measure for the second and third rounds. And I fully agree that the effect size I can detect will be inversely proportional to the sample size.

All that says is that I will not be able to detect effects less than some particular size to a sufficient significance. That is acceptable to me, and inherent in the design.

Digithead - What I objected to about your statement was "loading". That is an active, volitional verb that implies that I would somehow be skewing the population of the active vs control groups.

The assignment would be done by a simple mechanism, e.g. the result of (rand(time()) % 2).
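How `rand(time()) % 2` behaves depends on the language it is written in, and general-purpose generators are not designed to be unpredictable. As a hedged alternative, here is a minimal Python sketch of the same independent 50-50 coin flip drawn from the operating system's cryptographically secure generator via the standard-library `secrets` module; the function names are hypothetical, and the site itself is described as Ruby on Rails, so this only illustrates the idea.

```python
import secrets


def assign_group():
    """Fair, independent 50-50 coin flip for one Recipient, drawn from the
    operating system's cryptographically secure generator."""
    return "active" if secrets.randbelow(2) == 1 else "control"


def assign_all(recipient_ids):
    return {rid: assign_group() for rid in recipient_ids}


groups = assign_all(range(1000))
print(sum(g == "active" for g in groups.values()), "of 1000 assigned to Active")
```

Independent flips make the two groups only approximately equal in size (for 1000 Recipients the Active count has a standard deviation of about 16), which is one reason to consider assigning exact halves instead.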

saizai
23rd August 2006, 11:14 PM
Also:
What happens if the prayer group does worse? Will he still claim they're statistically significant?

If they do worse by a sufficiently statistically significant amount: Yes. Though that would have rather different social consequences. ;)

Also, I make absolutely no a priori claim as to the magnitude of the effect. An effect of any magnitude, to within the significance desired (p<.001) would count as a positive result. Naturally, detecting a minute difference would require an impracticable number of participants, but that is the only constraint.
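A decision rule matching "any magnitude, in either direction, at p < .001" could look like the sketch below. The thread does not name a significance test; a two-sided permutation test on the mean Score is used here as one assumption-light option, and the function names are hypothetical.

```python
import random
from statistics import fmean


def permutation_p_value(active, control, n_perm=5000, rng=None):
    """Two-sided permutation test of the difference in mean Score.
    Under H0 the group labels are exchangeable, so we reshuffle them and count
    how often a shuffled difference is at least as large as the observed one."""
    rng = rng or random.Random()
    observed = abs(fmean(active) - fmean(control))
    pooled = list(active) + list(control)
    n_a = len(active)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:])) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)  # never exactly 0


def is_positive_result(active_scores, control_scores, alpha=0.001, **kwargs):
    """Positive if the groups differ in EITHER direction with p < alpha."""
    return permutation_p_value(active_scores, control_scores, **kwargs) < alpha


same = [50 + i % 7 for i in range(100)]
print(is_positive_result(same, same, rng=random.Random(0)))
```

With 5000 permutations the smallest attainable p-value is 1/5001, roughly 0.0002, so resolution near the 0.001 threshold is coarse; a real analysis would use many more permutations.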

Skeptic Ginger
23rd August 2006, 11:35 PM
I think that so long as prayer is assigned randomly, one can expect the covariates to be distributed randomly across the two groups. In this case, the covariates can be safely ignored.

I agree with digithead. You have a poor control group here and the sample bias is going to negate the validity of your results.

In order to eliminate sample bias you have to have a more random group of cancer victims. In this study the subjects are selecting themselves. You don't have a random sample, period. "Loading" has nothing to do with it.

The people who select themselves are very likely going to include more believers than non-believers. Unless you have a study showing that isn't the case, then I'll stick with that assumption. Show me otherwise and I'll change my mind.

Believers are more likely to have other people praying for them. So your control group will also likely be prayed for. Again, show me some evidence that isn't true and I'll change my mind.

But, all that aside, other studies have failed to show any effect of prayer on illness especially cancer outcomes so I doubt you'll see any significant results anyway.

saizai
24th August 2006, 12:11 AM
skeptigirl: I fully cede and agree to the point of participants being a self-selected, non-random subset of the general population of cancer victims.

However, I need to ask you to explain how that could possibly create a false positive difference between the active and control groups, since both are drawn from the same (admittedly self-selected) pool and assigned randomly (from that pool).

I also entirely agree that the control group will likely be prayed for, and challenge you to the same question on this point as well.

Your doubt is not an argument. :)

saizai
24th August 2006, 01:10 AM
Draft JREF M$C protocol for PrayerMatch experiment

Protocol:
SETUP
1. Two pools of people will be recruited by Claimant to participate in the study. These people will be unrelated to Claimant and of either of two kinds: Recipients or Healers. Both are termed Participants.

2. All Participants will have agreed in writing and electronic medium to the study design, affirmed their ability to give consent, agreed to provide signed copies of all their data submitted online, and sworn not to attempt to contact any other Participant until the conclusion of the round(s) in which they participate. All Participants will also fill out a brief survey. Recipients will sign a form as part of their initial response, a copy to be kept by their doctor, authorizing and requesting the doctor to notify us in the event of their death. **

3. Healers will be people of various faiths who have agreed to their duties as described here. Recipients will be people currently diagnosed with cancer as of the beginning of the round in which they participate. No Recipient may participate in more than one round. Healers may do so at their discretion.

4. Claimant reserves the right to add additional criteria for participation by Healers or Recipients or both, at Claimant's sole discretion, prior to the commencement of the second round. Such criteria may include, for example, inclusion or exclusion of particular faiths, ages, genders, or other tracked data. The criteria, once set, will be identical for rounds two and three. *

5. The study is divided into three rounds, each lasting approximately but no less than one year.

ROUNDS
6. Each round of the study will commence as soon as there are a sufficient number of Participants, N(R) and N(H) respectively. *

7. Upon commencement of a round, all participating Recipients will be randomly assigned by computer to either the Active or Control groups with a 50-50 chance of each. No Participant, nor any Experimenter who has potential contact with any Participant, will be allowed to have access to this data until after the Round is complete. This ensures that the experiment is a double blind. *

8. Once a month, each Healer will be randomly assigned a Recipient from the Active group to pray for. They will be reminded to do so once a week, and commit to doing so for at least five minutes per week, every week, for the duration of their participation. They may pray, and interpret the meaning of the word 'pray', in whatever manner their faith indicates is appropriate.

9. Healers will be told the first name, state, country, primary cancer location, and cancer type of their assigned Recipient and only their assigned Recipient. They will also be shown a digital photo of the Recipient, uploaded by the same, if available.

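10. Each Recipient may have more than one Healer assigned to them at a time; indeed, that is the intent. However, each Healer will only be assigned one Recipient at a time. That is, there is a one-to-many relationship between Recipients and Healers. Averaged over all Recipients this is N(H)/N(R) Healers each; since only the Active group (about half of the Recipients) receives Healers, each Active Recipient will have on average about 2N(H)/N(R).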

11. Once per month, each Participant will provide an update in the form of a brief survey composed of multiple choice or numeric questions and one freeform text response. The text will be used for annotative and illustrative purposes only. **

12. Once per quarter, each Recipient will provide an update from their doctor, composed likewise. **

ANALYSIS AFTER FIRST ROUND

13. Extant data will be analyzed after the first round and used to create a Score Equation. The Score Equation will be an unambiguously determinable equation resulting from the data collected, and provided in the form of a computer function that, with access to any particular Recipient's records and without access to information regarding which group (active or control) they were assigned to, outputs a real number between 0 and 100 inclusive.

14. This Score Equation will be set at the Claimant's sole discretion in advisement with statisticians of Claimant's choosing. Once set, it will not be changed for either the second or third rounds.

COMPLETION OF STUDY

15. The second round will be used as the "preliminary test" for JREF purposes. The third round will be the "final test". No difference in the Score Equation, participation criteria, or significance test will be permitted between second and third rounds.

16. The second and third rounds will proceed identically to the first as stated above, except for the addition of predetermined Score Equation and participation criteria (if any).

POSITIVE AND NEGATIVE RESULTS

17. A positive result is obtained when the Scores, as determined by the Score Equation, differ significantly between the Active and Control groups of Recipients.

18. "Differs significantly" is defined as any magnitude difference, whether positive or negative, with a statistical significance of p < .001. This shall not be changed between the second and third rounds.

19. A negative result is anything else.

20. Any Recipient who can be clearly demonstrated to have communicated with any other Participant, or who refuses to sign a statement saying that they have not, will have their data removed from the analysis for all purposes above.

WHAT CLAIMANT DOES NOT CLAIM AND WILL NOT ATTEMPT TO PROVE OR DISPROVE
* the existence of any deity
* what the mechanism of prayer is, if it works, or anything based on any particular theory of how prayer works
* whether any particular Recipient has been prayed for or not, or how much
* the ability to heal any particular Recipient
* any personal paranormal ability
* the effectiveness of any particular treatment other than the prayer gotten through being in the active group


* Items to be specified later:

1. N(R) and N(H): at conclusion of the first round, at Claimant's sole discretion
2. Additional criteria for participation: ditto
3. Specific computer security protocols: ditto, by mutual agreement between Claimant and a computer security professional of JREF's choosing. Security protocols for the first round will be at Claimant's sole discretion.
4. Score Equation: Ditto

** Data collected (may be revised at Claimant's sole discretion prior to second round):

General:
Name
Gender
Date of birth
Religion (eg Christian)
# days per month engaged in religious activity
# years practicing said religion
Ethnicity
Income
Country
State
Picture, self-submitted .jpg or convertible to .jpg
Belief in the efficacy of prayer, self-rated scale 1-10 (10 = complete faith)
# times per month praying for self
# people known to participant to be praying for participant
# times per month on average said people do so
# minutes on average participant is engaged in any one "prayer"
Preferred praying style, eg directed/undirected, alone/group, etc (choice from list)
Introversion-extroversion, self-rated scale 1-7 (1 = introvert 7 = extrovert)
# years practicing as professional faith healer, remote healer, reiki user, psychic, etc., if any

For Recipients monthly:
# days in medical treatment
Self-reported average pain (0-10, 0 = none)
Self-reported average quality of life (1-10, 10 = excellent)
# days taking cancer medication
$ cost of treatment
$ cost of treatment paid by recipient
Still alive? (t/f)

From doctor once:
Name
Institution name
License #
Phone #
Address
# years practicing medicine
# years practicing with cancer specifically

From doctor quarterly, patient's:
Current 95%-threshold life expectancy in months
Degree of metastasis, 1-10
Primary location of cancer
Type of cancer
% likelihood to recover from cancer into long-term stable situation, but cancer still present
% likelihood to go into full remission, i.e. cancer not detectable present
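The monthly pairing in clauses 8 and 10 can be sketched as below. Clause 8 only says "randomly assigned"; dealing shuffled Healers out to shuffled Active Recipients in turn, so that loads differ by at most one, is a design choice of this sketch rather than part of the protocol, and all names are hypothetical. Because only Active Recipients receive Healers, each gets about 2N(H)/N(R) of them when the groups are equal in size.

```python
import random
from collections import Counter


def monthly_healer_assignment(healer_ids, active_ids, rng=None):
    """Assign every Healer exactly one Active Recipient for the month.

    Healers and Active Recipients are both shuffled, then Healers are dealt
    out to Recipients in turn, so each Active Recipient gets either
    floor(H/A) or ceil(H/A) Healers. Control Recipients never appear."""
    rng = rng or random.SystemRandom()
    healers = list(healer_ids)
    recipients = list(active_ids)
    rng.shuffle(healers)
    rng.shuffle(recipients)
    return {h: recipients[i % len(recipients)] for i, h in enumerate(healers)}


# Hypothetical month: 30 Healers, 20 Recipients of whom 10 are Active.
active = [f"R{i}" for i in range(10)]
assignment = monthly_healer_assignment([f"H{i}" for i in range(30)], active)
print(Counter(assignment.values()))  # each Active Recipient gets 3 Healers
```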

saizai
24th August 2006, 01:21 AM
One thing I forgot: add a clause for having Recipients sign a statement asking their doctor to notify us in the event of their death, and leaving a copy of it with their doctor.

Also, doctor's name, institution, license #, and contact info will be asked, and the participant's physical address as well.

saizai
24th August 2006, 01:32 AM
And one more:

20. Any Recipient who can be clearly demonstrated to have communicated with any other Participant, or who refuses to sign a statement saying that they have not, will have their data removed from the analysis for all purposes above.

Cuddles
24th August 2006, 07:25 AM
I agree with digithead and skeptigirl here. Randomly selecting people from a small sample does not give an even distribution. For example, say you have 50 people with mild cancer and 50 people with severe cancer. You are very unlikely to get 25 of each in each group if they are randomly selected. If there is no effect from prayer, or only a small effect, this could still give a strong positive result. In this example, if 30 people with only mild cancer were assigned to the prayer group then this group would have much better results.

You definitely need to show that the control and test groups have similar demographics before analysing the results, and preferably before the trial starts.
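A simple way to "show that the control and test groups have similar demographics" is a covariate balance check. This minimal sketch computes the standardized difference in proportions for 0/1 covariates; the data layout and the "mild" flag are hypothetical, and the threshold of about 0.1 mentioned in the docstring is only a rule of thumb sometimes quoted. It is also an assumption of this sketch that publishing these aggregates, rather than individual assignments, would not break the blind.

```python
import math


def standardized_difference(p_active, p_control):
    """Standardized difference between two proportions, a common balance
    diagnostic for randomized groups (often judged against ~0.1)."""
    pooled_var = (p_active * (1 - p_active) + p_control * (1 - p_control)) / 2
    if pooled_var == 0:
        return 0.0
    return (p_active - p_control) / math.sqrt(pooled_var)


def balance_report(recipients, covariates):
    """recipients: dicts with a 'group' key ('active'/'control') plus 0/1 flags.
    Returns {covariate: standardized difference between the groups}."""
    report = {}
    for cov in covariates:
        rates = {}
        for g in ("active", "control"):
            members = [r[cov] for r in recipients if r["group"] == g]
            rates[g] = sum(members) / len(members)
        report[cov] = standardized_difference(rates["active"], rates["control"])
    return report


# Cuddles's example: 30 of the 50 mild cases land in the 50-person prayer group.
recipients = (
    [{"group": "active", "mild": 1}] * 30 + [{"group": "active", "mild": 0}] * 20
    + [{"group": "control", "mild": 1}] * 20 + [{"group": "control", "mild": 0}] * 30
)
print(balance_report(recipients, ["mild"]))
```

In that example the standardized difference is about 0.41, well above the 0.1 rule of thumb.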

Jeff Wagg
24th August 2006, 08:11 AM
He has been invited to apply. The protocol will need a lot of work, but I'll not be discussing that until after an application is received.

digithead
24th August 2006, 09:29 AM
Digithead - What I objected to about your statement was "loading". That is an active, volitional verb that implies that I would somehow be skewing the population of the active vs control groups.
The assignment would be done by a simple mechanism, e.g. the result of (rand(time()) % 2).

But that is exactly what you are doing by not accounting for the confounders of disease level, demographics, frailty, etc...

And it does not have to be volitional; it can be merely unintentional if you are not aware that you are doing it. For instance, if I load fruit onto a plane for distribution and unknowingly, an insect happens to be in some of the fruit, I am introducing something that potentially might ruin the fruit or harm the consumer...

However, If I am aware of these insects and inspect the fruit before loading (placing) them on the plane, I stand a better chance of preventing the harm from occurring...

But if I've been made aware of this problem and dismiss it as inconsequential (which it seems you are doing by all your rhetoric) then it's willful blindness...

As for your practical significance = statistical significance = 0.001: that's just silly, because then your results are based solely on sample size and not on a real scientific hypothesis placed within a theoretical construct. Irrespective of your hypothesis, if I were a reviewer and you were looking for funding for this type of study design, I'd place you on the denied pile because you're not even close to accounting for all of the sampling problems and confounders that will occur...

Seriously, seek the assistance of a Ph.D. biostatistician with clinical trial experience; they can help you if you really want a decent study design...

Startz
24th August 2006, 10:01 AM
Saizai:

Let me suggest three modifications to your protocol, inspired by comments of others on the board (notably DigitHead).

1. Recipients will be assigned two random code numbers, A and B. Recipients will be told code A and use it to report in. Healers will be told code B. In the files maintained in the database, pictures, names, etc. will be kept together with the code numbers in a form believed to be accessible by Higher Powers.

(This brings the experiment closer to double-blind, reducing the possibility of intentional or unintentional collusion.)
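Startz's two-code scheme can be sketched as follows, assuming the server keeps the mapping from each Recipient to both codes and that 8-hex-character codes are long enough; the function name and code length are hypothetical choices. Because A and B are drawn independently, a Healer holding code B learns nothing about the code A a Recipient reports with.

```python
import secrets


def issue_codes(recipient_ids, nbytes=4):
    """Give each Recipient two unrelated random codes: code A, which the
    Recipient uses when reporting, and code B, which is all a Healer sees.
    The codes are drawn independently, so knowing one says nothing about
    the other; only the server holds the mapping."""
    used = set()

    def fresh():
        while True:
            code = secrets.token_hex(nbytes)
            if code not in used:
                used.add(code)
                return code

    return {rid: {"A": fresh(), "B": fresh()} for rid in recipient_ids}


codes = issue_codes(["alice", "bob", "carol"])
print(codes)
```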

2. The scoring method used will be defined in a mechanical way. In other words, it will meet JREF's "no judgment" criteria.

3. Rather than have Round 1 be part of the Challenge, use it as a pre-test. Otherwise, JREF is agreeing to use of a test in the second and third rounds that you get to define unilaterally.

Would something like these be acceptable?

saizai
24th August 2006, 11:11 AM
digithead: I claim no a priori theory about the mechanism or effect of prayer. The first round is intended to find out (or tune) the latter; I will not attempt to look at the former. We are not going to discuss theory, just as you would not discuss it with a dowser.

Your insects analogy is inaccurate.

Startz: 1. This is unnecessary; a more traditional login-and-password scheme is much simpler and just as effective. They will also be assigned a random code number for the purposes of their printouts, to make hand-tallying those easier. Possibly a similar mechanism to allow doctors to report deaths.

2. Of course. I elect as the form of my definition a chunk of code to be run on the server.

3. It is a pre-test; you can consider it part of the negotiation. I consider the terms that I reserve for myself to define unilaterally before commencement to be completely acceptable to a skeptic's needs for the protocol no matter what I define them to be.
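A Score Equation delivered as "a chunk of code to be run on the server" (clause 13) might take the shape below. Everything here is hypothetical: the fields are taken loosely from the monthly survey list (still alive, quality of life 1-10, pain 0-10), but the equal weights and the rule that death scores 0 are invented purely for illustration. The point is the contract: the function sees outcome data only, never the group label, and always returns a number between 0 and 100 inclusive.

```python
def score(record):
    """HYPOTHETICAL Score Equation. `record` summarises one Recipient's outcome
    data and must not carry the Active/Control label. Returns a real number
    in [0, 100]."""
    if "group" in record:
        raise ValueError("the Score Equation must not see group assignment")
    if not record["alive"]:
        return 0.0
    qol = (record["quality_of_life"] - 1) / 9  # self-rated 1-10 -> 0..1
    pain = 1 - record["pain"] / 10             # self-rated 0-10 -> 1..0
    raw = 100 * (0.5 * qol + 0.5 * pain)       # equal weights, purely illustrative
    return min(100.0, max(0.0, raw))


print(score({"alive": True, "quality_of_life": 7, "pain": 3}))
```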

saizai
24th August 2006, 11:28 AM
Startz: I misinterpreted your #1.

However, it is still quite unnecessary for a study run by an automated website capable of handling relational databases (as mine is). I consent to allowing up to three computer specialists of JREF's choosing familiar with Ruby on Rails to examine my code to ensure that its mechanism is correct. I do not grant any right to the code, and it is not a part of the "data" to which JREF has a right as specified in the rules.

Yoink
24th August 2006, 11:38 AM
But that is exactly what you are doing by not accounting for the confounders of disease level, demographics, frailty, etc...

And it does not have to be volitional; it can be merely unintentional if you are not aware that you are doing it. For instance, if I load fruit onto a plane for distribution and unknowingly, an insect happens to be in some of the fruit, I am introducing something that potentially might ruin the fruit or harm the consumer...

However, If I am aware of these insects and inspect the fruit before loading (placing) them on the plane, I stand a better chance of preventing the harm from occurring...

But if I've been made aware of this problem and dismiss it as inconsequential (which it seems you are doing by all your rhetoric) then it's willful blindness...

As for your practical significance = statistical significance = 0.001: that's just silly, because then your results are based solely on sample size and not on a real scientific hypothesis placed within a theoretical construct. Irrespective of your hypothesis, if I were a reviewer and you were looking for funding for this type of study design, I'd place you on the denied pile because you're not even close to accounting for all of the sampling problems and confounders that will occur...

Seriously, seek the assistance of a Ph.D. biostatistician with clinical trial experience; they can help you if you really want a decent study design...

Digithead: I know that experimental design is incredibly tricky and that all sorts of apparently robust studies prove upon subsequent analysis to have hidden statistical flaws, so I'm not doubting that you may be putting your finger on a real problem here, but I'm struggling to imagine the sequence of events that will lead to a false positive if there is a reasonably large sample size.

Is the kind of scenario you are imagining something like this? Let us say that the sample is 100 people, and 10% are at Death's Door, 50% are Very Sick and 40% are Not So Sick: if this group is randomly divided it is pretty possible that we might get 8-2 division of the "Death's Door" patients into groups A and B, right (just under 5%, I think). Of course, that means there is a correspondingly greater likelihood of a few more "Very Sicks" in the B sample. I think we're getting into statistically insignificant probabilities to have one side have markedly fewer DD's AND VS's.

Is it your view that the higher number of DD's might so warp the experiment as to give a significant false positive?

So then is the solution to this a matter of "grading" the participants into cohorts and then modifying the randomly distributed groups to ensure rough parity within cohorts? How do you do this without either disrupting blindness or breaking randomness?
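Yoink's "just under 5%" can be checked exactly. Splitting 100 people into two random groups of 50 makes the number of "Death's Door" patients in group A hypergeometric, and Python's `math.comb` gives exact counts. A minimal sketch:

```python
from math import comb


def prob_at_least(k, marked, total, group_size):
    """P(a random group of `group_size` drawn from `total` people contains at
    least `k` of the `marked` people). This is the upper tail of a
    hypergeometric distribution."""
    denom = comb(total, group_size)
    return sum(
        comb(marked, j) * comb(total - marked, group_size - j)
        for j in range(k, min(marked, group_size) + 1)
    ) / denom


# Yoink's example: 10 "Death's Door" patients among 100, split 50/50.
p = prob_at_least(8, marked=10, total=100, group_size=50)
print(f"P(group A gets 8+ of the 10): {p:.4f}")

# Cuddles's example: chance of an exact 25/25 split of 50 mild cases.
balanced = comb(50, 25) ** 2 / comb(100, 50)
print(f"P(exactly 25 mild cases in each group): {balanced:.3f}")
```

The first probability is about 4.6%, so Yoink's estimate holds; an 8-2 split in either direction is twice that, about 9%. An exact 25/25 split in Cuddles's example happens only about 16% of the time.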

saizai
24th August 2006, 11:42 AM
20. Any Recipient who can be clearly demonstrated to have communicated with any other Participant, or who refuses to sign a statement saying that they have not, will have their data removed from the analysis for all purposes above. The final signoff rule will be excepted for Recipients who die in the course of the study, and who have previously signed their commitment not to contact other Participants.

WEBSITE

21. The entire study shall be run using an automated website programmed in Ruby on Rails. All assignments shall be done by this code, as shall the display of each Healer's current assignment, the online collection of survey results, and the presentation of copies for the Participants to sign and mail in. The code for the site may be examined by up to three computer science specialists of JREF's choosing to ensure that there are no errors, back doors, security flaws, etc. The databases containing Participant contact information, Healer-Recipient assignment pairs, and group assignment will be kept secure from reading by any parties other than the website code itself until the completion of the round(s) in which those Participants are involved. *

22. Claimant, and only Claimant, shall have continued read/write access to general pages of the site, e.g. the front page, FAQ, layout, CSS, etc.
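The revised clause 20 can be read as a small decision rule. This sketch assumes that "refuses to sign" is recorded simply as "did not sign the final statement"; the field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecipientStatus:
    contact_demonstrated: bool   # clearly shown to have contacted another Participant
    signed_pledge: bool          # signed the no-contact commitment at enrolment
    signed_final: bool           # signed the end-of-round "I have not" statement
    died: bool                   # died during the study


def keep_in_analysis(s: RecipientStatus) -> bool:
    """Clause 20 as revised: demonstrated contact always excludes; otherwise a
    Recipient needs the final signoff, except one who died after having
    signed the initial no-contact commitment."""
    if s.contact_demonstrated:
        return False
    if s.signed_final:
        return True
    return s.died and s.signed_pledge


print(keep_in_analysis(RecipientStatus(False, True, False, True)))
```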

saizai
24th August 2006, 11:46 AM
Also: A stratified random distribution would be acceptable to me, so long as I alone define the criteria for the subgroups. From each subgroup, assignment would proceed in the usual 50-50 random way.

And Yoink: Even if the sample size were small, it would still be unable to result in a false positive, because of the bound on significance for what constitutes a positive result.

saizai
24th August 2006, 11:51 AM
7. Upon commencement of a round, all participating Recipients will first be divided into subgroups based on specific criteria at the Claimant's sole discretion. Half of each subgroup will be randomly assigned by computer to the Active group; the other half will be in the Control group. If a subgroup has an odd number of Recipients, the last one will be assigned at random. No Participant, nor any Experimenter who has potential contact with any Participant, will be allowed access to this data until after the Round is complete. This ensures that the experiment is double-blind. *
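The stratified assignment in rule 7 can be sketched in a few lines. This Python version is an illustration only: the names are hypothetical, and the subgroup criteria are passed in as a function because the rule leaves them to the Claimant. Each subgroup is shuffled and split in half. An odd leftover member gets a coin flip, so Active and Control counts within any subgroup differ by at most one.

```python
import random
from collections import defaultdict

def stratified_assign(recipients, stratum_of, seed=None):
    """Split each subgroup (stratum) 50-50 into Active and Control.

    recipients: iterable of hashable Recipient ids.
    stratum_of: function mapping a Recipient id to its subgroup label.
    If a subgroup has an odd size, its leftover member is assigned by coin flip.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for r in recipients:
        strata[stratum_of(r)].append(r)

    assignment = {}
    for members in strata.values():
        rng.shuffle(members)  # random order within the subgroup
        half = len(members) // 2
        for r in members[:half]:
            assignment[r] = "active"
        for r in members[half:2 * half]:
            assignment[r] = "control"
        if len(members) % 2 == 1:
            assignment[members[-1]] = rng.choice(["active", "control"])
    return assignment

# Usage: 21 hypothetical Recipients in three subgroups of 7.
groups = stratified_assign(range(21), lambda r: r % 3, seed=2006)
```

In this design only the returned `assignment` mapping would need to be sealed until the round ends. The subgroup labels reveal nothing about group membership.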

Startz
24th August 2006, 12:25 PM
>snip

2. Of course. I elect as the form of my definition a chunk of code to be run on the server.


I don't think I understand whether JREF gets to see the code before signing the agreement.

3. It is a pre-test; you can consider it part of the negotiation. I consider the terms that I reserve for myself to define unilaterally before commencement to be completely acceptable to a skeptic's needs for the protocol no matter what I define them to be.

Does JREF get to see what statistical test is going to be used before they sign on?

digithead
24th August 2006, 12:38 PM
digithead: I claim no a priori theory about the mechanism or effect of prayer. The first round is intended to find out (or tune) the latter; I will not attempt to look at the former. We are not going to discuss theory, just as you would not discuss it with a dowser.

I'm not asking you for a theory, I'm asking you what is the theoretical difference that matters, just as I would ask a dowser what accuracy he or she would consider "proof" that dowsing works...


Your insects analogy is inaccurate.

If you expect others to back up their claims, then you should extend the same courtesy. Declarative statements are meaningless without proof...

saizai
24th August 2006, 01:43 PM
Startz: I don't think I understand whether JREF gets to see the code before signing the agreement.

They get to see the code before the commencement of round two, and verify that it in no way breaches double blind, installs a back door, accesses a recipient's assignment(s) status, or accesses Healers' data. They do not get to reject a proposed code based on any other reason.
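The restriction described here is that the scoring code never sees group assignments. This Python sketch shows one way to enforce it structurally, with hypothetical names throughout. The proposed score function only ever receives one Recipient's outcome record, and the sealed assignment is joined to the scores afterwards. It illustrates the data-access restriction only, not any other check a reviewer might want.

```python
from statistics import mean

def compute_scores(outcomes, score_fn):
    """Apply a proposed scoring rule to each Recipient's outcome record.

    outcomes maps recipient_id -> dict of survey fields and contains no group
    labels; score_fn receives a copy of one record at a time, so it never sees
    who is Active and who is Control.
    """
    return {rid: score_fn(dict(record)) for rid, record in outcomes.items()}

def active_minus_control(scores, assignment):
    """Join scores to the sealed assignment only after all scoring is done."""
    active = [s for rid, s in scores.items() if assignment[rid] == "active"]
    control = [s for rid, s in scores.items() if assignment[rid] == "control"]
    return mean(active) - mean(control)

# Usage with hypothetical survey data and a hypothetical scoring rule.
outcomes = {1: {"pain": 3}, 2: {"pain": 5}, 3: {"pain": 4}, 4: {"pain": 6}}
scores = compute_scores(outcomes, lambda rec: -rec["pain"])  # less pain = higher score
assignment = {1: "active", 2: "control", 3: "active", 4: "control"}
print(active_minus_control(scores, assignment))  # 2.0
```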

Does JREF get to see what statistical test is going to be used before they sign on?

Could you elaborate on what you mean by this, i.e. in what way the proposed protocol is underspecified?

Yes, they get to see all tests I decide on before the beginning of the second round.

digithead: I'm not asking you for a theory, I'm asking you what is the theoretical difference that matters, just as I would ask a dowser what accuracy he or she would consider "proof" that dowsing works...

To quote Startz: I completely agree that "scientifically important" is what matters. But I think that in this case 1e-308 is "scientifically important." [Of course, detecting an effect of that size would take a *real* large sample. ]

Your insects analogy is inaccurate because it makes this analogy:
there is a pool of subjects, S
a subset of the subjects are flawed ("insects"), Sf
the pool is divided randomly into two subsets, Sa and Sb. By random distribution, Sf should be equally distributed between Sa and Sb on a bell curve likelihood

Your analogy claims that the presence of Sf in S corrupts the entire pool S (by insects breeding and infecting other fruit). That has no analogue, and does not affect diff(f(Sa), f(Sb)).

digithead
24th August 2006, 02:14 PM
Startz:

Your insects analogy is inaccurate because it makes this analogy:
there is a pool of subjects, S
a subset of the subjects are flawed ("insects"), Sf
the pool is divided randomly into two subsets, Sa and Sb. By random distribution, Sf should be equally distributed between Sa and Sb on a bell curve likelihood

Your analogy claims that the presence of Sf in S corrupts the entire pool S (by insects breeding and infecting other fruit). That has no analogue, and does not affect diff(f(Sa), f(Sb)).

No, my analogy is not wrong. I was demonstrating that by being aware of possible confounders, one has to adjust for them or they are willfully ignoring the possible contamination...

And your methodology will corrupt the pool if there ends up being a higher proportion of Sf in one group than another. Your simple random sample methodology does not prevent this from happening...

And relying solely on the Gaussian distribution is risky sometimes. What if your outcome and independent variables are distributed binomial (or multinomial if there are different levels of confounding)? You have to stratify to make sure that these are accounted for. What if the underlying distribution is Cauchy or Gamma or Beta? Then you're going to need even more samples if you're going to rely on the central limit theorem...
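The post doesn't name a specific stratified method. One classical way to stratify a binary outcome (improved vs. not improved) is the Mantel-Haenszel pooled odds ratio. It compares Active with Control inside each stratum and then combines the strata. In the hypothetical counts below, prayer has no effect within either stratum, but the sicker stratum happens to be over-represented in the Active group. The pooled table then makes prayer look harmful (odds ratio about 0.67), while the stratified estimate is 1. This is a Python sketch under those assumptions, not a full analysis.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio of improvement, Active vs Control.

    Each stratum is a 2x2 table given as (a, b, c, d):
        a = Active & improved,   b = Active & not improved,
        c = Control & improved,  d = Control & not improved.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum contributes nothing
        num += a * d / n
        den += b * c / n
    if den == 0:
        return float("inf") if num > 0 else float("nan")
    return num / den

# Hypothetical counts: 10% improve in both arms of the sicker stratum,
# 60% improve in both arms of the healthier one.
death_door = (2, 18, 1, 9)
not_so_sick = (18, 12, 24, 16)
crude = mantel_haenszel_or([(20, 30, 25, 25)])  # both strata lumped together
stratified = mantel_haenszel_or([death_door, not_so_sick])
print(f"crude OR = {crude:.2f}, stratified OR = {stratified:.2f}")
```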

Hence why most clinical trials now use random and mixed effect generalized linear (and occasionally nonlinear) models with non-Gaussian link functions, because these allow you to risk-adjust both for heterogeneity within a group and across groups while using the correct underlying distribution...

Again, see a Ph.D. biostatistician with clinical trial experience; they can help you design a better study. And you will get better suggestions than by relying on criticism derived solely from a skeptics forum...

jskowron
24th August 2006, 02:49 PM
Saizai says

"They get to see the code before the commencement of round two, and verify that it in no way breaches double blind, installs a back door, accesses a recipient's assignment(s) status, or accesses Healers' data. They do not get to reject a proposed code based on any other reason."

This seems to be a potential sticking point to me, at least as far as the MDC is concerned. If it were my million bucks, I would not agree to automatically accept any formula. There are certainly conditions other than those listed in the above quote that would make the formula invalid and/or unreliable. For example (admittedly extreme), the code could be as follows:

if condition=control group, then score= 0
if condition=prayer group, then score= 100

Such a code does not breach double blind, install a back door, involve accessing a recipient's assignment(s) status, or involve accessing Healers' data. However, such a code would lead to invalid findings of a significant difference between the control and prayer group. Based on your conditions set forth above, JREF would not be able to reject the code.

Also-

Stats Point 1- how many statistical comparisons do you plan to make per round (i.e. how many p values will you be calculating)? If there is more than one planned comparison, you should adjust your alpha to account for the number of comparisons. For example, if you agree that p<.05 will be your significance level (alpha), and are doing three statistical comparisons, the significance level of each individual test would be p<.05/3, or p<.017. The more tests you do, the more likely you are to encounter type 1 errors (false positives). This procedure (the Bonferroni correction) corrects for this.

Stats Point 2- Depending on what statistics you use, testing the hypothesis "the prayed for group will have a higher (or lower) mean score than the control group" is different from testing the hypothesis "the prayed for group will have a different mean score than the control group." For statistical reasons, it may be advantageous to declare the expected direction of the difference.

Good luck. I can appreciate the process you're going through. There is many a Masters or Ph.D. candidate who thinks they have covered everything, only to find out (hopefully at the thesis/dissertation proposal meeting, rather than the defence) that they missed something.
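The adjustment described in Stats Point 1 can be sketched in Python. The function names are hypothetical; the p-values in the usage lines are made up. The family-wise alpha is divided by the number of planned comparisons, and each p-value is checked against that stricter threshold.

```python
def bonferroni_alpha(alpha, m):
    """Per-test threshold when m planned comparisons share a family-wise alpha."""
    if m < 1:
        raise ValueError("need at least one comparison")
    return alpha / m

def significant(p_values, alpha=0.05):
    """Flag each p-value against the Bonferroni-adjusted threshold."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return [p < threshold for p in p_values]

print(round(bonferroni_alpha(0.05, 3), 4))  # 0.0167
print(significant([0.01, 0.02, 0.04]))      # [True, False, False]
```

With a single planned comparison (m = 1) the threshold is just alpha, so the correction changes nothing.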

saizai
24th August 2006, 03:55 PM
jsk - I specifically said that the code would not have access to the person's group assignment status; your code does so. Please come up with another example of an unacceptable code.

Pt 1: There is only one comparison in the 2nd and 3rd rounds, that of the Score. The type 1 error potential is a necessary part of round 1.

Pt 2: Please elaborate. I specifically count a worsening in health among the active vs control group as a positive result, and will not cede that point. (Though I certainly don't want or expect it, it counts.)

Pt 3: Indeed.

Yoink
24th August 2006, 04:02 PM
Pt 2: Please elaborate. I specifically count a worsening in health among the active vs control group as a positive result, and will not cede that point. (Though I certainly don't want or expect it, it counts.)

One fairly obvious problem with accepting either "improvement" or "deterioration" as "positive" results is that it requires a larger sample size to protect against non-random distribution. If you are concerned only with, say, Group A doing better than Group B, then you have only one kind of error to worry about--that you don't get an unrepresentative number of super-ill people into Group B. But with your current set up, you also have to worry about the opposite (a non-random distribution of super-well people getting into Group B). As I see it, that would mean that you'll need something like double the sample size to meet the same confidence level. (A real statistician will no doubt correct me).
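The standard normal-approximation sample-size formula for comparing two means gives a rough sense of the cost of a two-sided design. This is an assumption, since the thread has not fixed the test. Under that formula, with 80% power and alpha = 0.05, a two-sided test needs roughly 27% more participants than a one-sided one, rather than double. The Python sketch below uses a hypothetical standardized effect size of 0.2. The ratio does not depend on that choice.

```python
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80, two_sided=True):
    """Approximate Recipients needed per group to compare two means.

    Normal-approximation formula n = 2 * ((z_alpha + z_beta) / d) ** 2,
    where d is the standardized difference (mean gap divided by the SD).
    Round the result up in practice.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    return 2 * ((z_alpha + z_beta) / effect_size) ** 2

d = 0.2  # hypothetical "small" effect
two = n_per_group(d, two_sided=True)
one = n_per_group(d, two_sided=False)
print(f"two-sided: {two:.0f} per group, one-sided: {one:.0f}, ratio {two / one:.2f}")
```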

petre
24th August 2006, 06:03 PM
I should add that JREF is likely to suggest that you do indeed simply perform your first trial on your own, and after you obtain the data you require then submit an actual application with specific criteria for success.

William Smith
24th August 2006, 06:58 PM
(Saizai) has been invited to apply. The protocol will need a lot of work, but I'll not be discussing that until after an application is received.

This will indeed eat up lots of energy.

Saizai, I will grant your "study" its deserved time - after your application got accepted. I just see question marks popping up all over the place due to experiences with wanna-be-claimants on this very forum. No offense. Really.

Gr8wight
24th August 2006, 08:39 PM
Draft JREF M$C protocol for PrayerMatch experiment

Protocol:
SETUP
1. Two pools of people will be recruited by Claimant to participate in the study. These people will be unr...

saizai,

Do you believe that prayer is efficacious in the healing of disease?

saizai
24th August 2006, 10:55 PM
petre - That's what the first round is.

GK - What sort of questions that relate to your experiences with other "wanna-be-claimants"? I feel I've been quite forthright, rational, comprehensible, and sane...

Gr8wight - As I said on the About page (did you read it?), I am agnostic on the subject, as I have not seen sufficiently convincing evidence either way. I am conducting this study purely out of a desire to investigate it. My religious/spiritual beliefs in no way require that prayer work, or that it work in this manner.

Cuddles
25th August 2006, 04:07 AM
I think a very important question is how many people do you plan to have? If there is a small sample size (only 10-20 patients) then Digithead is entirely right that the random selection will not give an equal distribution and your results will be meaningless unless you account for this. If you have a very large sample size (1000 people) then your method should be OK. It seems very unlikely that you will get this many people for a preliminary trial. Unless you tell us how many people you will actually be using we have to assume the worst.
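How much sample size matters can be computed exactly. Suppose, hypothetically, that 30% of the pool is "very sick" and the pool is split evenly at random. The Python sketch below gives the exact probability that the share of very sick people in the two groups differs by at least 10 percentage points. It is likely at 20 people, common at 100, and rare at 1000.

```python
from fractions import Fraction
from math import comb

def prob_imbalance(pool_size, sick_count, threshold):
    """P(|share of sick in A - share of sick in B| >= threshold) for an even
    random split of pool_size people (pool_size assumed even)."""
    half = pool_size // 2
    favourable = 0
    for k in range(0, min(sick_count, half) + 1):  # k sick people land in group A
        diff = Fraction(abs(k - (sick_count - k)), half)
        if diff >= threshold:
            favourable += comb(sick_count, k) * comb(pool_size - sick_count, half - k)
    return favourable / comb(pool_size, half)

for n in (20, 100, 1000):
    p = prob_imbalance(n, int(n * 3 / 10), Fraction(1, 10))
    print(f"pool of {n:4d}: P(shares differ by >= 10 points) = {p:.4f}")
```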

jskowron
25th August 2006, 06:14 AM
Saizai-

Sorry- I misunderstood your conditions when giving my example. You're correct, my formula would require the experimenter to not be blinded to condition, which you say is a valid reason for rejection of the formula. How about this extreme example-

If round 1 score > overall mean, then new score = 100
If round 1 score <or= overall mean, then new score= 0

This type of formula could potentially take any small differences that may exist between groups and magnify them to the point of being statistically significant.

(Please be clear that I am not accusing you of being a scammer when I present these extreme examples. However, your inclusion of the phrase "They do not get to reject a proposed code based on any other reason" is a little strange [and unscientific] and hints, to me at least, at mistrust or even duplicity. Surely you recognize that scammers are out there, and the MDC makes JREF a target. I cannot imagine JREF will agree to a formula for calculating a score without knowing what it is.)

As for the directionality of difference between groups- You seem to be proposing a test of the difference between 2 group means. The most basic statistical procedure used to do so is a t-test. The t statistic is normally distributed (i.e. when all possible values of t are graphed, the graph takes on a bell-curve shape). To simplify things- if you hypothesize that the prayer group score will be higher than the control, p=.05, you are statistically saying that the t score will fall in the right 5% end (tail) of the bell curve (this is known as a 1-tailed test). If you hypothesize simply that there will be a difference between the means, p=.05, you are statistically saying that the t score will fall in either the right 2.5% or the left 2.5% (a 2-tailed test). As can be seen, a smaller difference in the means (with prayer>control) will be considered significant in the one-tailed test, but not in the two-tailed. If there is any theoretical basis to do so, a one-tailed test should be employed. While you might not care about type 2 error (not noticing a positive result when there is one), it is proper statistical procedure to minimize both type 1 and type 2. Regardless, any statistical program (or t-test distribution chart, if you're calculating the statistic by hand!) will require you to specify ahead of time whether your test is 1-tailed or 2-tailed.
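The one-tailed vs. two-tailed cutoffs at alpha = .05 can be computed directly. This Python sketch uses the standard normal distribution as a large-sample approximation; for small samples, the t-distribution's critical values are somewhat larger. Any observed statistic between the two cutoffs counts as significant one-tailed but not two-tailed.

```python
from statistics import NormalDist

def critical_z(alpha, tails):
    """Critical value of the standard normal for a one- or two-tailed test."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    return NormalDist().inv_cdf(1 - alpha / tails)

print(round(critical_z(0.05, 1), 3))  # 1.645
print(round(critical_z(0.05, 2), 3))  # 1.96
```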

sorry for all the stats geek stuff, but once you go down the road of using inferential statistics, it is important to follow all the road signs.

Gr8wight
25th August 2006, 06:56 AM
Gr8wight - As I said on the About page (did you read it?), I am agnostic on the subject, as I have not seen sufficiently convincing evidence either way. I am conducting this study purely out of a desire to investigate it. My religious/spiritual beliefs in no way require that prayer work, or that it work in this manner.


In that case, I think you are in the wrong place. The JREF is not a research foundation, and the challenge is not a vehicle for scientific inquiry. The challenge is aimed at those who make a claim, not those who just want to find out. You should be pursuing this at a university somewhere.

Ladewig
25th August 2006, 07:25 AM
Prayer is one of those things that is probably impossible to test in a rigorous double-blind experiment without doing things that no ethics review board would ever allow (lying to patients about whether or not they were being prayed for, and lying to "pray teams" about whether or not the people they're praying for are really sick etc.)


Why would no ethics board approve that? How is it different from giving placebos to sick people?

Cuddles
25th August 2006, 07:28 AM
Why would no ethics board approve that? How is it different from giving placebos to sick people?

Placebos generally aren't given for serious diseases. New treatments are compared with accepted treatment, but not with a placebo. Deliberately withholding treatment from a potentially fatal disease like cancer would be considered serious misconduct, and possibly murder, if there is any treatment available that gives a better survival rate than placebo.

saizai
25th August 2006, 10:29 AM
jsk - I understand your concern. I have no intent to scam JREF; I simply want to limit their decision capacity to the minimum necessary to ensure an acceptable protocol. This means being specific about the reasons they can reject a suggestion of mine.

I'm afraid I don't understand your second example either. How does that introduce a false positive? All it would say is "above average" or "below average"; statistically that should be 50-50. No recipients are transferred from round to round...
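One way to check whether a proposed scoring rule inflates false positives is simulation. Generate data with no real difference between groups, apply the rule, run the planned test, and count how often the result comes out "significant". The Python sketch below does this for the above/below-mean rule in jskowron's second example. It assumes normally distributed raw scores, 50 Recipients per group, and a two-proportion z-test on the resulting 0/1 scores. All of these are assumptions, since the thread does not fix the analysis. If the rule is harmless under the null, the printed rate should sit near the nominal 5%; a rate well above that would indicate the inflation jskowron is concerned about.

```python
import random
from statistics import NormalDist, fmean

def dichotomize_at_mean(scores):
    """The example rule: 1 if a score is above the overall mean, else 0."""
    cut = fmean(scores)
    return [1 if s > cut else 0 for s in scores]

def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value of the pooled two-proportion z-test."""
    p_pool = (x1 + x2) / (n1 + n2)
    se = (p_pool * (1 - p_pool) * (1 / n1 + 1 / n2)) ** 0.5
    if se == 0:
        return 1.0
    z = (x1 / n1 - x2 / n2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def null_false_positive_rate(n_per_group=50, n_sims=2000, alpha=0.05, seed=1):
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Under the null both groups come from the same distribution.
        active = [rng.gauss(0, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        flags = dichotomize_at_mean(active + control)
        x_active = sum(flags[:n_per_group])
        x_control = sum(flags[n_per_group:])
        if two_proportion_p_value(x_active, n_per_group, x_control, n_per_group) < alpha:
            hits += 1
    return hits / n_sims

print(f"false-positive rate under the null: {null_false_positive_rate():.3f}")
```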

Thank you for your clarification re 1 vs 2 tailed analysis. I'll have to take that into consideration and answer later.

gr8wight - To the contrary, I believe that the challenge *is* a vehicle for scientific inquiry. JREF in this case is taking its usual stance as a skeptic; I am taking the place of a challenger who does not know if what he is claiming is real or not. This is hardly unusual. There is nothing in the challenge rules that says that I have to have a fervent belief in what I claim; I can do so purely for experimental reasons. I "claim" that the null hypothesis is false, for the purposes of this discussion. That is not different in principle from any other claim.

Not to mention, it is explicit in the challenge rules that JREF is not interested in my beliefs, theories, or anything of the sort - only in whether it works or not. I likewise am not interested in discussing my beliefs or theories. :)

yoink - Could you please explain what about my setup (namely, remote prayer) intrinsically makes it impossible to have rigorous double blinding without the need to lie to anyone?

saizai
25th August 2006, 10:30 AM
Cuddles - Nobody is suggesting replacing standard treatment with prayer. It is only to be used as a supplement.

William Smith
25th August 2006, 10:43 AM
...
GK - What sort of questions that relate to your experiences with other "wanna-be-claimants"? I feel I've been quite forthright, rational, comprehensible, and sane...
...


Sorry saizai, that came out all wrong on my part.

So far, your appearance seems indeed "forthright, rational, comprehensible and sane". What I had in mind were the "here today and gone tomorrow" approaches of the Mental Professor, supernaturalbeing and Neutralize in particular.

If I choose to enter a discussion with a potential Applicant, it mostly means I will take an all-out/full-on/whole-nine-yards approach and also become involved on an emotional level.
I spend a good part of my day with reflections on how to improve the protocol or solve the problems at hand. By choice, of course.

Should you apply, saizai, your claim together with your approach so far seem worth devoting time and energy to. The discussion in this thread shows others think alike.

saizai
25th August 2006, 10:52 AM
GK - I see. All I can do is point to my existing commitment (namely, the website which I am programming and is up, and the associated costs) and say that I have no intention of vanishing without completing at least one round. If that one round shows absolutely no discernable effect, then I'll either need to repeat with a larger sample, change what info I'm tracking (hopefully I'll have been comprehensive), or admit defeat. In any case, the protocol would need to be figured out first, so your input would not be wasted.

To be perfectly honest, I am in no particular hurry to submit the official application, since JREF's involvement doesn't occur until after the first round, and it is likely to take some time to recruit enough participants to begin. That is the only reason.

Yoink
25th August 2006, 11:01 AM
Why would no ethics board approve that? How is it different from giving placebos to sick people?

Nobody is given a placebo unless they agree to enroll in a study. They are informed in advance that they will be given treatment that will be either the experimental medicine or the placebo and that neither they nor the doctor treating them will know which they are receiving.

No ethics board would ever, under any circumstances whatsoever, agree to giving patients a placebo while telling them that it is actually an effective medicine.

Yoink
25th August 2006, 11:09 AM
yoink - Could you please explain what about my setup (namely, remote prayer) intrinsically makes it impossible to have rigorous double blinding without the need to lie to anyone?

saizai, you misread my post. As far as I can see, with a large enough sample your protocol works fine (I think it is set up in such a way that it needs a very large sample, and you may find it difficult recruiting enough participants). My point was a propos the fact that when you get a negative result, believers in prayer will simply dismiss this on the basis that the link between prayers and prayees was too tenuous. I was suggesting that the only way to set up a test that might actually shake the faith of believers would be to engage in active deceit--deceit of the kind that no ethics board would approve.

This, of course, is a common problem in any scientific question devoted to human behaviour: there are all sorts of social questions that could be answered by raising humans in isolated cages or Truman Show like virtual worlds and performing "clean" experiments on them--but, of course, these are not acceptable in the real world. I wasn't knocking your design, just making a general observation about the limitations of studies of this kind on human subjects.

saizai
25th August 2006, 11:11 AM
Yoink - You're wrong. Patients are told that they may be given a placebo, and that the treatment is experimental. Your argument that they should not be given a medicine that hasn't been proven effective is circular. The only exception is that medicines that act biologically must be proven in more "disposable" organisms (eg rats) to not be harmful - not necessarily to have a positive effect (since humans are more complex).

saizai
25th August 2006, 11:16 AM
Yoink - I see. Yes, believers could do so; I try to mitigate that as much as I can think of (by providing photo, first name, gender, age, state, country, primary cancer location, and cancer type) but of course it is inevitable. Nevertheless, there are plenty of people who claim that remote, intercessionary prayer of this sort works. It's acceptable to require a retreat to claiming that only in-person prayer, or prayer for a person one has contact with, works. I have no particular interest in "shaking the faith of believers"; I am not and will not be a proselytizer of any faith or lack thereof.

Nevertheless I still don't think I understand what sort of "faith-shaking" deceit-inclusive study you are thinking of. Could you detail a couple specific examples?

I don't think your analogy is really accurate - the comparisons are rather extreme. :p

digithead
25th August 2006, 11:25 AM
Saizai-

As for the directionality of difference between groups- You seem to be proposing a test of the difference between 2 group means. The most basic statistical procedure used to do so is a t-test. The t statistic is normally distributed (i.e. when all possible values of t are graphed, the graph takes on a bell-curve shape). To simplify things- if you hypothesize that the prayer group score will be higher than the control, p=.05, you are statistically saying that the t score will fall in the right 5% end (tail) of the bell curve (this is known as a 1-tailed test). If you hypothesize simply that there will be a difference between the means, p=.05, you are statistically saying that the t score will fall in either the right 2.5% or the left 2.5% (a 2-tailed test). As can be seen, a smaller difference in the means (with prayer>control) will be considered significant in the one-tailed test, but not in the two-tailed. If there is any theoretical basis to do so, a one-tailed test should be employed. While you might not care about type 2 error (not noticing a positive result when there is one), it is proper statistical procedure to minimize both type 1 and type 2. Regardless, any statistical program (or t-test distribution chart, if you're calculating the statistic by hand!) will require you to specify ahead of time whether your test is 1-tailed or 2-tailed.

sorry for all the stats geek stuff, but once you go down the road of using inferential statistics, it is important to follow all the road signs.

You need to go back to Stats 101. When you're using t-tests, you're testing the null hypothesis, that is, that there is no difference in the means of two groups. The alternative hypothesis only comes into play if you reject the null. Your alpha level, 0.05, sets the acceptable Type I error rate and the rejection region for a particular test. If the observed t-score is in the rejection region, you conclude there is a statistically significant difference; otherwise you conclude that you "failed" to reject the null hypothesis. You are not "statistically saying" that there is a difference when you specify the null hypothesis; you are merely constructing a logical framework in which to test whether or not you have enough evidence to conclude that there is an effect, and to place the error bounds on that evidence...

And the t-distribution is not a Gaussian distribution. Its limiting distribution is Gaussian when your sample size is large, but it is definitely not Gaussian...
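The framework described here can be sketched in Python: compute the test statistic under the null of equal means, then reject only if it falls in the rejection region. The sketch uses Student's pooled-variance t (an equal-variance assumption) and takes the critical value as an input, to be read from a t table for the relevant degrees of freedom. The data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(group_a, group_b):
    """Student's two-sample t statistic with pooled variance."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    se = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(group_a) - mean(group_b)) / se

def reject_null(t_stat, critical_value):
    """Two-sided rule: reject H0 (equal means) only if |t| lands in the rejection region."""
    return abs(t_stat) > critical_value

# Hypothetical scores, 5 per group, so df = 8; the two-sided 5% critical value is 2.306.
a = [10, 11, 12, 13, 14]
b = [1, 2, 3, 4, 5]
t = pooled_t(a, b)
print(t, reject_null(t, 2.306))            # 9.0 True
print(reject_null(pooled_t(a, a), 2.306))  # False: fail to reject
```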
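The convergence can be shown numerically. This Python sketch compares the Student t density with the standard normal density over [-5, 5]. The t density has a lower peak and heavier tails, and the largest gap shrinks as the degrees of freedom grow. The grid and the df values are arbitrary illustration choices.

```python
from math import exp, lgamma, log, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom (log-gamma avoids overflow)."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))

def normal_pdf(x):
    """Standard normal density."""
    return exp(-x * x / 2) / sqrt(2 * pi)

grid = [i / 10 for i in range(-50, 51)]
for df in (1, 5, 30, 1000):
    gap = max(abs(t_pdf(x, df) - normal_pdf(x)) for x in grid)
    print(f"df={df:5d}  max |t - normal| on [-5, 5] = {gap:.5f}")
```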

Yoink
25th August 2006, 11:57 AM
Yoink - You're wrong. Patients are told that they may be given a placebo, and that the treatment is experimental. Your argument that they should not be given a medicine that hasn't been proven effective is circular. The only exception is that medicines that act biologically must be proven in more "disposable" organisms (eg rats) to not be harmful - not necessarily to have a positive effect (since humans are more complex).

You either misread my post or are answering someone else's post. I never made anything like the claim you are attributing to me here.

ETA: here was what I said, just to remind you:

Nobody is given a placebo unless they agree to enroll in a study. They are informed in advance that they will be given treatment that will be either the experimental medicine or the placebo and that neither they nor the doctor treating them will know which they are receiving.

No ethics board would ever, under any circumstances whatsoever, agree to giving patients a placebo while telling them that it is actually an effective medicine.

saizai, that's twice now that you've completely misread posts of mine that were, as far as I can see, pretty straightforwardly phrased. I'd appreciate it if you would take just a second or two longer in reading my comments before firing off utterly irrelevant "rebuttals."

saizai
25th August 2006, 11:59 AM
Yoink: No ethics board would ever, under any circumstances whatsoever, agree to giving patients a placebo while telling them that it is actually an effective medicine.

Please explain how that is analogous to any prayer study setup, in which the subjects are in no way being encouraged to discontinue standard treatment.

Or explain how it is not analogous to a standard clinical trial of a medicine that has not yet been proven effective.

Yoink
25th August 2006, 12:11 PM
Yoink:

Please explain how that is analogous to any prayer study setup, in which the subjects are in no way being encouraged to discontinue standard treatment.

Or explain how it is not analogous to a standard clinical trial of a medicine that has not yet been proven effective.

O.K. saizai--let's make that three times that you have failed simple reading comprehension. I was quite impressed with your experiment when I first joined this thread, but now I have grave doubts about the experimenter.

I was answering ladewig, post #80. Go look at it. You will see that he quotes the post of mine that s/he was responding to. You might then, just possibly, realize that I was talking about a different hypothetical experiment that was NOT THE ONE YOU ARE PROPOSING.

Once you have got that through your extraordinarily dense skull you might see that my comment about what an ethics board would not approve does not refer to the experiment that you propose, which I believe would pass muster with an ethics board.

And then, finally, you might go back to my last post, or the post before that, and realize that I was accurately describing what an ethics board would approve: i.e. enrolling patients openly in a study and telling them that they will receive either a placebo or the experimental medicine. You might also realize that I never once intimated that your experiment would fail to meet this test.

I will not hold my breath, however, as I wait to see if you manage to figure any of this out. Sheesh.

saizai
25th August 2006, 12:28 PM
Yoink - There is no need to shout, be rude, or insult me. Please refrain from doing so, as I at least am attempting to engage in a more civil discourse.

ladewig was responding to your post on the first page:
The only other thing I will say is that when you get a negative result (yes, I'll take that bet) proponents of healing-through-prayer will simply say "but how can prayers to someone identified only via first name and a number on a website possibly get through?" Prayer is one of those things that is probably impossible to test in a rigorous double-blind experiment without doing things that no ethics review board would ever allow (lying to patients about whether or not they were being prayed for, and lying to "pray teams" about whether or not the people they're praying for are really sick etc.)

Indeed, you are discussing not only the experiment I am proposing, but *all* experiments about prayer. (Mine being a test of prayer that is intended to be rigorously double-blind...) Your statement is that "prayer is one of those things that is probably impossible to test in a rigorous double-blind experiment " without the use of deception that harms the subjects. I am asking you to defend that statement, because I believe it to be false.

I am sorry that you feel the need to resort to ad hominem attacks instead of responding to my request.

I also suggest that you be somewhat more circumspect about judging my intelligence or character, as you may be unpleasantly surprised when you are wrong.

Yoink
25th August 2006, 12:46 PM
Yoink - There is no need to shout, be rude, or insult me. Please refrain from doing so, as I at least am attempting to engage in a more civil discourse.

ladewig was responding to your post on the first page:


Indeed, you are discussing not only the experiment I am proposing, but *all* experiments about prayer. (Mine being a test of prayer that is intended to be rigorously double-blind...) Your statement is that "prayer is one of those things that is probably impossible to test in a rigorous double-blind experiment " without the use of deception that harms the subjects. I am asking you to defend that statement, because I believe it to be false.

I am sorry that you feel the need to resort to ad hominem attacks instead of responding to my request.

I also suggest that you be somewhat more circumspect about judging my intelligence or character, as you may be unpleasantly surprised when you are wrong.

saizai, you have demonstrated both an inability to understand my fairly straightforward posts, and now an inability to apologize for consistently misrepresenting them (I notice you've just ignored the post I originally objected to: the one where you referred to "[my] argument that [patients] should not be given a medicine that hasn't been proven effective" and pointed out that this claim that I had never made "is circular.")

As for the post you do reference: surely even you can see that I am saying "your test is fine, but it will return a negative, and that negative will be immediately dismissed--on perfectly reasonable grounds--by those who claim that prayer is in fact effective." I then go on to say that to do a test that might persuade believers you would need to do an unethical study that involves active duplicity. Since you seem hell bent on not understanding my point, I will try to spell it out for you further:

The problem with your experiment is that in order to maintain double-blindedness, it has to actually interfere with the normal mechanisms of "prayer treatment." That is, prayer that is claimed to be effective is prayer that involves people praying for people they know, and the people who are being prayed for being fully aware that they are being prayed for by those people.

To do a rigorous test of this "normal" prayer would require deceiving people: you would have to have some people believing that they were being prayed for by their families, where in fact you've, say, deliberately kept the family misinformed about that person's illness (you say that they're on an extended work assignment overseas, say).

So, I was not making a point that in any way criticized your experimental design, except to say that it was, of necessity, designed in such a way that what was being tested was not really "prayer healing" as any normal practitioner of "prayer healing" would normally conceive it.

digithead
25th August 2006, 12:54 PM
Saizai,

It is you that is being rude and discourteous.

I, Yoink, and others have given you valid, empirically and scientifically based criticisms of your study design, and you shrug them off as inconsequential.

How many clinical trials have you performed? If the answer is 0, then you should be consulting a biostatistician to help you set up your experiment so that you avoid all of the pitfalls. If the answer is greater than 0, then you would know that you should be consulting a biostatistician to help you set up your experiment so that you avoid all of the pitfalls.

Please seek assistance if you really want your study to have any scientific validity. Otherwise, you're just posturing and preening for your own amusement, and it's pointless for anyone to continue to answer your questions or engage you in any discourse.

-digithead

saizai
25th August 2006, 01:20 PM
Yoink: I agree with your rephrasing. I believe that your previous post, as quoted, did not say that, but we can chalk it up as immaterial and based on a mutual misunderstanding.

I think the more limited version of prayer you suggest would be more difficult to test, mainly not because you'd have to lie, but because you'd have to drastically constrain the contact between the parties involved for long enough to notice any results (most likely months).

I also disagree that your limited view of how people conceptualize prayer (i.e. that it only works for people one knows and is in contact with) accurately reflects most theological conceptions of the mechanism of prayer, which would more likely claim that those constraints are unnecessary.

digithead: Please explain how my prior clinical experience or lack thereof negates the rudeness of Yoink's comments, e.g. "your extraordinarily dense skull".

jskowron
25th August 2006, 01:41 PM
You need to go back to Stats 101

Whoa, take it easy. I assumed I was explaining to someone who has not taken Stats 101, so I took some liberties with my explanation. That said, there is no excuse for my saying that the t distribution is normal. It would have sufficed to say it was bell-shaped, with a lower peak and more of its distribution in the tails than the standard normal distribution. Certainly, with a low sample size (e.g. <40), the sampling distribution is unlikely to be normally distributed.

My error is, I think, minor as far as the OP is concerned, but major as far as an accurate representation of sampling distributions and an understanding of the central limit theorem are concerned. Thanks for pointing it out.

As far as my use of the phrase "statistically saying......", I simply found it clearer than saying that, "if the null hypothesis were true, you would expect results at least as extreme as the ones you obtained only 5 times out of 100." It is, in fact, a peeve of mine when people equate the p-value with the percentage chance that the alternative hypothesis is true. I was trying to explain my point, as requested by Saizai, in as simple a manner as possible. I don't think my shortcut explanation negates the fact that if he has a theoretical reason to use a one-tailed test, he should do so.

saizai
25th August 2006, 01:47 PM
jsk - How about simply saying "it tends towards a normal distribution", i.e. as a limit?
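As a quick numerical check on the two points above, here is a minimal Python sketch (stdlib only; the function names are illustrative) that evaluates the Student t density against the standard normal density. For small degrees of freedom the t density sits below the normal at zero and above it in the tails; as the degrees of freedom grow it approaches the normal curve, which is the limiting behavior saizai describes.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    # lgamma avoids the overflow math.gamma would hit for large df
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def normal_pdf(x: float) -> float:
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


# Compare the peak (x = 0) and a tail point (x = 3) as df increases.
for df in (1, 5, 30, 1000):
    print(f"df={df:>4}: t(0)={t_pdf(0, df):.4f}  t(3)={t_pdf(3, df):.5f}")
print(f"normal:  z(0)={normal_pdf(0):.4f}  z(3)={normal_pdf(3):.5f}")
```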

digithead
25th August 2006, 02:49 PM
digithead: Please explain how my prior clinical experience or lack thereof negates the rudeness of Yoink's comments, e.g. "your extraordinarily dense skull".

He is venting his frustration at your willful ignorance of the problems that have been pointed out to you, but that's irrelevant...

And can you explain how your lack of prior clinical experience might lead you to design a poor study?

And how you blithely dismiss the concerns that have been raised as inconsequential when you've been given ample evidence that they're not?

You can't be helped if you're not willing to listen and take heed of some sound scientific advice...

saizai
25th August 2006, 02:59 PM
"Willful" ignorance is a pretty strong claim. Are you psychic?

Certainly, I understand the benefits of experience, which is why I consult others. I don't feel that this in any way makes it okay for someone to insult me, nor for you to turn that into an ad hominem attack as well.

If you have comments about the study design, please restrict them to the methodology itself rather than making comments about me.

Yoink
25th August 2006, 03:02 PM
Saizai, for what it's worth I do apologize for the intemperance of my post. It is frustrating, though, to be posting essentially sympathetic and would-be-helpful comments and to get a series of hostile responses that are based on arbitrary misreadings of one's posts.

In general it seems to me that you are wedded to your design, and look at any and all criticism as simply something that needs to be explained away. As it happens, I think your design--with a large enough sample--will probably work well enough to generate some kind of meaningful outcome (one to which the anti-prayer-healing folk will say "see?" and to which the pro-prayer-healing folk will say "well, of course, for prayer to really work you need such-and-such and this-and-so"). On the other hand, I think you'd probably make an even better study if you were a little more open to constructive criticism and a little less wedded to defending your current design.

saizai
25th August 2006, 03:52 PM
Yoink - Thank you for your apology.

I am only wedded to the design insofar as I want to be able to test prayer in a feasible way, and do not want to claim any personal ability nor that the effect will be very large or completely reliable. Can you think of any other method? Can you think of any way to make the protocol I've described laxer (but still rigorous) to encompass more pro-prayer viewpoints? The only thing I can think of would be to allow contact or restrict to known people, and of course that as you have pointed out would be excessively difficult to run. So this is the only design I feel adequately meets my desired criteria.

I am a skeptic, too, you see. I will defend the design though I am perfectly willing to be convinced otherwise... but as with others' comments that are obviously not flaws though they were claimed to be (eg "what if someone doesn't pray? what if they have other people praying for them?"), I'm not going to blindly accept criticism that isn't properly justified.

saizai
25th August 2006, 09:43 PM
FYI: I'm doing a server move now and debugging the result (why is it that whenever you have something that works on one server, it breaks when you move it to another?).

Anywho, hopefully it should be functional soon - and on a server that can actually have reliable uptime and load capacity, unlike my laptop.

saizai
26th August 2006, 11:06 AM
*grumbles about Apache and capistrano and such*

saizai
26th August 2006, 11:27 AM
FYI: The main site is not working yet, BUT http://forums.prayermatch.org is.

I've added a Skeptic's Corner forum on it, and will offer moderator status there to anyone who is a moderator here (PM me here with your username if you want me to mod you.)

You may notice some other parallels to the JREF forums. ;)

digithead
26th August 2006, 03:46 PM
"Willful" ignorance is a pretty strong claim. Are you psychic?

Certainly, I understand the benefits of experience, which is why I consult others. I don't feel that this in any way makes it okay for someone to insult me, nor for you to turn that into an ad hominem attack as well.

If you have comments about the study design, please restrict them to the methodology itself rather than making comments about me.

Well, since all of my posts have been about your study and your continued dismissal of concerns that quite a few of us have raised, it appears to be "willful" ignorance to me...

Am I psychic? No. But I at least understand what ad hominem means, which is to attack the person rather than their ideas or actions. Since I characterized your actions, which are fair game, it wasn't ad hominem. I didn't call you names or suggest that you're stupid; I merely made the observation that you are willfully ignoring all of the problems with your study that others and I have raised...

And since you don't want to listen, I'll wish you luck in your study and be done with the matter...

Startz
26th August 2006, 09:16 PM
I'd like to speak up in saizai's defense. Mind you, I think he has exactly the same chance of winning the million as I do of pulling a rabbit out of my hat. And I have questions about whether the protocol is secure. But he's asked a couple of times for clarification of why his randomization procedure is an inadequate safeguard. I don't think he's gotten a very clear answer.

If some stat type has the time, the following might be helpful. Perform a Monte Carlo consistent with saizai's "experimental design" that shows why his test of the null hypothesis is invalid. Then perhaps an explanation could be provided or perhaps even the code could be posted. This might be illuminating for those who aren't statisticians. It might also be useful for JREF before they proceed with any protocol.

digithead
26th August 2006, 11:25 PM
I'd like to speak up in saizai's defense. Mind you, I think he has exactly the same chance of winning the million as I do of pulling a rabbit out of my hat. And I have questions about whether the protocol is secure. But he's asked a couple of times for clarification of why his randomization procedure is an inadequate safeguard. I don't think he's gotten a very clear answer.

If some stat type has the time, the following might be helpful. Perform a Monte Carlo consistent with saizai's "experimental design" that shows why his test of the null hypothesis is invalid. Then perhaps an explanation could be provided or perhaps even the code could be posted. This might be illuminating for those who aren't statisticians. It might also be useful for JREF before they proceed with any protocol.

Startz, this is done for your benefit because Saizai clearly has dismissed this as inconsequential, which he does at his own peril...

He is wrong because he's naively assuming that the typical confounders in any clinical trial or quality-of-life analysis will be taken care of solely by his simple random sampling. Type of disease, disease severity, current treatment, gender, race, religion, culture, education, and age are all confounders to some degree in these types of studies, but I'm mainly concerned about type of disease, its severity, and current treatment, because these can easily distort measures if they are not adjusted for in a statistical model. Another term for such a confounder is a lurking variable. Without adjusting for it, any statistical test will likely show a spurious result, either in the negative (it covers up the relationship) or, more likely, in the positive (it makes the relationship significant when it's not). There's no need to perform Monte Carlo simulations; these are the fundamentals of clinical trial design...

More simply, with his randomization scheme he's assuming homogeneity across the groups, an assumption you cannot make without testing it...

If you want a good reference, try Friedman, Furberg, and DeMets "Fundamentals of Clinical Trials", it's probably in its 5th edition now...

I also don't think his Likert scale proposal is all that great for what he's trying to measure, which is essentially a quality-of-life estimate. He should try something like the SF-36, which has norms that he can compare against. But this is an entirely different matter from his randomization scheme...

saizai
27th August 2006, 03:22 AM
I would quite like to see the result of Startz' suggestion.

Digithead - It's still ad hominem (and psychic), because you are asserting something about what I intend, in contravention of what I've explicitly said (i.e. that I am interested in criticism so long as it is justified). Or perhaps you know what I want better than I do? That would be quite the power!

A bit difficult to prove in double-blind trials though... :p

saizai
27th August 2006, 03:25 AM
P.S. I'm not familiar with "Likert scale" or "SF-36". Care to explain?

http://www.prayermatch.org:11234 btw is up now. Hopefully the Apache server will be fixed soonish so that the :11234 isn't necessary, but at least that works and it's not hosted off my little laptop. http://forums.prayermatch.org also works.

Gr8wight
27th August 2006, 08:30 AM
In addition to my feeling that this "experiment" does not belong in this venue, I don't see how any study of prayer can possibly be considered valid, as it is impossible to effectively set up a control group. How do you tell people not to pray for someone who is sick? How do you know they are telling the truth if they claim not to have prayed? How do you know how many people outside of the study's purview did or did not pray for the subjects in the study? I don't see how randomisation can be an effective control in this area unless the sample size is huge.

Startz
27th August 2006, 10:48 AM
Perhaps this is, if everyone will forgive the cliche, a teachable moment. (I hope not to be boring everyone with statistical detail.) saizai believes that his idea is subject to straightforward statistical test. Others claim you can't ignore "confounders." First, the general principle. Then, an illustration.

In general, "controlling for confounders" is a critically important step. Imagine your outcome was whether patients died in a hospital and you were trying to see if being given last rites made a difference. If you tested for differences in means without controlling for patient health (a confounder) you'd get a silly answer. That's because those given the last rites are probably a lot sicker.

BUT! It isn't absolutely necessary to control for confounders that are uncorrelated with your observed explanatory variables. It's okay to omit confounders if they are uncorrelated with selection into control vs treatment group and uncorrelated with the data gathering process.
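The last-rites example can be made concrete with a toy model, sketched here in Python (all numbers are hypothetical and the function name is illustrative): each patient's severity s is uniform on [0, 1], the patient dies with probability s, and last rites have no effect at all. When sicker patients are more likely to receive last rites (probability s), the naive comparison should show death rates near 2/3 with rites versus 1/3 without; when a fair coin decides who gets them, the two rates should come out roughly equal.

```python
import random


def death_rates(n: int = 100_000, randomized: bool = False, seed: int = 0):
    """Toy confounding model; last rites have zero effect on death.

    Severity s ~ Uniform(0, 1) and death happens with probability s.
    If randomized is False, rites are given with probability s (the
    confounded case); if True, a fair coin decides.
    Returns (death rate with rites, death rate without rites).
    """
    rng = random.Random(seed)
    deaths = {True: 0, False: 0}
    counts = {True: 0, False: 0}
    for _ in range(n):
        s = rng.random()
        rites = rng.random() < (0.5 if randomized else s)
        died = rng.random() < s  # does not depend on rites
        counts[rites] += 1
        deaths[rites] += died
    return deaths[True] / counts[True], deaths[False] / counts[False]


print("confounded:", death_rates())
print("randomized:", death_rates(randomized=True))
```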

To illustrate this I made up a computer example in which 400 observations are randomly divided into a "prayed for" and "not prayed for" group. An outcome y is determined by a strong confounder, a modest random effect, and no effect at all of prayer. Then I do a standard 2-tailed t-test for differences in means at the 0.05 level and see how often "prayer has no effect" is rejected.

I did this 10,000 times and found a significant effect of prayer 4.48 percent of the time. That is just about what you'd expect: a test at the 0.05 level should falsely reject about 5 percent of the time when the null hypothesis is true.

This seems to illustrate that saizai's claim that confounders don't matter in his application is basically right. Or it may illustrate that people are talking at cross-purposes, in that my "model" of saizai's experiment misses an important element that would make a difference. I hope that showing this toy model will make it easier for others to point out the specifics of what's missing.

For those interested, here's the code (in Matlab).

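%{
confoundedPrayer.m
Monte Carlo to illustrate testing for prayer efficacy
in presence of confounder
Dick Startz
August 2006
%}
rand('state',0); %% reset random number generators
randn('state',0);
nMonte = 10000 % number of simulated experiments
n = 400 % recipients per simulated experiment
rejectSum = 0; % running count of (false) rejections
for iMonte = 1:nMonte
    prayFor = rand(n,1)>0.5; % coin-flip assignment to the prayed-for group
    notPrayFor = ~prayFor;
    % outcome = constant + strong confounder (uniform on [0,3]) + noise; prayer has zero effect
    y = 2 + 3*rand(n,1) + 0*prayFor + randn(n,1);
    meanDif = mean(y(prayFor)) - mean(y(notPrayFor));
    % unpooled standard error of the difference in means
    stdErr = sqrt(var(y(prayFor))/sum(prayFor) + var(y(notPrayFor))/sum(notPrayFor));
    % two-tailed test at the 0.05 level
    rejectSum = rejectSum + (abs(meanDif/stdErr)>1.96);
end
disp(['False rejections occurred ',num2str(100*rejectSum/nMonte),' percent of the time']);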
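For readers without Matlab, a rough Python translation of the simulation above (standard library only; the function and parameter names are illustrative). Python's random number generator differs from Matlab's, so the exact percentage will generally differ from 4.48. The structure is the same: coin-flip assignment, an outcome driven by a strong confounder plus noise with zero prayer effect, and a two-tailed test at the 0.05 level using the same unpooled standard error.

```python
import random
import statistics


def false_rejection_rate(n_monte: int = 10_000, n: int = 400, seed: int = 0) -> float:
    """Share of simulated experiments in which a two-tailed test at the 0.05
    level rejects "prayer has no effect" when prayer truly has no effect."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_monte):
        prayed, not_prayed = [], []
        for _ in range(n):
            pray_for = rng.random() > 0.5  # coin-flip assignment
            # constant + strong confounder (uniform on [0, 3)) + noise;
            # prayer's coefficient is 0, as in the Matlab code
            y = 2 + 3 * rng.random() + 0 * pray_for + rng.gauss(0, 1)
            (prayed if pray_for else not_prayed).append(y)
        mean_dif = statistics.fmean(prayed) - statistics.fmean(not_prayed)
        # unpooled standard error of the difference in means
        std_err = (statistics.variance(prayed) / len(prayed)
                   + statistics.variance(not_prayed) / len(not_prayed)) ** 0.5
        if abs(mean_dif / std_err) > 1.96:
            rejections += 1
    return rejections / n_monte
```

Calling false_rejection_rate() with the defaults runs 10,000 simulated experiments of 400 recipients each (a few seconds in CPython); under the null hypothesis the result should land near 0.05.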

digithead
27th August 2006, 12:09 PM
I don't see anywhere in your code where you included disease type, disease severity, treatment protocol, or any of the other confounders I've listed. Your assumption is still that the two groups are homogeneous which you cannot make until you verify from your sampling that it is true...

And confounders are not only correlated with the selection process, they are correlated with other covariates and the outcome and need to be accounted for in either the selection process or in the statistical analysis...

If you were to focus on one and only one disease type, severity, and treatment your assumptions would be correct although you'd still have to test if the groups are homogeneous with regard to gender, etc. Randi discussed this type of study here where they looked at prayer and its effect on patients who had coronary bypass surgery:

http://www.randi.org/jr/2006-04/041406schwartz.html#i7

However, in the type of study that Saizai wants to do, a matched-pair design would work just as well: randomly assign an individual with given demographics, disease type, disease level, and treatment to the new treatment group, and then match them with a similar individual in the control group...
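One way the matched-pair assignment described above could be sketched in Python, assuming each participant record is a dict with a hypothetical "id" field and that the caller supplies a key function returning the matching characteristics (for example disease type, severity, and treatment). Within each group of identical characteristics, participants are paired at random and a coin flip decides which member of each pair is prayed for, so the arms stay balanced, to within one participant, on every matched characteristic.

```python
import random
from collections import defaultdict


def matched_pair_assign(participants, key, seed=None):
    """Assign participants to "treatment" or "control" in matched pairs.

    participants: iterable of dicts with an "id" field (hypothetical schema).
    key: function returning the matching stratum for a participant.
    A leftover participant in an odd-sized stratum gets a plain coin flip.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for p in participants:
        strata[key(p)].append(p)
    assignment = {}
    for members in strata.values():
        rng.shuffle(members)  # random pairing within the stratum
        for i in range(0, len(members) - 1, 2):
            a, b = members[i], members[i + 1]
            if rng.random() < 0.5:  # coin flip within the pair
                a, b = b, a
            assignment[a["id"]] = "treatment"
            assignment[b["id"]] = "control"
        if len(members) % 2 == 1:
            assignment[members[-1]["id"]] = rng.choice(["treatment", "control"])
    return assignment
```

Exact matching only helps when strata have several members; with many fine-grained strata, many participants end up as unpaired leftovers assigned by a plain coin flip.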

As for Likert scale, see:

http://en.wikipedia.org/wiki/Likert_scale

And SF-36, see:

http://www.sf-36.org/

digithead
27th August 2006, 12:17 PM
Here's the cardiac study at PubMed:

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&cmd=Retrieve&dopt=AbstractPlus&list_uids=16569567&query_hl=3&itool=pubmed_docsum

Note that they demonstrated that "Major events and 30-day mortality were similar across the 3 groups" rather than assuming this was true...

But the outcome is what's most interesting:

"CONCLUSIONS: Intercessory prayer itself had no effect on complication-free recovery from CABG, but certainty of receiving intercessory prayer was associated with a higher incidence of complications."

digithead
27th August 2006, 12:23 PM
And looking at the bibliography in the article in greater detail, there are many studies of intercessory prayer. One would be better off doing a meta-analysis of the literature than reinventing the wheel. Although at first glance, I can guess what the result will be given the titles of many of the articles...

Note: edited for grammar

Startz
27th August 2006, 12:32 PM
I don't see anywhere in your code where you included disease type, disease severity, treatment protocol, or any of the other confounders I've listed. Your assumption is still that the two groups are homogeneous which you cannot make until you verify from your sampling that it is true...


The relevant line in the code is
y = 2 + 3*rand(n,1) + 0*prayFor + randn(n,1);

where "rand(n,1)" represents confounders. This allows for the confounders to be stronger for one group than the other - but only by random chance.

And confounders are not only correlated with the selection process, they are correlated with other covariates and the outcome and need to be accounted for in either the selection process or in the statistical analysis...
>snip


I've supplied some evidence, albeit pretty simple-minded. Perhaps those who disagree might chip in with evidence too. All we need is a simple example that models the proposed protocol and comes up with false positives. I'm sure that such an example would be very helpful in letting JREF know what to watch out for.

digithead
27th August 2006, 12:41 PM
As for the meta-analysis I've proposed, never mind. It's already been done:

Masters KS, Spielmans GI, Goodson JT (2006). Are there demonstrable effects of distant intercessory prayer? A meta-analytic review. Ann Behav Med 32(1):21-6.

Its conclusion: "There is no scientifically discernable effect for IP [Intercessory Prayer] as assessed in controlled studies. Given that the IP literature lacks a theoretical or theological base and has failed to produce significant findings in controlled trials, we recommend that further resources not be allocated to this line of research."

It also was "designed to provide a current meta-analytic review of the effects of IP and to assess the impact of potential moderator variables."

Moderator variables are another name for confounders...

So Saizai, I'm figuring that you need to do a serious literature review before you even embark on your study...

digithead
27th August 2006, 12:44 PM
And Startz, for your code to be valid you needed to use a multivariate random variable with a certain level of covariance. Your code "y = 2 + 3*rand(n,1) + 0*prayFor + randn(n,1);" is just an addition of two random variables which creates only an additive distribution (N(mu1+mu2,sigma)) rather than a multivariate distribution (N(mu1, mu2, sigma1, sigma2, rho))...

Startz
27th August 2006, 01:36 PM
And Startz, for your code to be valid you needed to use a multivariate random variable with a certain level of covariance. Your code "y = 2 + 3*rand(n,1) + 0*prayFor + randn(n,1);" is just an addition of two random variables which creates only an additive distribution (N(mu1+mu2,sigma)) rather than a multivariate distribution (N(mu1, mu2, sigma1, sigma2, rho))...

Okay, that's a very specific response. Is the following correct? If instead of rand(n,1) and randn(n,1) I used two correlated normal random variables with different means and variances, then I'll get too many false positives?

digithead
27th August 2006, 01:52 PM
I haven't used Matlab in years so I don't know....

In SAS, to generate a random bivariate normal from two N(0,1) random variables with a correlation of 0.5 it's:

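data a;
    keep x y;
    mu1=0; mu2=0; sigma1=1; sigma2=1; rho=.5; /* sigma1 and sigma2 act as variances */
    do i = 1 to 10000;
        x = mu1+sqrt(sigma1)*rannor(0); /* draw x from N(mu1, sigma1) */
        /* draw y from its conditional normal distribution given x */
        y = (mu2+rho*(sqrt(sigma2)/sqrt(sigma1))*(x-mu1)) +
            sqrt(sigma2*(1-rho**2))*rannor(0);
        output;
    end;
run;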

The rannor function with a seed of 0 generates the random number off of the computer clock at the time of calculation so it's somewhat the "most random" of the pseudorandom number generators...

One can also adapt the above to any other mu, sigma or rho they wish and I'm sure it can be translated into Matlab or any other stats/math package that's used...
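The same construction in Python, as a sketch for anyone not using SAS or Matlab (stdlib only; names are illustrative). As in the SAS data step, x is drawn first and y is then drawn from its conditional normal distribution given x; following that code, the two "sigma" parameters are treated as variances, so they are named var1 and var2 here.

```python
import random


def bivariate_normal(n, mu1=0.0, mu2=0.0, var1=1.0, var2=1.0, rho=0.5, seed=None):
    """Draw n (x, y) pairs from a bivariate normal with correlation rho."""
    rng = random.Random(seed)
    sd1, sd2 = var1 ** 0.5, var2 ** 0.5
    cond_sd = (var2 * (1 - rho ** 2)) ** 0.5  # sd of y given x
    pairs = []
    for _ in range(n):
        x = mu1 + sd1 * rng.gauss(0, 1)
        cond_mean = mu2 + rho * (sd2 / sd1) * (x - mu1)  # mean of y given x
        y = cond_mean + cond_sd * rng.gauss(0, 1)
        pairs.append((x, y))
    return pairs
```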

But it's a moot point because, as I've pointed out above, studies have shown that intercessory prayer has no clinical effectiveness. In fact, when Masters, et al (2006) removed the discredited Cha and Wirth (2001) study, the overall effect of intercessory prayer was not just statistically insignificant, it was nearly nil...

So I'd just leave Saizai to his wishful thinking that he'll somehow demonstrate that prayer works...

Startz
27th August 2006, 02:55 PM
I haven't used Matlab in years so I don't know....

In SAS, to generate a random bivariate normal from two N(0,1) random variables with a correlation of 0.5 it's:

data a;
keep x y;
mu1=0; mu2=0; sigma1=1; sigma2=1; rho=.5;
do i = 1 to 10000;
x = mu1+sqrt(sigma1)*rannor(0);
y = (mu2+rho*(sqrt(sigma2)/sqrt(sigma1))*(x-mu1)) +
sqrt(sigma2*(1-rho**2))*rannor(0);
output;
end;
run;

The rannor function with a seed of 0 generates the random number off of the computer clock at the time of calculation so it's somewhat the "most random" of the pseudorandom number generators...

One can also adapt the above to any other mu, sigma or rho they wish and I'm sure it can be translated into Matlab or any other stats/math package that's used...

But it's a moot point because, as I've pointed out above, studies have shown that intercessory prayer has no clinical effectiveness. In fact, when Masters, et al (2006) removed the discredited Cha and Wirth (2001) study, the overall effect of intercessory prayer was not just statistically insignificant, it was nearly nil...

So I'd just leave Saizai to his wishful thinking that he'll somehow demonstrate that prayer works...

I translated your suggestion into Matlab and ran 50,000 simulations. The simulated false positive rate was 4.908 percent, just as one would expect. So for this little piece of the puzzle, Saizai would appear to have a perfectly valid procedure...which I'm sure will therefore fail to win the million dollars.


%{
confoundedPrayer.m
Monte Carlo to illustrate testing for prayer efficacy
in presence of confounder
Dick Startz
August 2006
%}
rand('state',0); %% reset random number generators
randn('state',0);
nMonte = 50000 % number of simulated experiments
n = 400 % recipients per simulated experiment
rejectSum = 0; % running count of (false) rejections
mu1=0; mu2=0; sigma1=1; sigma2=1; rho=.5; % as in the SAS example, sigma1 and sigma2 act as variances

for iMonte = 1:nMonte
    prayFor = rand(n,1)>0.5; % coin-flip assignment to the prayed-for group
    notPrayFor = ~prayFor;
    % two correlated normal confounders, built as in the SAS data step
    x = mu1+sqrt(sigma1)*randn(n,1);
    y = (mu2+rho*(sqrt(sigma2)/sqrt(sigma1))*(x-mu1)) + sqrt(sigma2*(1-rho^2))*randn(n,1);

    outcome = x + y + 0*prayFor; % prayer has zero effect
    meanDif = mean(outcome(prayFor)) - mean(outcome(notPrayFor));
    % unpooled standard error of the difference in means
    stdErr = sqrt(var(outcome(prayFor))/sum(prayFor) + var(outcome(notPrayFor))/sum(notPrayFor));
    rejectSum = rejectSum + (abs(meanDif/stdErr)>1.96); % two-tailed test at the 0.05 level
end
disp(['False rejections occurred ',num2str(100*rejectSum/nMonte),' percent of the time']);
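Both Matlab listings compute stdErr as sqrt(var1/n1 + var2/n2). That is the unpooled standard error used in Welch's t-test, which does not pool the two groups' variances; Student's classic two-sample t-test instead pools them into one common variance. A minimal Python sketch of both forms for comparison (function names are illustrative):

```python
import math
import statistics


def unpooled_se(a, b):
    """SE of mean(a) - mean(b) without assuming equal variances (Welch form,
    as in the stdErr line of the Matlab code)."""
    return math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))


def pooled_se(a, b):
    """SE of mean(a) - mean(b) assuming both groups share one variance
    (Student's two-sample t-test)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a)
           + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return math.sqrt(sp2 * (1 / na + 1 / nb))
```

With equal group sizes the two formulas give the same number; they differ when group sizes and variances both differ.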

saizai
27th August 2006, 03:41 PM
In addition to my feeling that this "experiment" does not belong in this venue,

In what way exactly? It is a potential MDC app, albeit of a claim somewhat different from the usual "I can do it every time" sort.

I don't see how any study of prayer can possibly be considered valid, as it is impossible to effectively set up a control group. How do you tell people not to pray for someone who is sick? How do you know they are telling the truth if they claim not to have prayed? How do you know how many people outside of the study's purview did or did not pray for the subjects in the study? I don't see how randomisation can be an effective control in this area unless the sample size is huge.

I am not telling anyone not to pray. The people who are praying do not know the people they are praying for personally. I simply tell them to pray for those people.

If prayer works in such a way that one prayer is enough for everything, and more is not additive, then no prayer should work at all because people have prayed that everyone in the world be healthy for ever and ever amen. I do track how many outside people the recipient knows to be praying for them, but I expect that to be irrelevant.

You need to explain how your claimed lack of controls would result in a false positive for this to be a flaw - unless you are taking a believer's stance and saying that the study is *too* tightly controlled for how you believe prayer works, and therefore will show a false *negative*.

Which is it?

saizai
27th August 2006, 03:56 PM
startz - Thank you for your examples; they demonstrate exactly what I was thinking.

digithead - I am aware of the previous literature on the subject. As I have said elsewhere, previous studies have either:
* been flawed in methodology
* had very small sample sizes
* shown no effect
* shown marginal effect
* been inconclusive

I simply want to run something that is big enough to find a smallish effect. It is relatively easy for me to do this, so I see no reason not to. Plus I'm using it as an excuse to build up my Ruby on Rails skills. ;-)

Your advice to read the literature seems to simply be a suggestion that I not do it at all, not that I do it better.

saizai
27th August 2006, 04:50 PM
digithead - Thanks for the reference to SF-36; I think I may end up using it. Do you know of any similar standard survey that is cancer-specific (or adds cancer-specific questions)?

I'm also considering requesting monthly reports from Healers as well. It wouldn't impact a result as far as the MDC is concerned, but would still be potentially interesting data from a research perspective.

Gr8wight
27th August 2006, 10:25 PM
I am not telling anyone not to pray. The people who are praying do not know the people they are praying for personally. I simply tell them to pray for those people.

If prayer works in such a way that one prayer is enough for everything, and more is not additive, then no prayer should work at all because people have prayed that everyone in the world be healthy for ever and ever amen. I do track how many outside people the recipient knows to be praying for them, but I expect that to be irrelevant.

You need to explain how your claimed lack of controls would result in a false positive for this to be a flaw - unless you are taking a believer's stance and saying that the study is *too* tightly controlled for how you believe prayer works, and therefore will show a false *negative*.

Which is it?

My point is that the numbers have to be compared against a control group that has not received prayer. However, since you have no way of determining whether any or all persons in the control group were prayed for by someone outside of the experiment, you cannot actually say the group is an effective control. Any results you get, positive or negative, will be meaningless because you plain don't know who was really prayed for and who wasn't.

Ladewig
28th August 2006, 06:23 AM
To do a rigorous test of this "normal" prayer would require deceiving people: you would have to have some people believing that they were being prayed for by their families, where in fact you've, say, deliberately kept the family misinformed about that person's illness (you say that they're on an extended work assignment overseas, say).



Thanks. I had absolutely no idea what you were talking about until this paragraph.

petre
28th August 2006, 08:27 AM
My suggestion to saizai:

1. Take an existing study
2. Identify any flaws that exist in the methodology
3. Propose specific changes to address said flaws

Introducing new elements (online participation, self-selection, etc) without providing a full analysis of their effect on the study only increases the opportunity for flawed methodology, and therefore makes such a study less useful than the others that have already been completed.

saizai
28th August 2006, 11:38 AM
My point is that the numbers have to be compared against a control group that has not received prayer. However, since you have no way of determining whether any or all persons in the control group were prayed for by someone outside of the experiment, you cannot actually say the group is an effective control. Any results you get, positive or negative, will be meaningless because you plain don't know who was really prayed for and who wasn't.

Wrong. I am not testing the diff between (people who get no remote prayer) and (people who do).

I am testing the diff between (people who are normal, and may be getting prayer in their usual manner) and (people who ALSO get extra prayer through me).

You are making a believer's argument that the results would be a false negative... which given the forum means you're probably confused.
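A toy additive model of the design logic described above, sketched in Python (all quantities are hypothetical and the function name is illustrative): each recipient may or may not be receiving outside prayer, which the experimenter never observes, and a fair coin decides who also receives the study's extra prayer. If effects simply add, randomization spreads the outside prayer evenly across both arms in expectation, so the difference in means estimates only the extra effect. The sketch does not address the case where effects are not additive.

```python
import random


def estimated_extra_effect(n=200_000, background_rate=0.6, background_effect=1.0,
                           extra_effect=0.0, seed=0):
    """Return mean(outcome | extra prayer) - mean(outcome | no extra prayer)
    under a purely additive toy model with unobserved outside prayer."""
    rng = random.Random(seed)
    sums = {True: 0.0, False: 0.0}
    counts = {True: 0, False: 0}
    for _ in range(n):
        outside = rng.random() < background_rate  # never observed by the study
        extra = rng.random() < 0.5                # randomized study assignment
        y = background_effect * outside + extra_effect * extra + rng.gauss(0, 1)
        sums[extra] += y
        counts[extra] += 1
    return sums[True] / counts[True] - sums[False] / counts[False]
```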

saizai
28th August 2006, 11:40 AM
My suggestion to saizai:

1. Take an existing study
2. Identify any flaws that exist in the methodology
3. Propose specific changes to address said flaws

Introducing new elements (online participation, self-selection, etc) without providing a full analysis of their effect on the study only increases the opportunity for flawed methodology, and therefore makes such a study less useful than the others that have already been completed.

I'm not interested in replicating previous studies because I haven't the funds or the manpower to do them. But I do have the skill and ability to do one that's online.

Self-selection cannot result in a false positive. Neither can online participation.

Please, if you have a criticism, explain how it will result in a false positive.

digithead
28th August 2006, 11:55 AM
I translated your suggestion into Matlab and ran 50,000 simulations. The simulated false positive rate was 4.908 percent, just as one would expect. So for this little piece of the puzzle, Saizai would appear to have a perfectly valid procedure...which I'm sure will therefore fail to win the million dollars.


My mistake: you are still assuming equal variances, which is not a valid assumption when you're sampling from a heterogeneous population. In a prior life I could have suggested to you how to simulate this correctly, but I've given up the sweet life of statistics for the hard life of criminology, where there are no answers...

But to give an example of my concerns, say you've recruited 50 people into the study: 4 of them have localized skin cancer, 25 have non-Hodgkin's lymphoma, 19 have leukemia, 1 has early-stage pancreatic cancer and 1 has end-stage pancreatic cancer.

You randomize into groups of 25, with each person having an equal chance of being picked for the treatment group, and you get the following sample:

Treatment: 3 skin cancer, 12 non-Hodgkin's lymphoma, 10 leukemia
Control: 1 skin cancer, 13 non-Hodgkin's lymphoma, 9 leukemia, 1 early-stage pancreatic cancer, 1 end-stage pancreatic cancer

Note that this sampling scheme is as likely as any other sampling under your randomization technique. Can you see why you need to control for disease type, disease severity, and medical treatment? Can you see why a quality of life measure might be higher in the treatment group solely by the type of disease proportion it has? It need not be intentional for bias to creep in because you have to compare apples to apples or in this case, disease to disease...
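digithead's allocation is not a fluke. Under simple randomization of his hypothetical 50 recruits into two arms of 25, the number of people with a given diagnosis who land in the treatment arm follows a hypergeometric distribution, so the chance of each kind of imbalance can be computed exactly. A short Python sketch (`hypergeom_pmf` is an illustrative helper, not something from the thread):

```python
from math import comb


def hypergeom_pmf(x, n_total, n_type, n_drawn):
    """Probability that exactly x of the n_type people with one diagnosis end up
    among the n_drawn people randomized (uniformly) into the treatment arm."""
    return comb(n_type, x) * comb(n_total - n_type, n_drawn - x) / comb(n_total, n_drawn)


N, TREATMENT = 50, 25

# Both pancreatic cancer patients (2 people) land in the control arm.
p_both_control = hypergeom_pmf(0, N, 2, TREATMENT)

# Both pancreatic cancer patients land in the same arm, whichever it is.
p_same_arm = hypergeom_pmf(0, N, 2, TREATMENT) + hypergeom_pmf(2, N, 2, TREATMENT)

# The treatment arm gets 3 or more of the 4 skin cancer patients.
p_skin_3_plus = sum(hypergeom_pmf(x, N, 4, TREATMENT) for x in (3, 4))

print(f"both pancreatic in control: {p_both_control:.3f}")
print(f"both pancreatic in same arm: {p_same_arm:.3f}")
print(f"3+ of 4 skin cancers in treatment: {p_skin_3_plus:.3f}")
```

It prints roughly 0.245, 0.490 and 0.305: about a one-in-four chance that both pancreatic cases sit in the control arm, as in digithead's example, and close to a one-in-three chance that the treatment arm gets three or more of the four skin cancers.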

Look at the cardiac study that Randi talked about: they only looked at patients who had a CABG procedure, yet they still controlled for complications and other confounders. You have to account for possible confounders both in your sampling scheme and in your statistical analysis...
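One standard way to "control for disease type" in the sampling scheme is stratified randomization: randomize separately within each diagnosis so every stratum is split as evenly as possible. A minimal Python sketch, assuming disease type is known at enrollment; `stratified_assign` and the stratum labels are hypothetical:

```python
import random
from collections import defaultdict


def stratified_assign(participants, stratum_of, seed=None):
    """Randomize separately within each stratum (e.g. diagnosis): shuffle the
    stratum, then alternate arms so each stratum splits as evenly as possible."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for p in participants:
        strata[stratum_of(p)].append(p)

    assignment = {}
    for members in strata.values():
        rng.shuffle(members)
        # Randomly choose which arm gets the extra person in an odd-sized stratum.
        first = rng.choice(["active", "control"])
        second = "control" if first == "active" else "active"
        for i, p in enumerate(members):
            assignment[p] = first if i % 2 == 0 else second
    return assignment


# digithead's hypothetical 50 recruits, tagged as (diagnosis, index).
counts = {"skin": 4, "nhl": 25, "leukemia": 19, "pancreatic_early": 1, "pancreatic_late": 1}
people = [(disease, i) for disease, n in counts.items() for i in range(n)]
arms = stratified_assign(people, stratum_of=lambda p: p[0], seed=2006)
```

Within each diagnosis the arms differ by at most one person; overall totals can drift from 25/25 by up to two here because four strata have odd sizes. A stratum of one person (like each pancreatic case) can't be balanced at all, so rare categories are usually merged, and the analysis should still account for stratum.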

And as I've said, performing a prayer study is an exercise in futility given the 15 or so studies that have been performed over the years and the fact that none that could stand up to rigorous scientific scrutiny showed an effect. It would be a waste of time, effort and money and from an ethical standpoint might make someone abandon treatments that work in favor of divine intervention...

Cuddles
29th August 2006, 05:04 AM
I'd say the most significant problem is that you have assumed 400 people. Is there any realistic chance of this number of people actually being studied here? I would expect maybe 10 or 20 people for a preliminary study like this. I don't think anyone here will claim that randomisation doesn't work with large numbers, but with the small number likely to be used it must be proven that the two groups have similar makeups. I notice saizai didn't answer this question last time I asked it.

petre
29th August 2006, 09:51 AM
I'm not interested in replicating previous studies because I haven't the funds or the manpower to do them. But I do have the skill and ability to do one that's online.

Self-selection cannot result in a false positive. Neither can online participation.

Please, if you have a criticism, explain how it will result in a false positive.

Even if you're not, perhaps people who participate in the study (actual participants, observers, etc.) are concerned about false negatives. Ignoring their concerns alienates them. Such a study would self-select for people who are not concerned with false negatives.

I simply see no reason to perform yet another test with obvious flaws and complete disdain for making adjustments to improve sampling and analysis.

If you wish an even simpler test of prayer with no chance of creating a false positive, simply pray that every person on earth will just suddenly believe in prayer.

Finally, the JREF tends to be a bit of a stickler on allowing confounders in protocols (by that I mean, the potential for a false negative). They'd likely insist on a claim along the lines of, "I believe prayer has an effect, (describe the effect), (describe a test that will demonstrate the effect if it exists)". If you admit that your test may not conclusively demonstrate said effect even if it exists, your protocol will likely be rejected.

saizai
29th August 2006, 12:44 PM
Cuddles - As I've said, the number of participants is purely limited to what can be done practically. I have no other reason to put a limit on it.

Also I should point out (again...) that excessively small values of N would result in it being impossible to get p<.05, so that point is somewhat moot.
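Saizai's remark holds for exact randomization (permutation) tests, where a tiny study simply can't produce a small enough p-value; it does not hold for a parametric t-test, which can return p < .05 with only a couple of people per arm. A minimal Python sketch of the permutation-test case, assuming two equal arms, a continuous outcome with no ties, and a two-sided test (the function names are illustrative):

```python
from math import comb


def min_two_sided_p(n_per_group):
    """Smallest p-value an exact two-sided permutation test can report with
    n_per_group people in each of two arms (no ties): only the two most extreme
    of the C(2n, n) equally likely splits are as extreme as the best case."""
    return min(1.0, 2 / comb(2 * n_per_group, n_per_group))


def smallest_n_for(alpha):
    """Smallest equal arm size at which p < alpha is attainable at all."""
    n = 1
    while min_two_sided_p(n) >= alpha:
        n += 1
    return n


for n in range(1, 6):
    print(n, round(min_two_sided_p(n), 4))
print("smallest arm size for p < .05:", smallest_n_for(0.05))
```

With three per arm the smallest possible two-sided p is 0.1, so p < .05 is out of reach; from four per arm (2/70, about 0.029) it becomes possible. That is all this shows: it says nothing about whether small randomized groups are balanced in disease mix, which is the point Cuddles goes on to make.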

petre - That's interesting; I didn't realize you were advocating for the believer's perspective. Please be more explicit about this in the future.

I do not have "disdain" for improvements - but on this forum, I have to assume at the outset that everyone will be advocating the skeptic's perspective, and require that they be logical (i.e. prove a false positive result) to justify a critique.

Could you please specifically list what you believe the "obvious flaws" to be, how to correct them, whether they would cause a false positive or negative, and how they would do so?

If you wish an even simpler test of prayer with no chance of creating a false positive, simply pray that every person on earth will just suddenly believe in prayer.

Finally, the JREF tends to be a bit of a stickler on allowing confounders in protocols (by that I mean, the potential for a false negative). They'd likely insist on a claim along the lines of, "I believe prayer has an effect, (describe the effect), (describe a test that will demonstrate the effect if it exists)". If you admit that your test may not conclusively demonstrate said effect even if it exists, your protocol will likely be rejected.

I am not, and will never be, claiming that prayer works (if it does) on quite such a dramatic scale.

JREF shouldn't be concerned with false negatives except out of the goodness of their hearts (ha.) and limiting the amount of backpedalling the testee can do - or more relevantly, to ensure that the testee is willing to agree on camera that the protocol was fair, mutually agreed, and gave an adequate chance to test their claim.

I admit that my test will not demonstrate an effect of prayer if prayer only works for people you know personally, or people you are in contact with. I don't see why this is a problem.

ETA: P.S. Nowhere in the challenge does it say that I'm required to be a Believer (though that is indeed the usual case; as I've said, I am agnostic on this). It just requires that I be able to state clearly what a positive and negative result would be, and devise a protocol that ensures there will be no false positives that is mutually acceptable.

P.P.S. Your use of "confounders" to mean "things which could result in a false negative" is rather unusual; the usual use is to mean "things which could result in a false positive if not accounted for".

saizai
29th August 2006, 01:04 PM
Also, an enhancement from the believer's perspective (not relevant to the MDC, but relevant to a more general study):

I'll have the study be run asynchronously; that is, people can join at any time and immediately be active without having to wait for the required number N to be ready. Each round will conclude once N(R) Recipients have completed a year each. This makes the site more user-friendly and immediate-gratification-y (heh), and should not in any way detract from the validity of the protocol so long as N(R) is not determined on the basis of "live" results (which can be assured by simply making the assignments database inaccessible to me).
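The stopping rule described here can be written so that it only looks at enrollment dates and a pre-set N(R), never at outcomes or assignments. A minimal Python sketch; the 365-day reading of "a year", the Recipient IDs and the dates are assumptions for illustration:

```python
from datetime import date, timedelta

FOLLOW_UP = timedelta(days=365)  # "a year each", read as 365 days (assumption)


def round_roster(join_dates, n_required, today):
    """Return the IDs of the first n_required Recipients (by join date) who have
    completed a full year of follow-up, or None if the round isn't over yet.

    Only join dates and the pre-set N(R) are consulted; outcome data and the
    assignments database are never read, so the stopping point can't depend
    on live results."""
    finished = sorted(
        (joined, rid) for rid, joined in join_dates.items() if today - joined >= FOLLOW_UP
    )
    if len(finished) < n_required:
        return None
    return [rid for _, rid in finished[:n_required]]


joins = {"r1": date(2006, 9, 1), "r2": date(2006, 10, 1), "r3": date(2007, 3, 1)}
print(round_roster(joins, n_required=2, today=date(2007, 10, 15)))  # ['r1', 'r2']
print(round_roster(joins, n_required=3, today=date(2007, 10, 15)))  # None
```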

Whenever a Recipient is done with one year, no matter which their previous status was, they'll be shifted to "known recipient" mode - i.e. they will definitely be receiving prayer, told so, and put in more immediate contact with their Healer(s). We'll still gather info about their progress per usual. Recipients who are in their first year and in the active group would have a higher priority in the assignment algorithm though.
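The priority rule for the assignment algorithm might look like the following Python sketch. The status labels and the "fewest Healers first" tie-break are hypothetical; the thread only says that first-year active Recipients get higher priority. First-year control Recipients are deliberately left out of the queue, since giving them study prayer would break the comparison:

```python
# Hypothetical status labels; lower number = served first.
PRIORITY = {"first_year_active": 0, "known_recipient": 1}


def healer_queue(recipients):
    """Order Recipients for the next free Healer slot.

    First-year control Recipients are excluded (they must receive no study
    prayer); first-year active Recipients come before those in 'known recipient'
    mode, and within a tier whoever currently has the fewest Healers goes first."""
    eligible = [r for r in recipients if r["status"] in PRIORITY]
    return sorted(eligible, key=lambda r: (PRIORITY[r["status"]], r["healers"]))


queue = healer_queue([
    {"id": "a", "status": "known_recipient", "healers": 0},
    {"id": "b", "status": "first_year_control", "healers": 0},
    {"id": "c", "status": "first_year_active", "healers": 3},
    {"id": "d", "status": "first_year_active", "healers": 1},
])
print([r["id"] for r in queue])  # ['d', 'c', 'a']
```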

This should allow a secondary, correlative study about the effects of uncertain prayer (i.e. the 50-50 unknown of each Recipient's first year) vs certain prayer (i.e. thereafter). The effects of certain prayer of course can be dismissed as placebo effect, so are not intended to demonstrate any paranormal claim, but would nevertheless be of definite perceived (and actual) utility to the people involved.

(Frankly, this is a point that's always puzzled me about the practice of medicine: why not try to enhance the placebo effects as much as possible if your goal is to heal the patient? Obviously, one must control for them as a confounder [note that I'm using this to mean "thing which could result in a false positive"] when doing research for the effectiveness of treatments, but that doesn't mean that the placebo is ineffective; quite the opposite has been proven true repeatedly.)

I can't think of any way to implement this as a causal-effects study; to do so you'd need to cause some of the recipients to be shifted and some not, or assign them differently at the outset, which would probably not be very user-friendly / good PR (which is a consideration in a study run via website).

Cuddles
29th August 2006, 01:38 PM
Cuddles - As I've said, the number of participants is purely limited to what can be done practically. I have no other reason to put a limit on it.

Also I should point out (again...) that excessively small values of N would result in it being impossible to get p<.05, so that point is somewhat moot.

No it isn't. Unless you can demonstrate that the prayer and control groups have similar composition, especially in regards to serious vs. mild disease, no-one will accept it as valid.

What sort of numbers do you expect? You may not be limiting the participants, but do you think you will have 1000 people, 10 people or somewhere in between?

P.P.S. Your use of "confounders" to mean "things which could result in a false negative" is rather unusual; the usual use is to mean "things which could result in a false positive if not accounted for".

"Confounder" always means anything that will cause the test to give a false result, positive or negative. False positives can be dangerous if they suggest that a treatment is effective when it really isn't, so they tend to have more attention focussed on them. In this case a lot of attention is paid to false negatives because believers often make excuses for a study not finding anything, so much care must be taken to show that there is nothing that could provide false results in either direction.

saizai
29th August 2006, 02:02 PM
No it isn't. Unless you can demonstrate that the prayer and control groups have similar composition, especially in regards to serious vs. mild disease, no-one will accept it as valid.

FWIW I have said that I will be tracking this data upfront... so it should be quite easy to demonstrate.

What sort of numbers do you expect? You may not be limiting the participants, but do you think you will have 1000 people, 10 people or somewhere in between?

Somewhere in between. It'll depend in large part on how much traffic / word of mouth is generated, and how many participants can be gotten through more normal means (eg asking cancer doctors, church groups, cancer support groups, etc to disseminate info).

Of course, I'd like to have as many as possible.

"Confounder" always means anything that will cause the test to give a false result, positive or negative. False positives can be dangerous if they suggest that a treatment is effective when it really isn't, so they tend to have more attention focussed on them. In this case a lot of attention is paid to false negatives because believers often make excuses for a study not finding anything, so much care must be taken to show that there is nothing that could provide false results in either direction.

I see. In that case, could you please respond to my request above?

"Could you please specifically list what you believe the "obvious flaws" to be, how to correct them, whether they would cause a false positive or negative, and how they would do so?"

Of course, to claim anything as a false negative, I expect you will be taking the stance of a Believer from some particular theological framework for the sake of argument; it would be helpful if you specified which one.

petre
29th August 2006, 02:32 PM
ETA: P.S. Nowhere in the challenge does it say that I'm required to be a Believer (though that is indeed the usual case; as I've said, I am agnostic on this). It just requires that I be able to state clearly what a positive and negative result would be, and devise a protocol that ensures there will be no false positives that is mutually acceptable.


While technically true, the problem is that even if you don't believe in it you have to find people that do. Namely, when (if) you are asked to provide 3 affidavits from people stating they've observed what you intend to demonstrate. A general affidavit would be fairly easy to get I'd imagine (i.e. "I believe the power of prayer works", etc) but to get someone to claim that they believe your protocol, specifically, will produce results may be more difficult.

saizai
29th August 2006, 02:43 PM
While technically true, the problem is that even if you don't believe in it you have to find people that do. Namely, when (if) you are asked to provide 3 affidavits from people stating they've observed what you intend to demonstrate. A general affidavit would be fairly easy to get I'd imagine (i.e. "I believe the power of prayer works", etc) but to get someone to claim that they believe your protocol, specifically, will produce results may be more difficult.

I don't think I can do so, because this study has never been performed before in the manner I want to do it. Plus, the affidavit you suggest wouldn't meet the standard anyway.

However, I don't think it comes under the clause for which that "3 affidavit" requirement was made (namely, really extreme claims of personal power), so hopefully it should not be relevant. Also, you're conflating two different things: affidavits confirming that it's something worth JREF's bother (i.e. to filter out excessively extreme claims, like creating lights around oneself spontaneously), and people to participate. The latter should not be difficult.

You still haven't answered my request though.

Startz
29th August 2006, 03:05 PM
I'm sitting here in a jury assembly room waiting to be called for a jury panel. I offer this as an excuse in case the following seems incoherent to readers.

There's a shaggy dog story that goes something like this. A magician (let's call him JR for short) claims that he can make an elephant materialize onstage out of thin air using an ordinary pack of playing cards.

Various scientifically-minded skeptics have a rousing argument about the statistics of such a demonstration. Questions are raised about when the deck is shuffled whether hearts will be randomly distributed or not. And whether the test results depend on if the jokers have been removed from the deck. Then there is the question about whether JR can be trusted to shuffle the deck randomly or whether an observer from the audience should cut the cards.

So while all skeptical eyes are focused on the card deck in JR's right hand and all minds are working on the statistics of card decks, JR holds out a handful of peanuts in his left and an elephant lumbers onto stage sight unseen.

The moral (besides that jury duty can be mind-numbing) is that perhaps the attention being paid to statistics is displacing attention that might be better spent looking for loopholes and tricks.

But since about all I have to contribute is on statistics, I re-ran the simulation using two different variances and reducing the sample size to 20. In 50,000 simulations there were 5.36 percent false positives, about as one would expect.
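Startz didn't post the Matlab code, and the thread doesn't say which test was used. Here is a minimal Python sketch of the same kind of Monte Carlo check under stated assumptions: normally distributed scores, no true effect, two equal arms of 10 (one reading of "sample size 20"), and a pooled two-sample Student t-test at α = 0.05. The t tail probability uses the exact finite series for integer degrees of freedom so only the standard library is needed; all function names are illustrative:

```python
import math
import random


def t_two_sided_p(t, df):
    """Two-sided p-value for Student's t with integer df, using the closed-form
    finite series for integer degrees of freedom (Abramowitz & Stegun, ch. 26)."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df % 2 == 1:
        series = c if df > 1 else 0.0   # terms c, (2/3)c^3, ..., up to c^(df-2)
        term = c
        for k in range(3, df - 1, 2):
            term *= c * c * (k - 1) / k
            series += term
        inside = 2 / math.pi * (theta + s * series)
    else:
        series = term = 1.0             # terms 1, (1/2)c^2, ..., up to c^(df-2)
        for k in range(2, df - 1, 2):
            term *= c * c * (k - 1) / k
            series += term
        inside = s * series
    return 1.0 - inside                 # inside = P(|T| <= |t|)


def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance, and its df."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    df = na + nb - 2
    se = math.sqrt(ss / df * (1 / na + 1 / nb))
    return (ma - mb) / se, df


def false_positive_rate(n_per_group, sd_a, sd_b, sims, alpha=0.05, seed=0):
    """Fraction of simulated studies with NO true effect that still reach p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        a = [rng.gauss(0.0, sd_a) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, sd_b) for _ in range(n_per_group)]
        t, df = pooled_t(a, b)
        if t_two_sided_p(t, df) < alpha:
            hits += 1
    return hits / sims


print(false_positive_rate(10, 1.0, 1.0, sims=20_000))  # no true effect: expect a rate near 0.05
print(false_positive_rate(10, 1.0, 3.0, sims=20_000))  # same, with unequal spreads
```

With equal arm sizes the pooled t-test tends to stay close to its nominal 5% even when the two spreads differ, which fits Startz's 5.36 percent. digithead's objection is a different one: arms whose members come from different subpopulations in different proportions.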

petre
29th August 2006, 03:47 PM
I don't think I can do so, because this study has never been performed before in the manner I want to do it. Plus, the affidavit you suggest wouldn't meet the standard anyway.

However, I don't think it comes under the clause for which that "3 affidavit" requirement was made (namely, really extreme claims of personal power), so hopefully it should not be relevant. Also, you're conflating two different things: affidavits confirming that it's something worth JREF's bother (i.e. to filter out excessively extreme claims, like creating lights around oneself spontaneously), and people to participate. The latter should not be difficult.

You still haven't answered my request though.

If you mean point out things that may cause a false positive, I've not claimed there exist such flaws. If you mean flaws in general, some have even been mentioned by yourself (potential for false negatives, sample size questions, population distribution)

I would not count on the affidavit requirement being waived, especially given the JREF's currently limited ability to process claims.

saizai
29th August 2006, 04:16 PM
Startz - Thanks, again, for the pragmatic perspective. :)

(Sorry on the civic duty thing being so mindnumbing though. Good thing you brought something to amuse yourself with.)

The moral (besides that jury duty can be mind-numbing) is that perhaps the attention being paid to statistics is displacing attention that might be better spent looking for loopholes and tricks.

Indeed. I do not want there to be any potential for someone to claim a loophole or trick, and have no intention of trying to use one. So far I think I've closed everything, and am waiting for someone to point out anything I've missed.

I think as we've pointed out (and your stats runs and story demonstrate), the statistics are sound.

One point I'd like suggestions on: I would like a protocol that both assures JREF the ability to verify the source of submissions to be not inappropriately influenced by me (e.g., that the mailed signed verifications are not forgeries), but also ensures participant privacy to the maximum degree possible (e.g., by completely separating their data from their contact info). Ideas?
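One possible answer to this question (not something proposed in the thread) is keyed pseudonymization: a third party holds a secret key and the contact list, and the study database stores outcome data only under an HMAC-derived ID. A minimal Python sketch; the escrow arrangement, the table layout and the function names are all assumptions:

```python
import hashlib
import hmac
import secrets


def pseudonym(key: bytes, contact: str) -> str:
    """Stable, non-reversible participant ID derived from contact details.

    Without the key, a row in the outcome table can't be linked back to a person."""
    normalized = contact.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


def matches(key: bytes, contact: str, claimed_id: str) -> bool:
    """Key-holder's check that a signed, mailed form belongs to a given data row."""
    return hmac.compare_digest(pseudonym(key, contact), claimed_id)


escrow_key = secrets.token_bytes(32)  # held by a third party, e.g. JREF (assumption)
contact_table = {}                    # pseudonym -> contact details (escrow side only)
outcome_table = {}                    # pseudonym -> study data (study side only)

pid = pseudonym(escrow_key, "Recipient@example.org ")
contact_table[pid] = "Recipient@example.org"
outcome_table[pid] = {"round": 1, "score": None}
```

Whoever holds the key can take a mailed, signed verification, recompute the ID from the contact details on it, and confirm it matches a row in the outcome table; the study operator, lacking the key, can't go the other way. Who issues IDs at sign-up (so the study server never sees the key) is a design decision this sketch leaves open.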

petre - Your previous posts certainly seemed to be claiming that there were "obvious flaws" in the study that would make it unacceptable to a skeptic. But no matter.

Could you please respond to my request about how the potential for false *negatives* could be mitigated, and what exactly you see as the "obvious flaws" relating to that?

So far I have agreed that I am not testing prayer that is limited to friends or people in constant contact; I think this is a reasonable limit and necessary to a study that is practicable. Am I limiting it excessively in any other way?

petre
29th August 2006, 05:11 PM
petre - Your previous posts certainly seemed to be claiming that there were "obvious flaws" in the study that would make it unacceptable to a skeptic. But no matter.


The point of view I try to take when assessing protocols is "JREF application processor" rather than "skeptic", which is only slightly different. It's my hope that such analysis is of greater help to potential applicants. You may feel the points made by other posters and myself will not concern JREF, and if such is the case I wish you luck in your continued protocol negotiations.

On the matter of flaws, perhaps if I enumerate them from my previous post they will stand out more clearly:

1. Potential for false negatives. This is a point you, yourself, have made I believe. I suggested using previous studies as a guide to address this specific issue, but you've indicated that you haven't the resources for such an extensive study. Perhaps there is no answer to correct this flaw within your means.
2. Concerns about sample size. I do not feel you've sufficiently convinced many, or at least me, that you'll achieve a significant sample size, which I believe is the only specific criticism you've made of the existing studies. Perhaps you could set a benchmark by naming a specific study you feel was handled properly and was lacking only in sample size, then we'd have an idea how large a sample you'll need to find results that study missed due to small sample size.
3. Population distribution (self-selection and other bias, uneven post-randomization distributions, etc.). While this may not fit the description of "obvious" for every viewer of this forum, there clearly are still some concerns about this question. Again, modeling the adjustments made in previous studies could be of benefit here.
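A rough way to answer the sample-size question in point 2 is a standard power calculation: given a standardized effect size d (difference in mean score divided by its SD), how many Recipients per arm are needed for 80% power at a two-sided α = 0.05? The thread never fixes d, so the values below are illustrative. A Python sketch using the normal approximation, which slightly understates what an exact t-based calculation gives:

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate Recipients needed per arm for a two-sided two-sample comparison
    of means: n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2, rounded up, where d is
    the standardized effect size (difference in means / SD)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


for d in (0.5, 0.3, 0.2):
    print(d, n_per_group(d))
```

For d = 0.5, 0.3 and 0.2 this gives 63, 175 and 393 per arm, so a "small but statistically significant" effect of the kind saizai expects implies several hundred Recipients per arm.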

Certain skeptics would be more than happy to approve of your study I'm sure, since you'd either declare at the end of the study that no significant result was found (consistent with existing studies) or they simply could claim that your methodology was flawed and discount any positive result you might get. I'd prefer to do the legwork beforehand to make a positive result more meaningful, and I believe the JREF would as well.

Another matter that crossed my mind was the time frame. The JREF has not yet approved a protocol of such long duration (I seem to recall an application predicting something several years in the future, and I believe said applicant was advised to re-apply within one year prior to the event, as applications were only good for one year). While they may be willing to make an exception given the nature of this claim, I have doubts that they will accept a protocol that states "the test will continue until enough people have participated, even if it takes several decades".

digithead
29th August 2006, 05:14 PM
snip...

But since about all I have to contribute is on statistics, I re-ran the simulation using two different variances and reducing the sample size to 20. In 50,000 simulations there were 5.36 percent false positives, about as one would expect.

Try it another way, place 80% randomly in one group with N(8,2) and 20% with N(3,1) and the reverse for the other group. What result do you get now? What happens when you adjust for the population they came from? Do you see how having a higher proportion from one population will skew the result if you don't adjust for it?

As for N(x,y) above, N is the normal distribution, x is mu (the mean), and y is sigma squared (the variance)...
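digithead's scenario can be sketched directly. The Python code below (function names hypothetical) builds two arms with no treatment effect at all: arm A is drawn 80% from N(8, 2) and 20% from N(3, 1), and arm B is the reverse, using the notation above where the second parameter is the variance. Unlike his version it fixes the 80/20 split exactly rather than drawing it at random, so it shows the worst case of an unbalanced allocation, not what randomization typically produces. It then compares the naive difference in means with a difference computed within each source population:

```python
import random
from statistics import fmean


def draw(rng, pop):
    """Population 1 ~ N(8, variance 2); population 2 ~ N(3, variance 1)."""
    return rng.gauss(8, 2 ** 0.5) if pop == 1 else rng.gauss(3, 1)


def naive_vs_stratified(n_per_group, seed=0):
    rng = random.Random(seed)
    k = round(0.8 * n_per_group)
    # Arm A: 80% pop 1, 20% pop 2; arm B: the reverse. There is no arm effect.
    arm_a = [(1, draw(rng, 1)) for _ in range(k)] + [(2, draw(rng, 2)) for _ in range(n_per_group - k)]
    arm_b = [(1, draw(rng, 1)) for _ in range(n_per_group - k)] + [(2, draw(rng, 2)) for _ in range(k)]

    naive = fmean(y for _, y in arm_a) - fmean(y for _, y in arm_b)

    # Compare like with like inside each population, then average the
    # within-population differences weighted by population size.
    diffs, weights = [], []
    for pop in (1, 2):
        a = [y for p, y in arm_a if p == pop]
        b = [y for p, y in arm_b if p == pop]
        diffs.append(fmean(a) - fmean(b))
        weights.append(len(a) + len(b))
    stratified = sum(d * w for d, w in zip(diffs, weights)) / sum(weights)
    return naive, stratified


naive, stratified = naive_vs_stratified(5000)
print(f"naive difference: {naive:.2f}  (expected about 0.6 * (8 - 3) = 3)")
print(f"stratified difference: {stratified:.2f}  (expected about 0)")
```

The naive difference comes out near 3 with no real effect present, while the within-population comparison comes out near 0. Weighting the within-stratum differences is only one way to "adjust for the population they came from"; a regression with a stratum term is another.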

And jury duty sucks...

ETA: added the word randomly

Startz
29th August 2006, 05:24 PM
>snip

I think as we've pointed out (and your stats runs and story demonstrate), the statistics are sound.


Just so that I'm being clear, I haven't said that the statistics are sound. All I've said is that some of the questions raised don't seem to point to large problems.

In fact, I don't think you've been specific enough about what test is going to be used and how the data is going to be treated. Among other things, we don't know how a "score" is going to be computed for each person. Until that's nailed down, I don't think JREF should entertain a protocol.

I don't know much about conducting studies where a participant might try to influence the outcome. That's why we have magicians.

But here's one way to stuff the ballot box. Have one recipient show up with some unusual identifying characteristic. Have one healer in cahoots who knows whether that person is going to be prayed for and can signal the "experimenter." I think that's all it takes to guarantee a win, no matter how large the sample is.

Gr8wight
29th August 2006, 07:58 PM
Wrong. I am not testing the diff between (people who get no remote prayer) and (people who do).

I am testing the diff between (people who are normal, and may be getting prayer in their usual manner) and (people who ALSO get extra prayer through me).

You are making a believer's argument that the results would be a false negative... which given the forum means you're probably confused.

But you do not know that the people in the test group are actually getting more prayer than the people in the control group. That's the whole problem. You don't know, and you can't know. How can you pretend to claim that your results, positive or negative, will mean anything whatsoever?

I'm not sure exactly what you mean by saying I am making a "believer's argument." Is that supposed to be some kind of insult?

saizai
29th August 2006, 08:32 PM
The point of view I try to take when assessing protocols is "JREF application processor" rather than "skeptic", which is only slightly different. It's my hope that such analysis is of greater help to potential applicants. You may feel the points made by other posters and myself will not concern JREF, and if such is the case I wish you luck in your continued protocol negotiations.

Understood.

1. Potential for false negatives. This is a point you, yourself, have made I believe. I suggested using previous studies as a guide to address this specific issue, but you've indicated that you haven't the resources for such an extensive study. Perhaps there is no answer to correct this flaw within your means.

Do you see any false negative potential OTHER than what I said (familiarity/contact requirement)? E.g., insufficient information provided to Healer, other requirements, ...

2. Concerns about sample size. I do not feel you've sufficiently convinced many, or at least me, that you'll achieve a significant sample size, which I believe is the only specific criticism you've made of the existing studies. Perhaps you could set a benchmark by naming a specific study you feel was handled properly and was lacking only in sample size, then we'd have an idea how large a sample you'll need to find results that study missed due to small sample size.

I'll have to respond to that later as I don't have my reference on hand (it's buried in some boxes somewhere). IIRC their sample size was about 25.

However, I've seen very few studies of remote intercessionary prayer that I felt were methodologically sound to begin with, and none that were run the way I'd like to do it.

3. Population distribution (self-selection and other bias, uneven post-randomization distributions, etc.). While this may not fit the description of "obvious" for every viewer of this forum, there clearly are still some concerns about this question. Again, modeling the adjustments made in previous studies could be of benefit here.

Indeed. However, I think that Startz' applied modeling so far shows that this is not a significant (sic) problem. If someone can demonstrate that it is - ie it'll cause a false positive more than 5% of the time - then I'll certainly reconsider.

Certain skeptics would be more than happy to approve of your study I'm sure, since you'd either declare at the end of the study that no significant result was found (consistent with existing studies) or they simply could claim that your methodology was flawed and discount any positive result you might get. I'd prefer to do the legwork beforehand to make a positive result more meaningful, and I believe the JREF would as well.

Agreed.

Another matter that crossed my mind was the time frame. The JREF has not yet approved a protocol of such long duration (I seem to recall an application predicting something several years in the future, and I believe said applicant was advised to re-apply within one year prior to the event, as applications were only good for one year). While they may be willing to make an exception given the nature of this claim, I have doubts that they will accept a protocol that states "the test will continue until enough people have participated, even if it takes several decades".

Quote from my correspondence with Jeff:
The "general" setup is fine. We won't close the application as long as work is being done on it, and we haven't met a permanent impasse.

Jeff Wagg
JREF

On 8/24/06, Sai wrote:

Of course. Would you please verify before I go through the trouble of
doing so, though, that:
1. the general setup (i.e. a multi-group study involving many people
over a long period of time) is acceptable to JREF
2. JREF is willing to keep the file open without re-application for
the duration of the study so long as it is ongoing

Thanks,
- Sai


... so I think that should be okay within reasonable limits. I have explicitly said to Jeff that it will take more than a year per phase.


Just so that I'm being clear, I haven't said that the statistics are sound. All I've said is that some of the questions raised don't seem to point to large problems.

Point.

In fact, I don't think you've been specific enough about what test is going to be used and how the data is going to be treated. Among other things, we don't know how a "score" is going to be computed for each person. Until that's nailed down, I don't think JREF should entertain a protocol.

I've tried to be specific as to the parameters of the score equation. I can't and won't say in advance what it will be, since it'll be based on the previous round's data (to better tune the equation).

However, I have yet to see anyone point out a real methodological flaw with *any* possible score equation I could come up with that doesn't access the assignments database.
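One hedged way to reconcile Startz's request (nail the score down) with tuning it on the previous round is a hash commitment: fix the score rule, publish only its digest before the current round is unblinded, and reveal the rule afterwards so anyone can check it wasn't changed. A minimal Python sketch; the score formula and variable names are hypothetical, and this only proves the rule was fixed before unblinding, not how it was chosen:

```python
import hashlib
import hmac
import secrets


def commit(score_spec: str, nonce: str) -> str:
    """SHA-256 commitment to a scoring rule. The random nonce stops anyone
    guessing the rule from the published digest before it is revealed."""
    return hashlib.sha256(f"{nonce}\n{score_spec}".encode("utf-8")).hexdigest()


def verify(score_spec: str, nonce: str, published_digest: str) -> bool:
    """After the reveal, check that the rule used is the one committed to."""
    return hmac.compare_digest(commit(score_spec, nonce), published_digest)


# Hypothetical round-2 rule, derived from round-1 data only.
spec = "score = 0.5 * quality_of_life_change + 0.5 * symptom_change"
nonce = secrets.token_hex(16)
digest = commit(spec, nonce)  # published (e.g. sent to JREF) before unblinding
```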

I don't know much about conducting studies where a participant might try to influence the outcome. That's why we have magicians.

But here's one way to stuff the ballot box. Have one recipient show up with some unusual identifying characteristic. Have one healer in cahoots who knows whether that person is going to be prayed for and can signal the "experimenter." I think that's all it takes to guarantee a win, no matter how large the sample is.

I don't think I understand your example. Could you elaborate?

Also please note that there is a pretty small chance of any particular Healer knowing any particular Recipient, and that I will be requiring them to sign something saying they haven't communicated with any Recipient...

But you do not know that the people in the test group are actually getting more prayer than the people in the control group. That's the whole problem. You don't know, and you can't know. How can you pretend to claim that your results, positive or negative, will mean anything whatsoever?

1. I will be tracking how many people the Recipients believe are praying for them. This should not differ significantly between the active and control groups.
2. The *actual* amount of outside prayer should not differ significantly between the groups either, since the selection process (randomization) has no way to bias that measure.
3. The active group is getting baseline + study; the control group is just getting baseline.
4. Ergo, the active group is getting more than the control group.
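Points 2 to 4 rest on randomization balancing baseline prayer nobody can measure; Gr8wight's reply is that such balance needs a large sample. Both are claims about how big the chance imbalance is, and a simulation can show how it shrinks with arm size. A Python sketch; baseline counts drawn uniformly from 0 to 20 are purely an assumed stand-in for a heterogeneous population, and the function name is illustrative:

```python
import random
from statistics import fmean, pstdev


def baseline_imbalance_sd(n_per_group, reps=1000, seed=0):
    """Spread (SD across simulated studies) of the difference in mean baseline
    prayer between two randomized arms of n_per_group each."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        baseline = [rng.randint(0, 20) for _ in range(2 * n_per_group)]
        rng.shuffle(baseline)  # random assignment to arms
        active, control = baseline[:n_per_group], baseline[n_per_group:]
        diffs.append(fmean(active) - fmean(control))
    return pstdev(diffs)


for n in (10, 50, 200):
    print(n, round(baseline_imbalance_sd(n), 2))
```

The chance imbalance shrinks roughly like 1/sqrt(n) but never reaches zero at any finite size; a standard significance test already budgets for chance imbalance of this kind, which is why the argument turns on how large N actually gets.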

Also I should point out that AT MOST what you are claiming is that this would be a false negative, not a false positive (and even that is a statistically very unlikely claim).

I'm not sure exactly what you mean by saying I am making a "believer's argument." Is that supposed to be some kind of insult?

No, it's not. It simply means that you are taking the point of view of a believer, not a skeptic, and are arguing that my flaw as you perceive it could cause a false negative. You are NOT arguing that it could cause a false positive.

I simply want that to be very clear; I certainly don't mind that that is the case (as with petre above).

saizai
29th August 2006, 08:42 PM
BTW Startz: Thanks for the monte carlo simulations. They're quite helpful for cutting through arguments about stats theory.

My matlab skillz are pretty out of date (last time I used it was when I was taking BC Calc AP, my soph year of HS... 1997-98) so I'd probably end up coding it in Ruby instead if I tried to. :p

Gr8wight
29th August 2006, 09:10 PM
1. I will be tracking how many people the Recipients believe are praying for them. This should not differ significantly between the active and control groups.
2. The *actual* amount of outside prayer should not differ significantly between the groups either, since the selection process (randomization) has no way to bias that measure.
3. The active group is getting baseline + study; the control group is just getting baseline.
4. Ergo, the active group is getting more than the control group.

Also I should point out that AT MOST what you are claiming is that this would be a false negative, not a false positive (and even that is a statistically very unlikely claim).

If randomisation is your only control, your sample size needs to be huge for it to be effective. A few hundred is probably not enough.



No, it's not. It simply means that you are taking the point of view of a believer, not a skeptic, and are arguing that my flaw as you perceive it could cause a false negative. You are NOT arguing that it could cause a false positive.

I simply want that to be very clear; I certainly don't mind that that is the case (as with petre above).

I am arguing that your results, regardless of whether they are positive or negative will be meaningless.

digithead
29th August 2006, 11:05 PM
Just so that I'm being clear, I haven't said that the statistics are sound. All I've said is that some of the questions raised don't seem to point to large problems.

In fact, I don't think you've been specific enough about what test is going to be used and how the data is going to be treated. Among other things, we don't know how a "score" is going to be computed for each person. Until that's nailed down, I don't think JREF should entertain a protocol.

I don't know much about conducting studies where a participant might try to influence the outcome. That's why we have magicians.

But here's one way to stuff the ballot box. Have one recipient show up with some unusual identifying characteristic. Have one healer in cahoots who knows whether that person is going to be prayed for and can signal the "experimenter." I think that's all it takes to guarantee a win, no matter how large the sample is.

Which is exactly what I've been trying to get across. We've got the hypothesis, which is to test whether intercessory prayer can improve outcomes in sick people, but we need answers to the following:

1) What is the measure of the outcome?
2) What is the clinically significant difference of this measure?
3) What types of diseases or conditions will be studied?
4) How will the confounders (disease severity, current treatment, gender, culture, religiosity, dropouts and losses to followup) be adjusted for in the analysis?
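Question 2 matters because the clinically significant difference, together with the outcome's spread, largely fixes the sample size. As a sketch, the textbook normal-approximation formula for comparing two group means can be coded in a few lines of Python. It assumes a roughly normal, continuous Score with the same standard deviation in both groups, a two-sided test at alpha = 0.05, and 80% power; those are conventional choices, not anything specified in the thread, and `n_per_group` is a hypothetical helper.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate Recipients needed per group to detect a true difference
    `delta` in mean Score when both groups have standard deviation `sigma`.

    Normal approximation: n = 2 * (z_(1-alpha/2) + z_(power))^2 * sigma^2 / delta^2
    """
    if delta <= 0 or sigma <= 0:
        raise ValueError("delta and sigma must be positive")
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = z(power)           # about 0.84 for 80% power
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


for d in (0.5, 0.2, 0.1):
    print(d, n_per_group(d, 1.0))
```

With the difference measured in units of the standard deviation, `n_per_group(0.5, 1)` gives 63, `n_per_group(0.2, 1)` gives 393, and `n_per_group(0.1, 1)` gives 1570 Recipients per group, so the smaller the effect worth detecting, the faster the required numbers grow.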

All of the other important stuff, such as funding, recruitment, randomization, staffing, data management, etc., is moot until these things are addressed. A CRO performing a clinical trial wouldn't start recruiting people until these things are answered...

And Startz, I thank you for your attempts at simulating but I didn't do a very good job at describing how to simulate a multivariate model with confounders nor do I have the expertise to do it very well anymore as I no longer do research in statistics. They're a very good start but they need some tweaking, you might do a lit search in Biometrics, JASA, or Biometrika to find out how if you're still interested...

But I know how to design a clinical trial as I still do this when I'm evaluating crime control policies and treatment programs and in my opinion, Saizai is nowhere near designing an acceptable clinical trial to test the effect of intercessory prayer on disease outcome...

Nor does he seem very interested in heeding some sound statistical and clinical advice that many of us have given him so I think we should wish him luck and just leave him to his study...

Skeptic Ginger
30th August 2006, 12:11 AM
Placebos generally aren't given for serious diseases. New treatments are compared with accepted treatment, but not with a placebo. Deliberately withholding treatment from a potentially fatal disease like cancer would be considered serious misconduct, and possibly murder, if there is any treatment available that gives a better survival rate than placebo.

Maybe this has been addressed, I'm behind in this thread, but this is simply false. A double-blind, placebo-controlled prospective study is the ideal clinical trial.

Placebos are given to test treatments in life threatening diseases all the time. What you are not thinking through here is the study drug or treatment may be harmful, may not help, or may work. That's what the study is for. So until the treatment is shown to be effective by the treated group doing better than the placebo group, neither group really is "better" to be in.

Often experimental treatments are done on terminally ill patients when all other treatments fail. Or new treatments may be added to current therapy again with placebo control. And often when a new drug is developed, the first people it is tested on may be healthy volunteers. That may be done to see what kind of tolerance people have for the drugs. Again, placebo controlled.

As far as the ethics go, you do not "lie" to the patient except in a few rare circumstances, more often in psychology research than in medical research. Sometimes you distract the patient by telling them you are looking for X while you are really looking for Y. But the usual way placebo controlled trials are done is the patient is told they have a 50:50 chance of being in either group. And until the results are analyzed neither the observer nor the patient know whether drug or placebo was given. That is what double blind means as most of you know.

Skeptic Ginger
30th August 2006, 12:26 AM
...

No ethics board would ever, under any circumstances whatsoever, agree to giving patients a placebo while telling them that it is actually an effective medicine.
Again, this is simply not true.

Placebo research (http://www.placebo.ucla.edu/) has been conducted at UCLA for example. Here are some (http://www.placebo.ucla.edu/publications/) of the reports.

This paper (http://medicine.plosjournals.org/perlserv?request=get-document&doi=10.1371/journal.pmed.0020262) discusses the ethics of deceiving research subjects and suggests the following:

...participants can be informed prior to deciding whether to volunteer for a study that the experimental procedures will not be described accurately or that some features of these procedures will or may be misleading or deceptive [25,26]. This approach, which we call “authorized deception,” permits research participants to decide whether they wish to participate in research involving deception and, if so, to knowingly authorize its use. Authorized deception is compatible with the spirit of informed consent. It fosters respect for persons, despite the use of deception, by alerting prospective participants to the fact that some or all participants will be deliberately deceived about the purpose of the research or the nature of research procedures.

For example, investigators using the balanced placebo design to study expectancy and pharmacological effects of dexamfetamine described the informed consent disclosure as follows: “For ethical reasons it was stated in the consent form that ‘…some information and/or instructions given [to the participant] may be inaccurate’” [15]. This statement recognizes the ethical force of authorized deception, but does not seem to go far enough. As illustrated above, the balanced placebo design involves lying to participants in two arms of the study: some participants are told that they are being administered a particular drug when in fact they receive placebo, and others that they are being administered placebo when in fact they receive the drug. Consequently, it is at best an understatement to describe the disclosure in this experiment as possibly involving “inaccurate” information. It would be more accurate to inform the prospective participants that some research participants will be misled or deceived.

But then keep in mind the consensus is currently that the placebo effect has been overrated. (http://www.cochrane.org/reviews/en/ab003974.html) There is still a lot we don't know about the mind body connection. So the jury is by no means in on this matter.

Typically giving a placebo makes the control group similar in every way except the treatment. It also would be impossible to blind the observers and subjects if a placebo were not used. Just knowing you did or didn't get a treatment alters the outcome so it isn't merely the placebo effect of believing you got the treatment.

Skeptic Ginger
30th August 2006, 12:57 AM
skeptigirl: I fully cede and agree to the point of participants being a self-selected, non-random subset of the general population of cancer victims.

However, I need to ask you to explain how that could possibly create a false positive difference between the active and control groups, since both are drawn from the same (admittedly self-selected) pool and assigned randomly (from that pool).

I also entirely agree that the control group will likely be prayed for, and challenge you to the same question on this point as well.

Your doubt is not an argument. :)

I've tried to find in these 4 pages how it is you can have a control group that is prayed for and a test group that is prayed for and expect to see the effect of prayer.

You agree that self selected participants are not random. It didn't seem you were disagreeing with my hypothesis that self selected participants would include more 'believers' than a random sample and that more 'believers' would have additional people praying for them.

Here are some things to consider. You have to be more specific about what you are actually measuring. In other words:
- Are you testing if 10 people praying will have a greater impact than one person praying? (quantity)
- Are you testing if praying has an effect? (quality)
- Are you testing if the particular people praying in your study have an effect? (specific quality)
- Are you testing if the particular way prayers are performed has an effect? (specific quality)

You need to clearly define what it is you are actually measuring. Just saying you are testing if prayer has an effect doesn't allow you to determine if your control group is a true control group. I can't see that it is as far as you have gotten here. You have failed to explain how your control group will essentially differ from the test group if everyone is prayed for in both groups.

saizai
30th August 2006, 01:47 AM
skeptigirl, gr8wight:

I've said repeatedly.... I am testing for the ADDITIVE effect of prayer.

I am not going to try to make sure that the control group is not prayed for at all (that would be impossible as well as unethical). What I can say with certainty is that they aren't getting any extra prayer from their participation, and with statistical certainty that they (as a group) are getting less than the active group.

That diff is what I am testing.

As a subset I will also be doing analysis of other variables I'm tracking - e.g. whether religion, directedness, frequency, etc *correlate* to effectiveness. Same thing for disease outcomes (eg perhaps pain is more affected than survival rate or $ spent in treatment). This part is not intended (at this point) to be a causal study, just a correlative analysis; I'll be using it to construct the score equation & additional prerequisites for participation in later rounds. Those will still be done in the same, causal double blind model; if the correlative data is accurate then one would expect a larger effect demonstrated as it's better tuned.

I can do a similar thing with "specific quality" by eg making a bell curve of Healer effectiveness (from the average Scores of their Recipients).
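The Healer "bell curve" described here amounts to grouping Recipients' Scores by Healer, averaging them, and binning the averages. A minimal Python sketch, assuming a hypothetical list of (healer_id, score) pairs; how the Score itself is computed is left open, as it is in the thread, and the function names are illustrative.

```python
from collections import defaultdict
from statistics import mean


def healer_means(records):
    """Average Recipient Score per Healer.

    records: iterable of (healer_id, recipient_score) pairs (hypothetical format).
    Returns {healer_id: mean score}.
    """
    by_healer = defaultdict(list)
    for healer_id, score in records:
        by_healer[healer_id].append(score)
    return {h: mean(scores) for h, scores in by_healer.items()}


def histogram(values, bin_width):
    """Count values per bin, keyed by each bin's lower edge; a text stand-in
    for plotting the bell curve."""
    counts = defaultdict(int)
    for v in values:
        counts[bin_width * (v // bin_width)] += 1
    return dict(sorted(counts.items()))


records = [("H1", 2.0), ("H1", 4.0), ("H2", 3.0), ("H3", 7.0), ("H3", 5.0)]
means = healer_means(records)           # {'H1': 3.0, 'H2': 3.0, 'H3': 6.0}
print(histogram(means.values(), 2.0))   # {2.0: 2, 6.0: 1}
```

One caveat that follows from basic sampling: a Healer with only a few Recipients will have a noisier average, so the curve's spread reflects chance as well as any real differences between Healers.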


digithead - I think there's been no question that I have answered your points 1-3 explicitly. Reiterating them is somewhat silly.

Your point #4 is understood, but you haven't demonstrated that Startz' Monte Carlo simulation of your objection is inaccurate, and it shows that your objection is unfounded.

I heed advice, but not blindly - I get to be a skeptic too, y'know. You have to make your case, and so far you have not. You've just stated that you *think* these confounders would result in a false positive, but you haven't given any Monte Carlo sims or math to back it up or refute Startz' counterexamples.

As for "funding, recruitment, randomization, staffing, data management, etc", those haven't even been raised as issues (randomization ain't exactly difficult with a computer btw).
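For reference, simple 1:1 randomization by computer can be as small as the Python sketch below. It is an illustration only, not the study's actual backend; `assign_groups` and the integer Recipient IDs are hypothetical, and it does no stratification (e.g. by disease severity).

```python
import random


def assign_groups(recipient_ids, seed=None):
    """Randomly split Recipients 1:1 into 'active' and 'control'.

    With an odd count the extra Recipient goes to 'control'.
    Returns {recipient_id: group}.
    """
    ids = list(recipient_ids)
    rng = random.Random(seed)  # seed only for reproducible demos
    rng.shuffle(ids)
    half = len(ids) // 2
    return {rid: ("active" if i < half else "control") for i, rid in enumerate(ids)}


groups = assign_groups(range(1, 26), seed=2006)
print(sum(g == "active" for g in groups.values()))  # 12
```

The `seed` argument exists only for reproducible demos. `random.Random` is not cryptographically secure; a real allocation would more likely use `random.SystemRandom()`, which draws from the operating system's entropy source and ignores seeding, with the resulting table kept away from anyone who could break the blind.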

IMHO you're trying to make this look "very far away from settled" when in reality it's pretty much a solid protocol with very little tweaking required to finish and be completely sound.

Cuddles
30th August 2006, 04:54 AM
Maybe this has been addressed, I'm behind in this thread, but this is simply false. A double blind placebo controlled prospective study is the ideal clinical trial.

Placebos are given to test treatments in life threatening diseases all the time. What you are not thinking through here is the study drug or treatment may be harmful, may not help, or may work. That's what the study is for. So until the treatment is shown to be effective by the treated group doing better than the placebo group, neither group really is "better" to be in.

Often experimental treatments are done on terminally ill patients when all other treatments fail. Or new treatments may be added to current therapy again with placebo control. And often when a new drug is developed, the first people it is tested on may be healthy volunteers. That may be done to see what kind of tolerance people have for the drugs. Again, placebo controlled.

As far as the ethics go, you do not "lie" to the patient except in a few rare circumstances, more often in psychology research than in medical research. Sometimes you distract the patient by telling them you are looking for X while you are really looking for Y. But the usual way placebo controlled trials are done is the patient is told they have a 50:50 chance of being in either group. And until the results are analyzed neither the observer nor the patient know whether drug or placebo was given. That is what double blind means as most of you know.

I stand by my original statement. Obviously I agree that a double-blind, placebo-controlled trial is the ideal, but this does not always happen. Much of the debate over evidence-based medicine (within the medical community, not in the media) has come from the realisation that there are many treatments, especially in surgery, that have never been tested properly.

If a treatment is accepted for a life-threatening condition then any new treatments are usually only tested against the previous treatment, since if someone dies while on placebo the people conducting the trial will be guilty of withholding treatment. This is not the best situation for science, but sadly this is how it works in the courts. Signing a waiver that states you are aware you may only get a placebo does not affect this.

Surgery is even worse, since there are risks associated with any surgery. This makes it almost impossible to do placebo controls, although it has been done a few times for minor operations. Much surgery is accepted because if something is wrong, cutting it out seems to be an obvious way to solve it. Unfortunately this may not always be true, and recently some procedures have been brought into question precisely because they have only been compared with other procedures and never with a placebo.

All the links you posted in response to Yahzi seem to refer to non-fatal diseases and as such your arguments are entirely true. This is not the case that I was arguing, where there are serious ethical problems involved with placebos when a lack of treatment can be fatal.

Cuddles
30th August 2006, 05:20 AM
I'll have to respond to that later as I don't have my reference on hand (it's buried in some boxes somewhere). IIRC their sample size was about 25.

However, I've seen very few studies of remote intercessory prayer that I felt were methodologically sound to begin with, and none that were run the way I'd like to do it.

This is one of our major concerns. It doesn't matter how good your method is if you don't have enough people. 25 is not large enough unless you can absolutely guarantee that the two groups are of similar composition. You have said that you will be able to observe this, but you have not said how you will observe it, or what you will do about it if there is a problem.

Indeed. However, I think that Startz' applied modeling so far shows that this is not a significant (sic) problem. If someone can demonstrate that it is - ie it'll cause a false positive more than 5% of the time - then I'll certainly reconsider.

As digithead has said, Startz's model is not complete. It uses only one variable where many need to be considered, and was set up as a quick example that has not been shown to accurately model what you are proposing. It is not up to us to show that this model is wrong, it is up to you to show that there is nothing else that could affect it.
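The false-positive question above (does the design produce a "significant" difference more than 5% of the time when prayer does nothing?) is the kind of thing a Monte Carlo can check. What follows is an independent Python sketch, not a reproduction of Startz's simulation, and it has exactly the limitation described here: one hypothetical confounder (uniformly distributed disease severity) drives the outcome, prayer has zero effect, Recipients are randomized, and a two-sided z-test at alpha = 0.05 is applied. All names are illustrative.

```python
import random
from statistics import NormalDist


def outcome(rng):
    """Hypothetical Recipient outcome: driven by disease severity plus noise.
    There is deliberately NO prayer term, so any 'effect' found is a false positive."""
    severity = rng.random()  # confounder, uniform on [0, 1)
    return -5.0 * severity + rng.gauss(0.0, 1.0)


def mean_var(xs):
    """Sample mean and (n - 1) sample variance."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def false_positive_rate(n_per_group=100, trials=2000, alpha=0.05, seed=7):
    """Share of simulated null trials in which a two-sided z-test on the
    active-minus-control difference in means is 'significant' at alpha."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(trials):
        # Randomization: both groups come from the same severity distribution.
        ma, va = mean_var([outcome(rng) for _ in range(n_per_group)])
        mc, vc = mean_var([outcome(rng) for _ in range(n_per_group)])
        z = (ma - mc) / ((va + vc) / n_per_group) ** 0.5
        if abs(z) > crit:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(false_positive_rate())
```

Under this model the estimated rate should land near 0.05 (slightly above, since the normal critical value is a little liberal for finite samples). Covering additional confounders, dropouts, or a Score chosen from the data would require extending the model.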

I've tried to be specific as to the parameters of the score equation. I can't and won't say in advance what it will be, since it'll be based on the previous round's data (to better tune the equation).

However, I have yet to see anyone point out a real methodological flaw with *any* possible score equation I could come up with that doesn't access the assignments database.

You must explain exactly how you obtain the scores or the whole trial is meaningless. The method must be decided in advance and cannot be changed depending on the results, since this is exactly how bad conclusions can be made from otherwise good trials. The classic statistical error is to take a set of data and analyse it until you find a correlation with something, which is almost always possible. This may not be the case here, but you must show that it is not.
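The risk of settling the Score after seeing the data can be illustrated with a small Python Monte Carlo. It assumes, hypothetically, several independent outcome measures per Recipient (for instance pain, survival, and money spent on treatment, as mentioned earlier in the thread), no prayer effect at all, and an analyst who declares success if any one of them reaches p < 0.05. Real outcome measures would be correlated, which changes the exact figure; the function names are illustrative.

```python
import random
from statistics import NormalDist


def z_stat(a, b):
    """Two-sample z statistic for the difference in means (unpooled variances)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / (va / len(a) + vb / len(b)) ** 0.5


def pick_best_score_rate(n_outcomes=5, n_per_group=50, trials=1000, alpha=0.05, seed=3):
    """Under NO prayer effect, measure several independent outcomes per Recipient
    and declare success if ANY of them is significant. Returns how often that happens."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(trials):
        for _ in range(n_outcomes):
            active = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
            control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
            if abs(z_stat(active, control)) > crit:
                hits += 1
                break  # a 'working' Score was found; stop looking
    return hits / trials


if __name__ == "__main__":
    print(pick_best_score_rate(n_outcomes=1), pick_best_score_rate(n_outcomes=5))
```

With five independent candidates the rate comes out well above 5%, close to 1 - 0.95^5 ≈ 0.23; with a single pre-specified outcome it stays near 0.05.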

Also please note that there is a pretty small chance of any particular Healer knowing any particular Recipient, and that I will be requiring them to sign something saying they haven't communicated with any Recipient...

Not all people are above fraud. Signing a statement does not mean they mean it. How will you prove that they are telling the truth?


1. I will be tracking how many people the Recipients believe are praying for them. This should be statistically equal between the active and control groups.
2. They should also be statistically equal in *actual* measure, since the selection process (randomization) has no potential to bias that measure.
3. The active group is getting baseline + study; the control group is just getting baseline.
4. Ergo, the active group is getting more than the control group.

As Gr8wight and I have said, this is simply not true. You need very large samples before you can rely on randomisation. If you expect a sample of around 25, as in the study you referred to, this is nowhere near large enough. Also, you must show that randomisation achieves this, whatever your sample size, not just assume it does.

Also I should point out that AT MOST what you are claiming is that this would be a false negative, not a false positive (and even that is a statistically very unlikely claim).

False negative or false positive, the important word is "false".

skeptigirl, gr8wight:

I've said repeatedly.... I am testing for the ADDITIVE effect of prayer.

I am not going to try to make sure that the control group is not prayed for at all (that would be impossible as well as unethical). What I can say with certainty is that they aren't getting any extra prayer from their participation, and with statistical certainty that they (as a group) are getting less than the active group.

That diff is what I am testing.

The trouble is it is not statistical certainty. You need to demonstrate this, not assume it.

As a subset I will also be doing analysis of other variables I'm tracking - e.g. whether religion, directedness, frequency, etc *correlate* to effectiveness. Same thing for disease outcomes (eg perhaps pain is more affected than survival rate or $ spent in treatment). This part is not intended (at this point) to be a causal study, just a correlative analysis; I'll be using it to construct the score equation & additional prerequisites for participation in later rounds. Those will still be done in the same, causal double blind model; if the correlative data is accurate then one would expect a larger effect demonstrated as it's better tuned.

This is the worst kind of analysis possible. If you gather data from people and then try to find a correlation with anything, you will find one. This is why studies only ever focus on one cause and try to control for all others. Occasionally a strong trend may be noticed that is commented upon and recommended for further study, but a trial that is set up to examine one possible correlation cannot reliably comment on any other.
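The arithmetic behind this point, assuming k independent comparisons each tested at level alpha when no real effect exists (k = 20 is an illustrative number, not from the study). The Bonferroni correction in the second line is one standard remedy, shown as an example rather than something proposed in the thread.

```latex
\begin{aligned}
\Pr(\text{at least one false positive}) &= 1 - (1 - \alpha)^{k}
  && \alpha = 0.05,\ k = 20 \;\Rightarrow\; 1 - 0.95^{20} \approx 0.64 \\
\text{Bonferroni: each test at } \alpha / k = 0.0025
  &\;\Rightarrow\; \Pr(\text{at least one false positive}) \le \alpha = 0.05
\end{aligned}
```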

digithead - I think there's been no question that I have answered your points 1-3 explicitly. Reiterating them is somewhat silly.

Point 1 asked for your measure for the outcome, which you explicitly stated you would not provide and said you would change for different trials. This is not acceptable for a medical trial.

Point 3 asked what diseases you would look at. All you have said is "cancer". This refers to hundreds of different diseases, many of which progress very differently from one another.

Your point #4 is understood, but you haven't demonstrated that Startz' Monte Carlo simulation of your objection is inaccurate, and it shows that your objection is unfounded.

In the same post digithead said exactly what was wrong with this simulation. Put simply, it is too simple. In any case, it is not up to us to show how it is wrong, but up to you to show that it accurately represents your trial, which is unlikely to be the case.

I heed advice, but not blindly - I get to be a skeptic too, y'know. You have to make your case, and so far you have not. You've just stated that you *think* these confounders would result in a false positive, but you haven't given any Monte Carlo sims or math to back it up or refute Startz' counterexamples.

No, you have to make your case. If someone thinks something could be a problem, it is up to the person running the trial to show that it is not.

IMHO you're trying to make this look "very far away from settled" when in reality it's pretty much a solid protocol with very little tweaking required to finish and be completely sound.

The points raised say that this protocol is far from solid. Even if we assume that your protocol is perfect, you have to show this, which you have hardly even tried to do. I very much doubt the JREF would accept this without raising exactly the same points that we have. At the least they will require you to show that these points are not valid.

Gr8wight
30th August 2006, 07:39 AM
skeptigirl, gr8wight:

I've said repeatedly.... I am testing for the ADDITIVE effect of prayer.

I am not going to try to make sure that the control group is not prayed for at all (that would be impossible as well as unethical). What I can say with certainty is that they aren't getting any extra prayer from their participation, and with statistical certainty that they (as a group) are getting less than the active group.

That diff is what I am testing.

And I will repeat: if randomisation is your only control, your proposed sample size is way too small. Unless you can bump your numbers up into four digits, I can't see how you can call your results significant.

digithead
30th August 2006, 09:29 AM
Thanks, Gr8wight, well summarized, I was getting tired of repeating myself...

But I will say it again, Saizai, you are willfully ignoring all of the issues that we have raised so far because you have apparently convinced yourself that your protocol is beyond reproach. Remember what we've told you because you will be shipwrecked by your own hubris...

In any event thanks for the material, I will be using it next time I teach research methods, I'm pretty sure any undergrad can figure out the same issues that some of us have raised against it...

It's time to abandon arguing about the protocol and try to discover why you feel it is so important to test the power of intercessory prayer on disease outcome. Especially when numerous published studies have shown its ineffectiveness. You claim you're agnostic and a skeptic but what is your motivation for doing this study? Because a true skeptic would take the already overwhelming evidence in failing to reject the null to conclude that intercessory prayer has no effect on disease outcome and move on to new matters. You've given us the how now give us the why...

jskowron
30th August 2006, 09:49 AM
Digithead et al.

Nice job trying to point saizai in the right direction. I would agree that it's time to move to the "why" versus the "how". I may be beating a dead horse here (those live ones sure are hard to catch!), but I think the following quote is telling:

They get to see the code before the commencement of round two, and verify that it in no way breaches double blind, installs a back door, accesses a recipient's assignment(s) status, or accesses Healers' data. They do not get to reject a proposed code based on any other reason.

Am I the only one bothered by that last sentence? It seems rather confrontational (and smacks of distrust) for what is being presented as a plea to "check my methodology." When doing research, one should accept the fact that there are many ways to be wrong- probably more than there are ways to be right. Setting the conditions for which you can be judged as being wrong, a priori, is just not ok. You need to be open to errors being discovered, and willing to correct those errors. Through that statement, and his approach to those trying to help, saizai has indicated that he is not open to corrections or suggestions.

petre
30th August 2006, 09:51 AM
Do you see any false negative potential OTHER than what I said (familiarity/contact requirement)? E.g., insufficient information provided to Healer, other requirements, ...


How about:
- prayer only works if a specific deity is addressed (FSM maybe?)
- prayer only works if no one tracks the results
- prayer only works if done by a priest
- prayer only works if you donate heavily to a church
- prayer only works if done on Sunday
- prayer only works if spoken in Latin
- prayer only works if done while standing on your head
- prayer only works if done by 100 or more people

This is where actually believing that it works a certain way comes in handy. You can then narrow down what you think actually matters and find out if you're right. With no belief, there are a great many factors you need to control for to make it a really worthwhile test. The "likely" result of "no effect" (given that the only proposed improvement to existing studies is sample size) will at least have some meaning if you had some belief, in that it will encourage you to re-examine it.


I'll have to respond to that later as I don't have my reference on hand (it's buried in some boxes somewhere). IIRC their sample size was about 25.

However, I've seen very few studies of remote intercessory prayer that I felt were methodologically sound to begin with, and none that were run the way I'd like to do it.


So to avoid their perils, you intend to use a greater sample size than those you've seen that did appear methodologically sound (which seem to have used, at most, 25 people, so perhaps 100 is a good enough target for your first round?) and to address any methodological failings you identified in studies that did use sufficient sample size.

So I suppose if someone were to present an existing study that used 100 or more people, you would identify an error in its methodology that your study will avoid somehow, or you will agree that even the sample size of that study was insufficient, and increase your definition of "too small of a sample size" to include the new study?

Skeptic Ginger
30th August 2006, 01:37 PM
skeptigirl, gr8wight:

I've said repeatedly.... I am testing for the ADDITIVE effect of prayer.

I am not going to try to make sure that the control group is not prayed for at all (that would be impossible as well as unethical). What I can say with certainty is that they aren't getting any extra prayer from their participation, and with statistical certainty that they (as a group) are getting less than the active group.

That diff is what I am testing.

Perhaps I can add to the point others are making about a larger sample size.

First, if you are going to test the effect of additional prayer, as opposed to 'any' prayer you will need to quantify baseline prayer amounts and define what you mean by additional. You have to address exactly what you are testing in terms of quantity over baseline prayer rates. "Additional prayer" is too vague in this case.

Then you will either need a very large group so that you can assure an even distribution of baseline prayer volumes for the individuals in the study, or you will have to collect data on the baseline prayer rates of those individuals and include that in your analysis. This is similar to the discussion about severity of disease in both the control group and the study group.
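One standard way to include baseline prayer in the analysis is regression adjustment (ANCOVA): regress the outcome on a treatment indicator and the measured baseline, and read off the treatment coefficient. The Python sketch below assumes baseline prayer can actually be measured as a number per Recipient (how to do that is not settled in the thread) and that the outcome is linear in it; the function names and toy data are hypothetical. It solves the least-squares problem directly from centered sums to stay dependency-free.

```python
def adjusted_effect(treated, baseline, outcome):
    """Treatment coefficient from ordinary least squares of
    outcome ~ 1 + treated + baseline (ANCOVA with one covariate).

    treated: 1 for active, 0 for control; baseline: measured baseline prayer.
    """
    n = len(outcome)
    mt, mx, my = sum(treated) / n, sum(baseline) / n, sum(outcome) / n
    # Centered sums of squares and cross-products.
    stt = sum((t - mt) ** 2 for t in treated)
    sxx = sum((x - mx) ** 2 for x in baseline)
    stx = sum((t - mt) * (x - mx) for t, x in zip(treated, baseline))
    sty = sum((t - mt) * (y - my) for t, y in zip(treated, outcome))
    sxy = sum((x - mx) * (y - my) for x, y in zip(baseline, outcome))
    # Solve the 2x2 normal equations for the treatment coefficient.
    return (sty * sxx - stx * sxy) / (stt * sxx - stx ** 2)


def raw_difference(treated, outcome):
    """Unadjusted difference in mean outcome, active minus control."""
    act = [y for t, y in zip(treated, outcome) if t == 1]
    ctl = [y for t, y in zip(treated, outcome) if t == 0]
    return sum(act) / len(act) - sum(ctl) / len(ctl)


# Toy data built as outcome = 2 + 0.5*treated + 3*baseline, with the control
# group happening to have more baseline prayer.
treated = [1, 1, 1, 0, 0, 0]
baseline = [0, 1, 2, 2, 3, 4]
outcome = [2 + 0.5 * t + 3 * x for t, x in zip(treated, baseline)]
print(raw_difference(treated, outcome))             # -5.5
print(adjusted_effect(treated, baseline, outcome))  # 0.5
```

In the toy data the control group happens to have more baseline prayer, so the raw difference in means is -5.5, while the adjusted estimate recovers the 0.5 effect built into the data.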

Say you want to test a group of people who are self selected so they are likely to be being prayed for. Those people might have any volume of prayers being said for them perhaps depending upon the size of the congregation they attend. At some cut off point you may be assured an equal distribution randomizing the group into two. Depending on the range of prayer volume within the group, that cut off point may be higher or lower.

Say the baseline prayer rate is 1 prayer or no prayer. If you divide a group of 2 subjects they will be 100% different. If you pray for the guy with no prayer, both the test group and the control group will have one prayer each and will not differ.

Say the number of baseline prayers in a group of 20 subjects is between 0 and 1,000. Ten of your group of 20 attend the same church and the congregation of 1,000 regularly pray for all members of the church with cancer, (they told each other about the study). Toss a coin in the air 20 times. How evenly was your division of heads and tails? Do that several times to get a sense of how a random division of a very diverse group of 20 ends up being distributed in each group.
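The coin-toss exercise can also be worked exactly. Assuming each of the 10 church members in the example is assigned by an independent fair coin toss, the number landing in the active group is binomial with n = 10 and p = 1/2; the Python below (with a hypothetical helper name) sums the two binomial tails.

```python
from math import comb


def prob_split_at_least(members=10, lopsided=7):
    """Probability that independent fair coin tosses put at least `lopsided`
    of `members` subjects into the same group (either group)."""
    if 2 * lopsided <= members:
        raise ValueError("lopsided must be more than half of members")
    one_side = sum(comb(members, k) for k in range(lopsided, members + 1))
    return 2 * one_side / 2 ** members


print(prob_split_at_least())   # 0.34375
print(comb(10, 5) / 2 ** 10)   # 0.24609375 (an exact 5-5 split)
```

So roughly one run in three puts at least 7 of the 10 heavily prayed-for members in the same group, while an exact 5-5 split happens only about 25% of the time.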

The more diverse the group, whether it be disease severity, disease type which we haven't even gotten to here, or baseline prayer rate, the larger the sample you need to see an even distribution in your test group and your control group.

The only other way to address dissimilarities in both groups is to measure everything and show the groups are evenly distributed. Severity of disease, age, gender, economic status, type of cancer, quality of medical care, baseline prayer rate, religious conviction and probably more things I am not thinking of have to be accounted for as equal in both groups or your results might be reflecting the difference in the group rather than the difference in prayer volume.

The way researchers generally address small sample size in a test study before undertaking a larger more expensive study is to select a very homogeneous group for the initial smaller study. You won't have this with your current method of recruiting subjects.

Startz
30th August 2006, 02:15 PM
>snip

I don't think I understand your example. Could you elaborate?


How about something like this? An unscrupulous challenger signs up a few confederates with unusual first names or recognizable pictures. Then the challenger also signs up some friends to do the praying. (They get to see pictures.) All the praying people need to do is identify 1 subject to the challenger. Then it's easy to arrange a statistically significant result no matter how large the sample is. In other words, a minute amount of cheating is all that's needed to rig the results.

Skeptic Ginger
30th August 2006, 02:16 PM
I stand by my original statement.

[Original statement: "Placebos generally aren't given for serious diseases. New treatments are compared with accepted treatment, but not with a placebo. Deliberately withholding treatment from a potentially fatal disease like cancer would be considered serious misconduct, and possibly murder, if there is any treatment available that gives a better survival rate than placebo."]

Obviously I agree that a double-blind, placebo-controlled trial is the ideal, but this does not always happen. Much of the debate over evidence-based medicine (within the medical community, not in the media) has come from the realisation that there are many treatments, especially in surgery, that have never been tested properly.

If a treatment is accepted for a life-threatening condition, then any new conditions are usually "only tested against the previous treatment", since if someone dies while on placebo the people conducting the trial will be guilty of withholding treatment. This is not the best situation for science, but sadly this is how it works in the courts. Signing a waiver that states you are aware you may only get a placebo does not affect this.

Surgery is even worse, since there are risks associated with any surgery. This makes it almost impossible to do placebo controls, although it has been done a few times for minor operations. Much surgery is accepted because if something is wrong, cutting it out seems to be an obvious way to solve it. Unfortunately this may not always be true, and recently some procedures have been brought in to question precisely because they have only been compared with other procedures and never with a placebo.

All the links you posted in response to Yahzi seem to refer to non-fatal diseases and as such your arguments are entirely true. This is not the case that I was arguing, where there are serious ethical problems involved with placebos when a lack of treatment can be fatal.

You have addressed 4 separate issues.

First, the issue I addressed (above in bold) is patently false. Is that the statement you are standing by?

The second issue is whether a known treatment should be withheld in order to study a potentially better treatment. That is done, though rarely and under specific circumstances. Typically it is when the known treatment isn't tolerated by the patient or the known treatment offers a very poor prognosis and the new treatment has very high promise. Patients with terminal diseases and very poor prognoses sometimes volunteer to forgo standard treatments in order to further research even if they may die but future victims might be helped.

And many times there is no treatment yet established or the experimental treatment addresses a different aspect of the disease than the standard treatment addressed. For example a number of women took Tamoxifen after their breast cancer was treated to see if it lowered the rate of recurrence.

The third issue you have raised is how do you control for placebo effect in surgical procedures. That's a whole different ball of wax. Recently a study was done involving sham surgery that actually involved an incision in the control group. It was extremely controversial but the results indicated the surgical procedure did indeed have a placebo benefit and not a physical benefit.

The fourth issue is probably more one of communication than substance. You have the right idea. Known treatments which could be used are generally not withheld in order to test new treatments. But when you state, "only tested against the previous treatment", that is technically incorrect. Certainly we want to know if a new treatment should be given in conjunction with an old treatment or instead of it. That would be dealt with in the nature of the treatment and the study design. But the actual study would still use a placebo control if possible.

Your statement, "if someone dies while on placebo the people conducting the trial will be guilty of withholding treatment", implies that giving a placebo means you must also withdraw treatment, which is, of course, silly. In addition, it wouldn't be the placebo that was at issue; it would be the failure to treat. The person could die on either the experimental treatment or the placebo if the known treatment was withheld.

What you are suggesting, and I don't think you mean, is that a study of a new treatment would involve withholding the known treatment from a group that was receiving the experimental treatment and comparing them to the group receiving the known treatment. Until there is evidence the new treatment is effective, it isn't going to be compared to a known treatment by giving one group the old and one group the new treatment. Once it is determined that both treatments work, then you might see a study comparing the two treatments to each other.

saizai
8th September 2006, 03:06 PM
I haven't the time to respond to replies since my last comment at this point, but:
1. http://www.prayermatch.org is now up finally, though running slowly because fcgi still isn't working (grr)
2. I'll probably use a supplemented SF36 for monthly HRQOL surveys
3. I'll probably use a TNM / Ann Arbor staging for quarterly doctor surveys (plus a few other questions)

The site is almost done; what's left to do are the monthly/quarterly reports, matching algorithm, and doctor logins. That should be relatively easy and not take too much more time.

Gr8wight
8th September 2006, 10:51 PM
Let us know when you have your positive result to report, so we can insult you over your refusal to release your raw data to be examined.

Oh, sorry, that was someone else.

saizai
10th September 2006, 11:05 AM
Gr8wight - That was just rude.

Gr8wight
10th September 2006, 02:06 PM
What, comparing you to Gary Schwartz was rude? Does Gary know you feel that way about him?

Prove me wrong.

Yes, I'll wait.

saizai
10th September 2006, 07:14 PM
Gr8wight - You didn't mention a name. You implied that I wouldn't release the data, and that therefore you would insult me. That's a coward's insult.

Given that you're evidently not willing to carry on a polite conversation, I won't be responding to you from now on. G'day.

Gr8wight
10th September 2006, 11:47 PM
That's OK. I'll still wait.

saizai
26th September 2006, 11:21 PM
This is one of our major concerns. It doesn't matter how good your method is if you don't have enough people. 25 is not large enough unless you can absolutely guarantee that the two groups are of similar composition. You have said that you will be able to observe this, but you have not said how you will observe it, or what you will do about it if there is a problem.

1. I'm having >25.
2. I observe by the very simple expedient of asking.

No study has control and active groups that are exactly the same. They just draw from the same pool and make sure the study itself doesn't create differences. It's statistically unlikely (by definition, in fact) for a difference to exist - there'll be one p*100% of the time, for whatever p level you choose.
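The claim that a chance difference shows up about p*100% of the time (i.e. at the chosen α) can be checked with a quick simulation. This Python sketch assumes normally distributed outcomes with a known standard deviation of 1, 25 subjects per arm, and no real effect; it uses a two-sided z-test, so the p-values are exact under those assumptions.

```python
import random
from math import erfc, sqrt

def false_positive_rate(n_per_arm=25, trials=20000, alpha=0.05, seed=0):
    """Fraction of simulated null studies that come out 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # No real effect: both arms come from the same N(0, 1) distribution.
        mean_a = sum(rng.gauss(0, 1) for _ in range(n_per_arm)) / n_per_arm
        mean_b = sum(rng.gauss(0, 1) for _ in range(n_per_arm)) / n_per_arm
        z = (mean_a - mean_b) / sqrt(2 / n_per_arm)  # sigma = 1 is known, so z ~ N(0, 1)
        p = erfc(abs(z) / sqrt(2))                   # two-sided p-value
        hits += p < alpha
    return hits / trials

print(f"false-positive rate at alpha = 0.05: {false_positive_rate():.3f}")
```

This only addresses the false-positive rate of a correctly specified test; it says nothing about covariate balance or power.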

As digithead has said, Startz's model is not complete. It uses only one variable where many need to be considered, and was set up as a quick example that has not been shown to accurately model what you are proposing. It is not up to us to show that this model is wrong, it is up to you to show that there is nothing else that could affect it.

Proving a negative is impossible. You claim there is a real problem, therefore you have the much simpler task of proving a positive.

Just handwaving a claim that it's insufficiently controlled isn't good enough, especially when the monte carlo sim shows otherwise.

You must explain exactly how you obtain the scores or the whole trial is meaningless. The method must be decided in advance and cannot be changed depending on the results, since this is exactly how bad conclusions can be made from otherwise good trials. The classic statistical error is to take a set of data and analyse it until you find a correlation with something, which is almost always possible. This may not be the case here, but you must show that it is not.

Quite so. But did you completely ignore the places where I said that I would be determining the score equation in advance of obtaining data?

Not all people are above fraud. Signing a statement does not mean they mean it. How will you prove that they are telling the truth?

They don't know what group they are in, therefore they have no way to lie in a way that would influence the results.


As Gr8wight and I have said, this is simply not true. You need very large samples before you can rely on randomisation. If you expect a sample of around 25, as in the study you referred to, this is nowhere near large enough. Also, you must show that randomisation achieves this, whatever your sample size, not just assume it does.

Not so, sorry. I've seen plenty of robust studies with N ~= 25 that still manage p<.05 or <.01 or <.001 even. Depends on the distribution of the measure in the pool. In any case it's self-correcting: you can't obtain a p<.05 with a too-small pool. Simple enough.
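How small a pool is "too small" depends on the effect size being sought. A standard normal-approximation formula for the number of subjects per arm in a two-sided, two-sample comparison is n ≈ 2 (z_(1-α/2) + z_(1-β))^2 / d^2, where d is the difference in means in standard-deviation units and 1-β is the power. A Python sketch, with α = 0.05 and 80% power as assumed defaults:

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(d, alpha=0.05, power=0.80):
    """Normal-approximation sample size per arm for a two-sided, two-sample
    comparison of means, with effect size d in standard-deviation units."""
    z = NormalDist().inv_cdf
    return ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2)

for d in (0.8, 0.5, 0.3, 0.2):
    print(f"d = {d}: about {n_per_arm(d)} per arm")
```

Under these assumptions, about 25 per arm gives 80% power only for a large effect (d = 0.8); a small effect (d = 0.2) needs about 393 per arm.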

False negative or false positive, the important word is "false".

Only if you're concerned with defending a believer's perspective. If you're just concerned with protecting the challenge against fraud, then false positives are it.

This is the worst kind of analysis possible. If you gather data from people and then try to find a correlation with anything, you will find one. This is why studies only ever focus on one cause and try to control for all others. Occasionally a strong trend may be noticed that is commented upon and recommended for further study, but a trial that is set up to examine one possible correlation cannot reliably comment on any other.

Again you don't seem to have read what I wrote very carefully.

Correlational aspects are only going to be used to inform the design parameters of the next round(s). That is consistent with standard scientific method. What is being tested is the causal.
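The arithmetic behind "you will find one" is the family-wise error rate. If k independent outcomes are each tested at α and none is truly affected, the chance that at least one comes out "significant" is 1 - (1 - α)^k. A Python sketch, which also shows the effect of a Bonferroni correction (testing each at α/k); independence is an assumption:

```python
def familywise_error(k, alpha=0.05):
    """Chance of at least one p < alpha among k independent tests
    when no outcome is truly affected."""
    return 1 - (1 - alpha) ** k

for k in (1, 5, 20, 50):
    raw = familywise_error(k)
    bonferroni = familywise_error(k, 0.05 / k)  # each test run at alpha / k
    print(f"k = {k:2d}: uncorrected {raw:.3f}, Bonferroni {bonferroni:.3f}")
```

With 20 unrelated measures at α = 0.05 the chance of at least one spurious hit is about 64%, while the Bonferroni-corrected rate stays just under 5%.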

Point 1 asked for your measure for the outcome, which you explicitly stated you would not provide and said you would change for different trials. This is not acceptable for a medical trial.

I said I would provide it before each trial in question. This is perfectly acceptable.


In any event thanks for the material, I will be using it next time I teach research methods, I'm pretty sure any undergrad can figure out the same issues that some of us have raised against it...

Glad you enjoy reading.

However, I should point out that everything I have written is my copyright and I explicitly do not grant you any rights to use it in any manner whatsoever.

It's time to abandon arguing about the protocol and try to discover why you feel it is so important to test the power of intercessory prayer on disease outcome. Especially when numerous published studies have shown its ineffectiveness. You claim you're agnostic and a skeptic but what is your motivation for doing this study? Because a true skeptic would take the already overwhelming evidence in failing to reject the null to conclude that intercessory prayer has no effect on disease outcome and move on to new matters. You've given us the how now give us the why...

I'm not interested in discussing my motivation beyond what I have already stated: curiosity as a true (weak) agnostic. I decline to get dragged into an argument about theology, philosophy, and the like.

Am I the only one bothered by that last sentence? It seems rather confrontational (and smacks of distrust) for what is being presented as a plea to "check my methodology." When doing research, one should accept the fact that there are many ways to be wrong, probably more than there are ways to be right. Setting the conditions under which you can be judged as being wrong, a priori, is just not OK. You need to be open to errors being discovered, and willing to correct those errors. Through that statement, and his approach to those trying to help, saizai has indicated that he is not open to corrections or suggestions.

I'm quite open to correcting real errors.

I'm not open to "correcting" things that aren't really errors, or that are merely whims. I am only interested in your input insofar as it ensures that my methodology is tight. No score equation I can possibly choose, within the parameters I gave, would be a methodological flaw - and therefore I make it explicit that I can choose whatever I want.

This is for the simple reason of ensuring that everything in the application is totally explicit so there is no arguing later about what the terms are.

How about:
- prayer only works if a specific deity is addressed (FSM maybe?)
- prayer only works if done by a priest
- prayer only works if done on Sunday
- prayer only works if done by 100 or more people


Tracked correlatively. If it's the case then this info will be used to filter later rounds' participants.

- prayer only works if no one tracks the results

Inherent flaw in the design and indeed in all possible designs I can think of. Acceptable.

- prayer only works if you donate heavily to a church
- prayer only works if spoken in Latin
- prayer only works if done while standing on your head

Not tracked. If so, oh well, my miss.

This is where actually believing that it works a certain way comes in handy. You can then narrow down what you think actually matters and find out if you're right. With no belief, there are a great many factors you need to control for to make it a really worthwhile test. The "likely" result of "no effect" (given that the only proposed improvement to existing studies is sample size) would at least have some meaning if you had some belief, in that it would encourage you to re-examine it.

You cannot, in principle, explicitly control for all possible factors. It's simply impossible by definition. That's what randomization is for.


So I suppose if someone were to present an existing study that used 100 or more people, you would identify an error in its methodology that your study will avoid somehow, or you will agree that even the sample size of that study was insufficient, and increase your definition of "too small of a sample size" to include the new study?

I haven't seen this hypothetical study, therefore I cannot comment.



One further improvement I have thought of:

I'll set an arbitrary score equation for the first round - essentially a random guess. This allows JREF to participate in the first round as well as the second and third.

If the first round is positive, then we go directly to the third as the 'final test'; if not, then we go to the second round as the new 'preliminary test' with a score equation based on the actual data gathered in the first round.
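Under this scheme a null result in round 1 is followed by a second preliminary attempt in round 2, so it is worth computing how often a no-effect claim would reach the final round. A Python sketch, assuming the rounds are independent, that a positive round 2 leads to round 3, and using illustrative α values (not JREF's actual thresholds):

```python
def p_reach_final(alpha_prelim):
    """Chance, under no real effect, of reaching round 3 when a negative
    round 1 is followed by a second preliminary attempt in round 2."""
    round1_pass = alpha_prelim
    round2_pass = (1 - alpha_prelim) * alpha_prelim  # only reached if round 1 failed
    return round1_pass + round2_pass

def p_pass_overall(alpha_prelim, alpha_final):
    """Chance of passing the whole sequence by luck, assuming independent rounds."""
    return p_reach_final(alpha_prelim) * alpha_final

print(f"{p_reach_final(0.05):.4f}")          # two shots at a 5% preliminary
print(f"{p_pass_overall(0.05, 0.001):.2e}")  # hypothetical final-round alpha
```

With α = 0.05 for each preliminary round, the chance of reaching the final under no effect is 1 - 0.95² = 0.0975, roughly double a single 5% preliminary; the overall pass rate then depends on the final round's α.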

Cuddles
27th September 2006, 10:19 AM
1. I'm having >25.
2. I observe by the very simple expedient of asking.

No study has control and active groups that are exactly the same. They just draw from the same pool and make sure the study itself doesn't create differences. It's statistically unlikely (by definition, in fact) for a difference to exist - there'll be one p*100% of the time, for whatever p level you choose.

If you ask them, then how is this a blinded trial?

No-one ever said the groups have to be identical, but they must be similar, and must be provably so. 25 is simply not big enough; to rely on randomisation alone you need hundreds at the least, preferably thousands. A related issue is that, since there are so many types of cancer, no two people in the study will be similar. This would mean that no matter how randomly they are grouped, you will never get a meaningful result.

Proving a negative is impossible. You claim there is a real problem, therefore you have the much simpler task of proving a positive.

Just handwaving a claim that it's insufficiently controlled isn't good enough, especially when the monte carlo sim shows otherwise.

Where is the handwaving? Your sample is simply not large enough. The one simulation done was very basic and did not take into account any of the variables we have said could affect the results, such as age, social status, etc. Since you have not defined your sample group any more specifically than "anyone with cancer", we cannot give any more specific problems, we can just point out the areas you have not covered that could, and in many cases are likely to, affect the result.


Quite so. But did you completely ignore the places where I said that I would be determining the score equation in advance of obtaining data?

You said "Extant data will be analyzed after the first round and used to create a Score Equation.". Is this some new meaning of the words "in advance" that I wasn't previously aware of?

They don't know what group they are in, therefore they have no way to lie in a way that would influence the results.

So why is a statement even needed? If you are not concerned about the possibility of cheating, why would you require them to sign something saying they are not? And if you are concerned, how can you prove that doing this would actually prevent them lying?

Not so, sorry. I've seen plenty of robust studies with N ~= 25 that still manage p<.05 or <.01 or <.001 even. Depends on the distribution of the measure in the pool. In any case it's self-correcting: you can't obtain a p<.05 with a too-small pool. Simple enough.

There may be studies that can achieve good preliminary results with small samples, but this is not one of them. You are looking for a (presumably) small effect with numerous possible confounders, most of which you haven't even considered.

Also, this has nothing to do with randomisation. If you take some ill people and some healthy people and assign all the healthy ones to a treatment group and all the ill ones to a control, you will get an extremely significant result saying the treatment worked, because the statistical analysis assumes that the original groups were similar.

Only if you're concerned with defending a believer's perspective. If you're just concerned with protecting the challenge against fraud, then false positives are it.

Not true. With virtually every test of the paranormal, a negative result causes the claimants to say that the conditions were wrong or the test was unfair. In order to maintain any credibility the JREF must ensure that any test will actually detect what it is looking for.

Again you don't seem to have read what I wrote very carefully.

Correlational aspects are only going to be used to inform the design parameters of the next round(s). That is consistent with standard scientific method. What is being tested is the causal.

Not really. As I said, it is well known that if you have a dataset with many variables, you are almost guaranteed to find correlations between some of them. If you use these to alter subsequent tests you are doing bad science, plain and simple. They could be used as the basis for an entirely different set of tests, but this does not seem to be what you are planning.

Also, you said "No difference in the Score Equation, participation criteria, or significance test will be permitted between second and third rounds.". The posts to which I was replying did not say that you would only be altering the equation after the first round, so I assumed you meant you would alter it after every test. I apologise if this is not the case.


I said I would provide it before each trial in question. This is perfectly acceptable.

I thought you had said that it would not be available, but it appears I misread one of your posts, so I apologise. I am still concerned that you will provide it, but will not allow any discussion of whether it is acceptable to anyone else.


Glad you enjoy reading.

However, I should point out that everything I have written is my copyright and I explicitly do not grant you any rights to use it in any manner whatsoever.

Is content posted on a public forum covered by copyright? And if so, is it copyright to the poster or the forum's owner?

One further improvement I have thought of:

I'll set an arbitrary score equation for the first round - essentially a random guess. This allows JREF to participate in the first round as well as the second and third.

Why does it have to be a guess? If you want to measure if people get better from prayer then your equation should measure this. It may not be perfect, and I would consider consulting qualified people who run similar tests, but there should be nothing random about it at all.

If the first round is positive, then we go directly to the third as the 'final test'; if not, then we go to the second round as the new 'preliminary test' with a score equation based on the actual data gathered in the first round.

This is just silly. Either a test is the JREF preliminary or not. If it is not, then it cannot be counted as one retroactively; if it is, then a negative result will count as a failure. If all applicants were allowed to cherry-pick their positive results, the prize would have been won long ago.

digithead
27th September 2006, 11:15 AM
You appear to know as much about copyright law as you do about designing clinical trials...

Exactly how is my educational use of your study a violation of copyright? You've placed your ideas in the public marketplace. The title of the thread you started is "Check my methodology - prayer study". Seems to me that you've put your study out there for others to discuss, debate, and critique...

Sorry, but everything you've discussed in this forum is now out there for fair use, regardless of your copyright or frustration of how others might use your study...

saizai
27th September 2006, 04:45 PM
I disagree. Fair use is only for limited extracts of a work for reasonable purposes (e.g. education). It just so happens that I have taught a college course myself, and for it had reason to obtain copyright permission - and have previously researched it in some depth. You can't quote an entire work without permission. You can refer people to a public discussion, but the fact that I have posted it publicly doesn't mean that you're granted any rights to it other than to read it where I posted it.

Copyright law, btw, is not primarily about "ideas" but about content, i.e. my words as written. Ideas are what patents are for.

Note that fair use has four required tests, one of which is: "amount and substantiality of the portion used in relation to the copyrighted work as a whole". (http://www.copyright.gov/title17/92chap1.html#107) E.g. you can't (without permission) copy a short story that I write and publish online in its entirety, even for a classroom discussion.

I suggest you read the law more before accusing me of not knowing it. :D


P.S. You evidently haven't read the registration agreement for the JREF forum, which states in part:
Copyright

Any post or article published on the JREF forum by a Member is the copyright of the Member and may not be reproduced, copied or otherwise re-published without the express permission of the Member. By posting on the Forum a Member grants the JREF a non-exclusive licence to publish, republish or reproduce their work, in its entirety or as the JREF sees fit, in perpetuity. The James Randi Educational Foundation is the copyright holder of the JREF Forum.


That means that in addition to violating my copyright you would be violating the registration agreement...

saizai
27th September 2006, 05:03 PM
If you ask them, then how is this a blinded trial?

What does asking them questions have to do with the blinding? You seem to be confused as to what 'blinding' means...

No-one ever said the groups have to be identical, but they must be similar, and must be provably so. 25 is simply not big enough; to rely on randomisation alone you need hundreds at the least, preferably thousands. A related issue is that, since there are so many types of cancer, no two people in the study will be similar. This would mean that no matter how randomly they are grouped, you will never get a meaningful result.

Wrong, sorry. They are statistically identical by definition. Perhaps you haven't read as many actual research studies as I (the ones I've seen are mainly in the areas of cognitive science and neurology, fwiw) but there are studies conducted and published routinely with n<50 which nevertheless produce perfectly sound results.

Where is the handwaving? Your sample is simply not large enough. The one simulation done was very basic and did not take into account any of the variables we have said could affect the results, such as age, social status, etc. Since you have not defined your sample group any more specifically than "anyone with cancer", we cannot give any more specific problems, we can just point out the areas you have not covered that could, and in many cases are likely to, affect the result.

I disagree. Please provide a monte carlo sim that does take into account your supposed other factors and explain how they would affect the result even after randomized double-blinding. (Note the and; I want actual numbers rather than just your handwaving about "this will affect it" without anything to back that statement up.)
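One possible version of such a simulation, sketched in Python. Every parameter is hypothetical: a pool of 50 patients whose cancer stage (1 to 4) strongly drives the outcome score, prayer having no effect at all, random allocation into two arms of 25, and a pooled two-sample t-test at α = 0.05. It reports the false-positive rate and how often the arms differ by half a stage or more on average.

```python
import random

T_CRIT_48 = 2.0106  # two-sided 5% critical value of Student's t with 48 df (25 + 25 - 2)

def _mean(xs):
    return sum(xs) / len(xs)

def _var(xs):
    m = _mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def heterogeneous_null_trial(rng, n_per_arm=25):
    """One simulated study: stage drives the score, prayer does nothing."""
    stages = [rng.randint(1, 4) for _ in range(2 * n_per_arm)]  # hypothetical stage mix
    scores = [-15 * s + rng.gauss(0, 10) for s in stages]        # hypothetical score model
    order = list(range(2 * n_per_arm))
    rng.shuffle(order)                                           # random allocation
    arm_a, arm_b = order[:n_per_arm], order[n_per_arm:]
    a = [scores[i] for i in arm_a]
    b = [scores[i] for i in arm_b]
    pooled_sd = ((_var(a) + _var(b)) / 2) ** 0.5                 # equal arm sizes
    t = (_mean(a) - _mean(b)) / (pooled_sd * (2 / n_per_arm) ** 0.5)
    stage_gap = abs(_mean([stages[i] for i in arm_a]) - _mean([stages[i] for i in arm_b]))
    return abs(t) > T_CRIT_48, stage_gap >= 0.5

def simulate(trials=4000, seed=42):
    rng = random.Random(seed)
    results = [heterogeneous_null_trial(rng) for _ in range(trials)]
    false_pos = sum(r[0] for r in results) / trials
    big_gap = sum(r[1] for r in results) / trials
    return false_pos, big_gap

fp, gap = simulate()
print(f"false-positive rate: {fp:.3f}   trials with >= 0.5 stage gap: {gap:.3f}")
```

Under this particular model, random allocation keeps the false-positive rate close to 5% even though the pool is heterogeneous, while a half-stage baseline gap still turns up in roughly one trial in ten; that imbalance mainly costs power and complicates interpretation. Other confounders, or a non-random allocation, would need their own modelling.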


You said "Extant data will be analyzed after the first round and used to create a Score Equation.". Is this some new meaning of the words "in advance" that I wasn't previously aware of?

You haven't read the rest of it evidently.

The score equation created is used in the next round. Which has new data collected. Thus for that round the SE was made in advance.


So why is a statement even needed? If you are not concerned about the possibility of cheating, why would you require them to sign something saying they are not? And if you are concerned, how can you prove that doing this would actually prevent them lying?

It's simply one more measure, to address this concern even though as I explained it's not really valid.

Also, this has nothing to do with randomisation. If you take some ill people and some healthy people and assign all the healthy ones to a treatment group and all the ill ones to a control, you will get an extremely significant result saying the treatment worked, because the statistical analysis assumes that the original groups were similar.

What part of "randomized double blind" are you not getting? I'm not "assigning" people to treatment vs control on the basis of their condition.


Not true. With virtually every test of the paranormal, a negative result causes the claimants to say that the conditions were wrong or the test was unfair. In order to maintain any credibility the JREF must ensure that any test will actually detect what it is looking for.

That's fine. I assert that the test I have proposed is sufficient for me. The end. ;)


Not really. As I said, it is well known that if you have a dataset with many variables, you are almost guaranteed to find correlations between some of them. If you use these to alter subsequent tests you are doing bad science, plain and simple. They could be used as the basis for an entirely different set of tests, but this does not seem to be what you are planning.

Agreed to the first sentence only.

They can be used to inform later tests - where the same standard of "decide in advance, randomized double blind" is used. And it's the normal practice in fact, whether explicitly labeled as such or not.
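The disagreement here can be made concrete with a simulation. This Python sketch assumes 20 candidate outcome measures, none truly affected, each yielding an independent standard-normal test statistic. In round 1 the best-looking measure is picked from the same data that tests it; in round 2 that measure is fixed in advance and tested on fresh data.

```python
import random
from math import erfc, sqrt

def two_sided_p(z):
    return erfc(abs(z) / sqrt(2))

def dredge_then_confirm(k=20, trials=20000, alpha=0.05, seed=7):
    """Compare 'pick the best of k measures on the same data' with
    'pick in round 1, test once on fresh round-2 data', with no real effect."""
    rng = random.Random(seed)
    dredged_hits = confirmed_hits = 0
    for _ in range(trials):
        # Round 1: k candidate measures, each an independent N(0, 1) statistic under the null.
        z_round1 = [rng.gauss(0, 1) for _ in range(k)]
        best = max(z_round1, key=abs)
        dredged_hits += two_sided_p(best) < alpha
        # Round 2: the chosen measure is fixed in advance and re-tested on new subjects.
        # Under the null its new statistic is a fresh N(0, 1) draw, whichever measure was chosen.
        confirmed_hits += two_sided_p(rng.gauss(0, 1)) < alpha
    return dredged_hits / trials, confirmed_hits / trials

dredged, confirmed = dredge_then_confirm()
print(f"best-of-20 on the same data: {dredged:.3f}; pre-specified on fresh data: {confirmed:.3f}")
```

Under these assumptions, picking the best of 20 on the same data yields p < 0.05 about 64% of the time, while the pre-specified test on new data stays near 5%.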

Also, you said "No difference in the Score Equation, participation criteria, or significance test will be permitted between second and third rounds.". The posts to which I was replying did not say that you would only be altering the equation after the first round, so I assumed you meant you would alter it after every test. I apologise if this is not the case.

I changed my mind on the first two (SE & PC): so long as they are fixed before the start of a given round, there is no reason to make them identical between rounds. Whereas there is reason to change them, to make the test more sensitive and based on actual data gathered last time so that you're not including a bunch of extraneous variables that turn out not to be affected.


I thought you had said that it would not be available, but it appears I misread one of your posts, so I apologise. I am still concerned that you will provide it, but will not allow any discussion of whether it is acceptable to anyone else.

I'm not interested in allowing discussion except for the purposes of avoiding a false positive. IMO there is no possible way for this to happen given the constraints I already have placed. Can you think of one?



Is content posted on a public forum covered by copyright?

Absolutely yes. See above post.

And if so, is it copyright to the poster or the forum's owner?

The copyright is always to the originator of the work unless they have signed a contract that says otherwise (i.e. the registration agreement, quoted above). Per the Berne Convention (IIRC) there is no need to explicitly say this; it's automatic.


Why does it have to be a guess? If you want to measure if people get better from prayer then your equation should measure this. It may not be perfect, and I would consider consulting qualified people who run similar tests, but there should be nothing random about it at all.

Because I don't know in advance how they will get better. Maybe just HRQOL? Maybe a submeasure? Maybe their cancer staging? $ spent in treatment? There are a lot of variables, and fixing which ones you're testing for in advance without data (i.e. for round 1) is entirely arbitrary.



This is just silly. Either a test is the JREF preliminary or not. If it is not, then it cannot be counted as one retroactively; if it is, then a negative result will count as a failure. If all applicants were allowed to cherry-pick their positive results, the prize would have been won long ago.

It's not "retroactive"; again you seem to have significantly misunderstood what I wrote.

Round 1 positive, preliminary -> Round 3 final (skip round 2)
Round 1 negative, prelim -> Round 2 ?, prelim

How is this "retroactive"? (Suggestion: look up the definition before you answer.)

digithead
27th September 2006, 05:33 PM
You're also making assumptions that I will be copying your exact words, postings, etc. which is not necessary at all to describe your study...

In fact, I can sum up exactly what I'd ask my class on a test:

"Someone has proposed a study to determine if intercessory prayer creates better outcomes for sick people. They want to randomize about 50 participants into a control (no prayer) and treatment group (prayer) and then measure if a person got better or not. They argue that their simple randomization scheme will be sufficient to overcome possible confounders such as demographics, type of disease, disease severity, etc. Do you agree? Explain your position."

I know I haven't violated my membership agreement. Have I violated your copyright? Not in the least...

And anyone that wants to reproduce my test question, feel free...

So how can you prevent me from using your study as an example?

digithead
27th September 2006, 05:50 PM
It just so happens that I have taught a college course myself

I sure hope it had nothing to do with research methods or statistics, because you obviously don't understand the confounding and heterogeneity that your study fails to account for...

ChaosEngineer
27th September 2006, 07:04 PM
I think there's a fundamental problem with the nature of the prayers.

Normally, prayers for healing don't have any malicious intent. They're just: "Please let Grandma get better."

But in the case of a prayer study, there's implied malice: "Please heal the strangers in the experimental group while withholding healing from the strangers in the control group." No matter how positive the person praying tries to be, there's always going to be something in the back of their mind that's hoping for negative results in the control group. No good deity would answer such a wicked prayer.

The solution is to not use sick people as test subjects. Use something like coin flips instead.

Here's the appropriate prayer: "Dear God, I've done my budget for the month and I have exactly $X free after I pay household expenses. I'm going to flip a coin 100 times and count the number of times that heads comes up. I solemnly vow to donate that percentage of the money to charity, and to spend the remainder (if any) on liquor and pornography."

If you consistently throw heads 90-100% of the time, then that'll prove that your prayer was effective. That should qualify you for the million dollar challenge, but you might need to pledge the money to charity.
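For a sense of scale, here is a minimal Python sketch that computes exact binomial tail probabilities for 100 flips. It assumes independent flips of a fair coin (i.e. prayer has no effect on the coin); `prob_at_least` is a hypothetical helper name used only for illustration.

```python
from math import comb


def prob_at_least(k, n=100, p=0.5):
    """Exact binomial tail: P(at least k heads in n independent flips, P(heads) = p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Tail probabilities for a fair coin, i.e. if prayer has no effect on the flips
for k in (60, 70, 90):
    print(f"P(at least {k} heads in 100 flips) = {prob_at_least(k):.3g}")
```

Under these assumptions even 60 heads happens by chance only about 3% of the time, and 90 or more is on the order of 10^-17, which is why such a threshold would be hard to reach by luck.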

saizai
27th September 2006, 07:12 PM
You're also making assumptions that I will be copying your exact words, postings, etc. which is not necessary at all to describe your study...

In fact, I can sum up exactly what I'd ask my class on a test:

"Someone has proposed a study to determine if intercessory prayer creates better outcomes for sick people. They want to randomize about 50 participants into a control (no prayer) and treatment group (prayer) and then measure if a person got better or not. They argue that their simple randomization scheme will be sufficient to overcome possible confounders such as demographics, type of disease, disease severity, etc. Do you agree? Explain your position."

I know I haven't violated my membership agreement. Have I violated your copyright? Not in the least...

And anyone that wants to reproduce my test question, feel free...

So how can you prevent me from using your study as an example?

Doing that is not a violation of copyright. :)

However, don't forget to include the fact that it is a double blinded randomized control-group trial, and that there is a requirement for a certain amount of statistical significance.

And I'd like to see what responses you get.

saizai
27th September 2006, 07:15 PM
I think there's a fundamental problem with the nature of the prayers.

Normally, prayers for healing don't have any malicious intent. They're just: "Please let Grandma get better."

But in the case of a prayer study, there's implied malice: "Please heal the strangers in the experimental group while withholding healing from the strangers in the control group."

You misunderstand the setup. They are praying for one particular individual. They don't know anything at all about the rest.

Your argument is only valid if you're willing to also argue that every prayer for one person is a prayer against everyone else.


The solution is to not use sick people as test subjects. Use something like coin flips instead.

No thanks; I am interested in whether intercessory prayer works for real-world medical usage in the way I've set up, not in the way you propose. Of course it's a perfectly valid test; it's just not the one I'm doing.

Cuddles
28th September 2006, 03:31 PM
What does asking them questions have to do with the blinding? You seem to be confused as to what 'blinding' means...

Blinding means that anyone involved in the trial cannot know which group any participant is assigned to. If you determine that the groups are equivalent by asking participants questions, you must know which groups they are in, and therefore you are not blinded, by definition.

Wrong, sorry. They are statistically identical by definition. Perhaps you haven't read as many actual research studies as I (the ones I've seen are mainly in the areas of cognitive science and neurology, fwiw) but there are studies conducted and published routinely with n<50 which nevertheless produce perfectly sound results.

Strange, the words are all English, but somehow it makes no sense. Are you seriously claiming that just because you have a random selection, any groups you choose will turn out to be identical?

I disagree. Please provide a monte carlo sim that does take into account your supposed other factors and explain how they would affect the result even after randomized double-blinding. (Note the and; I want actual numbers rather than just your handwaving about "this will affect it" without anything to back that statement up.)

No, it's your test, you do it. You asked for us to point out any problems, and we have. It's your responsibility to either show that they are not actually problems, or fix them, not ours.

Also, when you say "supposed other factors", are you seriously claiming that you believe there are no other factors involved other than prayer? We have already mentioned numerous confounding factors such as type and severity of disease, wealth, social status, age, sex, race, etc. So far you haven't even tried to address these, and seem to be pretending that they don't exist, or that they are magicked away by randomising an inadequately sized sample.

You haven't read the rest of it evidently.

The score equation created is used in the next round. Which has new data collected. Thus for that round the SE was made in advance.

In the post I quoted you claimed that you were creating the equation before collecting the data. Clearly this is not true for all your tests, even if it is for some. Therefore your equation is extremely likely to be picked specifically to give a positive result. This is not good.

What part of "randomized double blind" are you not getting? I'm not "assigning" people to treatment vs control on the basis of their condition.

As I said, the statistical analysis has nothing to do with the randomisation. If you cannot prove that your two groups are essentially similar then your analysis will be meaningless no matter how small your p value.

That's fine. I assert that the test I have proposed is sufficient for me. The end. ;)

I agree, the end. If you refuse to make the test sufficient for anyone apart from yourself you have no chance of ever being tested.

Agreed to the first sentence only.

They can be used to inform later tests - where the same standard of "decide in advance, randomized double blind" is used. And it's the normal practice in fact, whether explicitly labeled as such or not.

Yes, they can be used to inform tests for the new correlations you have found. That is exactly what I said. They can't be used to conduct further tests for the same thing, since if you are looking for something else it is no longer the same test.

I changed my mind on the first two (SE & PC): so long as they are fixed before the start of a given round, there is no reason to make them identical between rounds. Whereas there is reason to change them, to make the test more sensitive and based on actual data gathered last time so that you're not including a bunch of extraneous variables that turn out not to be affected.

The JREF specifically says that the preliminary and final tests have the same protocol. If you change your measure this would not be the case.

I'm not interested in allowing discussion except for the purposes of avoiding a false positive. IMO there is no possible way for this to happen given the constraints I already have placed. Can you think of one?

You may not be interested, but it is likely that the testers will be. They may accept the equation as given, but if you refuse to allow even the possibility of changing it I really doubt you will be tested. Consider that you could come up with an equation that guaranteed a positive result. If they can't alter this then you would win whatever the result. I am not suggesting that you will actually do this, but with a million dollars at stake you can bet that the JREF will not allow this option to be available.

Absolutely yes. See above post.



The copyright is always to the originator of the work unless they have signed a contract that says otherwise (i.e. the registration agreement, quoted above). Per the Berne Convention (IIRC) there is no need to explicitly say this; it's automatic.

Fair enough. I know very little about this sort of thing.

Because I don't know in advance how they will get better. Maybe just HRQOL? Maybe a submeasure? Maybe their cancer staging? $ spent in treatment? There are a lot of variables, and fixing which ones you're testing for in advance without data (i.e. for round 1) is entirely arbitrary.

Do you really not see the problem with this? The whole point of a trial is that you test for one thing and control for all the other things you can think of. As I have said before, and you agreed with, if you look for a change somewhere you will find one. This is the entire reason the measure is always specified in advance.

It's not "retroactive"; again you seem to have significantly misunderstood what I wrote.

Round 1 positive, preliminary -> Round 3 final (skip round 2)
Round 1 negative, prelim -> Round 2 ?, prelim

How is this "retroactive"? (Suggestion: look up the definition before you answer.)

ret·ro·ac·tive [rèttrō áktiv], adjective

Definition: applying to past: relating or applying to things that have happened in the past as well as the present

You are attempting to apply the status of "JREF preliminary test" to the test which occurred in the past. QED. As I said, if everyone was allowed to choose which test was the official preliminary after they knew the result, everyone who took the test would pass.
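The effect of a second chance at the preliminary can be put in numbers. This is a minimal sketch, assuming prayer has no effect (the null hypothesis), that every round is an independent test at significance level `alpha`, and that a claimant may run up to `max_prelims` preliminaries; the function names and the 0.05 levels are hypothetical illustration values, not JREF figures.

```python
def prob_reach_final(alpha, max_prelims):
    """Null-hypothesis chance of passing at least one of `max_prelims` independent preliminaries."""
    return 1 - (1 - alpha) ** max_prelims


def prob_false_win(alpha_prelim, alpha_final, max_prelims):
    """Null-hypothesis chance of passing a preliminary and then an independent final."""
    return prob_reach_final(alpha_prelim, max_prelims) * alpha_final


# One preliminary, the "Round 1 negative -> Round 2" schedule (two chances), and many retries
for attempts in (1, 2, 10):
    print(attempts, prob_reach_final(0.05, attempts), prob_false_win(0.05, 0.05, attempts))
```

Under these assumptions a second preliminary nearly doubles the chance of reaching the final by luck alone (0.0975 instead of 0.05), while the final still has to clear its own threshold; with unlimited retries the chance of eventually passing a preliminary approaches 1.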

In addition this would cause severe problems with the rest of your protocol, since you seem so keen on altering your measure of improvement. Since, as I said above, the JREF says that the preliminary and final tests must use the same protocol you would not be allowed to change anything (except the sample of course), and would therefore be left with your "random guess" at what your measure should be.

saizai
28th September 2006, 05:04 PM
Blinding means that anyone involved in the trial cannot know which group any participant is assigned to. If you determine that the groups are equivalent by asking participants questions, you must know which groups they are in, and therefore you are not blinded, by definition.

Not so. You only gather the data. You don't compare the groups (and unblind them) until all data is collected.


Strange, the words are all English, but somehow it makes no sense. Are you seriously claiming that just because you have a random selection, any groups you choose will turn out to be identical?

"Choose"?

Yes, I am saying that two groups randomly selected from the same pool will be statistically identical. Prove me wrong. With math and monte carlo sims.


In the post I quoted you claimed that you were creating the equation before collecting the data. Clearly this is not true for all your tests, even if it is for some. Therefore your equation is extremely likely to be picked specifically to give a positive result. This is not good.

Not so. For each body of data analyzed, the equation is set before that data is gathered. The existence of other data is irrelevant because it's not part of the new set.
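Both sides of this exchange can be checked with a small simulation. This is a minimal Python sketch, assuming prayer has no effect, that the score equation can be built from any of `k` independent outcome measures, and that a valid test on each measure gives a p-value uniformly distributed on [0, 1] under the null (a simplification: real HRQOL submeasures are correlated). The function name and parameter values are hypothetical.

```python
import random


def two_stage_rates(k=10, alpha=0.05, trials=100_000, seed=2):
    """Null-hypothesis positive rates for (a) picking the best of k measures and testing it
    on the same data, and (b) testing that pick on a fresh round of data."""
    rng = random.Random(seed)
    same_data = fresh_data = 0
    for _ in range(trials):
        # Under the null, each measure's p-value is Uniform(0, 1)
        round1 = [rng.random() for _ in range(k)]
        best = min(range(k), key=lambda j: round1[j])  # the measure that "looked best"
        if round1[best] < alpha:
            same_data += 1
        # The next round gathers new data, so the pre-chosen measure gets a new p-value
        if rng.random() < alpha:
            fresh_data += 1
    return same_data / trials, fresh_data / trials


same, fresh = two_stage_rates()
print(f"best measure, same data:  {same:.3f}")
print(f"best measure, fresh data: {fresh:.3f}")
```

Under these assumptions, choosing the best-looking measure and testing it on the data that suggested it is positive about 1 - (1 - alpha)^k of the time (about 0.40 for k = 10), whereas testing that same choice on newly collected data is positive about alpha of the time.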


I agree, the end. If you refuse to make the test sufficient for anyone apart from yourself you have no chance of ever being tested.

"For anyone apart from yourself"? WTF?

My understanding is that the JREF tests MY claim, not that of every theist worldwide. They can make their own applications if they so desire. So can you.


The JREF specifically says that the preliminary and final tests have the same protocol. If you change your measure this would not be the case.

Changing the measure is part of the protocol. JREF's only relevant concern is to ensure the protocol is methodologically proof against a false positive.


You may not be interested, but it is likely that the testers will be. They may accept the equation as given, but if you refuse to allow even the possibility of changing it I really doubt you will be tested. Consider that you could come up with an equation that guaranteed a positive result. If they can't alter this then you would win whatever the result. I am not suggesting that you will actually do this, but with a million dollars at stake you can bet that the JREF will not allow this option to be available.

Tell me how I could come up with such an equation, given that by definition it has no way of knowing which group any person is assigned to?


Do you really not see the problem with this? The whole point of a trial is that you test for one thing and control for all the other things you can think of. As I have said before, and you agreed with, if you look for a change somewhere you will find one. This is the entire reason the measure is always specified in advance.

See above. That is why the Round 1 measure is determined in advance - but lacking a reason to determine it in a particular way, it is necessarily arbitrary.


You are attempting to apply the status of "JREF preliminary test" to the test which occurred in the past. QED. As I said, if everyone was allowed to choose which test was the official preliminary after they knew the result, everyone who took the test would pass.

Um, no. What test "occurred in the past"? If the first preliminary fails, then we try it again (with a better-informed SE). Quite simple.

Cuddles
29th September 2006, 07:32 AM
Let me put it very simply. You asked us to point out any flaws with your test. We have done so. You have ignored pretty much everything we have said. The JREF is very likely to ask the same questions. If you don't address them, you will not be tested.

digithead
29th September 2006, 11:59 AM
Not so. You only gather the data. You don't compare the groups (and unblind them) until all data is collected.

Your concept of blinding is flawed. Someone, usually the statistician, assigns a subject to either the control or treatment group, so this person is not "blind" to which group a subject is assigned. The blinding occurs when neither the treatment provider nor the subject knows which group the subject has been assigned to...

And yes, you do compare groups ahead of time to verify your randomization procedure didn't result in introducing additional bias...
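A minimal sketch of the allocation step described above, assuming a hypothetical pool of 50 participant IDs and a single yes/no baseline trait; `make_allocation` and `baseline_gap` are illustrative names, not part of the prayermatch.org backend. It only shows who holds what: the statistician keeps the key and can check baseline balance before the trial, while participants and outcome assessors only ever see IDs.

```python
import random


def make_allocation(participant_ids, seed):
    """Statistician-only step: shuffle IDs and split them 1:1 into active/control.

    The seed must stay secret as well, or anyone could recreate the key."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {pid: ("active" if i < half else "control") for i, pid in enumerate(ids)}


def baseline_gap(allocation, has_trait):
    """Pre-trial balance check: share of a baseline trait in active minus share in control."""
    def share(arm):
        members = [pid for pid, group in allocation.items() if group == arm]
        return sum(has_trait[pid] for pid in members) / len(members)
    return share("active") - share("control")


# Hypothetical pool in which IDs 0-24 have a severe diagnosis
key = make_allocation(range(50), seed=7)
severe = {pid: pid < 25 for pid in range(50)}
print(baseline_gap(key, severe))
```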


Yes, I am saying that two groups randomly selected from the same pool will be statistically identical. Prove me wrong. With math and monte carlo sims.

And here we have the crux of your misunderstanding. You're confusing pools with populations. Subjects can be from the same pool but not from the same population unless they have the same mean and variance. Females typically are not from the same statistical population as males. Skin cancers are definitely not the same statistical population as brain cancers...

Mathematically, two populations have either different means, different variances, or both. When you mix populations into a pool and randomly select them, unless you're very lucky, you have to adjust for the things that might confound your outcome...

In addition, simple comparison tests like the t-test assume homogeneity of the variances within groups. You cannot make that assumption with a heterogeneous group sample (e.g., different types of cancers, demographics, etc.)...
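How large chance imbalances get with about 50 participants can be estimated directly. This is a minimal Python sketch, assuming a hypothetical pool of 25 "severe" and 25 "mild" cases split 1:1 at random; `imbalance_rate` and its parameters are illustrative. A gap of 5 severe cases between arms of 25 is a 20-percentage-point difference.

```python
import random


def imbalance_rate(n_severe=25, n_mild=25, min_gap=5, trials=20_000, seed=3):
    """Share of random 1:1 splits in which one arm has at least `min_gap`
    more severe cases than the other."""
    rng = random.Random(seed)
    pool = [True] * n_severe + [False] * n_mild  # True marks a severe case
    half = len(pool) // 2
    big_gaps = 0
    for _ in range(trials):
        rng.shuffle(pool)
        if abs(sum(pool[:half]) - sum(pool[half:])) >= min_gap:
            big_gaps += 1
    return big_gaps / trials


print(imbalance_rate())
```

Under these assumptions roughly a quarter of random splits differ by 20 points or more in the share of severe cases. That is chance imbalance rather than systematic bias (averaged over all possible splits the gap is zero), but any single trial is only one split.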

Do you understand the criticisms now or are you still going to defend your position?

ChaosEngineer
29th September 2006, 03:26 PM
You misunderstand the setup. They are praying for one particular individual. They don't know anything at all about the rest.
I disagree. These are people who have volunteered to take part in a prayer study. It's reasonable to assume that they want the test to produce positive results. So no matter how hard they try to focus on healing the experimental group, there's always going to be an implied prayer that people in the control group stay sick.

Your argument is only valid if you're willing to also argue that every prayer for one person is a prayer against everyone else.
If I'm not taking part in a prayer study, then I don't have any reason to want strangers to stay sick. If the person I'm praying for gets better, then I'll be happy, but I certainly won't be unhappy if random strangers also recover.


(On the coin-flip experiment:)

No thanks; I am interested in whether intercessory prayer works for real-world medical usage in the way I've set up, not in the way you propose. Of course it's a perfectly valid test; it's just not the one I'm doing.
Well, OK. It's your time and money.

But I'm puzzled about why you want to test your hypothesis using a complicated experiment with hard-to-interpret results. You could test the same hypothesis using a simple experiment with easy-to-interpret results.

Really, if intercessory prayer has a measurable benefit, then you should see that benefit in the simplest possible experiment. The more complexity you add, the more likely it is that the benefit (or lack thereof) will get lost in the statistical noise.

And now that I think about it, the simple experiments have already been done and they've never shown any benefit. Do you have a theory as to why your experiment might work when so many other people have failed? What factor is genuinely new here?

saizai
29th September 2006, 04:31 PM
And yes, you do compare groups ahead of time to verify your randomization procedure didn't result in introducing additional bias...

Would you mind showing the math for how the randomization procedure could "introduce" additional bias? Specifically, one that results in a false positive more than (p*100%) of the time? Include a monte carlo sim demonstrating the math with real numbers please.

'cause, y'know, that's what the p is for...
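One way to run the requested Monte Carlo check, as a minimal sketch under stated assumptions: a hypothetical pool of 50 Recipients, half "severe" with a 20% chance of recovering and half "mild" with 80%, prayer having no effect, a random 1:1 split, and a recovered/not-recovered outcome compared with a two-sided Fisher exact test (a stand-in for whatever score equation is actually used). The function names and numbers are illustrative only.

```python
import math
import random


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = math.comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that row 1 contains x of the col1 "successes"
        return math.comb(row1, x) * math.comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table with the same margins that is no more likely than the observed one
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


def simulate_false_positive_rate(n_trials=2000, n=50, alpha=0.05, seed=1):
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # Heterogeneous pool: first half severe (20% recover), second half mild (80%)
        pool = [rng.random() < (0.2 if i < n // 2 else 0.8) for i in range(n)]
        rng.shuffle(pool)  # random assignment; under the null, outcomes ignore the arm
        active, control = pool[: n // 2], pool[n // 2:]
        a, c = sum(active), sum(control)
        if fisher_exact_two_sided(a, len(active) - a, c, len(control) - c) < alpha:
            hits += 1
    return hits / n_trials


print(simulate_false_positive_rate())
```

Under these assumptions the false-positive rate stays at or below alpha even though the pool is heterogeneous, because the split is random and the outcome does not depend on it. It says nothing about power or about how large the chance imbalance is in any single split.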

saizai
29th September 2006, 04:35 PM
I disagree. These are people who have volunteered to take part in a prayer study. It's reasonable to assume that they want the test to produce positive results. So no matter how hard they try to focus on healing the experimental group, there's always going to be an implied prayer that people in the control group stay sick.


If I'm not taking part in a prayer study, then I don't have any reason to want strangers to stay sick. If the person I'm praying for gets better, then I'll be happy, but I certainly won't be unhappy if random strangers also recover.

That's fine. If that is how prayer works, then evidently it works equally for everyone related to some particular group and not just the person you are in fact praying for.

Since that's not something that can be controlled for, it's an acceptable limit of the study.


But I'm puzzled about why you want to test your hypothesis using a complicated experiment with hard-to-interpret results. You could test the same hypothesis using a simple experiment with easy-to-interpret results.

I don't consider it particularly hard to interpret.

In any case, my question is not whether prayer works in some general sense, but whether it works for helping people who are sick. Thus your alternate experiment is irrelevant to answering my question.

digithead
29th September 2006, 04:55 PM
Would you mind showing the math for how the randomization procedure could "introduce" additional bias? Specifically, one that results in a false positive more than (p*100%) of the time? Include a monte carlo sim demonstrating the math with real numbers please.

'cause, y'know, that's what the p is for...

No, until you understand and demonstrate to us that you know the difference between pools and populations anything else is a waste of time...

Thinktoomuch
6th October 2006, 01:34 AM
Saizai, to an outsider like me, this thread looks exactly like any other challenge thread. You are an academic; why did you stoop to arguing, like any other crackpot, about what the JREF wants? If you back away now, you lose face with the forum readers (with me, anyway; I can only speak for myself), which may not be a big deal, but why bring this on yourself?

Now that you have got yourself into this predicament, IMO your best bet is to set aside your preparatory work and officially ask the JREF to provide you with a test protocol that meets their criteria. Possible outcomes:

- JREF declines on the basis that the claim is too difficult to test properly or asks you to pay a lot for the work they need to do, in which case you can withdraw without losing face. You do not get the million but if your work is sound as you believe, it will be vindicated by the peer reviews;

- JREF provides you with a reasonable protocol, therefore, as a genuine scientist, you should be happy to apply it;

- JREF provides you with an extremely stringent protocol that would cost a fortune to apply, and it is either:
    - unreasonably demanding, and can be shown to be so to the scientific community, or
    - a good justification for asking your sponsors to cough up the money if they are honestly looking for results that will stand up to scrutiny. If they don't, you might canvass widely for new sponsors; if this does not work either, you are clear.

Bear with me, all my life I have upset people offering advice without being asked, old habits are difficult to change... as seems to be true for many other opinionated people on this forum...

69dodge
21st April 2007, 11:11 PM
Here's something to think about, saizai.

Suppose you pick a scoring method that assigns to each receiver a random number between 0 and 100, entirely ignoring whether they got better or worse.

1) If prayer doesn't work, will the study be too likely to produce a positive result? (Answer: no.)

2) If the result of the study turns out to be positive---it probably won't, of course, but if it does---would this positive result constitute even the slightest evidence for the efficacy of prayer? (Answer: also no.)

So, something appears to be wrong with your repeatedly and vigorously stated position that all that's needed is to ensure a small probability of false positives.

What makes a particular result deserving of the title "positive" in the first place? It needs to be improbable assuming prayer doesn't work, but it also needs to be probable assuming prayer does work. Otherwise, the study isn't studying prayer at all.

Suppose I buy a lottery ticket, and decide that if I win, then prayer cures cancer. Clearly, this makes no sense at all. I probably won't win the lottery, but even if I should happen to win it, my win would obviously give me absolutely no reason to believe that prayer cures cancer. What has the lottery to do with cancer? But isn't it entirely true that if prayer doesn't cure cancer, I am unlikely to win the lottery? So, the probability of a false positive is extremely low? Yes. But what's also true is that if prayer does cure cancer, I'm still unlikely to win the lottery. And both of those probabilities matter.
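The two probabilities in this argument can be estimated side by side. This is a minimal Python sketch, assuming a hypothetical health outcome that is normally distributed with SD 1, a true effect of 0.8 SD when prayer "works", 25 Recipients per arm, and an approximate two-sided Welch z-test (a normal approximation, so the rates are rough). `health_score` and `random_score` are illustrative names; the second one is the outcome-ignoring score from the example above.

```python
import random
from statistics import NormalDist, fmean


def welch_z_p(x, y):
    """Approximate two-sided p-value for a difference in means (Welch z, normal approx)."""
    mx, my = fmean(x), fmean(y)
    vx = sum((v - mx) ** 2 for v in x) / (len(x) - 1)
    vy = sum((v - my) ** 2 for v in y) / (len(y) - 1)
    z = (mx - my) / (vx / len(x) + vy / len(y)) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def health_score(h, rng):
    return h  # scores the health outcome itself


def random_score(h, rng):
    return rng.uniform(0, 100)  # ignores health entirely


def positive_rate(effect, score, n=25, trials=2000, alpha=0.05, seed=0):
    """Share of simulated studies whose score differs between arms at level alpha."""
    rng = random.Random(seed)
    positives = 0
    for _ in range(trials):
        # Hypothetical outcomes: the active arm is shifted by `effect` standard deviations
        active = [score(rng.gauss(effect, 1), rng) for _ in range(n)]
        control = [score(rng.gauss(0, 1), rng) for _ in range(n)]
        if welch_z_p(active, control) < alpha:
            positives += 1
    return positives / trials


for name, score in (("health score", health_score), ("random score", random_score)):
    print(name, "| no effect:", positive_rate(0.0, score), "| effect:", positive_rate(0.8, score))
```

Under these assumptions the random score comes out positive about 5% of the time whether or not there is an effect, so a positive result from it carries no evidence either way; the health-based score is rarely positive without an effect and usually positive with one.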

saizai
22nd April 2007, 12:42 AM
69:

1. It will be likely exactly as much as the statistical uncertainty is. That in turn is dependent on the bell curve involved and the sample size.

2. A positive result would constitute evidence for the efficacy of prayer... at influencing the score equation decided upon. (E.g. a theist might argue that God wanted to show herself by demonstrating a positive result, and therefore the prayers went toward influencing the [pseudo-]random number generators used.)

Remember, I am claiming nothing whatsoever about mechanism. In fact, I have no theories about what mechanisms may or may not exist, nor any a priori basis to have one. So I have no reason to say that it will affect that and not something else. My choice of the HRQOL as the initial measure is largely arbitrary; a matter of personal taste, if you will. A better mechanism is to base the score equation on the results of the previous round (i.e. on what items tested appeared to show the highest effects), which is what I would do for round #2.


You cannot objectively say "well, we will throw out this result because we think it's silly". You set the measurement, you gather the results, and whatever the statistical uncertainty of the result, that's what it is. If you get a positive result and don't believe it for some reason, you run the test again, with a larger sample size. But you don't get to toss the result; that is not proper protocol, and opens the door to all sorts of fallacy.


You must realize, this is true of ALL research of this paradigm (i.e. double-blind randomized controlled trials), of which there is a LOT. Probability dictates that p of it will in fact be false positive; p is usually <.05 for normal clinical trials, and further reduced through repetition, etc. But it still exists. It could still be wrong, and seemed to be true merely by an accident of chance.

The same is true of mine. If I get a positive result, there is a certain chance - p - that the result is false. But you cannot claim that it is other than p; e.g. that because the result is positive, the result is false, which is what you are arguing. If you doubt it, you simply run a trial again, or with greater numbers, until p is sufficiently low for your taste. The standard in academic research is p<.05 (5%); very high certainty is considered to be p<0.001 (.1%). Of course, important experiments get duplicated, and this further reduces the p overall if the experiments are similar enough to enable meta-analysis.

Elaedith
22nd April 2007, 03:43 AM
69:

1.

The same is true of mine. If I get a positive result, there is a certain chance - p - that the result is false. But you cannot claim that it is other than p; e.g. that because the result is positive, the result is false, which is what you are arguing. If you doubt it, you simply run a trial again, or with greater numbers, until p is sufficiently low for your taste. The standard in academic research is p<.05 (5%); very high certainty is considered to be p<0.001 (.1%). Of course, important experiments get duplicated, and this further reduces the p overall if the experiments are similar enough to enable meta-analysis.

p is the probability of getting an outcome when the null hypothesis is true, not the probability THAT the null hypothesis is true.

69dodge
22nd April 2007, 04:39 AM
2. A positive result would constitute evidence for the efficacy of prayer... at influencing the score equation decided upon. (E.g. a theist might argue that God wanted to show herself by demonstrating a positive result, and therefore the prayers went toward influencing the [pseudo-]random number generators used.)


I thought the study was supposed to test whether prayer helps improve the health of cancer patients, not whether prayer affects random number generators.

Remember, I am claiming nothing whatsoever about mechanism. In fact, I have no theories about what mechanisms may or may not exist, nor any a priori basis to have one. So I have no reason to say that it will affect that and not something else.


You can't design a study to test for something, if you haven't decided what you want to test for.

You don't have to claim that you personally and wholeheartedly believe prayer works, and that you believe it works in a particular way. But you have to decide what sort of prayer you want to test for---how it would work if it did work---because otherwise you won't be able to decide how to test for it.

Probability dictates that p of it will in fact be false positive;


What is the "it", p of which will be false positives? (Rhetorical question. I answer it below.)

The same is true of mine. If I get a positive result, there is a certain chance - p - that the result is false.


That's not what p is. This is what p is:

Suppose prayer doesn't work. And suppose you haven't done the study yet. What is the probability that, when you do the study, you will get a positive result? (Such a positive result would necessarily be a false positive, because we're supposing that prayer doesn't work.)

The probability p is a probability that is based on the assumption that prayer definitely doesn't work.

Of course, we don't actually know whether prayer works. That's why we're doing a study. And when the study is done, we still won't know for sure whether prayer works, although we will have gotten some new information that will affect how likely we think it is to work. So, we're never in a situation where the assumption underlying p is known to be true. So, p is not as directly useful as one might imagine. It is relevant, to be sure, but not directly so.

But you cannot claim that it is other than p;


It is almost certainly other than p. By "it", I mean this probability:

Suppose, realistically and unlike before, that we don't know whether prayer works. Suppose the study has been done, and the result was positive. Now, what is the probability that prayer doesn't work? (If prayer doesn't work, the positive result was a false positive.)

This is a different question from the previous one, and it has, in general, a different answer, even though both questions could be phrased, imprecisely, as, "What is the probability of a false positive?". In both, we're interested in the probability of the combination: prayer doesn't work and the study result is positive. However, in one, we already know that prayer doesn't work but we don't know whether the study result will be positive, while in the other, we already know that the study result was positive but we don't know whether prayer works.

What's the probability it's raining, if it's cloudy? What's the probability it's cloudy, if it's raining? Not the same thing.

e.g. that because the result is positive, the result is false, which is what you are arguing.


I'm arguing that, supposing the study's result turns out to be positive, knowing p isn't enough to decide whether that positive result is probably a true positive or probably a false positive. Not only must you know what the probability of a positive result was on the assumption that prayer doesn't work (i.e., p), you also must know what the probability of a positive result was on the assumption that prayer does work. And, therefore, you have to be specific enough about the sort of prayer you want to study, to be able to determine the latter probability.

(The prior probability that prayer works matters too, but that's not what I was talking about here.)
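Bayes' theorem makes this distinction concrete. Below is a minimal sketch. The numbers are illustrative assumptions that do not come from the thread: a significance level alpha of 0.05, a prior probability of 0.5 that prayer works, and two hypothetical powers (0.8 and 0.1). The point is that P(prayer doesn't work | positive result) depends on the power, P(positive | prayer works), and not only on alpha:

```python
def prob_null_given_positive(alpha, power, prior_works):
    """P(prayer doesn't work | positive result), via Bayes' theorem.

    alpha       -- P(positive | prayer doesn't work), i.e. the significance level
    power       -- P(positive | prayer works); depends on *how* it would work
    prior_works -- prior probability that prayer works (an assumption)
    """
    false_pos = alpha * (1 - prior_works)  # P(doesn't work AND positive)
    true_pos = power * prior_works         # P(works AND positive)
    return false_pos / (false_pos + true_pos)


# Same alpha and prior, two hypothetical powers.
for power in (0.8, 0.1):
    print(power, round(prob_null_given_positive(0.05, power, 0.5), 3))
# prints:
# 0.8 0.059
# 0.1 0.333
```

With the lower power, one positive result in three would be a false positive, even though alpha is still 0.05.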

Porterboy
22nd April 2007, 04:58 AM
Sorry if someone has already mentioned this, but have any of you studied the work of Peter Fenwick? In addition to his NDE research, he carried out an experiment on prayer. Here's an article about him I found: http://www.thepsychictimes.com/articles/fenwick.htm

I live near this guy and went to a lecture of his once. It was most interesting.

fls
22nd April 2007, 07:24 AM
See http://www.prayermatch.org/ . It should be a complete description - methodology, goals, my intent / opinion, etc.

The backend programming isn't ready yet but the basics (i.e. user accounts and the public pages) are there. I intend to begin once a sufficient number of participants are signed up; the backend will be ready by then.

If you have a critique, please make sure that:
* you've read all the pages linked from the main page
* you can explain why your perceived flaw in my design would cause a false positive result, i.e. a statistically significant difference between the active and control groups of Recipients in the second and/or third round

I am aware that I have put limits on it that may cause false negatives, and am quite okay with that; my problem not yours. ;)

If I have left out anything it is probably by mistake (I only just finished writing the content); point it out and I'll correct it.

BTW, I have previously suggested this as a bona fide MDC, but the understanding I reached with Randi's representative was that they are only interested in things that can be proven in a small-scale, one-person fashion. I do not claim any such power or effect; if there is an effect, I only expect a small but statistically significant difference between the active and control groups.

Thanks!

P.S. Yes, I've read the rules and FAQ.

I have read through the pages at the site you have linked and understand what you are proposing. I have not read through all of the posts on this thread (although I will attempt to do so later), so I apologize if any or all of this has already been addressed.

My biggest concern is with respect to the ethics of performing this study. I realize that you only consider it your problem if the results are negative, but it is not that simple.

You have not explained what you hope to accomplish with this study and how it will be achieved. The general indication is that you wish to provide results that will push the research forward among serious researchers. That is, a positive result would be considered valid and worthy of further consideration by those who are currently unconvinced. There has already been a lot of research on this topic. What you need to explain is exactly what the methodologic concerns were from previous studies and how your study will overcome these concerns. And how your results can be received in a way that they can be taken seriously. Normally, studies are published in peer-reviewed journals in order to make sure that there has been at least a minimum degree of oversight to assess validity. If you can not accomplish that, it seems likely that the very people you wish to reach will not accept the results as valid. You do not mention any connection to an academic institution or ethical approval from an independent review board. Normally, both of those are required for consideration for publication.

Your study design sets you up for failure (I'll elaborate in a bit). I understand that you consider that only your problem; however, it should be considered unethical to waste the time of people who are already going through a very difficult period in their lives, and to add to their distress by providing false hope. By false hope, I don't mean the presupposition that the prayer itself will offer benefit, but the presupposition that participation in this study can advance this area of research - i.e. that participation in this study can be meaningful. How can it be meaningful if the chances of it finding a real effect are minuscule and if "positive" findings will probably be ignored by serious researchers?

Let me explain why your study sets you up for failure (I'm not assuming you don't know this, just wanting to make it explicit). Let's start by assuming that there is actually a real effect that you could find. What is the chance that you will actually find that real effect vs. the chance that you will find spurious effects? You are collecting a lot of data and it is vaguely defined. Once you sit down to analyze it, you will probably be able to find dozens of ways in which to compare the two groups (the "data mining" you refer to). Since you have set your p to <0.05, by chance you will find several outcome variables that are different between the two groups. We have also assumed that there is a real effect on some outcome(s). And one would assume that that outcome will also show a significant difference, except that your study is so underpowered that it is probably far more likely to miss the effect than it is to capture the effect. The variability on your variables is very high and previous studies have failed to demonstrate an effect, suggesting that the effect is (at the most) small. Taking that into consideration, I'd be surprised if your study has a power greater than 0.10 to detect a difference. What that means is that any differences you do detect are far more likely to be false-positives than they are to be true-positives. When you go on to repeat the study, focusing only on those variables, the results will almost certainly be negative because you failed to select the relevant outcomes (assuming that there are any to select). These ideas are discussed in greater detail in this paper (http://medicine.plosjournals.org/perlserv/?request=get-document&doi=10.1371/journal.pmed.0020124).
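The point about "several outcome variables" follows from simple arithmetic. Below is a minimal sketch. It assumes the k comparisons are independent; outcomes measured on the same patients are usually correlated, so treat the result as a rough illustration rather than an exact rate. With k null comparisons, each tested at alpha = 0.05, the chance that at least one comes out "significant" is 1 - (1 - alpha)^k. That is already about 0.40 for k = 10.

```python
def prob_any_false_positive(k, alpha=0.05):
    """Chance that at least one of k independent comparisons has p < alpha
    when there is no real effect on any of them."""
    return 1 - (1 - alpha) ** k


def expected_false_positives(k, alpha=0.05):
    """Expected number of 'significant' comparisons when the null is true."""
    return k * alpha


for k in (1, 10, 30):
    print(k, round(prob_any_false_positive(k), 3), round(expected_false_positives(k), 2))
```

The usual remedies are to fix the comparisons in advance or to correct for their number, for example with a Bonferroni threshold of alpha / k.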

Positive results will likely be subject to extra scrutiny for validity (without additional support from other research). Outcome measures whose validity has already been established (e.g. a visual analog scale for pain) should be used. Otherwise you don't know whether the answers to your questions reliably or validly measure anything of interest. The low number of participants makes unequal sorting of confounders likely, and if you don't know what you are measuring, it will be easy to miss this.

ETA: I accidentally edited this out.

It would be preferable to have a neutral third party analyze the results. The outcome measures should be coded (converted into a form suitable for analysis) blind.
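Below is a sketch of one way to code outcomes blind. It makes some assumptions: the records are hypothetical dicts with a "group" field ("active"/"control") plus outcome fields, and the function name is made up for illustration. The real group names are replaced with random letters. Someone other than the analyst keeps the key until the analysis is fixed. This illustrates the general idea and is not part of the prayermatch.org design.

```python
import random


def blind_group_labels(records, rng=None):
    """Return (coded_records, key).

    coded_records -- copies of the records with the real group name replaced
                     by a randomly assigned letter ("A", "B", ...)
    key           -- maps each letter back to the real group name; hold it
                     separately until the analysis plan is locked
    """
    if rng is None:
        rng = random.SystemRandom()
    groups = sorted({r["group"] for r in records})
    letters = [chr(ord("A") + i) for i in range(len(groups))]
    rng.shuffle(letters)  # random, so the analyst can't infer which arm is which
    mapping = dict(zip(groups, letters))
    coded = [{**r, "group": mapping[r["group"]]} for r in records]
    key = {letter: group for group, letter in mapping.items()}
    return coded, key


# Hypothetical example records.
records = [
    {"id": 1, "group": "active", "score": 61},
    {"id": 2, "group": "control", "score": 58},
    {"id": 3, "group": "active", "score": 55},
]
coded, key = blind_group_labels(records)
```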

Other minor quibbles that don't affect the study...

You refer to "data mining" as selection bias. Selection bias refers more to how you select the population from which your samples will be drawn (in this case, people aware of your site who have cancer, have an interest in prayer and healing, and make the effort to participate) which leads to issues of generalizability and confounding. Although, to be fair, I find that often selection bias as a term gets used as a catch-all for different kinds of bias - sample biases in particular - so its use may not really be constrained to a particular type of bias.

It is not "impossible to prove a negative". It is no more or less possible to prove a negative than it is a positive. However, I see this bit of "wisdom" repeated frequently, including on this forum, so that's probably a whole separate discussion.

Linda

fls
22nd April 2007, 12:59 PM
Okay, I've had a chance to read through the thread. I see some of the issues I mentioned have already been raised.

Somehow I got the impression from my quick reading that you had already decided upon an N of 25, which is what my comment about being underpowered was based upon. I see that that number is still undetermined.

I mentioned confounders and this has also been discussed. With so many variables (likely mostly unmeasured) that can affect outcome, the concern is that unequal sorting of the confounders/independent variables could lead to a false positive, and that there is a chance that this unequal sorting could happen in the same direction for each study. Depending upon the strength of the association between confounder and outcome, the chance of this happening may be greater than the one in a thousand standard from JREF (i.e. the chance of confounding may be greater than the chance that the null should be rejected). Starz' Monte Carlo sim doesn't eliminate this concern as it tested different parameters than the ones that we are concerned about.
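We can estimate directly how often chance alone leaves a single binary confounder unevenly split. Below is a minimal sketch with hypothetical parameters. A fixed pool of participants, half of whom carry some prognostic factor, is randomly split into two equal arms. The code counts how often the share of carriers differs between arms by more than 20 percentage points. This measures imbalance only. Whether such an imbalance produces a false positive depends on how strongly the factor affects the outcome, and the sketch does not model that.

```python
import random


def imbalance_rate(n_per_arm, n_carriers, threshold, trials=5000, seed=1):
    """Fraction of random 1:1 allocations in which the proportion of
    confounder carriers differs between arms by more than `threshold`."""
    rng = random.Random(seed)
    total = 2 * n_per_arm
    people = [True] * n_carriers + [False] * (total - n_carriers)
    hits = 0
    for _ in range(trials):
        rng.shuffle(people)               # random allocation to arms
        in_a = sum(people[:n_per_arm])    # carriers in arm A
        in_b = n_carriers - in_a          # the remaining carriers are in arm B
        if abs(in_a - in_b) / n_per_arm > threshold:
            hits += 1
    return hits / trials


for n in (25, 100, 250):
    print(n, imbalance_rate(n, n, 0.2))  # half of all 2n participants are carriers
```

Comparing the three sizes shows how quickly this kind of imbalance shrinks as N grows.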

It isn't really the main area of concern, though.

Originally Posted by saizai
(Frankly, this is a point that's always puzzled me about the practice of medicine: why not try to enhance the placebo effects as much as possible if your goal is to heal the patient? Obviously, one must control for them as a confounder [note that I'm using this to mean "thing which could result in a false positive"] when doing research for the effectiveness of treatments, but that doesn't mean that the placebo is ineffective; quite the opposite has been proven true repeatedly.)

That is a misconception. The "placebo effect" represents what was going to happen anyway, plus some changes in the subjective evaluation of symptoms/outcomes. A healing effect specific to placebo has not been demonstrated (http://content.nejm.org/cgi/content/abstract/344/21/1594).

Linda

digithead
22nd April 2007, 01:32 PM
Linda,

It's not worth your time with this guy. To use an old title from a mid-'80s Billy Bragg album, it's like "Talking with the Taxman about Poetry."

Many of us have already tried to educate him on basic statistical theory, sampling from different populations, confounding, and clinical trial design but his hubris blinds him to his ignorance...

Anyhow, I like your definition of the placebo effect, it's the most succinct one I've ever seen...

-digithead

saizai
22nd April 2007, 01:34 PM
p is the probability of getting an outcome when the null hypothesis is true, not the probability THAT the null hypothesis is true.

*laugh* Correct. Figures there'd be someone around who wants to be precise.

In any case, p is effectively the uncertainty factor, i.e. the chance that you haven't proven what you thought you had, which is all I was using it for in my argument.

I thought the study was supposed to test whether prayer helps improve the health of cancer patients, not whether prayer affects random number generators.

The study is supposed to test whether prayer creates an effect in this study. It is set up especially so that if it creates a HRQOL effect on people prayed for, that can get detected... but who am I to be picky? :P

You don't have to claim that you personally and wholeheartedly believe prayer works, and that you believe it works in a particular way.

I don't; I'm an agnostic and have never been a theist. I simply want to test it.

But you have to decide what sort of prayer you want to test for---how it would work if it did work---because otherwise you won't be able to decide how to test for it.

Per above, I'm testing for additive prayer on seriously ill patients. The choice of cancer patients is largely arbitrary; I'd be open to switching it to any group that has rapid (w/in 1 year) changes of health condition (so as to create a bell curve that would make group differences sensitive), is appealing to pray for, etc.

This is a different question from the previous one, and it has, in general, a different answer, even though both questions could be phrased, imprecisely, as, "What is the probability of a false positive?". In both, we're interested in the probability of the combination: prayer doesn't work and the study result is positive. However, in one, we already know that prayer doesn't work but we don't know whether the study result will be positive, while in the other, we already know that the study result was positive but we don't know whether prayer works.

What's the probability it's raining, if it's cloudy? What's the probability it's cloudy, if it's raining? Not the same thing.

Per above, this is certainly true. However, I think we are getting outside the range of valid objections to my protocol, and into simply general objections to all research, particularly speculative research. As it is not related to any particular flaw of *my* methodology, I'd rather not get into it.

Not only must you know what the probability of a positive result was on the assumption that prayer doesn't work (i.e., p), you also must know what the probability of a positive result was on the assumption that prayer does work.

And that's not possible to know. (Though theists may claim otherwise.)

Certainly it's not possible to even discuss what p(positive | it works [in manner X]) is without discussing X. And I will not get into any discussion about X, as I consider it a waste of time given the dearth of valid non-contradictory evidence to determine X.

You have not explained what you hope to accomplish with this study and how it will be achieved. The general indication is that you wish to provide results that will push the research forward among serious researchers. That is, a positive result would be considered valid and worthy of further consideration by those who are currently unconvinced. There has already been a lot of research on this topic. What you need to explain is exactly what the methodologic concerns were from previous studies and how your study will overcome these concerns. And how your results can be received in a way that they can be taken seriously. Normally, studies are published in peer-reviewed journals in order to make sure that there has been at least a minimum degree of oversight to assess validity. If you can not accomplish that, it seems likely that the very people you wish to reach will not accept the results as valid. You do not mention any connection to an academic institution or ethical approval from an independent review board. Normally, both of those are required for consideration for publication.

While I thank you for your concern, that is not something I am interested in discussing, as it falls within the "what will you do with it if you win" category.

By false hope, I don't mean the presupposition that the prayer itself will offer benefit, but the presupposition that participation in this study can advance this area of research - i.e. that participation in this study can be meaningful. How can it be meaningful if the chances of it finding a real effect are minuscule and if "positive" findings will probably be ignored by serious researchers?

That is only one aspect of it.

You could just as well claim that serious researchers would never accept a positive finding, and that therefore any research is completely fruitless. I happen to disagree. However, per above, I do not want to discuss this further, as it is not related to a specific critique of (and preferably, improvement to) my methodology.

You are collecting a lot of data and it is vaguely defined.

It is not vaguely defined, except the comments section, which I have said that I do not intend to use for the purposes of conclusion-relevant analysis.

Once you sit down to analyze it, you will probably be able to find dozens of ways in which to compare the two groups (the "data mining" you refer to). Since you have set your p to <0.05, by chance you will find several outcome variables that are different between the two groups.

Certainly. Which is why I set the analysis before the data is collected, per standard rigorous protocol. No sharpshooter fallacy here. :)

These ideas are discussed in greater detail in this paper (http://medicine.plosjournals.org/perlserv/?request=get-document&doi=10.1371/journal.pmed.0020124).

Thank you for the reference. I'll have to read it later, since I'm a bit busy at the moment.

Positive results will likely be subject to extra scrutiny for validity (without additional support from other research). Outcome measures whose validity has already been established (e.g. a visual analog scale for pain) should be used.

I tentatively intend to use the well-established SF36v2 HRQOL as a measure for round 1. What I use for round 2 will be decided after round 1 is complete.

It would be preferable to have a neutral third party analyze the results. The outcome measures should be coded (converted into a form suitable for analysis) blind.

I intend to collect all data by internet application, and only use paper for verification / signature collection. So the neutral third party is a computer program, with the relevant parts of it being open sourced.

You refer to "data mining" as selection bias. Selection bias refers more to how you select the population from which your samples will be drawn (in this case, people aware of your site who have cancer, have an interest in prayer and healing, and make the effort to participate) which leads to issues of generalizability and confounding. Although, to be fair, I find that often selection bias as a term gets used as a catch-all for different kinds of bias - sample biases in particular - so its use may not really be constrained to a particular type of bias.

Sorry for the lax use of terms. What I was mostly referring to is formally known as the Texas sharpshooter fallacy.

It is not "impossible to prove a negative". It is no more or less possible to prove a negative than it is a positive. However, I see this bit of "wisdom" repeated frequently, including on this forum, so that's probably a whole separate discussion.

Indeed it is.

Argument from ignorance is generally valid only in some very limited circumstances, where you have proven the ability of the test to detect the thing tested for, and are claiming that the (new) negative results are therefore evidence that the thing tested for does not exist in the place it was newly tested for.

See my "Charlie the Treasure Hunter" analogy; should come up on a forum search.

Somehow I got the impression from my quick reading that you had already decided upon an N of 25, which is what my comment about being underpowered was based upon. I see that that number is still undetermined.

Correct. I would like 50<n<500 but it's primarily a pragmatic question, of how many qualified participants can be recruited.

I mentioned confounders and this has also been discussed. With so many variables (likely mostly unmeasured) that can affect outcome, the concern is that unequal sorting of the confounders/independent variables could lead to a false positive, and that there is a chance that this unequal sorting could happen in the same direction for each study.

How would it do so [at likelihood > p], given that the sorting into groups is random?

Depending upon the strength of the association between confounder and outcome, the chance of this happening may be greater than the one in a thousand standard from JREF (i.e. the chance of confounding may be greater than the chance that the null should be rejected). Starz' Monte Carlo sim doesn't eliminate this concern as it tested different parameters than the ones that we are concerned about.

If you believe that the sim is invalid, please propose an alternate sim so that we can test your hypothesis. :)

A healing effect specific to placebo has not been demonstrated (http://content.nejm.org/cgi/content/abstract/344/21/1594).

Again, thanks for the reference; will read later.

I don't think that this one affects my methodology, however.

andyandy
22nd April 2007, 01:36 PM
That is a misconception. The "placebo effect" represents what was going to happen anyway, plus some changes in the subjective evaluation of symptoms/outcomes. A healing effect specific to placebo has not been demonstrated (http://content.nejm.org/cgi/content/abstract/344/21/1594).

Linda

the interesting question is how medical science should evaluate subjective patient evaluations of pain reduction - and whether or not this can be said to have a physiological or psychological root - or indeed if it's a false dichotomy to separate psychological from physiological at all.....

saizai
22nd April 2007, 01:42 PM
Oof, let's please not get into the 'what is pain really' thing; I had more than enough of that from John Searle. :p

andyandy
22nd April 2007, 02:25 PM
Oof, let's please not get into the 'what is pain really' thing; I had more than enough of that from John Searle. :p

lol

i won't derail your thread....i might start something in SMMT though....things normally get a bit heated when philosophy and science collide - could make an interesting topic :D

fls
22nd April 2007, 02:52 PM
While I thank you for your concern, that is not something I am interested in discussing, as it falls within the "what will you do with it if you win" category.

I'm very disappointed to hear that. You do not seem to be a professional and so you are probably not under an obligation to act in an ethical manner, but I was hoping that you would choose to do so, anyway. That you have no interest in improving your methodology in order to decrease the probability that you are wasting part of what life remains for these people is disturbing to me.

That is only one aspect of it.

You could just as well claim that serious researchers would never accept a positive finding, and that therefore any research is completely fruitless. I happen to disagree. However, per above, I do not want to discuss this further, as it is not related to a specific critique of (and preferably, improvement to) my methodology.

It is related to critique and improvement of your methodology, just not the part that you care about.

It is not vaguely defined, except the comments section, which I have said that I do not intend to use for the purposes of conclusion-relevant analysis.

It is vaguely defined in that at no point have you indicated what useful or relevant concept you are measuring when you collect data such as "cost of treatment". The only measure you have that seems to be reasonably reliable and valid (as a measure of pain) is your 10 point pain scale. Even your hard outcomes, such as death, will be measured in an unreliable fashion, making it unclear what exactly it means to not be registered as a death.

Certainly. Which is why I set the analysis before the data is collected, per standard rigorous protocol. No sharpshooter fallacy here. :)

I was referring only to your first study, which is exploratory in nature and does not involve pre-set comparisons.

Argument from ignorance is generally valid only in some very limited circumstances, where you have proven the ability of the test to detect the thing tested for, and are claiming that the (new) negative results are therefore evidence that the thing tested for does not exist in the place it was newly tested for.

See my "Charlie the Treasure Hunter" analogy; should come up on a forum search.

The circumstances are no more limited than they are for testing a positive.

The analogy didn't come up for me, but I think I know what you mean.

Correct. I would like 50<n<500 but it's primarily a pragmatic question, of how many qualified participants can be recruited.

It would be more responsible of you to figure out beforehand how many you need before undertaking a three year project.
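Here is a rough version of that calculation. It assumes a simple two-arm comparison of mean scores analyzed with a two-sided test. It uses the usual normal approximation n = 2((z_{1-α/2} + z_{1-β}) / d)^2 per arm, where d is the standardized effect size (difference in means divided by the SD). The effect sizes and power below are illustrative assumptions, not values from the study design.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(effect_size, alpha=0.05, power=0.8):
    """Approximate participants needed per arm to detect a standardized
    difference `effect_size` (Cohen's d) with a two-sided test at `alpha`."""
    z = NormalDist().inv_cdf
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) / effect_size) ** 2)


for d in (0.5, 0.2):
    print(d, n_per_arm(d), n_per_arm(d, alpha=0.001))
```

Under these assumptions, a small effect (d = 0.2) needs 393 participants per arm at 80% power, which is nearly 800 in total. A stricter alpha raises the number further.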

How would it do so [at likelihood > p], given that the sorting into groups is random?

I will work on an illustrative example (which may be more useful than a Monte Carlo sim).

If you believe that the sim is invalid, please propose an alternate sim so that we can test your hypothesis. :)

It would make more sense to wait until you have an idea what the parameters will be.

Linda

saizai
22nd April 2007, 04:24 PM
I'm very disappointed to hear that. You do not seem to be a professional and so you are probably not under an obligation to act in an ethical manner, but I was hoping that you would choose to do so, anyway. That you have no interest in improving your methodology in order to decrease the probability that you are wasting part of what life remains for these people is disturbing to me.

You seem to have grossly misunderstood what I said.

I am interested in improving methodology. The quote I was responding to, however, had no suggestions for methodology, only for what I might do with the study once it was completed, how it might be published, etc. That I do not see any reason to discuss here.

I am under obligation to act ethically simply on ethical grounds, not out of any academic contract.

It is related to critique and improvement of your methodology, just not the part that you care about.

If you have a suggestion for how to improve the methodology of the study itself - to make it more reliable, more sensitive, more comprehensive, etc. - I'm listening.

Your definition of 'methodology' does not seem to be within that.

It is vaguely defined in that at no point have you indicated what useful or relevant concept you are measuring when you collect data such as "cost of treatment". The only measure you have that seems to be reasonably reliable and valid (as a measure of pain) is your 10 point pain scale. Even your hard outcomes, such as death, will be measured in an unreliable fashion, making it unclear what exactly it means to not be registered as a death.

"Cost of treatment" = total pre-insurance medical bills. Not really a subjective or 'soft' measure.

10 point pain scale is superseded by the SF-36v2 HRQOL.

Explain what you mean re deaths? That was unclear. Preferably, propose an improvement to whatever flaw(s) you see...

I was referring only to your first study, which is exploratory in nature and does not involve pre-set comparisons.

Ah. That is part of the design that has changed since. The new one is:
Round 1: score equation = SF36v2 HRQOL (tentatively)
Round 2: score equation determined based on Round 1 data, default to SF36v2 again

No exploratory round.

It would be more responsible of you to figure out beforehand how many you need before undertaking a three year project.

Two year (it's been revised).

And yes, it would be decided beforehand; but choice of N is dependent on quite a number of things, some of which aren't yet settled.

Again, do you have a specific suggestion for what it should be and why rather than just non-constructive criticism?

I will work on an illustrative example (which may be more useful than a Monte Carlo sim).

Please make it one that can be simulated. Previous posters have claimed to have examples that, when simulated, turned out (per my argument) to be spurious.

articulett
22nd April 2007, 05:32 PM
Suppose this was a "wishing on a star" study--and you were going to match up wishees and wishers-- do you see how hard it would be prove that any of the star wishing had any effect on the wishees? Sure, there might be some effect of some sort--but we would have no way of attributing it to star wishing.

People have probably already pointed you to the Harvard heart prayer study--that is what a good study looks like. You can easily see if there are effects and if the effects are related to prayer and/or knowing that one might be being prayed for etc. You have nothing to control for a placebo effect or any way of measuring healing.

You've asked us to go to your website and evaluate your protocol, now do something for the people who were kind enough to respond to you. Go watch this video http://www.whydoesgodhateamputees.com/video8.htm and tell us how you propose to control for such an illusion? People attribute healings to all sorts of things--prayer, wheatgrass juice, rain dances, good karma, etc.--but that doesn't mean any of these things are responsible. Most things get better on their own. Many things respond positively just to the idea that someone is doing something or cares. The placebo effect is very real.

Listen to what the smart people on this forum are telling you if you are serious. And learn how to design a good prayer study. http://www.hno.harvard.edu/gazette/2006/04.06/05-prayer.html

Are you familiar with Randi's astrology "class"? http://www.youtube.com/watch?v=3Dp2Zqk8vHw

If you want to be taken seriously, you need to show some awareness of how people fool themselves and what a double blind study is--you seem not to understand either...or even why your protocol isn't a protocol. You may as well see if chanting for world peace works while you're implementing your study.

saizai
22nd April 2007, 07:17 PM
articulett - I don't think you understand my methodology. The measurement is clear; there is no need to control for placebo since all recipients are treated identically (it's blinded); and you can tell whether it has an effect by comparing the groups.

You do not seem to understand either that this is a randomized controlled trial, or what one is; I am not making any analysis from any specific situation, but from the group differences (or lack thereof). The illusion video you linked to does not apply to me. (Also, please note that I am not myself a theist.)

If you have a specific reason why you believe that my protocol is not in fact a blinded randomized controlled trial, please quote my study design and explain what fallacy you believe is going on.

P.S. It happens that my major is in cognitive science, so I am well familiar with a very wide range of cognitive and visual illusions, errors, fallacies, etc. :) I am not making any.

69dodge
22nd April 2007, 09:24 PM
*laugh* Correct. Figures there'd be someone around who wants to be precise.

In any case, p is effectively the uncertainty factor, i.e. the chance that you haven't proven what you thought you had, which is all I was using it for in my argument.


The chance that you haven't proven what you thought you had, given what?

I think you underestimate the importance of precision on this point.

Per above, this is certainly true. However, I think we are getting outside the range of valid objections to my protocol, and into simply general objections to all research, particularly speculative research. As it is not related to any particular flaw of *my* methodology, I'd rather not get into it.


I do not object to all research. I object to hypothesis testing that concerns itself only with the level of significance of the test but ignores questions of power. A positive result from such a test is quite impossible to interpret.

And that's not possible to know. (Though theists may claim otherwise.)

Certainly it's not possible to even discuss what that p(positive | it works [in manner X]) without discussing X. And I will not get into any discussion about X, as I consider it a waste of time given the dearth of valid non-contradictory evidence to determine X.


One does not have to believe that X exists, to discuss X for the purpose of designing a study to check whether X exists.

One has to discuss X, to design a study to check whether X exists.

How would it do so [at likelihood > p], given that the sorting into groups is random?

If you believe that the sim is invalid, please propose an alternate sim so that we can test your hypothesis. :)


Can we back up here?

A simulation is fine, but what should be simulated?

You think ("at likelihood > p") that the simulation should simulate many runs of the experiment, all conducted under the assumption that prayer is ineffective. Some of these runs will have a negative result; some will have a positive result. The simulation should report the fraction of runs that give a positive result.

How does this kind of simulation accurately reflect the situation we find ourselves in, should the real experiment produce a positive result? In this situation, we have to decide whether the positive result is probably a false positive or probably a true positive.

The simulation involves many negative results; in the real situation, the result is known to be positive.

The simulation involves no true positives; in the real situation, the result might be a true positive. (Obviously. The whole question is to decide whether it is or isn't!)

Such a simulation is irrelevant.

Well, let me qualify that. It is irrelevant from a scientific point of view, from the point of view of someone who isn't sure whether prayer works and who is interested in finding out what a positive experimental result would imply about the issue. From the point of view of (a caricature of) the JREF, which is already sure that prayer doesn't work and which is interested solely in keeping its million dollars with high probability, it is sufficient.
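To reflect the situation described in this post, a simulation has to include runs where prayer works as well as runs where it doesn't, and then look only at the positive results. Below is a minimal sketch in which every parameter is an assumption for illustration:

* In each simulated study, prayer "works" with probability prior_works. When it works, the active arm's mean is shifted by effect_size standard deviations.
* Outcomes are normal with unit variance.
* A result counts as positive if a one-sided z-test favors the active arm at alpha.

```python
import random
from statistics import NormalDist


def simulate(n_runs, prior_works, n_per_arm, effect_size, alpha=0.05, seed=0):
    """Return (false_positives, true_positives) over n_runs simulated studies."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    se = (2 / n_per_arm) ** 0.5              # SD of the difference in means
    false_pos = true_pos = 0
    for _ in range(n_runs):
        works = rng.random() < prior_works
        shift = effect_size if works else 0.0
        active = sum(rng.gauss(shift, 1) for _ in range(n_per_arm)) / n_per_arm
        control = sum(rng.gauss(0, 1) for _ in range(n_per_arm)) / n_per_arm
        if (active - control) / se > crit:   # a "positive" result
            if works:
                true_pos += 1
            else:
                false_pos += 1
    return false_pos, true_pos


fp, tp = simulate(n_runs=4000, prior_works=0.5, n_per_arm=25, effect_size=0.2)
print("positives:", fp + tp, "share that are false:", round(fp / (fp + tp), 2))
```

With these made-up inputs, the expected share of positives that are false is roughly 0.22, far above alpha = 0.05. Lowering prior_works or the power pushes that share up.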

saizai
22nd April 2007, 10:22 PM
The chance that you haven't proven what you thought you had, given what?

I think you underestimate the importance of precision on this point.

Please explain then how the difference is relevant here, and how exactly a better design could be made.

If the flaw is with any research that attempts to ask what I'm asking, well, ain't much I can do about that. :p

I do not object to all research. I object to hypothesis testing that concerns itself only with the level of significance of the test but ignores questions of power. A positive result from such a test is quite impossible to interpret.

See above.

I'm happy to discuss ways to improve the methodology. I am not particularly interested in discussing the problems with trying to do a study of prayer at all, e.g. ones based on claiming that the prior probability is infinitesimal and therefore ANY result from ANY study design is going to yield a very low posterior probability. That's just not very useful.

So please: offer a suggestion for how it can be improved, or don't waste your time and mine.

One does not have to believe that X exists, to discuss X for the purpose of designing a study to check whether X exists.

One has to discuss X, to design a study to check whether X exists.

Not very much. I am interested in whether prayer can influence the outcomes I am measuring. I am not, per se, testing whether prayer exists.

This is because I don't care whether prayer exists, unless it is a sort that can influence the things I am interested in measuring.

Hopefully the distinction makes sense.

Thus, my choice of measure is not dependent on X, but on what types of prayer, as a class, I care about.

A simulation is fine, but what should be simulated?

You think ("at likelihood > p") that the simulation should simulate many runs of the experiment, all conducted under the assumption that prayer is ineffective.

Why? Conduct them under either assumption.

Some of these runs will have a negative result; some will have a positive result. The simulation should report the fraction of runs that give a positive result.

Yup.
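A sketch of the simulation described here, run under both assumptions: once with no effect (the fraction positive estimates the false-positive rate) and once with an assumed effect (the fraction positive estimates power). The normal, unit-variance scores with known standard deviation, the 25-per-arm size and the effect of 0.2 standard deviations are all hypothetical choices for illustration.

```python
import random
from statistics import NormalDist, fmean


def fraction_positive(effect: float, n_per_arm: int, runs: int,
                      alpha: float = 0.05, seed: int = 1) -> float:
    """Simulate `runs` two-arm trials and return the fraction judged positive.

    Assumes unit-variance normal scores with known sd, so a two-sided z-test
    on the difference in means is used. effect = 0 simulates 'prayer does
    nothing'; effect > 0 is an assumed real effect in standard-deviation units.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = (2 / n_per_arm) ** 0.5  # sd of the difference in means when sd = 1
    positives = 0
    for _ in range(runs):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        prayed = [rng.gauss(effect, 1.0) for _ in range(n_per_arm)]
        z = (fmean(prayed) - fmean(control)) / se
        if abs(z) > z_crit:
            positives += 1
    return positives / runs


print("false-positive rate:", fraction_positive(0.0, 25, 4000))
print("power at d = 0.2:   ", fraction_positive(0.2, 25, 4000))
```

Under these assumptions the first number should land near 0.05 and the second only around 0.1, so most real effects of that size would be missed.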

How does this kind of simulation accurately reflect the situation we find ourselves in, should the real experiment produce a positive result? In this situation, we have to decide whether the positive result is probably a false positive or probably a true positive.

The simulation involves many negative results; in the real situation, the result is known to be positive.

The simulation involves no true positives; in the real situation, the result might be a true positive. (Obviously. The whole question is to decide whether it is or isn't!)

Such a simulation is irrelevant.

Well, let me qualify that. It is irrelevant from a scientific point of view, from the point of view of someone who isn't sure whether prayer works and who is interested in finding out what a positive experimental result would imply about the issue. From the point of view of (a caricature of) the JREF, which is already sure that prayer doesn't work and which is interested solely in keeping its million dollars with high probability, it is sufficient.

I see. Interesting points.

However, the JREF has previously taken on experiments - less complicated perhaps - which still reduce to the same issue: there is some chance that the applicant will get a positive result by luck alone. In this context, what we are discussing, ultimately, must resolve to that.

Take that into account when proposing your modified simulation, one which compares both prior assumptions of prayer working and prayer not working. Remember that JREF is willing to accept positive results simply by virtue of them being very unlikely (e.g. 1/1000th).
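For a sense of scale, here is the threshold a standardized test statistic would need to clear for a one-in-a-thousand false-positive rate, assuming (as an illustration) that the statistic is approximately standard normal when prayer has no effect. `critical_z` is a hypothetical helper name.

```python
from statistics import NormalDist


def critical_z(alpha: float, two_sided: bool = False) -> float:
    """z threshold a standardized statistic must exceed so that, with no
    effect at all, a 'positive' happens with probability alpha."""
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)


print(round(critical_z(0.05), 3))    # one-sided 5%: 1.645
print(round(critical_z(0.001), 3))   # one-sided 1/1000: 3.09
```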

Clearly they are not making the rather more complicated prior/posterior probability sort of argument based on previous research, the power of the experiment being conducted, etc.

And again, rather than purely attacking the power of the study, please suggest improvements to the methodology that would not have the flaws you see in the current one.

If you cannot do so, then there is no point having this discussion.

Cuddles
23rd April 2007, 04:03 AM
Ah. That is part of the design that has changed since. The new one is:
Round 1: score equation = SF36v2 HRQOL (tentatively)
Round 2: score equation determined based on Round 1 data, default to SF36v2 again

No exploratory round.

Round 2 is not the same as round 1. Round 2 is based on data gathered in round 1. Therefore round 1 is exploratory. In addition, the JREF preliminary and final test have to follow the same protocol. As it stands, your protocol is not acceptable for the JREF challenge because you plan on changing the most important part of your test specifically to make it more likely to get a positive in the final test.

fls
23rd April 2007, 10:48 AM
There may be hope for you yet. :)

While you have disparaged my (and others') suggestions as not relevant, you have gone ahead and made some of the suggested changes.

"Cost of treatment" = total pre-insurance medical bills. Not really a subjective or 'soft' measure.

Height is not a soft measure either.

Do you really have any idea whether or not "cost of treatment" is related to improvement in health? Or even in which direction?

There are costs associated with asking too many questions - more drop outs for one. Now that you've chosen a reasonable outcome measure, I would suggest dropping all the miscellaneous questions you had on your list.

Explain what you mean re deaths? That was unclear. Preferably, propose an improvement to whatever flaw(s) you see...

This is the question you need to answer - will you find out about all of the deaths if you take a passive approach?

Ah. That is part of the design that has changed since. The new one is:
Round 1: score equation = SF36v2 HRQOL (tentatively)
Round 2: score equation determined based on Round 1 data, default to SF36v2 again

No exploratory round.

That's a good start. You will need information on how the SF36 performs in people with cancer. Does it discriminate between different levels of HRQOL or are most people clustered at one end (poor) of the scale? What's the variability, sensitivity, specificity, etc.?

I don't understand what you mean by "score equation determined based on Round 1 data". Wouldn't you just look for significant differences in the score between the two groups? Are you talking about adjusting the scores based on other variables? 'Cuz then it's back to being exploratory.

And yes, it would be decided beforehand; but choice of N is dependent on quite a number of things, some of which aren't yet settled.

Again, do you have a specific suggestion for what it should be and why rather than just non-constructive criticism?

Yes, I did make specific suggestions as to what and why. You should have a sample size that has an adequate power to detect a small effect. The "why" was contained in one of the paragraphs you dismissed.

Please make it one that is simable. Previous posters have claimed to have examples that, when simulated, turned out (per my argument) to be spurious.

I think with the changes that are being made, it will become a non-issue.

Linda

saizai
23rd April 2007, 03:34 PM
Round 2 is not the same as round 1. Round 2 is based on data gathered in round 1. Therefore round 1 is exploratory. In addition, the JREF preliminary and final test have to follow the same protocol. As it stands, your protocol is not acceptable for the JREF challenge because you plan on changing the most important part of your test specifically to make it more likely to get a positive in the final test.

No, the protocol is still the same: I decide what the test will be before the data is gathered, and the test has no way to know which group someone is in. I happen to be choosing the test arbitrarily in the case of the first round, and based on data acquired in the case of the second round. There is no test I can choose within that constraint that would invalidate proper protocol. If you believe there is, propose one.

The first round data is still valid if it turns out that the thing I was testing for turned out to show an effect.

While you have disparaged my (and others') suggestions as not relevant, you have gone ahead and made some of the suggested changes.

If you are referring to the two-round design, that is something I decided on a few months ago.

Height is not a soft measure either.

Do you really have any idea whether or not "cost of treatment" is related to improvement in health? Or even in which direction?

Who said it has to be? Suppose prayer makes things cheaper (or more expensive). :p

There are costs associated with asking too many questions - more drop outs for one. Now that you've chosen a reasonable outcome measure, I would suggest dropping all the miscellaneous questions you had on your list.

That would reduce data available for finding a good test to use in the second round. And it would reduce the data available for other purposes; I am not doing this just for the Challenge after all.

This is the question you need to answer - will you find out about all of the deaths if you take a passive approach?

No. But that's why you get the doctors' phone numbers and call 'em up if someone stops responding. And show them a document that the participant signed at the outset allowing them to release relevant medical information to you.

That's a good start. You will need information on how the SF36 performs in people with cancer. Does it discriminate between different levels of HRQOL or are most people clustered at one end (poor) of the scale? What's the variability, sensitivity, specificity, etc.?

*nod* And I don't have that yet. Hence "tentative".

I don't understand what you mean by "score equation determined based on Round 1 data". Wouldn't you just look for significant differences in the score between the two groups? Are you talking about adjusting the scores based on other variables? 'Cuz then it's back to being exploratory.

How to score round 2 is decided based on what happened in round 1 (e.g. if it turned out that from variables A..Z, variables F, H, and Y showed significant group differences in round 1, then you'd make a score equation for round 2 that averages F, H, & Y somehow). You are still only scoring round 2 data of course, and what you're calculating for the conclusion is still the difference between groups.
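A sketch of the two-round scheme described here, under hypothetical assumptions: 26 independent standard-normal variables (A..Z), 25 participants per arm, no real effect anywhere. Round 1 selects the variables with |z| > 1.96 (falling back to a default variable if none qualify); round 2 averages the selected variables on fresh participants. It only illustrates that the round-2 false-positive rate stays near the nominal level when round 2 uses new, independent data and a formula fixed beforehand; it says nothing about power, or about whether the Challenge would treat the two rounds as the same protocol.

```python
import random
from statistics import fmean

N_VARS, N_PER_ARM, Z_CRIT = 26, 25, 1.96   # hypothetical: variables A..Z, 25 per arm


def one_study(rng: random.Random) -> bool:
    """Simulate one two-round study in which prayer does nothing.

    Every variable is independent N(0, 1) in both arms (an assumption).
    Returns True if the round-2 comparison comes out 'positive'.
    """
    se = (2 / N_PER_ARM) ** 0.5   # sd of a difference in means for one variable

    def sample_arm():
        return [[rng.gauss(0, 1) for _ in range(N_VARS)] for _ in range(N_PER_ARM)]

    def z_for(arm_a, arm_b, score, score_se):
        return (fmean(map(score, arm_a)) - fmean(map(score, arm_b))) / score_se

    # Round 1 (exploratory): keep variables with |z| > Z_CRIT.
    a1, b1 = sample_arm(), sample_arm()
    chosen = [j for j in range(N_VARS)
              if abs(z_for(a1, b1, lambda row, j=j: row[j], se)) > Z_CRIT]
    chosen = chosen or [0]         # fall back to variable 0 as the default score

    # Round 2 (confirmatory): new participants, score equation fixed in advance.
    a2, b2 = sample_arm(), sample_arm()

    def round2_score(row):
        return fmean(row[j] for j in chosen)

    round2_se = se / len(chosen) ** 0.5   # averaging k independent unit variables
    return abs(z_for(a2, b2, round2_score, round2_se)) > Z_CRIT


rng = random.Random(7)
RUNS = 1000
rate = sum(one_study(rng) for _ in range(RUNS)) / RUNS
print("round-2 false-positive rate:", rate)
```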

Yes, I did make specific suggestions as to what and why. You should have a sample size that has an adequate power to detect a small effect. The "why" was contained in one of the paragraphs you dismissed.

So what exactly do you believe to be an adequate sample size and why? Please show me the math you use to arrive at your proposed number.
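One standard way to answer this is the normal-approximation sample-size formula for comparing two means, n per arm = 2((z_{1-α/2} + z_{1-β})/d)^2, where d is the effect in standard-deviation units and 1 - β is the target power. The sketch below assumes that formula applies to the chosen score; treating d = 0.2 as "small" follows Cohen's conventional benchmarks and is an assumption here, not a figure from the thread.

```python
import math
from statistics import NormalDist


def n_per_arm(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants per arm for a two-sided, two-sample comparison
    of means (normal approximation), rounded up to a whole participant."""
    z = NormalDist().inv_cdf
    return math.ceil(2 * ((z(1 - alpha / 2) + z(power)) / d) ** 2)


for d in (0.2, 0.5, 0.8):
    print(d, n_per_arm(d))
```

With alpha = 0.05 and 80% power this gives 393 per arm for d = 0.2, 63 for d = 0.5 and 25 for d = 0.8; a stricter alpha such as 1/1000 pushes every figure up.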

fls
24th April 2007, 07:47 AM
Okay. I realize by saying this that I run the risk of appearing condescending. But I've thought about this quite a bit and I've decided that it's more important for me to say this than it is for me to worry if someone thinks I'm arrogant. And I also want to make it clear that I am not claiming that amateurs cannot do good science. There are many excellent examples from science fairs and other venues that prove otherwise.

Saizai, you do not understand what I have said. And this is my fault, as I assumed a certain level of technical knowledge on your part. But as it stands, your "experiment" cannot be considered a legitimate exercise. You have not made any reasonable a priori constraints on what you are looking for. You have made no attempt to choose variables that are a reliable and valid measure of what you claim to be attempting to measure. You have introduced a bias into your methods that increases the probability of obtaining a falsely significant result which will lead to two problems. The naive will falsely proclaim that the results show that prayer has an effect, and legitimate researchers will not be able to use your results for further study as they will be useless when there is no way to separate out spurious effects from possible real effects. There's more, but that's more than enough.

Your method of dismissing these concerns unless I demonstrate how they could cause a problem and how I would suggest they be fixed, only works if you are talking about fine-tuning an otherwise solid design. Otherwise, what you are basically asking is for me to design your study for you. Since this is exactly the kind of thing that I do, I could probably do this in my sleep (and my students/colleagues might claim that sometimes I did ;)). But frankly, with your attitude, it's like pulling teeth to make any sort of progress (assuming it is actually possible to make any progress - I'm not sure, now, that I've seen any in spite of 6 pages of attempts).

I realize that I can't stop you. But perhaps I can influence whether or not the JREF is involved. I will point out to the JREF that it takes advantage of vulnerable people to present this as a legitimate scientific endeavor. And I will also point out to them any biases or other methodologic problems that increase the possibility of obtaining a result that will be falsely presented as "significant" in order to prevent an unwarranted awarding of the prize.

Linda

Cuddles
24th April 2007, 08:36 AM
No, the protocol is still the same: I decide what the test will be before the data is gathered, and the test has no way to know which group someone is in. I happen to be choosing the test arbitrarily in the case of the first round, and based on data acquired in the case of the second round. There is no test I can choose within that constraint that would invalidate proper protocol. If you believe there is, propose one.

The first round data is still valid if it turns out that the thing I was testing for turned out to show an effect.

No, the protocol is clearly not the same. You say that you base part of round two on the results of round one. If anything about round two depends in any way whatsoever on round one, then it cannot be the same as round one. The preliminary and final in the JREF challenge must be exactly the same. Not similar. Exactly. If you do not assess both rounds in exactly the same way then you do not have a valid protocol for the JREF challenge.

As Linda says, you have had plenty of advice and criticism here. I suggest you start paying attention to it rather than dismissing everything out of hand. If you carry on in the same way I guarantee you will never be accepted for the challenge.

digithead
24th April 2007, 10:27 AM
This is not for Saizai but for anyone else who might be persuaded by his argument that simple random sampling will overcome any confounding issues...

An article by Kernan, et al (1999) in the Journal of Clinical Epidemiology 52(1) did a simulation of the effect of simple randomization in the presence of an important or prognostic factor (e.g. things such as age, disease severity, etc.). Note that these examples were included in their article. To quote them:

To illustrate the chance that simple (unstratified) randomization may lead to treatment groups that are unbalanced with respect to a prognostic factor, consider a trial of two therapies in a disease with an important prognostic factor that is present in 15% of patients. The chance that the two treatment groups will differ by more than 10% for the proportion of patients with the prognostic factor is 33% for a trial of 30 patients, 24% for a trial of 50 patients, 10% for a trial of 100 patients, 3% for a trial of 200 patients, and 0.3% for a trial of 400 patients. (p. 20)

They also found that as the incidence of the prognostic factor increases, the false positive rate also increases. For an n of 30 and a factor present in 30% of patients, this false positive rate is 43%. For an n of 50, it drops to 38%. One has to have an n of 400 to drop the rate to 2%. Note that this is in the presence of only one confounder. It logically follows that more than one confounder increases this false positive rate.

So for small sample trials (n<400), simple randomization that does not account for confounders through stratification increases the risk of a false positive.

So unless Saizai is incredibly lucky and has a completely homogeneous population from which to sample, his experimental design and proposed sample size will likely increase the probability of a false positive, contrary to all of his claims and hand-waving.
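A rough Monte Carlo sketch of the kind of imbalance described in the quoted passage: how often two randomized arms differ by more than 10 percentage points in the share of patients with a prognostic factor present in 15% of them. It assumes an exact 50/50 split by random shuffle; Kernan et al.'s allocation scheme and definition of "more than 10%" may differ, so the printed figures need not match theirs. The trend with sample size is the point.

```python
import random


def p_imbalance(n_total: int, prevalence: float = 0.15, gap: float = 0.10,
                sims: int = 2000, seed: int = 0) -> float:
    """Estimate P(|share with factor in arm A - share in arm B| > gap)."""
    rng = random.Random(seed)
    half = n_total // 2
    hits = 0
    for _ in range(sims):
        # Which patients carry the prognostic factor (independent, prob = prevalence).
        factor = [rng.random() < prevalence for _ in range(2 * half)]
        rng.shuffle(factor)                  # random allocation: first half -> arm A
        count_a, count_b = sum(factor[:half]), sum(factor[half:])
        # "More than gap", compared on counts; 1e-9 guards float rounding at the boundary.
        if abs(count_a - count_b) > gap * half + 1e-9:
            hits += 1
    return hits / sims


for n in (30, 50, 100, 200, 400):
    print(n, p_imbalance(n))
```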

saizai
24th April 2007, 02:56 PM
You have not made any reasonable a priori constraints on what you are looking for.

Such as? I repeatedly asked for specific examples.

You have made no attempt to choose variables that are a reliable and valid measure of what you claim to be attempting to measure.

Such as? I repeatedly asked for specific examples.

And what do you think I am attempting to measure?

You have introduced a bias into your methods that increases the probability of obtaining a falsely significant result.

Such as? I repeatedly asked for specific examples.

Your method of dismissing these concerns...

I am dismissing only talk about what one might do with the results, as that isn't what I am interested in discussing here.

I have not dismissed anything about specific ways to improve the methodology, I have argued against them. My arguments have not been refuted, just ignored.

No, the protocol is clearly not the same. You say that you base part of round two on the results of round one. If anything about round two depends in any way whatsoever on round one, then it cannot be the same as round one. The preliminary and final in the JREF challenge must be exactly the same. Not similar. Exactly. If you do not assess both rounds in exactly the same way then you do not have a valid protocol for the JREF challenge.

Not so. They already say that they will be different in order to make it more difficult. That is a change, right? It's in the choice of N.

The protocol is identical: I choose, arbitrarily, a score equation at the beginning of the round. In terms of ensuring that there is no fallacy or fraud going on, it does not matter what I choose. You have ignored my repeated requests for an example of what I might choose that would cause an unusually high false positive rate.

An article by Kernan, et al (1999) in the Journal of Clinical Epidemiology 52(1) did a simulation of the effect of simple randomization in the presence of an important or prognostic factor (e.g. things such as age, disease severity, etc.). Note that these examples were included in their article. To quote them:

And I've repeatedly asked you to propose a stratified random sampling methodology that you believe would be better.

So far you have not.

P.S. It helps if you provide a link (http://www.jclinepi.com/article/PIIS0895435698001383/abstract) to the article, even if it's only available to the plebes in abstract.

digithead
24th April 2007, 03:13 PM
You asked us to critique your methodology, several of us who have significant experience with clinical trials and program evaluation have repeatedly showed you the errors in your design...

I repeatedly at the beginning of this thread implored you to seek the assistance of a biostatistician to help you work through the flaws...

We have now showed you clear evidence that you are wrong regarding your simple random sampling scheme...

Yet you have repeatedly dismissed all of our concerns, in some instances very arrogantly...

Why should any of us help you given your hubris?

saizai
24th April 2007, 03:49 PM
digithead - So I take it you're not going to propose an improvement via stratified random sampling?

digithead
24th April 2007, 04:01 PM
digithead - So I take it you're not going to propose an improvement via stratified random sampling?

No, there is nothing that I can propose for you as you obviously do not want to take any advice....

This board is an entirely inappropriate place to provide you with experimental design advice beyond what we've already done...

We've shown you that your sampling design and sample size are inadequate for the hypothesis that you have and are prone to increasing false positives in the face of confounders, especially at the small differences that you expect...

What you do with that fact is entirely up to you...

I'll say it again despite all evidence that you cannot take any advice that does not conform to your belief that you can't possibly be wrong - seek the assistance of a biostatistician who will work with you throughout your entire clinical trial. Engaging this board as that person will not give you the level of service you so obviously need...

saizai
24th April 2007, 04:02 PM
Pity, I thought you might be up for constructive discussion rather than just bashing. Ah well.

I suppose that if I were proposing a stratified random sampling, you'd be bashing me for doing something that's 'too complicated', right? ;)

digithead
24th April 2007, 05:01 PM
Pity, I thought you might be up for constructive discussion rather than just bashing. Ah well.

I suppose that if I were proposing a stratified random sampling, you'd be bashing me for doing something that's 'too complicated', right? ;)

You're beyond help at this point...

I'm with Linda; I feel sorry for any of the patients that take part in your study. Hopefully, the hospital or medical staff that you will need to engage in this study will force you into some sort of IRB approval before you even recruit one patient, because that's probably the only thing that will get you to listen to criticism...

Startz
24th April 2007, 06:13 PM
This is not for Saizai but for anyone else who might be persuaded by his argument that simple random sampling will overcome any confounding issues...

An article by Kernan, et al (1999) in the Journal of Clinical Epidemiology 52(1) did a simulation of the effect of simple randomization in the presence of an important or prognostic factor (e.g. things such as age, disease severity, etc.). Note that these examples were included in their article. To quote them:

So unless Saizai is incredibly lucky and has a completely homogeneous population from which to sample, his experimental design and proposed sample size will likely increase the probability of a false positive, contrary to all of his claims and hand-waving.

Let me say I have a deep suspicion that it is a good thing that JREF gets run by magicians rather than statisticians. And I will be shocked if a challenge ever comes out of this. Having said all that, these discussions sometimes provide a useful platform to clear up scientific matters.

Digithead has provided a citation from a scientific journal. The referenced article supports the notion that you get better results using stratification. But on the narrow point that Saizai has claimed, the article completely supports him. The authors write:
For trials with unstratified randomization, the erroneous finding of a statistically significant (P<0.05) difference between treatments occurred about 50 of 1,000 times in their computer simulation, regardless of endpoint rates in the constituent strata or sample size.

In other words, if you randomize as Saizai suggests and use a critical value for five percent test, you get a false positive five percent of the time - just as basic statistics tells you will happen.

-Dick Startz

articulett
24th April 2007, 06:46 PM
No, there is nothing that I can propose for you as you obviously do not want to take any advice....

This board is an entirely inappropriate place to provide you with experimental design advice beyond what we've already done...

We've shown you that your sampling design and sample size are inadequate for the hypothesis that you have and are prone to increasing false positives in the face of confounders, especially at the small differences that you expect...

What you do with that fact is entirely up to you...

I'll say it again despite all evidence that you cannot take any advice that does not conform to your belief that you can't possibly be wrong - seek the assistance of a biostatistician who will work with you throughout your entire clinical trial. Engaging this board as that person will not give you the level of service you so obviously need...

Just want to thank you for your posts...I'm learning a lot--it's been a while since I've done statistics--I know that the OP cannot hear you, but you are edifying others, I assure you.

It's always a bad sign when someone makes an assertion and then says "prove me wrong"--as though that had anything to do with whether they were "right".

digithead
24th April 2007, 08:58 PM
Let me say I have a deep suspicion that it is a good thing that JREF gets run by magicians rather than statisticians. And I will be shocked if a challenge ever comes out of this. Having said all that, these discussions sometimes provide a useful platform to clear up scientific matters.

Digithead has provided a citation from a scientific journal. The referenced article supports the notion that you get better results using stratification. But on the narrow point that Saizai has claimed, the article completely supports him. The authors write:


In other words, if you randomize as Saizai suggests and use a critical value for five percent test, you get a false positive five percent of the time - just as basic statistics tells you will happen.

-Dick Startz

No, their article makes that claim for large sample trials (n>400). Their simulations clearly demonstrate that controlling for confounding by stratification is necessary for small sample trials. They find that the false positive rate increases when there is confounding in those trials...

They even have a clear delineation on how to proceed depending on the circumstances and endpoints...

digithead
24th April 2007, 09:07 PM
As a matter of fact, they finish their article with 10 guidelines, the first being:

For superiority trials that seek to demonstrate the superiority of one therapy over another, consider stratified randomization when the overall sample size for a trial is small (<200 patients per treatment arm) or when interim analyses or subgroup analyses are planned that will involve small samples of a larger cohort. (p. 25)
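The guideline refers to stratified randomization. One common form is permuted-block randomization within strata; the sketch below is one minimal version, not a procedure taken from the paper or the thread. The strata labels, the block size of 4 and the arm names are hypothetical.

```python
import random
from collections import defaultdict


def stratified_block_randomizer(block_size=4, seed=None):
    """Return assign(stratum) -> 'prayed' or 'control'.

    Permuted-block randomization within strata: each stratum gets its own
    stream of shuffled blocks, each block holding block_size // 2 of each arm.
    """
    if block_size < 2 or block_size % 2:
        raise ValueError("block_size must be a positive even number")
    rng = random.Random(seed)
    pending = defaultdict(list)   # stratum -> unused slots of its current block

    def assign(stratum):
        if not pending[stratum]:
            block = ["prayed", "control"] * (block_size // 2)
            rng.shuffle(block)
            pending[stratum] = block
        return pending[stratum].pop()

    return assign


# Usage with hypothetical strata (disease stage, age band):
assign = stratified_block_randomizer(seed=42)
print(assign(("late", "65+")), assign(("early", "<65")), assign(("late", "65+")))
```

Because each block holds equal numbers of each arm, the running difference between arms within any stratum never exceeds half the block size. Larger or varying block sizes make the next assignment harder to guess.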

And Randi consults with statisticians, read the challenge rules....

saizai
24th April 2007, 09:51 PM
FWIW, digithead, I don't currently have access to the fulltext of the article. If you can pass me a PDF (or txt) I will read and respond.

Startz
24th April 2007, 11:05 PM
No, their article makes that claim for large sample trials (n>400). Their simulations clearly demonstrate that controlling for confounding by stratification is necessary for small sample trials. They find that the false positive rate increases when there is confounding in those trials...

They even have a clear delineation on how to proceed depending on the circumstances and endpoints...

While I suspect that the number of forum members who find this a useful way to learn statistics is small, let me try to set the record straight anyhow.

1. Non-stratified sampling does not affect the size of a test. It does affect power. Therefore, if the Challenge standard is no more than 1/1000 (or whatever) of a false positive, randomization works fine. If one is trying to get a better research design, then power also matters. For this, stratification can help.

2. Nothing in (1) is affected by sample size, because the critical value for a test adjusts for sample size.
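A sketch of point (1) under stated assumptions: prayer has no effect, a binary prognostic factor (present in 30% of patients) raises the outcome by 2 standard units, 30 patients are randomly split 15/15 with no stratification, and a pooled two-sample t-test is used (critical value 2.048 for 28 degrees of freedom). The estimated false-positive rate should sit near 5%. This concerns only the unconditional size of the test; it says nothing about power, or about the error rate conditional on an imbalance actually observed in a given trial, which is where stratification or adjustment helps.

```python
import random

N = 30               # total patients, split 15 / 15
T_CRIT = 2.048       # two-sided 5% critical value of Student's t with N - 2 = 28 df


def null_false_positive_rate(runs: int = 4000, prevalence: float = 0.3,
                             factor_effect: float = 2.0, seed: int = 3) -> float:
    """Fraction of null trials (no prayer effect) that the t-test calls significant."""
    rng = random.Random(seed)
    half = N // 2
    hits = 0
    for _ in range(runs):
        # Outcome depends on the prognostic factor and noise, never on the arm.
        outcomes = [factor_effect * (rng.random() < prevalence) + rng.gauss(0, 1)
                    for _ in range(N)]
        rng.shuffle(outcomes)              # random allocation of exactly 15 per arm
        a, b = outcomes[:half], outcomes[half:]
        mean_a, mean_b = sum(a) / half, sum(b) / half
        ss = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)
        pooled_var = ss / (N - 2)
        t = (mean_a - mean_b) / (pooled_var * (2 / half)) ** 0.5
        if abs(t) > T_CRIT:
            hits += 1
    return hits / runs


print(null_false_positive_rate())
```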

(1) and (2) are just facts of mathematics. They don't really admit of argument. In contrast, legitimate objections to the outcome of simulations are that what's being simulated doesn't correspond to the experiment being run. What's important is that the math correspond to the experiment being run, which isn't always the experiment that we think we see.

There's a long history of scientists being fooled when investigating "paranormal" events. I suspect statisticians are as vulnerable as physicists. That's the source of my opinion (which is not a fact of mathematics) that magicians and those giving magicians a skeptical assist are awfully important.
-Dick Startz

69dodge
25th April 2007, 01:34 AM
An article by Kernan, et al (1999) in the Journal of Clinical Epidemiology 52(1) did a simulation of the effect of simple randomization in the presence of an important or prognostic factor (e.g. things such as age, disease severity, etc.).

[...]

They also found that as the incidence of the prognostic factor increases, the false positive rate also increases. For an n of 30 and a factor present in 30% of patients, this false positive rate is 43%.


I'm also interested in seeing the full paper, if possible.

What is the definition of a positive result? If it's defined, as usual, so that 5% is the probability of a positive result given the null hypothesis, then of course, on the null hypothesis, 5% of results will be positive. So, what, exactly, was 43% of what?

I bet that the statistical test they use, which defines "positive result", is based on a null hypothesis that assumes no confounding factors. (What? You mean, not everything is normally distributed? :D) So, then, where there are confounding factors, the statistics come out wrong.

Saizai, you also need to be more specific about what the definition of a positive result is, in your setup. I understand that each patient will get a score that is a number between 0 and 100. So, you'll have a bunch of numbers for the patients who were prayed for, and another bunch of numbers for the patients who weren't prayed for. Now what? How do you get a binary positive/negative decision out of all these numbers?

To satisfy the JREF, I think you'll need to use some sort of non-parametric test, which makes no assumptions about the overall distribution of the scores, but assumes only that the patients were randomly assigned.
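A minimal sketch of one such test, a Monte Carlo permutation test on the difference in mean scores. It relies only on the random assignment: if prayer does nothing, the group labels are exchangeable, so relabelling at random shows how often a difference at least as large as the observed one arises by chance. The choice of statistic (difference in means), the number of reshuffles and the function name are assumptions.

```python
import random
from statistics import fmean


def permutation_p_value(prayed, control, reps: int = 9999, seed: int = 0) -> float:
    """Two-sided Monte Carlo permutation test on the difference in means."""
    rng = random.Random(seed)
    observed = abs(fmean(prayed) - fmean(control))
    pooled = list(prayed) + list(control)
    k = len(prayed)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)                      # random relabelling of patients
        diff = abs(fmean(pooled[:k]) - fmean(pooled[k:]))
        if diff >= observed - 1e-12:             # tolerance for floating-point ties
            extreme += 1
    return (extreme + 1) / (reps + 1)            # +1 counts the observed labelling itself


print(permutation_p_value([5, 6, 7, 8, 9], [0, 1, 2, 3, 4]))
```

The smallest p-value this can return is 1/(reps + 1), so reaching a 1/1000 criterion needs comfortably more than 1,000 reshuffles. The statistic has to be fixed before the data are unblinded.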

digithead
25th April 2007, 03:38 AM
1. Non-stratified sampling does not affect the size of a test. It does affect power. Therefore, if the Challenge standard is no more than 1/1000 (or whatever) of a false positive, randomization works fine. If one is trying to get a better research design, then power also matters. For this, stratification can help.
If this were true then why ever adjust or stratify for a confounder?


2. Nothing in (1) is affected by sample size, because the critical value for a test adjusts for sample size.
You should really look at some of the Bayesian thoughts on this, anything by James Berger out of Duke should help you...

(1) and (2) are just facts of mathematics. They don't really admit of argument. In contrast, legitimate objections to the outcome of simulations are that what's being simulated doesn't correspond to the experiment being run. What's important is that the math correspond to the experiment being run, which isn't always the experiment that we think we see.
No, they're also facts of clinical trials. Do you really think that pure randomization will handle clinically relevant factors such as age, disease severity, disease treatment, socioeconomic status, smoking status, comorbid conditions, etc. that have proven to be important in nearly every clinical trial?

And the paper I submitted for proof only discussed the presence of 1 clinically relevant factor. Stratification is an absolute necessity in small sample trials.

There's a long history of scientists being fooled when investigating "paranormal" events. I suspect statisticians are as vulnerable as physicists. That's the source of my opinion (which is not a fact of mathematics) that magicians and those giving magicians a skeptical assist are awfully important.
-Dick Startz
Absolutely correct but irrelevant. There are also people who think of statistics as a sort of magic that conveys a sacred advancement into the realm of believability. Do any of my statements and suggestions make you believe that I am credulous when it comes to statistical design?

Seriously, this guy wants to design a trial to test the power of intercessory prayer on disease outcomes. But it doesn't matter if he were testing a new drug designed to eliminate AIDS; the clinical trial protocols are the same. Regardless of what you're trying to test, you need to adjust for clinically relevant factors that can enhance or obscure an effect. You need to isolate the effect of just the treatment in question from all of the other things that could also explain it. Simple randomization does not do this unless he samples from a homogeneous population, which is about as likely as winning the Powerball.

He has multiple outcomes with the QOL instrument he's using. He is also collecting data regarding costs, etc. to see if IP has an effect. Can anyone here really support his decision to ignore clinically relevant factors such as age, disease severity, disease treatment, etc., given his sample size (n=50) and experimental design (t-tests)? Does anyone besides Saizai think that this is a good design?
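How much can 50 patients detect? A rough answer comes from the normal approximation to the power of a two-sample comparison of means. The sketch assumes that n=50 means 25 per arm, a single outcome, and a two-sided test at 0.05. At this sample size the approximation slightly overstates power compared with an exact t-test. The effect sizes are standardized (Cohen's d) values chosen purely for illustration, and `approx_power` is a hypothetical helper.

```python
from statistics import NormalDist

def approx_power(effect_size, n_per_arm, alpha=0.05):
    """Normal approximation to the power of a two-sided, two-sample comparison
    of means with equal arms; effect_size is Cohen's d (difference / SD)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    noncentrality = effect_size * (n_per_arm / 2) ** 0.5
    # Probability that the test statistic lands beyond either critical value.
    return z.cdf(noncentrality - z_crit) + z.cdf(-noncentrality - z_crit)

for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: power ~ {approx_power(d, 25):.2f} with 25 per arm")
```

With 25 per arm, only large effects have a good chance of being detected. Splitting attention across several QOL outcomes makes each individual comparison no more powerful, and it raises the multiple-comparison problem on top of that.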

Cuddles
25th April 2007, 03:49 AM
You have not made any reasonable a priori constraints on what you are looking for.

You have made no attempt to choose variables that are a reliable and valid measure of what you claim to be attempting to measure.
Such as? I repeatedly asked for specific examples.

You want us to give you examples of something you haven't done? Linda's point was that you have not made any constraints. How can she possibly give you an example of something that isn't there?

Not so. They already say that they will be different in order to make it more difficult. That is a change, right? It's in the choice of N.

The protocol is identical: I choose, arbitrarily, a score equation at the beginning of the round. In terms of ensuring that there is no fallacy or fraud going on, it does not matter what I choose. You have ignored my repeated requests for an example of what I might choose that would cause an unusually high false positive rate.

Exactly. You choose a scoring system at the start of each round. That is, you score each round in a different way. The rounds do not have the same protocol. It is not at all the same as changing N. Changing N only means that you follow the same protocol a different number of times. In fact, it has been pointed out many times that N could be exactly the same for both. If the preliminary has a 1/1000 chance and the final has a 1/1000 chance, then the total chance of winning is 1/1,000,000. In your case, you are proposing to change the very thing that determines whether you win or lose. How do you not understand that this could never be acceptable?
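The arithmetic above can be written out as a short Python sketch, together with the condition it relies on. The product rule only holds if the rounds are independent and each round's pass/fail rule is fixed before that round's data exist. The function name `combined_chance` is hypothetical.

```python
import random
from fractions import Fraction

def combined_chance(*round_chances):
    """Chance of passing every round by luck alone, assuming the rounds are
    independent and each round's pass/fail rule is fixed before its data exist."""
    total = Fraction(1)
    for p in round_chances:
        total *= Fraction(p)
    return total

print(combined_chance(Fraction(1, 1000), Fraction(1, 1000)))  # 1/1000000

# Monte Carlo check with a more visible per-round chance of 1/10:
rng = random.Random(7)
sims = 200_000
both = sum(rng.random() < 0.1 and rng.random() < 0.1 for _ in range(sims))
print(both / sims)  # close to 1/100
```

If the scoring rule can change between rounds, the per-round chances are no longer the fixed numbers fed into this product. That is why changing the rule differs from changing N.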

fls
25th April 2007, 11:15 AM
You want us to give you examples of something you haven't done? Linda's point was that you have not made any constraints. How can she possibly give you an example of something that isn't there?

I got a kick out of that as well. :)

Linda

Thinktoomuch
26th April 2007, 11:32 PM
I got a kick out of that as well. :)



...where?:D

Jokes aside. You say that Saizai can't do it while you can design a solid study in your sleep. Fine. Can you please specify:
- how much would you charge to provide the design
- how much do you estimate the cost of conducting the study would be.
Then we can talk practicalities.
The paradox of these chat rooms is that they are next to useless without competent professionals, and professionals who join them jeopardise their credibility by stooping to the amateurs' level.

Gulliver
26th April 2007, 11:51 PM
saizai,

Please allow me to join the debate late. I believe that I may have a fresh outlook on the task you present for us here. I hope that I don't throw out the baby with the bathwater.

I have a bit of a reputation around this Forum of proposing radical changes and my own challenges, replete with cash incentives. Let's all work together to build the best protocol.

Your goal is to show that prayer influences health, even at great distances. I'd suggest that you hypothesize that prayer even improves health.

First, I suggest that you've selected your populations poorly. Selecting those suffering from disease and those using the Internet and those willing to volunteer greatly limits your ability to use randomization to avoid confounding effects. The power of your tests will suffer accordingly. Why not select healthy individuals from a controlled population, such as a prison, a church, or a university?

Second, I suggest that you've selected your outcomes poorly. Selecting pain, for example, is highly subjective and readily confounded. As a terminal cancer patient (not quite end-stage, thank you.), I can tell you that my pain varies based on whether a friend called today, the amount of sunshine, how long my palliative drugs have been in the refrigerator (shelf life issue), how long it's been since my chemotherapy (the chemo has a palliative benefit too.), and a host of other items. You must accept one of: 1) a very large sample size (Don't! It's too expensive.), 2) stratification (Don't. It's too complicated.), or 3) a new outcome that you can readily and accurately measure (Do this!). Why not select blood pressure (Give everyone an electronic BP cuff.)? How about mental acuity (Have everyone take a web-based skill test.)?
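The standard normal-approximation sample-size formula for comparing two means makes the trade-off between a noisy outcome and sample size explicit: n per arm = 2 * (z_{1-alpha/2} + z_{power})^2 * sigma^2 / delta^2. In this sketch, the delta (true improvement) and sigma (outcome SD) values are invented for illustration. They are not measurements of pain or blood pressure.

```python
import math
from statistics import NormalDist

def n_per_arm(delta, sigma, alpha=0.05, power=0.8):
    """Approximate receivers per arm for a two-sided two-sample comparison of
    means: n = 2 * (z_{1-alpha/2} + z_{power})^2 * sigma^2 / delta^2."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return math.ceil(2 * z_sum ** 2 * sigma ** 2 / delta ** 2)

# Hypothetical: the same true improvement of 5 units, measured with a precise
# instrument (SD 10) versus a noisy self-report (SD 20).
print(n_per_arm(5, 10), n_per_arm(5, 20))
```

Because n grows with sigma squared, doubling the measurement noise roughly quadruples the required sample. This is the argument for option 3.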

Third, I suggest that you're not blind enough. Your studies include a great deal of cognitive science. You know how we deceive and trick even ourselves into believing what we want. You, indeed absolutely no one, should know the assignments until the end of the trials. You must not be assigning individuals based on Round 1 results into groups in Round 2. Totally blind your studies. It's the only way to go, really! I know it may seem hard, but we can help. It's really easier than you think. (Randi, by the way, has a "magical" way of "divining" such blinded studies that are just "amazing"!)

Fourth, I urge you to go with an A-B study to reduce the confounding effects. Assign half of the receivers to Group A, half to Group B. Assign the providers to A. Run one trial. Measure all receivers. Assign the providers to B (tell them that A have been cured, killed, died, forgotten, converted to Pastamania, or something not so cruel or funny.) Run one trial. Measure all receivers. Now run an ANOVA. If you assume (and you really can't, but hey, there's still the randomization) that no confounding variables were more likely to occur during one trial than the other, you've blocked the confounding effects.
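The sketch below shows one way to analyse the A-B design. It swaps in the standard two-period (AB/BA) crossover analysis on within-receiver differences in place of a named ANOVA. For this layout the two are equivalent to an ANOVA with receiver, period and treatment terms, assuming no carryover from the first trial into the second. The simulated receivers and all function names are hypothetical. Each receiver has a stable personal baseline, a practice gain that everyone gets in trial 2, and measurement noise.

```python
import random
from statistics import mean, variance

def crossover_estimate(group_a, group_b):
    """Two-period AB/BA crossover analysis on within-receiver differences.

    group_a: (trial1_score, trial2_score) pairs, prayed for in trial 1
    group_b: (trial1_score, trial2_score) pairs, prayed for in trial 2
    Returns (estimated prayer effect, t statistic). Assumes no carryover.
    """
    # Personal baselines cancel within each receiver.
    d_a = [t1 - t2 for t1, t2 in group_a]  # effect - period shift + noise
    d_b = [t1 - t2 for t1, t2 in group_b]  # -effect - period shift + noise
    # The period shift is common to both groups and cancels here.
    effect = (mean(d_a) - mean(d_b)) / 2
    se = (variance(d_a) / len(d_a) + variance(d_b) / len(d_b)) ** 0.5 / 2
    return effect, effect / se

rng = random.Random(42)

def simulate_group(n, prayed_in_trial, effect, practice_gain=3.0):
    """Hypothetical receivers: stable baseline, a practice gain in trial 2,
    measurement noise, and `effect` only in the trial they are prayed for."""
    rows = []
    for _ in range(n):
        base = rng.gauss(100, 15)
        t1 = base + rng.gauss(0, 1) + (effect if prayed_in_trial == 1 else 0.0)
        t2 = base + practice_gain + rng.gauss(0, 1) + (effect if prayed_in_trial == 2 else 0.0)
        rows.append((t1, t2))
    return rows

est, t = crossover_estimate(simulate_group(50, 1, 0.0), simulate_group(50, 2, 0.0))
print(f"no real effect: estimate {est:.2f}, t = {t:.2f}")
```

The practice gain appears in both groups' differences and cancels, so only the effect of being prayed for survives. The no-carryover assumption is the weak point. If a trial 1 effect lingers into trial 2, it shrinks the estimate.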

To tie a bow around it:
1) You and I design a simple web-based intelligence test.
2) You and I design a simple survey that asks the questions (first name, age, gender, eye color, nose length, whatever, and verified email address) the answers to which (except for the email address) we'd like to provide to the providers, and a second survey to get email addresses of providers.
3) I drive 50 miles north on a Sunday morning and visit a number of churches, posting fliers asking them to visit a certain website to register for the study as receivers.
4) I drive 50 miles south on the next Sunday morning and visit a number of churches posting fliers asking them to visit another certain website to register for the study as providers.
5) The computer programs (I write them and you review them) implement our protocol. They send the correct emails to the receivers and providers at the appropriate times, and after two months they send both of us an email of the resultant data. I provide all needed software, hardware, and domain services. I maintain complete lockdown of the machines, the code, and the data. You do not get to know the website names or towns or churches. Indeed, we do not get any data (except for a daily "heartbeat" report listing the number of registrations, tests, emails sent, and days remaining in the current step) until after the test's conclusion.
6) If the test shows a significant positive effect of prayer, then you win $1000 (of my own money) and my support in applying for MDC. If not, you agree to cease all requests for donations on any website forevermore regarding any paranormal claim, especially the healing effects of prayer.
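Step 5's heartbeat report can be pictured as an object that holds the allocation but only emits aggregate counts before the study ends. This is a hypothetical Python sketch, and `LockedStudy`, `heartbeat` and `unblind` are invented names. Python cannot actually hide data from whoever runs the machine, so the real safeguard remains that only Gulliver controls the computer.

```python
from dataclasses import dataclass, field

@dataclass
class LockedStudy:
    """Hypothetical lock-down state: the allocation stays inside this object
    and only aggregate counts leave it before the study ends."""
    allocation: dict = field(default_factory=dict)  # receiver email -> "A" or "B"
    registrations: int = 0
    tests_completed: int = 0
    emails_sent: int = 0
    days_remaining: int = 0
    finished: bool = False

    def heartbeat(self) -> str:
        # Counts only: no names, towns, churches, or group labels.
        return (f"registrations={self.registrations} tests={self.tests_completed} "
                f"emails={self.emails_sent} days_remaining={self.days_remaining}")

    def unblind(self) -> dict:
        # The allocation is released only after the study has concluded.
        if not self.finished:
            raise PermissionError("allocation is sealed until the study concludes")
        return dict(self.allocation)

study = LockedStudy(allocation={"r1@example.org": "A", "r2@example.org": "B"},
                    registrations=2, days_remaining=60)
print(study.heartbeat())
```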

By the way, I am tempted to offer $5 to anyone who improves by better than average in test 2 over test 1 or test 3 over test 2. Perhaps it should be $100 to the church with the person with the best improvement during a trial and $100 to the church of the provider associated with the best provided-for improvement. I'm still considering the ramifications, such as collecting information about where to send checks, and the effect on my checking account.

I believe that you'll do us both a favor by taking some serious time to really consider this proposal. I believe that you won't find more support anywhere than in the text above.

Determinedly,
Gulliver

Thinktoomuch
27th April 2007, 12:27 AM
What timing! Gulliver, it looks like you have saved me some money (assuming that Linda does not find fault with your design also ....:D )

Happy to shore up your checking account if this goes anywhere.
(Based on previous examples, you will excuse me for not running to the cheque book just yet...)

Gulliver
27th April 2007, 01:16 AM
What timing! Gulliver, it looks like you have saved me some money (assuming that Linda does not find fault with your design also ....:D )

Happy to shore up your checking account if this goes anywhere.
(Based on previous examples, you will excuse me for not running to the cheque book just yet...)

Thanks so much for the support, and so quickly too. Now if I had just prayed for such a miracle, we might have something :)

Oh, and there's definitely fault to be found in the proposal, but time and the Forum's kind members will address most of it. (And I do subscribe to the pretty-good-now-is-better-than-perfect-never philosophy.)

Most gratefully,
Gulliver

69dodge
27th April 2007, 04:09 AM
You, indeed absolutely no one, should know the assignments until the end of the trials.


I don't think he will. His computer will.

(I don't know what would stop him from asking his computer, though. He should clarify this.)

You must not be assigning individuals based on Round 1 results into groups in Round 2.


The rounds involve different individuals. No one participates in both rounds.

He plans to look at the results of round 1 to see what things prayer apparently influenced the most, and then he predicts that in round 2 it will influence the same things in totally different people.

Fourth, I urge you to go with an A-B study to reduce the confounding effects. Assign half of the receivers to Group A, half to Group B. Assign the providers to A. Run one trial. Measure all receivers. Assign the providers to B (tell them that A have been cured, killed, died, forgotten, converted to Pastamania, or something not so cruel or funny.) Run one trial. Measure all receivers. Now run an ANOVA. If you assume (and you really can't, but hey, there's still the randomization) that no confounding variables were more likely to occur during one trial than the other, you've blocked the confounding effects.


That makes a lot of sense.

Cuddles
27th April 2007, 04:12 AM
1) You and I design a simple web-based intelligence test.

I question the choice of the ability to take an intelligence test as a valid metric. There are far too many things that will affect this, even if you ignore the risk of deliberate cheating. Just as many things affect pain, so do many things affect people's ability to take tests: time of day, state of mind, hunger, thirst, caffeine, etc. In addition, it is fairly well established that IQ tests are not actually valid tests of IQ because people learn to do better at them. Given that, testing to see if people do better at what is effectively an IQ test given lots of time and practice seems virtually guaranteed to find a difference.

Finally, you have not solved the most important objection to Saizai's claim. Neither his test nor yours will provide proof of anything. The whole point of Randi's challenge is that it provides undisputed proof that the applicant can do what they claim, although the actual method of doing so can still be disputed. For example, if a dowser can find which bucket has water hidden under it 19/20 times, it certainly shows that they can find water, although it does not prove that dowsing itself actually works. However, a study that shows a statistically significant difference between two groups proves absolutely nothing; all it shows is that something interesting might be happening and that more research could be needed. Neither Saizai's test nor yours will ever be acceptable as a challenge for the million because they are just not challenges; they are simply medical studies.

Jekyll
27th April 2007, 04:25 AM
By the way, I am tempted to offer $5 to anyone who improves by better than average in test 2 over test 1 or test 3 over test 2. Perhaps it should be $100 to the church with the person with the best improvement during a trial and $100 to the church of the provider associated with the best provided-for improvement. I'm still considering the ramifications, such as collecting information about where to send checks, and the effect on my checking account.

This would be enough incentive for me to intentionally cheat by answering badly on the first test. I think that cash prizes should probably be avoided unless you control for this.

fls
27th April 2007, 04:37 AM
...where?:D

Jokes aside. You say that Saizai can't do it while you can design a solid study in your sleep. Fine. Can you please specify:
- how much would you charge to provide the design
- how much do you estimate the cost of conducting the study would be.
Then we can talk practicalities.
The paradox of these chat rooms is that they are next to useless without competent professionals, and professionals who join them jeopardise their credibility by stooping to the amateurs' level.

I'm sorry. I can't tell how I'm supposed to take this. Perhaps it's just my pre-coffee state.

Linda

Gulliver
27th April 2007, 12:17 PM
First and foremost, I thank you for your reply. I learn a great deal from such comments, and yours are most kind as well.
I don't think he will. His computer will.

(I don't know what would stop him from asking his computer, though. He should clarify this.)
May I clarify please? I intend that saizai will not be able to obtain any data except for the heartbeat email. I'll maintain the computer involved. I won't tell him the websites involved. I won't give him access at any time. The fliers will contain a "password" for the church involved. Without the password, no one will be able to register. I intend not to disclose the towns or churches or websites and to maintain a complete and secure lock-down on the computer involved. We get the results only after the complete test. I even believe that we can agree on the statistical test before the first step. I'd like to have my computer set up before the lock-down to run the test automatically and email the results without further human intervention.
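A hash commitment would let saizai (or anyone) verify the lock-down afterwards without seeing anything during the study. The idea is to publish a SHA-256 digest of the allocation plus a secret random salt before the first step, then reveal both at the end. This is a swapped-in technique, not part of Gulliver's proposal, and the function names are hypothetical. The same approach could commit to the statistical test agreed before the first step.

```python
import hashlib
import json
import secrets

def commit(allocation: dict) -> tuple:
    """Return (digest, salt). Publish the digest before the study; keep the
    salt and allocation sealed on the locked-down machine."""
    salt = secrets.token_hex(16)
    payload = json.dumps(allocation, sort_keys=True) + salt
    return hashlib.sha256(payload.encode("utf-8")).hexdigest(), salt

def verify(allocation: dict, salt: str, digest: str) -> bool:
    """After the study, anyone can check that the revealed allocation is the
    one that was fixed in advance."""
    payload = json.dumps(allocation, sort_keys=True) + salt
    return hashlib.sha256(payload.encode("utf-8")).hexdigest() == digest

# Hypothetical allocation for illustration.
alloc = {"receiver-001": "A", "receiver-002": "B"}
digest, salt = commit(alloc)
print("publish now:", digest)
print("verifies at the end:", verify(alloc, salt, digest))
```

The random salt stops anyone from guessing the allocation by hashing candidate assignments against the published digest.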

The rounds involve different individuals. No one participates in both rounds.

He plans to look at the results of round 1 to see what things prayer apparently influenced the most, and then he predicts that in round 2 it will influence the same things in totally different people.

That's a fair dig. I should have done better on this point. To improve my point, please let me say that saizai might still accidentally cause a bias. Let's say he notices that men are less likely to improve with prayer. He might assign more women to the receiving group, even when random assignment is used. I've actually made this mistake once. In a graduate level experimental design course, I rejected certain coin tosses (fell on the floor, didn't rotate enough, didn't catch it right, etc.) whenever I didn't like the result. I'm rather sure that I wasn't intentionally introducing the bias, but the videotape sure made it look that way. My professor was happy to have proven her point on two of the ten students. I was saddened to earn a poor grade for my stupidity.

Most gratefully,
Gulliver

Gulliver
27th April 2007, 12:46 PM
Cuddles,

As I recall we've worked to great results before. Consider, for example, our teaming on dealing with Dargo. May I please have your kindness to review this carefully? I believe in you and your abilities, but find your responses here difficult to accept. I will do my best to respect your comments and to appreciate the effort you must have made to share with me your insights.

I question the choice of the ability to take an intelligence test as a valid metric. There are far too many things that will affect this, even if you ignore the risk of deliberate cheating. Just as many things affect pain, so do many things affect people's ability to take tests: time of day, state of mind, hunger, thirst, caffeine, etc. In addition, it is fairly well established that IQ tests are not actually valid tests of IQ because people learn to do better at them. Given that, testing to see if people do better at what is effectively an IQ test given lots of time and practice seems virtually guaranteed to find a difference.

I must respectfully disagree. Intelligence tests are well established as not varying under many conditions. Caffeine intake is probably the most notorious, but please reference http://www.springerlink.com/content/t25366v1554q33m0/ for an interesting summary of a recent article in the peer-reviewed _Cellular and Molecular Life Sciences (CMLS)_. Even given your claim, won't the A-B nature of the experimental design eliminate any influence? Let's say 10 out of 100 subjects cheat to improve their performance, by whatever means. Since they don't know in which trial they're receiving prayer support, they can't bias the result.

Finally, you have not solved the most important objection to Saizai's claim. Neither his test nor yours will provide proof of anything. The whole point of Randi's challenge is that it provides undisputed proof that the applicant can do what they claim, although the actual method of doing so can still be disputed. For example, if a dowser can find which bucket has water hidden under it 19/20 times, it certainly shows that they can find water, although it does not prove that dowsing itself actually works. However, a study that shows a statistically significant difference between two groups proves absolutely nothing; all it shows is that something interesting might be happening and that more research could be needed. Neither Saizai's test nor yours will ever be acceptable as a challenge for the million because they are just not challenges; they are simply medical studies.

Now, Cuddles, I really think you need to sit in a cozy armchair and ponder your statements above. You're much too smart to make this claim. We aren't interested in how it works. We are concerned with whether, after eliminating other factors, we can show with statistical accuracy that a paranormal force affected the outcome under controlled conditions. If so, then the claimant deserves our further consideration. I would, of course, defer to JREF for the next test if this one shows a positive result. I would not claim to have proven anything.

If you insist on maintaining your position, would you please provide a quote from the JREF FAQs or instructions that eliminates "medical studies" as "just not challenges"?

I sincerely hope that we can disagree here and maintain our professional friendship. You have my respect, and I ask your indulgence to allow me to disagree so tersely.

With real gratitude,
Gulliver

Gulliver
27th April 2007, 12:58 PM
This would be enough incentive for me to intentionally cheat by answering badly on the first test. I think that cash prizes should probably be avoided unless you control for this.

I must agree with you in many ways. I can see your point. While I'm only tempted to make cash incentives a part of the protocol, I remain undecided. To counter the harm you so clearly expressed, I offer that we might double our sample size with only $200 in incentives. Since the cheaters wouldn't know in which A-B trial they're receiving prayer, they could not bias the outcome.

(Full Disclosure: I'm a theist. I know. I know. But I am not religious and I definitely don't believe in any type of divine intervention. It's difficult for me to write the next few sentences.)

I suggest that cheaters could create bad data points, assuming that a fair God wouldn't intercede for a cheater, even when receiving prayer support. This problem would reduce the test's ability to detect legitimate effects on those who don't cheat. So cheaters, under this assumption, hurt, not help saizai's case.

I'll keep pondering the point, but I'm leaning toward no cash incentives.

Better educated now,
Gulliver

69dodge
28th April 2007, 05:54 AM
May I clarify please? I intend that saizai will not be able to obtain any data except for the heartbeat email. I'll maintain the computer involved. I won't tell him the websites involved. I won't give him access at any time.


Right, I understand that this is what you're proposing.

It seemed that you were criticising his, different, proposal (as found on his website www.prayermatch.org), and I thought that the criticism wasn't necessarily warranted. Even under his own proposal, he claims that he won't know which patients get prayed for and which don't, until the end of the study.

Let's say he notices that men are less likely to improve with prayer. He might assign more women to the receiving group, even when random assignment is used.


Yes, that is exactly the sort of thing he intends to do. But why would it be a problem? I think that prayer can't improve anyone's health. A demonstration that prayer improved the health of women would still be remarkable.

I've actually made this mistake once. In a graduate level experimental design course, I rejected certain coin tosses (fell on the floor, didn't rotate enough, didn't catch it right, etc.) whenever I didn't like the result.


That's different, I'm pretty sure. The decision to reject a particular coin toss was based, at least in part, on the result of that coin toss.

But suppose you had two rounds of coin tossing. In the first round, coins that land on the floor happen to be mostly heads. You are aiming for tails, say. Therefore, you decide to ignore, in the second round, all coins that land on the floor. This would not be a problem. Assuming that coins which land on the floor have a 50% chance of coming up heads, you haven't increased your chances of getting lots of tails in the second round, by choosing to ignore coins that land on the floor.

Similarly, assuming prayer doesn't work for anyone, saizai can't increase his chances of getting a positive outcome from his study by limiting the second round to women. Of course, if prayer does work for women, he can. But that's OK; we want him to, in that case!
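A toy simulation can check 69dodge's distinction between rejecting tosses after seeing them and applying a rule learned in an earlier round. The 20% chance of a toss landing on the floor and the 30% chance of re-doing an unwanted heads are arbitrary assumptions, and the function names are hypothetical.

```python
import random

rng = random.Random(0)

def toss():
    """One fair toss: (landed_on_floor, result). Floor landings are 20% and
    independent of the result (an arbitrary assumption)."""
    return rng.random() < 0.2, rng.choice("HT")

def outcome_dependent(n, redo_heads_prob=0.3):
    """Gulliver's classroom error: an excuse to re-toss is found, some of the
    time, only after seeing the unwanted result (heads)."""
    kept = []
    while len(kept) < n:
        _, result = toss()
        if result == "H" and rng.random() < redo_heads_prob:
            continue  # "didn't rotate enough", invoked after seeing heads
        kept.append(result)
    return kept.count("T") / n

def rule_fixed_in_advance(n):
    """69dodge's case: round 1 suggested ignoring floor landings, and round 2
    applies that rule without regard to the result."""
    kept = []
    while len(kept) < n:
        on_floor, result = toss()
        if on_floor:
            continue
        kept.append(result)
    return kept.count("T") / n

print(f"outcome-dependent rejection: {outcome_dependent(20_000):.3f} tails")
print(f"rule fixed in advance:       {rule_fixed_in_advance(20_000):.3f} tails")
```

The first procedure drifts well above 50% tails. The second stays near 50%, because landing on the floor says nothing about the result. Restricting round 2 to women is a rule of the second kind, provided it is fixed before any round 2 data are seen.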

Thinktoomuch
28th April 2007, 07:14 AM
I'm sorry. I can't tell how I'm supposed to take this. Perhaps it's just my pre-coffee state.

Linda
My apologies, I was too curt and possibly ambiguous. Perhaps it is my post-senility state.;)
I'll try to explain what prompted me to butt in and blurt out as I did. I am getting progressively impatient with trying to scrape little nuggets of wisdom from an ever-increasing pile of trivialities. After scanning six pages of running around in circles, I saw that you finally put your foot down in the name of professionalism; therefore, I tried to address you from finance professional to academic professional. Gulliver's intervention provided what appeared to be a viable alternative to my hazy plan, but the flak that he is copping now (I had not realised he was a theist, by the way, this is not going to make things easy for him in this context) suggests that this is another dead end. If Saizai wants to go that way and you are happy to vouch for the protocol, however, I am quite happy with it. This will require a lot of running around, though.

So, here is my plan to cut the c*ap.

- You do what you professionally do and give me a quote of what your honorarium would be for providing a JREF-proof protocol and an estimate of how much it would cost to conduct the study properly. If you decline to do so, please also refrain from flogging again a dead horse on these pages.

- If your honorarium is, as I expect, reasonable (I would like to think that I am generous but not stupid, and I rely on the fact that you deemed it appropriate to spend a lot of time on this for nothing), I will offer to pay for it on Saizai's behalf if he commits to conducting the study at the required standard and following your protocol. If he is not prepared to do so, there is no more point for anybody here to continue with the trivialities, as he is obviously not in good faith as regards his presence in this forum.

Again, I do not believe that my bank account is seriously threatened.

William Smith
28th April 2007, 07:37 AM
I love it when the plot thickens.

Jekyll
28th April 2007, 12:30 PM
I suggest that cheaters could create bad data points, assuming that a fair God wouldn't intercede for a cheater, even when receiving prayer support. This problem would reduce the test's ability to detect legitimate effects on those who don't cheat. So cheaters, under this assumption, hurt, not help saizai's case.

I'll keep pondering the point, but I'm leaning toward no cash incentives.

Better educated now,
Gulliver
Well, that's an interesting idea, but even if God did disown cheaters, I'm not sure that He would disown people who deliberately did badly on IQ tests so that their church receives additional donations.

Anyway, barring the big G. taking personal offence, cheaters will only serve to increase the noise of the data and not skew it in any particular direction. That should make it harder for S. to win if there is an effect, and make no difference otherwise. It's just something to keep in mind when you're planning.
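A small simulation can check the claim that blinded cheaters add noise but no bias. It uses hypothetical values: 50 receivers per arm, a gain score with SD 5, and no true effect of prayer. In the cheating run, 10% of receivers deliberately score 20 points low on the first test regardless of arm, since they cannot know their arm. The name `one_trial` is illustrative.

```python
import random
from statistics import mean, pstdev

rng = random.Random(1)

def one_trial(n_per_arm=50, effect=0.0, cheat_rate=0.0, sandbag=20.0):
    """Return (prayer-arm mean gain) - (control-arm mean gain).

    Each receiver's gain is noise plus `effect` if prayed for. A cheater
    scores `sandbag` points low on the first test, which shows up as extra
    gain. Receivers are blind to their arm, so cheaters are equally likely
    in both arms (hypothetical assumption)."""
    def gains(prayed):
        out = []
        for _ in range(n_per_arm):
            g = rng.gauss(0, 5) + (effect if prayed else 0.0)
            if rng.random() < cheat_rate:
                g += sandbag
            out.append(g)
        return out
    return mean(gains(True)) - mean(gains(False))

honest = [one_trial() for _ in range(2000)]
cheaty = [one_trial(cheat_rate=0.1) for _ in range(2000)]
print(f"honest:  mean diff {mean(honest):.2f}, spread {pstdev(honest):.2f}")
print(f"cheaty:  mean diff {mean(cheaty):.2f}, spread {pstdev(cheaty):.2f}")
```

Both runs centre on zero, but the estimated difference spreads more widely with cheaters, so a real effect would be harder to detect. If cheaters could learn their arm, the extra gain would no longer cancel between the arms.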

fls
29th April 2007, 06:59 AM
My apologies, I was too curt and possibly ambiguous. Perhaps it is my post-senility state.;)
I'll try to explain what prompted me to butt in and blurt out as I did. I am getting progressively impatient with trying to scrape little nuggets of wisdom from an ever-increasing pile of trivialities. After scanning six pages of running around in circles, I saw that you finally put your foot down in the name of professionalism; therefore, I tried to address you from finance professional to academic professional. Gulliver's intervention provided what appeared to be a viable alternative to my hazy plan, but the flak that he is copping now (I had not realised he was a theist, by the way, this is not going to make things easy for him in this context) suggests that this is another dead end. If Saizai wants to go that way and you are happy to vouch for the protocol, however, I am quite happy with it. This will require a lot of running around, though.

Gulliver's protocol is also inadequate.

So, here is my plan to cut the c*ap.

- You do what you professionally do and give me a quote of what your honorarium would be for providing a JREF-proof protocol and an estimate of how much it would cost to conduct the study properly.

I have been willing to volunteer my services, as do other professionals who participate in this forum and the JREF. Saizai has made it clear that he is not interested in the study that I would recommend, making issues of payment irrelevant.

If you decline to do so, please also refrain from flogging again a dead horse on these pages.

When dictating to me what behaviour of mine is unacceptable, it would help if you were specific. At this point, all I can tell is that I am not allowed to provide criticism. Specific standards will allow me to assess the feasibility of implementation. Alternate means of assisting you to deal with your disturbance over my posts may be recommended.

- If your honorarium is, as I expect, reasonable (I would like to think that I am generous but not stupid, and I rely on the fact that you deemed it appropriate to spend a lot of time on this for nothing), I will offer to pay for it on Saizai's behalf if he commits to conducting the study at the required standard and following your protocol. If he is not prepared to do so, there is no more point for anybody here to continue with the trivialities, as he is obviously not in good faith as regards his presence in this forum.

Again, I do not believe that my bank account is seriously threatened.

I think you've made your point.

Linda

Thinktoomuch
29th April 2007, 08:54 AM
If you decline to do so, please also refrain from flogging again a dead horse on these pages.


When dictating to me what behaviour of mine is unacceptable, it would help if you were specific. At this point, all I can tell is that I am not allowed to provide criticism. Specific standards will allow me to assess the feasibility of implementation. Alternate means of assisting you to deal with your disturbance over my posts may be recommended.



By interpreting my pleading as "dictating" to you and as an indication of my "disturbance" over your posts, you suggest a personal antagonism that was not intended. I leave it to the other readers to see for themselves in what proportion this misunderstanding is due to my expression or to your excessive sensitivity. My apologies regardless. You became the subject of my request because of your claim that writing protocols is what you do for a living. It would have been directed to anyone else who had said so.

It was obvious to me that if you did not want to recommend a protocol it would be because you are convinced Saizai "is not interested in the study that you would recommend", therefore, my conclusion was that there would be no point in continuing with useless criticism, hence the vernacular flogging of the dead horse. If you disagree, I would be interested in your reasoning: I see it only as demeaning.

In any case, I am sincerely grateful for your response. Just by the act of responding you have forced Saizai to either contradict you and show interest in a scientifically valid protocol if it is provided to him at no cost, or stop lying about being in this forum to learn, in which case I would expect all sensible people to also stop giving useless advice. Either way my objective of attempting to raise a little the usefulness of the forum has been achieved. Whether my attempt is actually effective and the desired result is actually achieved, time will tell. As you say, I have made my point.

fls
29th April 2007, 10:36 AM
If you decline to do so, please also refrain from flogging again a dead horse on these pages.
When dictating to me what behaviour of mine is unacceptable, it would help if you were specific. At this point, all I can tell is that I am not allowed to provide criticism. Specific standards will allow me to assess the feasibility of implementation. Alternate means of assisting you to deal with your disturbance over my posts may be recommended.
By interpreting my pleading as "dictating" to you and as an indication of my "disturbance" over your posts, you suggest a personal antagonism that was not intended. I leave it to the other readers to see for themselves in what proportion this misunderstanding is due to my expression or to your excessive sensitivity.

Excessive sensitivity?!!! Let's look at this objectively. If we define the "dead horse" in this case as "driving home the point that posters are not seriously considering what others are saying" (specifically of the form "Saizai you are not listening/sincere" and "[everyone else] is not offering useful suggestions" (from Saizai)), then it really got going on page 6. Since then there have been 21 posts that could reasonably be characterized as "dead horse beating". Out of that total, 3 posters have contributed 2 DHBP (Dead Horse Beating Posts) each, 2 have contributed 3 each, 1 contributed 4, and 1 contributed 5, making the average contribution 3 DHBP. I contributed 2 DHBP, which is not only outside the top three but also below average. Therefore, for you to single me out as the only recipient of your chastisement demonstrates a directed attack that was quite undeserved.

:)

Seriously though, even though I know better, it's hard for me to resist taking advantage when the way something is expressed allows for two different interpretations. I think it's out of my system now. ;)

My apologies regardless. You became the subject of my request because of your claim that writing protocols is what you do for a living. It would have been directed to any other who had said so.

It was obvious to me that if you did not want to recommend a protocol it would be because you are convinced Saizai "is not interested in the study that you would recommend", therefore, my conclusion was that there would be no point in continuing with useless criticism, hence the vernacular flogging of the dead horse. If you disagree, I would be interested in your reasoning: I see it only as demeaning.

I agree, but I think you also are beating the dead horse, even though you are presenting it as an attempt to move the discussion forward. You got your dig in by stating that you didn't think Saizai would accept your offer. And I suspect that you didn't really think I would take you up on your offer, either.

In any case, I am sincerely grateful for your response. Just by the act of responding you have forced Saizai to either contradict you and show interest in a scientifically valid protocol if it is provided to him at no cost, or stop lying about being in this forum to learn, in which case I would expect all sensible people to also stop giving useless advice. Either way my objective of attempting to raise a little the usefulness of the forum has been achieved. Whether my attempt is actually effective and the desired result is actually achieved, time will tell. As you say, I have made my point.

Maybe there is something to salvage. It is often hard to tell just exactly when someone has made it clear that they are insincere. Most of the time I think that it is best to give people the benefit of the doubt. It would probably lose a cost-benefit analysis, though.

Linda

Gulliver
29th April 2007, 07:06 PM
Gulliver's intervention provided what appeared to be a viable alternative to my hazy plan, but the flak that he is copping now (I had not realised he was a theist, by the way; this is not going to make things easy for him in this context) suggests that this is another dead end.
While I thank you for your concern, I ask that you not allow my theism to worry you. I have no difficulty compartmentalizing (is that a word?) my non-scientific beliefs from the real world. I do, however, make every effort to fully disclose my weaknesses before others rely on me.

To clarify, I am not religious, do not attend any religious gatherings, do not pray, and never expect any intervention from any supernatural beings. I simply conclude that there is more truth than we can prove from observations of nature. I believe, but do not know, that this includes the existence of another intelligence.

I commit to you to behave with the utmost integrity in all scientific or rational efforts.

Earnestly,
Gulliver

Gulliver
29th April 2007, 07:46 PM
Gulliver's protocol is also inadequate.
...
Linda
I, of course, appreciate your taking time to review my protocol.

Not wasting any time, on Friday I asked for independent reviews from a full professor of neuroscience (Ph.D., M.D.) and an associate professor of family medicine (M.D.), to complement my Ph.D. in operations research. I'm happy to report that both were excited about the test. The neuroscientist is eager to adapt the Boston test to this protocol. Both gave their preliminary approval of the design, calling it ingenious. (*blush*)

(I'm having a tough time convincing them that we have to wait on saizai. I suspect that we're going to run this protocol without him unless he gets moving. The family doctor sits on a subcommittee of the University's Human Subjects Review Board and has already sent me the release text (for the website to display during registration) that they'd require in order to get the University's support.)

I believe that this won't be the first time that an applicant has seen someone else take over his idea.

(Oh, and just to clarify, all three of us believe that the odds that the test will show a positive are about the same as p.)
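
Reading "p" here as the pre-set significance threshold α, this is the textbook property of a test when there is no real effect: "positive" results turn up at a rate of about α. A minimal simulation sketch, assuming normally distributed scores with known variance, 50 participants per group and a two-sided z-test (all illustrative choices, not details of Gulliver's protocol):

```python
import random
from statistics import NormalDist, fmean


def null_positive_rate(trials=10_000, n=50, alpha=0.05, seed=1):
    """Fraction of simulated studies with p < alpha when there is no effect.

    Assumption: active and control scores both come from N(0, 1) and are
    compared with a two-sided two-sample z-test (variance known to be 1).
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    se = (2 / n) ** 0.5  # standard error of the difference in group means
    positives = 0
    for _ in range(trials):
        active = [rng.gauss(0, 1) for _ in range(n)]
        control = [rng.gauss(0, 1) for _ in range(n)]
        z = (fmean(active) - fmean(control)) / se
        p = 2 * (1 - std_normal.cdf(abs(z)))  # two-sided p-value
        if p < alpha:
            positives += 1
    return positives / trials


print(null_positive_rate())  # close to alpha = 0.05
```
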

Eagerly,
Gulliver

fls
29th April 2007, 09:34 PM
I, of course, appreciate your taking time to review my protocol.

Not wasting any time, on Friday I asked for independent reviews from a full professor of neuroscience (Ph.D., M.D.) and an associate professor of family medicine (M.D.), to complement my Ph.D. in operations research. I'm happy to report that both were excited about the test. The neuroscientist is eager to adapt the Boston test to this protocol. Both gave their preliminary approval of the design, calling it ingenious. (*blush*)

The concerns I had with your protocol included the use of outcome tests of unknown reliability, validity or relevance, the possibility of a carryover effect (which you will likely be unable to exclude) biasing your results because of the crossover design, a highly selected population making generalizing or drawing conclusions difficult, and the high probability that the study will contribute nothing to the advancement of knowledge in this area. It would be inadequate to lay the question to rest if negative. And if positive it suffers from the same issue as Saizai's study - false positives are far more likely than true positives, making it difficult to know whether there is anything worth pursuing further. And without measuring relevant outcomes, it would be hard to know what a true-positive even means (e.g. if it's even a good thing).
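
One standard way to make the "false positives are far more likely than true positives" point concrete is the positive predictive value (PPV): the share of significant results that reflect a real effect. When the prior probability of a real effect is low, most significant results come from the α-level false-positive rate. A sketch in which the priors, the 80% power and α = 0.05 are illustrative assumptions, not figures from either protocol:

```python
def positive_predictive_value(prior, power=0.8, alpha=0.05):
    """Probability that a statistically significant result reflects a real effect.

    prior: assumed probability, before the study, that the effect is real.
    power: probability of a significant result if the effect is real.
    alpha: probability of a significant result if it is not (false-positive rate).
    """
    true_pos = power * prior
    false_pos = alpha * (1 - prior)
    return true_pos / (true_pos + false_pos)


for prior in (0.5, 0.1, 0.01, 0.001):
    print(f"prior={prior:<6} PPV={positive_predictive_value(prior):.3f}")
```

Under these assumptions, a prior of 1% gives a PPV of about 0.14, so fewer than one significant result in seven would be a true positive.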

It's not my time and money (at least, I hope it's not my tax dollars ;)), though. And I'm starting to feel very much like a third-wheel on this thread since this is clearly up to you and your colleagues now. :)

(I'm having a tough time convincing them that we have to wait on saizai. I suspect that we're going to run this protocol without him unless he gets moving. The family doctor sits on a subcommittee of the University's Human Subjects Review Board and has already sent me the release text (for the website to display during registration) that they'd require in order to get the University's support.)

I believe that this won't be the first time that an applicant has seen someone else take over his idea.

(Oh, and just to clarify, all three of us believe that the odds that the test will show a positive are about the same as p.)

Eagerly,
Gulliver

Good luck.

Linda

Gulliver
29th April 2007, 11:34 PM
fls,

I can't help but notice that you're using the past tense. I'll assume that you meant to use the present perfect (happening in the past and continuing now).

You didn't ask me to reply regarding your concerns. I'm not sure that we can repair our relationship enough to make a discourse here useful, but I'm willing to try. Over my tenure in academia I've dealt with many experts who seem to own only a sniper's rifle. I hope that you're willing to use another tool of a more cooperative nature. Please consider, for example, your original, unexplained "is inadequate" sniping.

Allow me to respond neutrally to your most recent comments.

The concerns I had with your protocol included:
the use of outcome tests of unknown reliability, validity or relevance,
I haven't specified the outcome tests, so I believe that you are attacking a straw man. I suggest we can rely heavily on the Boston short-term memory recall test, which is well documented and widely accepted, even in a web-based environment. If you attack the Boston test in this protocol, I respectfully ask that you provide your reasons.

the possibility of a carryover effect (which you will likely be unable to exclude) biasing your results because of the crossover design,

I don't understand this comment. Please explain why you think the carryover effect can't be controlled by including it as a blocking variable in the ANOVA (or, if we chose, in a linear regression).
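
To show how carryover enters a crossover comparison, here is a sketch of a standard two-period, two-sequence (AB/BA) design. It assumes, hypothetically, that the active condition leaves a residual effect `carryover` in the following period while the control leaves none; the effect sizes, period shift and sample size are arbitrary. Under those assumptions the usual within-subject estimate converges to `treatment - carryover / 2` rather than `treatment`.

```python
import random
from statistics import fmean


def crossover_estimate(treatment, carryover, n_per_sequence=5000, seed=0):
    """Standard AB/BA estimate of the treatment effect under a toy model.

    Hypothetical model: score = subject effect + period effect
    + direct treatment effect + carryover (only after the active period) + noise.
    Sequence AB: active (A) then control (B); sequence BA: control then active.
    """
    rng = random.Random(seed)
    period2_shift = 0.5  # arbitrary period effect; cancels in the estimator
    d_ab, d_ba = [], []
    for _ in range(n_per_sequence):
        # Sequence AB: the active period is followed by control, so A carries over.
        s = rng.gauss(0, 1)  # subject effect, cancels in the within-subject difference
        y1 = s + treatment + rng.gauss(0, 1)
        y2 = s + period2_shift + carryover + rng.gauss(0, 1)
        d_ab.append(y1 - y2)
        # Sequence BA: control first (assumed to leave no carryover), then active.
        s = rng.gauss(0, 1)
        y1 = s + rng.gauss(0, 1)
        y2 = s + period2_shift + treatment + rng.gauss(0, 1)
        d_ba.append(y1 - y2)
    # Half the difference between the sequences' mean (period 1 - period 2) scores.
    return (fmean(d_ab) - fmean(d_ba)) / 2


for carryover in (0.0, 0.2, 0.4):
    print(carryover, round(crossover_estimate(0.3, carryover), 2))
```

With a true effect of 0.3, the estimates come out near 0.3, 0.2 and 0.1. In a two-period, two-sequence layout the carryover term cannot be estimated from the within-subject differences; it is confounded with the between-subject sequence comparison. Designs with more periods or washout intervals behave differently, so whether this applies depends on the sequence structure of the protocol.
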
a highly selected population making generalizing or drawing conclusions difficult, and

Here you're correct. We could only conclude that the results apply to churchgoers, and only if we assume that prayer affects individuals near me the same way as those far from me. I suggest that many accepted and useful studies, including even the Harvard study referenced earlier in the thread (only bypass patients in several hospitals), have a more selective sample population.
the high probability that the study will contribute nothing to the advancement of knowledge in this area.

I believe this is a conclusion that you've not adequately supported.
It would be inadequate to lay the question to rest if negative.

I'm sure that it cannot lay all questions about the effect of prayer to rest. But I'm sure you're aware that science progresses in small steps. This study, if negative, would add to the body of evidence. If positive at a very small p-value, we could conclude only that the question needs further review.
And if positive it suffers from the same issue as Saizai's study - false positives are far more likely than true positives, making it difficult to know whether there is anything worth pursuing further.

You fail to provide any evidence or even reasoning to back up your claim that "false positives are far more likely than true positives". I kindly ask you to provide such reasoning or some evidence.
And without measuring relevant outcomes, it would be hard to know what a true-positive even means (e.g. if it's even a good thing).

I disagree, of course, with your comments about the relevance, as I've stated above. I don't believe that we need to concern ourselves whether improvement in short-term recall is "a good thing". The study can clearly be neutral on this.
It's not my time and money (at least, I hope it's not my tax dollars ;)), though.

Since you included the emoticon, I infer that you're not serious on this issue.
And I'm starting to feel very much like a third-wheel on this thread since this is clearly up to you and your colleagues now. :)

I must respectfully disagree. It is not clear to me at all. I remain, for one, at saizai's pleasure in pursuing this study. For two, I have been quite clear, and not just in this thread, that I value constructive criticism from JREF Forum members.
Good luck.

Linda
May I also solicit from any Forum member their thoughts on the Gulliver protocol, especially as it relates to fls's concerns?

Gratefully,
Gulliver