
View Full Version : New Robertson & Roy paper...any thoughts?


Dogwood
28th April 2004, 04:25 PM
Hi guys. Long time no post eh? I see things are mostly the same around here. Pity.

Anyway, I have recently received and read the latest and long awaited (at least by me) paper from Robertson and Roy, following up on their proposed protocols. I was curious if anyone else had read it and would care to share their thoughts. Although, I have promised Claus a review and will provide one if he's still interested, to be honest I don't know if I can make it worth the bandwidth. Mike D? Anyone?

Paul C. Anagnostopoulos
28th April 2004, 04:47 PM
Hey Mark!

Can we see the paper someplace?

~~ Paul

Dogwood
28th April 2004, 04:55 PM
I'm sure this paper will be available on-line in a couple of months. The JSPR has recently begun archiving their publications. (Details can be found here. (http://www.spr.ac.uk/index.php3?page=onlinelibrary)), but it is not available without a subscription or a special order now as far as I know, sorry.

Paul C. Anagnostopoulos
28th April 2004, 04:56 PM
Synopsis?

~~ Paul

Dogwood
28th April 2004, 05:05 PM
Impossible for me to reply meaningfully without a degree in statistics Paul, and unfortunately, I don't have one. What's most frustrating about this paper is that no raw data and no examples are presented, so it makes it even more difficult for me to analyze the results provided. That's pretty much my whole review.

dharlow
29th April 2004, 05:00 AM
Originally posted by mark tidwell
Hi guys. Long time no post eh? I see things are mostly the same around here. Pity.

Anyway, I have recently received and read the latest and long awaited (at least by me) paper from Robertson and Roy, following up on their proposed protocols. I was curious if anyone else had read it and would care to share their thoughts. Although, I have promised Claus a review and will provide one if he's still interested, to be honest I don't know if I can make it worth the bandwidth. Mike D? Anyone?

Hi Mark.

My SPR materials arrived the other day and I got a chance to briefly read the paper. I thought the experimental designs were very good, and manipulated enough variables that a comparison of the results of the different designs would be very useful. However, I was disappointed when I got to the results section. It seems that they lumped the data from the designs together rather than analyzing them separately. In addition, no transcripts were provided, just some select statements with no indication of the design under which the statements were given. To be fair, I only briefly scanned it, and I did not understand the statistical analysis used. I would have to take a closer look to be more firm in my conclusions. I will be interested in seeing if SI (or perhaps more likely the UK Skeptic magazine) will publish a critique.

In the accompanying Psi Report from the SPR, Donald West made some brief comments on the research at a recent Study Day, and there was a hint that Roy and Robertson will be publishing more of their results in the future. I think a publication of the transcripts would be of interest, especially those obtained under the main design.

Ed
29th April 2004, 05:07 AM
Originally posted by mark tidwell
Hi guys. Long time no post eh? I see things are mostly the same around here. Pity.

Anyway, I have recently received and read the latest and long awaited (at least by me) paper from Robertson and Roy, following up on their proposed protocols. I was curious if anyone else had read it and would care to share their thoughts. Although, I have promised Claus a review and will provide one if he's still interested, to be honest I don't know if I can make it worth the bandwidth. Mike D? Anyone?

Can you knock in the Abstract? What are they claiming the results to be, in a nutshell?

Dogwood
5th May 2004, 07:00 PM
Originally posted by Ed


Can you knock in the Abstract? What are they claiming the results to be, in a nutshell?

Okay, here's the abstract, retyped by my own tired hand.

(From "RESULTS OF THE APPLICATION OF THE ROBERTSON-ROY PROTOCOL TO A SERIES OF EXPERIMENTS WITH MEDIUMS AND PARTICIPANTS" by T. J. Robertson and A. E. Roy. The Journal of the Society for Psychical Research, Volume 68.1, Number 874, January 2004)

Any emphasis is original to the authors. Any typos are mine.

This paper is the third in a series of papers by Robertson and Roy that together describe and test a method of assessing claims of mediumistic communication. In this paper we describe the results obtained by applying the Robertson-Roy Protocol (RRP) in a designed suite of experiments that enables in each experiment (a) the categories (such as a recipient who believes he or she is a non-recipient) of all participants present to be unambiguously determined, (b) the operation of a variety of normal factors (such as body language and verbal response to a medium) to be controlled. The RRP was tested over two and a half years in a study involving 13 sessions held in a number of locations in England and Scotland, with some 300 participants from a wide variety of cultural backgrounds. Ten mediums delivered 73 sets of statements during these sessions. The study demonstrated that the RRP, although time-consuming both in application and reduction of acquired data, is a practical, repeatable and useful procedure in assessing the ability of mediums to transmit relevant information to recipients. The results of the study provided a reliable, and objective, quantitative measure of the significance to be placed in the higher fraction accepted by the recipients of the number of statements in the sets delivered to the recipients than those accepted by non-recipients in those sessions. Due to the design of the experiments the results cannot be due to normal factors such as body language and verbal response. The probability that the results are due to chance is one in a million. The evaluation by the Robertson-Roy weighting procedure of the statements delivered by the mediums is also shown to support the negation of a a sceptical hypothesis.

My previous reviews of the RRP can be found at the Skeptic Report, here (http://www.skepticreport.com/psychics/robertsonroy2.htm). Discuss.

voidx
5th May 2004, 10:02 PM
Posted by Mark Tidwell
The evaluation by the Robertson-Roy weighting procedure of the statements delivered by the mediums is also shown to support the negation of a a sceptical hypothesis.

Seems to me this would be the crux of the experiment. What exactly is the weighting procedure, and what is the relevant data to go along with it? This would be time consuming, but I'm sure if we had even some random samples from the collected data we could begin to agree or disagree with their weighting procedure, as it's the obvious subjective component. We've seen in the past that the main split of opinion between those who believe in mediumship and those who do not is that we differ greatly on what signifies a qualitative and distinct hit, as opposed to the rather vague and generalized information usually seen in mediumship transcripts.

Edited to add: That last sentence concerns me as well, that they say it rather conclusively rules out a skeptical hypothesis. I would think if they were being objective they would use mundane, but perhaps that's just semantics. Either way, I think the original statement that body language and verbal cues are effectively ruled out in this scenario is more accurate. That hardly means that any skeptical hypotheses are thus negated.

Vitnir
6th May 2004, 12:20 AM
Didn't they use a protocol that was "approved" by sceptics? They published the protocol first in a separate paper, right?

How significant were the findings anyway? 99% of all messages correct or a small but significant % over chance?

If I was a psychic I think I would win the jackpot on the lottery but I wouldn't tell anyone, who wants to get swamped by reporters and hundreds of old ladies who want to know where their cat is?

Darat
6th May 2004, 02:57 AM
Originally posted by Vitnir
...snip...

If I was a psychic I think I would win the jackpot on the lottery but I wouldn't tell anyone, who wants to get swamped by reporters and hundreds of old ladies who want to know where their cat is?

Perhaps old ladies with their cats would be a bit tiresome, but finding missing children...?

Mike D.
6th May 2004, 07:38 AM
Originally posted by voidx

...I think the original statement that body language and verbal cues are effectively ruled out in this scenario is more accurate. That hardly means that any skeptical hypotheses are thus negated.

voidx,

Is it not a commonly held skeptical hypothesis regarding mediums that they build their readings at least in part by paying attention to "body language and verbal cues" provided by sitters?

Mike

CFLarsen
6th May 2004, 07:44 AM
Originally posted by mark tidwell
Although, I have promised Claus a review and will provide one if he's still interested, to be honest I don't know if I can make it worth the bandwidth. Mike D? Anyone?

You know where to send it.

Darat
6th May 2004, 07:47 AM
Originally posted by Mike D.


voidx,

Is it not a commonly held skeptical hypothesis regarding mediums that they build their readings at least in part by paying attention to "body language and verbal cues" provided by sitters?

Mike

Perhaps by some but I would say that "body reading" is just one of many ways information could be passed from a sitter to a medium.

voidx
6th May 2004, 09:30 AM
Originally posted by Mike D.


voidx,

Is it not a commonly held skeptical hypothesis regarding mediums that they build their readings at least in part by paying attention to "body language and verbal cues" provided by sitters?

Mike
As Darat mentions, it's one method. I would say it's the most basic aspect of how mediums can get information and feedback from sitters. It's by no means the most productive in my opinion. Vague and generalized information applying to a large portion of the population, and then subjective judging and weighting of how much said information applies to the sitter, is the more important technique in combination with sitter feedback. So for them to rule out body language and verbal cues, and then also to state that in their opinion this shows a negation of all skeptical hypotheses is rather illogical, and it gives me a slight bit of pause as to just how objective they're being in their interpretation of the results.

Again, the subjective weighting procedure of how much the information applied to the sitter is the critical part of any experiment on mediumship, this one included. And in my opinion it's the Achilles heel of any such experiment, simply because there can be so much disagreement over it. I would not make any judgment on this study's results unless we saw, and ourselves got, a variety of examples of just how the weighting procedure was applied to the information given.

Mike D.
6th May 2004, 09:55 AM
Originally posted by voidx

So for them to rule out body language and verbal cues, and then also to state that in their opinion this shows a negation of all skeptical hypotheses is rather illogical, and it gives me a slight bit of pause as to just how objective they're being in their interpretation of the results.


voidx,

The abstract says "-a- skeptical hypothesis," not "-all- skeptical hypotheses."

Mike

Darat
6th May 2004, 10:03 AM
Originally posted by Mike D.


voidx,

The abstract says "-a- skeptical hypothesis," not "-all- skeptical hypotheses."

Mike

But the actual sentence is unclear i.e.

"The evaluation by the Robertson-Roy weighting procedure of the statements delivered by the mediums is also shown to support the negation of a sceptical hypothesis."

From that sentence, even looking at the context I'm not sure if they are just referring to body reading or something more.

voidx
6th May 2004, 10:07 AM
Originally posted by Mike D.


voidx,

The abstract says "-a- skeptical hypothesis," not "-all- skeptical hypotheses."

Mike

This sentence:

The evaluation by the Robertson-Roy weighting procedure of the statements delivered by the mediums is also shown to support the negation of a a sceptical hypothesis.

makes that decidedly unclear in my opinion. I notice there are two a's there, so perhaps there's a typo involved.

Mike D.
6th May 2004, 10:23 AM
Originally posted by voidx


This sentence:

makes that decidedly unclear in my opinion. I notice there are two a's there, so perhaps there's a typo involved.

Perhaps Mark can clarify it for us.

Dogwood
10th May 2004, 11:27 AM
That's my typo, sorry. Should be one "a" only. BTW there's an interesting article on this paper at Skepdic here. (http://skepdic.com/refuge/bunk22.html)

Loki
10th May 2004, 04:29 PM
mark,

Thanks for the summary. I've been waiting for this article to be "freely available" since reading about the proposed protocol last year. I admit to being unwilling to pay the SPR for anything - and yes, I can see the 'catch 22' situation of refusing to pay for something that might change my mind about whether I should pay!

My first thought on reading the summary is to be worried by the following statements: "Due to the design of the experiments the results cannot be due to normal factors such as body language and verbal response. The probability that the results are due to chance is one in a million." I appreciate that this is from the abstract, but both these statements seem simplistic and intended to cut off criticism.

A few comments (made without seeing the report details, obviously).

They say 300 participants, in 13 trials. That's an average of 23 people per 'session'. They have done 73 readings over these sessions. It is also mentioned that they did 'some' fake readings, where the audience was asked to evaluate a reading when there was in fact no medium present ( would be interested in how they prepared such a 'fake' reading - did they 'reuse' a reading from a previous session, or did they 'create' one from randomly generated generic statements? ). Straight away, it seems the 73 is a fairly small number of trials - but I guess it all depends upon the strength of result they are claiming.

The principle they are exploring is that if a reading is given to the 23 sitters, and each sitter subjectively evaluates the reading in terms of "does this apply to me", then the real target sitter of the 23 will score 'high', while the rest score 'low'. Given a set of 23 scores, there are a number of ways to examine these scores to determine "success".

The most obvious is to see if the real target scored higher than everyone else. So the simplest possible outcome is to simply say that out of 73 trials, 'x' number of real targets scored the reading highest. Mark, do they give a simple figure for this?
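As a rough sketch of the "simplest possible outcome" check described above: suppose each of the 73 readings had exactly one real target among 23 sitters (the thread's average, not a figure from the paper), and suppose chance alone decided who scored a reading highest. The number of targets ranking first would then follow a binomial distribution with p = 1/23. The function name, the independence assumption, and the equal session size are all illustrative assumptions, not Robertson and Roy's method, and ties are ignored.

```python
from math import comb

def top_rank_tail(hits, trials=73, sitters=23):
    """P(at least `hits` real targets score highest) if ranking were pure chance.

    Assumes one target per reading, independent readings, and equal-sized
    sessions. These are simplifications made for illustration only.
    """
    p = 1 / sitters
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(hits, trials + 1)
    )

# Expected number of top-ranked targets under chance alone.
expected = 73 / 23
print(f"expected by chance: {expected:.2f}")

# Tail probabilities for a few hypothetical hit counts (not reported results).
for hits in (5, 10, 15):
    print(hits, top_rank_tail(hits))
```

A reported figure for 'x' could be dropped into `top_rank_tail` to see how surprising it would be under these simplified assumptions.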

It's also possible (in theory, if mediumship was real) for the real target to NOT score the reading as the highest - since the general principle is that the medium's statements are sometimes 'symbolic' rather than factual, it's possible for the real target to "miss" the connection, while another sitter mistakenly picks it up. So I guess they would be more interested in seeing where the real target's score falls in the scale of final scores - is it evenly spread, or consistently at the top of the range? I'm not sure 73 readings is enough for this sort of analysis.

A second way of looking at the data is to examine the 'spread' of the scores. If all 23 scored a reading somewhere between '55' and '60' (for example), then this seems to indicate that the reading was highly generic. If 15 sitters rated it '10' and the other 8 rated it '90' then there is something very specific in this reading.
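The two hypothetical score patterns above can be compared with a simple spread measure. This is only a sketch: the score lists are made-up values matching the example in the post (23 sitters, scores on a 0 to 100 scale), and the population standard deviation is one assumed choice of spread measure among several.

```python
from statistics import pstdev

# 23 hypothetical scores, all between 55 and 60: a "generic" reading.
generic = [55, 56, 57, 58, 59, 60] * 3 + [55, 56, 57, 58, 59]

# 15 sitters rate the reading 10, the other 8 rate it 90: a "specific" reading.
specific = [10] * 15 + [90] * 8

print("generic spread: ", round(pstdev(generic), 1))
print("specific spread:", round(pstdev(specific), 1))
```

A small spread suggests sitters found the reading about equally applicable, while a large spread suggests it split the room.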

It would be interesting to cross reference the results to see if there is any pattern between 'generic readings versus specific readings' and 'real sitter scores high versus real sitter scores low'

I would also be interested in seeing if there were any other factors that might bias the results. Such as gender bias - were there approximately equal numbers of men and women in the sessions? Did the actual real targets of each reading have the same male/female ratio as the overall session? Possible age bias - was there a similar ratio of 'young' to 'old' in the real targets as compared to the overall session? If the scores when the target is an 'older woman', for example, are consistently higher than for 'young men' as targets, then the mediums may be simply 'playing the odds'.

I'd also like to see them vary the protocol in a number of ways, to try and isolate more variables :

1. Have the medium attempt a reading when there was no target!
2. Have the medium re-read the same target multiple times (unknown to the medium or the sitters)
3. Bring in sitters from a "different" cultural/ethnic background or with a common background - say women under 30 who have lost a child. All without telling the medium, of course.

materia3
10th May 2004, 05:13 PM
On pages 28 and 29, the authors say:

"If we carried out the same experiment every 2.5 years for approximately ten thousand times the accepted age of the universe (14,000,000,000 years), we would have expected such a result to happen about once. This suggests that some factor other than chance is at work in obtaining such a result. It would appear that the two groups A and B really do represent two different populations."

I think it is difficult for anyone to talk about this study without having the full text or trying to do it just from the abstract.
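The quoted passage can be turned into a rough per-experiment probability. This is back-of-envelope arithmetic only, using the three numbers in the quote; it assumes "about once" means one expected occurrence over all those repetitions. Whether this figure and the abstract's "one in a million" refer to the same test cannot be determined from the thread.

```python
AGE_OF_UNIVERSE_YEARS = 14_000_000_000  # figure given in the quoted passage
YEARS_PER_EXPERIMENT = 2.5              # one repetition of the study
MULTIPLE = 10_000                       # "ten thousand times the accepted age"

# How many times the experiment would be repeated in that span.
repetitions = MULTIPLE * AGE_OF_UNIVERSE_YEARS / YEARS_PER_EXPERIMENT

# "Happen about once" across all repetitions implies this chance per experiment.
implied_p = 1 / repetitions

print(f"repetitions: {repetitions:.1e}")
print(f"implied chance per experiment: {implied_p:.1e}")
print(f"ratio to one in a million: {1e-6 / implied_p:.1e}")
```

On these numbers the quoted claim is far stronger than one in a million, which is one reason the full statistical method matters.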

Drooper
11th May 2004, 02:46 AM
Originally posted by materia3
On pages 28 and 29, the authors say:

"If we carried out the same experiment every 2.5 years for approximately ten thousand times the accepted age of the universe (14,000,000,000 years), we would have expected such a result to happen about once. This suggests that some factor other than chance is at work in obtaining such a result. It would appear that the two groups A and B really do represent two different populations."

I think it is difficult for anyone to talk about this study without having the full text or trying to do it just from the abstract.

Nothing I have read about the paper so far, including the Summary, gives me any comfort that this pair have used proper statistical inference.

To overcome a confidence limit of 0.000001 with so few trials is absurd. They use no language which could be understood unambiguously by a statistician.

However, most curiously, in their eager hyperbole about the high statistical confidence of this result, they bring to the mind of the reader the question that if their result is valid, why exactly is it the only statistically verified success in the history of the universe??

Quasi
11th May 2004, 03:24 AM
This is a "Schwartz" play. Make outrageous statements with no data. To date, Gary Schwartz of the U of Arizona has also claimed success, but sadly has not shown any raw data, at all. Yawn, is there anything more boring than a retread of GS's stuff? Hardly unexplained, people have known about these tricks for thousands of years.

Ed
11th May 2004, 03:28 AM
Originally posted by materia3
On pages 28 and 29, the authors say:

"If we carried out the same experiment every 2.5 years for approximately ten thousand times the accepted age of the universe (14,000,000,000 years), we would have expected such a result to happen about once. This suggests that some factor other than chance is at work in obtaining such a result. It would appear that the two groups A and B really do represent two different populations."

I think it is difficult for anyone to talk about this study without having the full text or trying to do it just from the abstract.

I defy anyone to find writing of this bombastic nature in any real journal. Don't they have a review process?

Vitnir
11th May 2004, 04:49 AM
I defy anyone to find writing of this bombastic nature in any real journal. Don't they have a review process?

I have seen some quite bombastic and stupid comments from referees.

I would also be most interested in finding out what kind of statements the mediums made and the mathematical model used to calculate their results.

Perhaps old ladies with their cats would be a bit tiresome, but finding missing children...?

If I was a genuine psychic I would be swamped by people wanting help. The old ladies with their cats would outnumber the parents with missing children by a factor of X. I would therefore have disconnected my telephone long before any parents could call me. I guess I could solve high profile cases and rake in the rewards anonymously though. So if anyone was a real psychic the smart thing to do in order to stay sane would be to keep quiet about it. Maybe in order to stay alive too, the wonderful human behaviour of herd mentality might gather a lynch mob if your ability could be perceived as a threat by anyone.

Drooper
11th May 2004, 04:53 AM
Originally posted by materia3
On pages 28 and 29, the authors say:

"If we carried out the same experiment every 2.5 years for approximately ten thousand times the accepted age of the universe (14,000,000,000 years), we would have expected such a result to happen about once.

Well, the JREF million bucks should be a snack then.:) I am burning with anticipation. With results like these they must be hounding Randi to get a test set up as soon as possible.















On clear reflection, I doubt it.

materia3
11th May 2004, 12:54 PM
I don't know squat about statistics. It's too bad some of the more proficient members like Mr. Hoyt and Mr. Tai Chi (I read their posts on statistics) don't pull their weight, buy the issue of the JSPR (7 pounds stg) and tell us what it all really means. There are a lot of statistics in this paper and all us dumb people can do is go by the conclusions and some of the plain facts.

The study took 2.5 years.

It involved 10 mediums ... not one or two.

It involved 300 participants, both recipients and non-recipients or decoys as well as proxies. I don't know how some person above decided there were only 23 readings. Was this in the abstract?

There were 13 sessions, involving 8 different experimental designs. Each session involved between 15 and 29 sitters, recipients, non-recipients or proxies depending on the design of the session.

There were multiple investigators who were blind to what each other knew.

There were 15 different types of participant categories.

The referees probably allowed for the fact that Prof. A. E. Roy is a distinguished astronomer and astrophysicist who finds it convenient to explain probabilities in terms of aeons.

Robertson and Roy are going to publish installment 4 with specific details, according to a comment by Robertson elsewhere.

Loki
11th May 2004, 04:44 PM
materia3,

I don't know how some person above decided there were only 23 readings. Was this in the abstract?
Perhaps you need to improve your reading before worrying about statistics?

I said there was an average of 23 sitters per session (13 sessions, 300 participants). There were 73 'true' readings, and an unspecified number of 'false' readings (no medium involved) - so an average of 5 readings per session. So - on average - for any session there were 23 people present, and 5 'medium generated' readings done.
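The averages above follow directly from the figures in the abstract. A quick check, using only those three numbers (the 'false' readings are left out because their count is not given in the thread):

```python
participants = 300  # "some 300 participants"
sessions = 13       # "13 sessions"
readings = 73       # "73 sets of statements"

sitters_per_session = participants / sessions
readings_per_session = readings / sessions

print(f"average sitters per session:  {sitters_per_session:.1f}")
print(f"average readings per session: {readings_per_session:.1f}")
```

The second figure comes out between 5 and 6, so "an average of 5" is a rounding down.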

materia3
11th May 2004, 05:47 PM
Yes, you said an average of 23. Actually it would help you to understand the study and perhaps explain it better if you actually had it. I would like to know how anyone can discuss and even draw inferences as to averages without considering the entire study?

What is more important, and to the investigators' credit, they evaluated the total of individual data points over each of the 13 experimental sessions conducted over 2.5 years. In Data Group A, the individual points of data numbered approximately 1,600. In the second series of 13 experimental sessions the researchers calculated 1,700 points. It is this large number of data points, which some might call "validations", that were weighted and then studied as to probabilities. I understand it is fashionable here to expound on subjects of which we have no knowledge, and that it is not against the rules, but why not equip yourself with a copy of the paper and then come back and explain your objections? The argument that the data pool was small is not true, even if this is what you implied in your averaging of total participants divided by the number of one set of the 13 experimental sessions.

The 8 separate types of experimental design employed cover most of the objections mentioned above.

Loki
11th May 2004, 06:21 PM
materia3,

We appear to be having a needless "confrontation"?

I would like to know how anyone can discuss and even draw inferences as to averages without considering the entire study?
Hopefully I've made it clear that I have not (yet) seen the details of the study. I have tried to limit my comments to (a) issues relating to the protocol, as previously outlined in public statements by R&R and (b) the limited information available in the abstract. You're quite correct that it would not be possible to make firm statements, and draw conclusions, about the report without seeing the details. Hopefully you can see that I haven't done either of those things (assuming the above comment was intended to include me).

...why not equip yourself with a copy of the paper and then come back and explain your objections?
I intend to. I've been anticipating this report for some time now. However, at this time I prefer to NOT pay the SPR - I'm prepared to wait...

The argument that the data pool was small is not true, even if this is what you implied in your averaging of total participants divided by the number of one set of the 13 experimental sessions.
You misunderstand the point of mentioning the 'average' (or perhaps you don't). I was attempting to set up a basic framework for determining a simple 'yes/no' analysis, and it (still) feels like this will be insufficient. Therefore, I assume that R&R will need to use a different approach - your comments regarding 'data points' and 'validation' seem to confirm that their research is unable to deliver a simple statistical result. But I can't comment further, since I haven't seen it yet.

The 8 separate types of experimental design employed cover most of the objections mentioned above.
I'll take your word for it for now, although their outlined protocol seemed to fail to address several issues. I assume they refined the protocol as they went?

materia3
11th May 2004, 06:49 PM
I intend to. I've been anticipating this report for some time now. However, at this time I prefer to NOT pay the SPR - I'm prepared to wait...

Fair enough. I did not mean to have a confrontation. A number of people, not you, posted remarks and haven't seen the paper either. They even compared it to another study. I can't figure out how critical thinkers can do this without studying the work.

You may have to wait decades unless you break down and spend about fifteen dollars to buy this issue of the JSPR. Only members can access their online archive, which is more expensive, and they are years behind in getting their complete publications online. There is no other place where this paper could be legally posted in its entirety. Maybe you can get it on inter-library loan and xerox a copy.


You misunderstand the point of mentioning the 'average' (or perhaps you don't). I was attempting to set up a basic framework for determining a simple 'yes/no' analysis, and it (still) feels like this will be insufficient. Therefore, I assume that R&R will need to use a different approach - your comments regarding 'data points' and 'validation' seem to confirm that their research is unable to deliver a simple statistical result. But I can't comment further, since I haven't seen it yet.

All weighting begins with yes/no but this is not enough. They have weighed the relevance of their data points, all some 3,300 of them. An "average" of 23 readings is misleading and makes it sound like they did not have a significant sample on which to base their probabilities.


I'll take your word for it for now, although their outlined protocol seemed to fail to address several issues. I assume they refined the protocol as they went?

Maybe someone who also has the paper can summarize each of the 8 protocols used. I think we need to see these protocols before we can lodge any objections to them or state they have left something out.

Vitnir
12th May 2004, 12:13 AM
I ordered a copy of the article through the university library. I hate waiting, but another week isn't so bad.

Drooper
12th May 2004, 02:59 AM
Originally posted by materia3
Yes, you said an average of 23. Actually it would help you to understand the study and perhaps explain it better if you actually had it. I would like to know how anyone can discuss and even draw inferences as to averages without considering the entire study?

What is more important, and to the investigators' credit, they evaluated the total of individual data points over each of the 13 experimental sessions conducted over 2.5 years. In Data Group A, the individual points of data numbered approximately 1,600. In the second series of 13 experimental sessions the researchers calculated 1,700 points. It is this large number of data points, which some might call "validations", that were weighted and then studied as to probabilities. I understand it is fashionable here to expound on subjects of which we have no knowledge, and that it is not against the rules, but why not equip yourself with a copy of the paper and then come back and explain your objections? The argument that the data pool was small is not true, even if this is what you implied in your averaging of total participants divided by the number of one set of the 13 experimental sessions.

The 8 separate types of experimental design employed cover most of the objections mentioned above.


This is the problem.

I am a pretty competent statistician (studied to postgraduate level). I don't give a flying thingy whether I am reading the paper or the synopsis. The thing is clear as mud from a statistical point of view. Statements about millions of years, blah, blah, blah give me cause for concern about the competence of the analysis.

The fact that people like you seem to think that the complexity of the study is something of merit is worrying. Eight different experimental designs? Ludicrous.

The fact that there are adjudicators is a cause for concern.

The fact that, as you put it, data is "weighted" (incidentally, this term appeared in an earlier version of the synopsis) makes bright red lights flash in front of me.

This just doesn't pass any sniff test and I don't feel like humouring you or anyone else with my "well, let's keep an open mind" facade.

If there is such powerful statistical evidence that this exists, then a far more simple experiment, excluding "adjudicators" should be able to detect the effect very easily.


I don't believe an organisation with such an illustrious history of publishing naive and flawed work deserves a penny of my hard earned cash. If the paper does what they say it does, they should make it available free of charge on their web site (it would be in their own interests), along with the FULL data, not the gobbledygook they write about it.

Drooper
12th May 2004, 03:05 AM
Originally posted by materia3


Fair enough. I did not mean to have a confrontation. A number of people, not you, posted remarks and haven't seen the paper either. They even compared it to another study. I can't figure out how critical thinkers can do this without studying the work.

See my response above. Do you realise how many other people like this publish nonsense offering "proof" and demanding payment to see it?



All weighting begins with yes/no but this is not enough. They have weighed the relevance of their data points, all some 3,300 of them. An "average" of 23 readings is misleading and makes it sound like they did not have a significant sample on which to base their probabilities.

Bing! - data subjectively weighted by relevance.

A statistician needs to read no more. Your study is worthless.


Maybe someone who also has the paper can summarize each of the 8 protocols used. I think we need to see these protocols before we can lodge any objections to them or state they have left something out.

Repeat after me. "It's the statistics, it's the statistics".

You could have 800 protocols, but once you start "weighting the data" it is all worth nothing.


Why do you think the JREF challenge specifically stipulates that there will be no judges, judging, adjudicators or adjudicating used in any of its testing for the One Million Dollar Challenge?

Vitnir
12th May 2004, 04:00 AM
It depends heavily on how they have reported their statistical methods in their paper; I don't believe that weighting data automatically disqualifies results. If they provide enough data in the paper so that anyone can form an opinion on the results then I would be interested. Hopefully they have enough experience in writing papers that they can write so others can understand.

Ed
12th May 2004, 04:07 AM
I will foresee the future but make no claim on the million dollars.

These guys will have "proven" whatever it is they set out to prove. This is the woo-way. What I guarantee is that the results will be dependent on something not quite kosher. Weighting will fit the bill. Remember, unless there is a good a priori reason for weighting, it is nothing more than a way of unethically selecting data.

In this latest paper look for

- needlessly elaborated designs (if you can't dazzle 'em with footwork, baffle 'em with bullsh!t)
- non-objective assumptions/treatments
- statistics that are unnecessarily complex

The reason that I am convinced that this area of human inquiry is a total waste of time is that, repeatedly, we have seen lying, obfuscation and misdirection associated with every attempt to demonstrate an effect. Every time. There is never a clear unambiguous demonstration of anything. Why is that?

Well, time will tell.

To our resident woos: These guys have had a long time to think about what they did. They had plenty of input and the luxury of examining all of the bad research that came before. If, as I predict, their paper is yet more hand waving, why do you think that is? Would you bet that the "demonstration" is clear?

Bah

Drooper
12th May 2004, 04:22 AM
Originally posted by Vitnir
It depends heavily on how they have reported their statistical methods in their paper; I don't believe that weighting data automatically disqualifies results. If they provide enough data in the paper so that anyone can form an opinion on the results then I would be interested. Hopefully they have enough experience in writing papers that they can write so others can understand.

From what has been placed in the public domain it can be established:

1. There has been a subjective element in the data.

2. Judgement has been the basis of altering the data (weighting)


From both these it is impossible to establish the true probability distribution on which to make any inference.


This is a dud study. It will be found to be so.

materia3
12th May 2004, 07:52 AM
How can one be made to understand the importance of weighting a data point, a piece of information versus just a yes/no rating?
Perhaps an example might help:

Medium: I am getting a J name....do you know anyone with a J name?

Subject: Yes, my friend's uncle has a J name.

Medium: Here or passed?

Subject: Here.

Medium: Okay, then the spirit just wants to send him his regards.

Or, it can go like this:

Medium: I am getting a J name.....do you know anyone with a J name?

Subject: Yes, my friend's uncle's son has a J name

Medium: Here or passed?

Subject: Passed.

Medium: Okay, the spirit has J with him. It is important you tell this family he came through.

------------------------------------------------------------------------------

Result: Pure garbage...unadulterated crap. But if you break the above drivel down without weighting it and just rate it we have nothing but hits. This is why assigning relative value units (RVU/weighting) to garbage like this is important. You would have us believe it is sufficient to tabulate or quantitatively rate readings and ignore the qualitative aspects. Fine.

Robertson and Roy were dealing with 3,300 data points. The only sane way to rate them was by assigning a weight to each. If they just went yes/no it would be meaningless. Every generality, guess-hit and J name would be counted as a hit.

The 8 different design protocols were developed to meet every possible objection to one or the other. A single or double design would soon find a host of people pointing out additional protocols which need to have been covered. It has already been suggested above by people who have not read the paper that 8 may not be enough.

For a few above who conveyed a prejudicial bias against the SPR (in order to defend their unwillingness to part with $15.00 in order to discuss this study based on the information and not speculation), it might help if they knew this 122-year-old organization functions by volunteers and 1 or 2 paid clerical or office workers and vendors (i.e. printer/suppliers). It is a skeptical organization that was founded to publish and present lectures investigating the validity of various allegedly paranormal claims. Dr. Richard Wiseman, who is a CSICOP fellow, is on their editorial board and publishes regularly in the JSPR. Dr. Susan Blackmore, also a CSICOP fellow, is a regular published author in their journal. Christopher French is another and there are many more including in the past, I am told, James Randi. The SPR has no corporate agenda and this is stated in their goals and objectives on the inside front cover of the JSPR and on their website.

Ed
12th May 2004, 09:15 AM
Originally posted by materia3
How can one be made to understand the importance of weighting a data point, a piece of information versus just a yes/no rating?
Perhaps an example might help:

Robertson and Roy were dealing with 3,300 data points. The only sane way to rate them was by assigning a weight to each. If they just went yes/no it would be meaningless. Every generality, guess-hit and J name would be counted as a hit.



Which is why you have controls. Please don't cite 3,300 data points. That can be low for single subject designs.

The point is that the suggestion is that the design is not good enough to differentiate garbage from good stuff so "weighting" has to be employed. That is making up data.

materia3
12th May 2004, 11:23 AM
Do you know something about this study nobody else knows? The 3,300 data points included data points from the non-recipient controls as well as the recipients.

The qualitative weighting of both the recipient and non-recipient data points should be the same. Are you suggesting otherwise? I have not heard of this.

voidx
12th May 2004, 01:51 PM
Here's an example of the problem with weighting the responses.

"I'm getting a James, do you know a James."

Sitter A) Knows a James, but he's still alive and is a very distant cousin, so he considers it a rather weak hit, but it "applies" to him.

Sitter B) Has a grandfather James who passed and who he was hoping to talk to so he considers it a very strong hit so it "applies" to him also.

Spirit 1 is actually trying to get in contact and validate for Sitter A). We've seen roundabout validations like this used in mediumship before so it's consistent with the "process" of mediumship.

The weighting procedure so far as I can determine is 50/50. Does this apply to you, yes/no. This is not a satisfactory weighting scenario for the standard types of information brought across that I've seen in transcripts for mediums. Now not having seen the transcripts from the R&R study, I can't comment, although I would assume it's on the same level as other mediums I have seen.

In the above example they both would have answered yes. Well, seeing in even a small amount of detail how the Yes to James actually applies to them, we can see that the Yes is by and large useless as a measuring stick for accuracy. Now if the medium brought across, "Is your dead father's name James" then that would be different. But I would argue this is the minority example of what type of information is brought across by mediums. Can anyone who has paid for access to SPR tell me if these transcripts of what the mediums brought across are yet available? Or is it just the summary and the statistical analysis? To me the transcripts are what are important here so we can get some idea just what people were answering yes/no to or saying "applied" to them.

Since "applies to me" can have such wide and varying degrees of accuracy/relevance as it pertains to any given person, is it not likely that any statistical analysis arising from this will also vary in the degree of its relevancy and accuracy?

I admit it's probably one of the better protocols I've seen so far, but it still relies on a large subjective "applying" factor that I find unacceptable. And so I would not myself be running around claiming that any results from this are absolutely conclusive.

voidx
12th May 2004, 02:03 PM
Actually I do think cold-readers could contribute to this study and should be used. While they work on feedback as one of their techniques, they also rely on using common events and names and numbers as starting guesses. Having a cold-reader produce a series of questions that they would commonly ask and just pretending to get yes/no validations on certain aspects, you could potentially create a completely fake reading. Re-run the trial, or better, add the cold-reader questions onto the list with the mediums'. See if the cold-reader's fake reading produces the same effect as the mediums'. I know this is not the ideal scenario for cold-reading, but it would give some form of comparison. I think having a bunch of people who are not the recipients is not enough of a control group. The reason is that they all know it's a real medium giving the information, they just don't know if it's for them or not; this would introduce bias to work either harder or less hard to have the questions "apply" to them. If there was also a set of questions that was completely made up, unbeknownst to them, I think that would be an improved level of control.

Edited to add: My main point is that if mediums are cold-reading and using other mundane techniques such as educated guesswork and the like, then a medium with no sitter feedback should perform equally to a cold-reader with no feedback, as they are both just creating the questions and information out of their heads.

Loki
12th May 2004, 04:06 PM
Have been thinking about "why" R&R would need to weight the data points, and it seems to me that they shouldn't need to.

Start from the assumption that the medium can actually do what R&R say - that is, the medium can do a reading for someone that :

(a) the medium has never met;
(b) the medium is not in any physical contact with at all;
(c) the medium knows *nothing* about the sitter except their physical location.

This is exactly what the outlined protocol proposes - that a medium is placed in a room; in another room a group of potential sitters are asked to enter the room and sit on numbered chairs; and the medium then performs a reading given ONLY the number of a chair.

There seem to be only 3 options here :

1. The medium can do this precisely. 100% of the time the medium is able to 'narrow in' on a person sitting in a particular physical location.
2. The medium can do this, but with some margin of error. Perhaps they occasionally get the wrong chair. Despite these mistakes, they still succeed at a rate that is far better than chance (but not 100%).
3. The medium can't do this, and any apparent 'success' is just a matter of playing with the data.

If we accept R&R's claims, then they've statistically eliminated #3. I'd assume they have also failed to prove #1, but it doesn't really matter here, since the important thing is whether they've proven it can be done at least "sometimes, and more than chance".

So we have a group of mediums that we feel we have proven can perform a reading when physically separated from the sitters, and given nothing more than a chair number. Okay, let's then propose a new protocol - simply have the selected chair be empty on 50% of the trials, and have the medium simply answer "Yes" or "No" as to whether there is a sitter in the nominated chair. No more 'weighting', just a simple yes or no. Let the mediums consistently produce a hit rate of even 60% rather than the chance dictated 50%, and you've got proof of something happening.

Any guesses why R&R won't try this???
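
For anyone who wants to put numbers on the yes/no test proposed above, here is a minimal sketch of the arithmetic. It computes the exact one-sided binomial p-value for a given number of hits against a 50% chance rate. The trial counts (100 and 1000) and the function name `binomial_tail` are my own illustrative assumptions; nothing here comes from the R&R paper.

```python
from math import comb


def binomial_tail(hits: int, trials: int, chance: float = 0.5) -> float:
    """Exact one-sided p-value: P(X >= hits) for X ~ Binomial(trials, chance)."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(hits, trials + 1)
    )


# Hypothetical trial counts, chosen only for illustration.
p_100 = binomial_tail(60, 100)     # 60% hit rate over 100 empty/occupied trials
p_1000 = binomial_tail(600, 1000)  # the same 60% hit rate over 1000 trials

print(f"60/100 hits:   p = {p_100:.4f}")
print(f"600/1000 hits: p = {p_1000:.2e}")
```

The point of the sketch is that a steady 60% rate looks only mildly unusual over 100 trials but becomes overwhelming over 1000, with no weighting or adjudication involved at any step.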

Loki
12th May 2004, 04:56 PM
voidx,

...but it still relies on a large subjective "applying" factor that I find unacceptable.
The 'control' in this study to manage the 'applying' factor is that the person doing the 'applying' is blind to the truth about whether it does actually 'apply'. In theory, given 'sufficient' questions and trials we should be able to arrive at a point that seems unlikely if pure chance was at play. R&R claim to have arrived there. However, it sounds like they have performed several passes through the data, and applied various "weightings" to the sitters' replies to end up at this point. That's not good (if in fact it's what they've done).

If there was also a set of questions that was completely made up, unbeknownst to them, I think that would be an improved level of control.
The abstract says that this was done - the sitters were occasionally given questions that were NOT from a medium doing a reading on that group. It's not clear how these questions were determined - I assume it's either a 'genuine' reading from a previous session, or a randomly generated list of 'typical' questions.

voidx
13th May 2004, 07:59 AM
Originally posted by Loki
The 'control' in this study to manage the 'applying' factor is that the person doing the 'applying' is blind to the truth about whether it does actually 'apply'. In theory, given 'sufficient' questions and trials we should be able to arrive at a point that seems unlikely if pure chance was at play. R&R claim to have arrived there. However, it sounds like they have performed several passes through the data, and applied various "weightings" to the sitters' replies to end up at this point. That's not good (if in fact it's what they've done).

My problem is that it can apply to varying degrees to different people; it's not merely "it applies/it doesn't apply". I realize that the point is to do enough questions so that all the questions in one reading should statistically "apply" to a higher and more consistent degree to the recipient compared to the mass of non-recipients.

To me however, if a set of say 43 statements is given, it's likely that many of them do apply to many different people. So whether the reading was for them or not, these bits of information do genuinely apply to them. However, more of them should apply to a higher degree, therefore singling out a single person. This should be visible consistently within each reading. If we have to take all responses and readings en masse to find this effect I'm concerned that perhaps we're incidentally creating something in our going through the numbers that is not actually there. Again hopefully they will release more detailed information. If they rated each individual reading, and then found that statistically each reading was above chance, and could then list a wide range of examples of transcripts that would seem to verify this, I would be more convinced.


The abstract says that this was done - the sitters were occasionally given questions that were NOT from a medium doing a reading on that group. It's not clear how these questions were determined - I assume it's either a 'genuine' reading from a previous session, or a randomly generated list of 'typical' questions.

I thought I had read that it had been done, but on rereading I didn't see it. Probably missed it, wouldn't be the first time :). However, I'd be curious to see how many trials of these made up readings were done. In my opinion an equal number of trials would have to be done in order to compare it to the authentic readings. Also is the abstract quoted in entirety by Mark above? Or is there more to it. Because the above quote doesn't seem to imply that fake readings were incorporated.

Mike D.
14th May 2004, 04:00 PM
Originally posted by Loki
... the sitters were occasionally given questions that were NOT from a medium doing a reading on that group. It's not clear how these questions were determined - I assume it's either a 'genuine' reading from a previous session, or a randomly generated list of 'typical' questions.

In their paper, on page 22, Robertson and Roy describe two experiment designs they used when doing what Loki has mentioned above. They write: "...Investigator A hands the medium a set of statements and asks her to deliver them as if they were genuine statements intended for a recipient. This set is taken from a previous experiment conducted somewhere else, with a different medium and another set of participants."

The difference between the two designs is that in one the participants suppose that the reading is for someone in the room, while in the other design, a particular individual is singled out as the supposed recipient of the reading. (Remember that the medium is in another room, and both participants and medium are blind to each other.)

voidx
14th May 2004, 09:39 PM
Originally posted by Mike D.
In their paper, on page 22, Robertson and Roy describe two experiment designs they used when doing what Loki has mentioned above. They write: "...Investigator A hands the medium a set of statements and asks her to deliver them as if they were genuine statements intended for a recipient. This set is taken from a previous experiment conducted somewhere else, with a different medium and another set of participants."

The difference between the two designs is that in one the participants suppose that the reading is for someone in the room, while in the other design, a particular individual is singled out as the supposed recipient of the reading. (Remember that the medium is in another room, and both participants and medium are blind to each other.)
Thanks for clarifying Mike. Any idea how often, or what percentage of the sessions were conducted in this manner? I'd be curious if they tabulated these results separately as a comparison, or whether they had enough samples to do so.

Mike D.
15th May 2004, 01:59 PM
Originally posted by voidx

Thanks for clarifying Mike. Any idea how often, or what percentage of the sessions were conducted in this manner? I'd be curious if they tabulated these results separately as a comparison, or whether they had enough samples to do so.

voidx,

In reading through the Robertson and Roy paper, I didn't find a specific reference to what you are asking here. There were eight experiment designs, with design #1 being the strict Robertson/Roy protocol that Mark Tidwell has spoken positively about in his article in Skeptic Report. The two experiment designs I mentioned in my post above are identified by R&R as designs 4 and 5. In introducing the eight designs, R&R state on page 21, "There were eight different experimental designs, numbered 1 to 8. Not all eight were used in any one session (for lack of time) but a selection was made, usually of six, the particular choice being dictated by the factors under investigation." It sounds to me from this that R&R could have potentially used designs 4 and 5 fairly often, but I don't really know.

As for your question about tabulating the results of designs 4 and 5 separately, I'll have to pass on attempting to answer that. I have no background in statistics, and frankly have little understanding of the section on statistics in the paper. I didn't notice anything that was to me obviously such a tabulation, but I'd prefer to have someone knowledgeable in statistics answer this question. Perhaps in time others here will read the paper and have comments on some of these matters.

Mike

Felice
15th May 2004, 05:41 PM
I too am itching to get my hands on this paper. I just love dodgy stats :)

voidx
16th May 2004, 07:48 PM
Woah...hold the phone. R&R were using potentially up to 8 experimental designs. So they list statistics that, if one only read the summary, made it seem like 1 experimental design was used, and quite successfully. Does anyone else have a problem with the results being garnered by using 8 different protocols, seemingly at random? If this is in fact the case, I'm afraid I have a problem with this. Can you clarify if the statistics state what protocol was used for each session? Are they tabulated separately, and then a summary drawn from that, or are we just told there were up to 8 experimental designs used, and then just given a heap of numbers with some results? The potential for reading something that's not actually there seems to be increasing perhaps.

materia3
16th May 2004, 09:08 PM
I stopped posting on this thread because it was futile discussing a subject with folks who do not have the source material. The last post above demonstrates this beautifully, no disrespect intended. But groping at suppositions in the absence of information is, you will have to admit, a bit "dodgy", to quote another poster.

There were 8 experimental designs, and 8 separate histograms for the results of each (across each design). The fractions of the 8 groups falling to the right of the probability value P appear in Table 3 on page 28.

The data is also broken down as weighted and unweighted, which responds to those who stated their opposition to weighting or qualifying the data. This is represented graphically in Figure 5 and in tabular form in Table 4 for all the data unweighted and in Table 5 where all the data is weighted. I'll let someone else who has the paper describe those results.

voidx
16th May 2004, 09:29 PM
Originally posted by materia3
I stopped posting on this thread because it was futile discussing a subject with folks who do not have the source material. The last post above demonstrates this beautifully, no disrespect intended. But groping at suppositions in the absence of information is, you will have to admit, a bit "dodgy", to quote another poster.

Hasn't been futile for me. Kudos for having the source material, not all of us do, nor are interested currently in paying for it. So I'm asking for clarifications, and giving reasons as to why I think those particular things are important to be clarified.


There were 8 experimental designs, and 8 separate histograms for the results of each (across each design). The fractions of the 8 groups falling to the right of the probability value P appear in Table 3 on page 28.

Great, just what I'd asked for. Any variance in performance between the designs? Did some produce more or less positive results for the mediums in your opinion?


The data is also broken down as weighted and unweighted, which responds to those who stated their opposition to weighting or qualifying the data. This is represented graphically in Figure 5 and in tabular form in Table 4 for all the data unweighted and in Table 5 where all the data is weighted. I'll let someone else who has the paper describe those results.
Well see, that they also included the data unweighted (which is good, I'll admit) doesn't speak to my concern about the weighting. Did they base their conclusion on the data when it was weighted, or unweighted? I'd assume the former. And if that were the case I'd still have issues with it, because as we've seen in the past, the weighting of information is often arguable. It's a catch-22, I admit: making the weighting too simple (applies/doesn't apply) doesn't account well for all the possible ways and degrees something could "apply" to someone. Making it too detailed risks inserting more bias and subjectivity than can be ruled out.

materia3
17th May 2004, 12:39 PM
It is the 3,300 data points that the statistical conclusions were ultimately based on. Weighted and unweighted, A & B, as quoted previously, were virtually identical.

This was the basis of the authors' example concerning the fact this would likely occur one time in 14 zillion years or whatever. They didn't plan it this way, and they couldn't if they wanted to. They did not know what the results would be. The 3,300 data points, 1,600 and 1,700, were collected over all 8 design protocols conducted for 2.5 years involving the ten mediums and 400 participants. These results were collected in a highly randomized manner as far as I can tell. It would be good if others here who have the paper would weigh in on this, including Mark, who started this discussion.

Vitnir
19th May 2004, 12:11 AM
Yesterday I got the paper from my library. After speed-reading the text (17 pages) I'm a bit disappointed. I wish they had written this paper together with a statistician, because it's frankly a mess. That they have mixed data from different experimental designs is strange. It would be more interesting if they had compared design no. 1, which is the ideal sceptic type (neither the medium nor the recipient is aware of the other's identity), with no. 6, which is the James Edward version that gives unlimited cold-reading possibilities. That they haven't made this comparison in the paper is naturally highly dubious; my suspicion is that they did make this comparison, concluded that the results depend on which design they picked, and decided to be creative.

Weighting the data is done to give general statements less weight. Let w = the weight, r = the number of people claiming that a statement is addressed to them, and n = the total number of people in the audience.
The weight given to a statement is then w = 1 - r/n.
Thus if all members of the audience think that "A person with a letter W is important to you" is valid, the weight is zero.
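
To make the rule above concrete, here is a minimal sketch of the weight calculation. The function name `statement_weight` and the audience size of 40 are hypothetical, picked only for illustration; how R&R go on to combine these weights into their final statistics is not something this sketch attempts to reproduce.

```python
def statement_weight(r: int, n: int) -> float:
    """Weight of one statement: w = 1 - r/n.

    r: number of audience members who accept the statement as applying to them
    n: total number of people in the audience
    """
    if n <= 0 or not 0 <= r <= n:
        raise ValueError("need n > 0 and 0 <= r <= n")
    return 1 - r / n


# Hypothetical audience of 40 people.
print(statement_weight(40, 40))  # everyone accepts a vague statement
print(statement_weight(2, 40))   # only two people accept a specific statement
```

A statement that everybody accepts gets weight zero, while one that almost nobody accepts gets a weight close to one, so vague "J name" statements contribute little.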

I'll try to show this article to a statistician I know later this week; maybe he can spot something fun.

Woomaster
16th September 2010, 12:43 AM
Hi!

I hope it's okay if I dig out this thread after six years. ;)

I'd like to know if there are any new thoughts on this? To me it seems that this paper has withstood most of the criticism over the last few years. Am I right or, if not, where are the problems with this study? Or so to speak: Why is this not compelling evidence for psychic abilities in mediums?

Aepervius
16th September 2010, 01:25 AM
Originally posted by Woomaster
Hi!

I hope it's okay if I dig out this thread after six years. ;)

I'd like to know if there are any new thoughts on this? To me it seems that this paper has withstood most of the criticism over the last few years. Am I right or, if not, where are the problems with this study? Or so to speak: Why is this not compelling evidence for psychic abilities in mediums?

Reading a few posts on various forums and in this thread, it seems to have had problems with creative statistics. Which is why it isn't really touted as evidence for mediumship, at least nowhere I could google.

I would also be interested in finding out more about this.

Anyway here are some links :
http://skepticreport.com/sr/?p=571 (not sure if this is about the same paper?)
http://forums.randi.org/showthread.php?t=102863 (no answer on asking for help on that paper)

The way I see it, nobody is interested. That often happens with all sorts of woo papers.

Woomaster
16th September 2010, 01:42 AM
As far as I know, the SkepticReport article is about the protocol Robertson & Roy used in their third paper, but it was written before the outcome of the final study.