Ball-drawing experiments - Discussion with Dr. Ertel
fls
5th June 2009, 08:38 AM
This thread is an offshoot of Wowbagger's book review here:
http://forums.randi.org/showthread.php?t=143156
Dr. Ertel's ball-drawing experiments form part of that discussion. He has been participating in that discussion via e-mail, and I have his permission to post his responses to my e-mails here.
This is a video of the ball-drawing experiment:
http://video.google.de/videoplay?docid=-2883940245344908268&ei=oiwpSp7VFYGUqAKSqJi3Cg&q=esp+paranormal&hl=de&client=firefox-a
These are two papers describing the experiments:
http://www.parapsych.org/papers/38.pdf
http://www.psych.uni-goettingen.de/home/ertel/ertel-dir/downloads/ertelchapterwithfigurespdf.pdf
The discussion of these issues starts here:
http://forums.randi.org/showthread.php?postid=4759306#post4759306
I will post the emails. It will take a little while to clean up the formatting.
Linda
fls
5th June 2009, 08:43 AM
Material in quotes is mine (Linda). Regular text is from Suitbert Ertel.
The analysis of the 5x5 matrices simply demonstrates that the ball-drawing of some participants is non-random.
Yes, it is non-random.
You think you know why:
These people should continue to do [well] because it has already been established that they are poor random number generators. It does not need any special skill to get non-random draws - consciously or otherwise. They simply need to do a poor job of mixing (easy enough when what is difficult to achieve is good mixing) and to have a pattern to how they draw.
You seem to claim, for the 5x5 matrices test:
1. All test taking people are poor random number generators, i.e. the frequencies of numbers drawn in this test are unbalanced.
2. This has already been established.
3. Non-randomness of number frequencies occurs in any event, poorly mixing the balls will increase the non-randomness of number frequencies.
4. Individual differences among participants regarding deviations from random frequencies of drawn numbers are due to individually varying carefulness at mixing the balls.
I don’t know why you say (2) that “this” has already been established. What has been established?
If (1) is true, why do people’s frequencies of drawn numbers not deviate from chance in one of my one-bag experiments, when 10 participants were told to draw the numbers 1, 2, 3, 4, and 5 in this order 12 times in one run, so that equal frequencies of the five numbers had to be called, one by one (1632 times each number was called)?
The frequencies of drawn numbers in this test were
1: 1627
2: 1699
3: 1676
4: 1638
5: 1619
The differences are not significant. I deem it unreasonable to claim differences of drawn numbers for the 5x5 matrix test and not to claim the same thing for this particular 12345 ball drawing test.
By the way, those 10 participants had been selected as psi-gifted for the 12345-test because they had obtained significant hit surpluses in the standard test with deliberate number calls, conducted at home. Their hit rate in the 12345 test was 24.65%; expected was 20%. The hit surplus was thus extremely significant without any nonrandom frequencies of drawn numbers.
Since you explain uneven frequencies of “hitting” the 25 cells of the matrix in the matrix test partly by the participants’ poorly shuffling the balls in the bags, you should explain, in the same way, the fact that individual participants, when they conduct this test repeatedly, will produce patterns which tend to be totally different from each other. How can the participants change the patterns? “Simply by poorly mixing” the balls, or more or less poorly in different test sessions? This idea I consider again as unreasonable.
Another problem arises for you with results of which one has been published in Damien’s book. You may remember the case of a pattern of high frequencies in row 3 in the middle of the matrix, high frequencies were evenly distributed in that row from column 1 to 5. Your explanation using nonrandomness of drawn numbers, increased by poor mixing, would have to claim differences of nonrandomness between the left and right bags even though they are used simultaneously by the same individual.
But participants will hardly shake balls in the left bag more frequently or more intensely than the balls in the right bag, or less intensely or less frequently. To me, these patterns escape your hypothetical explanation. The psi hypothesis is not weakened by these results. Rather I consider it strongly confirmed.
I like skeptical alternative explanations because they often reveal, for me at least, the power of human belief or disbelief which leads skeptics to constructions and speculations surpassing considerably the parapsychologist’s readiness to consider factual possibilities that violate our common gates of reason.
Suitbert
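The claim that the five drawn-number frequencies do not differ significantly can be checked with a chi-squared goodness-of-fit test. The sketch below is only an illustration: it assumes the expected count for each number is the observed total divided by five, since the email does not say how the test was set up. Under that assumption it gives a p-value close to the .58 quoted later in the thread.

```python
import math

# Drawn-number counts quoted in the email for the "12345" test.
observed = {1: 1627, 2: 1699, 3: 1676, 4: 1638, 5: 1619}

def chi_square_uniform(counts):
    """Pearson chi-squared statistic against equal expected frequencies.

    Assumption: expected count = observed total / number of categories.
    """
    total = sum(counts)
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

def chi_square_sf_df4(x):
    """Upper-tail probability of a chi-squared variable with 4 degrees of
    freedom. For df = 4 the tail has the closed form exp(-x/2) * (1 + x/2)."""
    return math.exp(-x / 2) * (1 + x / 2)

stat = chi_square_uniform(list(observed.values()))
p_value = chi_square_sf_df4(stat)
print(f"chi-squared = {stat:.2f}, df = 4, p = {p_value:.2f}")
```

A non-significant result here says only that the overall number frequencies are compatible with an even split. It does not by itself speak to how the balls were mixed or drawn.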
fls
5th June 2009, 08:46 AM
Quoted material is from Suitbert Ertel.
Suitbert:
You seem to claim, for the 5x5 matrices test:
1. All test taking people are poor random number generators, i.e. the
frequencies of numbers drawn in this test are unbalanced.
2. This has already been established.
3. Non-randomness of number frequencies occurs in any event, poorly
mixing the balls will increase the non-randomness of number frequencies.
4. Individual differences among participants regarding deviations from
random frequencies of drawn numbers are due to individually varying
carefulness at mixing the balls.
I don't know why you say (2) that "this" has already been established. What
has been established?
Linda:
I'm not attempting to say anything particularly remarkable or something you
will disagree with.
To say that something is random (under these circumstances) is to say that
each ball has an equal chance to be picked on each round. The balls could
be placed in strictly numerical order, but if the position from which a ball
is picked is random (that is, each ball position has an equal chance of
being picked on each round), then the result will be considered random. I
think that you will agree that each ball position within the bag does not
have an equal chance of being picked when it is humans doing the picking.
Some people will pick a ball from the top, some people from around the
sides, some people will plunge their hand into the centre of the balls, some
people will pull a ball from the bottom. But it will rarely, if ever,
happen that someone will vary their way of picking a ball on each round such
that they eventually pick from each ball position.
This is not a problem if each ball has an equal chance to be in those few
ball positions that tend to get picked by each person. A person can even
pick the same ball position every time and still end up with random
selections, if any ball has an equal chance of being in that ball position.
This is why mixing is important - that is how we ensure that each ball will
eventually find itself in the position from which balls are drawn. This is
the idea behind Markov chain mixing times, whereby one can figure out for
how long something must be mixed before the end result is random. For
example, one discovers that it takes 7 good shuffles of a standard 52-card
deck to mix the deck. And I think you will agree that one quick shake of
the bag does not really fully mix the bag after each draw.
And we have practical examples of the failures of adequate mixing. For
example, the 1969 United States draft lottery involved placing capsules in a
large jar and drawing one at a time. The results did not show an even
distribution, so some groups of men were more likely to be called up earlier
than others, because (on later inspection) the mixing had been inadequate.
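The mixing argument above can be illustrated with a small simulation. This is a toy model, not a description of the actual experiment: it assumes a bag of 50 balls (ten each of the numbers 1 to 5), a drawer who always takes the ball at one fixed position and puts it back there, and a "shake" that performs only a few random pairwise swaps. The function name and all parameters are hypothetical.

```python
import random

def repeat_rate(swaps_per_shake, draws=5000, seed=1):
    """Fraction of draws whose number equals the previous draw's number.

    Hypothetical model: 50 balls (ten each of numbers 1-5); the drawer always
    takes the ball at position 0 and returns it there; one "shake" performs
    `swaps_per_shake` random pairwise swaps, or a full shuffle if None.
    """
    rng = random.Random(seed)
    bag = [n for n in range(1, 6) for _copy in range(10)]
    rng.shuffle(bag)
    previous, repeats = None, 0
    for _draw in range(draws):
        if swaps_per_shake is None:
            rng.shuffle(bag)  # thorough mixing before every draw
        else:
            for _swap in range(swaps_per_shake):  # one quick, weak shake
                i, j = rng.randrange(len(bag)), rng.randrange(len(bag))
                bag[i], bag[j] = bag[j], bag[i]
        drawn = bag[0]  # the drawer always picks the same position
        if drawn == previous:
            repeats += 1
        previous = drawn
    return repeats / (draws - 1)

poor = repeat_rate(swaps_per_shake=3)
good = repeat_rate(swaps_per_shake=None)
print(f"repeat rate, weak shake: {poor:.2f}; thorough mixing: {good:.2f}")
```

With thorough mixing the just-drawn number should repeat about 20% of the time. In this toy model the weak shake leaves the same ball in place on most draws, so the repeat rate comes out far higher. How closely a real bag behaves like either case is an empirical question the model cannot answer.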
Suitbert:
If (1) is true, why do people's frequencies of drawn numbers not deviate
from chance in one of my one-bag experiments, when 10 participants were
told to draw the numbers 1, 2, 3, 4, and 5 in this order 12 times in one
run, so that equal frequencies of the five numbers had to be called, one
by one (1632 times each number was called)?
The frequencies of drawn numbers in this test were
1: 1627
2: 1699
3: 1676
4: 1638
5: 1619
The differences are not significant. I deem it unreasonable to claim
differences of drawn numbers for the 5x5 matrix test and not to claim the
same thing for this particular 12345 ball drawing test.
Linda:
Do you really not see the difference?
If the balls are drawn in your 12345 test in the same way as the video you
showed (when the ball drawer occasionally could see into the bag as they
were drawing a ball, and the balls showed hardly any opportunity to exchange
places during the one quick shake), the ball drawers have opportunities to
pick the ball with the right number. Since the numbers were called with
equivalent frequencies, then you'd expect the ball drawers to pick the
numbers with roughly equivalent frequencies. In this case, there is an
intention, a priori, to pick a particular number.
When the matrices are forming, there isn't a pre-determined number to be
selected, so even looking into the bag or keeping track of where each ball
is placed isn't particularly useful, unless someone decides to try to
'always pick "3" in the right hand' as a way to form the pattern. And the
non-randomness doesn't depend upon drawing each number in different
frequencies, but rather clusters where one number in one hand tends to be
drawn at the same time as one number in the other hand. I don't know if you
found differences in number frequencies in the 5x5 matrices as well as
differences in the cell frequencies. It doesn't really matter. But the
frequencies you find in an unrelated experiment which involves different
intentions really can't be used to tell you what one expects to find in this
experiment. If I want to test whether using a particular drug reduces the
number of strokes in my patients, I can't decide that I don't need a control
group because prior experiments with this drug showed that the incidence of
side-effects was the same as placebo.
Suitbert:
By the way, those 10 participants had been selected as psi-gifted for the
12345-test because they had obtained significant hit surpluses in the
standard test with deliberate number calls, conducted at home. Their hit
rate in the 12345 test was 24.65%; expected was 20%. The hit surplus was
thus extremely significant without any nonrandom frequencies of drawn
numbers.
Linda:
But your expectations are different here, so their attempts to fulfill those
expectations should give you those results.
Suitbert:
Since you explain uneven frequencies of "hitting" the 25 cells of the
matrix in the matrix test partly by the participants' poorly shuffling the
balls in the bags, you should explain, in the same way, the fact that
individual participants, when they conduct this test repeatedly, will
produce patterns which tend to be totally different from each other. How can
the participants change the patterns? "Simply by poorly mixing" the balls,
or more or less poorly in different test sessions? This idea I consider
again as unreasonable.
Linda:
But that really should be exactly what you expect. There is no indication
that the cells are being filled intentionally. They are only being filled
unevenly. And without intention, that unevenness should vary from round to
round.
Suitbert:
Another problem arises for you with results of which one has been published
in Damien's book. You may remember the case of a pattern of high
frequencies in row 3 in the middle of the matrix, high frequencies were
evenly distributed in that row from column 1 to 5. Your explanation using
nonrandomness of drawn numbers, increased by poor mixing, would have to
claim differences of nonrandomness between the left and right bags even
though they are used simultaneously by the same individual.
Linda:
Why not? Most people have a handedness, a preference to use one hand over
the other, an increase in dexterity from one hand to the other. Why
wouldn't you expect differences in clustering? Remember, we are not talking
about the deliberate creation of patterns here. Your choice of selecting a
few a posteriori as indicative of an "aesthetic gestalt" is entirely
arbitrary.
Suitbert:
But participants will hardly shake balls in the left bag more frequently or
more intensely than the balls in the right bag, or less intensely or less
frequently. To me, these patterns escape your hypothetical explanation. The
psi hypothesis is not weakened by these results. Rather I consider it
strongly confirmed.
Linda:
Again, why not? I will be more dextrous in my shaking of the right bag than
I am of the left bag, simply because I am right-handed.
Suitbert:
I like skeptical alternative explanations because they often reveal, for me
at least, the power of human belief or disbelief which leads skeptics to
constructions and speculations surpassing considerably the
parapsychologist's readiness to consider factual possibilities that
violate our common gates of reason.
Linda:
I think that answers your questions about why other scientists are
unimpressed with this work. You seem to be claiming that your work does not encounter skepticism among your parapsychological colleagues. In any other field of science, a scientist can expect to encounter skepticism from their colleagues. It is by addressing that skepticism and gaining the kind of
evidence which confirms or disconfirms those criticisms that others are
persuaded and progress is made. If I tried to claim that I didn't need
placebo control groups in order to demonstrate that a new treatment was
effective, my work would be (quite rightly) dismissed by my colleagues. It
is my impression that your colleagues' readiness to accept your work as
factually possible, regardless of the absence of reasonable controls, is why
the field has not shown any real progress in persuading anyone but
themselves that psi can be demonstrated.
Linda
fls
5th June 2009, 08:51 AM
From Suitbert Ertel:
1. Linda, what you write about randomness of ball positions is OK, but ball positions are not essential. What counts in this experiment is only the randomness of numbers being picked. If this experiment were conducted with only 5 balls in a bag, one ball for each number 1, 2, 3, 4, and 5, then, after mixing the balls sufficiently, these numbers would have an equal chance of being picked. Now add another set of five balls 1 to 5, the probability of the five numbers being picked does not change. Add another set, nothing will change, continue adding sets of five balls, no change. No doubt, the probability of individual balls being picked will change, balls positioned on the surface in the bag filled with hundreds of balls will have a greater chance to be picked than balls positioned at the bottom. But who cares? As I told you before, just drawn numbers do not have a greater chance of being picked again than other numbers even though the chance of a just drawn individual ball being picked again, even in a bag with only 50 balls (the standard condition) does change to some extent. In your long criticism you do not address the distinction between balls and numbers which is crucial. Why not?
2. Linda, you said "Don't you see the difference?" [when you look at the frequencies of drawn numbers 1 to 5 in the 12345 condition demanding equal frequencies of drawn numbers]. The differences you and I also "see" are statistically not significant (a chi squared test has a p-value of .58). Our verdict must be based on calculation, not on visual impression.
3. You write: " Since the numbers were called with equivalent frequencies, then you'd expect the ball drawers to pick the numbers with roughly equivalent frequencies. In this case, there is an intention, a priori, to pick a particular number."
Yes, the participants are instructed to draw a particular number, in the standard condition they always have an a priori intention to draw a particular number. But intention does not explain the result which shows that – for a minority of participants – the desired numbers have a greater chance of being picked. What should be explained is how the success of these participants comes about. Since an explanation by sensory leakage or some other ordinary mechanism does not hold, psi is the only applicable concept. Psi, however, cannot yet be regarded as an explanation. Parapsychologists want and expect psi to get explained by future science.
4. I had considered the fact that in one instance of the 5x5 matrix test one participant obtained deviations from chance only for left bag numbers, not for right bag numbers and this was unexplainable by your “not-mixing-sufficiently” argument.
I had written: “Participants will hardly shake balls in the left bag more frequently or more intensely than the balls in the right bag, or less intensely or less frequently”. And you replied: “Why not? I will be more dextrous in my shaking of the right bag than I am of the left bag, simply because I am right-handed.”
So would you expect for standard ball tests (tests with intentions), in general, more hits for tests with balls drawn with the right hand than for tests with balls drawn with the left hand, because the great majority of participants are right-handed? Yes, you have to claim that.
I conducted a large number of standard tests with two bags, thus the condition was similar to the matrix test condition. But participants with two bags had to wish to draw particular numbers right and left. Results: The total number of trials was 50,336. The number of hits expected by chance was 10,067.2. The observed number of hits, average of left and right bag hits, was 11,186. The crucial observation: The difference between right hand hits (11,227) and left hand hits (11,145) was statistically not significant (p = .58). Your argument, Linda, is thus invalid.
5. Linda, you wrote: “You seem to be claiming that your work does not encounter skepticism among your parapsychological colleagues.” “It is my impression that your colleagues' readiness to accept your work as factually possible, regardless of the absence of reasonable controls, is why the field has not shown any real progress in persuading anyone but themselves that psi can be demonstrated.”
I do not claim that my work does not encounter skepticism among my parapsychology colleagues. On the contrary, I have encountered much skepticism among my colleagues because I do not share their traditional dogmatic attitude regarding methodical precautions. Some editors, e.g. John Palmer, editor of the Journal of Parapsychology, rejected two submitted papers, the editor of European Journal of Parapsychology rejected two other papers.
My strategy is “postcaution”, the idea that conceivable non-psi effects can be tested by comparing results with and without the seemingly incautious condition. So I tested the hypothetical just-drawn-number advantage which proved to be effectless. I tested the right hand over left hand advantage which was effectless. I tested many more conditions and did not find any hint that deviations from chance could be explained by ordinary mechanisms.
Suitbert
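The right-hand versus left-hand comparison in point 4 can be reproduced roughly as follows. The email does not say which test produced p = .58. The sketch assumes a simple equal-split test: if handedness made no difference, each hit would be equally likely to come from either hand, so the right-hand count is treated as binomial with probability 0.5 and approximated by a normal distribution. It also treats hits as independent, which is an assumption.

```python
import math

trials_per_hand = 50_336
right_hits, left_hits = 11_227, 11_145
chance_rate = 0.20  # one target out of five possible numbers

# Hits expected by chance alone, and the left/right average from the email.
expected_hits = trials_per_hand * chance_rate
mean_hits = (right_hits + left_hits) / 2

# Equal-split test (assumed, not confirmed by the email): under the null,
# right_hits ~ Binomial(total_hits, 0.5), so (right - left) has variance
# equal to total_hits.
total_hits = right_hits + left_hits
z = (right_hits - left_hits) / math.sqrt(total_hits)
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail

print(f"expected by chance: {expected_hits:.1f}, observed mean: {mean_hits:.0f}")
print(f"z = {z:.2f}, two-sided p = {p_value:.2f}")
```

Under these assumptions the result agrees with the quoted p = .58. Note that this compares hit counts in the standard test with stated targets. Whether it bears on the 5x5 matrix test, where no target is set, is the point under dispute in the thread.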
fls
5th June 2009, 09:01 AM
From Suitbert Ertel:
1. Linda, what you write about randomness of ball positions is OK, but ball positions are not essential. What counts in this experiment is only the randomness of numbers being picked. If this experiment were conducted with only 5 balls in a bag, one ball for each number 1, 2, 3, 4, and 5, then, after mixing the balls sufficiently, these numbers would have an equal chance of being picked. Now add another set of five balls 1 to 5, the probability of the five numbers being picked does not change. Add another set, nothing will change, continue adding sets of five balls, no change. No doubt, the probability of individual balls being picked will change, balls positioned on the surface in the bag filled with hundreds of balls will have a greater chance to be picked than balls positioned at the bottom. But who cares?
I'm glad you agree that some positions are more likely to be chosen than
others. Like I said, I wasn't trying to say anything you wouldn't agree to.
As I mentioned in my last post, either each ball position must be chosen
randomly (which we agreed does not happen with human drawers), or each ball must have an equal chance of finding itself in one of those few ball
positions which get chosen. Who cares? I simply wanted to make it clear
that there is a distinction to be made, as it influences how we get
deviations from randomness.
As I told you before, just drawn numbers do not have a greater chance of being picked again than other numbers even though the chance of a just drawn individual ball being picked again, even in a bag with only 50 balls (the standard condition) does change to some extent. In your long criticism you do not address the distinction between balls and numbers which is crucial. Why not?
I think you are talking about the difference between ensuring randomness by
looking at whether the conditions are set up to be random (balls), and
whether or not the end result is random (numbers)? I agree that this is
what is at issue, if that wasn't clear from what I've said. In fact, that
is essentially my main criticism of your set-up. The purpose of good
experimental design is to create a situation where sources of bias have been
eliminated a priori. Instead you are arguing for allowing bias and then
attempting to discover these biases after the fact and appropriately
model/control for them. The problem is that this fails on both theoretical
and practical grounds. It means that you have to be extra clever - you have
to be able to think about all the ways in which the results may be biased,
and you have to figure out how to appropriately model those biases. Our
statistical techniques are designed to rule-in effects, not designed to
rule-out effects. Which means that our standard for ruling-out the effect
of a bias is far too low. For example, when Dean Radin went looking for a
particular bias in his presentiment experiments as a result of some
criticisms and stated that he had ruled-out the possibility that it
contributed to the results, a power analysis showed that he allowed himself
a 40% chance of missing a real effect - that is, there was only a 60% chance
that he had ruled-out the possibility that bias was responsible for his
results. And this doesn't even take into account that there may be multiple
biases involved, any one of which is difficult to detect, which may add up
to a substantial effect (when you consider that your effect size is very
small).
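To make the power argument concrete, the sketch below computes the power of a two-sided z-test for effects of various sizes, measured in standard errors. The choice of a z-test at alpha = 0.05 is an assumption made for illustration; the thread does not say what kind of power analysis was applied to the presentiment experiments.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1

def power_two_sided_z(effect_in_se, alpha=0.05):
    """Probability that a two-sided z-test at level `alpha` rejects the null
    when the true effect lies `effect_in_se` standard errors from zero."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    upper = 1 - STD_NORMAL.cdf(z_crit - effect_in_se)  # reject in upper tail
    lower = STD_NORMAL.cdf(-z_crit - effect_in_se)     # reject in lower tail
    return upper + lower

for effect in (0.0, 1.0, 2.21, 2.8, 3.24):
    power = power_two_sided_z(effect)
    print(f"effect = {effect:.2f} SE: power = {power:.2f}, "
          f"chance of missing it = {1 - power:.2f}")
```

Under these assumptions, a real bias of about 2.2 standard errors would be detected only around 60% of the time, which corresponds to the 40% chance of a miss mentioned above. Smaller biases would be missed more often still.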
The practical problem with your idea is that it has been demonstrated to be
wrong. In medicine, we frequently find that effects from observational
studies are over-turned by subsequent randomized, controlled studies. And
we try to do the same thing that you are doing with those observational
studies - we think of all the ways in which the results may be biased (age,
sex, socio-economic status, etc.) and then model/control those sources of
bias to see if there is still an effect. And yet, we have discovered that
this simply isn't good enough. Observational studies led us to believe that
hormone replacement therapy or vitamin E reduced the risk of disease - even
taking into account potential influences. Yet, when subject to an
experimental design that ensured that bias was eliminated a priori - the
randomized controlled trial - it turned out that not only did these
therapies not have a benefit, they may even be harmful.
2. Linda, you said "Don't you see the difference?" [when you look at the frequencies of drawn numbers 1 to 5 in the 12345 condition demanding equal frequencies of drawn numbers]. The differences you and I also "see" are statistically not significant (a chi squared test has a p-value of .58). Our verdict must be based on calculation, not on visual impression.
I'm sorry. I wasn't clear. I wasn't talking about whether the frequencies
of the drawn numbers were different (I agreed that they were not). I was
asking you why you didn't seem to realize that there were substantial
differences in the study design.
3. You write: " Since the numbers were called with equivalent frequencies, then you'd expect the ball drawers to pick the numbers with roughly equivalent frequencies. In this case, there is an intention, a priori, to pick a particular number."
Yes, the participants are instructed to draw a particular number, in the standard condition they always have an a priori intention to draw a particular number. But intention does not explain the result which shows that – for a minority of participants – the desired numbers have a greater chance of being picked. What should be explained is how the success of these participants comes about. Since an explanation by sensory leakage or some other ordinary mechanism does not hold, psi is the only applicable concept. Psi, however, cannot yet be regarded as an explanation. Parapsychologists want and expect psi to get explained by future science.
But as you admit in the video you provided of the Indian boy (if it is you
who provides the critical statement at the end of the video), sensory
leakage is not ruled-out since the boy simply looked into the bag on
multiple occasions. If the other participants in this experiment did the
same thing, there is a mundane explanation for the results, as they are
exactly what you would expect to find if the participants can sometimes see
which ball they want to pick.
4. I had considered the fact that in one instance of the 5x5 matrix test one participant obtained deviations from chance only for left bag numbers, not for right bag numbers and this was unexplainable by your “not-mixing-sufficiently” argument.
I don't understand how this is unexplainable. Even from the 12 sessions in
the video you provided one can see that the amount of shaking varies quite a bit from one session to the next, such that sometimes there did not seem to be much opportunity for mixing. Without documenting the amount of shaking on each trial for each participant in the 5x5 matrix, you simply don't know how much it varied, to what extent it was adequate, and to what extent it was associated with non-random results. You claim that your analyses after the fact rule it out, but you haven't documented that your analysis was sufficient to model or discover the effect, if present.
I had written: “Participants will hardly shake balls in the left bag more frequently or more intensely than the balls in the right bag, or less intensely or less frequently”. And you replied: “Why not? I will be more dextrous in my shaking of the right bag than I am of the left bag, simply because I am right-handed.”
So would you expect for standard ball tests (tests with intentions), in general, more hits for tests with balls drawn with the right hand than for tests with balls drawn with the left hand, because the great majority of participants are right-handed? Yes, you have to claim that.
Not at all. I am merely saying that this is a potential additional source
of variation. And we are not talking about "hits" in the 5x5 matrices. You
did not establish any particular intention a priori, so there is nothing to
"hit". All you are saying is that the matrices show greater variation than
you expected. I am asking, why did you expect so little variation
considering that you introduced extra variation with your research design?
I conducted a large number of standard tests with two bags, thus the condition was similar to the matrix test condition. But participants with two bags had to wish to draw particular numbers right and left. Results: The total number of trials was 50,336. The number of hits expected by chance was 10,067.2. The observed number of hits, average of left and right bag hits, was 11,186. The crucial observation: The difference between right hand hits (11,227) and left hand hits (11,145) was statistically not significant (p = .58). Your argument, Linda, is thus invalid.
You are still misunderstanding what I said. The crucial difference is that
your standard experiments - those where you establish *a priori* the result
you want, are *different* from those where you are not looking for anything
in particular beforehand. If the reason for the increased hit rate is
chance and mundane sensory input, such as was demonstrated by the Indian
boy, then it is because you established the frequency *a priori* that you
observed a certain frequency (which happens to be consistent with "random") a posteriori. Your matrix experiments do not establish a frequency beforehand, so the information from your standard experiments does not help you one way or the other.
5. Linda, you wrote: “You seem to be claiming that your work does not encounter skepticism among your parapsychological colleagues.” “It is my impression that your colleagues' readiness to accept your work as factually possible, regardless of the absence of reasonable controls, is why the field has not shown any real progress in persuading anyone but themselves that psi can be demonstrated.”
I do not claim that my work does not encounter skepticism among my parapsychology colleagues. On the contrary, I have encountered much skepticism among my colleagues because I do not share their traditional dogmatic attitude regarding methodical precautions. Some editors, e.g. John Palmer, editor of the Journal of Parapsychology, rejected two submitted papers, the editor of European Journal of Parapsychology rejected two other papers.
I am very pleased to hear that. A lot of my impression of parapsychology
comes from those who have popularized it - such as Dean Radin, Honorton,
Jessica Utts, etc. It is reassuring to hear that the lax attitude to
methodological rigour is not shared by all in the profession, particularly
that it is not shared by some of the editors to the main journals of
publication.
I am curious, though. What is your opinion of those parapsychologists whose
attitude is that methodological rigour is useful and even crucial? Are they
similarly subject to "the power of human...disbelief"?
My strategy is “postcaution”, the idea that conceivable non-psi effects can be tested by comparing results with and without the seemingly incautious condition. So I tested the hypothetical just-drawn-number advantage which proved to be effectless. I tested the right hand over left hand advantage which was effectless. I tested many more conditions and did not find any hint that deviations from chance could be explained by ordinary mechanisms.
Suitbert
And I think that is the problem in a nutshell, as you have not demonstrated
that this method allows you to discover deviations from chance that could be
explained by ordinary mechanisms, and our previous experience with this
strategy in other fields of science shows it to be misguided.
Linda
Wowbagger
5th June 2009, 03:39 PM
It's not easy for one person to convey information from two different people. But, I think this has been a surprisingly fascinating endeavor, so far.
So, I appreciate the effort, fls!
I think we are gleaning more important details from this than from my "go for the jugular" approach on my other thread.
Moochie
5th June 2009, 04:36 PM
Yes, interesting discussion. Thanks.
M.
fls
7th June 2009, 05:42 AM
From Suitbert Ertel:
Linda,
do you think the analogies that you bring into your discourse are fair? (Examples: Dean Radin's study, hormone replacement therapy, observational studies in medicine being overturned by controlled studies.) I consider such analogies unfair because they lack the scrutiny necessary for showing that the methodical matter of Ertel's study (ES) and that of those other studies (OS) is identical, not only "somehow comparable" or "analogical". If OS is apparently bad and if you put ES with OS in the same basket you are exploiting an illegitimate evaluative generalization (a halo effect) instead of analyzing ES conditions in their own right.
Linda, I must also take issue with some other rhetorical turn in your comment. When I pointed at the randomness of ball positions which has nothing to do with drawing numbers at random you replied
"I agree that this is what is at issue, if that wasn't clear from what I've said. In fact, that is essentially my main criticism of your set-up. The purpose of good experimental design is to create a situation where sources of bias have been eliminated a priori."
First, "if that wasn't clear from what I've said" sounds as if you had said that, it merely wasn't clear enough. No, Linda, you haven't said that, you did not refer at all to drawing numbers at random, you implied that lack of randomness of ball positions alone made this test invalid. After reading my comment you "agreed that this [numbers at random, not balls at random] is what is at issue". Why don't you admit that in your preceding criticism you simply disregarded what is at issue?
The turn you then take is even more disappointing: "In fact, that is essentially the main criticism of your set-up...". This sounds as if the point that I wanted you to agree with was the sore thumb of my approach, i.e., that sources of bias, in your view, must be eliminated a priori. But the issue - drawing numbers at random, not balls at random, has nothing to do with eliminating sources of bias a priori. A priori or a posteriori, precaution or "postcaution", is an issue which deserves attention, but not at this point of your contribution which, to me, has been guided by too much impression management.
Now, our views on this issue may be put in a nutshell. In my view, conceivable sensory and other ordinary factors Fi (F1 to Fk) upon hit rates in some psi test need not be excluded from an experimental setting in advance if their effects can be analysed, by appropriate methodical means, after data collection. If any hypothesized Fi would reveal effects then the test under analysis would prove invalid as a psi test. In your view this strategy is not satisfactory since experimenters do not know all Fi factors that might play a role, some unknown factor might be effective, and not all Fi can be ruled out. Therefore, precautions must be taken to exclude any possible Fi.
In your own words: "you have to be extra clever - you have to be able to think about all the ways in which the results may be biased, and you have to figure out how to appropriately model those biases. Our statistical techniques are designed to rule-in effects, not designed to rule-out effects. Which means that our standard for ruling-out the effect of a bias is far too low."
My comment on this: I am not using any "standard of ruling out bias effects", I am using transparent experimental methods to discover the presence of bias. Of course, one must be extra clever to think about leakages of various kinds that might have affected the data. But this would also be necessary for excluding bias effects in advance. You can hardly exclude inconceivable factors. The advantage of testing conceivable bias effects post hoc is that otherwise you would never know whether they do or do not exist. Moreover, and in the first place, ball drawing test conditions under natural, non-restrictive conditions have the advantage of being psi-conducive.
Linda, your last words were:
“you have not demonstrated that this method [“post-cautionary” method] allows you to discover deviations from chance that could be explained by ordinary mechanisms …”
You are right. But that’s what I expected. I thought that if psi existed I would probably not be able to discover deviations from chance that could be explained by ordinary mechanisms. This hypothesis has found strong support.
Linda, you continued …
“… and our previous experience with this strategy in other fields of science shows it to be misguided”.
Do you mean that my strategy must be wrong since I did not find ordinary mechanisms? Do I have to find ordinary mechanisms, otherwise you consider my approach misguided? This seems to be the logic that I find in your last para. You seem to be cautious by excluding ordinary factors in advance irrespective of detrimental effects that this precaution exerts on psi, while I prefer saying “give psi a good chance, just test all seemingly ordinary factors, give them an equally good chance to manifest themselves and come to conclusions after analyzing the data.”
Suitbert
fls
7th June 2009, 05:51 AM
From Suitbert Ertel:
do you think the analogies that you bring into your discourse are fair? (example of Dean Radin's study, hormone replacement therapy, observational studies in medicine are overturned by control studies).
Yes. I think that issues such as Type II error and methodologies which allow bias to influence the results are relevant to this discussion. And that the examples that I used were fair and valid examples of those issues.
I consider such analogies as unfair because they lack the scrutiny necessary for showing that the methodical matter of Ertel's study (ES) and that of those other studies (OS) is identical, not only "somehow comparable" or "analogical". If OS is apparently bad and if you put ES with OS in the same basket you are exploiting an illegitimate evaluative generalization (a halo effect) instead of analyzing ES conditions in its own right.
This is the second time that you have made the point that I haven't written a detailed analysis and argument with respect to a particular point of criticism. I agree that I did not take you through a step-by-step scrutiny of this particular methodological matter, but this should not be a surprise, considering that this is an informal discussion through e-mail. In a formal discussion, such as a critical essay submitted to a journal in response to a piece of published research, I would lay out my entire argument, in detail. Here, because there is an opportunity for back and forth responses, it makes more sense (to me) to discover where there is already agreement and understanding, and to expand upon those issues where there isn't. For example, if you had understood the relevance of Type II error to my criticisms, I could simply move on instead of bringing it up again below.
Linda, I must also take issue with some other rhetorical turn in your comment. When I pointed at the randomness of ball positions which has nothing to do with drawing numbers at random you replied
"I agree that this is what is at issue, if that wasn't clear from what I've said. In fact, that is essentially my main criticism of your set-up. The purpose of good experimental design is to create a situation where sources of bias have been eliminated a priori."
First, "if that wasn't clear from what I've said" sounds as if you had said that, it merely wasn't clear enough. No, Linda, you haven't said that, you did not refer at all to drawing numbers at random, you implied that lack of randomness of ball positions alone made this test invalid. After reading my comment you "agreed that this [numbers at random, not balls at random] is what is at issue". Why don't you admit that in your preceding criticism you simply disregarded what is at issue?
The turn you then take is even more disappointing: "In fact, that is essentially the main criticism of your set-up...". This sounds as if the point that I wanted you to agree with was the sore thumb of my approach, i.e., that sources of bias, in your view, must be eliminated a priori. But the issue - drawing numbers at random, not balls at random, has nothing to do with eliminating sources of bias a priori. A priori or a posteriori, precaution or "postcaution", is an issue which deserves attention, but not at this point of your contribution which, to me, has been guided by too much impression management.
I'm sorry. This was the fault of my assumption that we were on the same page when it came to the relevance of ball drawing to number drawing. Let me try again.
To talk about ball drawing is to talk about whether the set-up ensures randomness. If the ball drawing is random, then by definition the numbers drawn are random. To talk about the numbers is to talk about whether the end result appears random according to tests of randomness. It is the difference between evaluating your set-up (precaution) and evaluating the end result (postcaution). My statements were intended to indicate what distinction I thought you were referring to and how they matched up with what I said earlier. If you mean something else by your distinction between ball drawing and number drawing, then please elaborate.
Now, our views on this issue may be put in a nutshell. In my view, conceivable sensory and other ordinary factors Fi (F1 to Fk) upon hit rates in some psi test need not be excluded from an experimental setting in advance if their effects can be analysed, by appropriate methodical means, after data collection. If any hypothesized Fi would reveal effects then the test under analysis would prove invalid as a psi test. In your view this strategy is not satisfactory since experimenters do not know all Fi factors that might play a role, some unknown factor might be effective, and not all Fi can be ruled out. Therefore, precautions must be taken to exclude any possible Fi.
I can give you an example. You "ruled-out" the possibility of learning effects (like temperature or other tactile information) by speculating that learning effects would not be present at the start of the experiment. And you found that "the students' hit rates, those of high and low scorers alike, exceed chance expectancy already with first trials, there is no indication of increasing performance over time" (http://www.parapsych.org/papers/38.pdf). However, when I think about the problem, I wonder about the issue of optional starting whereby participants try out the procedure a few times before they start recording their trials. If they are unsuccessful, they are called "practice runs" and are unrecorded. If they are successful they count as the start of the trial. This would artificially inflate the hit rates on the first trials and would obscure a learning effect.
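The optional-starting concern can be illustrated with a small simulation. This is only a sketch under stated assumptions: there is no psi and no learning, every draw hits with the chance probability of 1/5 (balls numbered 1 to 5), and each participant may relabel up to `max_practice` initial misses as unrecorded practice runs. The function name and the practice limit are hypothetical and are not taken from Dr. Ertel's protocol.

```python
import random

def first_trial_hit_rate(n_subjects, p_hit=0.2, max_practice=3, seed=0):
    """Fraction of subjects whose first *recorded* trial is a hit.

    Every draw hits with probability p_hit (pure chance). A subject may
    discard up to max_practice initial misses as "practice runs"; the
    first hit, or the draw made once practice is used up, is recorded
    as trial 1.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_subjects):
        discarded = 0
        hit = rng.random() < p_hit
        # Keep relabelling misses as practice while the allowance lasts.
        while not hit and discarded < max_practice:
            discarded += 1
            hit = rng.random() < p_hit
        hits += hit
    return hits / n_subjects

if __name__ == "__main__":
    print("no optional starting:", first_trial_hit_rate(20000, max_practice=0))
    print("up to 3 practice runs:", first_trial_hit_rate(20000, max_practice=3))
```

Under these assumptions the expected first-trial hit rate rises from 0.2 to 1 - 0.8^4 = 0.5904, even though nothing but chance is operating. An inflated start of this kind could flatten an otherwise rising performance curve.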
In your own words: "you have to be extra clever - you have to be able to think about all the ways in which the results may be biased, and you have to figure out how to appropriately model those biases. Our statistical techniques are designed to rule-in effects, not designed to rule-out effects. Which means that our standard for ruling-out the effect of a bias is far too low."
My comment on this: I am not using any "standard of ruling out bias effects", I am using transparent experimental methods to discover the presence of bias.
Yes, you are using a standard. When you report on the possible influence of a particular bias, you report a p-value and dismiss the possibility on the basis of those p-values. The problem is that your p-value reflects a Type I error, when what you are really interested in is the possibility of a Type II error.
Of course, one must be extra clever to think about leakages of various kinds that might have affected the data. But this would also be necessary for excluding bias effects in advance. You can hardly exclude inconceivable factors.
But that's the beauty of control groups. You do not have to know what bias effects may be present in advance. The control group simply blocks those biases (the ones you know about and the ones you don't) from having an influence on the results. You can exclude inconceivable factors by making it impossible for their influence to be felt. For example, the well-known use of a placebo in medical experiments blocks the influence of Hawthorne effects (among others) - people report that they feel better just because someone asks if they do - by preventing this effect from acting only on one group (treatment) and not the other (control).
The advantage of testing conceivable bias effects post hoc is that otherwise you would never know whether they do or do not exist. Moreover, and in the first place, ball drawing test conditions under natural, non-restrictive conditions have the advantage of being psi-conducive.
Why are psi-conducive conditions identical to data manipulation conditions? Does this really make sense?
Linda, your last words were:
“you have not demonstrated that this method [“post-cautionary” method] allows you to discover deviations from chance that could be explained by ordinary mechanisms …”
You are right. But that’s what I expected. I thought that if psi existed I would probably not be able to discover deviations from chance that could be explained by ordinary mechanisms. This hypothesis has found strong support.
What I meant is that you have not demonstrated that this method allows you to discover ordinary mechanisms. That is, if ordinary mechanisms were present, how do you know that you would reliably be able to discover them? You are trying to conclude that because you did not discover ordinary mechanisms that ordinary mechanisms must not be present. But by doing this you may be committing a Type II error. We are used to trying to avoid a Type I error - concluding that an effect is present when it is not. And the p-value is used to represent the probability of a Type I error. But when we want to rule out a particular possibility (in medicine, we may want to rule out the possibility that a drug is less safe than an alternative) then we need to look instead at the power level, which determines the probability of concluding that an effect is not present when it is. This is why I brought up the example from Radin. He was attempting to avoid a Type II error, but he erroneously did so by looking at the probability of a Type I error. If you actually calculate the probability of a Type II error, you discover that it was unreasonably high. That is, there is a good chance that if an ordinary mechanism was present, he would have failed to discover it.
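To make the Type II error point concrete, here is a rough sketch of a power calculation for a one-sided exact binomial test. The numbers are assumptions chosen for illustration only: a chance hit rate of 0.2, a hypothetical ordinary mechanism that raises the hit rate to 0.22, and runs of 180 or 900 trials. None of this reproduces an analysis from the papers under discussion.

```python
from math import comb

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def power(n, p0, p1, alpha=0.05):
    """Power of a one-sided exact binomial test of H0: p = p0 when the true rate is p1 > p0."""
    # Smallest hit count whose tail probability under H0 does not exceed alpha.
    k_crit = next(k for k in range(n + 2) if binom_tail(n, k, p0) <= alpha)
    # Probability of reaching that count when the bias is really present.
    return binom_tail(n, k_crit, p1)

if __name__ == "__main__":
    for n in (180, 900):
        pw = power(n, p0=0.2, p1=0.22)
        print(f"n={n}: power={pw:.2f}, Type II error={1 - pw:.2f}")
```

With a bias this small, the test at either sample size misses the effect more often than it finds it, so a non-significant p-value by itself says little about whether the mechanism is absent.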
Linda, you continued …
“… and our previous experience with this strategy in other fields of science shows it to be misguided”.
Do you mean that my strategy must be wrong since I did not find ordinary mechanisms? Do I have to find ordinary mechanisms, otherwise you consider my approach misguided? This seems to be the logic that I find in your last para. You seem to be cautious by excluding ordinary factors in advance irrespective of detrimental effects that this precaution exerts on psi, while I prefer saying “give psi a good chance, just test all seemingly ordinary factors, give them an equally good chance to manifest themselves and come to conclusions after analyzing the data.”
Suitbert
No, I mean that the strategy you have chosen fails to find ordinary mechanisms even when they are present. This means that you must be cautious about concluding anything from your failure. If you wish for other scientists to believe that your "postcaution" is a valid means of exploration, then you would need to show how you tested the idea of postcaution and how it performed. This could involve something like introducing small, ordinary mechanisms into the procedure to see if blinded researchers discover these effects.
I understand the desire to attempt to wring useful information out of data. I look forward to seeing experiments where the information gleaned from exploratory studies is used to design large, methodologically rigorous experiments from which reliable conclusions can be drawn. What would be of most interest to me is a discussion on how to form a control group for psi experiments.
Linda
fls
7th June 2009, 06:12 AM
It's not easy for one person to convey information from two different people. But, I think this has been a surprisingly fascinating endeavor, so far.
I wasn't sure whether this would work, or whether it would even be of interest, so I appreciate hearing from you and Moochie. I will keep going with this as long as it seems to be working.
So, I appreciate the effort, fls!
I think we are gleaning more important details, from this, than my "go for the jugular" approach, on my other thread.
I agreed with you that this discussion was different from yours (which is why I thought that both would benefit from a split). But I like that there's an opportunity to observe and discuss a wide variety of approaches on the JREF forum.
Linda
fls
7th June 2009, 06:23 AM
Suitbert,
I am still curious about how you view your data in light of the fact that upon review of the video of one subject, your set-up allowed the subject to look into the bag, even under supposedly controlled conditions? Does it make sense to look for whether ball temperature influenced the results, but to ignore the influence of normal looking?
I am also still curious as to how you view those parapsychologists who prefer methodological rigour rather than the use of postcaution? Do you also consider their attitude detrimental to the study of psi, or a demonstration of the power of human disbelief?
Linda
fls
7th June 2009, 07:29 AM
From Damien Broderick and Suitbert Ertel (sent to me prior to my reply to Suitbert's previous e-mail):
Damien (06.06.2009) to Suitbert:
I've just re-read your exchanges on her new JREF thread. It looks to me as if she's confusing your methods (which use replacement, even though I'd still prefer to see someone else make the replacement each time with a ball that hasn't been touched by the subject) and the sort of thing that happened in the 1969 US draft ballot (where, I gather, there was no replacement).
Would it be fair to insist that she--or someone acting on her behalf--run at least a sample of replication trials to see what sort of patterns, if any, she can produce using 180 or more two-handed draws? Yes, you are making the claims that need to be tested--but I think she is also. The interesting aspect here is that unless she or her Ss were significantly psychic, the results would probably be a random scatter. She would *not* replicate the pattern effects. But then she'd object that this is just because she is a good mixer/random generator. Sigh.
Suitbert to Damien (06.06.2009.).
I agree with your idea in principle that someone should try to replicate the pattern effect. But I would suggest using psi-gifted participants in the first place whose results should be compared with results of non-gifted participants. If the gifted show patterns and the ungifted do not, then the difference of results must be explained. Linda would probably resort to mere speculation about unknown individual differences of muscular innervations or something like that.
Replacement with a ball that has not been touched by the subject? One might try that. One might use, say, five bags and the trials might be done with randomly changing the bags for successive trials without using the same bag in immediate succession. But one has to consider a Ryzl effect. Ryzl demonstrated, using his psi star Pawel Stepanek, that Pawel tended to select, among many envelopes, the one that he had selected at earlier trials, recognition of features of the envelopes was excluded as a possible explanation. This effect is similar to (not the same as) so-called psychometry. So the conditions for psi to occur might possibly be diminished.
I would rather try to convince Linda that psi exists by demonstrating some other ball test effect, the most convincing ones have been conducted by my Ukrainian psi stars. Two participants have a bag each and they draw a ball from their bags simultaneously wishing to draw the same number. They do not see each other. The number of hits is much larger than expected. Or the two participants wish to draw two numbers whose sum is 6 (1+5, 2+4, 3+3), again their results are very significant. In another test one psi-gifted participant threw a die which was kept under a mug and the other psi-gifted participant wanted to draw, from a bag, the face-up number under the mug. Only after drawing the number the mug was lifted (if the number was 6, the trial was repeated because there were only numbers 1 to 5 written on the balls, as usual).
By the way, some suspicion might be diminished by using, instead of pingpong balls, little containers as they are used in cosmetics. I have been using them already, I call them "psi-pods". Numbers or words or whatever written on pieces of paper are put inside. The pods are mixed and one is drawn, the lid is opened and the content is recorded. For my Ukrainian psi stars the extremely significant results of their tests with balls were replicated. I met Rupert Sheldrake some weeks ago, he showed interest in psi-pods and I gave him a set because he wanted to see whether he would manifest psi power in that test. He did not tell me yet whether he conducted this test in the meantime.
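For reference, the chance baselines for the two-bag tasks described above can be enumerated directly. This is a minimal sketch that assumes each participant's draw is independent and uniform over the numbers 1 to 5; the helper name `chance` is made up for illustration.

```python
from fractions import Fraction
from itertools import product

NUMBERS = range(1, 6)  # balls carry the numbers 1 to 5
# All 25 equally likely pairs of independent draws.
PAIRS = list(product(NUMBERS, repeat=2))

def chance(event):
    """Exact probability of an event over two independent uniform draws."""
    return Fraction(sum(1 for a, b in PAIRS if event(a, b)), len(PAIRS))

p_same = chance(lambda a, b: a == b)      # both draw the same number
p_sum6 = chance(lambda a, b: a + b == 6)  # the two numbers sum to 6

if __name__ == "__main__":
    print("same number:", p_same)
    print("sum equals 6:", p_sum6)
```

Both tasks have a chance expectancy of 1/5, the same as a single draw aimed at a target number, so any reported excess would be measured against that rate.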
fls
7th June 2009, 08:44 AM
From Damien Broderick and Suitbert Ertel (sent to me prior to my reply to Suitbert's previous e-mail):
Damien (06.06.2009) to Suitbert:
I've just re-read your exchanges on her new JREF thread. It looks to me as if she's confusing your methods (which use replacement, even though I'd still prefer to see someone else make the replacement each time with a ball that hasn't been touched by the subject) and the sort of thing that happened in the 1969 US draft ballot (where, I gather, there was no replacement).
Replacement alters the probability that each number will be drawn - the
calculation becomes somewhat more difficult if replacement isn't used - and
it also will influence the probability of a person calling a particular
number. It also is relevant to issues like ball temperature or other
tactile/visual cues. However, what was at issue in the 1969 draft ballot
was not the issue of replacement vs. no replacement, but the issue of
inadequate mixing.
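As a small worked example of how replacement changes the probabilities, consider the chance that the second draw shows the same number as the first. The bag composition here (10 balls for each of 5 numbers) is an assumption for illustration, and perfect mixing is assumed throughout.

```python
from fractions import Fraction

def p_repeat(balls_per_number=10, numbers=5, replace=True):
    """P(second draw shows the same number as the first), assuming perfect mixing."""
    total = balls_per_number * numbers
    if replace:
        # The drawn ball goes back, so the bag is unchanged.
        return Fraction(balls_per_number, total)
    # One ball with that number is gone, from both the count and the total.
    return Fraction(balls_per_number - 1, total - 1)

if __name__ == "__main__":
    print("with replacement:", p_repeat(replace=True))
    print("without replacement:", p_repeat(replace=False))
```

With replacement the probability stays at 1/5 on every draw; without it the probability depends on what has already been drawn (9/49 in this case), which is why the calculation becomes more involved.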
Would it be fair to insist that she--or someone acting on her behalf--run at least a sample of replication trials to see what sort of patterns, if any, she can produce using 180 or more two-handed draws? Yes, you are making the claims that need to be tested--but I think she is also. The interesting aspect here is that unless she or her Ss were significantly psychic, the results would probably be a random scatter. She would *not* replicate the pattern effects. But then she'd object that this is just because she is a good mixer/random generator. Sigh.
This is something that should be of critical interest from the start, not
something that is suggested only when criticism is encountered. Seriously,
do you not wonder what pattern a particular procedure generates in the
absence of psi???
Suitbert to Damien (06.06.2009.).
I agree with your idea in principle that someone should try to replicate the pattern effect. But I would suggest using psi-gifted participants in the first place whose results should be compared with results of non-gifted participants. If the gifted show patterns and the ungifted do not, then the difference of results must be explained. Linda would probably resort to mere speculation about unknown individual differences of muscular innervations or something like that.
There are many parallels to the study of psi and the study of interventions
in medicine. The effects are highly variable and easy to miss, all sorts of
opportunities arise for cognitive and other biases to influence the results,
many associations are tested, many exploratory studies are small and
uncontrolled, etc. You will notice that once we have established that an
intervention is reasonably safe and that an effect may be present (a step
that is unnecessary for psi research) we test the intervention by
identifying a group of people a priori who have a high likelihood of a
particular outcome (for example, people who have just had a heart attack are at
much higher risk of having another heart attack than the general population)
and dividing this group into two - a treatment group and a control group.
The parallel in psi research would be to find a way to identify psi-gifted
people a priori (not on the basis of the results of your intervention) and
divide those people into two groups. Then you can subject one group to your
control conditions and the other group to your control conditions plus one
small change in order to test whether that small change allows the
introduction of psi. For example, your control condition could be the use
of multiple bags in which a helper replaces the just drawn number and mixes
each bag thoroughly, and the choice of bag to use for each trial is
determined randomly (as you suggest below), numbers that are placed inside
identical opaque containers, conditions which prevent the drawer from
looking inside the bag, no feedback, no notification of which number has
been selected for matching a priori, etc. Then you could introduce one
condition at a time (feedback, for example) to see whether that allows for
psi. With a little effort, it would even be possible to do this under
conditions which would otherwise be considered psi-conducive. For example,
if your "psi-pods" can be sealed and are completely opaque, a number can be
sealed inside so that your participants can simply take the bag home with
them. If you instruct them to simply draw out the balls in order -
12345123451234... - and number each container with the order in which it was
drawn (note that this is done without replacement), the containers can be
brought back to you still sealed. Then you can open them in order to record
hits and misses without the opportunity for the participant to deliberately
manipulate the results.
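The allocation step in the design above could look something like the following sketch. It assumes a list of participants who were identified as psi-gifted beforehand by some independent criterion; the function name and the participant labels are hypothetical.

```python
import random

def allocate(participant_ids, seed=None):
    """Randomly split participants into control and treatment groups of near-equal size."""
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)  # chance alone decides who lands in which group
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "treatment": shuffled[half:]}

if __name__ == "__main__":
    groups = allocate([f"P{i:02d}" for i in range(1, 11)], seed=2009)
    print("control (all precautions):", groups["control"])
    print("treatment (one precaution relaxed):", groups["treatment"])
```

Because assignment is random, any bias that affects the participants, known or unknown, is expected to fall on both groups alike, so a difference in hit rates can be attributed to the single condition that was changed.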
Replacement with a ball that has not been touched by the subject? One might try that. One might use, say, five bags and the trials might be done with randomly changing the bags for successive trials without using the same bag in immediate succession. But one has to consider a Ryzl effect. Ryzl demonstrated, using his psi star Pawel Stepanek, that Pawel tended to select, among many envelopes, the one that he had selected at earlier trials, recognition of features of the envelopes was excluded as a possible explanation. This effect is similar to (not the same as) so-called psychometry. So the conditions for psi to occur might possibly be diminished.
But again, this could be tested and controlled - at what point does changing
the bag take away the effect of psi? Is it by using 5 bags rather than one?
Is it by having someone else mix the bags? Is it by making the choice of
bag random, rather than the choice of the participant?
I would rather try to convince Linda that psi exists by demonstrating some other ball test effect, the most convincing ones have been conducted by my Ukrainian psi stars. Two participants have a bag each and they draw a ball from their bags simultaneously wishing to draw the same number. They do not see each other. The number of hits is much larger than expected. Or the two participants wish to draw two numbers whose sum is 6 (1+5, 2+4, 3+3), again their results are very significant. In another test one psi-gifted participant threw a die which was kept under a mug and the other psi-gifted participant wanted to draw, from a bag, the face-up number under the mug. Only after drawing the number the mug was lifted (if the number was 6, the trial was repeated because there were only numbers 1 to 5 written on the balls, as usual).
By the way, some suspicion might be diminished by using, instead of pingpong balls, little containers as they are used in cosmetics. I have been using them already, I call them "psi-pods". Numbers or words or whatever written on pieces of paper are put inside. The pods are mixed and one is drawn, the lid is opened and the content is recorded. For my Ukrainian psi stars the extremely significant results of their tests with balls were replicated. I met Rupert Sheldrake some weeks ago, he showed interest in psi-pods and I gave him a set because he wanted to see whether he would manifest psi power in that test. He did not tell me yet whether he conducted this test in the meantime.
That is good to hear, because I have a solution to your lack of funds and
the lack of interest from other scientists. If you have not already heard
of this, James Randi offers one million dollars to anyone who can
demonstrate psi in the way that you describe. All you have to do is have
your Ukrainian psi stars demonstrate their very significant results and you
not only have the funds you need for further research, but also the
attention of other scientists as well.
Information about this test is here:
http://www.randi.org/site/index.php/1m-challenge.html
Linda
Wowbagger
7th June 2009, 10:44 AM
Would it be fair to insist that she--or someone acting on her behalf--run at least a sample of replication trials to see what sort of patterns, if any, she can produce using 180 or more two-handed draws?
If you want to re-run the experiment, here is my suggestion: Do it several different ways, with the same subjects, with increasing levels of control. Here is one example:
1. Have the participants do the first round from home, unsupervised. (The rest of the rounds will be done in a lab, with supervision.)
2. For the second trial, you can duplicate the circumstances of the video. Tell the participants that they should NOT look in the bag. However, do NOT stop them when it looks like they are sneaking a peek. Just let them do what they feel they gotta do.
3. For the third round, the participants will be blind-folded. However, in this stage, they will be allowed to use balls that could be determined by feel: perhaps the numbers will be subtly raised or indented from the surface. (These same balls could also be used in the previous stages, if you wish.)
4. However, for the fourth round, every single ball must feel exactly the same. The numbers will be printed on the ball (or, if it is a glass marble, the number could be embedded into the center), but the surface for all of them should be perfectly smooth. Each ball must weigh exactly the same, and be of exactly the same size. Etc. (And, the participants are still blindfolded, by the way.)
5. For the fifth round, we do everything in the fourth; but we replace the bags with jars, that are on "paint-can-shaker"-like devices. The shakers will shake the balls in a thorough and consistent manner between every grab. And, participants will not be allowed to touch the jars, while the shaking process is going on.
My prediction is that the results will start off appearing "very significant", but will become less and less and less significant with each new control placed upon it.
fls
7th June 2009, 04:13 PM
Quoted material was written by me (Linda), text is from Damien Broderick:
Seriously, do you not wonder what pattern a particular procedure generates in the absence of psi???
I agree that control runs should be conducted, using (say) a
selection of the Ss who scored only at chance level during screening
trials. As I understand it, Suitbert trialed the procedure himself,
and since he's not "psi-gifted" his results would serve as at least a
preliminary answer.
But this question does not even start to address the more challenging
results Dr. Ertel has mentioned; e.g.,
" Two participants have a bag each and they draw a ball from their
bags simultaneously wishing to draw the same number. They do not see
each other. The number of hits is much larger than expected. Or the
two participants wish to draw two numbers whose sum is 6 (1+5, 2+4,
3+3), again their results are very significant. In another test one
psi-gifted participant threw a die which was kept under a mug and the
other psi-gifted participant wanted to draw, from a bag, the
face-up number under the mug. Only after drawing the number the mug
was lifted (if the number was 6, the trial was repeated because there
were only numbers 1 to 5 written on the balls, as usual)."
The only non-anomalous explanation I can see here is cheating or
subtle signaling of some kind. Do you explain this sort of outcome by
inadequate mixing etc?
Damien
fls
7th June 2009, 04:15 PM
I agree that control runs should be conducted, using (say) a
selection of the Ss who scored only at chance level during screening
trials. As I understand it, Suitbert trialed the procedure himself,
and since he's not "psi-gifted" his results would serve as at least a
preliminary answer.
Actually, control runs should be performed by "psi-gifted" participants, but
under conditions where psi does not operate. It doesn't help (or make much
sense) to use participants who score at chance level as controls. It's like
putting people without any heart disease into the placebo control group when you are testing the effects of aspirin in people with heart disease.
But this question does not even start to address the more challenging
results Dr. Ertel has mentioned; e.g.,
" Two participants have a bag each and they draw a ball from their
bags simultaneously wishing to draw the same number. They do not see
each other. The number of hits is much larger than expected. Or the
two participants wish to draw two numbers whose sum is 6 (1+5, 2+4,
3+3), again their results are very significant. In another test one
psi-gifted participant threw a die which was kept under a mug and the
other psi-gifted participant wanted to draw, from a bag, the
face-up number under the mug. Only after drawing the number the mug
was lifted (if the number was 6, the trial was repeated because there
were only numbers 1 to 5 written on the balls, as usual)."
The only non-anomalous explanation I can see here is cheating or
subtle signaling of some kind. Do you explain this sort of outcome by
inadequate mixing etc?
Damien
I find that it isn't really helpful to comment on unpublished and
unsubstantiated reports. These sorts of results would certainly be
appropriate for the James Randi Million Dollar Challenge, though.
Linda
fls
7th June 2009, 04:17 PM
From Damien Broderick:
Actually, control runs should be performed by "psi-gifted" participants, but under conditions where psi does not operate.
Since psi (to those who regard it as a real factor) is known to be
usually of low effect size and more importantly haphazard in
manifestation, rather like creativity or falling in love, how do you
propose to establish conditions under which it does not operate? With
your heart attack patients, the relevant variable is a certain
treatment, not their a priori vulnerability to MIs. (There's one
possible criterion that's still controversial within parapsychology,
sidereal time windows, and that might be worth pursuing.)
As for the "Randi Million Dollar Challenge"--don't make me laugh.
Some chewy data and analysis at
http://www.steorn.com/forum/comments.php?DiscussionID=61632&page=1#Item_0
(bearing in mind that Steorn, which claims to have discovered
over-unity power generation, is itself dubious, of course--although
the url'd discussion isn't from Steorn but by interested onlookers on
one of their rather undisciplined forums)
Note this remarkable statement from Kramer ["Randi's 1 million
challenge assistant in 2005"]
'... However, my understanding is that most applicants fail rather
gloriously, performing far below CHANCE ....'
http://forums.randi.org/archive/index.php/t-34848.html
I assume this "far below CHANCE" claim is just the typical
Randi-grade ignorance of statistics showing through rather than
evidence of massive psi-missing under stress and assault--although
that would be a very charming thing to find substantiated. :)
Damien
fls
7th June 2009, 04:39 PM
From Damien Broderick:
Since psi (to those who regard it as a real factor) is known to be
usually of low effect size and more importantly haphazard in
manifestation, rather like creativity or falling in love, how do you
propose to establish conditions under which it does not operate?
That's the key, isn't it? There have been plenty of demonstrations of
conditions under which it does not operate. A more rigorous investigation
of these conditions would be very valuable to the field. For example, if
you took my earlier suggestion - sent people home with sealed psi-pods and
told half of them to draw out numbers in the following order,
1234512345123..., labelling each psi-pod with the order in which it was
drawn - and sent half of the people home with instructions to simply draw
out one psi-pod at a time and label them as to the order in which they were
drawn, what sort of result do you think you'd get? Would you find a
difference between the two groups as to whether or not the order matches a
12345 order?
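A minimal sketch of the chance baseline for such a comparison, assuming each participant gets equal numbers of pods numbered 1 to 5 and, absent any instruction or effect, draws them in an effectively random order. The function names, the pod counts, and the shuffle model are illustrative assumptions, not part of any protocol described in this thread.

```python
import random


def match_rate(draws):
    """Fraction of draws that match the repeating 1,2,3,4,5 order."""
    hits = sum(1 for i, d in enumerate(draws) if d == (i % 5) + 1)
    return hits / len(draws)


def simulate_uninstructed(n_sets, seed=0):
    """Hypothetical control: n_sets copies of pods 1..5, drawn in random order."""
    rng = random.Random(seed)
    pods = [number for _ in range(n_sets) for number in range(1, 6)]
    rng.shuffle(pods)
    return pods


if __name__ == "__main__":
    control = simulate_uninstructed(n_sets=1000)
    # With purely random drawing, the match rate should hover around 1 in 5.
    print(f"match rate under chance: {match_rate(control):.3f}")
```

Comparing the match rate of the instructed group with that of the uninstructed group, rather than with the theoretical 20%, is what would make the second group a control.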
With your heart attack patients, the relevant variable is a certain
treatment, not their a priori vulnerability to MIs.
Exactly. We don't vary the a priori vulnerability (i.e. psi) between
groups, rather we vary the treatment (i.e. conditions under which psi can be
demonstrated) between groups.
As for the "Randi Million Dollar Challenge"--don't make me laugh.
Some chewy data and analysis at
http://www.steorn.com/forum/comments.php?DiscussionID=61632&page=1#Item_0
I've been reading the Challenge applications and resultant discussions since
before I've been a member of the forum. The vast majority of failures seem
to be because the claims and applications are incoherent, and because the
claim isn't suitable for testing under these conditions. Neither of those
problems apply to Dr. Ertel. He is perfectly capable of putting together a
coherent application and claim. And his description is just like those
claims which have gone on to be successfully tested.
Note this remarkable statement from Kramer ["Randi's 1 million
challenge assistant in 2005"]
'... However, my understanding is that most applicants fail rather
gloriously, performing far below CHANCE ....'
http://forums.randi.org/archive/index.php/t-34848.html
I assume this "far below CHANCE" claim is just the typical
Randi-grade ignorance of statistics showing through rather than
evidence of massive psi-missing under stress and assault--although
that would be a very charming thing to find substantiated. :)
Damien
Yeah, he obviously mis-spoke. :) I might guess that he was thinking that
at least some testees should get close to their proposed success rate, just
by chance, but none of them even got close. I wouldn't put too much weight
on an off-hand remark in any case.
Based on what I've seen accomplished already, it is definitely feasible for
Dr. Ertel to make an acceptable proposal and proceed with testing (and
presumably winning) the James Randi Million Dollar Challenge.
Linda
fls
7th June 2009, 05:00 PM
From Damien Broderick:
Based on what I've seen accomplished already, it is definitely feasible for Dr. Ertel to make an acceptable proposal and proceed with testing (and presumably winning) the James Randi Million Dollar Challenge.
I must leave it to Suitbert to comment, but I gather this is not out
of the question. (However, it's obvious that the logistics would be
dauntingly formidable.)
Damien
fls
7th June 2009, 05:03 PM
From Damien Broderick:
I must leave it to Suitbert to comment, but I gather this is not out
of the question. (However, it's obvious that the logistics would be
dauntingly formidable.)
Damien
As far as logistics go, Randi works with international groups so that the
testing can be done locally. The difficulty may lie in the availability of
the psi-stars.
Linda
fls
7th June 2009, 06:50 PM
From Damien Broderick:
As far as logistics go, Randi works with international groups so that the testing can be done locally. The difficulty may lie in the availability of
the psi-stars.
Linda
I don't wish to go on about this, but that would surely be one
factor. Another is that any anomaly with the small effect size that
psi shows under strict scrutiny requires (as you know, from studies
of aspirin etc) a *huge* number of trials. My sense is that the Randi
tests, when they actually happen, are aimed at idiots and frauds who
make preposterous claims for their effectiveness. I once read a BOTE
calc by Dean Radin where he estimated that the costs involved in
satisfying the Randi criteria (which are on the order of p =
0.000001) would require advance funding approximating the million
dollars prize--assuming it actually exists and would be paid without
equivocation.
Damien
fls
7th June 2009, 07:17 PM
From Damien Broderick:
I don't wish to go on about this, but that would surely be one
factor. Another is that any anomaly with the small effect size that
psi shows under strict scrutiny requires (as you know, from studies
of aspirin etc) a *huge* number of trials. My sense is that the Randi
tests, when they actually happen, are aimed at idiots and frauds who
make preposterous claims for their effectiveness. I once read a BOTE
calc by Dean Radin where he estimated that the costs involved in
satisfying the Randi criteria (which are on the order of p =
0.000001) would require advance funding approximating the million
dollars prize--assuming it actually exists and would be paid without
equivocation.
Damien
First of all, Radin is talking about using the ganzfeld, which is quite
different from using a psi-star, and probably isn't really appropriate for
the Million Dollar Challenge. Second of all, a BOTE calc shows that the
number of ganzfeld trials needed to satisfy Randi's criteria is more like
37, which is very different from the 1000's of trials that Radin came up
with. But Radin did this by using grossly inappropriate parameters, such as
asking for odds against chance of 100,000,000 to one (Randi doesn't specify
the odds, but previous tests have required odds on the order of 100 to 1000
to one) and choosing a power level of 99%.
More about this here:
http://forums.randi.org/showthread.php?postid=4420509#post4420509
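A rough sketch of this kind of back-of-the-envelope calculation, using an exact one-sided binomial test. It does not reproduce either calculation mentioned above. The chance rate of 20% and the 42.5% hit rate are figures quoted elsewhere in this thread for the ball test; the alpha levels, the function names, and the choice to ignore statistical power (the observed hit rate is simply assumed to equal the true one) are illustrative assumptions.

```python
from math import ceil, comb


def binom_tail(n, k, p0):
    """P(X >= k) for X ~ Binomial(n, p0): the one-sided exact p-value."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


def min_trials(p0, hit_rate, alpha, n_max=1000):
    """Smallest n at which observing hit_rate * n hits (rounded up)
    gives a p-value <= alpha against chance rate p0. Returns None if
    no such n exists up to n_max."""
    for n in range(1, n_max + 1):
        k = ceil(hit_rate * n - 1e-9)  # small epsilon guards against float noise
        if binom_tail(n, k, p0) <= alpha:
            return n
    return None


if __name__ == "__main__":
    for alpha in (0.01, 0.001, 0.000001):
        n = min_trials(p0=0.2, hit_rate=0.425, alpha=alpha)
        print(f"alpha = {alpha}: about {n} trials needed")
```

The required number of trials grows quickly as the assumed hit rate approaches chance, which is why the choice of parameters matters so much in these estimates.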
From the description Suitbert has given of his psi-stars, it should be easy
to accumulate enough trials to surpass Randi's standards within an hour or
two.
Linda
steenkh
8th June 2009, 01:16 AM
I wasn't sure whether this would work, or whether it would even be of interest, so I appreciate hearing from you and Moochie. I will keep going with this as long as it seems to be working.
Please record a "me too"! :clap:
Ivor the Engineer
8th June 2009, 04:06 AM
Please record a "me too"! :clap:
And me.:)
fls
8th June 2009, 04:42 AM
From Damien Broderick:
a BOTE calc shows that the
number of ganzfeld trials needed to satisfy Randi's criteria is more like
37, which is very different from the 1000's of trials that Radin came up
with. But Radin did this by using grossly inappropriate parameters, such as asking for odds against chance of 100,000,000 to one (Randi doesn't specify the odds, but previous tests have required odds on the order of 100 to 1000 to one)
I gather that at least two independent tests are required: a
preliminary screening and a Challenge test. If each requires p <0.001, there's the one in a million. But wait, tip of the iceberg:
http://www.skepticalinvestigations.org/exam/Dace_amazing3.htm
an article by Ted Dace:
SKEPTICAL OF THE SKEPTICS
The Amazing 3 Meeting - Las Vegas, January 13-16, 2005
"according to the rules of Randi's competition, if a psychic ability
is proven, he must pay up. Randi stated to me that a preliminary test
would have to yield a probability of one in a thousand that the
results were due to chance. After passing the preliminary, the
investigator could commence with the formal test, which would have to
yield a probability against chance of one in a million."
fls
8th June 2009, 05:29 AM
From Damien Broderick:
I gather that at least two independent tests are required: a
preliminary screening and a Challenge test. If each requires p <0.001, there's the one in a million.
It depends. The preliminary test is not always required, especially when testing someone who already has a presence in the field. For example, his test of Jacques Benveniste and his homeopathy experiments was done without preliminary screening, as was one of his more recent televised tests, on Derek Ogilvie.
He doesn't really specify p<0.001 as a requirement, that has simply been the requirement on some of the Challenge tests. Other tests have been undertaken with lower and higher p-values. I don't know where you got "one in a million" from.
But wait, tip of the iceberg:
http://www.skepticalinvestigations.org/exam/Dace_amazing3.htm
an article by Ted Dace:
SKEPTICAL OF THE SKEPTICS
The Amazing 3 Meeting - Las Vegas, January 13-16, 2005
"according to the rules of Randi's competition, if a psychic ability
is proven, he must pay up. Randi stated to me that a preliminary test
would have to yield a probability of one in a thousand that the
results were due to chance. After passing the preliminary, the
investigator could commence with the formal test, which would have to
yield a probability against chance of one in a million."
I wouldn't give much weight to some off-hand remarks that may or may not have been reported correctly. I would look instead at what the practice has been.
I didn't mean to give the impression that one can definitely win the Challenge with a single test at p=0.01 or 0.001. I simply wished to point out that there have been circumstances where a single successful test would have won the Challenge, and that the p-values have varied around levels that are considerably larger than one in a million or the one in one hundred million that Radin used.
Linda
fls
8th June 2009, 08:57 PM
From Suitbert Ertel:
Linda,
regarding your first query in your message yesterday, you argue that I
ignored the influence of normal looking. You did not ask me whether I did or
did not ignore that. Since you implied it without evidence it seems that you
want me to have ignored the influence of normal looking. Somewhat biased,
isn't it?
No doubt, in the filmed test session Kannan might have peeped into the bag
sometimes or even often in order to make ball selections aided by visual
support. Can we tell from observing him on the video whether he peeped into
the bag? Not with certainty.
I was aware at that time of the possibility that Kannan might have peeped
into the bag, because I once made a note between test sessions about the
possible peeping factor. I looked the note up and summarize here:
1. If Kannan had obtained hits by peeping into the bag he would have mixed
up similar numbers (2 and 5 for example) more often than dissimilar numbers
(1 and 5) (1 is written as a dash in India as in England). This I did not
find. The percentage of mixing up 2 with 5 when he called 2 was 14.0%, the
percentage of mixing up the two numbers was 15.3% when he called 5. The
percentages of mixing up 1 with 5 which should be less (dissimilar numbers)
was 10.4% and 14.0%, which is somewhat less, but not significantly so. Let
me provide all mixing up cases (drawn wrong numbers in brackets). The
percent values are given because the differences among calls are evened out.
1 (2) 20.5%
1 (3) 09.4%
1 (4) 08.7%
1 (5) 10.4%
2 (1) 25.1%
2 (3) 13.5%
2 (4) 11.1%
2 (5) 14.0%
3 (1) 13.6%
3 (2) 15.3%
3 (4) 16.0%
3 (5) 16.7%
4 (1) 12.9%
4 (2) 15.7%
4 (3) 10.1%
4 (5) 16.7%
5 (1) 14.0%
5 (2) 15.3%
5 (3) 11.2%
5 (4) 17.1%
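As a rough consistency check on the table, the sketch below sums the miss percentages for each call and treats the remainder as the hit rate for that call. This assumes the percentages are shares of all trials with a given call, which the table does not state explicitly; the variable and function names are illustrative.

```python
# Miss percentages copied from the table: MISSES[call][drawn] = % of trials.
MISSES = {
    1: {2: 20.5, 3: 9.4, 4: 8.7, 5: 10.4},
    2: {1: 25.1, 3: 13.5, 4: 11.1, 5: 14.0},
    3: {1: 13.6, 2: 15.3, 4: 16.0, 5: 16.7},
    4: {1: 12.9, 2: 15.7, 3: 10.1, 5: 16.7},
    5: {1: 14.0, 2: 15.3, 3: 11.2, 4: 17.1},
}


def implied_hit_rate(call):
    """Hit percentage for a call, assuming misses and hits sum to 100%."""
    return 100.0 - sum(MISSES[call].values())


# Unweighted mean over the five calls (call frequencies are not given).
mean_hit_rate = sum(implied_hit_rate(c) for c in MISSES) / len(MISSES)

if __name__ == "__main__":
    for call in sorted(MISSES):
        print(f"call {call}: implied hit rate {implied_hit_rate(call):.1f}%")
    print(f"unweighted mean: {mean_hit_rate:.1f}%")
```

Under that assumption the implied hit rates lie between roughly 36% and 51%, with an unweighted mean near 42.5%.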
You can see that the frequencies of drawing 2 when 1 was called and drawing
1 when 2 was called are the only conspicuous cases. The reason cannot be
figural similarity. The reason is arithmetical "similarity". I am entitled
to say that, I think, because this result has been obtained on a larger
scale in my standard sample of 238 student participants, i.e. also with
calls of 3 (here 2 and 4 were more frequently drawn than 1 and 5) and with
calls of 4 (here 3 and 5 were more frequently drawn than 1 and 2). For calls
of 5 the rule of picking neighboring numbers did not apply (4 and 3 were
not drawn more frequently than 1 and 2). I am telling you this, Linda,
because this result with "good misses" which also shows up to some extent in
Kannan's data does not support the skeptics' assumption that hits above
chance can be explained by unintended or fraudulent sensory leakage.
2. I tested Kannan as experimenter in 1996, 1998, and 2001, the total number
of trials was 5880. It is unreasonable to assume that Kannan, if he cheated
by peeping into the bag, could have done this throughout without me noticing
this. Once I a while, he would have needed longer than cursory peeping
durations forgetting to keep his searching the numbers unnoticeable. I was
aware of this possibility as I said. Even Randi, I think, could not cheat
under my standard conditions for 20 hours (total testing time with Kannan).
3. All visitors to India are familiar with interruptions of current supply
once or twice almost every day. This occurred frequently when I tested
Kannan at night. We lit a candle when this occurred. Kannan could not see
the numbers during such periods which lasted for half an hour, an hour or
longer. He had to hold the drawn balls close to my eyes so that I could see
and put down the numbers. If Kannan had obtained high hit scores during the
day by fraudulent peeping, his scores would have dropped remarkably, I would
have noticed this drop of hits, I can be sure, because Kannan's hit rate was
remarkable (42.5% hits in 1996 and 44.3% in 1998 while 20% are expected. In
2001 I used the draw-from-two-bags condition which is dissimilar, but the
combined results in 2001 were also remarkable).
Linda, it is very easy for skeptics to put verbal obstacles on the road of
thoroughly working psi researchers and to attribute negligence to them by
quick conjectures. Do you think I have deserved to be suspected to have
ignored, for sure, the peeping factor? Much more do I have to endure in
discussions with other skeptics - and also with skeptical colleagues from
our research field, more about that later.
Suitbert
fls
8th June 2009, 09:02 PM
From Suitbert Ertel:
Linda,
regarding your first query in your message yesterday, you argue that I
ignored the influence of normal looking. You did not ask me whether I did or
did not ignore that. Since you implied it without evidence it seems that you
want me to have ignored the influence of normal looking. Somewhat biased,
isn't it?
I didn't mean to imply that you had ignored it altogether. You even mention at the end of the video those postcaution analyses that you mention here. But in the papers that have been referenced, you not only fail to draw attention to this, you specifically seem to deny that it could occur - "attempts to peep into the box or other such fraudulent actions would hardly go unnoticed, apparently they have not occurred." And realistically, this is a very important issue which deserves at least as much attention as the four pages you devote to the idea of "good misses", and which, once recognized, should have led to a change in procedure so that the bag opening was screened from the drawer. Instead you made a few guesses about how this would show up in other ways, tested those ideas (without any indication of the probability of Type II error), and concluded that it couldn't be so. Yet an equally valid conclusion from your failure to confirm any of your guesses is that your guesses were simply wrong to begin with.
I don't think it is "without evidence" that I suggest you have not given this the attention it deserves. You have yet to assure me that the ball-drawing procedure in your other tests is substantially different from that used by Kannan.
No doubt, in the filmed test session Kannan might have peeped into the bag
sometimes or even often in order to make ball selections aided by visual
support. Can we tell from observing him on the video whether he peeped into
the bag? Not with certainty.
I was aware at that time of the possibility that Kannan might have peeped
into the bag, because I once made a note between test sessions about the
possible peeping factor. I looked the note up and summarize here:
1. If Kannan had obtained hits by peeping into the bag he would have mixed
up similar numbers (2 and 5 for example) more often than dissimilar numbers
(1 and 5) (1 is written as a dash in India as in England). This I did not
find. The percentage of mixing up 2 with 5 when he called 2 was 14.0%, the
percentage of mixing up the two numbers was 15.3% when he called 5. The
percentages of mixing up 1 with 5 which should be less (dissimilar numbers)
was 10.4% and 14.0%, which is somewhat less, but not significantly so. Let
me provide all mixing up cases (drawn wrong numbers in brackets). The
percent values are given because the differences among calls are evened out.
1 (2) 20.5%
1 (3) 09.4%
1 (4) 08.7%
1 (5) 10.4%
2 (1) 25.1%
2 (3) 13.5%
2 (4) 11.1%
2 (5) 14.0%
3 (1) 13.6%
3 (2) 15.3%
3 (4) 16.0%
3 (5) 16.7%
4 (1) 12.9%
4 (2) 15.7%
4 (3) 10.1%
4 (5) 16.7%
5 (1) 14.0%
5 (2) 15.3%
5 (3) 11.2%
5 (4) 17.1%
Doesn't this data simply show that he didn't mix up 2's and 5's when looking at them? Do we usually expect people to do this?
You can see that the frequencies of drawing 2 when 1 was called and drawing 1 when 2 was called are the only conspicuous cases. The reason cannot be figural similarity. The reason is arithmetical "similarity". I am entitled
to say that, I think, because this result has been obtained on a larger
scale in my standard sample of 238 student participants, i.e. also with
calls of 3 (here 2 and 4 were more frequently drawn than 1 and 5) and with
calls of 4 (here 3 and 5 were more frequently drawn than 1 and 2). For calls
of 5 the rule of picking neighboring numbers did not apply (4 and 3 were
not drawn more frequently than 1 and 2). I am telling you this, Linda,
because this result with "good misses" which also shows up to some extent in
Kannan's data does not support the skeptics' assumption that hits above
chance can be explained by unintended or fraudulent sensory leakage.
Why not?
Your "good misses" analysis (from http://www.psych.uni-goettingen.de/home/ertel/ertel-dir/downloads/ertelchapterwithfigurespdf.pdf) depends upon a variable definition of what constitutes a "good miss". For some numbers it means the number above and the number below, for other numbers it means the number above and the next number after that, and for other numbers it means the number below and the next number after that. If you redo your analysis using a consistent definition for a "good miss", such as a miss for #2 is #3 and #4, and a miss for #3 is #4 and #5, the significant differences disappear. You cannot draw reliable conclusions from data-dredging after the fact whereby you keep grouping numbers together in various combinations until you find some groups that show a significant difference. At best, it can form the basis for exploratory research, but not if you can't even form a definition of a good miss a priori. How are you supposed to know, in advance, for each number, whether a good miss means the number on each side or the two numbers below or the two numbers above?
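A small sketch of how the choice of definition changes the "good miss" share, using only Kannan's miss percentages from the table quoted above. It is not the reanalysis of the 238-student sample referred to here, and it performs no significance test; the two definitions and all names are illustrative assumptions.

```python
# Kannan's miss percentages from the table: MISSES[call][drawn] = % of trials.
MISSES = {
    1: {2: 20.5, 3: 9.4, 4: 8.7, 5: 10.4},
    2: {1: 25.1, 3: 13.5, 4: 11.1, 5: 14.0},
    3: {1: 13.6, 2: 15.3, 4: 16.0, 5: 16.7},
    4: {1: 12.9, 2: 15.7, 3: 10.1, 5: 16.7},
    5: {1: 14.0, 2: 15.3, 3: 11.2, 4: 17.1},
}


def neighbours(call):
    """Definition A: the number below and the number above the call."""
    return {n for n in (call - 1, call + 1) if 1 <= n <= 5}


def two_above(call):
    """Definition B: the next two numbers above the call."""
    return {n for n in (call + 1, call + 2) if 1 <= n <= 5}


def good_miss_share(call, good):
    """Share of all misses for this call that fall in the 'good' set."""
    total = sum(MISSES[call].values())
    return sum(MISSES[call][d] for d in good) / total


def chance_share(good):
    """Expected share if misses were spread evenly over the 4 wrong numbers."""
    return len(good) / 4


if __name__ == "__main__":
    for name, rule in (("neighbours", neighbours), ("two above", two_above)):
        for call in sorted(MISSES):
            good = rule(call)
            print(f"{name:10s} call {call}: observed "
                  f"{good_miss_share(call, good):.2f}, "
                  f"chance {chance_share(good):.2f}")
```

Fixing one rule in advance and applying it to every call is what separates a testable prediction from a grouping chosen after seeing the data.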
2. I tested Kannan as experimenter in 1996, 1998, and 2001, the total number
of trials was 5880. It is unreasonable to assume that Kannan, if he cheated
by peeping into the bag, could have done this throughout without me noticing
this. Once in a while, he would have needed longer than cursory peeping
durations, forgetting to keep his search for the numbers unnoticeable. I was
aware of this possibility as I said. Even Randi, I think, could not cheat
under my standard conditions for 20 hours (total testing time with Kannan).
According to what you have stated and the information provided on the video, 94% of the trials were performed before you were aware that he could peep. It was only during the last 360 trials that you were aware of the possibility and were presumably on the alert for this. The last 360 trials also show a dramatic decline in his hit rate, from a previous effect size of 0.60 to an effect size of 0.10.
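The arithmetic behind these figures can be sketched as follows. The trial counts come from the text above. The thread does not say which effect-size measure was used, so the standardized difference below is an assumption made purely for illustration; it happens to give values near 0.6 for the 42.5% and 44.3% hit rates quoted earlier.

```python
from math import sqrt

TOTAL_TRIALS = 5880  # all trials with Kannan, per the text
LATE_TRIALS = 360    # trials run after the peeping possibility was noted

# Share of trials run before the experimenter was on the alert.
early_fraction = (TOTAL_TRIALS - LATE_TRIALS) / TOTAL_TRIALS


def effect_size(hit_rate, chance=0.2):
    """Assumed measure: (p - p0) / sqrt(p0 * (1 - p0)). Not confirmed by the source."""
    return (hit_rate - chance) / sqrt(chance * (1 - chance))


def hit_rate_for(es, chance=0.2):
    """Invert the assumed measure to get the hit rate it would correspond to."""
    return chance + es * sqrt(chance * (1 - chance))


if __name__ == "__main__":
    print(f"early share of trials: {early_fraction:.1%}")
    for rate in (0.425, 0.443):
        print(f"hit rate {rate:.1%} -> effect size {effect_size(rate):.2f}")
    print(f"effect size 0.10 -> hit rate {hit_rate_for(0.10):.1%}")
```

If the assumed measure is the right one, an effect size of 0.10 would correspond to a hit rate only a few points above the 20% chance level.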
3. All visitors to India are familiar with interruptions of current supply
once or twice almost every day. This occurred frequently when I tested
Kannan at night. We lit a candle when this occurred. Kannan could not see
the numbers during such periods which lasted for half an hour, an hour or
longer. He had to hold the drawn balls close to my eyes so that I could see
and put down the numbers. If Kannan had obtained high hit scores during the
day by fraudulent peeping, his scores would have dropped remarkably, I would
have noticed this drop of hits, I can be sure, because Kannan's hit rate was
remarkable (42.5% hits in 1996 and 44.3% in 1998 while 20% are expected. In
2001 I used the draw-from-two-bags condition which is dissimilar, but the
combined results in 2001 were also remarkable).
Wouldn't the low light conditions simply make it harder to notice other ways of manipulating the draw, like simply palming balls drawn on prior rounds to serve as guesses on subsequent rounds? If you can't even see the number unless it is close to your eyes, how can you be sure he put the balls back in the bag?
Linda, it is very easy for skeptics to put verbal obstacles on the road of
thoroughly working psi researchers and to attribute negligence to them by
quick conjectures. Do you think I have deserved to be suspected to have
ignored, for sure, the peeping factor? Much more do I have to endure in
discussions with other skeptics - and also with skeptical colleagues from
our research field, more about that later.
Suitbert
I don't know what to say. I sincerely think that psi conducive conditions shouldn't be the same as "conditions under which manipulation can occur". There must be a way for these two things to be different.
Linda
fls
8th June 2009, 09:14 PM
From Suitbert Ertel:
But again, this could be tested and controlled - at what point does changing the bag take away the effect of psi? Is it by using 5 bags rather than one? Is it by having someone else mix the bags? Is it by making the choice of bag random, rather than the choice of the participant?
I appreciate your idea to put hypotheses to a test. I did not claim that using 5 bags rather than one would take the effect of psi away. Among the long list of my experimental to-do's the changing of bags across trials has been included. If hit scores would decline under the change-bags condition, however, sensory leakage effects under the no-bags-change condition would not be strictly confirmed, that's what I wanted to say.
That is good to hear, because I have a solution to your lack of funds and the lack of interest from other scientists. If you have not already heard
of this, James Randi offers one million dollars to anyone who can
demonstrate psi in the way that you describe. All you have to do is have
your Ukrainian psi stars demonstrate their very significant results and you
not only have the funds you need for further research, but also the
attention of other scientists as well.
I am well aware of Randi's million dollar campaign. There are several hurdles to overcome. One is Randi's demand for so-called precautions which, if fulfilled, intimidate the participants by artificial and partly nonsensical procedures. Since psi effects can be demonstrated only under appropriate and fair conditions, the precautions demanded by skeptics tend to turn into the opposite of precaution regarding effect-producing conditions.
This I can say because I tried to run this test using a psi-gifted student (Silke) under the supervision of a German skeptic group (GWUP) in Würzburg. My test conditions were grossly altered by GWUP representatives, but I had to concur, otherwise this project would have been brought to a close. This has been documented in an article that I published in German:
Ertel, S. (2007). "Außersinnliche Wahrnehmung unter der Kontrolle organisierter Skeptiker." [Extrasensory perception under control of organized skeptics] Zeitschrift für Anomalistik 7(3): 236-269.
Despite authoritarian changes of the test conditions by GWUP (money is power!) and despite intimidations of participant Silke by additional subtle means, the observed hit score obtained in roughly two hours was very significant (p = .002). This, however, did not reach the skeptics' (Randi's and the GWUP's) effect size. Thus, Randi, a person uneducated in matters of statistics, and GWUP, who organized themselves without institutional control by science organizations, define which p-values are to be considered as highly significant irrespective of what the science community considers as highly significant.
In the year following the year of my first attempt with GWUP (pre-test for JREF’s million dollar test) I intended to make another attempt in Würzburg. But unfortunately Silke did not want to continue this project because of her bad first impressions the year before.
I hoped another very psi-gifted student would cooperate, but he refused to serve as participant in Würzburg at the last moment. He was afraid of endangering his scientific career if his extraordinary psi performance would become public.
One of the three Ukrainian participants (a boy of 10 years, Vanya is his nickname, he needed to travel with his mother) failed to get a visa in time. There were two other psi-gifted students left and I persuaded them, with much effort, to cooperate. Their mental conditions, however, were awful at that time (one was in divorce and the other was overstressed physically, overslept and almost missed the train). For this reason, most probably, and due to those unnatural suppressive GWUP conditions, they did not obtain hit surpluses above chance.
In future, I would like to let my Ukrainian boy Vanya be tested under my standard conditions because Vanya had shown extraordinary psi power under a skeptic's control (not GWUP, but also skeptical) and subsequently under my own control in Germany where I had invited him. But his increasing time problems (high school achievement goals etc.), his mother's chronic health problems, and visa problems (Germany raised obstacles because of masses of Ukrainians who were flooding Germany) have thwarted this idea. I asked GWUP whether they are aware of any skeptical organization in Ukraine so that Vanya could be tested by some skeptic in his home town, but GWUP could not help me in this regard. I myself will travel to Ukraine (Kiev) in August to test Vanya. For the first time in my psi research career, I obtained some financial support from an open-minded non-scientific company (4,000 $ has been awarded). I will conduct various tests with Vanya, also with his sick mother and his handicapped grandmother. But I would prefer some skeptic as experimenter.
In order to avoid violations of precaution regarding fair conditions I would attempt to convince the skeptic to first agree with the ball test conditions as I define them. The skeptic should merely act as observer, a mini-video camera might also be used. My hypothesis is that the members of Vanya’s family would obtain highly significant hit scores, say p = .0001 within roughly 1-2 hours testing time and that the observer would not be able to explain the result.
Later on, the skeptical observer or some conjurer of his choice should take the ball test under the same conditions. I would again act as experimenter. My hypothesis is that the conjurer’s hit score would not deviate from chance. If the score did significantly exceed the level of chance expectation, though, I would expect to notice, within 1-2 hours of testing time, the trick that he applies. If I did not discover a trick, the conjurer would have to disclose, when the test is done, which trick he applied. He would also have to make plausible that the members of the Kiev family used his own or some other trick. If the conjurer confessed under oath that he did not apply any trick then I would engage him as another psi-gifted participant for further research.
These many words, Linda, that I have to use in order to give you an idea of what I did for that project and what I am willing to do in the future despite lack of funds (except 2,000 USD on one occasion, which is 3% of my total expenditures over a decade of psi research; those 4,000 USD that I mentioned will not cover all my future expenses) are somehow apt to discourage my endeavour to find out whether psi exists and whether the positive results of 10 years of research that I obtained are genuine or not. My main obstacles are not empirical givens, but dogged presuppositions of disbelievers. In your case, however, I am beginning to feel that I am communicating with a sufficiently open-minded person who might, when challenged by empirical and trustworthy information, become skeptical of her own beliefs. This would show that you are truly skeptical.
I am also still curious as to how you view those parapsychologists who prefer methodological rigour rather than the use of postcaution? Do you also consider their attitude detrimental to the study of psi, or a demonstration of the power of human disbelief?
A good question. Yes, I consider the behavior of those who reject discussions of my approach in public (in professional Journals) as detrimental to the study of psi. These seemingly critical members of the parapsychological community forego the chance to demonstrate or to let readers of their Journals demonstrate, by empirical evidence, that Ertel results are due to errors, sensory factors or even fraud. Ertel-contradicting results would even raise the reputation of the Parapsychological Association.
How do I view the parapsychologists who prefer methodological rigor rather than the use of post-caution? My impression, as a newcomer to the parapsychological community, is that in this community the fear of criticism by disbelievers, above all of organized skeptics, is a strong motivating force. Members of the Parapsychological Association tend to succumb to often irrational demands and contentions of their opponents and to ignore demands and insights of common sense and common logic.
An example: Psychologists are aware of the fact that some participants of their questionnaire studies do not adequately respond to their questionnaire items (response sets etc), some participants even cheat. Never has any scientist doubted that the Big 5 factorial dimensions of personality are real, that they are not due to fraudulent or other result-spoiling factors. If fraud or other result-spoiling influence occurred, with empirical evidence, in my ball test this test would have to be abolished at once. Even without such evidence, for which I have been searching in vain for a decade, the mere speculation that fraud and sensory leakage might occur leads “rational” scientists to keep my research at bay. Damien Broderick is an exception.
...if you had understood the relevance of Type II error to my criticisms, I could simply move on instead of bringing it up again below.
I think I understand the relevance of the distinction between Type I and Type II error in this context. I hope you agree that both errors must be considered. Excluding one of these two errors by all means increases the probability of an occurrence of the opposite error. We have to aim at a compromise between the two probabilities. Linda, don’t you disregard Type I errors?
In parapsychological research, type I errors are diminished by looking at possible artifacts so as to avoid attributing effects to psi if artifacts are effective – a dangerous tendency of incautious believers.
Type II errors, on the other hand, are diminished by looking at possible genuine psi factors that might have caused effects while incautious psi-disbelievers tend to attribute them to ordinary (sensory or motor) factors.
Both parties, a priori believers and a priori disbelievers, are committed to support their beliefs, by evidence. An important point here is that prevailing beliefs of other members of one’s community are no safeguard. Prevailing beliefs in science may be wrong. Linda, you disbeliever need not exert much effort to gain acclaim from the majority of the science community. I myself and other researchers who disregard the sociological power of majorities are in a less happy situation.
Suitbert
Wowbagger
8th June 2009, 10:20 PM
From Damien Broderick:
"according to the rules of Randi's competition, if a psychic ability
is proven, he must pay up. Randi stated to me that a preliminary test
would have to yield a probability of one in a thousand that the
results were due to chance. After passing the preliminary, the
investigator could commence with the formal test, which would have to
yield a probability against chance of one in a million."
Hey, you know what? If psi phenomena were really as real as you think they are, then this should not be a problem! Real phenomena have a way of being reliably reproduced.
One more quote from that article I want to briefly mention, because it gets a lot of attention:
Randi agreed he might have to pay up someday. But Dawkins had a trick up his sleeve. If a “psychic” phenomenon turns out to be real, then by definition it is physical and therefore not really psychic after all, and thus Randi still shouldn’t have to pay.
This line, though well ballyhooed by paranormalists, is simply not relevant:
Once the testing protocol contract is signed, they will get the Million Dollars if they actually do what was required to pass the test, even if it was later discovered that their abilities could be understood by science. Randi indicated as much, during his interview with Dawkins, right after Dawkins' presentation. It's a legally binding contract, and if Randi (or his advisors) missed something that led to such an opening, then Randi would just have to eat it.
So far, he has not needed to.
Wowbagger
8th June 2009, 10:26 PM
From Suitbert Ertel:
I am well aware of Randi's million dollar campaign. There are several hurdles to overcome. One is Randi's demand of so-called precautions which, if fulfilled, intimidate the participants by artificial and, partly non-sensical procedures. Since psi effects can be demonstrated only under appropriate and fair conditions the precautions demanded by skeptics tend to turn into the opposite of precaution regarding effect-producing conditions.
Instead of complaining that the test is "too hard", you should focus on how you can overcome its hurdles.
What can you do, to make psi phenomena more reliable, up to the point of "self-evident"?
fls
9th June 2009, 09:01 AM
From Suitbert Ertel:
I am well aware of Randi's million dollar campaign. There are several hurdles to overcome. One is Randi's demand of so-called precautions which, if fulfilled, intimidate the participants by artificial and, partly non-sensical procedures. Since psi effects can be demonstrated only under appropriate and fair conditions the precautions demanded by skeptics tend to turn into the opposite of precaution regarding effect-producing conditions.
This I can say because I tried to run this test using a psi-gifted student (Silke) under the supervision of a German skeptic group (GWUP) in Würzburg. My test conditions were grossly altered by GWUP representatives, but I had to concur, otherwise this project would have been brought to a close. This has been documented in an article that I published in German:
Ertel, S. (2007). "Außersinnliche Wahrnehmung unter der Kontrolle organisierter Skeptiker." [Extrasensory perception under control of organized skeptics] Zeitschrift für Anomalistik 7(3): 236-269.
Despite authoritarian changes of the test conditions by GWUP (money is power!) and despite intimidations of participant Silke by additional subtle means, the observed hit score obtained in roughly two hours was very significant (p = .002). This, however, did not reach the skeptics' (Randi's and the GWUP's) effect size. Thus, Randi, a person uneducated in matters of statistics, and GWUP, who organized themselves without institutional control by science organizations, define which p-values are to be considered as highly significant irrespective of what the science community considers as highly significant.
In the year following the year of my first attempt with GWUP (pre-test for JREF’s million dollar test) I intended to make another attempt in Würzburg. But unfortunately Silke did not want to continue this project because of her bad first impressions the year before.
I hoped another very psi-gifted student would cooperate, but he refused to serve as participant in Würzburg at the last moment. He was afraid of endangering his scientific career if his extraordinary psi performance would become public.
One of the three Ukrainian participants (a boy of 10 years, Vanya is his nickname, he needed to travel with his mother) failed to get a visa in time. There were two other psi-gifted students left and I persuaded them, with much effort, to cooperate. Their mental conditions, however, were awful at that time (one was going through a divorce and the other was overstressed physically, overslept and almost missed the train). For this reason, most probably, and due to those unnatural suppressive GWUP conditions, they did not obtain hit surpluses above chance.
In future, I would like to let my Ukrainian boy Vanya be tested under my standard conditions because Vanya had shown extraordinary psi power under a skeptic's control (not GWUP, but also skeptical) and subsequently under my own control in Germany where I had invited him. But his increasing time problems (high school achievement goals etc.) and his mother's chronic health problems, together with visa problems (Germany raised obstacles because of masses of Ukrainians who were flooding Germany), have thwarted this idea. I asked GWUP whether they are aware of any skeptical organization in Ukraine so that Vanya could be tested by some skeptic in his home town, but GWUP could not help me in this regard. I myself will travel to Ukraine (Kiev) in August to test Vanya. For the first time in my psi research career, I obtained some financial support from an open-minded non-scientific company (4,000 $ has been awarded). I will conduct various tests with Vanya, also with his sick mother and his handicapped grandmother. But I would prefer some skeptic as experimenter.
I'm sorry. I did not realize that you had been through all this already or I would not have suggested it. I don't know if it would have helped, but it is always advised that if the procedure is changed (and it always seems to be to add controls) that you try it out before coming to the preliminary test. This may have told you that Silke was going to be affected and may have given you the opportunity to figure out how to overcome this. The other thing that I have noticed with applicants is that they don't give themselves enough of a chance to succeed - they choose a number of trials with a low power. It sounds like if you had chosen a larger number of trials, the requisite p-value may have been reached. But maybe you ended up with not much say in the matter.
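The point about choosing a number of trials with low power can be made concrete with a rough calculation. The sketch below is illustrative only. It assumes a chance hit rate of 0.2 (five ball numbers), a one-sided exact binomial test, the one-in-a-thousand preliminary threshold quoted earlier in this thread, and a purely hypothetical true hit rate of 0.25. The function names are invented for this example and do not come from the papers under discussion.

```python
from math import exp, lgamma, log

def binom_pmf(i, n, p):
    """P(X = i) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
               + i * log(p) + (n - i) * log(1 - p))
    return exp(log_pmf)

def upper_tails(n, p):
    """Return a list t with t[k] = P(X >= k) for k = 0 .. n + 1."""
    tails = [0.0] * (n + 2)
    for k in range(n, -1, -1):
        tails[k] = tails[k + 1] + binom_pmf(k, n, p)
    return tails

def critical_hits(n, p0, alpha):
    """Smallest hit count whose one-sided p-value under chance is <= alpha."""
    tails = upper_tails(n, p0)
    return next(k for k in range(n + 2) if tails[k] <= alpha)

def power(n, p0, p1, alpha):
    """Probability of reaching the critical hit count if the true hit rate is p1."""
    return upper_tails(n, p1)[critical_hits(n, p0, alpha)]

if __name__ == "__main__":
    # Assumed values: chance 0.2, hypothetical true rate 0.25, threshold 0.001.
    for n in (200, 500, 1000, 2000):
        print(n, critical_hits(n, 0.2, 0.001), round(power(n, 0.2, 0.25, 0.001), 3))
```

Under these assumed numbers, a session of 200 draws has a power well below one half, while 2000 draws gives a power above 0.9. The real figures depend entirely on the true hit rate, which is not known.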
However, I have said many times on the JREF forum that the Challenge cannot be thought of as a scientific test. The standards used are different, and it's not suitable as a form of investigation/exploration.
There may be people on the JREF forum who know of an organization or even an individual in the Ukraine that could test Vanya.
In order to avoid violations of precaution regarding fair conditions I would attempt to convince the skeptic to first agree with the ball test conditions as I define them. The skeptic should merely act as observer, a mini-video camera might also be used. My hypothesis is that the members of Vanya’s family would obtain highly significant hit scores, say p = .0001 within roughly 1-2 hours testing time and that the observer would not be able to explain the result.
Later on, the skeptical observer or some conjurer of his choice should take the ball test under the same conditions. I would again act as experimenter. My hypothesis is that the conjurer’s hit score would not deviate from chance. If the score did significantly exceed the level of chance expectation, though, I would expect to notice, within 1-2 hours of testing time, a trick that the conjurer applies. If I did not discover a trick, the conjurer would have to disclose, when the test is done, which trick he applied. He would also have to make it plausible that the members of the Kiev family used his own or some other trick. If the conjurer confessed under oath that he did not apply any trick, then I would engage him as another psi-gifted participant for further research.
I don't really understand the idea of asking someone to duplicate an effect through trickery. It shows that the effect could be done through trickery, but we already know these things. It makes more sense to me to set up conditions (and this is where a conjurer comes in handy) so that any forms of deception (deliberate or not) are difficult to impossible and simply allow an opportunity for psi to act. Also, one of the other JREF members had some suggestions about running a test with gradually stricter controls - if you did not see this, I will copy it and send it to you. If this has the effect of not allowing an opportunity for psi to act, then you will have found your control group conditions (i.e. conditions under which psi is absent) for further research. If psi is still present under strictly controlled conditions, then you have found a way to win the Challenge.
This comes back to a point I made earlier, though. Control groups should not consist of non-psi-gifted people making an attempt to accomplish what psi-gifted people do. Control groups should consist of psi-gifted people who are working under conditions where psi does not have an opportunity to act. For example, you could start with your psi-gifted students and randomly divide them into two groups. You would give both groups your sealed psi-pods and tell them to draw out one at a time, labelling them with the order in which they were drawn. You would choose an order to draw out the psi-pod numbers (preferably randomly from a few choices, like 1234512345... or 1111122222...) and tell one group the order and tell the other group nothing. Then you could see if there was any difference between the two groups.
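The two-group design described above could be analysed along the following lines. This is only a sketch under stated assumptions: participants are split at random into an "informed" and an "uninformed" group, each group's hits are pooled, and the two hit rates are compared with a standard two-proportion z-test using the normal approximation. The participant labels and function names are hypothetical.

```python
import random
from math import erf, sqrt

def assign_groups(participants, seed=None):
    """Randomly split participants into two groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Two-sided z-test for a difference between two hit rates."""
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (hits_a / n_a - hits_b / n_b) / se
    # Standard normal CDF via the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)

if __name__ == "__main__":
    informed, uninformed = assign_groups([f"P{i:02d}" for i in range(1, 21)], seed=1)
    print(informed, uninformed)
    # Hypothetical pooled counts: hits and trials per group.
    print(two_proportion_z(230, 1000, 205, 1000))
```

The random split matters more than the particular test: it keeps the experimenter from deciding, even unintentionally, who ends up in which group.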
A good question. Yes, I consider the behavior of those who reject discussions of my approach in public (in professional Journals) as detrimental to the study of psi. These seemingly critical members of the parapsychological community forego the chance to demonstrate or to let readers of their Journals demonstrate, by empirical evidence, that Ertel results are due to errors, sensory factors or even fraud. Ertel-contradicting results would even raise the reputation of the Parapsychological Association.
In what way would publishing your results be beneficial? Would the demonstration that your results are due to errors, sensory factors or even fraud teach us something new about psi research? Or is it that you expect your results to be replicated under more rigorous conditions? Would you consider performing that work yourself (replicating your results under more rigorous conditions) in order to engage the interests of others?
How do I view the parapsychologists who prefer methodological rigor rather than the use of post-caution? My impression, as a newcomer to the parapsychological community, is that in this community the fear of criticism by disbelievers, above all of organized skeptics, is a strong motivating force. Members of the Parapsychological Association tend to succumb to often irrational demands and contentions of their opponents and to ignore demands and insights of common sense and common logic.
An example: Psychologists are aware of the fact that some participants of their questionnaire studies do not adequately respond to their questionnaire items (response sets etc), some participants even cheat. Never has any scientist doubted that the Big 5 factorial dimensions of personality are real, that they are not due to fraudulent or other result-spoiling factors. If fraud or other result-spoiling influence occurred, with empirical evidence, in my ball test this test would have to be abolished at once. Even without such evidence, for which I have been searching in vain for a decade, the mere speculation that fraud and sensory leakage might occur leads “rational” scientists to keep my research at bay. Damien Broderick is an exception.
This is because the Big 5 has already gone through the type of testing that I earlier encouraged you to try. Tools such as the NEO PI-R have been tested to determine whether or not they are actually able to identify characteristics such as Agreeableness a priori compared to other ways of identifying these characteristics. Construct, content and criterion validity have been established. No such testing is available for psi - it can only be determined a posteriori as a deviation from chance expectation. And it lacks independent confirmation as to validity. For example, has it ever been established that telepathy would manifest itself in highly local card-guessing scenarios?
I think I understand the relevance of the distinction between Type I and Type II error in this context. I hope you agree that both errors must be considered. Excluding one of these two errors by all means increases the probability of an occurrence of the opposite error. We have to aim at a compromise between the two probabilities. Linda, don’t you disregard Type I errors?
I don't consider an alpha of 0.05 and a beta of 0.40 a "compromise". Also, if you are asking a question where the avoidance of Type II error is critical and that of Type I error is almost irrelevant, your alpha and beta really should reflect that question.
In parapsychological research, type I errors are diminished by looking at possible artifacts so as to avoid attributing effects to psi if artifacts are effective – a dangerous tendency of incautious believers.
Type II errors, on the other hand, are diminished by looking at possible genuine psi factors that might have caused effects while incautious psi-disbelievers tend to attribute them to ordinary (sensory or motor) factors.
Now you are talking about something else. You are talking about preventing the possibility of giving up a fruitful area of investigation too early. Yet, power analyses of parapsychological research show that the probability of Type II error is low when it comes to missing an obvious effect (and we are looking for something that should be obvious since our ideas about psi come from informal observation). However, that is not relevant to why I brought up Type II error. I was talking about missing the effect of bias (i.e. missing a small to very small effect, corresponding to the effect sizes you have measured), not missing the effect of psi.
Both parties, a priori believers and a priori disbelievers, are committed to support their beliefs, by evidence. An important point here is that prevailing beliefs of other members of one’s community are no safeguard. Prevailing beliefs in science may be wrong. Linda, you disbeliever need not exert much effort to gain acclaim from the majority of the science community. I myself and other researchers who disregard the sociological power of majorities are in a less happy situation.
Suitbert
Yet many other scientists have overcome this less happy situation simply by providing evidence. Instead of investing time and effort into repeating tests which non-believers do not find persuasive and then trying to find ways to manipulate the data a posteriori, why not simply invest your time and effort into developing adequate precautions which address the concerns of other scientists?
Linda
fls
11th June 2009, 09:21 AM
From Suitbert Ertel:
And realistically, this is a very important issue which deserves at least as much attention as the four pages you devote to the idea of "good misses", and which, once recognized, should have led to a change in procedure so that the bag opening was screened from the drawer. Instead you make a few guesses about how this would show up in other ways, tested those ideas (without any indication of the probability of Type II error), and concluded that it couldn't be so. Yet an equally valid conclusion from your failure to confirm any of your guesses is that your guesses were simply wrong to begin with.
I never noticed as experimenter, in thousands of trials, that participants
peeped into the bag before they picked a ball from the bag. Peeping into the
bag is cheating. I claim that I would have noticed in at least some such
cases that a participant cheated that way and I also claim that students are
not inclined to cheat in the lab and not motivated to cheat. Even if some
pathological or criminal participant existed who was clever enough to cheat
in a way that I did not notice, the bulk of hit surpluses in my data base
cannot overall be explained by cheating (peeping into the bag).
This I can say because I also tested some high scoring participants using
balls with ill-written numbers on them (very small, in disorder etc). Their
hit scores did not decline.
I tested high-scoring participants, high-scoring with Arabic numbers written
on the balls, with number words written on the balls. The increased
difficulty of reading what was written on the balls had no effects on their
hit scores.
I also asked some very successful participants to wear blinds, their hit
scores did not drop.
I asked some participants to try to cheat at home in such a way that I, as
experimenter, might gain the impression that they are psi-gifted. But they
should be careful, I told them, so that their fraud would not be noticed. In
the great majority of these cases (roughly ten cases, as far as I remember)
the hit scores were excessively high or showed typical indications of
non-randomness of their seemingly "drawn" numbers thus revealing their
manipulation.
In other words, since I did not find indications of such fraudulent
manipulation in the students' home test data I think I am entitled to
conclude that the bulk of even home test data of my participants is clean.
I mentioned some studies which you, Linda, would probably call control
group studies. I should remind you also of the recent psi pod studies that I
have started with written numbers under lids of small containers so that the
numbers cannot be seen even by attentively looking into the bag. When using
psi pods the hit scores of my Ukrainian participants did not drop.
Linda, you demand to change the procedure so that the bag opening is
screened from the drawer. In view of the just reported results which clarify
the peeping question I am not inclined to screen the bag opening from the
drawer. This would render the situation artificial, the participants would
feel that they are distrusted. We have to be careful to avoid type 2 errors.
On that occasion, Linda, I would like to ask you to be more careful with
your critical claims as long as you are unaware of the many control
experiments that I actually conducted. I am prepared to ask further
questions, but I think the time needed for refuting unnecessary and
unsubstantiated claims by true skeptical believers should be diminished.
If you redo your analysis using a consistent definition for a "good miss", such as a miss for #2 is #3 and #4, and a miss for #3 is #4 and #5, the significant differences disappear.
A good miss of one's choice is a choice which was "almost" a hit. When you
reach the desired goal a little too early or a little too late, when you
throw the ball a little too near or a little too far etc. you had "almost" a
hit. Your objection is arbitrary: "You cannot draw reliable conclusions from
data-dredging after the fact whereby you keep grouping numbers together in
various combinations until you find some groups that show a significant
difference."
At best, it can form the basis for exploratory research, but not if you can't even form a definition of a good miss a priori. How are you supposed to know, in advance, for each number, whether a good miss means the number on each side or the two numbers below or the two numbers above?
Among the possible numerical differences between hits (1,2,3,4,5) and four
misses two are smaller and two are larger. This I consider as a clear-cut
definition of "good misses". Why do you want to restrict good misses to
drawing the larger adjacent number? Even if you do that the effect does not
disappear, it is even stronger. By restricting good misses to drawing the
smaller adjacent number - why should one do that? - the effect is still
apparent, but less so. The good misses result is destroyed by your arbitrary
and untenable redefinition of a "good miss" ("almost hit").
Wouldn't the low light conditions simply make it harder to notice other ways of manipulating the draw, like simply palming balls drawn on prior rounds to serve as guesses on subsequent rounds? If you can't even see the number unless it is close to your eyes, how can you be sure he put the balls back in the bag?
You seem to deem it likely that Kannan had two tricks at his disposal. Did
you notice any such "palming balls" behavior in the filmed test session?
Certainly not. Participants are told by the instruction anyway to let the
just drawn ball bounce back into the bag.
Let me generalize: The general strategy of convinced skeptics is to explain
surprising observations by unsurprising processes or mechanisms of ordinary
science (psychology, physics) even if the processes or mechanisms claimed in
their objections are extremely improbable and sometimes even inconceivable.
Skeptics rarely feel challenged to put their improbable ordinary
explanations to empirical tests. In this regard, Linda, I occasionally
noticed in your comments hopeful deviations, at least insofar as you
suggested to me or to the parapsychological community to test your
conjectures. Why don't you do parapsychological experiments? Or are you
conducting such a project?
Suitbert
fls
11th June 2009, 09:27 AM
From Suitbert Ertel:
I never noticed as experimenter, in thousands of trials, that participants
peeped into the bag before they picked a ball from the bag. Peeping into the
bag is cheating. I claim that I would have noticed in at least some such
cases that a participant cheated that way and I also claim that students are
not inclined to cheat in the lab and not motivated to cheat. Even if some
pathological or criminal participant existed who was clever enough to cheat
in a way that I did not notice, the bulk of hit surpluses in my data base
cannot overall be explained by cheating (peeping into the bag).
This I can say because I also tested some high scoring participants using
balls with ill-written numbers on them (very small, in disorder etc). Their
hit scores did not decline.
I tested high-scoring participants, high-scoring with Arabic numbers written
on the balls, with number words written on the balls. The increased
difficulty of reading what was written on the balls had no effects on their
hit scores.
I also asked some very successful participants to wear blinds, their hit
scores did not drop.
I asked some participants to try to cheat at home in such a way that I, as
experimenter, might gain the impression that they are psi-gifted. But they
should be careful, I told them, so that their fraud would not be noticed. In
the great majority of these cases (roughly ten cases, as far as I remember)
the hit scores were excessively high or showed typical indications of
non-randomness of their seemingly "drawn" numbers thus revealing their
manipulation.
In other words, since I did not find indications of such fraudulent
manipulation in the students' home test data I think I am entitled to
conclude that the bulk of even home test data of my participants is clean.
Thank you for recognizing that this is an important issue. It may be useful to include details of this information in any subsequent articles you write, rather than providing this information in bits and pieces after much discussion.
I think that you can conclude that when asked to deliberately cheat, and when told in advance that they will not suffer consequences from their cheating, that they will cheat in ways that may or may not look different from less deliberate cheating. Your description does not sound like you were blind as to where to look for fraud (something which would be a necessary pre-condition in order to draw valid conclusions).
Can you provide references for the research, or show me the results of your own research which shows the extent to which words or small numbers cannot be read by the average participant (necessary information before valid conclusions can be drawn)? I hope that you are aware that blind-folds, unless applied in a specific manner, do not prevent people from seeing (especially at the angle which would be used for the ball-drawing). Did you tightly tape the bottom of the blindfolds around the nose and cheeks?
I attempted the following analysis from the results you have provided. It is possible that I misunderstood the information presented in the tables, since it was not designed for this type of analysis, so I apologize in advance if I am missing some key information.
It seems that the hit surplus (in the uncontrolled setting) is mostly accounted for by a small number of people. Using your data from table 1 (http://www.psych.uni-goettingen.de/home/ertel/ertel-dir/downloads/ertelchapterwithfigurespdf.pdf) only 3 people are responsible for almost all of the hit surplus (79% of the hit surplus). Of those 3, 2 were retested and one was not. Retesting of one of them (Katarina H.) led to chance results. Only after testing her over and over again, under conditions where some of the controls were dropped (e.g. video-taping), did her hits begin to climb past chance level. This may be consistent with her becoming comfortable with the test conditions, but it is also consistent with learning deception (not necessarily deliberate). We don't know how the one person responsible for more of the surplus hits than anyone else (Barbara F. was responsible for 33% of the hit surplus) would have done under controlled conditions. And you don't give the details of the retesting on the third person (Gabriela G.). So while you reference extensive testing for cheating, it is only relevant if that testing was performed on those three people. It didn't matter if you couldn't detect cheating in anyone else, because they didn't really contribute to the significance of your results.
You don't provide the same sort of details for the results under controlled conditions, however you do give the details of three people (http://www.parapsych.org/papers/38.pdf). And we can see that 2 of the people (Amelie J. and Silke T.) account for almost half (47%) of the hit surplus under controlled conditions for one set of experiments. And one person (Ahmed K.) accounts for 38% of the hit surplus under controlled conditions for the other set of experiments. Again, it looks like the bulk of your hit surplus is accounted for by a few people, so tests designed to look for cheating cannot show a difference if applied to anyone other than those few people. Can you provide those details for us? Is that the same Silke that was unable to perform above chance under the strictly controlled conditions for the GWUP prize?
I mentioned some studies which you, Linda, would probably call control group studies. I should remind you also of the recent psi pod studies that I have started with written numbers under lids of small containers so that the numbers cannot be seen even by attentively looking into the bag. When using psi pods the hit scores of my Ukrainian participants did not drop.
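The kind of tally behind figures such as "79% of the hit surplus" can be sketched as follows. The numbers here are invented for illustration and are not taken from either paper. The sketch assumes that each participant's hit surplus is hits minus trials times a chance rate of 0.2, and that a share is that surplus divided by the summed surplus over all participants.

```python
def hit_surplus(hits, trials, chance=0.2):
    """Hits above (or below) what chance alone would be expected to give."""
    return hits - trials * chance

def surplus_shares(records, chance=0.2):
    """Rank participants by surplus and report each one's share of the total.

    records maps a participant label to a (hits, trials) pair.
    The total surplus must be non-zero for shares to be meaningful.
    """
    surpluses = {name: hit_surplus(h, n, chance) for name, (h, n) in records.items()}
    total = sum(surpluses.values())
    ranked = sorted(surpluses.items(), key=lambda item: item[1], reverse=True)
    return [(name, s, s / total) for name, s in ranked]

def top_share(records, k, chance=0.2):
    """Combined share of the total surplus held by the k highest scorers."""
    return sum(share for _, _, share in surplus_shares(records, chance)[:k])

if __name__ == "__main__":
    # Hypothetical (hits, trials) data for six unnamed participants.
    data = {"P1": (95, 300), "P2": (80, 300), "P3": (74, 300),
            "P4": (62, 300), "P5": (59, 300), "P6": (61, 300)}
    for name, surplus, share in surplus_shares(data):
        print(name, round(surplus, 1), f"{share:.0%}")
    print("top 3 share:", f"{top_share(data, 3):.0%}")
```

When a few participants hold most of the surplus, checks for cheating are only informative if they were run on those same participants.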
Linda, you demand to change the procedure so that the bag opening is
screened from the drawer. In view of the just reported results which clarify
the peeping question I am not inclined to screen the bag opening from the
drawer. This would render the situation artificial, the participants would
feel that they are distrusted. We have to be careful to avoid type 2 errors.
On that occasion, Linda, I would like to ask you to be more careful with
your critical claims as long as you are unaware of the many control
experiments that I actually conducted. I am prepared to ask further
questions, but I think the time needed for refuting unnecessary and
unsubstantiated claims by true skeptical believers should be diminished.
You agree that this is important information if you went to the trouble to look for these sorts of things. Therefore, I don't understand why you are chastising me for wasting your time by asking for this same information. And I am sorry that I remain unconvinced until I have been provided with this information, but would you really be any different? If someone made a claim that had not been validated independently, would you not also ask for details and for the answers to critical questions before allowing yourself to be persuaded?
A good miss of one's choice is a choice which was "almost" a hit. When you reach the desired goal a little too early or a little too late, when you throw the ball a little too near or a little too far etc. you had "almost" a
hit. Your objection is arbitrary: "You cannot draw reliable conclusions from
data-dredging after the fact whereby you keep grouping numbers together in
various combinations until you find some groups that show a significant
difference."
I understand what you mean by a "good miss", but what you have done is sometimes counted when the ball was a little too near as a "good miss" and the ball was a little too far as an "other miss" and sometimes when the ball was a little too far as a "good miss" and the ball was a little too near as an "other miss", and it is this uneven counting that accounts for the "significant difference". If you always count the ball that was a little too far as a "good miss", that significant difference disappears. And you can't tell beforehand (i.e. before you start throwing the ball), whether you will be better off counting near or far balls as a "good miss". It is only by looking at the pattern after you have thrown the balls that you can pick out which particular combination of misses (near on trials 1, 2, 5 and far on trials 3, 4, for example) will give you a "statistically significant difference".
Among the possible numerical differences between hits (1,2,3,4,5) and four misses two are smaller and two are larger. This I consider as a clear-cut definition of "good misses". Why do you want to restrict good misses to
drawing the larger adjacent number? Even if you do that the effect does not
disappear, it is even stronger. By restricting good misses to drawing the
smaller adjacent number - why should one do that? - the effect is still
apparent, but less so. The good misses result is destroyed by your arbitrary
and untenable redefinition of a "good miss" ("almost hit").
Why is a good miss the larger adjacent number for one number, but not a good miss for another number? How can a good miss be the number on either side for one number, but it isn't a good miss for another number? My complaint isn't that I think there is only one possible definition for a good miss. My complaint is that there are many possible definitions for a good miss. And that not only do you not tell us how to choose between these various possibilities *a priori*, your choice isn't even consistent from one number to the next. How can I tell what is a "good miss" if it can be changed at will? Because if I change it, then your idea of "good misses" as a secondary effect disappears when it is discovered that a different combination of "good misses" does not lead to a statistically significant difference (and subsequently, one of your arguments against other explanations of the effects - forged data - disappears).
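To make the "many possible definitions" concern concrete, here is a rough count in Python. It is only a sketch under a stated assumption that does not come from the papers: a definition labels exactly two of the four possible misses as "good" for each called number, and the choice is made separately for each of the five numbers.

```python
from math import comb

NUMBERS = [1, 2, 3, 4, 5]
MISSES_PER_CALL = len(NUMBERS) - 1   # four ways to miss a called number
GOOD_PER_CALL = 2                    # assumed: two "good" and two "other" misses

# Ways to pick the "good" pair for a single called number
per_call = comb(MISSES_PER_CALL, GOOD_PER_CALL)

# The pair is chosen independently for each of the five called numbers
total_definitions = per_call ** len(NUMBERS)

print(per_call, total_definitions)
```

With that many candidate groupings, some will show a "significant" difference by chance alone, which is why the grouping has to be fixed before the data are inspected.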
You seem to deem it likely that Kannan had two tricks at his disposal. Did you notice any such "palming balls" behavior in the filmed test session? Certainly not. Participants are told by the instruction anyway to let the just drawn ball bounce back into the bag.
Are you serious? Why couldn't Kannan have a dozen tricks at his disposal? Why would he palm balls when peeking would do the trick? I've watched magicians very carefully up close and not detected palming when it clearly had to have happened. How could I detect palming from that video even if it were present?
Let me generalize: The general strategy of convinced skeptics is to explain surprising observations by unsurprising processes or mechanisms of ordinary science (psychology, physics) even if the processes or mechanisms claimed in their objections are extremely improbable and sometimes even inconceivable. Skeptics rarely feel challenged to put their improbable ordinary explanations to empirical tests. In this regard, Linda, I occasionally noticed in your comments hopeful deviations, at least insofar as you suggested to me or to the parapsychological community to test your conjectures. Why don't you do parapsychological experiments? Or are you conducting such a project?
Suitbert
Why do you characterize my objections as extremely improbable and sometimes even inconceivable *when you also admit to the possibility of peeking or other methods of cheating*?
When I am used to working with people who take care to address issues of validity or reliability in their own research, it irks me to see people asking others to do it for them.
Linda
fls
11th June 2009, 09:34 AM
From Damien Broderick:
Instead of investing time and effort into repeating tests which non-believers do not find persuasive and then trying to find ways to manipulate the data a posteriori, why not simply invest your time and effort into developing adequate precautions which address the concerns of other scientists?
This is my sentiment also, as an onlooker. But I can see the value,
for those who already feel convinced that psi is real, of
process-oriented research that seeks ways to enhance psychic
effectiveness. Obviously if that risks compromising the results with
unreliable or fraudulent data points, it might prove misleading or
time-wasting. But I can understand why using methods that don't
alienate or bore the purportedly "psi-gifted" would be worth trying,
then gradually tightening the conditions as the parameters come into
focus. Granted, the more slack there is in such methodologies, the
less an experimenter can complain if skeptics refuse to accept the
results as evidence of psi--but then that's the trade-off. I've never
met Suitbert nor seen his ball-drawing method applied (except on a
video) but I've always found him open to serious non-captious
critique, ready to close loopholes if possible or try variants that
seem more secure or more interesting. And it's obvious to me that his
test is more likely to elicit whatever evolved competences we have
for psi--the haptic character of feeling, touching, seizing, in
situations that are at least somewhat more engaging than the dreary
task of hitting keys or guessing cards, or even (for some) lying in
the half-dark trying to visualize a distant scene.
Damien
Ivor the Engineer
11th June 2009, 10:40 AM
And it's obvious to me that his test is more likely to elicit whatever evolved competences we have for psi--the haptic character of feeling, touching, seizing, in situations that are at least somewhat more engaging than the dreary task of hitting keys or guessing cards, or even (for some) lying in the half-dark trying to visualize a distant scene.
Yet casinos have not had to worry about vast swathes of gamblers having their psi-abilities fine-tuned and amplified because of the interest they have in the game they are playing.
Uncayimmy
11th June 2009, 01:40 PM
Thanks for the effort taken in this thread.
I finally got around to watching the video. The first thing that popped into my mind was that the balls were not shuffled adequately. He just kind of bounces the bag up and down. It doesn't appear that it was sufficient to cause the balls to switch positions very much. It's hard to imagine that a ball all the way on one side would make it all the way to the other.
If I wanted to cheat on that test, I would attempt to get the balls loaded into the bag so that the numbers were grouped. By that I mean the 1-balls would be in the left corner, the 2-balls next to them and so forth with the 5-balls being on the right hand side. When preparing the experiment, it would only be natural to group the balls by number first to make sure you have the right counts. Putting the balls in the bag as I describe would not be difficult at all if that happened. We didn't see the balls being loaded. And as we have seen, the "shuffling" was trivial at best.
Under this scenario, I would expect that when I call a number, I would draw that number or an adjacent number more frequently than numbers farther away because I would be reaching into the right general area. More importantly, I would also expect that 1-balls and 5-balls would be more likely to stay in place due to friction against the bag. The three remaining balls would probably shuffle around more. I haven't verified this through experiment, but it seems reasonable.
Thus, my prediction is that highest accuracy will be 1-balls and 5-balls. I also predict that when a 1-ball or 5-ball is called, I will be least likely to draw a ball from the opposite end. In other words I call a 1-ball and reach into the left side of the bag. Very few of the 5-balls on the right side of the bag will have migrated to the left due to poor shuffling. This means that when calling a 1-ball and reaching to the left, I would expect more 1-balls. I also expect the distribution of 2, 3, and 4-balls to show a slight bias in that order. And most importantly, I expect to get relatively few 5-balls. Of course, when calling 5-balls, I expect the same distribution in reverse. When calling the three balls in the middle, I would expect a slight bias for the adjacent balls.
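The poor-mixing idea can be explored with a toy simulation. This is a minimal sketch, not a model of the real bag: it assumes 10 balls per number laid out in a single row, treats "mixing" as a handful of random swaps between neighbouring balls, and assumes the drawer reaches into the region where the called number was loaded. All of those choices, and the names `hit_rate` and `BALLS_PER_NUMBER`, are illustrative assumptions.

```python
import random

BALLS_PER_NUMBER = 10   # assumed bag contents: 10 balls for each number 1-5


def hit_rate(n_swaps=None, n_trials=2000, seed=0):
    """Estimate the hit rate when the drawer reaches toward the called number.

    n_swaps: number of random adjacent swaps used as "mixing";
             None means a thorough shuffle of the whole bag.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # Balls loaded grouped by number: 1s on the left ... 5s on the right
        bag = [n for n in range(1, 6) for _ in range(BALLS_PER_NUMBER)]
        if n_swaps is None:
            rng.shuffle(bag)
        else:
            for _ in range(n_swaps):
                i = rng.randrange(len(bag) - 1)
                bag[i], bag[i + 1] = bag[i + 1], bag[i]
        called = rng.randint(1, 5)
        # Reach into the region where the called number was originally loaded
        start = (called - 1) * BALLS_PER_NUMBER
        drawn = bag[rng.randrange(start, start + BALLS_PER_NUMBER)]
        hits += drawn == called
    return hits / n_trials


poor = hit_rate(n_swaps=20)        # a few jostles of the bag
thorough = hit_rate(n_swaps=None)  # well-mixed bag
print(f"poor mixing: {poor:.2f}, thorough mixing: {thorough:.2f}")
```

Under these assumptions the thoroughly mixed bag should sit near the 20% chance level, while the barely mixed bag stays far above it. Whether a real cloth bag behaves anything like this row of positions would still need to be checked by experiment.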
I made these predictions without looking at the data. Let's see if I am right. Looking at the chart on page 17 of http://www.psych.uni-goettingen.de/home/ertel/ertel-dir/downloads/ertelchapterwithfigurespdf.pdf it looks like the results match my ad hoc prediction.
http://forums.randi.org/imagehosting/thum_281604a315cdf48ad2.jpg (http://forums.randi.org/vbimghost.php?do=displayimg&imgid=16650)
http://forums.randi.org/imagehosting/thum_281604a315ceda6bca.jpg (http://forums.randi.org/vbimghost.php?do=displayimg&imgid=16651)
http://forums.randi.org/imagehosting/thum_281604a315cf6e3ae7.jpg (http://forums.randi.org/vbimghost.php?do=displayimg&imgid=16652)
http://forums.randi.org/imagehosting/thum_281604a315d0807404.jpg (http://forums.randi.org/vbimghost.php?do=displayimg&imgid=16653)
http://forums.randi.org/imagehosting/thum_281604a315d1929a62.jpg (http://forums.randi.org/vbimghost.php?do=displayimg&imgid=16654)
It looks like my predictions for the 1-ball and the 5-ball were spot on. The middle balls show a slight bias, but I don't think there's enough data to conclude either way.
jasonpatterson
11th June 2009, 04:15 PM
It looks like my predictions for the 1-ball and the 5-ball were spot on. The middle balls show a slight bias, but I don't think there's enough data to conclude either way.
I'm not sure that that is what the data show at all. The scales on which you have graphed the data are deceptive to say the least. If you shrink any of the scales to the level that you have shrunk the 1 and 5 balls, they would look much more impressive, and the 'correct' choice would appear to have been chosen much more strongly. Basically, in each case the called ball was chosen about 400 more times than the next most frequent ball, and all of the remaining 4 balls were chosen about equally often. You've graphed 2, 3, and 4 on a scale that is 4000 high and 1 and 5 on a scale that is only 1000 high. Any differences on the smaller scale would be visually magnified four times. Sorry, this has always been a pet peeve of mine... Just looking at the data it would appear that people were most successful at calling the 2.
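The scaling point is easy to check with a few lines of Python. This is a small sketch using only the approximate figures mentioned in the post (a gap of about 400 draws, axes of 1000 and 4000); the function name `bar_height_fraction` is made up for illustration.

```python
def bar_height_fraction(count, axis_max):
    """Fraction of the plot height that a bar of `count` occupies."""
    return count / axis_max


difference = 400     # approximate gap between the called ball and the rest
small_axis = 1000    # axis height used for the 1-ball and 5-ball charts
large_axis = 4000    # axis height used for the 2-, 3- and 4-ball charts

gap_small = bar_height_fraction(difference, small_axis)
gap_large = bar_height_fraction(difference, large_axis)
magnification = gap_small / gap_large

print(gap_small, gap_large, magnification)
```

The same gap of 400 draws fills a much larger share of the chart on the 1000-high axis, so charts meant to be compared side by side need a common axis.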
Uncayimmy
11th June 2009, 04:59 PM
I'm not sure that that is what the data show at all. The scales on which you have graphed the data are deceptive to say the least. If you shrink any of the scales to the level that you have shrunk the 1 and 5 balls, they would look much more impressive, and the 'correct' choice would appear to have been chosen much more strongly. Basically, in each case the called ball was chosen about 400 more times than the next most frequent ball, and all of the remaining 4 balls were chosen about equally often. You've graphed 2, 3, and 4 on a scale that is 4000 high and 1 and 5 on a scale that is only 1000 high. Any differences on the smaller scale would be visually magnified four times. Sorry, this has always been a pet peeve of mine... Just looking at the data it would appear that people were most successful at calling the 2.
Mea culpa. It's a peeve of mine too, yet when I use Excel to make a chart, I always forget to check their automatic scaling. I will post revised charts in a little bit.
Uncayimmy
11th June 2009, 05:55 PM
These are the revised charts with the scale corrected, along with a chart showing the percentage each ball was judged correct versus the percentage each ball was called. Also, I didn't make it clear that these numbers were for all the tests, not just the one in question. We have no idea what pattern, if any, the kid in the video used.
http://forums.randi.org/imagehosting/thum_281604a3198d7c9694.jpg (http://forums.randi.org/vbimghost.php?do=displayimg&imgid=16661)
http://forums.randi.org/imagehosting/thum_281604a3198e130955.jpg (http://forums.randi.org/vbimghost.php?do=displayimg&imgid=16662)
http://forums.randi.org/imagehosting/thum_281604a3198e916aa3.jpg (http://forums.randi.org/vbimghost.php?do=displayimg&imgid=16663)
http://forums.randi.org/imagehosting/thum_281604a3198ef93650.jpg (http://forums.randi.org/vbimghost.php?do=displayimg&imgid=16664)
http://forums.randi.org/imagehosting/thum_281604a3198f6a1af6.jpg (http://forums.randi.org/vbimghost.php?do=displayimg&imgid=16665)
http://forums.randi.org/imagehosting/thum_281604a319900183dc.jpg (http://forums.randi.org/vbimghost.php?do=displayimg&imgid=16666)
Tricky
11th June 2009, 08:28 PM
I've put a number of posts into "unapproved" status because there may be some issues here. I hope to have this resolved quickly with input from more knowledgeable Mods/Admins. Sorry about the inconvenience.
fls
11th June 2009, 09:43 PM
I did ask Dr. Ertel for his okay to post his e-mails here, and he said yes. But he made an offhand remark just now that looks like he is under the impression that his subsequent e-mails haven't been posted. I asked Tricky to remove most of my posts until I have clarified this with Dr. Ertel, as I don't want to make public anything that he intended to be private. I apologize to Dr. Ertel, and to Tricky for making him go through the trouble of fixing this (although, that is why he is paid in big bucks).
Linda
Pixel42
12th June 2009, 02:58 AM
Is there any particular reason why Dr Ertel doesn't just post here himself? :confused:
Mashuna
12th June 2009, 03:12 AM
Yet casinos have not had to worry about vast swathes of gamblers having their psi-abilities fine-tuned and amplified because of the interest they have in the game they are playing.
The casinos know all about the psi effect, and deliberately use psi-dampening lights, or feng shui, or noise, or something else so that it doesn't work.
Probably.
steenkh
12th June 2009, 03:43 AM
Is there any particular reason why Dr Ertel doesn't just post here himself? :confused:
In that case he would have to deal with many opponents, few of whom are as worthy or as polite as Linda.
fls
12th June 2009, 04:51 AM
I have heard from Dr. Ertel and he wishes for his emails to remain here, publicly accessible.
Phew. :)
I have linked to some of our posts here thinking it might get him interested in the rest of the forum. It just may not be his cup of tea. I think the conversation would be quite different if it was conducted here instead of through e-mail. It's an interesting experience, so far, so you'll have to forgive me if I selfishly don't encourage him to join up and post himself. :)
Linda
Tapio
12th June 2009, 05:21 AM
I think the conversation would be quite different if it was conducted here instead of through e-mail. It's an interesting experience, so far, so you'll have to forgive me if I selfishly don't encourage him to join up and post himself. :)
I would totally encourage that "selfish non-encouraging". Although I'm just a puny newbie here, I've seen enough brilliant sprouts of discussion be trampled, roasted, and made sure to be never brought up again (at least by those originally "planting the seeds") because of some members not capable of keeping it civil*. Your ability to hold this conversation in a friendly, yet challenging and educating context is truly something we can all learn from. Respect!
*Or maybe it's just the "shyness effect" (http://www.skepdic.com/shynesseffect.html) ;)
Ivor the Engineer
12th June 2009, 09:16 AM
<snip>
Your ability to hold this conversation in a friendly, yet challenging and educating context is truly something we can all learn from. Respect!
<snip>
It is interesting to read Linda's analysis of the experiments and the problems she has found with them. However...
Unfortunately I think what we will learn is it is no more effective than being polite for a while, then being blunt.
fls
13th June 2009, 10:30 AM
> Linda:
> It may be useful to include details of this information in any subsequent
> articles you write, rather than providing this information in bits and
> pieces after much discussion.
>
> Suitbert:
> Sure
>
> Linda:
> I think that you can conclude that when asked to deliberately cheat, and
> when told in advance that they will not suffer consequences from their
> cheating, that they will cheat in ways that may or may not look different
> from less deliberate cheating. Your description does not sound like you
> were blind as to where to look for fraud (something which would be a
> necessary pre-condition in order to draw valid conclusions).
>
> Suitbert:
> I asked these students later to describe in detail how they tried to
> deceive me.
How is that relevant to what I said? The issues I raised were, how would
you know that less deliberate deception would look the same as deliberate
cheating, and how would you know that you would discover cheating if you
didn't know in advance which students' data might show signs of cheating?
> Linda:
> Can you provide references for the research, or show me the results of your
> own research which shows the extent to which words or small numbers cannot
> be read by the average participant (necessary information before valid
> conclusions can be drawn)?
>
> Suitbert:
> Of all this I will give an account in a future monograph
>
> Linda:
> I hope that you are aware that blind-folds,
> unless applied in a specific manner, do not prevent people from seeing
> (especially at the angle which would be used for the ball-drawing). Did you
> tightly tape the bottom of the blindfolds around the nose and cheeks?
>
> Suitbert:
> No I did not. So individuals with cheating intentions and pertinent training
> might have deceived me. But the question is not whether some individual
> might have cheated me successfully without being detected. The question is
> whether the majority or a considerable minority of participants were skilled
> deceivers so that they could raise hit scores of the total sample above
> chance, even despite wearing blindfolds. Don't you think this question would
> be answered by the great majority of rational people: Certainly not, skilled
> deceivers in such test situations, if existent at all, would be rare
> exceptions.
As I pointed out, you only needed 3 people to be deceivers under
uncontrolled conditions, and none of them needed to be skilled deceivers
since you didn't test any of them (if I interpret your statements correctly)
for deception under controlled conditions. Your data under controlled
conditions showed far fewer hit surpluses and again seemed to come mainly
from a few people who logged the bulk of those hit surpluses. The question
*is* whether or not a few individuals might have cheated you successfully,
because that's all it would take to account for your hit surpluses.
> Linda:
> I attempted the following analysis from the results you have provided. It
> is possible that I misunderstood the information presented in the tables,
> since it was not designed for this type of analysis, so I apologize in
> advance if I am missing some key information.
>
> It seems that the hit surplus (in the uncontrolled setting) is mostly
> accounted for by a small number of people. Using your data from table 1
> (http://www.psych.uni-goettingen.de/home/ertel/ertel-dir/downloads/ertelchapterwithfigurespdf.pdf)
> only 3 people are responsible for almost all of the hit surplus (79% of the
> hit surplus). Of those 3, 2 were retested and one was not. .....
>
> Suitbert:
> All 25 participants listed in Table 1 had significant hit surpluses
> individually, 12 hits per run was expected by chance, number 25 of this rank
> order of participants (Michael L.) obtained 16 hits on average across four
> runs which is still very significant. So I think your question is no longer
> relevant.
But you had 1294 hit surpluses and Michael L. only contributed 16 hits to
that surplus, whereas Barbara F. contributed 423 hits to that surplus. It
doesn't matter whether or not Michael L., or the other 21 people on that
list who contributed only a handful of hits to that surplus, cheated,
because it wouldn't have led to a significant amount of excess hits. It
only really matters if the 3 top contributors cheated, because if you take
away their hit surpluses, you are left with results that aren't particularly
remarkable.
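The concentration of the surplus can be put in numbers using only the figures quoted in this exchange (a total surplus of 1294 hits, 423 from Barbara F., 16 from Michael L., and 79% from the top three). The following is a small illustrative sketch; the helper name `share_of_surplus` is hypothetical.

```python
def share_of_surplus(surplus, total):
    """Fraction of the total hit surplus contributed by one participant."""
    return surplus / total


total_surplus = 1294      # total hit surplus in Table 1, as quoted above
contributions = {
    "Barbara F.": 423,
    "Michael L.": 16,
}

for name, surplus in contributions.items():
    share = share_of_surplus(surplus, total_surplus)
    print(f"{name}: {surplus} hits = {share:.1%} of the surplus")

# Surplus left for the other 22 participants if the top three supply 79%
top_three_share = 0.79
remaining = total_surplus * (1 - top_three_share)
print(f"remaining surplus: about {remaining:.0f} hits across 22 people")
```

On these figures a single participant supplies roughly a third of the surplus, so the question of whether the top few contributors were tested under controlled conditions carries most of the weight.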
> Linda:
> You don't provide the same sort of details for the results under controlled
> conditions, however you do give the details of three people
> (http://www.parapsych.org/papers/38.pdf). And we can see that 2 of the
> people (Amelie J. and Silk T.) account for almost half (47%) of the hit
> surplus under controlled conditions for one set of experiments. And one
> person (Ahmed K.) accounts for 38% of the hit surplus under controlled
> conditions for the other set of experiments. Again, it looks like the bulk
> of your hit surplus is accounted for by a few people, so tests designed to
> look for cheating cannot show a difference if applied to anyone other than
> those few people.
>
> Suitbert:
> The tests designed to look for deliberate secret cheating were conducted
> with participants who obtained hit scores on chance levels under ordinary
> home test conditions.
That statement doesn't make any sense. Is there a typo in there? You
aren't worried about cheating in those people who are performing at chance
levels, so of course tests looking for cheating will be negative. Also you
stated earlier that it was high-scorers that you tested with some measures
you guessed would reduce the number of hits if they were cheating.
> Linda:
> Is that the same Silke that was unable to perform above chance under the
> strictly controlled conditions for the GWUP prize?
>
> Suitbert:
> No, this is a different Silke.
>
> Linda:
> You agree that this is important information if you went to the trouble to
> look for these sorts of things. Therefore, I don't understand why you are
> chastizing me for wasting your time by asking for this same information.
> And I am sorry that I remain unconvinced until I have been provided with
> this information, but would you really be any different? If someone made a
> claim that had not been validated independently, would you not also ask for
> details and for the answers to critical questions before allowing yourself
> to be persuaded?
>
> Suitbert:
> Please read carefully what I wrote:
> "I am prepared to ask further questions, but I think the time needed for
> refuting unnecessary and
> unsubstantiated claims by true skeptical believers should be diminished."
> What I disapprove of is making unsubstantiated claims without asking
> questions. Unsubstantiated claims are claims like
> " you can't even form a definition of a good miss a priori"
What I said was, "At best, it can form the basis for exploratory research,
but not if you can't even form a definition of a good miss a priori. How are
you supposed to know, in advance, for each number, whether a good miss means the number on each side or the two numbers below or the two numbers above?"
How is that an unsubstantiated claim? Do you not agree that testing
hypotheses depends upon being able to form the hypothesis a priori?
> " The practical problem with your idea is that it has been demonstrated to
> be wrong."
And then I proceeded to give you examples where the idea had been
demonstrated to be wrong. You didn't address my claim. Specifically, you
did not discuss my claim, nor did you discuss whether or not my explanations
and examples substantiated my claim.
> " .. the reason for the increased hit rate is chance and mundane sensory
> input, such as was demonstrated by the Indian boy ..."
What I said was, "If the reason for the increased hit rate is chance and
mundane sensory input, such as was demonstrated by the Indian boy..."
How can you say this is unsubstantiated when you also agreed that it looked
like Kannan could peek into the bag?
> Linda:
> I understand what you mean by a "good miss", but what you have done is
> sometimes counted when the ball was a little too near as a "good miss" and
> the ball was a little too far as an "other miss" and sometimes when the ball
> was a little too far as a "good miss" and the ball was little too near as an
> "other miss", and it is this uneven counting that accounts for the
> "significant difference".
>
> Suitbert:
> You apparently misunderstood what I wrote, admittedly I wrote it in a way
> that may be misunderstood without further thought. I wanted to give examples
> for "almost hits" in everyday life: one might miss a goal by throwing, say,
> a dart, a little too far, a little too near etc. I could have added "a
> little too left, a little too right" etc.
>
> In the ball test situation, if drawn numbers are "a little too large or a
> little too small" they are "almost hits" or "good misses". So for numbers 2,
> 3 and 4 this stricter definition applies straightforwardly.
>
> In order to include misses for numbers 1 and 5 in a similar way, I made the
> distinction between two "hit-near" misses and two "hit-distant" misses. If
> 1 is called then drawn numbers 2 and 3 are nearer to 1 than drawn numbers 4
> and 5. If 5 is called, then drawn numbers 4 and 3 are nearer to 5 than drawn
> numbers 2 and 1. If a good miss is defined by "the two hit-nearer numbers"
> then neighboring numbers for called 2, 3, and 4 are classified as "good
> misses" as before and the two "nearer misses" for called 1 and 5 are
> included.
>
> I decided to extend the definition because I did not want to exclude 1 and 5
> as calls being afraid that skeptics would take issue with excluding 1 and 5,
> and I felt justified to do that because including 1 and 5 did not increase
> the "good misses" effect. So critics could not suspect that I made this
> decision because I wanted to take advantage of a result that I discovered
> post hoc.
I understand why you picked that particular pattern, but what you have not
discussed is whether or not that particular pattern adequately represents
the underlying idea. What you are saying is that #3 is a "good miss" for #1
and for #5, but #1 and #5 are not "good misses" for #3. Regardless of
whether or not your idea about misses is correct, it doesn't look like you
have accurately represented your idea by your choices.
Also, what happens if you apply your idea about secondary patterns to other
data? If near misses are more likely, and drawing #3 counts as a near miss
on every draw, shouldn't #3 be drawn the most often on the data you provided
earlier where you asked the participants to draw the balls out in the
following order - 1234512345...? What about the "preference paradox" you
mention whereby you have higher hit proportions with less preferred calls?
The "good miss" data contradicts this secondary pattern. How do you account
for that?
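The two observations in this reply (that #3 is a "good miss" for #1 and #5 but not the reverse, and that #3 counts as a good miss for every other call) can be checked mechanically. The sketch below encodes the "two hit-nearer numbers" rule as described in the quoted e-mail; the function and variable names are illustrative only.

```python
NUMBERS = [1, 2, 3, 4, 5]


def good_misses(called):
    """The two non-hit numbers closest to the called number."""
    others = [n for n in NUMBERS if n != called]
    # Sort the misses by distance from the call and keep the nearest two.
    return set(sorted(others, key=lambda n: abs(n - called))[:2])


table = {called: good_misses(called) for called in NUMBERS}

# Pairs (a, b): a is a good miss when b is called, but not the other way round
asymmetric = sorted(
    (a, b) for b in NUMBERS for a in table[b] if b not in table[a]
)

# Numbers that count as a good miss for every other called number
always_good = [
    n for n in NUMBERS if all(n in table[c] for c in NUMBERS if c != n)
]

for called in NUMBERS:
    print(called, sorted(table[called]))
print("asymmetric pairs:", asymmetric)
print("good miss for every other call:", always_good)
```

Under this rule the relation is not symmetric around the middle of the range, and the number 3 is favoured whatever is called, which is the pattern the questions above are asking about.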
I can find secondary patterns in medical studies as well. One of the more
famous examples is the pattern found in the ISIS-2 trial that while aspirin
improved outcomes in acute heart attacks for most, Geminis and Libras did
worse. The point is not whether you *can* find secondary patterns, the
point is whether or not one can draw valid and reliable conclusions when one
is found.
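A quick calculation shows why a secondary pattern on its own is weak evidence. This is a sketch under simplifying assumptions that are not taken from the trial report: twelve independent subgroup tests (one per star sign), each run at a 5% significance level, with no real effect in any subgroup.

```python
def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance that at least one of n independent null tests looks significant."""
    return 1 - (1 - alpha) ** n_tests


p = prob_at_least_one_false_positive(12)
print(f"{p:.0%} chance of at least one spurious 'significant' subgroup")
```

Under these assumptions a spurious subgroup finding turns up almost half the time, so a pattern found after the fact needs to be predicted in advance and replicated before it can support any conclusion.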
> Since the "good misses" effect has been replicated by two undergraduate
> students of Professor Chris French (see my earlier message of today), I
> regard this phenomenon as one of the most compelling indications that psi
> processes exist.
I wasn't able to open the file that you sent (my computer is limited in its
capabilities). Can you describe the experiment?
> Objections to my ball test procedure referring to lack of
> precautions, thus to not excluding increased hit probabilities by sensory
> and fraudulent support do not apply at all in this case.
>
>
> Linda:
> My complaint is that there are many possible definitions for a good miss.
> And that not only do you not tell us how to choose between these various
> possibilities *a priori*, your choice isn't even consistent from one number
> to the next. How can I tell what is a "good miss" if it can be changed at
> will? Because if I change it, then your idea of "good misses" as a
> secondary effect disappears when it is discovered that a different
> combination of "good misses" does not lead to a statistically significant
> difference (and subsequently, one of your arguments against other
> explanations of the effects - forged data - disappears).
>
> Suitbert:
> How could you believe that I was so inconsistent and stupid. Couldn't you
> rather think that you needed some more information, or reread what I wrote
> on Monday?
My statements are based on what you wrote. As I mentioned above, how is it
consistent to propose that #3 is a "good miss" for #1, but #1 is an "other
miss" for #3? I don't think you did this because you were stupid, but
because it was convenient when your statistical analysis demands the equal
formation of groups. You even admit to me above that your choices were at
least partly based on convenience, rather than adherence to theory.
> Linda:
> Are you serious? Why couldn't Kannan have a dozen tricks at his disposal?
> Why would he palm balls when peeking would do the trick? I've watched
> magicians very carefully up close and not detected palming when it clearly
> had to have happened. How could I detect palming from that video even if it
> were present?
>
> Suitbert:
> This reminds me of Lenin's and other authoritarian leaders' principle "Trust
> is good, control is better" and of their rigorous means of enforcing the
> deemed truth of their ideology. Apparently, you do not only distrust Kannan,
> you even do not trust me and my ability and inclination to gather all
> relevant information. Should I give you now a detailed account of my
> experience with Kannan, his enthusiastic wish to have my company, of his wish
> to show me the temples, of his pious Brahmin family, of the sincerity of his
> father and grandfather who prayed PUJA every day in their home, and much
> more? All such background information about people plays a role for
> developing trust in them - or distrust, it depends, but in this case there
> was no experience leading to distrust. But you, Linda, ignorant of all this,
> you apparently blame me for not considering that Kannan might have known a
> number of tricks as practiced by magicians. Are YOU serious, I might ask
> you. Will you not rather try to maintain a little more trust in me as
> scientist? Haven't I provided sufficient background information relevant
> for assessing trustworthiness?
I'm sorry, but your description makes your evaluation even less reliable.
Personal involvement with a participant will influence your judgement. As a
physician, I have to recognize that my personal involvement with a patient
is grounds for severing the patient-physician relationship, because I can no
longer be trusted to provide them with impartial and adequate care.
Physicians have been reprimanded and have lost their license (depending upon
the actual circumstances) for attempting to maintain a professional
relationship in addition to the type of personal relationship you describe.
In particular, you have provided Kannan with what may be an overwhelming
incentive to fudge. Even if it isn't true (and can you guarantee that it
isn't?), the fulfillment of Kannan's enthusiastic wish to have your company,
to show you the temples and his family, depends upon his ability to hold
your attention with his performance. How can that not influence the extent
to which he ensures that his performance is remarkable?
I'm sorry that these things have to be taken into consideration. It would
be nice to think that we can be trusted to act as impartial judges and not
to be fooled by our desires and beliefs, but we already know that it is
foolish to make this assumption.
Linda
fls
14th June 2009, 06:50 AM
Suitbert:
In a crucial experiment, the hypothetical presence of leaking tactile information was examined by letting 20 participants touch and probe, using one hand only, one ping pong ball with numbers written on its surface and, using the other hand, a blank ball, both balls being presented in opaque bags. Despite such favorable conditions for tactile perceptions, the participants were [unable to tell, above chance, which bag, left or right, contained the numbered ball].
Damien:
This is rather baffling. If they can usually tell "above chance" which number is on a numbered ball, why can't they distinguish numbered from unnumbered balls?
Suitbert:
Yes, I was surprised because my hypothesis was only that the participants could not draw any advantage from increased sensory input. The fact that their result was poorer, i.e. random, and that the result under standard ball test conditions was significantly larger, requires an explanation.
My tentative explanation is that psi effects are diminished by mental sets favoring an application of the human senses and rational and analytical tools for making distinctions in one's material environment while psi effects are more likely to occur with mental sets favoring an application of human intuitions and less rational, more holistic and more dreamlike attitudes towards the world and its experience. One is reminded here of the series of extremely strong psi effects under the Maimonides dream conditions and of the apparently psi-favorable mental conditions of participants in the ganzfeld which tends to keep them back from applying their senses and from rational, analytical tools for making distinctions in their environments.
I am considering conducting more experiments under these two apparently bipolar or opposed mental conditions using psi-gifted participants. I would hope to replicate hit-diminishing effects by increased analytical, rational, and sensory approaches of participants to their test material and their environments. Until now, parapsychologists have looked for increased psi effects only by cultivating one side of the mental "coin", including meditation, yoga-like states of the human mind, etc. The other side with possibly reverse effects has been neglected.
This idea makes sense. If one looks at research, like this:
http://www.psych.ubc.ca/~tchandy/Source%20materials%20for%20309a/Schooler%20(1993).pdf
where interrupting problem-solving with verbalization impairs insight problem-solving, but not non-insight problems. And we know from fMRI studies that different areas of the brain are used for reasoned vs. intuitive decision-making.
I've wondered whether the verbalization and subsequent review in the ganzfeld studies actually impairs the judging process. Even if one tries to agree that the tiny effect found on the ganzfeld studies is psi, the ganzfeld obviously does a poor job of detecting it.
Linda
EGarrett
14th June 2009, 07:11 AM
I watched the google video.
If you let me call the numbers AND "mix" the bag myself, I'll bet I can do even better than that kid.
Wowbagger
14th June 2009, 09:14 AM
These are the revised charts with the scale corrected along with a chart showing the percentage each ball was judged correct versus the percentage each ball was called. Not that it matters much, but I tend to prefer 2D charts for science work. They are much less "deceptive looking" than 3D charts.
But, you don't have to redo them, again. We get the picture. That was just some advice for the future.
Wowbagger
14th June 2009, 09:18 AM
Quoted material is from Suitbert. Unquoted is my response.
Suitbert's reply to Wowbagger:
Your idea of increasing control isn't bad, but not yet good enough because
in your protocol increase of control is correlated with increase of
psi-inhibiting factors. You skeptics are generally unaware of a psi
researcher's obligation to optimize, for any psi test, the psi-conduciveness
of the experimental conditions. Psi inhibiting is a participant's
experiencing physical or mental strain at carrying out the required task. In
addition, a tense, suspicious, distrusting social environment, screens and
other vision-protecting devices including video cameras have psi-inhibiting
effects.
Admittedly, my protocols were meant to test my own hypothesis more so than yours: that the results of psi research will only be viewed as significant by those who choose to believe in it.
How do you know what conditions are more conducive to psi, if you do not yet have any idea what the mechanism or nature of psi really is?
In the eyes of mainstream science, they will see your response as just making excuses for holes in the controls, whether you think that is fair or not.
If you want to impress them, you have to show them something new and plausibly reliable. That means building a hypothesis that will strengthen the ability to demonstrate the powers. (And, by extension, will also empirically explain why certain conditions are conducive to it, or not.)
However, in the eyes of other parapsychologist researchers, you are probably doing a fine job. So, keep up the good work!
So I would rather suggest a protocol with two independent series of
experiments, one series with decreasing psi-conduciveness across test
repetitions - but with constant and sufficient control, and another series
with increasing control but with constant and sufficient psi-conduciveness.
All tests should be conducted with participants selected as successful in my
standard ball test.
Sounds good, so far. You can also have a third group that starts with low conduciveness, which then increases across test repetitions.
Or, better yet, have several groups, each with an isolated factor being tested. Have one group with video cameras perfectly visible in one round, and completely hidden in the other. Have another group be watched over by "suspicious skeptics" in one round, and keep the skeptics out, in the other. Etc.
Though, I still don't understand how things like video cameras would affect psi ability.
My prediction is that hit scores decrease with decreasing psi-conduciveness
of the testing conditions and that hit scores are significant and remain so
without change despite increasing control across conditions.
Constant control for the first test series should be conducted without the
participant's being aware that control is optimal throughout, and the second
test series should be conducted without the participant's being aware that
across test repetitions the experimenter's control increases. Both protocols
are difficult to put into practice, but some experimental variance of
control and variance of psi-conduciveness has already been put into practice
in a wide-ranging ball test series.
I conducted quite a few studies aiming at finding out whether hit surpluses
in the ball test were due to ordinary factors (due to easily accessible
sensory or mnestic information instead of to unexplainable psi factors). I
gave an account of this offline to Linda. I did not find indications of
increased hit scores due to improved sensory information. See the abstract
of an as yet unpublished study below.
What I did find were decreased hit scores due to psi-inhibiting
circumstances.
Suitbert
I don't have much else to say on the rest of the response. I figured I would post it all, in case it was of interest to anyone else.
P. S. Article:
Psi effect or sensory leakage: A crucial experiment probing the ball test.
Abstract
Mainstream parapsychologists have been reluctant to welcome the author's
ball drawing test which, in his view, is a helpful tool for assessing psi
ability. Observers adhering to traditional methodology believed it to be
flawed by insufficient precautions that should prevent sensory leakage. An
overview of research shows that precautionary procedures, suffering from
loss of psi-conducive conditions, may be effectively replaced with
post-cautionary procedures that preserve favorable conditions. In a
crucial experiment, the hypothetical presence of leaking tactile information
was examined by letting 20 participants touch and probe, using one hand
only, one ping pong ball with numbers written on its surface and, using the
other hand, a blank ball, both balls being presented in opaque bags. Despite
such favorable conditions for tactile perceptions, the participants were
unable to tell, above chance, which ball in the bag, left or right,
contained the ball with the number script. They were also unable to tell
which number was written on that ball. The number of hits under standard
ball test procedures whose conditions are less susceptible to sensory
leakage was significantly larger than the number of hits under conditions
favoring sensory leakage. The ball test may therefore be recommended as an
acceptable, objective, efficient and easy tool for selecting psi-gifted
individuals.
There are other forms of sensory leakage than tactile information on the balls. But this is a good contribution. (And, of course, explaining psi would be a better one.)
fls
14th June 2009, 09:53 AM
Suitbert:
Linda, I would be happy if the remaining controversial issues of our
discussion would be clarified one by one. In the present mail I restrict
myself to the good misses issue. I maintain that a good misses effect
occurred in the Göttingen students' data and that it was replicated in the
London students' data.
Can you provide the London students' data?
Suitbert:
The reasoning underlying the good misses idea is as follows: A participant
missing the desired number draws a non-desired number. Of the four
non-desired (wrong) numbers each should have the same probability to be
drawn, if randomness alone would determine the participant's number
selections.
It turned out, however, that when the participants missed desired numbers
they tended to draw numbers more frequently which were less distant from
the desired number compared with numbers which were more distant from the
desired number.
For targets 2, 3, and 4, two wrong numbers have distance 1 (they are less
distant) and two wrong numbers have distance 2 and/or 3 (more distant). For
targets 1 and 5, two wrong numbers are again less distant and two are more
distant, even though less distant in this case means difference 1 and 2 and
more distant means difference 3 and 4.
I do not find any fault in this simple arithmetic, do you?
Yes. As I said before, it is not consistent. If it were a consistent representation of your idea, you would be comparing numbers which have distance 1 with numbers which have distance 2 or more for all cases.
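To make the disputed definition concrete, here is a minimal Python sketch of the rule as Suitbert describes it: for each target, the two wrong numbers nearest to it count as "good" misses and the other two as "bad" misses. The function names are made up for illustration and are not taken from his papers.

```python
# Sketch of the "good miss" rule as described in the quoted email.
# Function names are illustrative assumptions, not from the original study.
NUMBERS = [1, 2, 3, 4, 5]

def good_misses(target):
    """Return the two wrong numbers nearest to the target."""
    wrong = [n for n in NUMBERS if n != target]
    wrong.sort(key=lambda n: abs(n - target))
    return sorted(wrong[:2])

def is_good_miss(target, drawn):
    return drawn in good_misses(target)

for target in NUMBERS:
    good = good_misses(target)
    distances = sorted(abs(n - target) for n in good)
    print(f"target {target}: good misses {good}, distances {distances}")

# A distance of 2 counts as "good" for targets 1 and 5, but as "bad" for target 3.
print(is_good_miss(1, 3), is_good_miss(5, 3), is_good_miss(3, 1), is_good_miss(3, 5))
# prints: True True False False
```

The last line is the point under dispute: the same distance is classified differently depending on the target.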
Suitbert:
You said
"What you are saying is that #3 is a "good miss" for #1 and for #5, but #1
and #5 are not "good misses" for #3."
For inattentive readers this "but" clause sounds as if you would point at
some contradiction in my good misses concept, you deem this concept possibly "not correct" in another sentence. But there is no contradiction, the
arithmetic used in my good misses definition is correct. I would be happy if
you explicitly agreed that it is correct.
I really do not understand why you expect me to agree with this. In one case a distance of 2 is a good miss and in other cases a distance of 2 is an other miss. How is this not a contradiction?
Suitbert:
Instead of using the term "good misses" you use the term "secondary pattern"
which I would recommend not to use because there is no "primary pattern"
against which a "secondary" pattern could be contrasted...
The primary pattern is the pattern of "calls" to "draws".
Suitbert:
...and because
"secondary" implies arbitrariness with some deprecatory evaluation. You said
I "picked" this pattern as if I had a choice among various "patterns".
"Secondary" is the term that you used. I cannot speak to the issue of whether you are now unhappy with your choice of name. I thought secondary was a reasonable choice on your part, but if you wish to use a different name, that is of course up to you.
Suitbert:
Next you give an example of "secondary patterns" from medical studies where,
apparently, results were mixed. This seems to be another strategy of yours
which I would recommend to renounce: Argument by analogy - the case under discussion here is devalued by generalizing from some negative instance
elsewhere.
There are many valid criticisms of sub-group and post-hoc analysis. If I give an example, it is only meant to help clarify the issue. However, the "negative instance" applies to the use of sub-group and post-hoc analyses, not to the example. The only parallel that I am drawing between my example and your study is that they both involve post-hoc analyses.
Suitbert:
You close the issue by saying:
"The point is not whether you *can* find secondary patterns, the point is
whether or not one can draw valid and reliable conclusions when one is
found."
Here you seem to imply that the effect that I found is not valid and not
reliable.
Regarding reliability: The effect IS reliable because I replicated it, you
discount my information by implying the effect that I found is NOT reliable.
Can you provide the data where you replicate the effect?
Also, your analysis of the effect of psi-power on "good misses" does not replicate the expected pattern. Instead, it shows that psi-power is unrelated to "good misses" (despite your post-hoc convoluted explanation). According to that either "good misses" does not reflect psi, or "hit %" does not reflect psi. One or the other may have to be discarded.
Suitbert:
Regarding validity: You do not explain why you deem the pattern of drawn
wrong numbers (almost-hits) not to be valid. I am entitled, I think, to
maintain that the validity of this finding can hardly be overstated. How can
this "pattern" of drawn wrong numbers be explained? Apparently, some
"intelligent" and clearly "goal-oriented", but unconscious process becomes
manifest.
Most importantly, it lacks convergent validity - it fails to correlate with psi-power. It lacks criterion validity, mostly because it suffers from the same problem that all psi research suffers from - there is no independent measure of psi. It lacks construct validity - you haven't established how psi considers these categories (ordered, directional, symbolic, etc.). Simply put, all you have established is that in this particular set of data there is a pattern, but you haven't established that it is "psi" that explains the pattern.
Suitbert:
Another point: Good misses cannot reasonably be explained by sensory leakage or fraud, because being misses, they do not increase the participants' hit scores. In addition, since "good misses" is a phenomenon for which only psi appears to yield a final "explanation", the "ordinary" effect of heightened hit proportions is also more likely to be psi-dependent. Psi is functioning, so why should psi be limited to goal-oriented misses, why should it not also be effective with goal-oriented hits.
Did you mean to say that the other way around?
Here is the problem. You are analyzing an uncontrolled data set, which means that ordinary sensory leakage or fraud accounts for the hits. So why wouldn't ordinary sensory leakage or fraud, in addition to chance or other artefacts, account for this "good misses" pattern?
Suitbert:
Linda, you wrote: " If near misses are more likely, and drawing #3 counts as
a near miss on every draw, shouldn't #3 be drawn the most often on the data
you provided earlier where you asked the participants to draw the balls out
in the following order - 1234512345...?"
My reply: This is a sensible prediction. Number 3 is closer to the target
four times, numbers 2 and 4 two times, and numbers 1 and 5 only once. The
following table shows calls, draws, and hits for the five targets, summed
across all participants:
Target  Calls  Draws  Hits
--------------------------
1       1632   1627   384
2       1632   1600   379
3       1632   1676   407
4       1632   1638   420
5       1632   1619   417
You can see that the frequency of draws of number 3 is larger (1676) than
that of the other numbers. So your prediction was sound.
Ah, so the data you gave earlier for these draws contained a typo - you had 1699 draws for #2.
Suitbert:
I did not yet check whether the good misses effect occurred in the
1234512345 data base also. The following Table shows the frequencies of good and bad misses for the five targets:
Target  Good  Bad   Difference
------------------------------
1       634   614    20
2       652   601    51
3       615   610     5
4       626   586    40
5       598   617   -19
------------------------------
Total   3125  3028   97
------------------------------
Good misses are more frequent than bad misses for numbers 1,2,3,and 4. The
misses for number 5 are an exception, as they were in the main student
sample.
You are aware (I hope) that those slight differences in the frequency of #3 and in the "good misses" vs "other misses" aren't statistically significant.
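For anyone who wants to check this, here is a rough Python sketch of two plain chi-square tests on the numbers in the tables above. It assumes a uniform expectation across the five balls and a 50/50 expectation for good versus bad misses (Suitbert's own expectancy); the helper names are made up, and the closed-form p-values are only valid for df = 1 and df = 4.

```python
import math

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))

def chi2_sf_df4(x):
    """Upper-tail probability of a chi-square statistic with 4 degrees of freedom."""
    return math.exp(-x / 2) * (1 + x / 2)

# Draw frequencies for balls 1..5 from the table (8160 draws in total).
draws = [1627, 1600, 1676, 1638, 1619]
expected = sum(draws) / len(draws)  # uniform expectation per ball
chi2_draws = sum((d - expected) ** 2 / expected for d in draws)
p_draws = chi2_sf_df4(chi2_draws)

# Good vs. bad misses, tested against a 50/50 split.
good, bad = 3125, 3028
chi2_misses = (good - bad) ** 2 / (good + bad)
p_misses = chi2_sf_df1(chi2_misses)

print(f"draws:  chi2 = {chi2_draws:.2f}, df = 4, p = {p_draws:.2f}")
print(f"misses: chi2 = {chi2_misses:.2f}, df = 1, p = {p_misses:.2f}")
```

Both p-values come out well above .05, which is consistent with the remark that these differences are not statistically significant.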
Suitbert:
I also checked the misses in the data base of my three psi-gifted Ukrainian
participants (with standard ball test data). Good misses: 1000 (exactly
1000), bad misses 904. The difference is significant (Chi2=4.74. df=1, p =
.02).
In other words, the reliability of this effect is considerable. And if you
can't explain it, it can be called valid, valid not as something that can be
expected (expected phenomena are boring rather than valid), but valid as a
challenge.
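The reported statistic can be roughly reconstructed. The sketch below assumes a 50/50 expectation for good versus bad misses; the value 4.74 appears to match a chi-square with Yates' continuity correction, but that is an inference from the arithmetic, not something stated in the email.

```python
import math

good, bad = 1000, 904
expected = (good + bad) / 2  # 952 under a 50/50 split

# Plain chi-square, no correction.
chi2_plain = sum((obs - expected) ** 2 / expected for obs in (good, bad))

# Chi-square with Yates' continuity correction (assumed, see note above).
chi2_yates = sum((abs(obs - expected) - 0.5) ** 2 / expected for obs in (good, bad))

# Two-sided p-value for 1 degree of freedom.
p_yates = math.erfc(math.sqrt(chi2_yates / 2))

print(f"plain: {chi2_plain:.2f}, Yates: {chi2_yates:.2f}, two-sided p = {p_yates:.3f}")
```

The corrected statistic reproduces the quoted 4.74. A chi-square test is two-sided by construction, so the p-value is closer to .03 than to a one-sided figure.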
It cannot be called a valid demonstration of psi just because you have failed to look for an explanation. It could equally be said to be a valid demonstration that invisible pink unicorns prefer to nudge the #3 ball into the ball-drawer's hand.
I may propose that hyperglycemia is a valid consequence of having a last name that starts with the letter "H". And among certain populations I would reliably discover, with a high level of statistical significance, that a larger proportion of people whose last name begins with "H" do have hyperglycemia. Can I legitimately claim that this demonstrates the phenomenon of Alphabetopathy, whereby people are susceptible to those conditions which start with the same letter as their last name?
Linda
fls
14th June 2009, 10:27 AM
However, in the eyes of other parapsychologist researchers, you are probably doing a fine job. So, keep up the good work!
Actually, based on the information Suitbert has provided, plus other information he sent me, I don't think he is finding acceptance among other parapsychological researchers. As he pointed out earlier (I think I quoted it here somewhere), he suspects that parapsychologists have become overly concerned with addressing criticisms from skeptics, to the detriment of psi-conducive research.
Linda
fls
14th June 2009, 11:02 AM
Suitbert:
Your idea of increasing control isn't bad, but not yet good enough because in your protocol increase of control is correlated with increase of psi-inhibiting factors. You skeptics are generally unaware of a psi researcher's obligation to optimize, for any psi test, the psi-conduciveness of the experimental conditions. Psi inhibiting is a participant's experiencing physical or mental strain at carrying out the required task. In addition, a tense, suspicious, distrusting social environment, screens and other vision-protecting devices including video cameras have psi-inhibiting effects.
What is your experience and opinion on online psi-tests? It seems like
these tests would be as psi-conducive as your ball-drawing tests - the
participant is relaxed, the social environment is not tense, suspicious or
distrusting, there are no screens or cameras, immediate feedback is
available, etc.
Linda
Uncayimmy
14th June 2009, 02:17 PM
What is the actual psi claim? That the person drawing the ball is able to predict what ball he will pull out of the bag?
Might we have also learned this: Given ample time to practice before being tested, people given control of the shuffling of a bag of balls and immediate feedback are able to locate a numbered ball of their choosing about 3% to 6% of the time?
Wowbagger
14th June 2009, 07:44 PM
Actually, based on the information Suitbert has provided, plus other information he sent me, I don't think he is finding acceptance among other parapsychological researchers.
Well, that's probably worse news for Damien Broderick than it is for anyone else. He was the one promoting Suitbert's work as the "future of psi", in his book, after all.
.....But, then, why do I feel saddened? :(
fls
15th June 2009, 09:39 AM
Linda:
What is your experience and opinion on online psi-tests? It seems like these tests would be as psi-conducive as your ball-drawing tests - the participant is relaxed, the social environment is not tense, suspicious or distrusting, there are no screens or cameras, immediate feedback is available, etc.
Suitbert:
I asked my three Ukrainian psi stars to run an online psi-test (Zener symbol guessing). I told them to complete a certain number of trials in order to avoid optional stopping. The data were automatically recorded by computer.
Here are the results (MCE .20):
Galina: 278 hits of 1220 trials, surplus 13.9%, Z = 2.40 p = .008
Tanya : 275 hits of 1240 trials, surplus 10.9%, Z = 1.88 p = .03
Vanya : 291 hits of 1300 trials, surplus 11.9%, Z = 2.11 p = .02
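These figures can be checked with a short Python sketch. It assumes a normal approximation to the binomial with a continuity correction of 0.5, which happens to reproduce the quoted Z values; the email does not say which method was actually used. It prints both one-tailed and two-tailed p-values, since which one applies is the question raised next.

```python
import math

MCE = 0.20  # mean chance expectation: five Zener symbols

# (hits, trials) as reported in the email.
results = {"Galina": (278, 1220), "Tanya": (275, 1240), "Vanya": (291, 1300)}

def z_score(hits, trials, p=MCE):
    """Normal approximation to the binomial, with continuity correction (assumed)."""
    mean = trials * p
    sd = math.sqrt(trials * p * (1 - p))
    return (hits - mean - 0.5) / sd

def upper_tail(z):
    """One-tailed p-value for a standard normal deviate."""
    return 0.5 * math.erfc(z / math.sqrt(2))

summary = {}
for name, (hits, trials) in results.items():
    z = z_score(hits, trials)
    surplus = 100 * (hits / (trials * MCE) - 1)  # percent above chance expectation
    p_one = upper_tail(z)
    summary[name] = (z, surplus, p_one, 2 * p_one)
    print(f"{name}: surplus {surplus:.1f}%, Z = {z:.2f}, "
          f"one-tailed p = {p_one:.3f}, two-tailed p = {2 * p_one:.3f}")
```

With two-tailed testing, Tanya's result no longer falls below the conventional .05 threshold.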
Is it really reasonable to perform one-tailed testing when you clearly have a two-tailed hypothesis? You try to claim that you are only interested in excess when it suits you (i.e. it enables you to claim "significant" results), yet you take full advantage of two-tailed differences in your explanations and in your use of Chi-square and Z-square statistics.
I see that the effect sizes are tiny, so if it is measuring psi, one would be forced to say that it does a poor job of it. Why should this be a poor test of psi?
On the one hand, it is interesting to see that these participants who obtained high hit scores in the ball test, also succeeded in a computer-programmed psi test. This I regard as another "convergent" indication that ball test results are due to psi and not to invisible pink unicorns who prefer to nudge the balls into the ball-drawer's hand.
How can you regard this as convergent? The effect size is so different, it would be very difficult to persuade anyone that you are measuring the same thing. It is, however, consistent with the IPU idea, since one would expect that they wouldn't operate under these conditions.
On the other hand, the effect size of the computer test results are considerably lower than the ball test results. Under standard ball test conditions the results were
Galina: 370 hits of 960 trials, surplus 92.7%, Z = 15.9 p < .000001
Tanya : 298 hits of 960 trials, surplus 55.2%, Z = 8.2 p < .000001
Vanya : 440 hits of 960 trials, surplus 129.2%, Z = 23.8 p < .000001
By standard do you mean controlled or uncontrolled?
The question is why is the effect size much lower in this online psi-test than in the ball test? Since you are an expert in unicorns and prepared, I think, to admire achievements of the fraudulent subspecies of these invisible animals in particular, you might have an answer to this question. I don't have experience with unicorns and would rather conclude, tentatively, that the natural manual activity which the ball test requires is the main cause for larger hit percentages in the ball test.
The most obvious conclusion would be that you are measuring two different things.
I expected a decline of psi by PC-psi tests before I obtained the online test results. Otherwise I would have hardly devised the ball test procedure at all.
By the way, my Ukrainian participants showed also considerably lower effect size in an ordinary Zener card test and in a dice throwing test, even though all three obtained significant results in these tests as well.
How did the three Ukrainian participants come to your attention?
One difference is that your ball-drawing test increases the amount of sensory information available.
One has to consider more psi-conducive conditions aside from that "the participant is relaxed, the social environment is not tense, suspicious or distrusting, that there are no screens or cameras, immediate feedback is available, etc."
Other than "increase the opportunities for sensory input", what other psi-conducive conditions would you consider?
Linda
Ivor the Engineer
15th June 2009, 12:43 PM
It seems psi-conducive conditions are those in which it is both possible and more straightforward to cheat.
steenkh
16th June 2009, 04:43 AM
It is interesting to read Linda's analysis of the experiments and the problems she has found with them. However...
Unfortunately I think what we will learn is it is no more effective than being polite for a while, then being blunt.
So there is really no advantage in stopping being polite?
Ivor the Engineer
16th June 2009, 05:21 AM
So there is really no advantage in stopping being polite?
Depends if you've got other stuff to do. Being blunt tends to get to the inevitable conclusion faster than being unendingly polite.
steenkh
16th June 2009, 06:54 AM
Depends if you've got other stuff to do. Being blunt tends to get to the inevitable conclusion faster than being unendingly polite.
If it is a question of time wasted, it is even faster not to engage in the discussion in the first place!
fls
16th June 2009, 07:00 AM
Linda:
As I said before, it is not consistent. If it were a consistent representation of your idea, you would be comparing numbers which have distance 1 with numbers which have distance 2 or more for all cases.
Suitbert:
For each target I compared two misses with less distance from the target with two misses with more distance from the target. All four misses were thus equally considered for the five targets. This is consistent and balanced since all misses are considered.
Your definition may be called "inconsistent" insofar as you consider two misses with distance 1 for targets 2, 3, and 4, and only one miss with distance 1 for targets 1 and 5. In addition, you get three misses with distance "2 or more" for targets 1 and 5 and only two misses with distance 2 for targets 2, 3, and 4. Furthermore, for distances "2 or more" distance 2 occurs six times, distance 3 twice and distance 4 also twice. Finally, for an analysis with your definition you have to take different routes for calculating expectancies. This is confusing and objectionable. If you maintain your point, please tell me how you calculate expectancies. For my way of comparing two frequencies I take proportion .5 as expectancy for misses with less distance and .5 as expectancy for misses with more distance throughout for all five target cases. This is consistent, simple, transparent.
As you point out, the opportunity to miss and to miss at a greater distance
varies from number to number. It is inconvenient to provide a proper model
which takes this into consideration, so instead you chose a model which does
not take this into consideration but is convenient. It is consistent with
convenience, but it is not consistent with your own description of your
ideas about distance.
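On the question of how expectancies would be calculated under a strict "distance 1" definition, here is one possible sketch in Python. It assumes that, by chance, a miss lands on each of the four wrong numbers with equal probability. The miss totals per target are taken from the 1234512345 table quoted earlier (good plus bad); the observed distance-1 counts are not given in the thread, so only the expectation is computed.

```python
NUMBERS = [1, 2, 3, 4, 5]

def p_distance_one(target):
    """Chance probability that a miss lands at distance 1 from the target,
    assuming all four wrong numbers are equally likely."""
    wrong = [n for n in NUMBERS if n != target]
    near = [n for n in wrong if abs(n - target) == 1]
    return len(near) / len(wrong)

# Total misses per target (good + bad) from the 1234512345 table.
misses = {1: 1248, 2: 1253, 3: 1225, 4: 1212, 5: 1215}

for target in NUMBERS:
    print(f"target {target}: expected share of distance-1 misses = {p_distance_one(target)}")

# Expected number of distance-1 misses summed over all targets.
expected_near = sum(misses[t] * p_distance_one(t) for t in NUMBERS)
print(f"expected distance-1 misses overall: {expected_near}")
```

The expectancy is .25 for targets 1 and 5 and .5 for targets 2, 3 and 4, so the model is slightly less convenient than a flat .5 but not hard to compute.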
Linda:
"Secondary" is the term that you used. I cannot speak to the issue of whether you are now unhappy with your choice of name. I thought secondary was a reasonable choice on your part, but if you wish to use a different name, that is of course up to you.
Suitbert:
Please tell me where I used "secondary pattern" in my correspondence with you. I did NOT use the term "secondary pattern" or even "secondary" with some other noun.
You use the term "secondary effect" or "secondary psi-effect" six times in
the paper in which you discuss the "good misses" effect.
Linda:
The only parallel that I am drawing between my example and your study is that they both involve post-hoc analyses.
Suitbert:
The term "post-hoc analysis" with its negative connotation is used here, as often in criticisms by skeptics, to cast doubt on unwelcome results. I did not claim that I expected the "good miss" phenomenon at the outset. If I had, one might blame me for pretending to have expected it instead of discovering it by inspecting the data. But I found the same pattern in several other data sets, and I was prepared beforehand to find it again. I feel your attribution of "post hoc" to my analysis is another derogatory ploy.
I have two questions for you. If you had not found the "good miss"
phenomenon, would you have included that in your results? And, if the "good
misses" phenomenon had been absent, would you have considered that evidence that psi is absent, or rather would you have concluded that psi does not always miss in that manner?
Linda:
Can you provide the data where you replicate the effect?
Suitbert:
Yes, I can. Do you mean the original trial by trial data? Which format do you need? I would send them if you would really do an analysis and report the result, otherwise the extra time for that work might better be spared.
I misspoke. I meant whatever write-up you have available. It may be
interesting to look at the original data as well, but not if it requires
effort on your part.
Linda:
Also, your analysis of the effect of psi-power on "good misses" does not replicate the expected pattern. Instead, it shows that psi-power is unrelated to "good misses" (despite your post-hoc convoluted explanation).
Suitbert:
Again your derogatory "post-hoc"! All explanations of phenomena are "post hoc" because the phenomena always precede their explanations.
You provided a reasonable explanation of what you expected to see. When you didn't find what you expected to see, all of a sudden the explanation
changed to a convoluted pattern whereby psi-missing and psycho-analytic
defense mechanisms are brought into it. This was entirely post-hoc.
This shows up again in the write-up you sent me, where the data shows
patterns that are the opposite of those you expected or those you had
previously demonstrated. In that write-up you also show that this effect is
not found in some of your datasets. It looks like whatever you find, good,
bad, or indifferent, will be taken to be consistent with psi.
Linda:
According to that either "good misses" does not reflect psi, or "hit %" does not reflect psi. One or the other may have to be discarded.
Suitbert:
I do not understand your logic. A very good dart thrower might hit the centre of the target with concentric circles (12) very frequently, but when he misses 12, his dart will land on 11 more often than on 10, on 10 more often than on 9 etc. This is no case for "either or" separations, but for "as well as" inclusions.
And that is the logic you started with - it makes sense. But you discovered
that very good dart throwers don't land on 11 or 10 more often when they
miss 12 - they land all over the place. And very poor dart throwers, while
they can't hit the actual target, are better at hitting 11 or 10 than the
very good dart throwers. Sometimes. The more obvious conclusion is that
whatever it is that leads you to miss the target, it's different from dart
throwing accuracy. If there were a transparent shield in front of the
target, such that darts thrown directly at 11 or 10 were deflected away from
those numbers, while darts thrown further away were deflected towards those numbers, then the unexpected results would make sense. But the presence of a shield (i.e. a "secondary effect") doesn't really have anything to do with dart-throwing skill - i.e. you're not measuring the same thing.
Suitbert (earlier):
Regarding validity: You do not explain why you deem the pattern of drawn wrong numbers (almost-hits) not to be valid. I am entitled, I think, to maintain that the validity of this finding can hardly be overstated. How can this "pattern" of drawn wrong numbers be explained? Apparently, some "intelligent" and clearly "goal-oriented", but unconscious process becomes manifest.
Linda:
Most importantly, it lacks convergent validity - it fails to correlate with psi-power. It lacks criterion validity, mostly because it suffers from the same problem that all psi research suffers from - there is no independent measure of psi. It lacks construct validity - you haven't established how psi considers these categories (ordered, directional, symbolic, etc.). Simply put, all you have established is that in this particular set of data there is a pattern, but you haven't established that it is "psi" that explains the pattern.
Suitbert:
This reminds me of drumfire... with empty rounds, not dangerous.
Good misses do not lack "convergent validity" because good misses are correlated with hit percentage.
No they don't. In 2 of the 3 datasets you have provided information on,
they don't correlate, and in the 3rd, there is a correlation in the opposite
direction you predicted.
Good misses do correlate with psi-power, even though their psi power is less, otherwise they would not correlate with hit percentage.
The graph you presented does not show correlation with psi-power.
"Criterion validity"? This term does not apply here. The ball test does not assess good misses tendencies, good misses are an additional effect whose validity need not be based on other measured criteria just as, say, response sets affecting questionnaire results can be found without additional assessment.
Criterion validity in this case means that if we had a way of detecting
whether or not psi was present, we would be able to show that good misses
were present when psi was present and absent when psi was absent.
The terms "construct validity" and "categories" (ordered, directional, symbolic, etc.) remain unexplained.
Construct validity means that you have a theoretical/hypothetical basis by
which psi would miss in this particular manner - i.e. that psi considers
numbers as occurring in a particular order and that there is no direction to
that order. None of that has been established. Psi could treat them as
symbols (the claim of "good misses" for Zener symbols, for example), it
could order them but there is a direction to that order, or it could order
them in a non-linear fashion.
I HAVE established that the good misses phenomenon is psi-related, because it is correlated with hits in the ball test. In addition, we find similar phenomena in earlier studies with the Zener symbol test, they were called "displacement effect" in the literature. Some symbols were more easily selected than other symbols when targets were missed. Displacements with numbers are easier to assess because "similarity" in this case is a matter of numerical difference, not of subjective judgment.
That psi is interested in a numerical difference is a subjective judgement
on your part. How do you know it's not something like even vs. odd numbers, straight vs. curvy lines, mirror symmetry vs. rotational symmetry vs. asymmetry, etc.?
Linda:
Here is the problem. You are analyzing an uncontrolled data set, which means that ordinary sensory leakage or fraud accounts for the hits. So why wouldn't ordinary sensory leakage or fraud, in addition to chance or other artefacts, account for this "good misses" pattern?
Suitbert:
Chance? No, the probability of chance has been calculated and must be considered, after several replications, as extremely low.
But in some of the datasets you have mentioned, the effect isn't replicated.
And I didn't mean *only* chance - I should have phrased it as "sensory
leakage and fraud and chance and other artefacts".
Artifact? This is merely a word in this context without any substance (can Linda construe any artifact here, I doubt it)
An example of an artefact would be some feature on the #3 balls which made it more likely that they would be drawn - someone accidentally added an extra set of #3 balls, they ran out of the regular balls and bought a new
batch which feel slightly different, a different marker was used which left
a different texture, etc. Since #3 draws were always a "good miss", an
excess of #3 draws would make it look like there was an excess of "good
misses".
Sensory leakage? If it occurred at all, it might cause mixing up graphically similar digits, not numerically adjacent numbers.
How do you know?
Fraud? A thief selects from a bunch of keys the key to the safe and, if he misses this key, he will not select preferably a key to a neighboring door.
How do you know?
Suitbert (earlier):
I did not yet check whether the good misses effect occurred in the 1234512345 data base also. The following Table shows the frequencies of good and bad misses for the five targets:
Target  Good   Bad  Difference
------------------------------
1        634   614          20
2        652   601          51
3        615   610           5
4        626   586          40
5        598   617         -19
------------------------------
Total   3125  3028          97
------------------------------
Good misses are more frequent than bad misses for numbers 1,2,3,and 4. The misses for number 5 are an exception, as they were in the main student sample.
Linda:
You are aware (I hope) that those slight differences in the frequency of #3 and in the "good misses" vs "other misses" aren't statistically significant.
Suitbert:
Yes, I am aware of that. The point here is that the pattern of differences
does not differ significantly from the earlier pattern of other samples.
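As a rough illustration of the significance point, the totals in the table can be checked with a normal approximation to the binomial. This is only a sketch: it assumes that, under chance, a miss is equally likely to be classed as "good" or "bad" (p = 0.5), which the thread does not state, and the variable names are hypothetical.

```python
import math

# Totals from the table above (targets 1-5 combined).
good_misses = 3125
bad_misses = 3028
n = good_misses + bad_misses

# ASSUMPTION (not from the thread): under chance, a miss is equally likely
# to be classified "good" or "bad", i.e. p = 0.5.
p_chance = 0.5

expected = n * p_chance
sd = math.sqrt(n * p_chance * (1 - p_chance))
z = (good_misses - expected) / sd

# Two-sided p-value from the normal approximation to the binomial.
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(f"n = {n}, z = {z:.2f}, two-sided p = {p_two_sided:.3f}")
```

Under that assumption the excess of good misses stays well short of the conventional 0.05 threshold, which is consistent with both parties agreeing that the differences are not significant.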
Linda:
It cannot be called a valid demonstration of psi just because you have failed to look for an explanation. It could equally be said to be a valid demonstration that invisible pink unicorns prefer to nudge the #3 ball into the ball-drawer's hand.
Suitbert:
This statement, if valid, would make all demonstrations of psi phenomena impossible because psi is defined, at present at least, as unexplainable phenomena of transfer of information and mental effectance on objects or living matter without motor and other tools.
Exactly. You have no positive definition of psi. The best you have is "I
can't really think of anything else". It's a "psi of the gaps" sort of
definition. The problem is that you haven't put any constraints on what the
gap can be filled with other than "has not yet been demonstrated to exist".
Linda
Wowbagger
16th June 2009, 07:24 AM
So there is really no advantage in stopping being polite?
Depends if you've got other stuff to do. Being blunt tends to get to the inevitable conclusion faster than being unendingly polite. It is quite possible to be both blunt and polite, at the same time. The two are not necessarily mutually exclusive.
Not that I claim to be good at doing that, yet. But, I have seen it done, and I am striving to improve my skills.
fls
16th June 2009, 10:41 AM
What is the actual psi claim? That the person drawing the ball is able to predict what ball he will pull out of the bag?
Might we have also learned this: Given ample time to practice before being tested, people given control of the shuffling of a bag of balls and immediate feedback are able to locate a numbered ball of their choosing about 3% to 6% of the time?
Damien:
I wonder how often *the same balls* are drawn? I can't immediately think of any way to find out that wouldn't mark a ball and thus provide a possible cue for later draws, short of having a transponder inside each ball that triggers a (blinded) recorder when the ball is held, or taken outside the bag's opening, or the like. Alternatively, a fresh ball of the same number might be used as the replacement, although there's room for errors to creep in there. Better yet, as mentioned before, a large number of bags could be used in random order.
Suitbert:
I have no objection to using a larger number of bags, this I suggested myself to the German skeptics (GWUP) when they demanded more precaution. They were not satisfied with this precaution, they insisted that each ball should be used only once. The practical consequences were annoying for the participant, a skeptical assistant was running around busy with replacing the balls used, thus intimidating the innocent participant.
Instead of using possibly unnecessary precautions out of hand one should make sure, by appropriate experiments and/or data analysis, whether a precaution of using several bags successively is or is not necessary at all. An intent to merely calm prejudiced skeptics down should NOT be regarded as sufficient grounds.
Several empirical attempts of mine did not confirm skeptical hypotheses:
1. The probability of hits for just drawn numbers, if called again after putting them back into the bag, is not increased compared with the probability of calling other numbers.
2. Hit rates of drawing assigned numbers 111111111111222222222222333333333333444444444444555555555555 that would facilitate an ordering of numbers in the bag, if such thing occurred, were not larger than drawing assigned numbers 123451234512345123451234512345123451234512345123451234512345 that hinders an ordering of numbers in the bag.
3. In one test conducted with my Ukrainian participants I had added alphabetical codes on each individual ball, the experimenter was instructed to also record the ball code, not only the drawn numbers. I counted the frequencies of draws in between for each repetition of individual ball draws and compared this with a computer simulation. It turned out that repetitions of draws of individual BALLS (not only numbers) were significantly delayed somewhat, for the three participants alike. This showed that the participants tended to put the just drawn balls deeper or farther away into the bag than was necessary for an objective ball randomization. The just drawn NUMBERS had no lesser chance to be drawn again, certainly because of the large number of balls used (50). If you would use, say, 250 balls and put just drawn balls on the bottom of the large bag containing 250 balls this would have an effect on ball repetition similar to using, say, five different bags with 50 balls successively.
This is an interesting discovery. This would have an effect similar to drawing balls without replacement and providing feedback after every draw - a strategy which increases the probability of correctly guessing the next ball as it changes the probabilities from combinations to permutations.
4. Since recognizing drawn balls and a helpful ordering of them in the bag would be a case of learning, hit rates should increase during runs of 60 trials. But they do not increase, neither low hitters nor high hitters showed an indication of a learning curve.
5. As I said earlier, hit rates with using "psi-pods", instead of ping pong balls, where numbers written on paper are put inside the pods and covered by a lid, were not smaller than hit rates with using balls.
Do you have a write-up on this data that you can send me? It sounds promising.
Using five different bags successively would certainly exclude an ordering the numbers in the bag subconsciously and exploiting cues of temperature and "unicorn stimulation". But you would never know whether this was necessary at all, and you would risk arousing psi-inhibiting psychological reactions.
It seems like it would be valuable to figure out how to make this a fairly psi-neutral procedure, as it would help relieve several concerns. It seems like the helpful passing of a fresh bag shouldn't be an anxiety-producing situation, especially if you presented it as the testing of additional psi-conducive ideas, rather than as an anti-sensory-leakage precaution. For example, you could tell the participants that it has been found that treating the bags with specific frequencies of electro-magnetic radiation is psi-conducive and that you are testing bags which have been treated with different frequencies.
Precaution by avoidance at any event is comparable to excessive irrational avoidance of patients suffering from chronic anxiety. They will never learn to know that their avoidance behavior is needless. My research may be compared with desensibilisation therapy, an adaptive benefit is unquestionable.
That's why I think that some attention should simply be paid to providing a non-anxiety-producing explanation for the various controls so there is no perception that psi will be inhibited. I do agree that interruptions should be avoided.
Linda
Beth
16th June 2009, 10:48 AM
Sensory leakage? If it occurred at all, it might cause mixing up graphically similar digits, not numerically adjacent numbers.
How do you know?
I've been trying to follow along with this discussion, but haven't had time to participate. I think Linda has done an excellent job of bringing up potential problems and I haven't thought of anything she hasn't already mentioned.
However, I would like to mention that I think this is going a bit too far. If sensory leakage occurs, it's going to be graphically similar digits getting confused because that's how they might have perceived the numbers with either vision or touch. Hearing, smelling or tasting seem an unlikely route for sensory leakage in this situation. If they are perceiving the number with some sense that isn't graphically based, why wouldn't that qualify as the psi-effect we're looking for?
fls
17th June 2009, 09:09 AM
Linda:
Is it really reasonable to perform one-tailed testing when you clearly have a two-tailed hypothesis? You try to claim that you are only interested in
excess when it suits you (i.e. it enables you to claim "significant"
results), yet you take full advantage of two-tailed differences in your
explanations and in your use of Chi-square and Z-square statistics.
Suitbert:
This question does not have one simple answer. For my Ukrainian participants, I have to consider previous results. In eight years of psi test participation, after conducting over 50 different tests with each one, I did not observe any negative deviation from chance expectancy. So I cannot expect negative deviations in any further such test. Hence, a one-tailed test is appropriate whenever a choice between one- and two-tailed tests is given. For Chi2 and Z2 tests, no such choice exists.
But you don't limit your one-tailed testing to your Ukrainian participants. You perform this testing on other datasets which do contain negative deviations. This issue is critical when attempting to study psi, because it creates a situation which increases the number of false-positive tests. And when the true-positive rate is probably very low, it aggravates the problem that any "significant" results are more likely to be false than true. If anything, the standard for statistical testing should be held higher, rather than lower than for other research, in order to increase your chances of finding true relationships.
See this article for an explanation of the problem.
http://www.plosmedicine.org/article/info%3Adoi%2F10.1371%2Fjournal.pmed.0020124
Chi-square and z-square tests take into account deviations in both directions. It is your *choice* of those particular tests/measures which dictates the inclusion of both tails.
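The one-tailed versus two-tailed issue can be made concrete with a small sketch. The z value, prior, and power below are hypothetical numbers chosen only for illustration; the positive-predictive-value formula is the standard one used in the linked article's argument, written without its bias term.

```python
import math

def one_tailed_p(z):
    """Upper-tail p-value for a standard normal test statistic."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def two_tailed_p(z):
    """Two-tailed p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def positive_predictive_value(prior, power, alpha):
    """Share of 'significant' results that reflect a true effect."""
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)

# Hypothetical test statistic: "significant" one-tailed, not two-tailed.
z = 1.8
print(f"z = {z}: one-tailed p = {one_tailed_p(z):.3f}, "
      f"two-tailed p = {two_tailed_p(z):.3f}")

# Hypothetical prior of 1% that a tested relationship is real, power 0.8.
# If deviations in either direction end up being interpreted, a nominal
# one-tailed alpha of 0.05 behaves more like 0.10.
for alpha in (0.05, 0.10):
    ppv = positive_predictive_value(prior=0.01, power=0.8, alpha=alpha)
    print(f"alpha = {alpha:.2f}: PPV = {ppv:.3f}")
```

With a low prior, most "significant" results are false positives at either threshold, and loosening the threshold makes the proportion worse.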
Suitbert:
Yes, this psi test online does a poor job, and I assume that all psi tests online do a poor job. Why this is so? My tentative answer is it’s the amount of technological conditions, lack of naturalness, lack of visual and motor activity. I quote Damien: “Yes, this intuitively appeals me from my analysis of the likely evolved competences relevant to psi--that it's an adjunct to on-going sensorimotor processes of being-in-the-world.”
Linda:
How can you regard this as convergent? The effect size is so different, it
would be very difficult to persuade anyone that you are measuring the same
thing. It is, however, consistent with the IPU idea, since one would expect
that they wouldn't operate under these conditions.
Suitbert:
A low effect size of test X does not make the validity of test X different from test Y which has a larger effect size. Ski jumpers may show different jump distances depending on good or bad jump conditions.
I agree, but does that really apply here? You are claiming convergence - people who do well in one test do similarly well in another. But without testing the non-gifted ball-drawers at the same time for comparison, you don't know if this relationship holds.
Suitbert:
IPU? An acronym for what?
Sorry - Invisible Pink Unicorn
Suitbert:
In this case I mean “controlled” because all tests of the three Kiev participants were done under the supervision of experimenter Tanya. She was also her own experimenter. I myself did many experiments myself, so I was my own experimenter. In the parapsychological literature many cases of self-administered tests have been reported. Even Röntgen’s discovery of X-rays was the result of a test that he conducted on himself.
What do you think would have happened if it was only ever Röntgen's self-tests which demonstrated X-rays? Are you familiar with the story of N-rays? (http://en.wikipedia.org/wiki/N_ray) It isn't a matter of whether or not self-tests can serve as a starting point, it's a matter of whether or not it can ever progress beyond that point.
The question is why is the effect size much lower in this online psi-test than in the ball test? Since you are an expert in unicorns and prepared, I think, to admire achievements of the fraudulent subspecies of these invisible animals in particular, you might have an answer to this question. I don't have experience with unicorns and would rather conclude, tentatively, that the natural manual activity which the ball test requires is the main cause for larger hit percentages in the ball test.
Linda:
The most obvious conclusion would be that you are measuring two different things.
Suitbert:
I am reluctant to acknowledge two independent psi powers.
You don't have to. Since you have provided no constraints, other than "unexplained", this effect can be due to Hyperdimensional Dragons - another entity which is invoked to 'explain' the unexplained.
Linda:
One difference is that your ball-drawing test increases the amount of sensory information available.
Suitbert:
Yes, but I showed in various tests that the ball test condition does not provide USEFUL sensory information. Psi effects might be facilitated when many sensory channels are used with no avail. Psi might have a “compensatory” function in the first place. Who knows.
How do you know it does not provide USEFUL sensory information? I already referred to research which shows that when we are asked to explicitly describe our reasoning, we perform poorly compared to allowing for implicit processes. You tested for explicit reasoning - the kind of reasoning that we DO NOT think is taking place when determining whether or not a ball feels 'familiar'.
Linda:
Other than "increase the opportunities for sensory input", what other
psi-conducive conditions would you consider?
Suitbert:
As I just said, an increase of sensory input without useful information might be psi-conducive.
How would you go about achieving that state?
Suitbert:
Social factors have also been found to be either psi-conducive or psi-inhibitory. Remember my experiments with students with significant hit surplus at home whose hit scores were either lowered by my presence as experimenter or were considerably increased. This difference of social influence on psi processes I cannot yet explain.
How would you go about distinguishing this from chance a priori, as you would expect performance to vary due to chance as well?
Linda
fls
17th June 2009, 11:44 AM
Linda:
For example, you could tell the participants that it has been found that treating the bags with specific frequencies of electro-magnetic radiation is psi-conducive and that you are testing bags which have been treated with different frequencies.
Damien:
Terrible idea. If psi is real, there is always an enhanced
possibility of what might loosely be called empathic entanglement
between ostensible Subject or Participant and ostensible
Experimenter. There is ample evidence of experimenter effects that
cannot be explained away as better scrutiny on the part of those who
get null results. Lying would seem to me an especially corrosive
basis for a relationship that needs to be open and trustworthy on both
sides. I realize this flies in the face of a century of deceitful
psych and medical protocols... :)
One doesn't even need to invoke "empathic entanglement" in order to know that lying doesn't work - what is the point of blinding, after all? I'm not suggesting that you need to lie. Simply make whatever it is that you use to implement a psi-conducive control true.
Linda
fls
17th June 2009, 12:51 PM
Linda:
This is an interesting discovery. This would have an effect similar to drawing balls without replacement and providing feedback after every draw - a strategy which increases the probability of correctly guessing the next ball as it changes the probabilities from combinations to permutations.
Suitbert:
You are right, in principle, but the effect is much smaller than in studies without replacement, and for such studies it is already small. Even if you would make a just drawn number inaccessible and even if you would instruct the participant not to call the just drawn number, the expectancies for the four remaining numbers would be 10/49 (0.204) instead of 10/50 (0.20). For one total test series consisting of 360 trials the number of hits would be 73.4 instead of 72, i.e. 1.4 surplus hits by artifact – under unrealistically favorable conditions!
This is an advantage that would accumulate. Sure, after the first draw it would be 0.204, but after the 6th draw it might be 0.227, which is higher than your measured hit rates.
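The arithmetic in this exchange can be reproduced with a few lines. This is a sketch of the idealised case both parties are describing, in which balls that do not carry the called number are effectively out of reach; the function and constant names are hypothetical.

```python
BALLS_PER_NUMBER = 10
TOTAL_BALLS = 50
TRIALS = 360

def chance_of_undrawn_number(removed):
    """Chance of drawing a given number when `removed` balls, none of
    them carrying that number, are effectively out of reach."""
    return BALLS_PER_NUMBER / (TOTAL_BALLS - removed)

baseline_hits = TRIALS * chance_of_undrawn_number(0)
one_removed_hits = TRIALS * chance_of_undrawn_number(1)
print(f"expected hits: {baseline_hits:.1f} at chance, "
      f"{one_removed_hits:.1f} with one ball out of reach")

# How the per-draw chance grows as more balls are out of reach.
for removed in range(7):
    print(f"{removed} removed: {chance_of_undrawn_number(removed):.3f}")
```

With one ball out of reach the expected count is about 73.5 hits (the 73.4 quoted above comes from rounding 10/49 to 0.204 first), and with six balls out of reach the per-draw chance reaches 10/44, or about 0.227.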
Suitbert:
I have not written up the results, I just received the first data from Kiev. I tabulate the observed hit rates from the ball test (conducted in 2003) and the psi-pods test (conducted in 2009). The total number of trials for each participant was 480 for the psi-pods test and 960 for the ball test. Expected hit rate .20.
         Balls   Pods   Diff
Galina   .385    .392   +.007
Tanya    .310    .267   -.043
Vanya    .458    .360   -.098
Galina’s pod results hardly differ from her ball results; the pod results of Tanya and Vanya are lower, but still highly significant.
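For readers who want to see what "highly significant" means here, the tabulated hit rates can be turned into z-scores against the expected hit rate of .20. This is a sketch using a normal approximation and assuming independent trials; it says nothing about whether the chance baseline itself is valid, which is the separate question of experimental control.

```python
import math

P_CHANCE = 0.20
N_BALLS, N_PODS = 960, 480  # trials per participant, as stated above

# Hit rates from the table above: (ball test, psi-pods test).
hit_rates = {
    "Galina": (0.385, 0.392),
    "Tanya": (0.310, 0.267),
    "Vanya": (0.458, 0.360),
}

def z_score(rate, n, p_chance=P_CHANCE):
    """Standardised deviation of an observed hit rate from chance."""
    standard_error = math.sqrt(p_chance * (1 - p_chance) / n)
    return (rate - p_chance) / standard_error

z_scores = {
    name: (z_score(balls, N_BALLS), z_score(pods, N_PODS))
    for name, (balls, pods) in hit_rates.items()
}

for name, (z_balls, z_pods) in z_scores.items():
    print(f"{name}: z(balls) = {z_balls:.1f}, z(pods) = {z_pods:.1f}")
```

Every value lies more than three standard errors above chance, the smallest being Tanya's pod result.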
Were these experiments controlled or uncontrolled?
I keep forgetting to ask you...how did the 3 Ukrainian stars come to your attention?
Linda:
It seems like it would be valuable to figure out how to make this a fairly psi-neutral procedure, as it would help relieve several concerns. It seems like the helpful passing of a fresh bag shouldn't be an anxiety-producing situation, especially if you presented it as the testing of additional psi-conducive ideas, rather than as an anti-sensory-leakage precaution. For example, you could tell the participants that it has been found that treating the bags with specific frequencies of electro-magnetic radiation is psi-conducive and that you are testing bags which have been treated with different frequencies.
Suitbert:
I never lie in my experiments (see Damien’s comment), but changing bags would probably not have a strong negative effect on participants. One might ask participants to change bags, for each trial she should make a decision among four (?) bags so that she maintains some control, subjectively. One might also tell her that the bags should be selected equally often.
I agree that it's a good idea to not lie. One would not need to lie in order to implement my suggestion.
Letting the participants select the bags would dissipate the effect of the control. Why is it necessary to allow the participant the opportunity to gain an ordinary advantage in order for them to demonstrate what you are calling an unexplainable advantage?
Linda
fls
17th June 2009, 10:22 PM
Suitbert:
Yes, I said „secondary effect“, or „secondary psi-effect”. When you speak of “secondary pattern” that the experimenter “picks” then you raise the idea that what is being analysed here is dependent on arbitrary decisions.
But isn't it? You dismiss that your model is inconsistent, you admit that only those tested relationships that are positive get any mention, and that any result (good, bad or indifferent) would be considered to be consistent with psi.
Linda:
I have two questions for you. If you had not found the "good miss"
phenomenon, would you have included that in your results?
Suitbert:
Of course not, I would not have searched it.
Testing multiple relationships and reporting only on those which are positive falsely increases the number of apparent relationships. It makes associations discovered in this manner unreliable.
Linda:
And, if the "good misses" phenomenon had been absent, would you have considered that evidence that psi is absent, or rather would you have concluded that psi does not miss in that manner?
Suitbert:
My conclusion that psi is present is based on hit scores, not on miss scores. I would not have missed a secondary psi effect.
If any answer (good, bad or indifferent) about the presence of secondary effects leaves your conclusions about the presence of psi unchanged, how can a secondary psi effect be used as evidence for psi?
Linda:
You provided a reasonable explanation of what you expected to see. When you didn't find what you expected to see, all of a sudden the explanation changed to a convoluted pattern whereby psi-missing and psycho-analytic defense mechanisms are brought into it. This was entirely post-hoc.
This shows up again in the write-up you sent me, where the data shows patterns that are the opposite of those you expected or those you had previously demonstrated. In that write-up you also show that this effect is not found in some of your datasets. It looks like whatever you find, good, bad, or indifferent, will be taken to be consistent with psi.
Suitbert:
It is quite sensible to try to explain phenomena by using previous experience and existing knowledge, the phenomena should be consistent with one’s knowledge. That’s what you and other skeptics do as you try to explain results of parapsychological research by sensory leakage or fraud or artifact.
I myself am trying to interpret the observations of my research within a system of knowledge in which psi plays a role. So it is sensible to seek consistency with this knowledge.
Can you give me an example of something which would be inconsistent with this idea?
You are right, the results with bad misses do not replicate as I thought they would and a correlation had even a reverse direction. Anyway, there was a correlation which demands an explanation. I don’t think YOU can provide an explanation here. So why don’t you let others figure something out.
Can you give me an example of a situation where psi cannot be the explanation?
Suitbert:
The main point is to acknowledge that the good misses phenomenon, if it exists, shows that it is based on some rational tendency or intention, without being guided by the participants’ conscious intention.
Even if that were acknowledged, it doesn't tell you that "good misses" and psi are related.
Linda:
No they don't. In 2 of the 3 datasets you have provided information on, they don't correlate, and in the 3rd, there is a correlation in the opposite direction you predicted.
Suitbert:
There are correlations that should be explained. My explanation is tentative. Tentative explanations of observed phenomena are common in scientific discourse.
I was simply pointing out that your explanation didn't explain the observed results.
Linda:
The graph you presented does not show correlation with psi-power.
Suitbert:
The graph shows a correlation between double-hit count and good misses.
I was referring to your analysis of psi-power and good misses described in this paper:
http://www.psych.uni-goettingen.de/home/ertel/ertel-dir/downloads/ertelchapterwithfigurespdf.pdf
Suitbert:
Double hit count I regard as a measure of psi ability. You don’t?
What is the gold-standard measure of psi ability against which you measure double hit count? For example, if I want to regard spiral CT as a valid measure of the presence of pulmonary emboli, I measure it against pulmonary angiogram.
Suitbert:
What is, in your view, the validity of the double hit count? The significance tests show that you cannot take this variable as a random variable. So what are the crucial factors giving rise to surplus hit rates in the ball test?
If you wish for me to consider it a measure of psi, then you should be able to point me to research where it was compared to independent tests of psi.
Linda:
Criterion validity in this case means that if we had a way of detecting
whether or not psi was present, we would be able to show that good misses
were present when psi was present and absent when psi was absent.
Suitbert:
I consider the correlation (.40) shown in Figure 1 as an indication of criterion validity in your sense.
That is simply a correlation between double hit count and good misses. Are you suggesting that every time a double hit is obtained it indicates the presence of psi?
Linda:
Construct validity means that you have a theoretical/hypothetical basis by
which psi would miss in this particular manner - i.e. that psi considers
numbers as occurring in a particular order and that there is no direction to
that order. None of that has been established. Psi could treat them as
symbols (the claim of "good misses" for Zener symbols, for example), it
could order them but there is a direction to that order, or it could order
them in a non-linear fashion.
Suitbert:
Numerical distance is the theoretical basis by which psi would prefer misses in this particular manner.
What is the name of the theory which establishes the basis by which psi misses in this particular manner?
Linda:
That psi is interested in a numerical difference is a subjective judgement
on your part. How do you know it's not something like even vs. odd numbers, straight vs. curvy lines, mirror symmetry vs. rotational symmetry vs. asymmetry, etc.?
Suitbert:
These factors are irrelevant because they cannot give rise to the pattern that was found. Even if any of your alternative factors was effective, this would have to be explained. Do you have an explanation for an effect by even vs. odd numbers, straight vs. curvy lines, mirror symmetry vs. rotational symmetry?
The question isn't which explanation is consistent with the pattern. The question is what happens in the presence of psi?
Linda:
An example of an artefact would be some feature on the #3 balls which made it more likely that they would be drawn - someone accidentally added an extra set of #3 balls, they ran out of the regular balls and bought a new batch which feel slightly different, a different marker was used which left a different texture, etc. Since #3 draws were always a "good miss", an excess of #3 draws would make it look like there was an excess of "good misses".
Suitbert:
This I consider as an example for skeptical strategies in general: Inventing post hoc alternative conditions without any evidence and even without showing any obligation to provide plausible indications for their existence.
This wasn't meant as a possible explanation for the results. I was simply providing you with an example of an artefact, as you indicated that you did not understand what I meant.
Suitbert:
Sensory leakage? If it occurred at all, it might cause mixing up graphically similar digits, not numerically adjacent numbers.
Linda:
How do you know?
Suitbert:
Because numerical distance has nothing to do with graphical similarity.
How do you know sensory leakage mixes up graphically similar digits? This should be something which is determined empirically. For example, in memory tests, people mix up words which sound similar or they mix up words which have similar meaning, depending upon whether you are talking about short or long-term memory.
Linda:
How do you know?
Suitbert:
Because this is an either laughable idea or a pathological case. The thief avoiding the safe in favor of a neighboring door might have to be transferred to a psychiatric hospital.
How do you know it isn't simply your choice of analogy which is laughable or pathological?
Suitbert:
Lack of explanation of a phenomenon does not diminish available evidence of its existence.
But that's entirely irrelevant, isn't it? The issue isn't whether or not these observations exist, but whether any particular idea serves as a roughly necessary, sufficient and useful explanation for those observations. An explanation which doesn't constrain the results, i.e. any result will be taken as consistent with the idea, is not useful. Contrast this with the idea of Gravity which explains only a very specific set of results. If we observed a different set of results, then Gravity would need to be changed or discarded.
Linda
fls
17th June 2009, 10:39 PM
Linda:
This is an advantage that would accumulate. Sure, after the first draw it would be 0.204, but after the 6th draw it might be 0.227, which is higher than your measured hit rates.
Suitbert:
Your calculation is wrong, since you presuppose something that is astronomically improbable: that the participant would draw the same number six times in succession, with decreasing numbers of balls carrying number 6, in any event.
After six draws it is easily possible that one of the numbers has not yet been drawn. The probability for that number would then be 10/44, or 0.227.
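The arithmetic behind these two figures can be written out as a minimal sketch. It assumes the scenario the numbers imply: a bag of 50 balls, 10 for each of the numbers 1 to 5, with drawn balls not returned to the bag. The function name is hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def chance_of_undrawn_number(draws_so_far, balls_per_number=10, total_balls=50):
    """Chance that the next draw shows one particular number that has not
    been drawn yet, assuming earlier draws were not returned to the bag."""
    remaining = total_balls - draws_so_far
    return Fraction(balls_per_number, remaining)


for n in (0, 1, 6):
    p = chance_of_undrawn_number(n)
    print(n, p, round(float(p), 3))
# 0 1/5 0.2
# 1 10/49 0.204
# 6 5/22 0.227
```

If drawn balls are returned and the bag is well mixed, the chance stays at 0.2 for every draw, so the advantage exists only under the no-replacement assumption.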
Linda:
Were these experiments controlled or uncontrolled?
I keep forgetting to ask you...how did the 3 Ukrainian stars come to your attention?
Suitbert:
Tanya (Tatyana) Kolesnik, one of the three members of the Ukrainian family, was the experimenter. She obtained a degree in informatics. I got to know her and her family during a stay in Kiev while preparing for a scientific conference in Ukraine. Independent skeptical experimenters and I tested her psi powers and those of the other family members during a visit of the family to Germany.
It seems that this method was more successful at identifying psi stars than your ball-drawing at home method? How did they come to realize their own psi abilities?
Suitbert:
“Dissipate the effect of the control” and “gain ordinary advantage” are ways of opposing a prolific concrete idea with abstract semantics. As long as you do not spell out what you mean by “effect of control” and a “gain of ordinary advantage”, I can hardly consider this seeming objection worth any reply.
"Effect of control" would be the use of methods which reduce bias, provide an empirical basis for comparisons, and/or reduce the chance of intentional manipulation. Letting the participants select the bags can re-introduce a pattern to the selections and can allow for them to take advantage of sensory leakage - something which the controls are meant to reduce or eliminate.
"Gain of ordinary advantage" would be the gaining of information or knowledge (conscious or not) through ordinary means - vision, touch, hearing, memory, etc.
Linda
Uncayimmy
19th June 2009, 12:19 AM
Linda forwarded one of my posts to Suitbert, who requested this be posted on the forum.
"UncaYimmy:
If I wanted to cheat on that test, I would attempt to get the balls
loaded into the bag so that the numbers were grouped. By that I mean
the 1-balls would be in the left corner, the 2-balls next to them and
so forth with the 5-balls being on the right hand side. When
preparing the experiment, it would only be natural to group the balls
by number first to make sure you have the right counts. Putting the
balls in the bag as I describe would not be difficult at all if that
happened. We didn't see the balls being loaded. And as we have seen,
the "shuffling" was trivial at best."
Suitbert replies:
UncaYimmy's ball ordering idea would indeed explain the "good misses"
effect, if such ball ordering in the bag would take place. A priori this is
very improbable, shuffling effects are NOT trivial. At each trial the bags
must be turned around like a pancake when its sides on the pan are changed.
This can hardly be done while keeping the balls in place. Supposing the balls
would not get mixed, then the participant must always remember the changing
positions of 1 and 5 in the bag because they are reversed at each trial.
Another point is that at the beginning of each test series the numbers are
not in numerical order, UncaYimmy's ordering of 50 balls in the bag needs
time. An increase of hit numbers would be observable only towards middle or
end of one test session. But my analysis showed that individual hit rate
levels do not increase within test sessions. Hit probabilities of the last
10 trials of one run of 60 trials are not larger than for the first 10
trials.
We disagree on how effective the shuffling is. I can't see how flipping the bag like a pancake will do any shuffling at all considering that centrifugal force and friction will keep the balls from moving much. As for remembering which side is which, that's not hard at all.
But in the video in the OP, the subject never flips the bag like a pancake. He picks up the bag by the top. He shakes it up and down. There's no room for much shuffling. Had this been a box with lots of room for the balls to bounce around, I would consider that adequate shuffling. The sides of the bag keep the balls in place. I find it hard to imagine that a ball would migrate from bottom to top or vice-versa.
As for the order of the balls in the bag at the start, I didn't see that happen. My conjecture is that the experimenter will arrange the balls in order before the test to make sure none are missing. It seems only prudent to ensure there is a full set. Perhaps the experimenter only counted to ensure there were 50 balls, which would presume that the balls were always kept secure from tampering.
UncaYimmy continues:
"I would also expect that 1-balls and 5-balls would be more likely to stay
in place due to friction against
the bag. The three remaining balls would probably shuffle around more. I
haven't verified this through experiment, but it seems reasonable. Thus, my
prediction is that highest accuracy will be 1-balls and
5-balls."
Suitbert's reply:
Thus, UncaYimmy predicts more hits when 1 or 5 is called compared with hits
when 2, 3, and 4 are called. This is not what occurs. Here are the
percentages of hits for 1 through 5 (from my standard sample of 238
participants):
1: 22.7% hits of 15,370 calls
2: 23.2% hits of 19,613 calls
3: 22.7% hits of 19,156 calls
4: 23.2% hits of 17,008 calls
5: 22.4% hits of 14,053 calls
I made a mistake in my spreadsheet. His numbers are correct. Mine are incorrect. I apologize for the mistake. Below are the correct numbers:
Ball|Correct|Called|Drawn
1|22.7|18.0|19.7
2|23.2|23.0|20.6
3|22.7|22.5|20.2
4|23.1|20.0|19.9
5|22.4|16.5|19.6
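As a cross-check, the "Called" column can be recomputed from the call counts in Suitbert's list. The sketch below uses only the figures quoted in his reply (including his 23.2% for ball 4); the variable names are chosen here for illustration.

```python
# Call counts and hit percentages as quoted in Suitbert's list above.
calls = {1: 15370, 2: 19613, 3: 19156, 4: 17008, 5: 14053}
hit_pct = {1: 22.7, 2: 23.2, 3: 22.7, 4: 23.2, 5: 22.4}

total_calls = sum(calls.values())

# Share of all calls made for each number, in percent.
call_share = {n: round(100 * c / total_calls, 1) for n, c in calls.items()}

# Overall hit rate, weighting each number's hit rate by how often it was called.
overall_hit_rate = sum(calls[n] * hit_pct[n] / 100 for n in calls) / total_calls

print(total_calls)                       # 85200
print(call_share)                        # {1: 18.0, 2: 23.0, 3: 22.5, 4: 20.0, 5: 16.5}
print(round(100 * overall_hit_rate, 1))  # 22.9
```

The recomputed shares agree with the "Called" column, and the weighted hit rate comes out a little under 3 percentage points above the 20% expected by chance.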
However, it's important to note that we are looking at aggregate numbers, not just the subjects with high scores, which is what my theory is based on. I suggested a possible way of cheating based on *one* particular way of loading the balls into the bag. A subject looking to cheat would not have to put the balls in the bag in the order 1 through 5. They could be placed in any order so long as a decent semblance of grouping remained.
The "trick" is that if the balls are loaded with any type of grouping, the subject need only remember that grouping and call/draw accordingly. The shuffling appears woefully inadequate, and it hasn't been demonstrated to be effective. It is only assumed.
* How are the balls determined to be a full set before testing?
* How are the balls shuffled before being placed into the bag?
* Can the subject see the balls as they are being loaded into the bag?
* Did the subject ever load the bag?
* What happens between runs? Are the balls removed from the bag?
* Is this information recorded with each trial? If so, then it might be possible to see a correlation between these actions and the accuracy.
UncaYimmy also predicts: "I also predict that when a 1-ball or 5-ball is
called, I will be least likely to draw a ball from the opposite end. In
other words I call a 1-ball and reach into the left side of the bag. Very
few of the 5-balls on the right side of the bag will have migrated to the
left due to poor shuffling. This means that when calling a 1-ball and
reaching to the left, I would expect more 1-balls. ... And most
importantly, I expect to get relatively few 5-balls. Of course, when calling
5-balls, I expect the same distribution in reverse."
Suitbert: The results do not confirm the idea of UncaYimmy, who expected more
number 2, 3, and 4 than number 5 draws when number 1 is called and more
number 4, 3, and 2 than number 1 draws when number 5 is called.
In my post with the charts I said that it didn't appear to be statistically significant. Again, we are looking at aggregate numbers, not the numbers for high scorers. And my predictions presumed just one order in which they could be loaded. Any order would work.
UncaYimmy's explanation of the good misses effect thus does not hold. Can
anyone out there propose another explanation? I like tentative explanations
that can be tested empirically. Mere speculation is less productive.
My speculation error was compounded by errors in my spreadsheet that coincidentally reinforced my theory. Mea culpa.
Looking at the numbers, it appears there is consistent accuracy across all 5 balls. By my reckoning the effect size was 3% to 4% accuracy. The only thing unusual is that the 1 and 5 balls were called much less frequently than balls 2 and 3. Ball 4 was called about 20% of the time. Then again, humans are not known for being good random number generators.
All that said, there is insufficient data available to me (as far as I know) to form any theories or even to reject my theory about ball placement. What we're looking at is all of the data at once. What I want to see is the data for the people with effect sizes greater than the average.
For example, in the video the subject got 23 out of 60. That effect size is around 20% to 25%. That subject was not shuffling the balls adequately at all - he merely bounced the bag up and down a couple of times. I would want to see the results of his performance and those with bigger effect sizes in order to form a theory. Not only that, I'd like to see the actual data recorded (trial #, called, drawn).
Is that data available?
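For a rough sense of how unusual 23 hits in 60 trials would be by chance alone, a binomial tail probability can be computed. This is only a sketch under stated assumptions: every trial is independent with a 1-in-5 chance of a hit, which is exactly what poor shuffling or peeking would violate. The helper name is hypothetical.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) when X counts hits in n independent trials with hit chance p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_trials, hits, chance = 60, 23, 0.2

print(hits / n_trials)                        # observed hit rate
print(n_trials * chance)                      # hits expected by chance
print(binomial_tail(hits, n_trials, chance))  # chance of 23 or more hits
```

A small tail probability only says that pure chance is an unlikely explanation for that one run. It does not distinguish between psi, inadequate shuffling, or a glimpse into the bag.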
In the one video I saw with the inadequate shuffling, it looks to me like the subject is trying to peer into the bag. See the animated GIF below. If he only got a glimpse on occasion, that could explain the effect size.
http://picasion.com/pic10/f118f622762a3f75380bdd2f8ce3ad4d.gif
fls
30th June 2009, 03:16 PM
The pause in our discussion was entirely my fault. To continue:
Linda:
I have two questions for you. If you had not found the "good miss"
phenomenon, would you have included that in your results?
Suitbert:
Of course not, I would not have searched for it. Why do you ask? You know the answer in advance.
Linda:
Testing multiple relationships and reporting only on those which are positive falsely increases the number of apparent relationships. It makes associations discovered in this manner unreliable.
Suitbert: You imply that I do that. Is it fine to put one’s criticism in a seemingly harmless dressing? I do report missing relationships, whenever I expected them, above all relationships which come out with an unexpected direction, as you know.
You mention 5 'secondary psi effects' including the preference paradox (higher hit proportions with less preferred calls), deviant succession of drawn numbers (repetition avoidance), deviant sequences of hits and misses, etc. But you don’t report on all of them. For example, the data that you perform your 'good misses' analysis on does not demonstrate the preference paradox, but you do not mention that.
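The concern about testing several relationships can be put in numbers. The sketch below assumes the tests are independent and each uses a 0.05 significance threshold; neither assumption comes from the papers, and the function name is hypothetical.

```python
def familywise_error(num_tests, alpha=0.05):
    """Chance of at least one false positive among independent tests
    when no real effect exists in any of them."""
    return 1 - (1 - alpha) ** num_tests


for k in (1, 5, 10):
    print(k, round(familywise_error(k), 3))
# 1 0.05
# 5 0.226
# 10 0.401
```

With five candidate effects, the chance that at least one looks "significant" purely by accident is already above one in five under these assumptions, which is why it matters whether all tested relationships are reported.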
Linda:
You provided a reasonable explanation of what you expected to see. When you didn't find what you expected to see, all of a sudden the explanation changed to a convoluted pattern whereby psi-missing and psycho-analytic
defense mechanisms are brought into it. This was entirely post-hoc.
Suitbert:
"All of a sudden"…? Why do you apply such rhetorical spices?
I am trying to communicate the seriousness of my concerns. Sometimes the use of rhetorical devices helps to convey mood.
Linda:
What is the gold-standard measure of psi ability against which you measure double hit count? For example, if I want to regard spiral CT as a valid measure of the presence of pulmonary emboli, I measure it against pulmonary angiogram.
That isn’t a good example. I used a diagnostic test because the example was more clear, but then it became less relevant. An example that is more relevant is 'health'. One can compare something like "self-perceived health status" with the mortality rate to determine the validity of that measurement - the two measures are otherwise quite independent. If you need an example from behavioural sciences, then it would be something like 'intelligence'.
What I am looking for are things which are determined a priori - a particular measurement or value of a variable predicts the presence or value of another variable. The former is an independent variable and the latter is dependent. I think that what you are saying is that a prior deviation from MCE predicts a subsequent deviation from MCE? In that case, what I am interested in is the data which demonstrates this association - a positive predictive value or likelihood ratio or a Cronbach alpha (it seems that reliability speaks to validity in this case).
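To illustrate the kind of figures being asked for, here is a minimal sketch that derives a positive predictive value and a positive likelihood ratio from a 2x2 table. The counts are invented purely for illustration and are not taken from any of the experiments; the function name is also hypothetical. The table asks: of participants who scored above chance in a first session, how many did so again in a second session?

```python
def predictive_stats(tp, fp, fn, tn):
    """Positive predictive value and positive likelihood ratio from a 2x2 table.

    tp: above chance in both sessions
    fp: above chance in session 1 only
    fn: above chance in session 2 only
    tn: above chance in neither session
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    lr_positive = sensitivity / (1 - specificity)
    return ppv, lr_positive


# Made-up counts for 100 hypothetical participants.
ppv, lr_positive = predictive_stats(tp=12, fp=8, fn=18, tn=62)
print(round(ppv, 2), round(lr_positive, 2))  # 0.6 3.5
```

A likelihood ratio near 1 would mean that a first-session deviation says nothing about the second session; values well above 1 would support using it as a predictor.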
Linda:
Criterion validity in this case means that if we had a way of detecting whether or not psi was present, we would be able to show that good misses were present when psi was present and absent when psi was absent.
Suitbert:
I consider the correlation (.40) shown in Figure 1 as an indication of criterion validity in your sense.
Linda:
That is simply a correlation between double hit count and good misses. Are you suggesting that every time a double hit is obtained it indicates the presence of psi?
Suitbert: Why “every time”? You know that this is not the case. We are dealing with hit rates, not with individual hits. Why do you ask such a question? Do you want me to say “no” to your question, which sounds as if I had to acknowledge the truth of some of your criticisms?
I say this to make it clear that your approach does not demonstrate criterion validity. You use the correlation as though the presence of a double hit indicates the presence of psi. But you have not established the reliability and validity of hit rates as a measure of psi.
Ideally, if you wished to use hit rates as a measure of psi, you would set about establishing the scale. Under different circumstances, one could form ROC (receiver operating characteristics) curves for different hit rates using a direct measure. But we have no direct measure of psi. However, I think your idea of using subsequent deviations from chance as a way to confirm your choice is a reasonable place to start. As I mentioned earlier, the idea of reliability overlaps considerably with validity, when it comes to psi.
I don't get the impression that you have developed a scale to measure psi based on the test-retest reliability of various hit rates. If you have, then it is that scale which would form the basis for your criterion validity when assessing the validity of 'good misses'. By simply comparing 'good misses' to hits, you are probably comparing the two over a range which has no potential to discriminate. I would suspect that you would discover that something like a deviation of 10% from chance has a poor test-retest reliability, whereas a deviation of 50% has a moderate amount of reliability.
Linda:
Construct validity means that you have a theoretical/hypothetical basis by which psi would miss in this particular manner - i.e. that psi considers numbers as occurring in a particular order and that there is no direction to that order. None of that has been established. Psi could treat them as symbols (the claim of "good misses" for Zener symbols, for example), it could order them but there is a direction to that order, or it could order them in a non-linear fashion.
Suitbert:
This could be, yes. Am I responsible that we do not observe such things? Do I have to deal with missing results that no one expected except Linda, post hoc, who does not like the real observation that she, like me, had not expected?
I ask you these questions because you have simply made an observation – that people sometimes miss in a particular pattern and sometimes do not. You haven’t provided any particular hypothetical or theoretical reason to connect that to psi. There are other patterns in the data. For example, numbers 1 and 5 are called and drawn less often than the other numbers. If I hypothesize that Invisible Pink Unicorns consider odd numbers unlucky, does the relative paucity of 1’s and 5’s confirm my idea that IPU’s are responsible for the results?
Linda:
That psi is interested in a numerical difference is a subjective judgement on your part. How do you know it's not something like even vs. odd numbers, straight vs. curvy lines, mirror symmetry vs. rotational symmetry vs. asymmetry, etc.?
Suitbert:
These factors are irrelevant because they cannot give rise to the pattern that was found. Even if any of your alternative factors was effective, this would have to be explained. Do you have an explanation for an effect by even vs. odd numbers, straight vs. curvy lines, mirror symmetry vs. rotational symmetry?
Linda:
The question isn't which explanation is consistent with the pattern. The question is what happens in the presence of psi?
Suitbert:
You replace my concrete question with some deteriorating abstract question.
Is that not the question though? You shouldn’t be wondering whether you can design experiments which easily generate patterns. You should be wondering whether you can design experiments which pick up on psi.
Linda:
How do you know sensory leakage mixes up graphically similar digits? This should be something which is determined empirically. For example, in memory tests, people mix up words which sound similar or they mix up words which have similar meaning, depending upon whether you are talking about short or long-term memory.
Suitbert:
If the ball-drawing process were supported by sensory leakage, sensory discriminations would be taking place, not memory processes.
Again, this was just an example of the type of information or research that is available to answer a different question, to help clarify what kind of information I am asking for from you.
Linda
© 2001-2009, James Randi Educational Foundation. All Rights Reserved.