James Randi Educational Foundation JREF Forum

16th July 2009, 10:57 AM   #1
Deetee
Illuminator
Join Date: Jul 2003
Posts: 3,790
Statistics help please

I need guidance/help regarding some comparisons. Any advice much appreciated.

I have a small group of 9 patients with an uncommon disease X.
In 8 of them it seemed to be associated/triggered by a problem (Y) but in one case it seemed to be linked with a different problem (Z).
Now Y happens quite commonly in the general "at risk" population of 600,000 people (its incidence is 500,000), but Z is rare (100).

However, my sample is incomplete, and I don't know how many other cases of disease X are out there.

Can I determine whether having Y or Z is a greater risk factor for developing disease X?

What is the best way to compare, and what confidence limits would there be?
__________________
"Reci bobu bob a popu pop." - Tanja
"Everything is physics. This does not mean that physics is everything." - Cuddles
"The entire practice of homeopathy can be substituted with the advice to "take two aspirins and call me in the morning." - Linda
"Homeopathy: I never knew there was so little in it." - BSM

Last edited by Deetee; 16th July 2009 at 10:58 AM.

16th July 2009, 02:03 PM   #2
ZeeGerman
Muse
Join Date: Jan 2003
Location: Just far enough from Detroit that it's OK
Posts: 784
It has really been a long time since I last used statistics, but I think that you are looking for Bayes' theorem:

P(X|Y) = [P(Y|X)*P(X)]/P(Y), which gives you the probability that a person will have disease X given that she shows problem Y. Substitute Z for Y and you could compare.

You have two problems:
From your sample, you could set P(Y|X) to 8/9 and P(Y) to 5/6 and P(Z|X) to 1/1 and P(Z) to 1/6000

But you don't show data about P(X), i.e. how common disease X is given no other information.

Your second problem is your sample size, especially the 1 case with Z.
You simply can't do statistics with sample sizes of one.

Zee
__________________
Wenn die Katze ein Pferd wäre, könnte man die Bäume raufreiten.

Afta ol, ve arr frrom ze lend of tschoklet (The Simpsons "Das Kraftwerk")

16th July 2009, 02:08 PM   #3
ZeeGerman
Muse
Join Date: Jan 2003
Location: Just far enough from Detroit that it's OK
Posts: 784
ETA: P(Z|X) should be 1/9, not 1/1 but this doesn't help you either

Come to think of it...
Since P(X) is the same in both equations, it cancels, and you get a relative comparison of P(X|Y) and P(X|Z).
If I did my quick calculation correctly, P(X|Z) comes out 625 times higher than P(X|Y), but again, your sample size makes this meaningless.
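
For anyone who wants to check that ratio, here is a minimal sketch that lets P(X) cancel and compares the two Bayes factors P(Y|X)/P(Y) and P(Z|X)/P(Z) directly. It only uses the rough figures quoted in this thread (8/9, 1/9, and 500,000 and 100 out of 600,000); the function name is made up for illustration, and the inputs are point estimates from 9 cases, so the result is arithmetic rather than evidence.

```python
from fractions import Fraction

def relative_posterior(p_factor_given_x, p_factor):
    """Return P(X|factor) / P(X), which by Bayes' theorem is P(factor|X) / P(factor)."""
    return p_factor_given_x / p_factor

# Figures quoted in the thread (very unreliable point estimates).
lift_y = relative_posterior(Fraction(8, 9), Fraction(500_000, 600_000))
lift_z = relative_posterior(Fraction(1, 9), Fraction(100, 600_000))

# P(X) cancels in this ratio, so it compares P(X|Z) with P(X|Y) directly.
ratio_z_over_y = lift_z / lift_y
print(lift_y, lift_z, ratio_z_over_y)  # prints: 16/15 2000/3 625
```

Exact fractions are used so that no rounding creeps into the comparison.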

Last edited by ZeeGerman; 16th July 2009 at 02:20 PM.

16th July 2009, 03:58 PM   #4
Deetee
Illuminator
Join Date: Jul 2003
Posts: 3,790
Originally Posted by ZeeGerman View Post
It has really been a long time since I last used statistics, but I think that you are looking for Bayes' theorem:

P(X|Y) = [P(Y|X)*P(X)]/P(Y), which gives you the probability that a person will have disease X given that she shows problem Y. Substitute Z for Y and you could compare.

You have two problems:
From your sample, you could set P(Y|X) to 8/9 and P(Y) to 5/6 and P(Z|X) to 1/1 and P(Z) to 1/6000

But you don't show data about P(X), i.e. how common disease X is given no other information.

Your second problem is your sample size, especially the 1 case with Z.
You simply can't do statistics with sample sizes of one.

Zee
Bit confused still....
The estimates are that disease X occurs in about 1 in every 125 of the overall population. Does that help?

16th July 2009, 07:16 PM   #5
Jorghnassen
Illuminator
Join Date: Nov 2004
Location: The realm of ideas
Posts: 3,881
Are you trying to make inference on incidence using only prevalent cases (because that's complicated...)? Wouldn't you need undiseased exposed (i.e. with Y or Z) to make odds ratios and things like that? And yeah, the sample is just too small for anything beyond unreliable point estimates.
__________________
"Help control the local pet population: teach your dog abstinence." -Stephen Colbert
"My dad believed laughter is the best medicine. Which is why several of us died of tuberculosis."- Unknown source, heard from Grey Delisle on Rob Paulsen's podcast

17th July 2009, 05:27 AM   #6
fls
Penultimate Amazing
Join Date: Jan 2005
Posts: 10,236
Originally Posted by Deetee View Post
I need guidance/help regarding some comparisons. Any advice much appreciated.

I have a small group of 9 patients with an uncommon disease X.
In 8 of them it seemed to be associated/triggered by a problem (Y) but in one case it seemed to be linked with a different problem (Z).
Now Y happens quite commonly in the general "at risk" population of 600,000 people (its incidence is 500,000), but Z is rare (100).

However, my sample is incomplete, and I don't know how many other cases of disease X are out there.

Can I determine whether having Y or Z is a greater risk factor for developing disease X?

What is the best way to compare, and what confidence limits would there be?
What you want is a case-control study. Find 18 comparable controls from your patient population and then measure for the presence of Y and Z. Calculate the odds-ratio for each factor:

              Case    No Case
Exposed        a         b
Not Exposed    c         d

OR = ad/bc

You then convert this to a z-score by taking the ln of the OR and dividing by the SE (sqrt of (1/a+1/b+1/c+1/d)) and use the usual tests for statistical significance. The confidence interval is formed using the ln, but you can then take the anti-log to convert it to an interval that makes sense.

That tells you whether one or the other factor is associated with X and whether one or the other is significant.
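
As a rough illustration of the steps above (OR, ln of the OR, SE, z-score, and a confidence interval converted back with the anti-log), here is a small Python sketch. The function name and the counts are hypothetical, chosen only so the example runs; they are not data from this thread. It also assumes no cell is zero, since the log-based method breaks down in that case.

```python
import math

def odds_ratio_summary(a, b, c, d, z_crit=1.96):
    """Odds ratio, z-score and approximate 95% CI for a 2x2 table.

    a = exposed cases,   b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the log method")
    odds_ratio = (a * d) / (b * c)
    log_or = math.log(odds_ratio)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    z = log_or / se
    # Build the interval on the log scale, then take the anti-log.
    ci = (math.exp(log_or - z_crit * se), math.exp(log_or + z_crit * se))
    return odds_ratio, z, ci

# HYPOTHETICAL counts: 9 cases (8 with Y) and 18 controls (15 with Y).
or_y, z_y, ci_y = odds_ratio_summary(a=8, b=15, c=1, d=3)
print(f"OR = {or_y:.2f}, z = {z_y:.2f}, 95% CI = ({ci_y[0]:.2f}, {ci_y[1]:.2f})")
```

With cells this small the interval will be very wide, which is the practical meaning of "the sample is too small".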

If you want to compare the relative influence of Y and Z, use logistic regression (I presume you have a stats program?).

Linda
__________________
God:a capricious creative or controlling force said to be the subject of a religion.
Evidence is anything that tends to make a proposition more or less true.-Loss Leader
SCAM will now be referred to as DIM (Demonstrably Ineffective Medicine)
Look how nicely I'm not reminding you you're dumb.-Happy Bunny
When I give an example, do not assume I am excluding every other possible example. Thank you.

Last edited by fls; 17th July 2009 at 05:31 AM.

17th July 2009, 10:08 AM   #7
Deetee
Illuminator
Join Date: Jul 2003
Posts: 3,790
Originally Posted by fls View Post
What you want is a case-control study. Find 18 comparable controls from your patient population and then measure for the presence of Y and Z. Calculate the odds-ratio for each factor:

              Case    No Case
Exposed        a         b
Not Exposed    c         d

OR = ad/bc

You then convert this to a z-score by taking the ln of the OR and dividing by the SE (sqrt of (1/a+1/b+1/c+1/d)) and use the usual tests for statistical significance. The confidence interval is formed using the ln, but you can then take the anti-log to convert it to an interval that makes sense.

That tells you whether one or the other factor is associated with X and whether one or the other is significant.

If you want to compare the relative influence of Y and Z, use logistic regression (I presume you have a stats program?).

Linda
You guys are just too much. Why can't I have some of your spare brain capacity?

If I make 2x2 tables can I not just run a Chi square with correction for small nos? I am afraid I'm rather lost with ORs and z-scores and the like.

And what if with the control samples one of the boxes comes up with a zero?

17th July 2009, 10:38 AM   #8
Jorghnassen
Illuminator
Join Date: Nov 2004
Location: The realm of ideas
Posts: 3,881
Originally Posted by Deetee View Post
If I make 2x2 tables can I not just run a Chi square with correction for small nos? I am afraid I'm rather lost with ORs and z-scores and the like.
If you're only testing for Y as a factor, probably. Not for Z, because at 1 in 6000 in the general population, there's about a 99.7% chance you won't have any Z among the 18 controls that fls suggested.
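
A quick back-of-the-envelope check of that chance, as a sketch only: it assumes the 18 controls are drawn independently and that each has Z with probability 100/600,000, the figures given in the first post.

```python
# Assumptions: 18 independent controls, each with Z with probability 100/600,000.
p_z = 100 / 600_000
n_controls = 18

# Probability that none of the controls has Z.
p_no_z = (1 - p_z) ** n_controls
print(f"P(no control has Z) = {p_no_z:.4f}")
```

Under these assumptions the result comes out at roughly 0.997, so a control group of this size is almost certain to contain nobody with Z.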

Quote:
And what if with the control samples one of the boxes comes up with a zero?
Technically, there's Fisher's exact test, but again, for Z you don't have the sample size to make any inference.

Finally, because you have prevalent cases (subjects already have the disease at recruitment, thus have aged since onset), any association between Y or Z and the disease might not be representative of incident cases. That is, prevalent cases tend to live longer than incident cases (the longer you have the disease, the more likely it will eventually be detected, and thus the more likely such a subject will be included in the sample), and what you observe may be an association with longer disease duration rather than increased incidence.

18th July 2009, 05:28 AM   #9
fls
Penultimate Amazing
Join Date: Jan 2005
Posts: 10,236
Originally Posted by Deetee View Post
You guys are just too much. Why can't I have some of your spare brain capacity?

If I make 2x2 tables can I not just run a Chi square with correction for small nos? I am afraid I'm rather lost with ORs and z-scores and the like.
You can. The OR gives a useful measure of the strength of that association, which is the information that you are looking for.

Quote:
And what if with the control samples one of the boxes comes up with a zero?
I misinterpreted the part about the incidence of Z and missed that it was so low. As Jorghnassen pointed out, you probably won't find any examples of Z in your controls, which makes a case-control analysis for Z impossible (the numbers should work for Y). You can do a Fisher's exact test (instead of a Chi-square test) when any of your cells has fewer than 5 cases (you can do a Fisher's exact test in any case; it's just that it moves you away from the realm of 'pencil and paper').
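
To show what leaving 'pencil and paper' looks like, here is a minimal sketch of a two-sided Fisher's exact test using only the Python standard library. The function name and the example counts are hypothetical, not data from this thread. It uses the common convention of summing the probabilities of all tables, with the same margins, that are no more likely than the observed one; a stats package may use a slightly different two-sided convention.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more likely than the observed one.
    return sum(p for p in map(prob, range(lowest, highest + 1))
               if p <= p_obs * (1 + 1e-9))

# HYPOTHETICAL counts for factor Y: 9 cases (8 exposed), 18 controls (15 exposed).
p_value = fisher_exact_two_sided(8, 15, 1, 3)
print(f"Fisher's exact p = {p_value:.3f}")
```

The small tolerance in the comparison guards against floating-point ties between equally likely tables.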

A rough rule of thumb, when you are dealing with anything rare, is to focus on collecting a group with the rarest factor. Is it possible to collect a group of people with Z for a retrospective cohort study? Alternatively, if you already have a good measure of the underlying incidence of these factors in your population, you could simply advertise for people with X and Z. If you get any additional cases, it gives you enough ammunition to make a more involved study worthwhile, since you really shouldn't have more than one person with both to begin with (specifically, 3 or more people with both would occur with a less than 5% probability based on the numbers you gave). If you don't get any additional cases, then it suggests you can drop the idea.
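
One plausible way to reproduce the "3 or more" figure is a binomial tail, sketched below. It assumes the 100 people with Z each have X independently at the background rate of 1 in 125 quoted earlier in the thread; the function name is made up for illustration.

```python
from math import comb

def binom_tail(n, p, k):
    """P(at least k successes) for a Binomial(n, p) count."""
    return 1 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))

# Assumptions: 100 people with Z, each with X independently at rate 1/125.
p_three_or_more = binom_tail(n=100, p=1 / 125, k=3)
print(f"P(3 or more with both X and Z) = {p_three_or_more:.3f}")
```

Under those assumptions the tail probability lands just below 5%, in line with the statement above.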

Linda

Last edited by fls; 18th July 2009 at 05:34 AM. Reason: clarification

18th July 2009, 04:18 PM   #10
Beth
Philosopher
Join Date: Dec 2004
Location: Flatland
Posts: 5,307
Originally Posted by Deetee View Post
You guys are just too much. Why can't I have some of your spare brain capacity?

If I make 2x2 tables can I not just run a Chi square with correction for small nos? I am afraid I'm rather lost with ORs and z-scores and the like.

And what if with the control samples one of the boxes comes up with a zero?


Linda's right about the 5 units per cell needed for the chi-square test to be reliable, but that rule only applies to the table of expected values. If your actual sample has a zero, that's not a problem. But when you compute the expected values, you need a value of at least 5 per cell under the assumption of the null hypothesis.
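
A short sketch of how to check that rule: the expected count for each cell under the null hypothesis is (row total × column total) / grand total. The function name and the observed counts below are hypothetical, not data from this thread.

```python
def expected_counts(table):
    """Expected cell counts under independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# HYPOTHETICAL counts: rows = exposed / not exposed, columns = case / no case.
observed = [[8, 15], [1, 3]]
expected = expected_counts(observed)

# The rule of thumb looks at the expected table, not the observed one.
rule_ok = all(cell >= 5 for row in expected for cell in row)
print(expected, rule_ok)
```

If rule_ok is False, as it would be for counts this small, Fisher's exact test is the safer choice.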
__________________
Beth
"You are not the stuff of which you are made."
Richard Dawkins, July 2005, 10:45

http://www.ted.com/talks/richard_daw..._universe.html

All times are GMT -7.