 Welcome to the JREF Forum, where we discuss skepticism, critical thinking, the paranormal and science in a friendly but lively way.

 Tags statistics

16th July 2009, 10:57 AM   #1
Deetee
Illuminator

Join Date: Jul 2003
Posts: 3,790
Statistics help please
I need guidance/help regarding some comparisons. Any advice much appreciated.

I have a small group of 9 patients with an uncommon disease X.
In 8 of them it seemed to be associated/triggered by a problem (Y) but in one case it seemed to be linked with a different problem (Z).
Now Y happens quite commonly in the general "at risk" population of 600,000 people (its incidence is 500,000), but Z is rare (100).

However, my sample is incomplete, and I don't know how many other cases of disease X are out there.

Can I determine whether having Y or Z is a greater risk factor for developing disease X?

What is the best way to compare, and what confidence limits would there be?
__________________
"Reci bobu bob a popu pop." - Tanja
"Everything is physics. This does not mean that physics is everything." - Cuddles
"The entire practice of homeopathy can be substituted with the advice to "take two aspirins and call me in the morning." - Linda
"Homeopathy: I never knew there was so little in it." - BSM

Last edited by Deetee; 16th July 2009 at 10:58 AM.
16th July 2009, 02:03 PM   #2
ZeeGerman
Muse

Join Date: Jan 2003
Location: Just far enough from Detroit that it's OK
Posts: 784
It has really been a long time since I last used statistics, but I think that you are looking for Bayes' theorem:

P(X|Y) = [P(Y|X)*P(X)]/P(Y)

which gives you the probability that a person will have disease X given that she shows problem Y. Substitute Z for Y and you could compare.

You have two problems:

From your sample, you could set
P(Y|X) to 8/9 and P(Y) to 5/6
and
P(Z|X) to 1/1 and P(Z) to 1/6000

But you don't show data about P(X), i.e. how common disease X is given no other information.

Your second problem is your sample size, especially the 1 case with Z. You simply can't do statistics with sample sizes of one.

Zee
__________________
Wenn die Katze ein Pferd wäre, könnte man die Bäume raufreiten.
Afta ol, ve arr frrom ze lend of tschoklet (The Simpsons "Das Kraftwerk")
16th July 2009, 02:08 PM   #3
ZeeGerman
Muse

Join Date: Jan 2003
Location: Just far enough from Detroit that it's OK
Posts: 784
ETA: P(Z|X) should be 1/9, not 1/1, but this doesn't help you either.

Come to think of it...

Since P(X) is the same in both equations, you could substitute and get a relative comparison of P(X|Y) and P(X|Z).

If I did my quick calculation correctly, P(X|Z) comes out 625 times higher than P(X|Y),

but again, your sample size makes this meaningless.
__________________
Wenn die Katze ein Pferd wäre, könnte man die Bäume raufreiten.
Afta ol, ve arr frrom ze lend of tschoklet (The Simpsons "Das Kraftwerk")

Last edited by ZeeGerman; 16th July 2009 at 02:20 PM.
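
The relative comparison described in this post can be checked with a few lines of arithmetic. This is only a sketch: it takes the sample-based estimates quoted in the thread at face value (8/9, 1/9, 5/6 and 1/6000), and the variable names are made up for illustration. Because P(X) appears in both applications of Bayes' theorem, it cancels out of the ratio.

```python
from fractions import Fraction

# Sample-based estimates quoted in the thread (9 cases, at-risk population of 600,000).
p_y_given_x = Fraction(8, 9)          # 8 of the 9 cases had problem Y
p_z_given_x = Fraction(1, 9)          # 1 of the 9 cases had problem Z
p_y = Fraction(500_000, 600_000)      # Y in the at-risk population, i.e. 5/6
p_z = Fraction(100, 600_000)          # Z in the at-risk population, i.e. 1/6000

# Bayes' theorem: P(X|F) = P(F|X) * P(X) / P(F) for a factor F.
# In the ratio P(X|Z) / P(X|Y) the unknown P(X) cancels.
ratio_z_to_y = (p_z_given_x / p_z) / (p_y_given_x / p_y)

print(ratio_z_to_y)  # 625
```

With these inputs the estimate for Z is the larger of the two, but it rests on a single case with Z, so the caveat about sample size applies in full.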
16th July 2009, 03:58 PM   #4
Deetee
Illuminator

Join Date: Jul 2003
Posts: 3,790
Originally Posted by ZeeGerman
It has really been a long time since I last used statistics, but I think that you are looking for Bayes' theorem:

P(X|Y) = [P(Y|X)*P(X)]/P(Y)

which gives you the probability that a person will have disease X given that she shows problem Y. Substitute Z for Y and you could compare.

You have two problems:

From your sample, you could set
P(Y|X) to 8/9 and P(Y) to 5/6
and
P(Z|X) to 1/1 and P(Z) to 1/6000

But you don't show data about P(X), i.e. how common disease X is given no other information.

Your second problem is your sample size, especially the 1 case with Z. You simply can't do statistics with sample sizes of one.

Zee
Bit confused still....

The estimates are that disease X occurs in about 1 in every 125 of the overall population. Does that help?
__________________
"Reci bobu bob a popu pop." - Tanja
"Everything is physics. This does not mean that physics is everything." - Cuddles
"The entire practice of homeopathy can be substituted with the advice to "take two aspirins and call me in the morning." - Linda
"Homeopathy: I never knew there was so little in it." - BSM
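
As a rough illustration of what the 1-in-125 figure does when plugged into Bayes' theorem, here is a sketch using the sample-based estimates from the earlier posts (with P(Z|X) taken as 1/9). The function name `posterior` is hypothetical, and treating the 9-patient proportions as true probabilities is an assumption the thread itself warns against.

```python
from fractions import Fraction

p_x = Fraction(1, 125)        # Deetee's estimate of how common disease X is
p_y_given_x = Fraction(8, 9)  # from the 9-patient sample
p_z_given_x = Fraction(1, 9)  # from the 9-patient sample (a single case)
p_y = Fraction(5, 6)          # 500,000 of 600,000
p_z = Fraction(1, 6000)       # 100 of 600,000


def posterior(p_f_given_x, p_f, prior):
    """Bayes' theorem: P(X|F) = P(F|X) * P(X) / P(F)."""
    return p_f_given_x * prior / p_f


p_x_given_y = posterior(p_y_given_x, p_y, p_x)
p_x_given_z = posterior(p_z_given_x, p_z, p_x)

print(p_x_given_y)  # 16/1875
print(p_x_given_z)  # 16/3
```

The value for Z comes out greater than 1, which cannot be a probability. That suggests the inputs are not mutually consistent (one case out of nine is far too little to pin down P(Z|X)), so the absolute numbers should not be read as risk estimates.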
16th July 2009, 07:16 PM   #5
Jorghnassen
Illuminator

Join Date: Nov 2004
Location: The realm of ideas
Posts: 3,881
Are you trying to make inference on incidence using only prevalent cases (because that's complicated...)? Wouldn't you need undiseased exposed (i.e. with Y or Z) to make odds ratios and things like that? And yeah, the sample is just too small for anything beyond unreliable point estimates.
__________________
"Help control the local pet population: teach your dog abstinence." -Stephen Colbert
"My dad believed laughter is the best medicine. Which is why several of us died of tuberculosis."- Unknown source, heard from Grey Delisle on Rob Paulsen's podcast
17th July 2009, 05:27 AM   #6
fls
Penultimate Amazing

Join Date: Jan 2005
Posts: 10,236
Originally Posted by Deetee
I need guidance/help regarding some comparisons. Any advice much appreciated.

I have a small group of 9 patients with an uncommon disease X.
In 8 of them it seemed to be associated/triggered by a problem (Y) but in one case it seemed to be linked with a different problem (Z).
Now Y happens quite commonly in the general "at risk" population of 600,000 people (its incidence is 500,000), but Z is rare (100).

However, my sample is incomplete, and I don't know how many other cases of disease X are out there.

Can I determine whether having Y or Z is a greater risk factor for developing disease X?

What is the best way to compare, and what confidence limits would there be?
What you want is a case-control study. Find 18 comparable controls from your patient population and then measure for the presence of Y and Z. Calculate the odds-ratio for each factor:

               Case    No Case
Exposed         a         b
Not Exposed     c         d

You then convert this to a z-score by taking the ln of the OR and dividing by the SE (sqrt of (1/a+1/b+1/c+1/d)) and use the usual tests for statistical significance. The confidence interval is formed using the ln, but you can then take the anti-log to convert it to an interval that makes sense.

That tells you whether one or the other factor is associated with X and whether one or the other is significant.
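
A minimal sketch of that calculation, using only the Python standard library. The function name and the example counts are hypothetical (they are not data from this thread), the odds ratio is taken as (a*d)/(b*c) from the table cells, and 1.96 is the usual critical value for a 95% interval. It assumes none of the four cells is zero.

```python
import math


def odds_ratio_summary(a, b, c, d, z_crit=1.96):
    """Odds ratio, z-score, two-sided p-value and CI for a 2x2 table.

    a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases (all must be non-zero).
    """
    odds_ratio = (a * d) / (b * c)
    log_or = math.log(odds_ratio)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of ln(OR)
    z = log_or / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # normal approximation
    # Build the interval on the log scale, then take the anti-log.
    ci = (math.exp(log_or - z_crit * se), math.exp(log_or + z_crit * se))
    return odds_ratio, z, p_two_sided, ci


# Hypothetical counts: 9 cases (8 with Y) and 18 controls (15 with Y).
odds_ratio, z, p_value, (ci_low, ci_high) = odds_ratio_summary(8, 15, 1, 3)
print(f"OR={odds_ratio:.2f}  z={z:.2f}  p={p_value:.2f}  "
      f"95% CI=({ci_low:.2f}, {ci_high:.2f})")
```

With counts this small the interval is very wide and spans 1, which is the numerical face of the sample-size concern raised earlier in the thread.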

If you want to compare the relative influence of Y and Z, use logistic regression (I presume you have a stats program?).

Linda
__________________
God:a capricious creative or controlling force said to be the subject of a religion.
Evidence is anything that tends to make a proposition more or less true.-Loss Leader
SCAM will now be referred to as DIM (Demonstrably Ineffective Medicine)
Look how nicely I'm not reminding you you're dumb.-Happy Bunny
When I give an example, do not assume I am excluding every other possible example. Thank you.

Last edited by fls; 17th July 2009 at 05:31 AM.

17th July 2009, 10:08 AM   #7
Deetee
Illuminator

Join Date: Jul 2003
Posts: 3,790
Originally Posted by fls
What you want is a case-control study. Find 18 comparable controls from your patient population and then measure for the presence of Y and Z. Calculate the odds-ratio for each factor:

               Case    No Case
Exposed         a         b
Not Exposed     c         d

You then convert this to a z-score by taking the ln of the OR and dividing by the SE (sqrt of (1/a+1/b+1/c+1/d)) and use the usual tests for statistical significance. The confidence interval is formed using the ln, but you can then take the anti-log to convert it to an interval that makes sense.

That tells you whether one or the other factor is associated with X and whether one or the other is significant.

If you want to compare the relative influence of Y and Z, use logistic regression (I presume you have a stats program?).

Linda
You guys are just too much. Why can't I have some of your spare brain capacity?

If I make 2x2 tables can I not just run a Chi square with correction for small nos? I am afraid I'm rather lost with ORs and z-scores and the like.

And what if with the control samples one of the boxes comes up with a zero?
__________________
"Reci bobu bob a popu pop." - Tanja
"Everything is physics. This does not mean that physics is everything." - Cuddles
"The entire practice of homeopathy can be substituted with the advice to "take two aspirins and call me in the morning." - Linda
"Homeopathy: I never knew there was so little in it." - BSM

17th July 2009, 10:38 AM   #8
Jorghnassen
Illuminator

Join Date: Nov 2004
Location: The realm of ideas
Posts: 3,881
Originally Posted by Deetee
If I make 2x2 tables can I not just run a Chi square with correction for small nos? I am afraid I'm rather lost with ORs and z-scores and the like.
If you're only testing for Y as a factor, probably. Not for Z, because at 1 in 6000 in the general population, there's well over a 95% chance (about 99.7%) that you won't have any Z among the 18 controls fls suggested.

Quote:
And what if with the control samples one of the boxes comes up with a zero?
Technically, there's Fisher's exact test, but again, for Z you don't have the sample size to make any inference.

Finally, because you have prevalent cases (subjects already have the disease at recruitment, and thus have aged since onset), any association between Y or Z and the disease might not be representative of incident cases. That is, prevalent cases tend to live longer than incident cases (the longer you have the disease, the more likely it will eventually be detected, and thus the more likely such a subject will be included in the sample), and what you observe may be an association with longer disease duration rather than increased incidence.
__________________
"Help control the local pet population: teach your dog abstinence." -Stephen Colbert
"My dad believed laughter is the best medicine. Which is why several of us died of tuberculosis."- Unknown source, heard from Grey Delisle on Rob Paulsen's podcast
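
Both points in this post can be sketched with the standard library. The first part computes the chance that 18 controls contain nobody with Z, assuming controls are drawn independently at the 100-in-600,000 rate. The second part is a hand-rolled two-sided Fisher's exact test (function name hypothetical) applied to a made-up table with a zero cell: 1 of 9 cases with Z, 0 of 18 controls with Z.

```python
from math import comb

# Part 1: probability that none of 18 independent controls has Z.
p_z = 100 / 600_000
p_no_z = (1 - p_z) ** 18
print(round(p_no_z, 4))  # 0.997


# Part 2: two-sided Fisher's exact test for the table [[a, b], [c, d]].
def fisher_exact_two_sided(a, b, c, d):
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(k):
        # Hypergeometric probability of k in the top-left cell,
        # with all row and column totals held fixed.
        return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)

    p_obs = prob(a)
    k_min, k_max = max(0, row1 + col1 - n), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= p_obs * (1 + 1e-7))


# Hypothetical Z table: exposed = has Z; cases 1 vs 8, controls 0 vs 18.
p_value = fisher_exact_two_sided(1, 0, 8, 18)
print(round(p_value, 3))  # 0.333
```

The test runs despite the zero cell, but a single exposed subject cannot produce a small p-value, which matches the remark that there is not enough data to say anything about Z.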
18th July 2009, 04:18 PM   #10
Beth
Philosopher

Join Date: Dec 2004
Location: Flatland
Posts: 5,307
Originally Posted by Deetee
You guys are just too much. Why can't I have some of your spare brain capacity?

If I make 2x2 tables can I not just run a Chi square with correction for small nos? I am afraid I'm rather lost with ORs and z-scores and the like.

And what if with the control samples one of the boxes comes up with a zero?
Linda's right about the 5 units per cell needed for the chi-square test to be reliable, but that rule only applies to the table of expected values. If your actual sample has a zero, that's not a problem. But when you compute the expected values, you need a value of at least 5 per cell under the assumption of the null hypothesis.
__________________
Beth

"You are not the stuff of which you are made."
Richard Dawkins, July 2005, 10:45 http://www.ted.com/talks/richard_daw..._universe.html
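
The expected-value rule can be checked directly. In this sketch the expected count for each cell under the null hypothesis is (row total * column total) / grand total; the function name and both example tables are hypothetical, not data from the thread.

```python
def expected_counts(table):
    """Expected cell counts under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def meets_rule_of_five(table):
    """True if every expected count is at least 5."""
    return all(e >= 5 for row in expected_counts(table) for e in row)


# An observed zero is fine when the expected counts are all large enough.
with_zero = [[20, 0], [15, 25]]
print(meets_rule_of_five(with_zero))   # True

# A small table with no zeros can still fail the rule.
small = [[8, 15], [1, 3]]
print(meets_rule_of_five(small))       # False
```

When the rule fails, as in the second table, an exact test such as Fisher's is the usual fallback.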
