#1 |
|
Philosopher
Join Date: Jun 2005
Location: Hyperion
Posts: 6,668
|
The Rationality Behind Ockham's Razor
So I was reading through Artificial Intelligence: A Modern Approach and I came across a proof of the rationality of the razor. If you are like me, and you have wondered why the simplest solution is usually the best choice, you might be interested in this:
Suppose we have a set of hypotheses, all of which match the data perfectly. In fact this set is infinite, but that is not important. Furthermore, suppose there is some metric that can be used to measure the "complexity" of the hypotheses, which loosely defined would be something like the number of terms in a polynomial for a data-point fitting problem, or the number of steps in an algorithm, etc. Thus, there is some discretization that can be applied, and "complexity" can be reported as an integer.
For some number n, there is a hypothesis of complexity n that is the simplest hypothesis, i.e. every hypothesis in the set has complexity greater than or equal to n. It can be shown that as complexity increases, the number of possible hypotheses does not decrease: |{hypotheses with complexity m}| <= |{hypotheses with complexity m+1}|.
Now, out of an infinite set of hypotheses, which should we choose? Because the number of "apparently" correct hypotheses (those that match the data) increases as complexity increases, the chances of choosing the actually correct hypothesis (the one modeling the real process used to generate the data) at a given complexity level decrease. In mathematical terms, the prior probability of a hypothesis being true decreases as complexity increases. Because one should choose a hypothesis to maximize the combination of prior and posterior probability, and all hypotheses that match the data have a posterior probability of 1, one should base the choice on prior probability alone.
In plain language, one should choose the hypothesis with the lowest probability of being wrong. Mathematically, this is always the least complex hypothesis that matches the data. |
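The counting step in this argument can be tried on a toy problem. The sketch below is only an illustration under assumptions that are not in the post: the two data points, the integer coefficient range [-3, 3], and the names `POINTS`, `COEFF_RANGE`, `fits` and `count_by_degree` are all hypothetical. It counts the polynomials of each exact degree that pass through the data, then shows the share each one would get if a level's probability were split evenly among its members. It does not settle whether each level deserves the same total probability in the first place.

```python
from itertools import product

# Hypothetical setup: two observed data points, integer coefficients in [-3, 3].
POINTS = [(0, 0), (1, 1)]
COEFF_RANGE = range(-3, 4)


def fits(coeffs, points):
    """True if the polynomial sum(coeffs[i] * x**i) passes through every point."""
    return all(
        sum(c * x**i for i, c in enumerate(coeffs)) == y for x, y in points
    )


def count_by_degree(max_degree):
    """Count data-matching polynomials of exactly each degree 1..max_degree."""
    counts = {}
    for degree in range(1, max_degree + 1):
        counts[degree] = sum(
            1
            for coeffs in product(COEFF_RANGE, repeat=degree + 1)
            # A nonzero leading coefficient makes the degree exact.
            if coeffs[-1] != 0 and fits(coeffs, POINTS)
        )
    return counts


counts = count_by_degree(3)
# Share per hypothesis if a level's probability is split evenly inside it.
share = {d: 1 / n for d, n in counts.items() if n}

for d in counts:
    print(f"degree {d}: {counts[d]} matching hypotheses, share {share.get(d)}")
```

In this toy setup the number of matching hypotheses grows with degree, so the per-hypothesis share shrinks, which is the pattern the argument relies on.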
|
|
|
|
#2 |
|
Opinionated Jerk
Moderator Join Date: Jul 2006
Location: New York
Posts: 11,882
|
Thanks. Very informative.
|
|
__________________
Follow me on Twitter! @LossLeader This force is receiving all the right to vote through the use of magic. - Miernik Wieslaw <NEW> VOTE FOR ME JUST BECAUSE <NEW> |
|
|
|
|
|
#3 |
|
Penultimate Amazing
Join Date: Jan 2005
Posts: 10,236
|
I don't understand this part. How can all hypotheses have a posterior probability of 1 if only one is actually correct?
ETA: Have I got the question right? "What is the probability this hypothesis is the actually correct hypothesis given 'this' level of complexity?" ETA2: I think that what is meant is that the likelihood ratio is 1? Linda |
|
__________________
God:a capricious creative or controlling force said to be the subject of a religion. Evidence is anything that tends to make a proposition more or less true.-Loss Leader SCAM will now be referred to as DIM (Demonstrably Ineffective Medicine) Look how nicely I'm not reminding you you're dumb.-Happy Bunny When I give an example, do not assume I am excluding every other possible example. Thank you. |
|
|
|
|
|
#4 |
|
Philosopher
Join Date: Oct 2007
Posts: 5,920
|
Shouldn't that be the least complex hypothesis that properly accounts for all relevant data/factors?
And the obvious problem with mistaking this assessment tool for a divining rod of reality is that we are almost never able to completely and accurately know all the relevant data/factors. So an Ockham-based judgement should always be conditional on the known information. |
|
|
|
|
#5 |
|
Muse
Join Date: Dec 2005
Location: Brooklyn
Posts: 666
|
I thought Occam's Razor postulated that the best explanation for phenomena is the one that requires the fewest assumptions, not necessarily the simplest explanation. Am I wrong about that?
|
|
__________________
“The plural of anecdote is not evidence." --George Stigler "I am all in favor of a dialogue between science and religion, but not a constructive dialogue. One of the great achievements of science has been, if not to make it impossible for intelligent people to be religious, then at least to make it possible for them not to be religious. We should not retreat from this accomplishment." --Steven Weinberg |
|
|
|
|
|
#6 |
|
Philosopher
Join Date: Jun 2005
Location: Hyperion
Posts: 6,668
|
I probably have the terms wrong, sorry (I always f--- up these two terms...). Here is how I use them:
prior probability = mathematical probability that a hypothesis is correct to the exclusion of all the others in its level of complexity, which, all else being equal, is simply 1/<number of hypotheses in complexity level>
posterior probability = probability that the hypothesis, if true, will lead to the observed data. |
|
|
|
|
#7 |
|
Philosopher
Join Date: Jun 2005
Location: Hyperion
Posts: 6,668
|
|
|
|
|
|
#8 |
|
Philosopher
Join Date: Jun 2005
Location: Hyperion
Posts: 6,668
|
It seems they are equivalent. By definition, an assumption doesn't influence the hypothesis in a way that can be tested; otherwise it is not an assumption. Thus assumptions simply increase the complexity of a hypothesis.
Thank you for bringing up the question though, it made me think for a minute or two! |
|
|
|
|
#9 |
|
Kowalski
Join Date: Aug 2001
Location: gone
Posts: 9,286
|
I was thinking the same thing. The way I've always known it (and taught it) was that Occam's Razor gives you an efficient starting point for evaluating hypotheses to gain confidence in their being productive. Given two hypotheses, the one which can account for the most observations with the fewest assumptions has a higher chance of not encountering an observation which will falsify it. An assumption has a higher chance of being untrue than an observation, which means those hypotheses accounting for the most observations without needing assumptions to make them true will be more likely to be useful.
Then again, I don't think this 'proof' is incompatible with this view, ultimately. I just think it complicates it. Athon |
|
|
|
|
#10 |
|
Graduate Poster
Join Date: Nov 2007
Posts: 1,241
|
|
|
|
|
|
#11 |
|
Penultimate Amazing
Join Date: Jan 2005
Posts: 10,236
|
There would be some number of possible hypotheses that fit the data equally well (i.e. they are all equally ridiculous or equally non-ridiculous) at each level of complexity. This would include hypotheses we have thought of and those we haven't. Each level of complexity contains at least the same number of hypotheses as the one below it.
I haven't thought this through thoroughly, but it seems that, without any prior constraints on the level of complexity that the 'true' hypothesis will have, it will be easier to find the 'true' hypothesis at the lowest level of complexity only if the 'true' hypothesis actually is at the lowest level of complexity. However, without prior constraints, it is more likely that one of the higher levels of complexity contains the 'true' hypothesis, just because the higher levels contain the greatest proportion of all hypotheses. I can't tell from the description in the OP how these two competing influences are reconciled to come up with the answer they give. Fortunately, I think the point is moot, though. Hypotheses seem to compete on the issue of explanatory power, not number of assumptions. Linda |
|
|
|
|
|
|
#12 |
|
Sarcastic Conqueror of Notions
Join Date: Mar 2004
Location: A floating island above the clouds
Posts: 23,835
|
|
|
__________________
"Great innovations should not be forced [by way of] slender majorities." - Thomas Jefferson The government should nationalize it! Socialized, single-payer video game development and sales now! More, cheaper, better games, right? Right? |
|
|
|
|
|
#13 |
|
Muse
Join Date: Apr 2005
Posts: 818
|
|
|
|
|
|
#14 |
|
Philosopher
Join Date: Jun 2005
Location: Hyperion
Posts: 6,668
|
I think the structure of the hypothesis space puts some prior constraints into effect that negate this.
For instance, it seems to me that each complexity level is a subset of all the ones above it. As an example, a quadratic is simply a 3rd-degree polynomial with the leading coefficient set to zero, so an equivalent hypothesis could be put in both the 2nd-degree and 3rd-degree complexity levels. |
|
|
|
|
#15 |
|
Banned
Join Date: Mar 2007
Posts: 698
|
This is nonsense.
Quote:
Quote:
So technically, this argument only works if we knew all the data and if we had an infinite amount of hypotheses. Genius! I am totally convinced. Since I can count to infinity, I will now find the last digit of pi. It is 7. Also, can you define the concepts simple and complex? It is impossible to argue something when you use ill-defined concepts. Also, if all the hypotheses matched the data perfectly, wouldn't they be the same? As there would be no real difference, there would only be superficial differences. |
|
|
|
|
#16 |
|
Philosopher
Join Date: Jan 2006
Location: Vancouver BC Canada
Posts: 5,966
|
I agree. I think it's just reiterating the same thing in new language. The point of Ockham's Razor is to allocate resources, not answer actual questions.
Ockham used the expression 'entities,' as in "entities should not be multiplied unnecessarily." Calling it 'more complex' instead of 'more entities' is not really adding anything. It's really just a rule of thumb that enforces the somewhat conservative nature of science as a social phenomenon. It's a generalization of uniformitarianism and an application of induction. |
|
__________________
"Sometimes it's better to light a flamethrower than curse the darkness." - Terry Pratchett |
|
|
|
|
|
#17 |
|
Muse
Join Date: Apr 2005
Posts: 818
|
You should be more careful about what you say before you start flinging around insults. Your post was so stupid it made me fear for humanity.
Quote:
Quote:
Quote:
Quote:
Similarly, we currently say, based on the data, that the universe started by rapidly expanding and cooling. We don't say "A magical figure who loves us and speaks English and reads our minds and lives in some fairy realm beyond our understanding wished the universe into creation for his own unfathomable reasons. And the creator hates certain people," and we don't say that for the same reason we don't say a giant purple duck quacked the universe into existence. Such a theory might even fit the data, but it makes unnecessary assumptions; it's not as simple as the alternative. Consider heliocentric theories versus geocentric theories. Both actually describe the data. One is far simpler. That is why we accept one and not the other. Then relativity is even simpler than that. It may seem complex to say speed and gravity warp time, but it is simpler than stating "the planets revolve around the sun in accordance with the laws of gravity, except Mercury, which decides for no reason to dance around every now and again," a problem that the bending of light and relativistic forces solved. |
|
|
|
|
#18 |
|
Graduate Poster
Join Date: Mar 2002
Location: San Jose, CA
Posts: 1,008
|
No, it isn't. What you really mean is "This doesn't make any sense to me." Just because you don't understand it does not make it nonsense.
Infinity is a powerful and useful concept. Mathematics is a very precise tool for helping us understand things, including the concept of infinity. Please review this post again after you have passed a course in analytical mathematics and you understand the following: the proof that the repeating decimal 0.999... equals 1; that one infinite set can be larger than another infinite set; and that the sum of an infinite number of terms can have a finite value. |
|
__________________
Infidel by Ayaan Hirsi Ali A powerful and moving story of a strong and courageous woman’s struggle to free herself from a culture that treats women as property. Despite repeated death threats from religious zealots, she campaigns tirelessly for the rights of Muslim women. A tearful, chilling, yet inspiring, tale of personal triumph and dedication to free expression. |
|
|
|
|
|
#19 |
|
Banned
Join Date: Mar 2007
Posts: 698
|
Quote:
Quote:
Quote:
Do the hypotheses have some randomness?
Quote:
Quote:
Again, if you connect the dots in every possible way, it would be superficial. Yes, you would have different patterns, but the patterns would have the same structure, which is the dots. Again, structure gives something substance, not the outside, even if it has lots of lines drawn on its surface.
Quote:
Quote:
Quote:
Again, the big bang made a prediction, which was correct.
Quote:
Quote:
Quote:
Quote:
Again, you can make something sound simple by sneaking past complex ideas. Relativity, well, isn't relativity really QFT? Can you explain that in simple terms and explain the mathematics behind it? I really want to understand the mathematics of relativity. It must be simple by Ockham's Razor.
Quote:
|
|
|
|
|
#20 |
|
Graduate Poster
Join Date: Mar 2002
Location: San Jose, CA
Posts: 1,008
|
Nonsense. Numbers do not occur in nature. But numbers are very useful things. The square root of negative one does not occur in nature, but it is very convenient for modeling some physical processes (ask any electrical engineer). Infinity is a very powerful and useful concept for understanding "real life".
|
|
|
|
|
|
|
#21 |
|
Philosopher
Join Date: Jun 2005
Location: Hyperion
Posts: 6,668
|
Is a straight line more simple than a cubic curve?
You don't, you pick the simplest one you know of, which is kind of the whole point.
A straight line can generate an infinite amount of data, which is the set of "all possible data." A scientist might only have a few data points out of that set, which is "the data."
It can -- the concept is not affected. It is much easier to understand with purely deterministic hypotheses, however.
It is different depending on the problem. Clearly, if the hypothesis space is the set of all polynomials that satisfy the data, then complexity is related to the degree of the polynomial. You can extrapolate this to all kinds of hypothesis spaces; for example, if they are all Bayesian networks, then the number of edges and nodes in the network is the primary factor.
Not if the dots are only a subset of all possible data.
Sure it does -- all of evolution is nothing more than a mechanism put in place and guided constantly by God. Prove me wrong.
To non-thinking entities, yes, what you say is true. To humans, however, who want to make predictions about the future, the difference between three dots generated by a straight line and three identical dots generated by a cubic curve is very important.
Seriously, do you have any clue as to what you are babbling about?
That is probably why I said the fact that we can generate infinitely many hypotheses doesn't matter -- as in, it is not an essential part of the argument. |
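The point about three dots from a line versus three identical dots from a cubic can be made concrete with a small sketch. The specific functions here are assumptions chosen for illustration, not anything from the thread: `line` is y = x, and `cubic` adds a term that vanishes at the three observed x values, so both hypotheses match the data exactly and only differ away from it.

```python
def line(x):
    """Hypothetical simple hypothesis: y = x."""
    return x


def cubic(x, c=1):
    """Hypothetical complex hypothesis: y = x + c*x*(x-1)*(x-2).

    The extra term is zero at x = 0, 1, 2, so the cubic passes through
    the same three observed points as the line for any value of c.
    """
    return x + c * x * (x - 1) * (x - 2)


observed_x = [0, 1, 2]

# Both hypotheses reproduce the observed data exactly.
agree_on_data = all(line(x) == cubic(x) for x in observed_x)

# They disagree as soon as we predict an unobserved point.
prediction_gap = cubic(3) - line(3)

print("agree on observed data:", agree_on_data)
print("difference in prediction at x = 3:", prediction_gap)
```

Since `c` is free, there is a whole family of cubics through the same dots, each making a different prediction at x = 3.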
|
|
|
|
#22 |
|
Graduate Poster
Join Date: Apr 2004
Location: Yorkshire
Posts: 1,196
|
rocketdodger, this 'proof' is very interesting, and I'd say there's something in it, but as described by you it doesn't work - the handling of probability is wrong.
Scrub this paragraph – it's meaningless. There is no prior vs posterior probability, as we are not performing any test or operation that changes the probability. The prior probability (i.e. prior to some hypothetical test that could actually distinguish the single correct hypothesis) is what we want to determine.
Originally Posted by rocketdodger
Let's simplify. We divide all possible hypotheses (i.e. ones that fit the data) into groups - the rationale doesn't matter, but they must have differing numbers of members. The proof boils down to saying that we should choose from the smallest group. But why? The interesting thing is to look at the hidden assumptions.
Now, if we simply assume each hypothesis is equally probable then the proof fails, because the categorisation becomes irrelevant (obviously, we can't assume that lower-complexity hypotheses are a priori more probable by Ockham's razor, as that's what we're trying to prove). My initial thought was that the proof would work, though, if for some reason it gets harder to create each hypothesis as group size increases (i.e. each potential one has a greater chance of being missed). You state that the larger groups are higher complexity than the smaller ones, so there does seem some reason to expect that these will also be the groups where hypotheses are missed. But on second thoughts, my argument seems wrong. It would still be the case that each discovered hypothesis had an equal probability. Was there anything to suggest the authors had this explanation in mind?
On the other hand, we could assume that the probability is the same for each group. In this case, hypotheses in smaller groups obviously have a higher probability of being the correct one. But is it a reasonable assumption? At the very least, complexity would have to be a meaningful grouping parameter for independent reasons. Also, hypotheses at the same level of complexity would somehow have to 'pool' their probability. It occurs to me that this could have to do with information content and redundancy - perhaps they are more likely to contain the same incorrect information as each other. Again, these requirements seem not implausible, but would need to be proved.
Note that the two possible explanations are not the same. The proof should make clear which one it's using.
Hmm, needs more thinking about. |
|
__________________
I believe that economic advances merely provide the opportunity for a step forward which, as yet, hasn't happened. All we have done is to advance to a point at which we could make a real improvement in human life, but we shan't do it without the recognition that common decency is necessary. George Orwell |
|
|
|
|
|
#23 |
|
Banned
Join Date: Mar 2007
Posts: 698
|
|
|
|
|
|
#24 |
|
Muse
Join Date: Apr 2005
Posts: 818
|
The truth of a hypothesis can be greater or lesser, relative to other hypotheses, yes.
|
|
|
|
|
#25 |
|
The Infinitely Prolonged
Join Date: Feb 2006
Location: Westchester County, NY (when not in space)
Posts: 13,499
|
Strictly speaking, Occam's Razor has nothing to do with simplicity.
As applied to science, Occam's Razor is an economy of assumptions. It is NOT that the simplest answer is most likely right, it is which answer makes the fewest and most trivial of a priori assumptions, given the data we have available thus far. When godo opens up his Quantum Mechanics book, it is not simple. But, the information it contains fits what we have been able to determine about QM so far, without extraneous assumptions (that is, assuming it is a properly scientific book, and not one of those woo-woo ones). |
|
__________________
WARNING: Phrases in this post may sound meaner than they were intended to be. SkeptiCamp NYC: http://www.skepticampnyc.org/ An open conference on science and skepticism, where you could be a presenter! By the way, my first name is NOT Bowerick!!!! |
|
|
|
|
|
#26 |
|
Graduate Poster
Join Date: Jul 2005
Posts: 1,374
|
Good book. Written, IIRC, by my AI prof at Berkeley. One of the better classes I took.
Quote:
If you mean 'factual claim about the world' (eg "I have a brown-eyed wife"), then this is correct. If you mean 'hypothesis about how things work' (eg theism, gravity, etc), then it's inapplicable.
Quote:
Because it's <=, not <, and you don't get to escape that by bringing in another (IMO, invalid) argument about posterior probability. In plain language, this just means that less complex claims MIGHT be more likely. It's going to be easier to test, surely - fewer variables = fewer things to control for, fewer trials needed, etc - but not more likely to be true.
P.S. If you've read the entirety of the book, I presume* you realize that 'complexity' is only loosely definable, and mostly only in rather limited domains. As applied to theology, defining 'complexity' objectively is rather difficult, if possible at all.
* Not certain that this discussion is actually in the book rather than limited to the extra material he presented in class...
P.P.S. You do have the terms significantly wrong.
Prior probability = P(X)
Posterior probability = P(X|Y), where |Y means "given that Y is true".
E.g. the Monty Hall problem. Doors A, B, C; one has a prize behind it. Prior probability: P(A)=P(B)=P(C)=1/3. We choose A. Host opens door C, showing it has no prize. Let's call this X for simplicity*. Now we have posterior probability: P(A|X)=1/3, P(B|X)=2/3, P(C|X)=0. (Because the host necessarily would not have opened door A, X is only informative about B and C.)
*For a more elaborate explanation that doesn't collapse X, see http://en.wikipedia.org/wiki/Monty_H...yes.27_theorem.
P.P.P.S. Which is more complex: gravity or intelligent falling? |
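The Monty Hall numbers in the P.P.S. can be checked directly with Bayes' rule. This is a minimal sketch, assuming the standard version of the puzzle in which the host never opens the chosen door or the prize door, and picks at random when he has two options; the variable names are just for illustration.

```python
from fractions import Fraction

# Prior: the prize is equally likely to be behind each door.
prior = {door: Fraction(1, 3) for door in "ABC"}

# Likelihood of the evidence X = "host opens door C" after we pick A,
# given where the prize actually is.
# Assumption: the host chooses at random between B and C if the prize is at A.
likelihood = {
    "A": Fraction(1, 2),  # host could open B or C
    "B": Fraction(1),     # host is forced to open C
    "C": Fraction(0),     # host never reveals the prize
}

# Bayes' rule: P(door | X) = P(X | door) * P(door) / P(X)
evidence = sum(prior[d] * likelihood[d] for d in prior)
posterior = {d: prior[d] * likelihood[d] / evidence for d in prior}

for door in "ABC":
    print(f"P({door}) = {prior[door]}, P({door}|X) = {posterior[door]}")
```

Here the prior is P(door) before the host acts and the posterior is P(door|X) after conditioning on what he did, which is the distinction the post is drawing.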
|
__________________
Friendly advice: if in an argument with me, don't make tons of fallacies or you'll just embarass yourself when I call you on 'em. Kthx! See my Youtube videos for more good argument, as well as bits about my various other interests, like ASL, cogsci, neurosci, meditation, cooking, etc. |
|
|
|
|
|
#29 |
|
Penultimate Amazing
Join Date: Jan 2005
Posts: 10,236
|
|
|
|
|
|
|
|
#30 |
|
Graduate Poster
Join Date: Jul 2005
Posts: 1,374
|
|
|
|
|
|
|
|
#31 |
|
Philosopher
Join Date: Jun 2005
Location: Hyperion
Posts: 6,668
|
Thanks for all the replies, I see the many errors I made when I wrote the O.P.
To avoid any misunderstanding, it turns out the book was talking about actual A.I. and that it mentioned something like my argument "naturally leading into" Ockham's razor -- or in other words, a good way to emulate it in the A.I. world. I turned that into the mess you see (which isn't bad for a first try, I tell myself). However, it does give some insight into the matter, or at least reinforces the question "why." I am convinced the real reason why the razor is rational must involve something similar. Using a better definition now, given by (among others) wowbagger:
Quote:
Clearly, assumptions have a higher chance of being wrong than observations. However, making no assumptions is equivalent to simply zeroing out all possible assumptions. So one has to show that zeroing out assumptions is better than giving them some value. Superficially it seems like it wouldn't be, because after all choosing zero has just as much probability of being the wrong choice as choosing anything else. Thus at the heart of the matter I still think it has something to do with complexity levels and the probability of a hypothesis being correct. I will think about this today. |
|
|
|
|
#32 |
|
Graduate Poster
Join Date: Jul 2005
Posts: 1,374
|
Ockham's Razor is a rational measure of the utility of a theory. Simpler theories are easier to investigate, make stronger predictions (because fewer variables are unknown compared to previous data), and are simpler to calculate and model.
However, they are not more likely. This is one of the reasons, IMO, that complexity is so hard to define: it simply has no truly valid definition on the theoretical end (which, if it existed, might lead to a proof like the one you tried, of whether more complex things are a priori more likely to be false). But it DOES have easily and rigorously definable measures when we're talking about utility - e.g. how many lines of code; how many hours to program; how long to explain through interpretive dance; how many new words needed to explain it. These are all perfectly valid as utility functions on their domain, which is exactly what OR is good for.
"Fewest assumptions" is, unfortunately, not a sufficiently sound definition. Almost no matter what you are talking about (outside of fundamental set theory, perhaps?), if it's a real-world question, you make a countably infinite number of assumptions. And not only that, it's not really possible to delineate what constitutes "one" assumption. I could make a reductio ad absurdum on this using intelligent falling vs gravity, so that IF had fewer assumptions than gravity. Hell, IF only postulates one new 'entity' (sentient objects), whereas gravity postulates lots (atoms, weak magnetic force, quarks, etc etc etc).
I contend that you CANNOT in fact define complexity in a way that is objective, applicable to real-world problems, completely sound and determined, and not merely a covert measure of utility. Without being able to do so, all of the rest of your argument falls apart, because you will not be able to show which proposition involves A+B and which just A, so you won't know which side of the <= inequality anything is on.
I suggest that you try looking at OR again as a measure of pragmatic scientific utility, rather than of truth value. I think you'll find it far more defensible and useful that way. |
|
|
|
|
|
|
#33 |
|
Sarcastic Conqueror of Notions
Join Date: Mar 2004
Location: A floating island above the clouds
Posts: 23,835
|
IF's "sentient objects" sounds like one thing, but what is that thing itself? It, in turn, postulates atoms, energies, etc., (or something even more exotic, some kind of "soul stuff") that make up the "sentient object".
Perhaps the point would be a little more obvious if, instead of "IF" for a "sentient object", you instead postulated "gigantic, invisible robotic Rube Goldberg thingie"-falling. A giant, mechanical device makes the complexity more obvious. |
|
|
|
|
|
|
#34 |
|
Philosopher
Join Date: Jun 2005
Location: Hyperion
Posts: 6,668
|
Well that is what I was getting at, I hope I didn't give someone the wrong idea that I am arguing that the simplest hypothesis is the true hypothesis.
Still, aside from the pragmatic concerns of working with a less complex hypothesis potentially being easier and faster, isn't there still something to be said for simplicity? Let's concentrate on the simplest example, to see if I can make headway.
Suppose we have a bunch of points and we are trying to fit an equation to model their distribution. Furthermore, suppose that in reality the points were sampled from a straight line using no noise. Now, the question is, why should we use the simple straight-line hypothesis, rather than any of the high-degree polynomial hypotheses that also fit the points perfectly?
It seems to me that the answer must lie in the probabilistic relationship between possible hypotheses on different complexity levels (yes, I know I shouldn't be using "complexity" but I can't help it!). In this case, though I have yet to prove it mathematically, it seems intuitive that the chances of many samples from a non-linear curve ending up completely collinear are very slim, while the chances of many samples from a line ending up collinear are very high -- thus we should go with the line as our model. However, this idea seems to rely on the presupposition that the line was the correct choice all along (because we feel we are intuitively looking at linearity), and I don't yet know how to resolve that problem. |
|
|
|
|
#35 |
|
Graduate Poster
Join Date: Jul 2005
Posts: 1,374
|
rocket - I didn't think you're saying it is the true one, just that it's more likely. And even that, I think, is not true.
Simplicity is certainly a valuable thing, in a utilitarian sense (which includes aesthetics). Don't get me wrong, I <3 simplicity and beauty in theories. Simple theories tend to be more powerful, give rise to more complex things, etc. Think e.g. Conway's Life or E=mc². They're wonderful, and I think that they actually are literally better for us humans to work with for neurological reasons.
I don't, honestly, understand your last paragraph. However, goodness of fit and the problem of overfitting vs precision of prediction is certainly an important one. Again though, I would say that this is utilitarian - and that you are conflating things a bit.
Are you familiar, perhaps, with Bayesian networks? In addition to my AI class using AIMA (http://aima.cs.berkeley.edu/ btw), I took one about neural theory of language (http://www.icsi.berkeley.edu/NTL/), which I very highly recommend. It helped me think about things like this, and covered Bayes nets fairly well - including quite specifically this topic of goodness of fit.
The ubersummarized version is that:
a) you need to make a fit based on one sample and test it on another
b) the more nodes & levels in a Bayes net, the better the predictive ability but the harder it is for us to understand wtf it's doing
c) there is a diminishing return on node addition
d) tweaking the constants involved is a black art, and not to be attempted by the faint of heart (or possibly at all, because it's possible to set up a meta-Bayes net to find out what the best constants are)
e) there can be a problem with overfitting, which is solved by doing (a) repeatedly for multiple test sets until you get best average fit... but even so it's a very hard problem to do perfectly. Fortunately, "good enough" isn't as hard.
(b) above directly contradicts your hypothesis that more complexity = worse predictiveness. However, it does support my position, that complexity is simply worse for us poor humans' ability to understand and test things.
Simplicity is beautiful, grokkable, easy to maintain. Complex approaches (like backpropagating Bayes nets) are not... but they do work disturbingly well anyway, and there is a certain holistic beauty in that too. One other related thing is what in cogsci is called 'chunking'. I'd suggest you look it up (e.g. "the magic number 7±2"), as it's fairly relevant. |
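Points (a) and (e) can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the Bayes-net setup from the class: it swaps in two toy curve-fitting models, `fit_line` (a least-squares line) and `fit_memorizer` (a 1-nearest-neighbour lookup that reproduces its training data exactly). All function names and the toy data are hypothetical. `cross_validate` fits on one part of the data, scores on the held-out part, and averages over k splits.

```python
from statistics import fmean


def fit_line(train):
    """Least-squares fit of y = a + b*x; returns a predictor function."""
    mean_x = fmean(x for x, _ in train)
    mean_y = fmean(y for _, y in train)
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in train) / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_memorizer(train):
    """1-nearest-neighbour lookup: zero error on the data it was fit on."""
    points = list(train)
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]


def mse(model, data):
    """Mean squared error of a predictor on (x, y) pairs."""
    return fmean((model(x) - y) ** 2 for x, y in data)


def cross_validate(fit, data, k=5):
    """Fit on k-1 folds, score on the held-out fold, average over all k."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        scores.append(mse(fit(train), test))
    return fmean(scores)


# Hypothetical toy data: exact samples of y = 2x + 1 at x = 0..19.
data = [(x, 2 * x + 1) for x in range(20)]

for name, fit in [("line", fit_line), ("memorizer", fit_memorizer)]:
    print(name, "train MSE:", mse(fit(data), data),
          "held-out MSE:", cross_validate(fit, data))
```

The memorizer fits its own training sample perfectly but still misses on held-out points, which is the overfitting problem in miniature. The training error alone would not have shown that.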
|
__________________
Friendly advice: if in an argument with me, don't make tons of fallacies or you'll just embarass yourself when I call you on 'em. Kthx! See my Youtube videos for more good argument, as well as bits about my various other interests, like ASL, cogsci, neurosci, meditation, cooking, etc. |
|
|
|
|
|
#36 |
|
Philosopher
Join Date: Jun 2005
Location: Hyperion
Posts: 6,668
|
I am saying, given a set of data points that are collinear, why should we decide to go with the hypothesis that they were sampled from a line rather than some extremely complex curve?
The answer, it seems to me, is because the probability of sampling collinear points from a line is high -- 1.0, in fact -- whereas the probability of sampling collinear points from a complex curve is much lower. This, in turn, suggests that the probability of the points actually having been generated by a linear function is much higher than the probability of them having been generated by a more complex, non-linear one.
I think this is just another way of measuring predictive power, right? If a hypothesis has a higher chance of generating new samples that will match new observations, then it has more predictive power, and we should choose it over those with less. In the case where we can't make new observations, and must decide on the basis of existing data, some kind of analysis like the above should work.
With respect to Bayesian networks, I don't think they contradict what I said (or rather, meant) about simplicity -- maybe they just make it a moot point? Before I thought about it, I meant to say "among hypotheses of equal predictive power..." but now I realize that the mathematics I have been using to argue with actually concerns predictive power. If two hypotheses have equal predictive power, then they should be equivalent as far as we can tell, and thus utilitarian concerns should be the only factor in deciding between them. |
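The first half of that can be checked on a toy case. The sketch below is only an illustration under assumptions that are not in the thread: x-values are drawn uniformly from a small integer grid, three points are sampled at a time, and the two generator functions (`line`, `cubic`) are made up. It computes only the chance of collinear data given a generator, which is a likelihood. Going from there to the probability of the generator itself would additionally need a prior over generators.

```python
from fractions import Fraction
from itertools import combinations


def collinear(p, q, r):
    """True if three points lie on one straight line (exact integer test)."""
    return (q[0] - p[0]) * (r[1] - p[1]) == (r[0] - p[0]) * (q[1] - p[1])


def collinear_fraction(f, xs):
    """Fraction of all 3-point samples of f (x taken from xs) that are collinear."""
    triples = list(combinations(xs, 3))
    hits = sum(collinear(*[(x, f(x)) for x in t]) for t in triples)
    return Fraction(hits, len(triples))


xs = range(-10, 11)          # assumed sampling grid: integers -10..10
line = lambda x: 2 * x + 1   # hypothetical "simple" generator
cubic = lambda x: x ** 3     # hypothetical "complex" generator

print("P(collinear | line)  =", collinear_fraction(line, xs))
print("P(collinear | cubic) =", collinear_fraction(cubic, xs))
```

Every triple from the line is collinear, while only a small share of the triples from the cubic are, and that share depends entirely on the assumed grid and sample size.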
|
|
|
|
#37 |
|
Graduate Poster
Join Date: Jul 2005
Posts: 1,374
|
I think that your collinearity probability is improper, because there are an infinite number of possible generative functions for any given set of points. Surely you're aware of that? Claiming it is "linear" is purely a matter of our own perception/analysis, unless you're specifying an infinite number of points - which you're not.
Thus there are an infinite number of both linear and nonlinear functions to match any set of "linear" points... so determining which generated it is extremely difficult if at all possible. This is even worse for something that's non-linear.
As a related thought puzzle: Suppose you have a black box with a button on it. You press the button one million times and carefully observe that nothing happens. How do you know whether:
a) the box does nothing;
b) the box has been doing something that you cannot detect; or
c) the box does something only on the million-and-tenth press?
The answer is simply, you don't have any way to know. But for the sake of *utility*, we assume that it is (a).
Predictive power, while sidestepping the question of whether you're right about the probability there, is a utility function again - not one of the truth value of the hypothesis. I think your move to predictive power has completely abandoned your original point, namely that complexity correlates to probability. If that's your intent, then you're now just making a utilitarian argument, and I think we've come to agreement about what OR is for.
|
|
|
|
|
|
|
#38 |
|
Philosopher
Join Date: Jun 2005
Location: Hyperion
Posts: 6,668
|
Yes, I know there is an infinite number of generative functions, and I know the linearity is a matter of my own perception (that's what I was hinting at in a previous post regarding the problem of presupposing a hypothesis).
Still, I think my probability argument might hold some water, but I need to formulate it properly -- I fully understand all of the problems you and everyone else have brought up. I will work on it this weekend and try to polish it up.
Well, I think I am now trying to argue that complexity correlates to probability, which correlates to predictive power. We do agree about OR, though (I think). |
|
|
|
|
#39 |
|
Philosopher
Join Date: Jun 2005
Location: Hyperion
Posts: 6,668
|
Ok, I think I have it!
Ignore the fact that "complexity" is extremely difficult to define -- just think of it for the purposes of this argument as "measure of assumption level."
One can simply ask "what are the chances that a hypothesis on a given level generated the data that can also be modeled by hypotheses on other complexity levels?" Mathematically, the chances of a lower complexity hypothesis generating data that can be modeled by higher complexity hypotheses are very high. In contrast, the chances of the reverse are typically very low.
For example, take a bunch of collinear points. There is both a linear hypothesis and a high-degree polynomial one. The chances of the high-degree hypothesis generating points that can be fit by any linear function are slim. On the other hand, the chances of a linear hypothesis generating points that can be fit by any high-degree polynomial are very high.
Furthermore, we know that hypotheses on many complexity levels model the data -- that's why we are trying to choose among them to begin with. Thus, we should choose the hypothesis that has the highest probability of generating such data. For the reason above, it is always the lowest complexity hypothesis.
P.S. I haven't figured out if this has anything to do with the fact that higher complexity levels have more possible hypotheses, i.e. the argument in the OP (which I now understand to be incorrect). |
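Written out, the quantity being compared in this post is the likelihood of the data under each hypothesis. A hedged sketch in standard Bayesian notation, where the symbols are labels introduced here and not taken from the thread ($D$ for the observed collinear data, $H_{\text{line}}$ and $H_{\text{poly}}$ for the two hypotheses):

```latex
\[
\frac{P(H_{\text{line}} \mid D)}{P(H_{\text{poly}} \mid D)}
  = \frac{P(D \mid H_{\text{line}})}{P(D \mid H_{\text{poly}})}
    \cdot \frac{P(H_{\text{line}})}{P(H_{\text{poly}})}
\]
```

The argument above concerns the first factor on the right, which is large when $P(D \mid H_{\text{line}})$ is close to 1 and $P(D \mid H_{\text{poly}})$ is small. Choosing on that factor alone treats the prior ratio as roughly 1, which is a separate assumption.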
|
|
|
|
#40 |
|
Graduate Poster
Join Date: Jul 2005
Posts: 1,374
|
I think you're making some pretty complicated errors there.
I'd prefer not to get into actual measure theory since it's a bit too complex for my taste. However, in plainer English there are these issues:
1. You assume that the measure of 'complexity' is some sort of polynomial equation over a set of real-number datapoints. Neither of these is the case for theories in general, such as what this is all about: theisms.
2. You confuse anterior and posterior probability, i.e. the probability that a) given a certain data set, the theories that produce it will turn out to be of a given "complexity"; and b) given a certain theory, the data will turn out as they happen to actually do.
3. You confuse what I would call "claims" (i.e. statements of fact about *what* is true) and "theories" (i.e. explanatory frameworks that try to say *why* and *how* things work). (I believe you can reasonably discuss the probability of a claim, but not the probability of a theory - only how well a theory matches available data.)
4. You continue to assume, despite your opening paragraph trying to brush it away, that there is some single, objective measure of "complexity" and that it is well-ordered - i.e. that some item A and some other item B are necessarily either A>B, A=B, or A<B on it. However, as I pointed out above, this is just not the case; there are many measures of complexity, and even within one it may be possible to rank two things in either way.
All of these flaws are fatal to your argument, and since some of them are also fundamental to it, I don't think this is something you could fix. |
|
|
|
|