Stats geek help: Reject Inference


Baron Samedi
4th May 2007, 07:36 PM
This is a general question more than anything else. If you guys can help out with your experience and thoughts, that would be super. I'm putting together a presentation next week about Reject Inference techniques in predictive modeling. I was just wondering if anyone has heard of this, and if so, what your general experiences have been. Do you think it's useful, or is it a complete waste of time? Or even if you are a stats geek and have no clue what Reject Inference is, that would be cool too. I would just like an idea of whether this project is worth it, or if I'm completely wasting my time on this one.

Cheers!

Jeff Corey
4th May 2007, 07:55 PM
Does this have to do with rejecting people's credit applications?

Baron Samedi
4th May 2007, 08:05 PM
Yuppers, that's it exactly. Normally, you would model Y on the approves only. With Reject Inference, you model Y on the approves, score the declines with that model to get Y(hat), and then build your final model on both Y(approves) and Y(hat)(declines). It's supposed to make your final model so much better.
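
Those three steps can be sketched in code. This is only a minimal sketch, assuming a single predictor (number of previous bad trades), a hand-rolled logistic regression, and "fuzzy augmentation" as the way to fold the declines back in: each decline is entered twice, once as a bad weighted by its predicted bad probability and once as a good weighted by the remainder. The function names and the toy data are hypothetical, not anything taken from a vendor's scorecard process.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, ws=None, iters=25):
    """Weighted logistic regression on one feature, fitted by Newton's method.
    Returns (intercept, slope)."""
    ws = ws or [1.0] * len(xs)
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, w in zip(xs, ys, ws):
            p = sigmoid(b0 + b1 * x)
            r = w * (y - p)          # weighted residual (gradient term)
            v = w * p * (1.0 - p)    # weighted variance (Hessian term)
            g0 += r
            g1 += r * x
            h00 += v
            h01 += v * x
            h11 += v * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical toy data: x = previous bad trades, y = 1 if the account went bad.
approved_x = [0] * 10 + [1] * 10 + [2] * 5
approved_y = [1] + [0] * 9 + [1] * 3 + [0] * 7 + [1] * 2 + [0] * 3
declined_x = [3, 3, 4, 4, 5]         # no outcomes observed for these

# Step 1: model Y on the approves only.
b0, b1 = fit_logistic(approved_x, approved_y)

# Step 2: score the declines to get Y(hat), an inferred bad probability.
declined_p = [sigmoid(b0 + b1 * x) for x in declined_x]

# Step 3: refit on approves plus declines (fuzzy augmentation).
all_x = approved_x + declined_x + declined_x
all_y = approved_y + [1] * len(declined_x) + [0] * len(declined_x)
all_w = [1.0] * len(approved_x) + declined_p + [1.0 - p for p in declined_p]
final_b0, final_b1 = fit_logistic(all_x, all_y, all_w)
```

In this particular variant, when the final model has the same form and the same features as the first one, final_b0 and final_b1 come out the same as b0 and b1: the inferred labels only echo what the first model already said. A variant that changes the answer has to bring in something extra, such as an assumed higher bad rate among the declines.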

Jeff Corey
4th May 2007, 08:25 PM
It sounds like a signal detection theory problem, but in different terms. The person applies for a mortgage loan. What is the probability that we give this person a loan and then they default? That's based on what data we have, and on whether we give a **** because the Fed is going to bail us out if they all default.
Edit...
Wait, you're in Canada, different rules.

Mercutio
4th May 2007, 08:39 PM
I am good with signal detection, have taught stats... and have no clue what you are talking about. Sorry--I will keep watching, to see if there is anything I can comment on.

Jeff Corey
4th May 2007, 08:56 PM
Merc,
Loan application: approve or no? Only three possible real-world outcomes. Approved and paid off: no problem. Approved and not paid off: problem. Not approved: no problem.
But the fourth, not approved when they really would have been no problem, is a problem.
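
The mapping onto signal detection terms can be written out as a small sketch. It assumes one particular convention, which is a choice and not something fixed by the thread: "will default" is the signal, and declining is the lender saying "signal present". The function names are hypothetical.

```python
def sdt_cell(declined, would_default):
    """Signal detection label, treating 'will default' as the signal and
    'decline' as the lender saying 'signal present'."""
    if declined:
        return "hit" if would_default else "false alarm"
    return "miss" if would_default else "correct rejection"

def what_lender_sees(declined, would_default):
    """Outcomes are only ever observed for approved applicants."""
    if declined:
        return "declined, outcome unknown"
    return "approved, not paid off" if would_default else "approved, paid off"

for declined in (False, True):
    for would_default in (False, True):
        print(sdt_cell(declined, would_default), "->",
              what_lender_sees(declined, would_default))
```

Under this convention the "fourth" outcome is the false alarm, and the lender cannot tell it apart from a hit, because both show up only as a decline.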

T'ai Chi
4th May 2007, 09:10 PM
Any project on any stats topic, especially applied, is worth it. :)

I've heard of reject inference, and believe the gains from it are real, but minimal. But any minimal gain when talking about dollars is worth researching.

The first thing that came to my mind to explore this type of problem is probably some type of logistic regression.

Mercutio
4th May 2007, 09:13 PM
Jeff Corey wrote:
"Merc,
Loan application: approve or no? Only three possible real-world outcomes. Approved and paid off: no problem. Approved and not paid off: problem. Not approved: no problem.
But the fourth, not approved when they really would have been no problem, is a problem."

Thanks. Thus the SD application. Gotcha.

Baron Samedi
5th May 2007, 06:52 AM
Jeff Corey wrote:
"Merc,
Loan application: approve or no? Only three possible real-world outcomes. Approved and paid off: no problem. Approved and not paid off: problem. Not approved: no problem.
But the fourth, not approved when they really would have been no problem, is a problem."

Exactly. By building a new model, you're going to start declining people whom you would approve today. No problem. However, you're also going to start approving people you would have declined today. Problem, since we have no outcome information on these people. Or as someone wisely said, "We have known knowns... we have known unknowns.... and we have unknown unknowns..."

We can just build a logistic-type model on the people we did approve, but there are two issues:
1) you're now extrapolating well outside your known sample space, which is a no-no
2) applying the model to the declines assumes that, all else being equal, the declines behave like the approves. We know this isn't true, because we declined them in the first place, so we have sample bias out the wazoo.
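
The first issue can at least be checked crudely. The sketch below assumes a single numeric predictor and simply flags declined applicants whose value lies outside the range seen among the approves; the function name and the toy values are hypothetical, and a one-variable range check like this says nothing about combinations of variables.

```python
def outside_training_range(approved_values, declined_values):
    """Return the declined values that fall outside the range the model
    was trained on (the approves), i.e. where it would be extrapolating."""
    lo, hi = min(approved_values), max(approved_values)
    return [v for v in declined_values if v < lo or v > hi]

# Hypothetical previous-bad-trade counts.
approved_bad_trades = [0, 0, 1, 1, 2]
declined_bad_trades = [1, 3, 4, 5]

extrapolated = outside_training_range(approved_bad_trades, declined_bad_trades)
print(extrapolated)
```

The second issue, that declines differ from approves even at the same predictor values, cannot be detected this way, since it concerns outcomes that were never observed.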

Mortgages are easy. Approve everybody. If they default, you can just repo the house and sell off to your banker friends. There may be an issue if there was a grow-op going on, but these people usually do pay back their mortgages.

Credit cards... ay, there's the rub. There are no federal bail-outs on these guys if they go bad.

The big scorecard vendors all do it. FICO, Experian, Equifax, TransUnion, and SAS all say that you must use reject inference or else you'll end up with a very biased model. They'll do it for you, but of course the price of the scorecard build now doubles.

On the other Hand, academics like David Hand say that the idea to somehow fix the bias is a good one, but that you end up spinning your wheels and it really doesn't add much value. Or, that it will add value in 50% of the cases, and lose value in the other 50%. Besides, there's no actual way to prove that the new reject-inferenced model works better than the vanilla model, since you still don't know the outcome on all of those declines. You can look at the ROC and Gini on the approved people only, but that tells you nothing about the declines. His answer was that the only way to model on the declined region is to actually approve some of them randomly and find out two years from now which method works.
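
For reference, the Gini mentioned here is usually taken as 2 x AUC - 1, where AUC is the area under the ROC curve. A minimal sketch, assuming higher scores mean higher risk and using hypothetical scores for approved accounts only:

```python
def auc(scores, is_bad):
    """Chance that a randomly picked bad account has a higher risk score
    than a randomly picked good account; ties count as half."""
    bads = [s for s, b in zip(scores, is_bad) if b]
    goods = [s for s, b in zip(scores, is_bad) if not b]
    wins = sum(1.0 if sb > sg else 0.5 if sb == sg else 0.0
               for sb in bads for sg in goods)
    return wins / (len(bads) * len(goods))

def gini(scores, is_bad):
    return 2.0 * auc(scores, is_bad) - 1.0

# Hypothetical risk scores and outcomes, approved accounts only.
approved_scores = [0.1, 0.2, 0.3, 0.4, 0.8, 0.9]
approved_is_bad = [0, 0, 1, 0, 1, 1]

print(auc(approved_scores, approved_is_bad), gini(approved_scores, approved_is_bad))
```

Both functions need an outcome for every account passed in, which is exactly why they can only be computed on the approves.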

Sorry. I'm rambling. But it just seems suspicious to me that in this debate (RI or not to RI), the big supporters are the ones who stand to gain money from it, and the ones who have no monetary payout think that it's window dressing.

69dodge
6th May 2007, 01:11 PM
Baron Samedi wrote:
"2) applying the model on the declines assumes that all things held the same, the declines behave like the approves. We know this isn't true, because we declined them in the first place, so we have sample bias out the wazoo."


Why did you decide to decline them in the first place?

If the decision was based on some real data---I don't know what kind of data it would be, but some sort of real data---then you should (continue to) take that same data into account when building any future models. But I don't see how you could magically create useful new information out of thin air.

You're not sure about somebody, but, based on the limited information you have, you think they might be too much of a risk. So you decide to decline them. Now, just because you declined them, you're suddenly more sure about them than you were before? How could anything like that possibly work?

Baron Samedi wrote:
"On the other Hand, academics like David Hand say that [...] the only way to model on the declined region is to actually approve some of them randomly and find out two years from now which method works."


Makes sense to me.

(But I hadn't even heard of reject inference till now. So possibly my opinion adds little new information...)
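
The random-approval idea attributed to David Hand above could look something like the sketch below. The 2% test rate, the seed, and the function name are all assumptions made for illustration; nothing in the thread says what fraction a lender would actually approve at random.

```python
import random

def final_decision(model_approves, rng, test_rate=0.02):
    """Approve whatever the scorecard approves; of the rest, approve a small
    random slice so their outcomes can be observed later."""
    if model_approves:
        return "approve"
    if rng.random() < test_rate:
        return "approve (random test)"
    return "decline"

rng = random.Random(2007)  # fixed seed so the sketch is repeatable
decisions = [final_decision(False, rng) for _ in range(1000)]
test_group = decisions.count("approve (random test)")
print(test_group, "of 1000 would-be declines approved as a test group")
```

Because the test group is picked at random from the would-be declines, its eventual bad rate is an unbiased sample of that region, which is what neither the vanilla model nor the reject-inferenced one can supply.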

Baron Samedi
7th May 2007, 07:36 AM
Well, let me give you an example. Let's take number of bad trades on record on our applicants, and compare the bad rates after a year:

Previous Bad Trades    Bad Rate
0                      5%
1                      7%
2+                     2%

Based upon this pattern, it looks like applicants with 2 or more bad trades are the lowest risk of the bunch, hence they should receive the highest scores of the bunch. This seems absolutely nutty, since if you have 5 previous bad trades, you should be worse than a person with a clean record. The low bad rate shows up because the only people we would ever approve with 5 bad trades would have to have $1M in the bank with us, or be named Bush or Mulroney, or be some other special case. So in any case, someone, somehow, has to magically bump up that 2+ bucket to a reasonable bad rate so that the pattern appears to be monotonically increasing.
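
The effect in that table can be reproduced with a small made-up population. In the sketch below only the 5%, 7% and 2% observed bad rates come from the example; the applicant counts, the big-deposit flag, and the 30% bad rate for declined applicants are hypothetical numbers chosen purely to illustrate the selection effect.

```python
# Hypothetical applicant segments. Only the 5%, 7% and 2% observed bad rates
# come from the example above; counts and the 30% figure are made up.
segments = [
    # (bad_trades, big_deposit, applicants, true_bad_rate, approved)
    ("0",  False, 1000, 0.05, True),
    ("1",  False, 1000, 0.07, True),
    ("2+", False,  980, 0.30, False),  # declined: outcome never observed
    ("2+", True,    20, 0.02, True),   # special cases approved anyway
]

def bad_rate(rows):
    applicants = sum(n for _, _, n, _, _ in rows)
    bads = sum(n * rate for _, _, n, rate, _ in rows)
    return bads / applicants

def rates_by_bucket(rows):
    buckets = {}
    for row in rows:
        buckets.setdefault(row[0], []).append(row)
    return {bucket: bad_rate(group) for bucket, group in buckets.items()}

observed = rates_by_bucket([s for s in segments if s[4]])  # approves only
population = rates_by_bucket(segments)                     # everyone who applied

for bucket in ("0", "1", "2+"):
    print(f"{bucket:>2}  observed {observed[bucket]:.1%}  "
          f"population {population[bucket]:.1%}")
```

In this made-up population the bad rate rises with the number of bad trades, but the approved-only view shows the 2+ bucket as the safest, because the only 2+ applicants who got through were the ones with a big deposit.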

And awesome. You asked the same "stupid" question that I asked before.
69dodge wrote:
"If the decision was based on some real data... then you should (continue to) take that same data into account when building any future models."
I haven't found any reasonable answer to this one yet.

69dodge
7th May 2007, 02:25 PM
The reason it seems nutty is that you're looking only at the number of previous bad trades people had, but ignoring how much money they had in the bank with you and what their last name is. If you thought the latter information was relevant when deciding whether to approve previous applicants, you should think it's relevant when deciding whether to approve future applicants, so you should take it into account, together with other relevant information like bad trades, when making future decisions. That isn't magic; it's merely not throwing away information you have, that you're pretty sure is relevant, just because it may seem less quantitative. (Well, last name isn't quantitative, at least. Money in the bank certainly is.) I can't see why anyone would want to throw it away and then use more or less ad hoc methods to try to make things come out reasonable-looking again.

(Naturally, when I say "last name", I don't mean that literally, as, I'm sure, you didn't either. I mean, who the person is---whatever information about them was used to decide that they should be approved even though they had many bad trades. If it's hard to quantify ... try anyway. It's better than simply throwing it away altogether.)