Thursday, October 30, 2014

Lending Club Data - A Simple Logistic Regression Approach

This post is a continuation of the Lending Club Data Analysis (Linear Regression Approach). I was going to start a new project, but I found a source that uses Lending Club data to teach how to use IPython to develop a simple logistic regression model. I will be using R instead to develop a simple logistic regression model. The first step is to clean and understand the data (Data Exploration).


Let's assume Miss X, who is a computer scientist and a bike enthusiast, earns $6,500 a month and is interested in purchasing a performance bike that costs $15,000. She has a FICO score of 750. She wants to know if she can borrow $15,000 from Lending Club at an interest rate of 10% or less.

In the previous post I had already found significant variables and I will be using those variables to develop a simple logistic regression model:

log(p / (1 - p)) = b0 + b1 * FICO.Mean + b2 * Amount.Requested + b3 * Monthly.Income

where p is the probability that Approval.Indicator is TRUE.

Since the data does not include a "loan approval with interest rate 10% or less" variable, I will create an "approval indicator" variable.

# First add an indicator variable which indicates whether interest rate is <= 10
loanData$Indicator <- loanData$Interest.Rate <= 10
head(loanData)
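
The comparison above only works if Interest.Rate is numeric. A minimal sketch, assuming the raw export stores rates as text such as "8.90%" (an assumption; the previous post may already have converted the column), and using a small made-up data frame `toyLoans` in place of loanData:

```r
# Hypothetical toy data standing in for loanData; the values are made up.
toyLoans <- data.frame(Interest.Rate = c("8.90%", "12.12%", "10.00%"),
                       stringsAsFactors = FALSE)

# Strip the percent sign and convert to numeric before comparing.
toyLoans$Interest.Rate <- as.numeric(sub("%", "", toyLoans$Interest.Rate, fixed = TRUE))

# TRUE when the rate is 10% or less, FALSE otherwise.
toyLoans$Indicator <- toyLoans$Interest.Rate <= 10
toyLoans
```

If the column is already numeric, the conversion step can simply be skipped.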

summary(loanData)



sapply(loanData, sd)

# Fit a logit model using glm
logitModel <- glm(Indicator ~ FICO.Mean + Amount.Requested + Monthly.Income, data = loanData, family = "binomial")
summary(logitModel)
confint(logitModel)



For every one-unit increase in FICO.Mean, the log odds of loan approval with interest rate 10% or less increase by 0.07224, holding the other variables constant.

par(mfrow = c(2, 2))
plot(logitModel)



# Odds ratio and 95% CI
exp(cbind(OR = coef(logitModel), confint(logitModel)))


For a one-unit increase in FICO.Mean, the odds of loan approval with interest rate 10% or less increase by a factor of 1.0749 (about 7.5%), holding the other variables constant.
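
A one-point change in FICO score is small, so it can help to rescale the odds ratio. A short sketch using only the FICO.Mean coefficient reported above (the names `bFico`, `orOne` and `orTen` are just illustrative):

```r
# Coefficient for FICO.Mean as reported in the model summary above.
bFico <- 0.07224

# Odds ratio for a 1-point increase in FICO.Mean.
orOne <- exp(bFico)

# Odds ratio for a 10-point increase: effects add on the log-odds scale,
# so the odds ratios multiply.
orTen <- exp(10 * bFico)

round(c(one_point = orOne, ten_points = orTen), 4)
```

Under this model, a 10-point increase in FICO.Mean roughly doubles the odds, with the other variables held fixed.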

boxplot(predict(logitModel, type = "response") ~ loanData$Indicator, col = "blue")


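# Choosing a cutoff (re-substitution: errors are counted on the same data used to fit the model)
probs <- predict(logitModel, type = "response")  # fitted probabilities, computed once
temp <- seq(0, 1, length.out = 20)               # candidate cutoffs
err <- rep(NA, 20)

for (i in seq_along(temp)) {
        # number of loans misclassified at this cutoff
        err[i] <- sum((probs > temp[i]) != loanData$Indicator)
}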

plot(temp, err, pch = 19, col = "red", xlab = "Cutoff", ylab = "Error")


The error is at its minimum when the cutoff is approximately 0.4, thus:

# Simple cutoff: Prob > 0.40 means loan approved, otherwise loan not approved.
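
Re-substitution error tends to be optimistic because the same rows are used to fit the model and to pick the cutoff. One alternative, which is not what this post does, is to choose the cutoff on held-out rows. A minimal sketch on simulated data (the data frame `simLoans`, its coefficients and the 70/30 split are all assumptions for illustration):

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Simulated stand-in for loanData; the coefficients are made up.
n <- 1000
simLoans <- data.frame(FICO.Mean = rnorm(n, mean = 700, sd = 35))
simLoans$Indicator <- rbinom(n, size = 1,
                             prob = plogis(-49 + 0.07 * simLoans$FICO.Mean)) == 1

# Hold out 300 of the 1000 rows for choosing the cutoff.
testRows <- sample(n, size = 300)
trainSet <- simLoans[-testRows, , drop = FALSE]
testSet  <- simLoans[testRows, , drop = FALSE]

# Fit on the training rows only, then predict on the held-out rows.
simFit <- glm(Indicator ~ FICO.Mean, data = trainSet, family = "binomial")
testProbs <- predict(simFit, newdata = testSet, type = "response")

# Misclassification rate on the held-out rows for each candidate cutoff.
cutoffs <- seq(0, 1, length.out = 20)
testErr <- sapply(cutoffs, function(k) mean((testProbs > k) != testSet$Indicator))

cutoffs[which.min(testErr)]
```

With the real data, the same pattern would apply to loanData and the three predictors used in logitModel.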

# Checking model performance
Performance <- predict(logitModel, type = "response") > 0.4
table(loanData$Indicator, Performance)
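
The table above is a confusion matrix, and summary rates can be read off it directly. A small sketch with made-up vectors (`actual` and `predicted` are hypothetical stand-ins for loanData$Indicator and Performance):

```r
# Hypothetical outcomes and predictions for eight loans; values are made up.
actual    <- c(TRUE, TRUE, TRUE,  FALSE, FALSE, FALSE, TRUE, FALSE)
predicted <- c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE,  TRUE, FALSE)

# Rows are the actual classes, columns are the predicted classes.
tab <- table(Actual = actual, Predicted = predicted)

# Share of all loans classified correctly.
accuracy <- sum(diag(tab)) / sum(tab)

# Share of actual approvals that were predicted as approvals.
sensitivity <- tab["TRUE", "TRUE"] / sum(tab["TRUE", ])

# Share of actual non-approvals that were predicted as non-approvals.
specificity <- tab["FALSE", "FALSE"] / sum(tab["FALSE", ])

c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity)
```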





Now let's calculate the probability that Miss X's loan request for $15,000 from Lending Club with interest rate 10% or less will be approved, given her FICO score of 750 and monthly income of $6,500.

missX <- data.frame(FICO.Mean = 750, Amount.Requested = 15000, Monthly.Income = 6500)

predict(logitModel, newdata = missX, type = "response")


The resulting probability is 0.6464171, which is greater than the 0.4 cutoff, so the model predicts that Miss X's request for $15,000 from Lending Club with interest rate 10% or less will be approved!
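
The predicted probability is just the inverse logit of the linear predictor, so it can be checked by hand. A sketch on simulated data (the data frame `simLoans` and its coefficients are made up, so the resulting number will not match the 0.6464171 above):

```r
set.seed(2)  # arbitrary seed so the sketch is reproducible

# Simulated stand-in for loanData; coefficients and ranges are made up.
n <- 500
simLoans <- data.frame(
  FICO.Mean        = rnorm(n, mean = 700, sd = 35),
  Amount.Requested = runif(n, min = 1000, max = 35000),
  Monthly.Income   = runif(n, min = 2000, max = 12000)
)
linPred <- -49 + 0.07 * simLoans$FICO.Mean -
  0.0001 * simLoans$Amount.Requested + 0.0001 * simLoans$Monthly.Income
simLoans$Indicator <- rbinom(n, size = 1, prob = plogis(linPred)) == 1

simFit <- glm(Indicator ~ FICO.Mean + Amount.Requested + Monthly.Income,
              data = simLoans, family = "binomial")

missX <- data.frame(FICO.Mean = 750, Amount.Requested = 15000, Monthly.Income = 6500)

# Log odds from the fitted coefficients, then the inverse logit.
b <- coef(simFit)
logOdds <- b["(Intercept)"] + b["FICO.Mean"] * 750 +
  b["Amount.Requested"] * 15000 + b["Monthly.Income"] * 6500
manualProb <- unname(1 / (1 + exp(-logOdds)))

# The same probability from predict().
predictProb <- unname(predict(simFit, newdata = missX, type = "response"))

all.equal(manualProb, predictProb)
```

Replacing `simFit` with logitModel would repeat the same check against the model fitted in this post.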

