An automotive insurance company wants to predict which filed stolen vehicle claims are fraudulent, based on the mean number of claims submitted per year by the policy holder and whether the policy is a new policy, that is, is one year old or less (coded as 1 = yes, 0 = no). Data from a random sample of 98 automotive insurance claims, organized and stored in InsuranceFraud, show that 49 are fraudulent (coded as 1) and 49 are not (coded as 0). (Data extracted from A. Gepp et al., “A Comparative Analysis of Decision Trees vis-à-vis Other Computational Data Mining Techniques in Automotive Insurance Fraud Detection,” Journal of Data Science, 10 (2012), pp. 537–561.)
(a) Develop a logistic regression model to predict the probability of a fraudulent claim, based on the number of claims submitted per year by the policy holder and whether the policy is new.
(b) Explain the meaning of the regression coefficients in the model in (a).
(c) Predict the probability of a fraudulent claim given that the policy holder has submitted a mean of one claim per year and holds a new policy.
(d) At the 0.05 level of significance, is there evidence that a logistic regression model that uses the mean number of claims submitted per year by the policy holder and whether the policy is new to predict the probability of a fraudulent claim is a good fitting model?
(e) At the 0.05 level of significance, is there evidence that the mean number of claims submitted per year by the policy holder and whether the policy is new each makes a significant contribution to the logistic model?
(f) Develop a logistic regression model that includes only the number of claims submitted per year by the policy holder to predict the probability of a fraudulent claim.
(g) Develop a logistic regression model that includes only whether the policy is new to predict a fraudulent claim.
(h) Compare the models in (a), (f), and (g). Evaluate the differences among the models.
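Once a model has been fitted in statistical software, the arithmetic behind the coefficient interpretation, the predicted probability, and the test of each predictor's contribution can be sketched as below. This is only a rough illustration under stated assumptions: the coefficient and standard-error values are hypothetical placeholders, not estimates from the InsuranceFraud data, and the function names are invented for this sketch. It also assumes the per-coefficient test is a two-sided Wald Z test, which is what most packages report.

```python
import math


def logit_probability(b0, b1, b2, claims_per_year, new_policy):
    """Estimated probability of fraud from ln(odds) = b0 + b1*X1 + b2*X2."""
    log_odds = b0 + b1 * claims_per_year + b2 * new_policy
    return 1.0 / (1.0 + math.exp(-log_odds))


def odds_ratio(coefficient):
    """Multiplicative change in the odds of fraud per one-unit increase in X."""
    return math.exp(coefficient)


def wald_p_value(coefficient, std_error):
    """Two-sided p-value for H0: coefficient = 0, using Z = b / SE(b)."""
    z = coefficient / std_error
    return math.erfc(abs(z) / math.sqrt(2.0))


# HYPOTHETICAL values for illustration only; replace them with the
# estimates and standard errors from your own fitted model.
b0, b1, b2 = -1.0, 0.5, 1.0
se_b1, se_b2 = 0.25, 0.4

# Predicted probability for one claim per year on a new policy (X1 = 1, X2 = 1).
p_fraud = logit_probability(b0, b1, b2, claims_per_year=1.0, new_policy=1)
print(f"Estimated P(fraud): {p_fraud:.4f}")

# Odds ratios: holding the other variable constant, the odds of fraud are
# multiplied by exp(b) for each one-unit increase in that predictor.
print(f"Odds ratio, claims per year: {odds_ratio(b1):.4f}")
print(f"Odds ratio, new policy:      {odds_ratio(b2):.4f}")

# Each predictor is judged significant at the 0.05 level if its p-value < 0.05.
for name, b, se in [("claims per year", b1, se_b1), ("new policy", b2, se_b2)]:
    p_value = wald_p_value(b, se)
    verdict = "significant" if p_value < 0.05 else "not significant"
    print(f"{name}: p-value = {p_value:.4f} ({verdict} at 0.05)")
```

The same three functions apply to the single-predictor models by setting the coefficient of the omitted variable to zero.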