You have a dataset recording whether or not each person got into an accident that year. It is important for the insurance company to understand which features of a person or vehicle make them more likely to get into an accident.
Clean the missing values first; no exploratory analysis is needed. The target you need to predict is the Outcome column.
6. Clean the dataset and discuss how you cleaned each of the variables with missing values and why you chose that method.
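One common way to handle missing values is median imputation for numeric columns (the median is less sensitive to outliers than the mean) and mode imputation for categorical columns. The sketch below is only an illustration under assumptions: rows are plain dicts with `None` marking a missing value, and the column names `age` and `vehicle_type` are hypothetical, not taken from the actual dataset. Dropping rows or using a model-based imputer are equally valid choices depending on how much data is missing.

```python
from statistics import median, mode

def impute(rows, numeric_cols, categorical_cols):
    """Return a new list of rows with None values filled in.

    Numeric columns get the median of the observed values;
    categorical columns get the most frequent observed value.
    The input rows are not modified.
    """
    fills = {}
    for col in numeric_cols:
        observed = [r[col] for r in rows if r[col] is not None]
        fills[col] = median(observed)
    for col in categorical_cols:
        observed = [r[col] for r in rows if r[col] is not None]
        fills[col] = mode(observed)
    return [
        {k: (fills[k] if v is None and k in fills else v) for k, v in r.items()}
        for r in rows
    ]

# Hypothetical toy rows; the real dataset's columns will differ.
rows = [
    {"age": 25, "vehicle_type": "sedan", "Outcome": 0},
    {"age": None, "vehicle_type": "suv", "Outcome": 1},
    {"age": 40, "vehicle_type": None, "Outcome": 0},
    {"age": 31, "vehicle_type": "sedan", "Outcome": 1},
]
cleaned = impute(rows, ["age"], ["vehicle_type"])
print(cleaned[1]["age"], cleaned[2]["vehicle_type"])  # 31 sedan
```

Whatever method is chosen, the fill values should be computed from the training data only and then applied unchanged to the test data.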
7. Build a logistic model and discuss the significant variables. Provide a table of all significant variables and their coefficients (a snippet of the data is not acceptable; if no variables are significant at the .05 level or below, feel free to expand the threshold to .1). Based on your initial thoughts, which significant variable stands out to you as intriguing, and why? How could this information be useful to the insurer?
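For reference when reading the coefficient table, the standard logistic model and the usual interpretation of its coefficients are shown below, with $p$ the probability that Outcome equals 1 (an accident) and $x_1, \dots, x_k$ the predictors. Significance is typically judged with a Wald test on each coefficient. The value $\beta_j = 0.4$ in the last line is purely hypothetical and only shows the arithmetic.

```latex
\log\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k,
\qquad p = P(\text{Outcome} = 1)

\text{Odds ratio for } x_j:\quad \mathrm{OR}_j = e^{\beta_j}
\quad\text{(a one-unit increase in } x_j \text{ multiplies the odds of an accident by } e^{\beta_j}\text{, other variables held fixed)}

\text{Wald statistic:}\quad z_j = \frac{\hat\beta_j}{\mathrm{SE}(\hat\beta_j)}

\text{Example (hypothetical):}\quad \beta_j = 0.4 \;\Rightarrow\; e^{0.4} \approx 1.49
\;\Rightarrow\; \text{about } 49\% \text{ higher odds of an accident per unit of } x_j
```

Expressing significant coefficients as odds ratios is often the easiest way to explain to an insurer how much a given feature raises or lowers accident risk.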
8. Run a non-ensemble model (only those used in class) using 1000 iterations. Address the accuracy of each model and why you chose that model. Which model is the most accurate, and do you believe the improvement in accuracy is worth losing the statistical significance testing that these optimized models do not provide?
9. Build a confusion matrix for each model. Discuss which part of the confusion matrix the company would want to reduce, and which model does best at reducing it.
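A confusion matrix for a binary outcome is just four counts: true negatives, false positives, false negatives and true positives. The minimal sketch below assumes labels are coded 0/1 with 1 meaning an accident; the label lists are hypothetical and exist only to show the counting. For an insurer, a false negative (a driver predicted to be safe who then has an accident) is often the costlier error, which is why recall, the share of actual accidents the model catches, is computed alongside accuracy.

```python
def confusion_matrix(y_true, y_pred):
    """Count TN, FP, FN, TP for binary labels coded 0/1 (1 = accident)."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 1 and predicted == 0:
            fn += 1  # accident the model missed
        elif actual == 0 and predicted == 1:
            fp += 1  # accident-free driver flagged as risky
        else:
            tn += 1
    return {"TN": tn, "FP": fp, "FN": fn, "TP": tp}

def accuracy(cm):
    """Share of all predictions that were correct."""
    return (cm["TP"] + cm["TN"]) / sum(cm.values())

def recall(cm):
    """Share of actual accidents the model caught (1 - false negative rate)."""
    return cm["TP"] / (cm["TP"] + cm["FN"])

# Hypothetical labels for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
cm = confusion_matrix(y_true, y_pred)
print(cm)            # {'TN': 3, 'FP': 1, 'FN': 1, 'TP': 3}
print(accuracy(cm))  # 0.75
print(recall(cm))    # 0.75
```

Two models with the same accuracy can differ in how many false negatives they produce, so comparing the FN cell (or recall) across models is more informative for this question than accuracy alone.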
10. In at least two paragraphs, discuss the features that were important and significant across the models. Use that to recommend to the insurance company what it should look out for most when someone applies for car insurance, and how it can reduce that risk once someone is insured with it.