This is an individual assignment. Any discussion of anything concerning this exam with anyone other than the instructor will constitute a violation of the GMU Honor Code. You may refer to books and notes during the exam but not any online resources. Please do not share this exam with anyone even after completion of the course. Good luck!
To complete the exam, you will need the file Exam2Dataset.xlsx, which is also attached. Any preprocessing of datasets required to build models should be done in RapidMiner.
NoVA Catalog is a software catalog company that sells games and educational software. It has recently put together a revised collection of items in a new catalog, which it is preparing to roll out in a mailing. In an attempt to grow its customer base, NoVA Catalog has recently joined a consortium of catalog firms that specialize in computer and software products. The consortium offers members the opportunity to mail catalogs to names drawn from a pooled list of customers. Members supply their own customer lists to the pool, and can withdraw an equal number of names each quarter. Members are allowed to do predictive modeling on the records in the pool so they can do a better job of selecting names from the pool. NoVA Catalog is entitled to draw 200,000 names for a mailing from a pool containing over 5,000,000 names.
NoVA would obviously like to select the names that have the best chance of performing well, so it conducts a test—it draws 2,000 names from the pool and does a test mailing of the new catalog to them. The data from this test mailing are in the NoVA Catalog worksheet. Along with some basic information about each individual (e.g., gender), the dataset contains information on the outcome of the mailing, i.e., whether the individual responded to the mailing, and if they did, how much they spent. (Note that responding to the mailer implies making a purchase.)
The descriptions of all the variables in the dataset are below:
| Variable | Description | Values |
|---|---|---|
| Sequence | Sequence number of the record. | |
| SourceCode | An indicator used by the consortium to identify where the name was drawn from; each code represents the source of the record. The consortium does not disclose detailed descriptions of the sources to participating companies; however, some sources may contain more potential customers. | |
| US_Address | Is the address of the customer a US address? | 1: yes 0: no |
| Frequency | Number of transactions in the last year. | |
| WebOrder | Did the person place at least one order via the web in the past? | |
| Male | Is the person male? | 1: yes 0: no |
| Female | Is the person female? | 1: yes 0: no |
| Res_Address | Is the address of the customer a residential address? | 1: yes 0: no |
| Responded | Did the person respond to the test mailing? | 1: yes 0: no |
| Spending | The amount (in US dollars) that someone responding to the test mailing spent. | |
Answer the following questions using the NoVA Catalog dataset.
1. What percentage of people who received the test mailings made a purchase? (2)
2. Of the customers who responded, what was the average purchase amount? (3)
3. As indicated in the table above, some sources (captured by the SourceCode attribute) may contain more potential customers. Which are the top three sources of responders, i.e., which three sources generated the maximum number of customers? (5)
4. Using the attributes that could be predictors, build a decision tree model in RapidMiner that NoVA Catalog can use in the future to predict whether a catalog recipient will respond. What is the dependent variable you used for the model? What are the independent variables you used? (30)
a. Based on your model, what is the best predictor of response (i.e., whether an individual will make a purchase)? (4)
b. Evaluate the predictive accuracy of the model using appropriate metrics. (6)
- You must clearly answer all the questions and provide RapidMiner screen prints to support your answer for Q1, Q2, Q3, Q4a, and Q4b. You will only receive half the points without screen prints even if your answer is correct.
- You need to submit the RMP file for Q4 only. No need to submit RMP files for Q1, Q2, Q3.
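The quantities asked for in Q1 to Q3 and the "appropriate metrics" in Q4b have standard definitions, which the following minimal Python sketch illustrates. Everything in it is an assumption made for illustration: the records are invented toy data (not taken from Exam2Dataset.xlsx), the function names are hypothetical, and the choice of accuracy, precision, and recall is one common option rather than a required set. The exam itself must still be completed in RapidMiner.

```python
from collections import Counter

# Invented toy records (NOT from the exam dataset):
# (source_code, responded, spending)
records = [
    ("A", 1, 120.0),
    ("A", 0, 0.0),
    ("B", 1, 80.0),
    ("B", 1, 40.0),
    ("C", 0, 0.0),
    ("C", 0, 0.0),
    ("D", 1, 60.0),
    ("A", 1, 100.0),
]


def response_rate(rows):
    """Percentage of mailed individuals who responded (Q1-style)."""
    return 100.0 * sum(responded for _, responded, _ in rows) / len(rows)


def mean_spend_of_responders(rows):
    """Average spending, computed over responders only (Q2-style)."""
    spends = [spend for _, responded, spend in rows if responded == 1]
    return sum(spends) / len(spends)


def top_sources(rows, k=3):
    """The k source codes with the most responders (Q3-style)."""
    counts = Counter(src for src, responded, _ in rows if responded == 1)
    return counts.most_common(k)


def classification_metrics(actual, predicted):
    """Accuracy, precision, and recall for the positive class (1 = responded)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when a class is never predicted/present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


if __name__ == "__main__":
    print(response_rate(records))
    print(mean_spend_of_responders(records))
    print(top_sources(records))
    # Hypothetical actual vs. predicted labels for a hold-out set.
    print(classification_metrics([1, 1, 0, 0, 1, 0, 0, 1],
                                 [1, 0, 0, 0, 1, 1, 0, 1]))
```

Note that the average in `mean_spend_of_responders` is taken over responders only; averaging over all mailed records would answer a different question. When responders are a small share of the data, overall accuracy alone can look high for a model that rarely predicts a response, which is why precision and recall for the responder class are reported alongside it.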
Your submission is to be made via Blackboard. The responses to all questions should be submitted in a Word or Acrobat file. For each question, make sure you clearly answer the question that was asked.
In addition to the response file, please submit the RapidMiner process file as evidence of your model. That is, you need to submit the .rmp file only for Question 4.
Make sure you attach all the necessary files before submitting the Exam.
File Naming Guidelines
The response file should be named lastname.x, where lastname is your last name and x is the file extension. For example, if Elaine Marie Benes submits her response as a PDF, the file should be named Benes.pdf.
Further, your full name must appear at the top of the response document (Word or pdf) itself. Do not compress any file.