Topic – Titanic: Machine Learning from Disaster
Part 1 – Proposal and Sample cases
a) Submit a proposal (no more than TWO pages) that includes:
a brief description of the problem/opportunity
specific business objective(s) of your analysis
a brief explanation of the predictive modeling task(s)
potential dataset(s) that you plan to use and their sources
approximate number of cases in your dataset
approximate number of cases you plan to use for i) training and ii) validation
potential target/response/dependent variable(s)
potential predictor/explanatory/independent variables
data mining techniques (e.g., decision tree, logistic regression, neural network) that you are considering for the analysis
data mining software (e.g., SAS Enterprise Guide, SAS Studio, SAS Enterprise Miner, R) that you are considering for the analysis
Note: Your proposal should explicitly address each requirement listed above. Predictive modeling is required for the project. Do not submit a proposal that includes only descriptive and exploratory analysis.
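As an illustration of the training/validation requirement above, the following is a minimal sketch of a random partition. The 70/30 ratio, the seed, the function name `partition_cases`, and the case count of 891 (the size of the Kaggle Titanic training file) are assumptions chosen for the example, not requirements of the assignment.

```python
import random


def partition_cases(case_ids, train_fraction=0.7, seed=42):
    """Shuffle case IDs and split them into training and validation lists."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(case_ids)
    # A seeded generator keeps the partition reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Assumed example: 891 cases, numbered 1 to 891.
train_ids, valid_ids = partition_cases(range(1, 892))
print(len(train_ids), len(valid_ids))
```

The two counts printed here are the kind of approximate figures the proposal asks for. Other ratios (for example 60/40 or 80/20) are equally defensible, provided the choice is stated.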
b) Submit an Excel or CSV file containing a sample of 50 to 100 cases (with appropriate column headers) from your dataset.
If you plan to use a competition or dataset from Kaggle (or any other source) for your project, include the link (i.e., URL) to the competition/dataset. Repeating the text of the competition verbatim is plagiarism; write the proposal in your own words.
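One possible way to produce the sample file for part b) is sketched below. It draws cases at random without replacement and writes them with a header row. The function names, the sample size of 75, and the stand-in rows are hypothetical. The three column names are taken from the Kaggle Titanic data, but the values here are synthetic placeholders, not real passenger records.

```python
import csv
import io
import random


def draw_sample(rows, n_cases=75, seed=1):
    """Return n_cases rows chosen at random without replacement."""
    if not 50 <= n_cases <= 100:
        raise ValueError("the assignment asks for 50 to 100 cases")
    if n_cases > len(rows):
        raise ValueError("the dataset has fewer rows than the requested sample")
    return random.Random(seed).sample(rows, n_cases)


def write_sample(out_file, header, rows):
    """Write the column headers followed by the sampled rows as CSV."""
    writer = csv.writer(out_file)
    writer.writerow(header)
    writer.writerows(rows)


# Synthetic stand-in data; in practice the rows would be read from the dataset.
header = ["PassengerId", "Survived", "Pclass"]
rows = [[i, i % 2, 1 + i % 3] for i in range(1, 892)]

sample = draw_sample(rows)
buffer = io.StringIO()  # replace with open("sample.csv", "w", newline="") to save a file
write_sample(buffer, header, sample)
```

Taking a random sample rather than the first 50 to 100 rows is a design choice: it avoids any ordering effects in the source file, at the cost of needing a fixed seed for reproducibility.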
Part 2 – Data (this is applicable only if you plan to use the on-demand version of Enterprise Miner)
To upload your project data set(s) to the SAS server, follow the instructions provided here:
Part 3 – Final Report
Submit a written report (12 pages excluding appendices) that includes the following:
executive summary of the project
business problem/opportunity (from the proposal)
specific business objective(s) (from the proposal)
process followed for selecting and gathering data
discussion of preliminary data exploration and findings
description of data preparation – repairs, replacements, reductions, partitions, derivations, transformations, and variable clustering
description of data modeling/analyses and assessments
explanation of model comparisons and model selection
conclusions and recommendations (i.e., what did you learn from the analysis; did you meet your stated business objective(s); how can the results of your analysis address the business problem/opportunity; what further analyses that build on your work can be done in the future)
Relevant output from your analyses should be included in the Appendix and referenced in the body of your report.
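For the model comparison and selection item in the report list above, one common approach is to score every candidate model on the same validation cases and pick the one with the lowest error. The sketch below assumes misclassification rate as the criterion. The function names, model labels, and the eight actual and predicted values are made up for illustration and are not results from any real analysis.

```python
def misclassification_rate(actual, predicted):
    """Fraction of validation cases where the prediction differs from the actual value."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and of equal length")
    errors = sum(a != p for a, p in zip(actual, predicted))
    return errors / len(actual)


def select_model(actual, predictions_by_model):
    """Return the model name with the lowest validation error, plus all rates."""
    rates = {
        name: misclassification_rate(actual, predicted)
        for name, predicted in predictions_by_model.items()
    }
    best = min(rates, key=rates.get)
    return best, rates


# Made-up validation outcomes (1 = survived, 0 = did not survive).
actual = [1, 0, 0, 1, 0, 1, 0, 0]
predictions_by_model = {
    "decision_tree":       [1, 0, 1, 1, 0, 0, 0, 0],
    "logistic_regression": [1, 0, 0, 1, 0, 0, 0, 0],
    "neural_network":      [0, 0, 0, 1, 1, 1, 1, 0],
}

best_model, rates = select_model(actual, predictions_by_model)
print(best_model, rates)
```

Misclassification rate is only one option. Depending on the business objective, criteria such as ROC area, lift, or average squared error may be more appropriate, and the report should explain why the chosen criterion fits.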