MIS 690 Topic 3 CLC Data Cleansing and Data Summary
This is a Collaborative Learning Community (CLC) assignment.
Now that you have identified the business problem, translated it into an analytics problem, identified the data needs, and acquired the data, you will use the data you have found (or, with the company's permission, the company's own data) to resolve the analytics problem. Using one or more of the following software applications (IBM SPSS Modeler, SPSS Statistics, Excel, Power BI, Tableau, or R), analyze the data so that the findings can be used to address the established business problem in your company.
Conduct an exploratory data analysis and provide a draft outline describing the key features of the data and any significant relationships and information you found in the data set. Include specific screenshots of graphs, tables, and similar output that support your answers to the following questions:
How did you verify that the data was reliable before proceeding?
What problems did you find and how did you address them?
What relationships did you find in the data?
Are there any missing data?
How are you going to summarize data samples?
Analyze trends with respect to any appropriate characteristics that you may have discovered. Include relevant line graphs, pie charts, bar charts, and scatter plots.
What have you done to prevent Simpson's paradox?
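One way to approach the Simpson's paradox question is to compare a rate computed on the whole data set with the same rate computed inside each segment. The sketch below is only an illustration: the counts, the group labels "A" and "B", the segment names, and the helper names (`overall_rates`, `segment_rates`, `reversal_detected`) are all hypothetical and do not come from any data set in this assignment.

```python
from collections import defaultdict

# Hypothetical records: (group, segment, successes, trials).
records = [
    ("A", "small", 81, 87),
    ("A", "large", 192, 263),
    ("B", "small", 234, 270),
    ("B", "large", 55, 80),
]


def overall_rates(rows):
    """Success rate per group after pooling all segments together."""
    totals = defaultdict(lambda: [0, 0])
    for group, _segment, successes, trials in rows:
        totals[group][0] += successes
        totals[group][1] += trials
    return {g: s / n for g, (s, n) in totals.items()}


def segment_rates(rows):
    """Success rate for every (group, segment) pair."""
    return {(g, seg): s / n for g, seg, s, n in rows}


def reversal_detected(rows, a, b):
    """True when the pooled comparison of a and b points the opposite
    way from the comparison inside every single segment."""
    overall = overall_rates(rows)
    by_seg = segment_rates(rows)
    segments = {seg for _g, seg, _s, _n in rows}
    a_wins_all = all(by_seg[(a, s)] > by_seg[(b, s)] for s in segments)
    b_wins_all = all(by_seg[(b, s)] > by_seg[(a, s)] for s in segments)
    if overall[a] > overall[b]:
        return b_wins_all
    if overall[b] > overall[a]:
        return a_wins_all
    return False


if __name__ == "__main__":
    print("Pooled rates:", overall_rates(records))
    print("Segment rates:", segment_rates(records))
    print("Reversal detected:", reversal_detected(records, "A", "B"))
```

With these illustrative counts, group B looks better when the segments are pooled, while group A is better within each segment. A reversal like this suggests the segmenting variable should be reported alongside the aggregate figure rather than averaged away.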
Next, you will work on the descriptive analytics. Supplement your description with appropriate charts and figures, and finalize by creating an appropriate dashboard with Power BI or Tableau. Include a summary that provides a detailed overview of the data behavior you have identified based upon the analysis. Indicate any causal relationships you found.
Segment the data accordingly, if needed, to help describe the data behavior. Did you have to redo your sample? Can you identify any data anomalies? If there are anomalies, what do they represent and how do you avoid them?
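For the anomaly question, a common first pass is the interquartile range (IQR) rule, which flags values more than 1.5 IQRs outside the middle half of the data. The following is a minimal sketch under stated assumptions: the sample values are made up for illustration, `find_outliers` is a hypothetical helper name, and the 1.5 multiplier is a convention rather than a requirement of this assignment.

```python
import statistics


def find_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sample: one value is far from the rest.
sample = [12, 14, 15, 15, 16, 17, 18, 19, 95]

if __name__ == "__main__":
    print("Flagged values:", find_outliers(sample))
```

A flagged value is a candidate for investigation, not automatically an error. It may be a data-entry mistake, a unit mismatch, or a genuine but rare event, and the write-up should say which.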
Indicate the steps you have taken to investigate the quality of the data and indicate any variables you have transformed or discarded as a result.
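As one possible starting point for the data quality steps, the sketch below counts missing values per column and lists exact duplicate rows. It assumes the data has already been loaded as a list of dictionaries; the column names (`id`, `region`, `sales`), the sample rows, and the helper names are hypothetical, and treating empty strings as missing is an assumption that may not suit every data set.

```python
def missing_counts(rows):
    """Count missing entries (None or blank string) in each column."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            blank = isinstance(value, str) and value.strip() == ""
            is_missing = value is None or blank
            counts[column] = counts.get(column, 0) + (1 if is_missing else 0)
    return counts


def duplicate_rows(rows):
    """Return every row that exactly repeats an earlier row."""
    seen = set()
    duplicates = []
    for row in rows:
        # Sort by column name so key order does not affect the comparison.
        key = tuple(sorted(row.items(), key=lambda item: item[0]))
        if key in seen:
            duplicates.append(row)
        else:
            seen.add(key)
    return duplicates


# Hypothetical sample rows with one blank, one None, and one repeat.
sample_rows = [
    {"id": 1, "region": "East", "sales": 120.0},
    {"id": 2, "region": "", "sales": 95.5},
    {"id": 3, "region": "West", "sales": None},
    {"id": 1, "region": "East", "sales": 120.0},
]

if __name__ == "__main__":
    print("Missing per column:", missing_counts(sample_rows))
    print("Duplicate rows:", duplicate_rows(sample_rows))
```

Counts like these can back up the statements about which variables were transformed or discarded, since they show how much of each column was unusable before cleansing.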