Data Science for Business
Final project

Description: The goal of this final project is for students to work as a team, demonstrate the ability to follow the main steps of a machine learning project, and develop a machine learning model. Each group must select a dataset, either from the links provided or from any other source. The final project covers the basic steps of solving a binary (two-class) classification problem, and completing it requires students to apply their data science skills. Students are expected to work in groups, and each group member must actively participate in the final project activities.

Project Policy: Some members may be inactive or may not participate in the final project. However, this is NOT a legitimate reason for failing to turn in your final project on time. If you care about your grade and your project, you have to find a solution to this situation. In fact, the project can also be done independently. The policies are as follows:
1. Each group consists of two students only.
2. Students may opt out of a group and conduct the final project independently.
3. Whichever option you choose, you must inform the instructor.
4. There is NO free ride for the final project. Each group member has to make equal contributions to the final project.
5. Failure to participate in group discussions/meetings may result in a zero for the final project toward your final grade.
6. Failure to present your final project successfully may lead to a lower grade for your final project.

Getting the Dataset
The first task is to find a labeled dataset that can be used for a binary classification problem (only two classes). For this project, real-world data is a better choice than an artificial dataset. To give yourself the greatest chance of success with the final project, it is important to choose a manageable dataset.
1. The dataset should be readily accessible and large enough.
   As such, your dataset must have at least 100 records and between 2 and 6 measurements (exceptions can be made, but you must discuss them with me first).
2. The dataset should be suitable for solving a binary classification problem.
3. The dataset must have only two labeled classes.
4. To help you choose a feasible problem and dataset, each team should email the instructor its project idea for approval by 04/18/2021.
5. Informal discussions with the professor can help refine the project.
6. No two teams can work on the same dataset.
7. Here are some sources you can check for data (the list is not exhaustive, so you can still search for a dataset on your own):
   Popular open data repositories:
   UC Irvine Machine Learning Repository:
   datasets: https://www.kaggle.com/datasets
   Amazon's AWS datasets:
   Portals:
   Data Monitor:

Requirements
Run the K-Nearest Neighbors model in Python to predict the class label from the different measurements in the dataset.
1. Introduction: Start with an introduction to your project. It should describe (1) the problem you want to solve and (2) the dataset: its size, the number and type of measurements, and the number of classes and their labels.
2. Load the data, then explore and visualize it to gain insights: generate graphs to discover any relationships between measurements or any clustering.
3. Prepare the dataset: Preprocess the dataset as needed, for example with dimension reduction, outlier removal, handling of text and categorical variables, data cleaning, and/or data standardization (all variables used in the K-NN model must be on the same order of magnitude to produce accurate results).
4. Data partitioning: After preprocessing your dataset, split it into non-overlapping sets for the training and testing phases.
5. Different values of K: Choose three different values of K.
   Discuss your reasons for choosing these values of K.
6. Training Phase: Run the model using the three values of K you chose in the previous step. Discuss the three main steps of the K-NN algorithm: calculating the distances, finding the nearest neighbors, and making predictions.
7. Testing Phase: Compare the accuracy of the training phase with that of the testing phase. Discuss these results.
8. Evaluation Phase: Check the accuracy of all models' predictions (one model per value of K) by creating the confusion matrix and computing the recall and precision scores. Discuss the prediction results in terms of accuracy and misclassification error.
9. Present the best model: Choose the best model based on the results of the evaluation phase. Consider any improvements that could produce better results.
10. Conclusion: Discuss your final results and your conclusions about the model.

Project Report
A narrative description of all the machine learning model steps, with screenshots of the code and output.
For every step in the requirements list above: (1) discuss what you did; (2) provide screenshots of the code and of the output; (3) provide any graphs if needed.

Presentation Deliverables (PowerPoint slides)
Record your screen while giving the presentation and submit the recording. Every member of the group should participate in the recording. Several tools are available; one of them is Kaltura Recording. Your recording may be added to a folder on Canvas so that all other students in the class can view it.

Submission Checklist:
1. Dataset file: the original file, plus the modified one if you made any modifications.
2. Python file (.py)
3. Report document.
4. PowerPoint slides.
5. Recorded presentation.
All the above documents should be submitted by 05/02/2021, 11:59 PM. Include all files in one folder and compress the folder (.zip).
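Items 3 and 4 of the Requirements list (standardization and data partitioning) can be sketched with scikit-learn as below. This is only an illustration under assumptions: the synthetic data from `make_classification` stands in for your own dataset, and the 75/25 split ratio is an example, not a requirement. The scaler is fit on the training set only, so statistics from the test set do not leak into preprocessing.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a real dataset: 200 records, 4 measurements, 2 classes.
X, y = make_classification(
    n_samples=200, n_features=4, n_informative=3, n_redundant=1,
    n_classes=2, random_state=42,
)

# Put one measurement on a much larger scale to mimic mixed units.
X[:, 0] = X[:, 0] * 1000

# Non-overlapping train/test split; stratify keeps the class ratio in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Fit the scaler on the training data only, then apply it to both sets.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
print(np.round(X_train_scaled.mean(axis=0), 6))  # per-measurement means, close to 0
print(np.round(X_train_scaled.std(axis=0), 6))   # per-measurement std devs, close to 1
```

Without this step, the first measurement (scaled by 1000) would dominate every Euclidean distance that K-NN computes.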
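The three main K-NN steps named in item 6 (calculate the distance, find the nearest neighbors, make a prediction) can be illustrated with a small from-scratch sketch. It assumes Euclidean distance and a simple majority vote; `knn_predict` is a hypothetical helper written for illustration. In practice, a library implementation such as scikit-learn's `KNeighborsClassifier` would normally be used.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, X_query, k):
    """Predict a label for each row of X_query with a plain K-NN majority vote."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    predictions = []
    for x in np.asarray(X_query, dtype=float):
        # Step 1: Euclidean distance from the query point to every training point.
        distances = np.sqrt(((X_train - x) ** 2).sum(axis=1))
        # Step 2: indices of the k closest training points.
        nearest = np.argsort(distances)[:k]
        # Step 3: majority vote among the labels of those neighbors.
        votes = Counter(y_train[nearest].tolist())
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)


# Tiny two-class example: class 0 near the origin, class 1 near (5, 5).
X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = [0, 0, 0, 1, 1, 1]
print(knn_predict(X_train, y_train, [[0.5, 0.5], [5.5, 5.5]], k=3))  # [0 1]
```

With only two classes, an odd K cannot produce a tied vote, which is one common argument when choosing the K values for item 5.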
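Items 5 through 9 (three values of K, training versus testing accuracy, confusion matrix, recall, and precision) can be sketched with scikit-learn as follows. The assumptions are: synthetic data from `make_classification` stands in for your dataset, K = 3, 5, 7 are example values rather than recommended ones, and class 1 is treated as the positive class for precision and recall.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a real two-class dataset.
X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=1, n_classes=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

results = {}
for k in (3, 5, 7):  # example K values; choose and justify your own
    # Scaling inside the pipeline means the scaler is fit on the training data only.
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    model.fit(X_train, y_train)
    y_pred_train = model.predict(X_train)
    y_pred_test = model.predict(X_test)
    test_acc = accuracy_score(y_test, y_pred_test)
    results[k] = {
        "train_accuracy": accuracy_score(y_train, y_pred_train),
        "test_accuracy": test_acc,
        "misclassification_error": 1 - test_acc,
        "confusion_matrix": confusion_matrix(y_test, y_pred_test),
        "precision": precision_score(y_test, y_pred_test),  # positive class = 1
        "recall": recall_score(y_test, y_pred_test),        # positive class = 1
    }

for k, r in results.items():
    print(f"K={k}: train acc={r['train_accuracy']:.3f}, test acc={r['test_accuracy']:.3f}, "
          f"precision={r['precision']:.3f}, recall={r['recall']:.3f}")
    print(r["confusion_matrix"])

# One simple selection rule: highest test accuracy (ties go to the smaller K).
best_k = max(results, key=lambda k: results[k]["test_accuracy"])
print("Best K by test accuracy:", best_k)
```

Selecting K on the test set, as above, follows the wording of the assignment; a separate validation split or cross-validation (for example scikit-learn's `GridSearchCV`) would give a less optimistic estimate of the chosen model's accuracy.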