Data Preprocessing
Discussion 1:

In today's world, data is generated from many sources and in many formats. As internet use grows rapidly across devices such as sensors, CCTV cameras, laptops, workstations and tablets such as iPads, much of the data available online is unstructured and arrives as text files, PDF files, images, videos, tweets and other formats (García, Luengo & Herrera, 2015). Collected data is often unnormalized, unclean, incomplete and unprocessed. Using raw, unprocessed data directly produces misleading results, so it is of little use for analytics.

Before data can be used for analytics, its quality is judged on three factors: accuracy, completeness and consistency. First, the data needs to be accurate. Inaccuracy arises when people enter random or erroneous values, and incorrect or duplicated data carries that inaccuracy into processing. Second, the data needs to be complete. Incomplete data results from data being unavailable or from data being deleted. Third, the data needs to be consistent. Maintaining consistent data is one of the key factors in producing reliable analytical results.

Processed data supports analysis by making it possible to generate the graphs and tables used in decision making. Preprocessing has four stages: data cleaning, data integration, data reduction and data transformation (Kamiran & Calders, 2012). The first stage, data cleaning, involves identifying missing values and eliminating noisy data. Techniques for removing noisy data include binning, regression and outlier analysis. The second stage is data integration: because data is collected from various sources, it must be integrated so that related or correlated data can be identified.
The third stage is data reduction: various techniques eliminate duplicate data and reduce large volumes of data. The final stage is data transformation, which puts the data into a form suitable for the algorithms and analytic techniques that will be applied to it.

References

García, S., Luengo, J., & Herrera, F. (2015). Data preprocessing in data mining (pp. 195-243). Cham, Switzerland: Springer International Publishing.

Kamiran, F., & Calders, T. (2012). Data preprocessing techniques for classification without discrimination. Knowledge and Information Systems, 33(1), 1-33.

Discussion 2:

Why are the original/raw data not readily usable by analytics tasks?

Raw data is usually dirty, inaccurate and misaligned, so it cannot be used in its raw format (Sharda et al., 2020). Moreover, raw data can be unstructured and overly complicated. This means that data preprocessing has to be performed to transform raw data into refined data (Sharda et al., 2020). Preprocessing is therefore a critical step before analytics.

What are the main data preprocessing steps?

The process starts with data consolidation, which collects, selects and integrates data. It may involve filtering out unnecessary data before the data is used. The next step is data cleaning, which removes errors from the data (Sharda et al., 2020). In this step, missing values are usually imputed and duplicate data is eliminated. The third step, data transformation, involves normalization (standardization), where values are rescaled to a range defined by the smallest and largest values. It also involves discretization, which groups data into categories (Alasadi & Bhaya, 2017). Data transformation can also create new attributes from the existing data. The last step in data preprocessing is data reduction, which reduces dimensionality, reduces volume and balances the data (Alasadi & Bhaya, 2017).
This last step ensures that there is not too much data, which would be challenging to handle.

List and explain their importance in analytics.

Data consolidation, the first step, is essential because it allows for data collection, selection and integration. In this step, unnecessary data is usually eliminated so that only appropriate data remains (Losarwar & Joshi, 2012). In data cleaning, data scrubbing is vital because it ensures that data containing errors is removed. The step also reduces duplication, which removes data redundancy. Data transformation enables easier categorization of data (Alasadi & Bhaya, 2017). This is important because data organized into categories can be used efficiently, which would be impossible while the data is unstructured (Sharda et al., 2020). Data reduction enables data balancing, so that no part of the data is over- or under-sampled. Therefore, preprocessing is necessary for data analytics.
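
The four steps described above can be sketched in a few lines of Python. This is only a minimal illustration and not a method taken from the cited sources. The sensor records, the field names (id, temp, site) and the function names are all hypothetical. Mean imputation and a two-band split are just one possible choice each for imputation and discretization, and dropping unused attributes stands in for the broader idea of data reduction.

```python
from statistics import mean

# Hypothetical records from two sources (illustrative values only).
sensor_a = [
    {"id": 1, "temp": 21.0, "site": "north"},
    {"id": 2, "temp": None, "site": "north"},   # missing value
    {"id": 2, "temp": None, "site": "north"},   # exact duplicate
]
sensor_b = [
    {"id": 3, "temp": 25.0, "site": "south"},
    {"id": 4, "temp": 29.0, "site": "south"},
]


def consolidate(*sources):
    """Data consolidation: gather the records from every source into one list."""
    return [dict(row) for source in sources for row in source]


def clean(rows):
    """Data cleaning: drop exact duplicates, then impute missing temps with the mean."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the record
        if key not in seen:
            seen.add(key)
            unique.append(row)
    fill = mean(r["temp"] for r in unique if r["temp"] is not None)
    return [{**r, "temp": fill if r["temp"] is None else r["temp"]} for r in unique]


def transform(rows):
    """Data transformation: min-max scale temp to [0, 1] and discretize into two bands."""
    temps = [r["temp"] for r in rows]
    lo, hi = min(temps), max(temps)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    result = []
    for r in rows:
        scaled = (r["temp"] - lo) / span
        band = "low" if scaled < 0.5 else "high"
        result.append({**r, "temp_scaled": scaled, "temp_band": band})
    return result


def reduce_columns(rows, keep=("id", "temp_scaled", "temp_band")):
    """Data reduction: keep only the attributes needed for the analysis."""
    return [{k: r[k] for k in keep} for r in rows]


rows = reduce_columns(transform(clean(consolidate(sensor_a, sensor_b))))
for row in rows:
    print(row)
```

With these sample records, the duplicate row is dropped, the missing temperature is filled with the mean of the known values (25.0), and the scaled temperatures come out as 0.0, 0.5, 0.5 and 1.0. On real data, each step would normally be chosen to suit the data set and the analysis at hand.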