# Data Analysis in R

- **Read** the income dataset, zipIncomeAssignment.csv, into R. (You can find the CSV file in iLearn under the Content -> Week 2 folder.)
- Change the column **names** of your data frame so that *zcta* becomes *zipCode* and *meanhouseholdincome* becomes *income*.
- Analyze the **summary** of your data. What are the mean and median average incomes?
- **Plot** a scatter plot of the data. Although this graph is not too informative, do you see any outlier values? If so, what are they?
- In order to omit outliers, create a **subset** of the data so that $7,000 < income < $200,000 (or, in R syntax, income > 7000 & income < 200000). What's your new mean?
- Create a simple **box plot** of your data. Be sure to add a title and label the axes.
  - HINT: Take a look at https://www.tutorialspoint.com/r/r_boxplots.htm (specifically, "Creating the Boxplot"). Instead of mpg ~ cyl, you want to use income ~ zipCode.
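A minimal base-R sketch of the steps above, from reading the CSV through the simple box plot. It assumes zipIncomeAssignment.csv sits in the working directory and that *zcta* is read as a number; the helper names *rename_income_columns*, *read_income_data* and *remove_income_outliers* are hypothetical, not part of the assignment.

```r
# Base-R sketch of the first tasks. Helper names are hypothetical.

# Rename the two columns named in the assignment; other columns are left as-is.
rename_income_columns <- function(df) {
  names(df)[names(df) == "zcta"] <- "zipCode"
  names(df)[names(df) == "meanhouseholdincome"] <- "income"
  df
}

# Read the CSV (assumed to be in the working directory) and rename its columns.
read_income_data <- function(path = "zipIncomeAssignment.csv") {
  rename_income_columns(read.csv(path))
}

# Keep rows with lower < income < upper; subset() also drops rows where income is NA.
remove_income_outliers <- function(df, lower = 7000, upper = 200000) {
  subset(df, income > lower & income < upper)
}

csv_path <- "zipIncomeAssignment.csv"
if (file.exists(csv_path)) {
  incomeData <- read_income_data(csv_path)

  # summary() reports the mean and median of every column, including income
  print(summary(incomeData))
  cat("Mean income:  ", mean(incomeData$income, na.rm = TRUE), "\n")
  cat("Median income:", median(incomeData$income, na.rm = TRUE), "\n")

  # Scatter plot; assumes zipCode was read as a number
  plot(incomeData$zipCode, incomeData$income,
       main = "Mean household income by zip code",
       xlab = "Zip code", ylab = "Mean household income ($)")

  incomeSubset <- remove_income_outliers(incomeData)
  cat("Mean income without outliers:", mean(incomeSubset$income), "\n")

  # One box per distinct zipCode value
  boxplot(income ~ zipCode, data = incomeSubset,
          main = "Mean household income by zip code",
          xlab = "Zip code", ylab = "Mean household income ($)")
}
```

Wrapping the renaming and filtering in small functions makes it easy to check them on a toy data frame before running them on the real file.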

- In the box plot you created, notice that most of the income data is squeezed toward the bottom of the graph, because most average incomes are low compared with the few highest values that stretch the y-axis. Create a new box plot where the y-axis uses a log scale. Be sure to add a title and label the axes.

For the next two questions, use the *ggplot2* library in R, which lets you create graphs with several different types of plots layered over each other.
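For the log-scale box plot task above (the last one done in base R), *boxplot()* accepts a *log = "y"* argument. The sketch below assumes the same file location and column names as the earlier steps and re-creates the outlier-filtered subset so it runs on its own; *plot_log_income_boxplot* is a hypothetical helper name.

```r
# Hypothetical helper: base-R box plot of income by zip code with a log-scale y-axis.
# log = "y" needs strictly positive incomes, which the 7,000-200,000 subset guarantees.
# Returns the summary statistics computed by boxplot().
plot_log_income_boxplot <- function(df) {
  boxplot(income ~ zipCode, data = df,
          log = "y",
          main = "Mean household income by zip code (log scale)",
          xlab = "Zip code",
          ylab = "Mean household income ($, log scale)")
}

csv_path <- "zipIncomeAssignment.csv"  # assumed location, as in the earlier steps
if (file.exists(csv_path)) {
  incomeData <- read.csv(csv_path)
  names(incomeData)[names(incomeData) == "zcta"] <- "zipCode"
  names(incomeData)[names(incomeData) == "meanhouseholdincome"] <- "income"
  incomeSubset <- subset(incomeData, income > 7000 & income < 200000)
  plot_log_income_boxplot(incomeSubset)
}
```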

- Make a *ggplot* that consists of just a scatter plot, using the function *geom_point()* with position = *"jitter"* so that the data points are grouped by zip code. Be sure to use *ggplot*'s function for taking the log10 of the y-axis data. (Hint: for *geom_point*, set *alpha* = 0.2.)
- Create a new *ggplot* by adding a box plot layer to your previous graph. To do this, add the *ggplot* function *geom_boxplot()*. Also, add color to the scatter plot so that data points from different zip codes are different colors. Be sure to label the axes and add a title to the graph. (Hint: for *geom_boxplot*, set *alpha* = 0.1 and *outlier.size* = 0.)
- What can you conclude from this data analysis/visualization?
- Discuss the challenges that you faced and strategies related to data analytics in R.
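The two *ggplot* tasks above might be sketched as follows, assuming the *ggplot2* package is installed and that *zipCode* has only a handful of distinct values, so it can be treated as a category via *factor()*. *scale_y_log10()* is the ggplot2 function that puts the y-axis on a log10 scale. The helper names are hypothetical, and the second function rebuilds the first plot rather than literally appending a layer, because coloring the points is simplest when done inside the *geom_point()* layer itself.

```r
# Sketch of the two ggplot tasks; helper names are hypothetical.
# Assumes zipCode has few distinct values, so factor(zipCode) gives one group per code.

# Task 1: jittered scatter plot of income by zip code on a log10 y-axis.
make_jitter_plot <- function(df) {
  ggplot2::ggplot(df, ggplot2::aes(x = factor(zipCode), y = income)) +
    ggplot2::geom_point(position = "jitter", alpha = 0.2) +
    ggplot2::scale_y_log10() +  # ggplot2's log10 transform of the y-axis
    ggplot2::labs(title = "Mean household income by zip code",
                  x = "Zip code", y = "Mean household income ($, log10 scale)")
}

# Task 2: same points, colored by zip code, with a translucent box plot layer on top.
make_jitter_box_plot <- function(df) {
  ggplot2::ggplot(df, ggplot2::aes(x = factor(zipCode), y = income)) +
    ggplot2::geom_point(ggplot2::aes(colour = factor(zipCode)),
                        position = "jitter", alpha = 0.2) +
    # alpha = 0.1 keeps the boxes see-through; outlier.size = 0 hides their outlier dots
    ggplot2::geom_boxplot(alpha = 0.1, outlier.size = 0) +
    ggplot2::scale_y_log10() +
    ggplot2::labs(title = "Mean household income by zip code",
                  x = "Zip code", y = "Mean household income ($, log10 scale)",
                  colour = "Zip code")
}

csv_path <- "zipIncomeAssignment.csv"  # assumed location, as in the earlier steps
if (requireNamespace("ggplot2", quietly = TRUE) && file.exists(csv_path)) {
  incomeData <- read.csv(csv_path)
  names(incomeData)[names(incomeData) == "zcta"] <- "zipCode"
  names(incomeData)[names(incomeData) == "meanhouseholdincome"] <- "income"
  incomeSubset <- subset(incomeData, income > 7000 & income < 200000)
  print(make_jitter_plot(incomeSubset))
  print(make_jitter_box_plot(incomeSubset))
}
```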
