Data Analysis in R

I don’t understand this Computer Science question and need help to study.

  • Read the income dataset, “zipIncomeAssignment.csv”, into R. (You can find the csv file in iLearn under the Content -> Week 2 folder.)
  • Change the column names of your data frame so that zcta becomes zipCode and meanhouseholdincome becomes income.
  • Analyze the summary of your data. What are the mean and median average incomes?
  • Plot a scatter plot of the data. Although this graph is not too informative, do you see any outlier values? If so, what are they?
  • In order to omit outliers, create a subset of the data so that $7,000 < income < $200,000 (or in R syntax, income > 7000 & income < 200000).
  • What’s your new mean?
  • Create a simple box plot of your data. Be sure to add a title and label the axes. HINT: Take a look at https://www.tutorialspoint.com/r/r_boxplots.htm (specifically, Creating the Boxplot). Instead of “mpg ~ cyl”, you want to use “income ~ zipCode”.
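
As a starting point for studying, the base R steps (reading the file, renaming the columns, summarizing, the scatter plot, the outlier subset, and the simple box plot) might be sketched roughly as below. This is only a sketch under stated assumptions: the file path, the helper names load_income and drop_outliers, and all plot titles and axis labels are made up for illustration, and the zipCode column is assumed to be read in as a numeric value.

```r
# Assumed location of the file; adjust to wherever the csv was saved.
csv_path <- "zipIncomeAssignment.csv"

# Hypothetical helper: read the csv and rename the two columns by name,
# so the result does not depend on the column order in the file.
load_income <- function(path) {
  income_data <- read.csv(path)
  names(income_data)[names(income_data) == "zcta"] <- "zipCode"
  names(income_data)[names(income_data) == "meanhouseholdincome"] <- "income"
  income_data
}

# Hypothetical helper: keep only rows with 7000 < income < 200000.
drop_outliers <- function(income_data) {
  subset(income_data, income > 7000 & income < 200000)
}

# Only run the analysis when the file is actually present.
if (file.exists(csv_path)) {
  income_data <- load_income(csv_path)

  # Mean and median of income appear in the summary output.
  print(summary(income_data))

  # Scatter plot of the raw data; extreme points show up as outliers.
  plot(income_data$zipCode, income_data$income,
       main = "Income by zip code (placeholder title)",
       xlab = "Zip code", ylab = "Mean household income")

  # New mean after omitting outliers.
  trimmed <- drop_outliers(income_data)
  print(mean(trimmed$income))

  # Simple box plot using the formula from the hint.
  boxplot(income ~ zipCode, data = trimmed,
          main = "Income by zip code (placeholder title)",
          xlab = "Zip code", ylab = "Mean household income")
}
```

Renaming by name rather than by position is a deliberate choice here: it still works if the csv lists the columns in a different order than expected.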

  • In the box plot you created, notice that all of the income data is pushed towards the bottom of the graph because most average incomes tend to be low. Create a new box plot where the y-axis uses a log scale. Be sure to add a title and label the axes.

For the next 2 questions, use the ggplot library in R (the ggplot2 package), which enables you to create graphs with several different types of plots layered over each other.

  • Make a ggplot that consists of just a scatter plot using the function geom_point() with position = “jitter” so that the data points are grouped by zip code. Be sure to use ggplot’s function for taking the log10 of the y-axis data. (Hint: for geom_point, have alpha=0.2).
  • Create a new ggplot by adding a box plot layer to your previous graph. To do this, add the ggplot function geom_boxplot(). Also, add color to the scatter plot so that data points between different zip codes are different colors. Be sure to label the axes and add a title to the graph. (Hint: for geom_boxplot, have alpha=0.1 and outlier.size=0).
  • What can you conclude from this data analysis/visualization?
  • Discuss challenges that you faced and strategies related to Data Analytics in R.
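
The log-scale box plot and the two ggplot questions could be approached along the following lines. Again, this is a hedged sketch rather than a model answer: it assumes a data frame with columns zipCode and income that has already had its outliers removed, the function names log_boxplot and make_income_plots are hypothetical, and the titles and labels are placeholders.

```r
library(ggplot2)

# Hypothetical helper: base-R box plot with a log-scaled y-axis.
log_boxplot <- function(trimmed) {
  boxplot(income ~ zipCode, data = trimmed, log = "y",
          main = "Income by zip code, log scale (placeholder title)",
          xlab = "Zip code", ylab = "Mean household income (log scale)")
}

# Hypothetical helper: build both ggplot objects from the trimmed data.
make_income_plots <- function(trimmed) {
  # Treat zipCode as a factor so points and boxes are grouped per zip code.
  base <- ggplot(trimmed, aes(x = as.factor(zipCode), y = income))

  # Jittered scatter plot with a log10 y-axis.
  scatter <- base +
    geom_point(position = "jitter", alpha = 0.2) +
    scale_y_log10()

  # Same scatter, colored by zip code, with a box plot layer on top.
  layered <- base +
    geom_point(aes(colour = as.factor(zipCode)),
               position = "jitter", alpha = 0.2) +
    geom_boxplot(alpha = 0.1, outlier.size = 0) +
    scale_y_log10() +
    labs(title = "Income by zip code (placeholder title)",
         x = "Zip code", y = "Mean household income (log10 scale)",
         colour = "Zip code")

  list(scatter = scatter, layered = layered)
}
```

With a trimmed data frame in hand, calling log_boxplot(trimmed) draws the base-R plot, and printing make_income_plots(trimmed)$scatter or make_income_plots(trimmed)$layered renders the ggplot versions.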