Assignment 3: R base graphics, statistical functions
Due by 5:00 PM on Wednesday, November 9, 2022
To do yourself
Subscribe to R-bloggers mailing list or RSS feed
Watch playlists/videos from Brooke Anderson’s channel, as well as her R Programming for Research course
Install swirl R package
- For statistical functions, install and run “Statistical Inference”, “Regression Models” courses (Installation instructions)
- Explore other swirl courses
read.table, readr, Excel, SAS, SPSS, file/folder structure/paths
Download or clone the RGraphics repository, and run through code in the R_graphics_workshop.R until
ggplot2
part (line 440)
To submit on Canvas
- Install package “babynames”. Include your code and results to answer the following questions:
- How many variables are in the
babynames
object?
- What their data types?
- Using base graphics, plot the proportion of your name usage vs. year. Choose closest or random name, if your exact spelling is not available. Make plots using points, lines, barplots. Add color.
- What is the most and least popular names starting with the first letter of your name (or second, third, etc., if no match with the first letter) and matching your sex? How is your name matches within this ranking (approximately)? (Hint: Use
agrep
function to find the closest match to your name if the perfect match cannot be found)
- Load in the ToothGrowth dataset, which comes with R.
- Produce a boxplot of the tooth length variable, with x- and y-axis labels as well as a title.
- Change the graphics parameters to a window of one row and two columns of plots. Then, populate the first one with a scatterplot of dose on the x-axis and length on the y-axis, only using observations with a VC supplement type. Populate the second one with a scatterplot of dose on the x-axis and length on the y-axis, only using observations with an OJ supplement type. Provide informative axis labels. Use filled-in circles as the plotting character in the first plot, and filled-in triangles as the plotting character in the second plot.
- Fit a linear regression model of tooth length on the variables ‘supp’ and ‘dose’. Change the graphics parameters back to just one plot, not several. Plot only the QQ-plot of residuals from this regression model, filled in with blue circles.
- Set a seed to ensure reproducibility. Then, generate n=100 random variables from an Exponential distribution with lambda=5 parameter. Plot the empirical CDF (cumulative distribution function) of this data with informative axes labels. Make a really long title - so long that you must use a special character to get the title to go on two lines.
- Submit your homework as a PDF or Word document (the
LASTNAME-FIRSTNAME-HW03.docx
file).