Simple R Studio Homework

Use RStudio to finish the homework. I’ll send the data with the homework as …

STA 100 SSI 2018
Homework 1 – The book homework is for practice.
The R Homework is due Monday, July 1st on Canvas by 5pm.
Book Homework

  1. A student wants to estimate the number of UC Davis students who have gotten
    flu shots in the last year. Below is a list of sampling methods.
    Explain one or two drawbacks of each method.
    (a) They send an email to everyone in their current classes, asking for a response.
    (b) They interview people on the quad.
    (c) They stand inside the UC Davis Student Health Center, and interview people who enter the building.
    (d) They interview their friends and family only.
  2. For the following random variables, specify if they are nominal, ordinal, continuous, or discrete.
    (a) Number of outbreaks of pneumonia at UC Davis.
    (b) The titles of faculty members at UC Davis (Lecturer, Assistant Professor, Professor, etc.).
    (c) The shape of leaves of particular trees.
    (d) The width of a muskrat's snout in cm.
  3. Identify two confounding variables for each of the following primary variables of interest (recall that confounding
    variables are ones that affect the primary variable of interest and are not recorded):
    (a) Exam score in STA 100. Only the name of the student and the exam score are recorded.
    (b) Weight loss of a group of people. Only the primary mode of exercise and the weight loss are recorded.
    (c) Success in training a dog to dance. Only the method of training and whether the dog was able to dance are recorded.
    (d) Lung capacity. Only the lung capacity and the height of the subject were measured.
  4. The number of leaves that a type of tree shed in a week was recorded, with the following results:
    21; 42; 5; 11; 30; 50; 28; 27; 24; 52
    Use this sample data to solve the following:
    (a) Calculate the mean.
    (b) Calculate the median.
    (c) Calculate the variance.
    (d) Calculate the standard deviation and interpret it in terms of the problem.
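One way to check the hand calculations for the leaf data is a short script. The following is a minimal Python sketch, not a required method. It assumes the sample (n - 1) versions of the variance and standard deviation, since the data are described as a sample; the variable names are illustrative.

```python
import statistics

# Leaves shed per week, copied from the problem statement.
leaves = [21, 42, 5, 11, 30, 50, 28, 27, 24, 52]

sample_mean = statistics.mean(leaves)
sample_median = statistics.median(leaves)

# statistics.variance and statistics.stdev divide by n - 1 (sample versions).
sample_variance = statistics.variance(leaves)
sample_sd = statistics.stdev(leaves)

print("mean:", sample_mean)
print("median:", sample_median)
print("variance:", round(sample_variance, 3))
print("standard deviation:", round(sample_sd, 3))
```

The standard deviation is in the same units as the data (leaves per week), which is what the interpretation in part (d) should be phrased in.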
  5. Continue with the data in Problem 4.
    (a) Calculate the first, second, and third quartiles.
    (b) Identify any outliers in the dataset, using the boxplot classification of an outlier.
    (c) Calculate the 30th and 90th percentiles.
    (d) Between what two values lies the middle 50% of the data?
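Quartile conventions differ between textbooks and software, so the sketch below states its assumptions explicitly. It assumes the median-of-halves convention (Q1 is the median of the lower half of the sorted data, Q3 the median of the upper half) and the usual 1.5 × IQR boxplot fences. Other conventions, such as the default of R's `quantile` function, can give slightly different quartiles, so use whichever definition the course specifies.

```python
import statistics

# Same leaf data as in Problem 4, sorted for the quartile calculation.
leaves = sorted([21, 42, 5, 11, 30, 50, 28, 27, 24, 52])
n = len(leaves)

# Assumed convention: split the sorted data into lower and upper halves
# (dropping the middle value when n is odd) and take the median of each half.
lower_half = leaves[: n // 2]
upper_half = leaves[(n + 1) // 2 :]

q1 = statistics.median(lower_half)
q2 = statistics.median(leaves)
q3 = statistics.median(upper_half)

# Boxplot rule: points beyond 1.5 * IQR from the quartiles are flagged.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in leaves if x < lower_fence or x > upper_fence]

print("Q1, Q2, Q3:", q1, q2, q3)
print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
```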
  6. Consider the following contingency table, in which two species of mice were tested for a specific parasite:
                   Infected   Not Infected
    Species 1         18            6
    Species 2         10           15
    (a) Estimate the probability that a randomly selected mouse was species 1.
    (b) Estimate the probability that a randomly selected mouse was infected.
    (c) Estimate the probability that a randomly selected mouse was both infected and species 1.
    (d) Estimate the probability that a randomly selected mouse was not infected and species 2.
  7. Continue with the data from Problem 6.
    (a) If a mouse was species 1, what is the probability it was infected?
    (b) If a mouse was species 2, what is the probability it was infected?
    (c) What is the probability that an infected mouse was species 1?
    (d) What is the probability that an infected mouse was species 2?
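The marginal, joint, and conditional probabilities for the mouse contingency table can all be estimated as ratios of counts. The sketch below is one illustrative way to organize that in Python; the helper name `count` and the dictionary layout are assumptions made for this example, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Cell counts copied from the mouse contingency table.
counts = {
    ("Species 1", "Infected"): 18,
    ("Species 1", "Not Infected"): 6,
    ("Species 2", "Infected"): 10,
    ("Species 2", "Not Infected"): 15,
}
total = sum(counts.values())

def count(species=None, status=None):
    """Sum the cells matching the given species and/or infection status."""
    return sum(
        n for (sp, st), n in counts.items()
        if species in (None, sp) and status in (None, st)
    )

# Marginal and joint probabilities: matching count divided by the grand total.
p_species1 = Fraction(count(species="Species 1"), total)
p_infected = Fraction(count(status="Infected"), total)
p_infected_and_species1 = Fraction(count("Species 1", "Infected"), total)

# Conditional probabilities: divide by the count of the conditioning group.
p_infected_given_species1 = Fraction(
    count("Species 1", "Infected"), count(species="Species 1")
)
p_species1_given_infected = Fraction(
    count("Species 1", "Infected"), count(status="Infected")
)

print(p_species1, p_infected, p_infected_and_species1)
print(p_infected_given_species1, p_species1_given_infected)
```

Note that the two conditional probabilities share a numerator but have different denominators, which is why P(infected | species 1) and P(species 1 | infected) are not the same.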
  8. For a particular disease, the probability of the disease is 0.005. If someone has the disease, the probability they test
    positive is 0.93. If they do not have the disease, the probability they test negative is 0.98.
    (a) Estimate the probability someone both tests positive and has the disease.
    (b) Estimate the probability that someone tests positive.
    (c) Estimate the probability that if someone tested positive, they have the disease.
    (d) Estimate the probability that if someone tests negative, they do not have the disease.
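The disease-testing problem combines the multiplication rule, the law of total probability, and Bayes' rule. The following Python sketch lays out those steps under the numbers given in the problem; the variable names are illustrative only.

```python
# Values copied from the problem statement.
p_disease = 0.005            # P(D)
p_pos_given_disease = 0.93   # P(+ | D)
p_neg_given_healthy = 0.98   # P(- | not D)

# Multiplication rule: P(+ and D) = P(+ | D) * P(D).
p_pos_and_disease = p_pos_given_disease * p_disease

# Law of total probability: P(+) = P(+ and D) + P(+ and not D).
p_pos_and_healthy = (1 - p_neg_given_healthy) * (1 - p_disease)
p_pos = p_pos_and_disease + p_pos_and_healthy

# Bayes' rule: P(D | +) = P(+ and D) / P(+).
p_disease_given_pos = p_pos_and_disease / p_pos

# Same idea for a negative test: P(not D | -) = P(- and not D) / P(-).
p_neg_and_healthy = p_neg_given_healthy * (1 - p_disease)
p_healthy_given_neg = p_neg_and_healthy / (1 - p_pos)

print(round(p_pos_and_disease, 5), round(p_pos, 5))
print(round(p_disease_given_pos, 4), round(p_healthy_given_neg, 4))
```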
  9. The number of moles on a person's face can be considered a discrete random variable. Assume it has the following
    distribution:
    Number of Moles (Y)   0      1      2      3
    Relative Freq.        0.90   0.03   0.02   0.05
    (a) What is the probability that someone has more than 1 mole?
    (b) What is the probability that someone has between 0 and 2 moles (inclusive)?
    (c) Find the expected number of moles someone has.
    (d) Find the variance of the number of moles someone has.
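For a discrete random variable, the expected value is the probability-weighted sum of the values, and the variance is the probability-weighted sum of squared deviations from that expected value. A minimal Python sketch of both, using the mole distribution from the problem (names are illustrative):

```python
import math

# Distribution of Y (number of moles), copied from the problem statement.
pmf = {0: 0.90, 1: 0.03, 2: 0.02, 3: 0.05}

# A valid distribution sums to 1 (allowing for floating-point rounding).
assert math.isclose(sum(pmf.values()), 1.0)

# Probabilities of events: add up the probabilities of the matching values.
p_more_than_1 = sum(p for y, p in pmf.items() if y > 1)
p_0_to_2 = sum(p for y, p in pmf.items() if 0 <= y <= 2)

# E[Y] = sum of y * P(Y = y); Var(Y) = sum of (y - E[Y])^2 * P(Y = y).
expected = sum(y * p for y, p in pmf.items())
variance = sum((y - expected) ** 2 * p for y, p in pmf.items())

print(round(p_more_than_1, 4), round(p_0_to_2, 4))
print(round(expected, 4), round(variance, 4))
```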
  10. Hospital records show that for patients suffering from a certain disease, 75% die from it (regardless of the patient).
    Assume patients are independent, and that 6 patients are observed.
    (a) What is the probability that exactly 4 patients recover?
    (b) What is the probability that between 1 and 3 patients die (inclusive)?
    (c) What is the expected number of patients who die, and the standard deviation?
    (d) What is the probability that at least 2 patients die?
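Because the patients are independent and each has the same probability of dying, the number of deaths among the 6 patients can be modeled as binomial with n = 6 and p = 0.75. The sketch below is one way to check the hand calculations in Python; `binom_pmf` is a helper written for this example, not a library function.

```python
from math import comb, sqrt

n = 6          # patients observed
p_die = 0.75   # probability a patient dies, from the problem statement

def binom_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# "Exactly 4 recover" uses the recovery probability 1 - p_die.
p_exactly_4_recover = binom_pmf(4, n, 1 - p_die)

# "Between 1 and 3 die (inclusive)" sums the pmf over k = 1, 2, 3.
p_1_to_3_die = sum(binom_pmf(k, n, p_die) for k in range(1, 4))

# Binomial mean n*p and standard deviation sqrt(n*p*(1-p)).
expected_deaths = n * p_die
sd_deaths = sqrt(n * p_die * (1 - p_die))

# "At least 2 die" via the complement: 1 - P(0 die) - P(1 dies).
p_at_least_2_die = 1 - binom_pmf(0, n, p_die) - binom_pmf(1, n, p_die)

print(round(p_exactly_4_recover, 4), round(p_1_to_3_die, 4))
print(expected_deaths, round(sd_deaths, 4), round(p_at_least_2_die, 4))
```

Note that "exactly 4 recover" is the same event as "exactly 2 die", so `binom_pmf(2, n, p_die)` gives the same value.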
R Homework
    I. On Canvas, you will find the dataset “Fitbit.csv” (in the folder Datasets). This dataset has the following columns:
    Column 1: Steps – The number of steps for that day.
    Column 2: Miles – The distance walked in miles for that day.
    Column 3: Floors – The number of floors climbed for that day (up or down).
    Column 4: Sleep – The number of hours of sleep for that night.
    Column 5: Day – The day of the week.
    Column 6: Month – The month of the year.
    (a) Find the mean, median, and standard deviation of the column for the number of steps.
    (b) Repeat (a) for the column for the hours of sleep.
    (c) Find Q1, Q2, Q3 for the column for the distance walked in miles.
    (d) Repeat (c) for the number of steps.
    II. Use the same data as in Problem I.
    (a) Find the five-number summary for hours of sleep.
    (b) Find the mean for the distance walked in miles.
    (c) Find and interpret the standard deviation for the distance walked in miles.
    (d) Find and interpret the coefficient of variation for the distance walked in miles.
    III. Use the same data as in Problem I.
    (a) Plot one boxplot for each day of the week for number of steps. This should be in one plot. Does it appear that one
    day is less active than the other days? Explain.
    (b) Plot one boxplot for each day of the week for hours of sleep. This should be in one plot. Does it appear that one
    day allows for more sleep than other days?
    (c) Plot a histogram for the overall amount of miles traveled. Does it appear this data is skewed, or symmetric?
    Explain.
    (d) Repeat (c) for the daily amount of sleep in hours.
    IV. On Canvas, you will find the dataset “work.csv” (in the folder Datasets). This dataset has the following columns:
    Column 1. obese: with levels “Overweight”, “Underweight”.
    Column 2. gender: with levels “Female”, “Male”.
    Column 3. marriage: with levels “Married”, “Divorced”, “Widowed”, “Never-Married”.
    (a) Create and display a barplot for the variable marriage. Which category has the most people?
    (b) Create and display a barplot for the variables marriage and obese. Does it appear that there is a relationship between
    obesity and marriage status? Explain.
    (c) Create and display a barplot for the variables gender and obese. Does it appear that one particular gender is more
    obese than the other? Explain.
    (d) Create and display a barplot for the variables gender and marriage. Does it appear that one particular gender is
    more often widowed than the other? Explain.
    V. Using the dataset from problem IV, create a mosaic plot for the variables marriage and obese. Note, it may be helpful
    to look at two graphs, one with marriage on the x-axis, one with marriage on the y-axis. While they give the same
    information, sometimes it is easier to see via one plot over the other. You may pick which one to display.
    (a) Based on the mosaic plot, which subgroup (e.g., married and overweight, widowed and underweight, etc.) has the
    most people?
    (b) Based on the mosaic plot, are there more widowed overweight or underweight people?
    (c) Based on the mosaic plot, are there more divorced or widowed people?
    (d) Based on the mosaic plot, does there seem to be a relationship between marriage and obesity? Explain.