Labour Force Survey

Only do questions 3, 4, and 5.

3. This question requires you to download data obtained from Statistics Canada. If you are working on campus, go to www.odesi.ca (off-campus users must first sign in to the McMaster library via libaccess at library.mcmaster.ca/libaccess, search for odesi via the library search facilities, then select odesi from the search results). Next, select the “Find data” field in odesi and search for “Labour Force Survey June, 2020”, then scroll down and select the Labour Force Survey, June 2020 [Canada]. Next click on the “Explore & Download” icon, then click on the download icon (i.e., the square diskette icon along the upper right of the browser pane), click on “Select Data Format”, then scroll down and select “Comma Separated Value file” (csv) which, after a brief pause, will download the data to your hard drive (you may have to extract the file from a zip archive depending on which operating system you are using). Finally, make sure that you place this csv file in the same directory/folder as your R code file (this file ought to have the name LFS-71M0001-E-2020-June_F1.csv), and in RStudio select the menu item Session -> Set Working Directory -> To Source File Location. There will be another file with (almost) the same name but with the extension .pdf that is the pdf documentation that describes the variables in this data set. Note that it would be prudent to retain this file as we will use it in future assignments (this question is worth 8 marks).

Next, open RStudio, make sure this csv file and your R Markdown script are in the same directory (in RStudio open the Files tab (lower right pane by default) and refresh the file listing if necessary). Then read the file as follows:

lfp <- read.csv("LFS-71M0001-E-2020-June_F1.csv")

This data set contains some interesting variables on the labour force status of a random subset of Canadians. We will focus on the variable HRLYEARN (hourly earnings) described on page 22 of the pdf file LFS-71M0001-E-2020-June.pdf. We will also consider other variables so that we can condition our analysis on them by restricting attention to subsets of the data, e.g., full-time workers only (FTPTMAIN==1) reporting positive earnings. We also look at the highest educational attainment for people in the survey and consider both high school graduates (EDUC==2) and those holding a bachelor's degree (EDUC==5). To construct these subsets we can use the R command subset as follows (the ampersand & is the logical AND operator; see ?subset for details on the subset command):

hs <- subset(lfp, FTPTMAIN==1 & EDUC==2 & HRLYEARN > 0)$HRLYEARN

ba <- subset(lfp, FTPTMAIN==1 & EDUC==5 & HRLYEARN > 0)$HRLYEARN

These commands tell R to take the subset of the data frame lfp consisting of full-time workers who report positive earnings and hold either a high school diploma or a university bachelor's degree, retain only the variable HRLYEARN, and store the result in the variable hs (hourly earnings for high school graduates) or ba (hourly earnings for university graduates). The following questions ask you to compute various descriptive statistics and graphical summaries of these two variables.

Note that nothing will be printed out by running the two lines above – they simply create subsets of the data for subsequent use.
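As a minimal sketch of what these two commands do, the same subset() pattern can be applied to a tiny made-up data frame. The object toy and all of its values are hypothetical and are not taken from the Labour Force Survey; only the column names mirror the survey file.

```r
# Hypothetical toy data; only the column names match the survey file
toy <- data.frame(
  FTPTMAIN = c(1, 1, 2, 1, 1),           # 1 = full-time in this sketch
  EDUC     = c(2, 5, 2, 2, 5),           # 2 = high school, 5 = bachelor's
  HRLYEARN = c(18.5, 32, 15, NA, 0)      # made-up hourly earnings
)

# Keep rows meeting all three conditions, then extract one column with $
hs_toy <- subset(toy, FTPTMAIN == 1 & EDUC == 2 & HRLYEARN > 0)$HRLYEARN
ba_toy <- subset(toy, FTPTMAIN == 1 & EDUC == 5 & HRLYEARN > 0)$HRLYEARN

# Assignment prints nothing; type the object name to see its contents
hs_toy
ba_toy
```

In this sketch the row with a missing HRLYEARN is dropped, because subset() treats a condition that evaluates to NA as false.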

i. Report the five number summary for each subset (hint: fivenum(hs) etc.). Indicate what each number tells us (hint: see help by typing ?fivenum in the console pane).

ii. What can you say about relative wages of high school and university graduates?

iii. Using Sturges’ rule, how many classes would you construct for the hs and ba wage data (hint: length() gives you the length of the vector, log10() may also be useful, so something like round(1+3.3*log10(length(hs))) might do the trick for the hs data at least)?

iv. Plot histograms for the hs and ba data on separate graphs (hint: hist()).

v. Does the number of classes in each histogram correspond to Sturges’ rule?

vi. Plot density curves for the hs and ba data on the same graph and add a legend (hint: first use something like plot(density(…),col="blue",lty=1), where you need to fill in the (…) parts with the name of your data object, e.g., hs, then lines(density(…),col="red",lty=2), then see the help page by typing ?legend in the console pane). Note that you can add a legend using something like

legend("topright",c("High School","University"),lty=c(1,2),col=c("blue","red"),bty="n")

vii. What do these density curves tell us about the distribution of hourly wages for high school versus university graduates?
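The hints in items i to vi can be tried out before touching the survey data. The following is only a sketch on simulated values: the vectors a and b, their sizes, and the log-normal parameters are assumptions chosen for illustration and say nothing about the actual hs and ba wages.

```r
set.seed(1)  # make the simulated values reproducible

# Hypothetical stand-ins for hs and ba (simulated, not survey data)
a <- rlnorm(200, meanlog = 3.0, sdlog = 0.4)
b <- rlnorm(150, meanlog = 3.4, sdlog = 0.4)

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum
fivenum(a)
fivenum(b)

# Sturges' rule exactly as written in the hint
k_a <- round(1 + 3.3 * log10(length(a)))
k_b <- round(1 + 3.3 * log10(length(b)))

# Histogram; a single number passed to breaks is only a suggestion,
# so count the classes that were actually drawn
h <- hist(a, breaks = k_a, main = "Simulated group a", xlab = "Value")
length(h$counts)

# Two density curves on one graph, with axis limits wide enough for both
d_a <- density(a)
d_b <- density(b)
plot(d_a, col = "blue", lty = 1, main = "Simulated densities",
     xlim = range(d_a$x, d_b$x), ylim = range(0, d_a$y, d_b$y))
lines(d_b, col = "red", lty = 2)
legend("topright", c("Group a", "Group b"),
       lty = c(1, 2), col = c("blue", "red"), bty = "n")
```

Setting xlim and ylim from both density objects is a design choice in this sketch: without it, the second curve may be clipped by axes that were sized for the first one.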

4. Consider the following data on annual profits (in millions of dollars) for all firms in the textbook publishing industry in Canada (ignore the ## [1] and ## [13] that appear at the beginning of each line; this is simply the way R displays a vector of numbers):

## [1] 7.20 8.85 17.80 10.40 10.60 18.60 12.30 3.67 6.57 7.77 16.10 11.80

## [13] 12.00 10.60 7.22

To set these values in a vector in R, if desired, you can use the command profits <- c(…) where … are the values above separated by commas, e.g., profits <- c(3.67, 6.57, etc.)

i. How many observations are there (i.e., what is n, the sample size)?

ii. What are the minimum, maximum, and range?

iii. How many classes would you create if you used Sturges’ rule?

iv. What are the class widths and class boundaries based on your answers to the previous two questions, using Sturges’ rule, the sample minimum as the first lower class boundary, and the sample maximum as the last upper class boundary?

v. Complete the table below showing the absolute frequency, relative frequency, cumulative frequency, and cumulative relative frequency for the above data. For this question you will need to do some manual data entry in the table skeleton provided below after you have figured out what the counts are based on your answers to the previous set of questions. In particular, you are to use Sturges’ rule (above) to obtain the desired number of classes, and use the range of the data (above) when constructing your class boundaries (note that you need to have a blank line between each new row that you add to the table, and the last class must be closed at the right – this question is worth 8 marks).
                                  Cumulative  Cumulative
          Absolute    Relative    Absolute    Relative
Class     Frequency   Frequency   Frequency   Frequency

[…,…)     …           …           …           …

[…,…)     …           …           …           …

[…,…]     …           …           …           …
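The mechanics behind such a table can be sketched in R on made-up numbers. The vector x below is hypothetical and is not the profits data, and using cut() with table() is just one possible way to count observations per class; the question itself asks for manual entry.

```r
# Hypothetical data (not the profits above)
x <- c(1, 2, 2, 3, 5, 8, 9, 10)
n <- length(x)

# Sturges' rule for the number of classes
k <- round(1 + 3.3 * log10(n))

# Equal-width class boundaries from the sample minimum to the sample maximum
breaks <- seq(min(x), max(x), length.out = k + 1)

# right = FALSE gives [a, b) classes; include.lowest = TRUE closes the
# last class on the right so the maximum is counted
classes <- cut(x, breaks = breaks, right = FALSE, include.lowest = TRUE)

abs_freq <- as.vector(table(classes))
freq_table <- data.frame(
  class        = levels(classes),
  abs_freq     = abs_freq,
  rel_freq     = abs_freq / n,
  cum_abs_freq = cumsum(abs_freq),
  cum_rel_freq = cumsum(abs_freq) / n
)
print(freq_table)
```

A quick sanity check for any table of this kind: the last cumulative absolute frequency should equal n and the last cumulative relative frequency should equal 1.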

5. Since we use the summation operator ($\sum_{i=1}^{n}$) often in class, let’s make sure we understand how to calculate objects that can be expressed succinctly using this operator.

i. Care must be exercised when expanding certain sums and quantities. Let the sample size be $n=3$, and let $X_1=1$, $X_2=-1$, and $X_3=3$. Demonstrate in R that it is generally not true that $\sum_{i=1}^{n} X_i^2 = \left(\sum_{i=1}^{n} X_i\right)^2$ (this question is worth 2 marks).

ii. Using the same data as in the previous question, compute the sample mean $\bar{X} = \sum_{i=1}^{n} X_i/n$, then compute the sample standard deviation $\hat{\sigma} = \sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2/(n-1)}$ in two ways: longhand (you can use R and use longhand notation, e.g., X[1], X[2], and X[3] or 1, -1, and 3, whichever you prefer), then using R functions such as mean() and sd() (this question is worth 2 marks).

iii. Express $\sum_{i=1}^{n} K$, where K is a constant (i.e., a number that does not change, hence has no subscript i), in terms of n and K only (hint: a constant does not have a subscript as it does not change with i, but it is being summed, so type out a string of n constants etc.). Then for K = 3 and n = 5 determine $\sum_{i=1}^{n} K$ using your result, purely in terms of n and K (i.e., without a summation sign; this question is worth 2 bonus marks, and you do not use R, rather use your powerful sense of logic and type out your answer with an explanation).
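For items i and ii, the pattern of comparing a longhand calculation with R's built-in functions can be sketched as follows. The vector Y is hypothetical and deliberately differs from the X values in the question.

```r
Y <- c(2, 4, 9)   # hypothetical values, not the X data from the question
n <- length(Y)

# Sum of squares versus square of the sum
sum_of_squares <- sum(Y^2)   # 4 + 16 + 81
square_of_sum  <- sum(Y)^2   # (2 + 4 + 9)^2
sum_of_squares == square_of_sum

# Sample mean and standard deviation written out longhand
ybar_long <- (Y[1] + Y[2] + Y[3]) / n
sd_long <- sqrt(((Y[1] - ybar_long)^2 +
                 (Y[2] - ybar_long)^2 +
                 (Y[3] - ybar_long)^2) / (n - 1))

# Compare with the built-in functions; all.equal() allows for rounding error
all.equal(ybar_long, mean(Y))
all.equal(sd_long, sd(Y))
```

Using all.equal() rather than == for the last two comparisons is a precaution against tiny floating-point differences between the two computations.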
