Data were collected to discern environmental factors affecting health standards. For 21 small regions we have data on the following variables:
POP: population (in thousands),
VALUE: value of all residential housing, in millions of dollars; this is the proxy for economic conditions,
DOCT: the number of doctors,
NURSE: the number of nurses,
VN: the number of vocational nurses, and
DEATHS: number of deaths due to health-related causes (i.e., not accidents); this is the proxy for health standards.
The data are given in Table 8.27.
(a) Perform a regression relating DEATHS to the other variables, excluding POP. Compute the variance-inflation factors; interpret all results.
(b) Obviously multicollinearity is a problem for these data. What is the cause of this phenomenon? It has been suggested that all variables should be converted to a per capita basis. Why should this solve the multicollinearity problem?
(c) Perform the regression using per capita variables. Compare results with those of part (a). Is it useful to compare R² values? Why or why not?
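As a starting point for parts (a) and (c), the following is a minimal sketch of how the variance-inflation factors might be computed, using VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing the j-th predictor on the remaining predictors. The helper name `vif` is hypothetical, and the arrays are randomly generated placeholders standing in for Table 8.27, which is not reproduced here; substitute the actual data before drawing any conclusions.

```python
import numpy as np

def vif(X):
    """Return VIF_j = 1 / (1 - R_j^2) for each column j of X (n rows, k columns)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        # Regress column j on an intercept plus all the other columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# PLACEHOLDER data (assumption): 21 synthetic regions, NOT the values in Table 8.27.
# Each variable is built as population times a noisy rate, so all of them scale with size.
rng = np.random.default_rng(0)
n = 21
pop = rng.uniform(10, 500, n)                  # POP, in thousands
value = pop * 30.0 * rng.uniform(0.9, 1.1, n)  # VALUE
doct = pop * 1.0 * rng.uniform(0.9, 1.1, n)    # DOCT
nurse = pop * 3.0 * rng.uniform(0.9, 1.1, n)   # NURSE
vn = pop * 2.0 * rng.uniform(0.9, 1.1, n)      # VN

# Part (a): predictors in their original units, excluding POP.
X_raw = np.column_stack([value, doct, nurse, vn])
vif_raw = vif(X_raw)

# Part (c): the same predictors converted to a per capita basis.
X_pc = X_raw / pop[:, None]
vif_pc = vif(X_pc)

names = ["VALUE", "DOCT", "NURSE", "VN"]
for name, raw, pc in zip(names, vif_raw, vif_pc):
    print(f"{name:6s} VIF raw = {raw:8.2f}   VIF per capita = {pc:6.2f}")
```

With the real data, the same two calls to `vif` give the quantities needed to compare the original and per capita specifications; the regressions themselves can be fitted with `np.linalg.lstsq` in the same way, with DEATHS (or DEATHS divided by POP) as the response.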