Saturday, April 13, 2013

R - Hardin/Hilbe Generalized Linear Models and Extensions Chapter2 - R code

various choices of variance functions


  • function in 2.2 to illustrate the the effect of the assumed variance function (using the identity link for all functions)
x <- 1:3
y <- c(1,2,9)
glm(y~x,family=gaussian(link=identity))
Call:  glm(formula = y ~ x, family = gaussian(link = identity))

Coefficients:
(Intercept)            x  
         -4            4  

Degrees of Freedom: 2 Total (i.e. Null);  1 Residual
Null Deviance:      38 
Residual Deviance: 6    AIC: 16.59
glm(y~x,family=poisson(link=identity))
Call:  glm(formula = y ~ x, family = poisson(link = identity))

Coefficients:
(Intercept)            x  
       -2.4          3.2  

Degrees of Freedom: 2 Total (i.e. Null);  1 Residual
Null Deviance:      9.052 
Residual Deviance: 1.69         AIC: 14.36
glm(y~x,family=Gamma(link=identity))
Call:  glm(formula = y ~ x, family = Gamma(link = identity))

Coefficients:
(Intercept)            x  
     -1.799        2.744  

Degrees of Freedom: 2 Total (i.e. Null);  1 Residual
Null Deviance:      2.537 
Residual Deviance: 0.4385       AIC: 14.6
glm(y~x,family=inverse.gaussian(link=identity))
Call:  glm(formula = y ~ x, family = inverse.gaussian(link = identity))

Coefficients:
(Intercept)            x  
     -1.370        2.353  

Degrees of Freedom: 2 Total (i.e. Null);  1 Residual
Null Deviance:      0.8611 
Residual Deviance: 0.1181       AIC: 13.48

model on page 16 to illustrate the use of an offset



require(foreign)
medpar <- read.dta("glmext3/medpar.dta")
m <- glm(los ~ hmo + white + type2 + type3, data=medpar, family=poisson(link="log"))
summary(m)
Warnmeldung:
In read.dta("glmext3/medpar.dta") :
  Faktorbeschriftungen von Stata-5-Dateien können nicht gelesen werden

Call:
glm(formula = los ~ hmo + white + type2 + type3, family = poisson(link = "log"), 
    data = medpar)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-5.3063  -1.8259  -0.6319   1.0081  15.3827  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  2.33293    0.02721  85.744  < 2e-16 ***
hmo         -0.07155    0.02394  -2.988  0.00281 ** 
white       -0.15387    0.02741  -5.613 1.99e-08 ***
type2        0.22165    0.02105  10.529  < 2e-16 ***
type3        0.70948    0.02614  27.146  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 8901.1  on 1494  degrees of freedom
Residual deviance: 8142.7  on 1490  degrees of freedom
AIC: 13868

Number of Fisher Scoring iterations: 5
  • Wald test whether the coefficient on white is equal to -0.2
require(survey)
regTermTest(m,test.terms = "white",null=-0.2,method="Wald")
Wald test for white
 in glm(formula = los ~ hmo + white + type2 + type3, family = poisson(link = "log"), 
    data = medpar)
F =  2.831664  on  1  and  1490  df: p= 0.092632
  • define model with offset variable white=-0.2
  • do the likelihood-ratio test
m1 <- glm(los ~ hmo + type2 + type3 + offset(-0.2*white), data=medpar, family=poisson(link="log"))
anova(m1,m,test="Chisq")
Analysis of Deviance Table

Model 1: los ~ hmo + type2 + type3 + offset(-0.2 * white)
Model 2: los ~ hmo + white + type2 + type3
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)  
1      1491     8145.5                       
2      1490     8142.7  1   2.8657  0.09049 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Sunday, April 7, 2013

R - drop unused levels from factors

  • drops unused levels from factors
  • can be used on a factor or a data frame (which makes just sense when the data frame contains any factor)
  • syntax: droplevels(df)
  • create an example data frame
exdf <- data.frame(x=factor(1:10,labels=letters[1:10]),y=rbinom(10,1,0.5))
exdf
x y
1  a 0
2  b 1
3  c 1
4  d 0
5  e 1
6  f 1
7  g 0
8  h 0
9  i 1
10 j 1
  • subset exdf by y==1
exdf <- exdf[exdf$y==1,]
summary(exdf$x)
a b c d e f g h i j 
0 1 1 0 1 1 0 0 1 1
  • we see that all levels of y are present in the summary although several are not used anymore
exdf <- droplevels(exdf)
summary(exdf$x)
b c e f i j 
1 1 1 1 1 1

R - abbreviate strings automatically


  • abbreviate strings automatically
  • option minlength sets the number of characters
  • abbreviate in a way that they remain unique
data(armd.wide,package="nlmeU")
names(armd.wide)
 [1] "subject"  "lesion"   "line0"    "visual0"  "visual4"  "visual12"
 [7] "visual24" "visual52" "treat.f"  "miss.pat"
abbreviate(names(armd.wide))
 subject   lesion    line0  visual0  visual4 visual12 visual24 visual52 
  "sbjc"   "lesn"   "lin0"   "vsl0"   "vsl4"   "vs12"   "vs24"   "vs52" 
 treat.f miss.pat 
  "trt."   "mss."