Tuesday, November 29, 2011

R Confidence Intervals and Regions in a linear model

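  • for a linear model \( y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \varepsilon \), confint() gives the confidence intervals (95% by default) of the parameters \( \beta_0, \dots, \beta_n \); here Volume is modelled by Girth and Height from the trees data set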
data(trees)                                     ## load the data
mm <- lm(Volume ~ Girth + Height, data = trees) ## fit the linear model
confint(mm)                                     ## 95% confidence intervals of the coefficients
                  2.5 %      97.5 %
(Intercept) -75.68226247 -40.2930554
Girth         4.16683899   5.2494820
Height        0.07264863   0.6058538
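To see where these numbers come from: for a linear model the interval for each coefficient is the estimate plus or minus a t quantile (with the residual degrees of freedom) times its standard error. The sketch below rebuilds the table by hand; the names est, se, tq and ci are only illustrative.

```r
## Sketch: recompute confint(mm) from coefficients and standard errors
data(trees)
mm  <- lm(Volume ~ Girth + Height, data = trees)
est <- coef(mm)                             ## point estimates
se  <- sqrt(diag(vcov(mm)))                 ## standard errors of the estimates
tq  <- qt(0.975, df = df.residual(mm))      ## t quantile for a two-sided 95% interval
ci  <- cbind(lower = est - tq * se, upper = est + tq * se)
ci
all.equal(unname(ci), unname(confint(mm)))  ## should agree with confint()
```

Changing 0.975 to another quantile, e.g. 0.995, corresponds to confint(mm, level = 0.99).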
  • the ellipse package provides ellipse(), which computes a 2-dimensional confidence region (95% by default) for two parameters of a fitted model; here we compute the ellipse for the coefficients of Girth and Height, i.e. parameters 2 and 3
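library(ellipse)                                  ## install.packages("ellipse") if needed
plot(ellipse(mm, which = c(2, 3)), type = "l", xlim = c(0, 5.5))  ## joint 95% region
points(0, 0)                                      ## the origin: both coefficients zero
points(coef(mm)[2], coef(mm)[3], pch = 18)        ## the point estimate
abline(v = confint(mm)[2, ], lty = 2)             ## individual CI for Girth
abline(h = confint(mm)[3, ], lty = 2)             ## individual CI for Height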


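  • we see: (0,0) lies outside the ellipse, so we can reject \( H_0: \beta_1 = \beta_2 = 0 \) (neither Girth nor Height affects Volume) at the 5% level
  • the two abline() calls draw dashed lines at the individual (one-at-a-time) 95% confidence intervals from confint(); they lie inside the extent of the ellipse, because each interval has 95% coverage only on its own. Lines tangent to the ellipse would give simultaneous intervals, which hold jointly with at least 95% confidence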
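Both observations can be checked numerically. The sketch below assumes that ellipse() uses its default radius for an lm fit, sqrt(2 * qf(0.95, 2, df)) on the standard-error scale, which we believe is the default of the package's lm method; the variable names are illustrative. The origin lies outside the ellipse exactly when the F statistic for \( H_0: \beta_1 = \beta_2 = 0 \) exceeds the F quantile, and projecting the ellipse onto each axis gives the wider simultaneous intervals.

```r
## Sketch: test whether (0,0) is inside the joint region, and compute the
## simultaneous intervals whose limits are tangent to the ellipse.
## Assumption: ellipse radius sqrt(2 * qf(0.95, 2, df)) as in ellipse()'s lm default.
data(trees)
mm  <- lm(Volume ~ Girth + Height, data = trees)

idx <- c(2, 3)                          ## Girth and Height coefficients
b   <- coef(mm)[idx]
V   <- vcov(mm)[idx, idx]               ## their estimated covariance matrix
rdf <- df.residual(mm)                  ## residual degrees of freedom

## (0,0) lies outside the 95% ellipse iff Fstat > Fcrit
Fstat <- as.numeric(t(b) %*% solve(V, b)) / 2
Fcrit <- qf(0.95, 2, rdf)
Fstat > Fcrit                           ## TRUE: reject H0 at the 5% level

## the same statistic as the overall F-test printed by summary(mm)
summary(mm)$fstatistic["value"]

## simultaneous intervals: projections of the ellipse onto the two axes
se     <- sqrt(diag(V))
radius <- sqrt(2 * Fcrit)               ## larger than qt(0.975, rdf)
simult <- cbind(lower = b - radius * se, upper = b + radius * se)
simult
```

Adding abline(v = simult[1, ], lty = 3) and abline(h = simult[2, ], lty = 3) to the plot above should draw lines touching the ellipse.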

Monday, November 21, 2011

EBImage

installing

  • EBImage is part of Bioconductor and is not available on CRAN, so you have to install it from the Bioconductor repository
  • type the following in an R session run as superuser (i.e. with admin rights); you will be asked whether you want to update a number of packages: answer a (all). This can take a few minutes. (On current R versions the biocLite.R script has been retired; the equivalent is BiocManager::install("EBImage") from the CRAN package BiocManager.)
source("http://www.bioconductor.org/biocLite.R")
biocLite("EBImage")

Information about packages

show installed packages

  • show only the names of the installed packages, sorted alphabetically
sort(row.names(installed.packages()))
  [1] "abind"           "acepack"         "AER"             "akima"          
  [5] "anchors"         "ape"             "base"            "bdsmatrix"      
  [9] "biglm"           "Biobase"         "BiocInstaller"   "bitops"         
 [13] "boot"            "car"             "CarbonEL"        "caTools"        
 [17] "chron"           "class"           "cluster"         "coda"           
 [21] "coda"            "codetools"       "coin"            "colorspace"     
 [25] "compiler"        "CompQuadForm"    "cubature"        "DAAG"           
 [29] "datasets"        "DBI"             "Deducer"         "DeducerExtras"  
 [33] "degreenet"       "Design"          "digest"          "diptest"        
 [37] "doMC"            "doSNOW"          "dynlm"           "e1071"          
 [41] "Ecdat"           "effects"         "ellipse"         "ergm"           
 [45] "fBasics"         "fCalendar"       "fEcofin"         "flexmix"        
...
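A package installed in more than one library (presumably why coda appears twice above) gets one row per copy. The sketch below removes such duplicates with unique(), checks whether a given package is installed, and lists the libraries holding duplicated packages; "stats" is only an example of a package that ships with R.

```r
## Sketch: distinct package names, a membership test, and duplicated packages
ip   <- installed.packages()
pkgs <- sort(unique(row.names(ip)))    ## one entry per package name
length(pkgs)                           ## number of distinct installed packages
"stats" %in% pkgs                      ## TRUE: base packages are always installed

## which library holds each copy of a package installed more than once?
dup <- row.names(ip)[duplicated(row.names(ip))]
ip[row.names(ip) %in% dup, c("Package", "LibPath", "Version"), drop = FALSE]
```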
  • installed.packages() returns a character matrix with one row per installed package and many more columns:
colnames(installed.packages())
 [1] "Package"   "LibPath"   "Version"   "Priority"  "Depends"   "Imports"  
 [7] "LinkingTo" "Suggests"  "Enhances"  "OS_type"   "License"   "Built"
  • so to list each package together with its version, select those two columns:
installed.packages()[, c("Package", "Version")] # equivalently, use the column numbers c(1, 3)
                Package           Version      
Biobase         "Biobase"         "2.14.0"     
BiocInstaller   "BiocInstaller"   "1.2.1"      
GenABEL         "GenABEL"         "1.6-9"      
multtest        "multtest"        "2.10.0"     
abind           "abind"           "1.3-0"      
acepack         "acepack"         "1.3-3.0"    
AER             "AER"             "1.1-7"      
akima           "akima"           "0.5-4"      
anchors         "anchors"         "3.0-7"      
ape             "ape"             "2.7-1"      
bdsmatrix       "bdsmatrix"       "1.0"        
...
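To look up the version of a single package, you can filter the matrix, or use packageVersion(), which returns an object that compares as a version rather than as a string (so "2.10.0" counts as newer than "2.9.0"). In this sketch "stats" stands in for any installed package, and "2.14.0" is only an example threshold.

```r
## Sketch: version of one package, from the matrix and via packageVersion()
ip <- installed.packages()[, c("Package", "Version")]
ip[ip[, "Package"] == "stats", "Version"]   ## version string(s) from the matrix
v <- packageVersion("stats")                 ## a comparable package_version object
v
v >= "2.14.0"                                ## version comparison, not string comparison
```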
  • show the information from a package's DESCRIPTION file
packageDescription("multtest")
Package: multtest
Title: Resampling-based multiple hypothesis testing
Version: 2.10.0
Author: Katherine S. Pollard, Houston N. Gilbert, Yongchao Ge, Sandra
        Taylor, Sandrine Dudoit
Description: Non-parametric bootstrap and permutation resampling-based
        multiple testing procedures (including empirical Bayes methods)
        for controlling the family-wise error rate (FWER), generalized
        family-wise error rate (gFWER), tail probability of the
        proportion of false positives (TPPFP), and false discovery rate
        (FDR).  Several choices of bootstrap-based null distribution
        are implemented (centered, centered and scaled,
        quantile-transformed). Single-step and step-wise methods are
        available. Tests based on a variety of t- and F-statistics
        (including t-statistics based on regression parameters from
        linear and survival models as well as those based on
        correlation parameters) are included.  When probing hypotheses
        with t-statistics, users may also select a potentially faster
        null distribution which is multivariate normal with mean zero
        and variance covariance matrix derived from the vector
        influence function.  Results are reported in terms of adjusted
        p-values, confidence regions and test statistic cutoffs. The
        procedures are directly applicable to identifying
        differentially expressed genes in DNA microarray experiments.
Maintainer: Katherine S. Pollard <kpollard@gladstone.ucsf.edu>
Depends: R (>= 2.9.0), methods, Biobase
Imports: survival, MASS
Suggests: snow
License: LGPL
biocViews: Microarray, DifferentialExpression, MultipleComparisons
LazyLoad: yes
Packaged: 2011-11-01 04:28:05 UTC; biocbuild
Built: R 2.14.0; i686-pc-linux-gnu; 2011-11-11 10:32:24 UTC; unix

-- File: /home/mandy/R/i686-pc-linux-gnu-library/2.14/multtest/Meta/package.rds
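The returned object behaves like a list of fields, so single entries can be extracted with $ or requested directly through the fields argument. The sketch uses "stats" instead of "multtest" so that it runs without Bioconductor.

```r
## Sketch: extract individual DESCRIPTION fields
desc <- packageDescription("stats")     ## class "packageDescription", a list of fields
desc$Version                            ## a single field as a character string
desc$License
packageDescription("stats", fields = "Version")   ## the same value, fetched directly
```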