GAMsetup                package:mgcv                R Documentation

_S_e_t _u_p _G_A_M _u_s_i_n_g _p_e_n_a_l_i_z_e_d _r_e_g_r_e_s_s_i_o_n _s_p_l_i_n_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     Sets up the design matrix X, penalty matrices S_i and linear
     equality constraint matrix C for a GAM defined in terms of
     penalized regression splines. Various other information
     characterising the bases used is also returned. The output is such
     that the model can be fitted, and smoothing parameters estimated,
     by the method of Wood (2000) as implemented in routine 'mgcv()'.
     This function is usually called by 'gam'.

_U_s_a_g_e:

     GAMsetup(G)

_A_r_g_u_m_e_n_t_s:

       G: the single argument to this function: a list containing at
          least the elements listed below:

        _m the number of smooth terms in the model

        _d_f an array of 'G$m' integers specifying the maximum d.f. for
             each spline  term.

        _n the number of data to be modelled

        _n_s_d_f the number of user supplied columns of the design matrix
             for any parametric  model parts

        _d_i_m An array of dimensions for the smooths. 'dim[i]' is the
             number of covariates that smooth 'i' is a function of.

        _s_p array of supplied smoothing parameters. If 'fit.method' is 
             '"magic"' then this may be a mixture of positive numbers,
             which are  used as the smoothing parameters, and negative
             to indicate that the  parameters are to be estimated. With
             '"mgcv"' this is unused.

        _f_i_x An array of logicals indicating whether each smooth term
             has fixed degrees of freedom or not.

        _s._t_y_p_e An array giving the type of basis used for each term. 0
             for cubic regression spline, 1 for t.p.r.s 

        _p._o_r_d_e_r An array giving the order of the penalty for each term.
             0 for auto selection.

        _x an array of 'G$n' element arrays of data and (optionally)
             design matrix  columns. The first 'G$nsdf' elements of
             'G$x' should contain the elements of  the columns of the
             design matrix corresponding to the parametric part of the
             model. The  remaining 'G$m' elements of 'G$x' are the
             values of the covariates that are  arguments of the spline
             terms. Note that the smooths will be centred and no
             intercept term  will be added unless an array of 1's is
             supplied as part of in 'G$x'

        _v_n_a_m_e_s Array of variable names, including the constant, if
             present.

        _w prior weights on response data.

        _b_y a 2-d array of 'by' variables (i.e. covariates that multiply
             a smooth term) 'by[i,j]' is the jth value for the ith 'by'
             variable. There are only as many rows of this array as
             there are 'by' variables in the model (often 0). The
             rownames of 'by' give the 'by' variable names.

        _b_y._e_x_i_s_t_s an array of logicals: 'by.exists[i]' is 'TRUE' if the
             ith smooth has a 'by' variable associated with it, 'FALSE'
             otherwise.

        _k_n_o_t_s a compact array of user supplied knot locations for each
             smooth, in the order corresponding  to the  row order in
             'G$x'. There are 'G$dim[i]' arrays of length
             'G$n.knots[i]' for the ith smooth - all these arrays are
             packed end to end in 1-d array 'G$knots' -  zero length 1
             for no knots.

        _n._k_n_o_t_s array giving number of user supplied knots of basis for
             each smooth term 0's for none supplied.

        _f_i_t._m_e_t_h_o_d one of '"mgcv"' for the Wood (2000) method or 
             '"magic"' for a more recent and in principle more stable 
             alternative.  'min.sp'{lower bounds on the smoothing
             parameters: only possible if  fit method is '"magic"'}
             'H'{the offset penalty matrix, 'NULL' for none. This is
             the  coefficient matrix of any user supplied fixed
             penalty.}
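
     The layout of the 'by', 'by.exists', 'knots' and 'n.knots'
     arguments can be sketched in base R as below. The covariate 'z',
     the knot values and the helper 'get.knots()' are hypothetical, and
     storing a single 0 when no knots are supplied at all is an assumed
     reading of the 'knots' description rather than confirmed
     behaviour.

```r
## Hypothetical inputs for three smooths: smooth 2 is multiplied by a
## 'by' variable z and is a function of 2 covariates.
set.seed(1)
n <- 50
z <- runif(n)

## 'by': one row per 'by' variable; row names give the variable names
by <- matrix(z, nrow = 1, ncol = n)
rownames(by) <- "z"
by.exists <- c(FALSE, TRUE, FALSE)   # only smooth 2 has a 'by' variable

## 'knots': smooth 1 (1 covariate) has 4 knots, smooth 2 (2 covariates)
## has 3 knots per covariate, smooth 3 has no user supplied knots
dim.s <- c(1, 2, 1)
knot.list <- list(list(c(0, 0.3, 0.6, 1)),
                  list(c(0, 0.5, 1), c(10, 20, 30)),
                  list())
n.knots <- sapply(knot.list, function(k) if (length(k)) length(k[[1]]) else 0)
knots <- unlist(knot.list)           # dim[i] arrays per smooth, end to end
if (length(knots) == 0) knots <- 0   # assumption: single 0 when no knots at all

## Recover the knots for covariate j of smooth i from the packed array
start <- c(0, cumsum(dim.s * n.knots))
get.knots <- function(i, j) {
  knots[start[i] + (j - 1) * n.knots[i] + seq_len(n.knots[i])]
}
get.knots(2, 2)
```

     Here 'get.knots(2, 2)' returns the knots of the second covariate
     of smooth 2, namely 10, 20 and 30.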

_V_a_l_u_e:

     A list 'H' containing the elements of 'G' (the input list) plus
     the following:

       X: the full design matrix.

       S: If 'fit.method' is '"magic"' then this is a one dimensional
          array containing the non-zero elements of the penalty
          matrices. Let 'start[1]<-0' and
          'start[k+1]<-start[k]+H$df[k]^2'. Then penalty matrix 'k' has
          'H$S[start[k]+i+H$df[k]*(j-1)]' on its ith row and jth
          column. To get the kth full penalty matrix, the matrix so
          obtained would be inserted into a full matrix of zeroes with
          its 1,1 element at 'H$off[k]+1,H$off[k]+1'. If 'fit.method'
          is '"mgcv"' then this is a list of penalty matrices, again
          stored as the smallest matrices including all the non-zero
          elements of the penalty matrix concerned.

     off: an array of offsets, used to facilitate efficient storage of
          the penalty matrices and to indicate where in the overall
          parameter vector the parameters of the ith spline reside
          (e.g. the first parameter of the ith spline is at
          'p[off[i]+1]').

       C: a matrix defining the linear equality constraints on the
          parameters used to define the model (i.e. C in Cp=0).

      UZ: Array containing matrices which transform from a t.p.r.s.
          basis to the equivalent thin plate spline (t.p.s.) basis (for
          t.p.r.s. terms only). The packing method is as follows: set
          'start[1]<-0' and 'start[k+1]<-start[k]+(M[k]+n)*tp.bs[k]',
          where 'n' is the number of data, 'M[k]' is the penalty null
          space dimension and 'tp.bs[k]' is zero for a cubic regression
          spline and the basis dimension for a t.p.r.s. Then element
          'i,j' of the UZ matrix for model term 'k' is
          'UZ[start[k]+i+(j-1)*(M[k]+n)]'.

      Xu: Set of unique covariate combinations for each term. The
          packing method is as follows: set 'start[1]<-0' and
          'start[k+1]<-start[k]+xu.length[k]*tp.dim[k]', where
          'xu.length[k]' is the number of unique covariate combinations
          and 'tp.dim[k]' is zero for a cubic regression spline and the
          dimension of the smooth (i.e. the number of covariates it is a
          function of) for a t.p.r.s. Then element 'i,j' of the Xu
          matrix for model term 'k' is
          'Xu[start[k]+i+(j-1)*xu.length[k]]'.

xu.length: Number of unique covariate combinations for each t.p.r.s.
          term.

covariate.shift: All covariates are centred around zero before bases
          are constructed; this is an array of the applied shifts.

      xp: matrix whose rows contain the covariate values corresponding
          to the parameters of each cubic regression spline. The cubic
          regression splines are parameterized using their y-values at
          a series of x values, and these rows contain those x values.
          Note that these values are covariate shifted.

    rank: an array giving the ranks of the penalty matrices.

  m.free: this is only for use with '"magic"' and is the number of
          smoothing parameters that must be estimated.

   m.off: again only for '"magic"': the offsets for the penalty
          matrices of the penalties whose smoothing parameters must be
          estimated.
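
     The '"magic"' packing of 'S' and the role of 'off' can be sketched
     in base R as below. The dimensions, penalty values and the helpers
     'unpack.penalty()' and 'smooth.coef()' are hypothetical; the code
     assumes one penalty per smooth, that 'off' is a 0-based offset (as
     in the 'p[off[i]+1]' rule for 'off'), and that smooth i owns the
     'df[i]' consecutive parameters that follow that offset.

```r
## Hypothetical "magic" output: one parametric coefficient, then two
## smooths with 2 and 3 coefficients, so p = 6 parameters in total.
df  <- c(2, 3)
off <- c(1, 3)                 # p[off[i] + 1] is the first parameter of smooth i
S1  <- matrix(c(1, 2, 3, 4), 2, 2)
S2  <- diag(3)
S   <- c(S1, S2)               # packed end to end, each in column-major order

## Penalty k expanded to a full p x p matrix of zeroes
unpack.penalty <- function(S, df, off, k, p) {
  start <- c(0, cumsum(df^2))  # start[k]: elements stored before penalty k
  d <- df[k]
  block <- matrix(S[start[k] + seq_len(d * d)], d, d)  # S[start[k]+i+d*(j-1)]
  full <- matrix(0, p, p)
  idx <- off[k] + seq_len(d)   # block's 1,1 element lands at off[k]+1
  full[idx, idx] <- block
  full
}

## Coefficients of smooth i (assumed: df[i] entries after off[i])
smooth.coef <- function(beta, off, df, i) beta[off[i] + seq_len(df[i])]

P2 <- unpack.penalty(S, df, off, 2, p = 6)
beta <- c(2.5, 0.1, 0.2, 1, 2, 3)
smooth.coef(beta, off, df, 2)
drop(t(beta) %*% P2 %*% beta)  # penalty involves smooth 2's coefficients only
```

     With these values 'smooth.coef(beta, off, df, 2)' gives 1, 2, 3 and
     the quadratic penalty evaluates to 1 + 4 + 9 = 14.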
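
     One common way to fit subject to C p = 0 is to write p = Z b,
     where the columns of Z span the null space of C, and then estimate
     b without constraints. The sketch below builds such a Z from a QR
     decomposition in base R; it illustrates the generic technique and
     is not necessarily how 'mgcv()' handles 'C' internally. The
     single-row sum-to-zero 'C' is hypothetical and is assumed to have
     full row rank.

```r
## Hypothetical constraint: one smooth with 4 coefficients whose sum
## must be zero, i.e. C is a single row of 1's.
C <- matrix(1, nrow = 1, ncol = 4)

## QR-decompose t(C); the columns of the complete Q beyond the first
## nrow(C) are orthogonal to the rows of C, so they span its null space.
qrc <- qr(t(C))
Z <- qr.Q(qrc, complete = TRUE)[, -seq_len(nrow(C)), drop = FALSE]

## Any unconstrained b gives p = Z b satisfying C p = 0
b <- c(1, -2, 0.5)
p <- drop(Z %*% b)
C %*% p   # zero up to rounding error
```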
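
     The 'UZ' and 'Xu' entries use the same column-major packing, so a
     single extractor covers both. In this sketch the helper
     'get.packed()', the covariates, the null space and basis
     dimensions, and the placeholder 'UZ' values are hypothetical; real
     values would come from 'GAMsetup'.

```r
## Generic extractor: term k's matrix has nr[k] rows and nc[k] columns,
## stored column-major after start[k] elements belonging to earlier terms.
get.packed <- function(a, k, nr, nc) {
  start <- c(0, cumsum(nr * nc))
  matrix(a[start[k] + seq_len(nr[k] * nc[k])], nr[k], nc[k])
}

## Hypothetical model: term 1 is a cubic regression spline, term 2 a
## 2-d t.p.r.s., fitted to n = 20 data.
set.seed(2)
n <- 20
x1 <- round(runif(n), 1)
x2 <- round(runif(n), 1)

## Xu: rows = xu.length[k], columns = tp.dim[k] (0 for the cubic spline)
X2 <- unique(cbind(x1, x2))          # unique covariate combinations
xu.length <- c(0, nrow(X2))          # term 1 value irrelevant: tp.dim[1] is 0
tp.dim <- c(0, 2)
Xu <- as.vector(X2)                  # term 1 contributes no elements
get.packed(Xu, 2, xu.length, tp.dim)

## UZ: rows = M[k] + n, columns = tp.bs[k] (0 for the cubic spline)
M <- c(2, 3)                         # term 1 value irrelevant: tp.bs[1] is 0
tp.bs <- c(0, 5)
UZ <- seq_len((M[2] + n) * tp.bs[2]) # placeholder values 1, 2, 3, ...
U2 <- get.packed(UZ, 2, M + n, tp.bs)
U2[3, 2]                             # element UZ[start[2] + 3 + (2-1)*(M[2]+n)]
```

     'U2[3, 2]' is 26 here, matching the 'UZ' indexing rule with
     'start[2]' equal to 0 and 'M[2]+n' equal to 23.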

_A_u_t_h_o_r(_s):

     Simon N. Wood simon@stats.gla.ac.uk

_R_e_f_e_r_e_n_c_e_s:

     Wood, S.N. (2000) Modelling and Smoothing Parameter Estimation
     with Multiple Quadratic Penalties. J.R.Statist.Soc.B
     62(2):413-428

     Wood, S.N. (2003) Thin plate regression splines. J.R.Statist.Soc.B
     65(1):95-114

     <URL: http://www.stats.gla.ac.uk/~simon/>

_S_e_e _A_l_s_o:

     'mgcv' 'gam'

_E_x_a_m_p_l_e_s:

     set.seed(0)
     n<-100 # number of observations to simulate
     x <- runif(5 * n, 0, 1) # simulate covariates
     x <- array(x, dim = c(5, n)) # put into array for passing to GAMsetup
     pi <- asin(1) * 2  # begin simulating some data
     y <- 2 * sin(pi * x[2, ])
     y <- y + exp(2 * x[3, ]) - 3.75887
     y <- y + 0.2 * x[4, ]^11 * (10 * (1 - x[4, ]))^6 + 10 * (10 * 
          x[4, ])^3 * (1 - x[4, ])^10 - 1.396
     sig2 <- -1   # noise variance is abs(sig2); negative value signals GCV use
     e <- rnorm(n, 0, sqrt(abs(sig2)))
     y <- y + e          # simulated data
     w <- matrix(1, n, 1) # weight matrix
     par(mfrow = c(2, 2)) # scatter plots of simulated data
     plot(x[2, ], y);plot(x[3, ], y);plot(x[4, ], y);plot(x[5, ], y)
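     x[1, ] <- 1  # intercept column for the parametric part
     # create list for passing to GAMsetup: nsdf = 1 because x[1, ] is
     # the parametric (intercept) column and x[2:5, ] hold the covariates
     G <- list(m = 4, n = n, nsdf = 1, df = c(15, 15, 15, 15),dim=c(1,1,1,1),
          s.type=c(0,0,0,0),by=0,by.exists=c(FALSE,FALSE,FALSE,FALSE),
          p.order=c(0,0,0,0),x = x,n.knots=rep(0,4),fit.method="mgcv")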
     H <- GAMsetup(G)
     H$y <- y    # add data to H
     H$sig2 <- sig2  # add variance (signalling GCV use in this case) to H
     H$w <- w       # add weights to H
     H$sp<-array(-1,H$m)
     H$fix<-array(FALSE,H$m)
     H$conv.tol<-1e-6;H$max.half<-15
     H$min.edf<-5;H$fixed.sp<-0
     H <- mgcv(H)  # select smoothing parameters and fit model    

