mgcv                  package:mgcv                  R Documentation

_M_u_l_t_i_p_l_e _S_m_o_o_t_h_i_n_g _P_a_r_a_m_e_t_e_r _E_s_t_i_m_a_t_i_o_n _b_y _G_C_V _o_r _U_B_R_E

_D_e_s_c_r_i_p_t_i_o_n:

     Function to efficiently estimate smoothing parameters in
     Generalized Ridge Regression Problem with multiple (quadratic)
     penalties, by GCV  or UBRE. The function uses Newton's method in
     multi-dimensions, backed up by steepest descent to iteratively 
     adjust a set of relative smoothing parameters for each penalty. To
     ensure that the overall level of smoothing is optimal, and to
     guard against trapping by local minima, a highly efficient global
     minimisation with respect to  one overall smoothing parameter is
     also made at each iteration.

     The other functions in the 'mgcv' package are listed under `See
     Also'.

_U_s_a_g_e:

     mgcv(M)

_A_r_g_u_m_e_n_t_s:

       M: is the single argument to 'mgcv' and should be a list with
          the elements listed below:

        _M$_y The response data vector.

        _M$_w A vector of weights for the data (often proportional to the
              reciprocal of the standard deviation of 'M$y'). 

        _M$_X The design matrix for the problem, note that 'ncol(M$X)'
             must give the number of model parameters, while
             'nrow(M$X)'  should give the number of data.

        _M$_C Matrix containing any linear equality constraints  on the
             problem (i.e. C in Cp=0).

        _M$_S A one dimensional array containing the non-zero elements of
             the penalty matrices. Let 'start<-sum(M$df[1:(k-1)]^2)' or
             'start<-0' if 'k==1'. Then penalty matrix 'k' has
             'M$S[start+i+M$df[i]*(j-1)' on its ith row and jth column.
             The kth full penalty matrix is constructed by placing the
             matrix just derived into an appropriate matrix of zeroes
             with it's 1,1 element at 'M$off[k],M$off[k]'


        _M$_m The number of penalty terms that are to be applied.

        _M$_o_f_f Offset values locating the elements of 'M$S[]' in the
             correct location within each penalty coefficient matrix.
             indexing starts at zero.

        _M$_f_i_x 'M$fix[i]' is FALSE if the corresponding smoothing
             parameter  is to be estimated, or TRUE if it is to be
             fixed at zero.

        _M$_d_f 'M$df[i]' gives the number of rows and columns of 'M$S[i]'
             that are non-zero.

        _M$_s_i_g_2 This serves 2 purposes. If it is strictly positive then
             UBRE is  used with sig2 taken as the known error variance.
             If it is  non-positive then GCV is used (and on return it
             will contain an estimate of the error variance).  Note
             that it is assumed that the variance of y_i is  given by
             'sig2'/w_i.   

        _M$_s_p An array of initial smoothing parameter estimates if any
             of  these is not strictly positive then automatic
             generation of initial values is used. If 'M$fixed.sp' is
             non-zero then these parameters will used as known
             smoothing parameters.

        _M$_f_i_x_e_d._s_p Non-zero to indicate that all smoothing parameters
             are supplied as known, and are not  to be estimated.

        _M$_c_o_n_v._t_o_l The convergence tolerence to use for the multiple
             smoothing parameter search. 1e-6 is usually ok.

        _M$_m_a_x._h_a_l_f Each step of the GCV minimisation is repeatedly
             halved if it fails to improve the GCV score. This is the
             maximum number of halvings to try at each step. 15 is
             usually enough.

        _M$_m_i_n._e_d_f The minimum estimated degrees of freedom for the
             model: set to less than or equal to 0 to ignore. Supplying
             this quantity helps to maintain numerical stability in
             cases in which the best model is  actually very smooth. 

        _M$_t_a_r_g_e_t._e_d_f Set this to negative for it to be ignored. If it
             is positive then this is taken as the target edf to use
             for the model - if the GCV/UBRE function has multiple
             minima w.r.t. the overall smoothing parameter then the 
             minimum closest to the target edf will be selected. This
             feature is primarily useful when selecting smoothing
             parameters as part  of iterative least squares methods -
             the GCV/UBRE score can develop spurious minima in these
             cases that are  not representative of the problem being
             approximated.

_D_e_t_a_i_l_s:

     This is documentation for the code implementing the method
     described in section  4 of  Wood (2000) . The method is a
     computationally efficient means of applying GCV to  the  problem
     of smoothing parameter selection in generalized ridge regression
     problems  of  the form:

 minimise || W (Xp-y) ||^2 rho +  lambda_1 p'S_1 p + lambda_1 p'S_2 p + . . .

     possibly subject to constraints Cp=0.  X is a design matrix, p a
     parameter vector,  y a data vector, W a diagonal weight matrix,
     S_i a positive semi-definite matrix  of coefficients defining the
     ith penalty and C a matrix of coefficients  defining any linear
     equality constraints on the problem. The smoothing parameters are
     the lambda_i but there is an overall smoothing parameter rho as
     well. Note that X must be of full column rank, at least when
     projected  into the null space of any equality constraints.  

     The method operates by alternating very efficient direct searches
     for  rho with Newton or steepest descent updates of the logs of
     the lambda_i.  Because the GCV/UBRE scores are flat w.r.t. very
     large or very small lambda_i,  it's important to get good starting
     parameters, and to be careful not to step into a flat region of
     the smoothing parameter space. For this reason the algorithm
     rescales any Newton step that  would result in a log(lambda_i)
     change of more than 5. Newton steps are only used if the Hessian
     of the GCV/UBRE is postive definite, otherwise steepest descent is
     used. Similarly steepest  descent is used if the Newton step has
     to be contracted too far (indicating that the quadratic model 
     underlying Newton is poor). All initial steepest descent steps are
     scaled so that their largest component is 1. However a step is
     calculated, it is never expanded if it is successful (to avoid
     flat portions of the objective),  but steps are successively
     halved if they do not decrease the GCV/UBRE score, until they do,
     or the direction is deemed to have  failed. 'M$conv' provides some
     convergence diagnostics.

     The method is coded in 'C' and is intended to be portable. It
     should be  noted that seriously ill conditioned problems (i.e.
     with close to column rank  deficiency in the design matrix) may
     cause problems, especially if weights vary  wildly between
     observations.

_V_a_l_u_e:

     The function returns a list of the same form as the input list
     with the  following elements added/modified: 

     M$p: The best fit parameters given the estimated smoothing
          parameters.

    M$sp: The estimated smoothing parameters  (lambda_i/rho)

  M$sig2: The estimated (GCV) or supplied (UBRE) error variance.

   M$edf: The estimated degrees of freedom associated with each
          penalized term. If p_i is the vector of parameters penalized
          by the ith penalty then we can write p_i = P_i y, and the edf
          for the ith term is tr(XP_i)... while this is sensible in the
          non-overlapping penalty case (e.g. GAMs) it makes alot less
          sense if penalties overlap!

    M$Vp: Estimated covariance matrix of model parameters.

M$gcv.ubre: The minimized GCV/UBRE score.

  M$conv: A list of convergence diagnostics, with the following
          elements:

     _e_d_f Array of whole model estimated degrees of freedom.

     _h_a_t Array of elements from leading diagonal of `hat' matrix
          (`influence' matrix). Same length as 'M$y'.

     _s_c_o_r_e Array of ubre/gcv scores at the edfs for the final set of
          relatinve smoothing parameters.

     _g the gradient of the GCV/UBRE score w.r.t. the smoothing
          parameters at termination.

     _h the second derivatives corresponding to 'g' above - i.e. the
          leading diagonal of the Hessian.

     _e the eigen-values of the Hessian. These should all be
          non-negative!

     _i_t_e_r the number of iterations taken.

     _i_n._o_k 'TRUE' if the second smoothing parameter guess improved the
          GCV/UBRE score. (Please report examples  where this is
          'FALSE')

     _s_t_e_p._f_a_i_l 'TRUE' if the algorithm terminated by failing to improve
          the GCV/UBRE score rather than by "converging".  Not
          necessarily a problem, but check the above derivative
          information quite carefully. 

_W_A_R_N_I_N_G:

     The method may not behave well with near column rank deficient X
     especially in contexts where the weights vary wildly.

_A_u_t_h_o_r(_s):

     Simon N. Wood simon@stats.gla.ac.uk

_R_e_f_e_r_e_n_c_e_s:

     Gu and Wahba (1991) Minimizing GCV/GML scores with multiple
     smoothing parameters via the Newton method. SIAM J. Sci. Statist.
     Comput. 12:383-398

     Wood, S.N. (2000)  Modelling and Smoothing Parameter Estimation
     with Multiple  Quadratic Penalties. J.R.Statist.Soc.B
     62(2):413-428

     Wood, S.N. (2003) Thin plate regression splines. J.R.Statist.Soc.B
     65(1):95-114

     <URL: http://www.stats.gla.ac.uk/~simon/>

_S_e_e _A_l_s_o:

     'gam', 's', 'plot.gam', 'print.gam', 'summary.gam', 'predict.gam',
     'vis.gam', 'residuals.gam', 'gam.models', 'gam.selection',
     'gam.control' 'gam.check' 'pcls', 'mono.con' 'GAMsetup',
     'gam.parser', 'gam.fit', 'gam.setup', 'QT', 'gam.neg.bin',
     'null.space.dimension', 'uniquecombs', 'exclude.too.far'

_E_x_a_m_p_l_e_s:

     set.seed(0)    
     n<-100 # number of observations to simulate
     x <- runif(4 * n, 0, 1) # simulate covariates
     x <- array(x, dim = c(4, n)) # put into array for passing to GAMsetup
     pi <- asin(1) * 2  # begin simulating some data
     y <- 2 * sin(pi * x[1, ])
     y <- y + exp(2 * x[2, ]) - 3.75887
     y <- y + 0.2 * x[3, ]^11 * (10 * (1 - x[3, ]))^6 + 10 * (10 *
             x[3, ])^3 * (1 - x[3, ])^10 - 1.396
     sig2<- -1    # set magnitude of variance
     e <- rnorm(n, 0, sqrt(abs(sig2)))
     y <- y + e          # simulated data
     w <- matrix(1, n, 1) # weight matrix
     par(mfrow = c(2, 2)) # scatter plots of simulated data   
     plot(x[1, ], y);plot(x[2, ], y)
     plot(x[3, ], y);plot(x[4, ], y)
     # create list for passing to GAMsetup
     G <- list(m = 4, n = n, nsdf = 0, df = c(15, 15, 15, 15),dim=c(1,1,1,1),
               s.type=c(0,0,0,0),by=0,by.exists=c(FALSE,FALSE,FALSE,FALSE),
               p.order=c(0,0,0,0), x = x,n.knots=rep(0,4),fit.method="mgcv") 
     H <- GAMsetup(G)
     H$y <- y    # add data to H
     H$sig2 <- sig2  # add variance (signalling GCV use in this case) to H
     H$w <- w       # add weights to H
     H$sp<-array(-1,H$m)
     H$fix<-array(FALSE,H$m)
     H$conv.tol<-1e-6;H$max.half<-15
     H$min.edf<-5;H$fixed.sp<-0
     H <- mgcv(H)  # select smoothing parameters and fit model 

