grep                  package:base                  R Documentation

_P_a_t_t_e_r_n _M_a_t_c_h_i_n_g _a_n_d _R_e_p_l_a_c_e_m_e_n_t

_D_e_s_c_r_i_p_t_i_o_n:

     'grep' searches for matches to 'pattern' (its first argument)
     within the character vector 'x' (second argument).  'regexpr'
     does so too, but for each element it reports the position and
     length of the first match rather than which elements matched.

     'sub' and 'gsub' perform replacement of matches determined by
     regular expression matching.

_U_s_a_g_e:

     grep(pattern, x, ignore.case = FALSE, extended = TRUE, perl = FALSE,
          value = FALSE, fixed = FALSE)
     sub(pattern, replacement, x,
             ignore.case = FALSE, extended = TRUE, perl = FALSE)
     gsub(pattern, replacement, x,
             ignore.case = FALSE, extended = TRUE, perl = FALSE)
     regexpr(pattern, text, extended = TRUE, perl = FALSE, fixed = FALSE)

_A_r_g_u_m_e_n_t_s:

 pattern: character string containing a regular expression (or
          character string for 'fixed = TRUE') to be matched in the
          given character vector.

 x, text: a character vector where matches are sought.

ignore.case: if 'FALSE', the pattern matching is _case sensitive_ and
          if 'TRUE', case is ignored during matching.

extended: if 'TRUE', extended regular expression matching is used, and
          if 'FALSE' basic regular expressions are used.

    perl: logical.  Should Perl-compatible regular expressions be used
          if available?  Takes priority over 'extended'.

   value: if 'FALSE', a vector containing the ('integer') indices of
          the matches determined by 'grep' is returned, and if 'TRUE',
          a vector containing the matching elements themselves is
          returned.

   fixed: logical.  If 'TRUE', 'pattern' is a string to be matched as
          is.  Overrides all conflicting arguments.

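replacement: the replacement for the matched pattern in 'sub' and
          'gsub'.  It may contain backreferences such as '\1' to a
          parenthesized subexpression of 'pattern' (see the Examples).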
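
     As a rough illustration of how 'fixed' and 'ignore.case' change
     what counts as a match, here is a minimal sketch.  The vector
     'files' is hypothetical sample data, not part of this help page.

```r
## Hypothetical sample data (an assumption for illustration only)
files <- c("a.txt", "a_txt", "A.TXT")

## As a regular expression, "." matches any single character
grep("a.txt", files)                      # 1 2

## With fixed = TRUE the pattern is taken literally, so "." matches
## only a dot
grep("a.txt", files, fixed = TRUE)        # 1

## ignore.case = TRUE additionally matches the upper-case element
grep("a.txt", files, ignore.case = TRUE)  # 1 2 3
```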

_D_e_t_a_i_l_s:

     The two '*sub' functions differ only in that 'sub' replaces only
     the first occurrence of a 'pattern' whereas 'gsub' replaces all
     occurrences.
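
     The difference can be seen on a single string; this is only a
     small sketch, and the input 'x' is a made-up value.

```r
x <- "banana"          # hypothetical input

sub("an", "AN", x)     # "bANana": only the first match is replaced
gsub("an", "AN", x)    # "bANANa": every match is replaced
```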

     For 'regexpr' it is an error for 'pattern' to be 'NA'.  In the
     other functions an 'NA' pattern is permitted and matches only
     'NA' itself.

     The regular expressions used are those specified by POSIX 1003.2,
     either extended or basic, depending on the value of the 'extended'
     argument, unless 'perl = TRUE' when they are those of PCRE, <URL:
     ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/>.

_V_a_l_u_e:

     For 'grep' a vector giving either the indices of the elements of
     'x' that yielded a match or, if 'value' is 'TRUE', the matched
     elements.
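
     A short sketch of the two return forms, including the case where
     nothing matches.  The vector 'fruit' is hypothetical.

```r
fruit <- c("apple", "banana", "cherry")   # hypothetical data

grep("an", fruit)                  # 2
grep("an", fruit, value = TRUE)    # "banana"

## Indexing with the indices gives the same elements as value = TRUE
fruit[grep("an", fruit)]           # "banana"

## With no match the result has length zero
grep("z", fruit)                   # integer(0)
grep("z", fruit, value = TRUE)     # character(0)
```

     Because a failed search returns a zero-length vector, testing
     'length(grep(...))' is a simple way to ask whether anything
     matched.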

     For 'sub' and 'gsub' a character vector of the same length as
     'x'.

     For 'regexpr' an integer vector of the same length as 'text'
     giving the starting position of the first match, or -1 if there is
     none, with attribute '"match.length"' giving the length of the
     matched text (or -1 for no match).
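
     One way to use this result is to pull out the matched text with
     'substring'.  The following is a sketch under that assumption;
     'words', 'start', 'len', 'hit' and 'matched' are illustrative
     names only.

```r
words <- c("intended", "freedom", "license")   # hypothetical data

m     <- regexpr("en", words)
start <- as.integer(m)              # 4 -1 4  (start of first match)
len   <- attr(m, "match.length")    # 2 -1 2  (length of that match)

## Keep only the elements that matched, then cut out the matched text
hit     <- start > 0
matched <- substring(words[hit], start[hit], start[hit] + len[hit] - 1)
matched                             # "en" "en"
```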

_R_e_f_e_r_e_n_c_e_s:

     Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) _The New S
     Language_. Wadsworth & Brooks/Cole ('grep')

_S_e_e _A_l_s_o:

     'agrep' for approximate matching.

     'tolower', 'toupper' and 'chartr' for character translations.
     'charmatch', 'pmatch', 'match'. 'apropos' uses regexps and has
     nice examples.

_E_x_a_m_p_l_e_s:

     grep("[a-z]", letters)

     txt <- c("arm", "foot", "lefroo", "bafoobar")
     ## grep() returns indices, so test the length of the result
     if(length(i <- grep("foo", txt)))
        cat("'foo' appears at least once in\n\t", txt, "\n")
     i # 2 and 4
     txt[i]

     ## Double every 'a' or 'b', adding '_' after each copy; the "\" of
     ## a backreference must be escaped, i.e., 'doubled'
     gsub("([ab])", "\\1_\\1_", "abc and ABC")

     txt <- c("The", "licenses", "for", "most", "software", "are",
       "designed", "to", "take", "away", "your", "freedom",
       "to", "share", "and", "change", "it.",
        "", "By", "contrast,", "the", "GNU", "General", "Public", "License",
        "is", "intended", "to", "guarantee", "your", "freedom", "to",
        "share", "and", "change", "free", "software", "--",
        "to", "make", "sure", "the", "software", "is",
        "free", "for", "all", "its", "users")
     ( i <- grep("[gu]", txt) ) # indices
     stopifnot( txt[i] == grep("[gu]", txt, value = TRUE) )
     (ot <- sub("[b-e]", ".", txt))
     txt[ot != gsub("[b-e]", ".", txt)] # gsub does "global" substitution

     txt[gsub("g","#", txt) !=
         gsub("g","#", txt, ignore.case = TRUE)] # the "G" words

     regexpr("en", txt)

     ## trim trailing white space
     str <- 'Now is the time      '
     sub(' +$', '', str)  ## spaces only
     sub('[[:space:]]+$', '', str) ## white space, POSIX-style
     if(capabilities("PCRE"))
       sub('\\s+$', '', str, perl = TRUE) ## perl-style white space

