grep                  package:base                  R Documentation

_P_a_t_t_e_r_n _M_a_t_c_h_i_n_g _a_n_d _R_e_p_l_a_c_e_m_e_n_t

_D_e_s_c_r_i_p_t_i_o_n:

     `grep' searches for matches to `pattern' (its first argument)
     within the vector `x' of character strings (second argument).
     `regexpr' does too, but instead of indices it returns, for each
     string, the position and length of the first match.
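
     A small sketch of the difference in return values, using
     hypothetical data: `grep' returns the indices of the matching
     elements, while `regexpr' returns one start position per element
     (-1 where there is no match) and stores the match lengths in an
     attribute.

```r
     x <- c("apple", "banana", "cherry")   # hypothetical data

     # grep: indices of the elements of x that contain a match
     idx <- grep("an", x)
     idx                            # 2

     # regexpr: one value per element, the start of the first match
     # or -1 if there is none; lengths are in "match.length"
     m <- regexpr("an", x)
     as.vector(m)                   # -1 2 -1
     attr(m, "match.length")        # -1 2 -1
```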

     `sub' and `gsub' perform replacement of matches determined by
     regular expression matching.

_U_s_a_g_e:

     grep(pattern, x, ignore.case=FALSE, extended=TRUE, value=FALSE)
     sub(pattern, replacement, x,
             ignore.case=FALSE, extended=TRUE)
     gsub(pattern, replacement, x,
             ignore.case=FALSE, extended=TRUE)
     regexpr(pattern, text, extended=TRUE)

_A_r_g_u_m_e_n_t_s:

 pattern: character string containing a regular expression to be
          matched in the character vector `x' (`text' for `regexpr').

 x, text: a vector of character strings where matches are sought.

ignore.case: if `FALSE', the pattern matching is case sensitive and if
          `TRUE', case is ignored during matching.

extended: if `TRUE', extended regular expression matching is used, and
          if `FALSE', basic regular expressions are used.

   value: if `FALSE', a vector containing the (`integer') indices of
          the matches determined by `grep' is returned, and if `TRUE',
          a vector containing the matching elements themselves is
          returned.

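replacement: a replacement for the matched pattern in `sub' and `gsub'.
          Backreferences `"\\1"' to `"\\9"' refer to parenthesized
          subexpressions of `pattern' (the backslash must be doubled
          inside an R string).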
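
     A minimal sketch of the `ignore.case' and `value' arguments, using
     a hypothetical vector `fruit'. Matching is case sensitive by
     default, so `"Apple"' matches `"ap"' only when `ignore.case = TRUE'.

```r
     fruit <- c("Apple", "banana", "apricot")   # hypothetical data

     grep("ap", fruit)                       # 3: case sensitive
     grep("ap", fruit, ignore.case = TRUE)   # 1 3
     grep("ap", fruit, value = TRUE)         # "apricot"
     grep("ap", fruit, ignore.case = TRUE, value = TRUE)  # "Apple" "apricot"
```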

_D_e_t_a_i_l_s:

     The two `*sub' functions differ only in that `sub' replaces only
     the first occurrence of `pattern' in each element of `x', whereas
     `gsub' replaces all occurrences.
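
     To illustrate that the difference applies within each string
     rather than across the whole vector (hypothetical data): `sub'
     replaces the first match in every element, `gsub' every match in
     every element.

```r
     x <- c("aaa", "banana", "xyz")   # hypothetical data

     sub("a", "A", x)    # "Aaa" "bAnana" "xyz"  (first match per element)
     gsub("a", "A", x)   # "AAA" "bAnAnA" "xyz"  (all matches per element)
```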

     The regular expressions used are those specified by POSIX 1003.2,
     either extended or basic, depending on the value of the `extended'
     argument.
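
     A sketch of a few extended-syntax operators, assuming the default
     `extended = TRUE' (the calls leave `extended' unset). In POSIX
     basic syntax, unescaped `?', `|', `(' and `{' are ordinary
     characters, so these patterns would not behave the same way with
     `extended = FALSE'.

```r
     x <- c("color", "colour", "colouur", "col")   # hypothetical data

     grep("colou?r", x)         # 1 2: `?' makes the `u' optional
     grep("colou{2}r", x)       # 3:   interval, exactly two `u's
     grep("^(color|col)$", x)   # 1 4: alternation, grouping, anchors
```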

_V_a_l_u_e:

     For `grep' a vector giving either the indices of the elements of
     `x' that yielded a match or, if `value' is `TRUE', the matched
     elements.

     For `sub' and `gsub' a character vector of the same length as
     `x', in which elements without a match are returned unchanged.
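
     A sketch contrasting the two kinds of result (hypothetical data):
     `grep' returns only as many indices as there are matches, whereas
     `sub' returns one string per element of `x'.

```r
     x <- c("cat", "dog", "catalog")   # hypothetical data

     grep("cat", x)          # 1 3: only the matching indices
     sub("cat", "CAT", x)    # "CAT" "dog" "CATalog": "dog" is unchanged
```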

     For `regexpr' an integer vector of the same length as `text'
     giving the starting position of the first match, or -1 if there is
     none, with attribute `"match.length"' giving the length of the
     matched text (or -1 for no match).
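
     The start positions and the `"match.length"' attribute can be
     combined with `substring' to extract the matched text. A sketch
     with hypothetical data; for a non-match both values are -1, and
     `substring' then yields an empty string.

```r
     txt <- c("foot", "arm", "bafoobar")   # hypothetical data
     m   <- regexpr("o+", txt)              # first run of one or more `o's
     len <- attr(m, "match.length")

     # characters m through m + len - 1 of each element
     substring(txt, m, m + len - 1)         # "oo" "" "oo"
```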

_S_e_e _A_l_s_o:

     `charmatch', `pmatch', `match'. `apropos' uses regular expressions,
     and its help page has further examples.

_E_x_a_m_p_l_e_s:

     grep("[a-z]", letters)

     txt <- c("arm","foot","lefroo", "bafoobar")
     if(length(i <- grep("foo", txt)))
        cat("`foo' appears at least once in\n\t", txt, "\n")
     i # 2 and 4
     txt[i]

     ## Double all 'a's and 'b's;  "\" must be escaped, i.e. `doubled'
     gsub("([ab])", "\\1_\\1_", "abc and ABC")

     txt <- c("The", "licenses", "for", "most", "software", "are",
       "designed", "to", "take", "away", "your", "freedom",
       "to", "share", "and", "change", "it.",
       "", "By", "contrast,", "the", "GNU", "General", "Public", "License",
       "is", "intended", "to", "guarantee", "your", "freedom", "to",
       "share", "and", "change", "free", "software", "--",
       "to", "make", "sure", "the", "software", "is",
       "free", "for", "all", "its", "users")
     ( i <- grep("[gu]", txt) ) # indices
     stopifnot( txt[i] == grep("[gu]", txt, value = TRUE) )
     (ot <- sub("[b-e]",".", txt))
     txt[ot != gsub("[b-e]",".", txt)]#- gsub does "global" substitution

     txt[gsub("g","#", txt) !=
         gsub("g","#", txt, ignore.case = TRUE)] # the "G" words

     regexpr("en", txt)

