Book: Learning GNU Emacs, 3rd Edition

11.3.2.3 Context

Another important category of regular expression operators has to do with specifying the context of a string, that is, the text around it. In Chapter 3 we saw the word-search commands, which are invoked as options within incremental search. These are special cases of context specification; in this case, the context is word-separation characters, for example, spaces or punctuation, on both sides of the string.

The simplest context operators for regular expressions are ^ and $, two more basic operators that are used at the beginning and end of regular expressions respectively. The ^ operator causes the rest of the regular expression to match only if it is at the beginning of a line; $ causes the regular expression preceding it to match only if it is at the end of a line. In Example 2, we need a function that matches occurrences of one or more asterisks at the beginning of a line; this will do it:

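(defun remove-outline-marks ()
  "Remove section header marks created in outline-mode."
  (interactive)
  (save-excursion
    (goto-char (point-min))
    ;; One or more literal asterisks at the start of a line.
    (while (re-search-forward "^\\*+" nil t)
      (replace-match ""))))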

This function finds lines that begin with one or more asterisks (the * is a literal asterisk and the + means "one or more"), and it replaces the asterisk(s) with the empty string "", thus deleting them.
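The function above relies on ^ alone. As a companion sketch of $, here is a hypothetical command, my-remove-trailing-blanks (it is not part of Emacs; the built-in delete-trailing-whitespace already does this job), that deletes spaces and tabs only where they end a line. It uses the re-search-forward and replace-match loop, assumed here to be the better fit for Lisp code than replace-regexp, which is intended for interactive use.

```elisp
(defun my-remove-trailing-blanks ()
  "Delete spaces and tabs at the end of every line in the buffer.
A hypothetical example command illustrating the $ operator."
  (interactive)
  (save-excursion
    (goto-char (point-min))
    ;; [ \t]+ followed by $ matches only a run of blanks that ends a line.
    (while (re-search-forward "[ \t]+$" nil t)
      (replace-match ""))))
```

Because $ follows [ \t]+, a blank between two words (as in "c d") is left alone; only blanks immediately before a newline or the end of the buffer are removed.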

Note that ^ and $ can't be used in the middle of regular expressions that are intended to match strings that span more than one line. Instead, you can put \n (for Newline) in your regular expressions to match such strings. Another such character you may want to use is \t for Tab. When ^ and $ are used with regular expression searches on strings instead of buffers, they match beginning- and end-of-string, respectively (and also just after or just before a newline inside the string); the function string-match, described later in this chapter, can be used to do regular expression search on strings.
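A few string-match calls make these rules concrete. This is a sketch meant to be evaluated in any Emacs session (for example with M-: or in the *scratch* buffer); each comment shows the value string-match returns, which is the index where the match starts, or nil when there is no match.

```elisp
;; "\n" in a Lisp string is a literal newline, so this regexp matches
;; text that runs from the end of one line onto the next.
(string-match "one\ntwo" "zero one\ntwo")  ; => 5

;; "\t" is a literal Tab character.
(string-match "a\tb" "xa\tb")              ; => 1

;; On a string, ^ and $ anchor at the start and end of the string...
(string-match "^abc$" "abc")               ; => 0
(string-match "^abc$" "xabc")              ; => nil

;; ...and also right after (or before) a newline inside it.
(string-match "^two" "one\ntwo")           ; => 4
```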

Here is a real-life example of a complex regular expression that covers the operators we have seen so far: sentence-end, a variable Emacs uses to recognize the ends of sentences for sentence motion commands like forward-sentence (M-e). Its value is:

"[.?!][]\"')}]*\\($\\|\t\\|  \\)[ \t\n]*"

Let's look at this piece by piece. The first character set, [.?!], matches a period, question mark, or exclamation mark (the first two of these are regular expression operators, but they have no special meaning within character sets). The next part, []\"')}]*, consists of a character set containing right bracket, double quote (escaped as \" inside the Lisp string), single quote, right parenthesis, and right curly brace. A * follows the set, meaning that zero or more occurrences of any of the characters in the set match. So far, then, this regexp matches a sentence-ending punctuation mark followed by zero or more closing brackets, quotes, parentheses, or curly braces. Next, there is the group \\($\\|\t\\|  \\), which matches any of the three alternatives $ (end of line), Tab, or two spaces. Finally, [ \t\n]* matches zero or more spaces, tabs, or newlines. Thus the sentence-ending characters must be followed by end-of-line, a Tab, or at least two spaces, optionally followed by more spaces, tabs, and newlines.
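To watch the pieces work together, the sketch below copies the value into a constant of its own, so it does not depend on the current setting of sentence-end, which can differ between Emacs versions. The names my-sentence-end-regexp and my-sentence-end-position are hypothetical; the regexp text is the one shown above.

```elisp
;; The sentence-end value from the text, written as a Lisp string.
;; Backslashes are doubled because the Lisp reader consumes one level:
;; "\\(" in the source is \( in the regexp.  "\t" and "\n" are literal
;; Tab and Newline characters.
(defconst my-sentence-end-regexp
  "[.?!][]\"')}]*\\($\\|\t\\|  \\)[ \t\n]*"
  "Hypothetical copy of the classic `sentence-end' regexp.")

(defun my-sentence-end-position (string)
  "Return the index where the first sentence end in STRING starts, or nil."
  (string-match my-sentence-end-regexp string))

(my-sentence-end-position "Done.  Next")    ; => 4   (period, two spaces)
(my-sentence-end-position "e.g. this")      ; => nil (only one space)
(my-sentence-end-position "\"Stop!\"  Go")  ; => 5   (closing quote allowed)
(my-sentence-end-position "The end.")       ; => 7   ($ at end of string)
```

The "e.g." case shows why the two-space alternative matters: a period followed by a single space is not treated as the end of a sentence.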

There are other context operators besides ^ and $; two of them can be used to make regular expression search act like word search. The operators \\< and \\> match the beginning and end of a word, respectively. With these we can go part of the way toward solving Example 3. The regular expression \\<program\\> matches "program" but not "programmer" or "programming" (it also won't match "microprogram"). So far so good; however, it won't match "program's" or "programs". For this, we need a more complex regular expression:

\\<program\\('s\\|s\\)?\\>

This expression means, "the word program, optionally followed by apostrophe s or just s, and then the end of the word." This does the trick as far as matching the right words goes.
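To check which words the final regexp accepts, the sketch below searches a temporary buffer. The constant my-program-regexp and the function my-program-match are hypothetical names. A buffer is used because \\< and \\> consult the current buffer's syntax table to decide what counts as a word character; with-temp-buffer starts in Fundamental mode with the standard syntax table, so results may differ slightly in modes whose syntax tables classify characters differently.

```elisp
(defconst my-program-regexp "\\<program\\('s\\|s\\)?\\>"
  "The regexp from the text: program, optionally followed by 's or s.")

(defun my-program-match (text)
  "Return the first match of `my-program-regexp' in TEXT, or nil."
  ;; \\< and \\> depend on the syntax table of the current buffer, so
  ;; search a temporary buffer that uses the standard syntax table.
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (when (re-search-forward my-program-regexp nil t)
      (match-string-no-properties 0))))

(my-program-match "a program here")      ; => "program"
(my-program-match "the program's name")  ; => "program's"
(my-program-match "two programs.")       ; => "programs"
(my-program-match "programmer")          ; => nil
(my-program-match "microprogram")        ; => nil
```

Since ? is greedy, the group is tried before the empty alternative, which is why "program's" is matched in full rather than stopping at "program".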
