In making global replacements, UNIX editors such as vi allow you to search not just for fixed strings of characters, but also for variable patterns of words, referred to as regular expressions.
When you specify a literal string of characters, the search might turn up other occurrences that you didn't want to match. The problem with searching for words in a file is that a word can be used in different ways. Regular expressions help you conduct a search for words in context. Note that regular expressions can be used with the vi search commands / and ? as well as in the ex :g and :s commands.
For the most part, the same regular expressions work with other UNIX programs such as grep, sed, and awk.[19]
[19]Much more information on regular expressions can be found in the two O'Reilly books sed & awk, by Dale Dougherty and Arnold Robbins, and Mastering Regular Expressions, by Jeffrey E.F. Friedl.
Regular expressions are made up by combining normal characters with a number of special characters called metacharacters.[20] The metacharacters and their uses are listed below.
[20]Technically speaking, we should probably call these metasequences, since sometimes two characters together have special meaning, and not just single characters. Nevertheless, the term metacharacters is in common use in UNIX literature, so we follow that convention here.
The * can follow a metacharacter. For example, since . (dot) means any character, .* means "match any number of any character."
Here's a specific example of this. The command :s/End.*/End/ removes all characters after End (it replaces the remainder of the line with nothing).
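If you want to try the effect of .* outside the editor, here is a minimal sketch in Python. It assumes Python's re module, which is a different regular expression dialect from vi's but treats . and * the same way in this pattern. The sample line is made up for illustration.

```python
import re

# Hypothetical sample line; everything from "End" onward should be replaced.
line = "The End of the chapter is near"

# Counterpart of :s/End.*/End/
# count=1 mirrors :s without the g flag (only the first match on the line).
result = re.sub(r"End.*", "End", line, count=1)
print(result)  # The End
```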
You can include more than one range inside brackets, and you can specify a mix of ranges and separate characters. For example, [:;A-Za-z()] will match four different punctuation marks, plus all letters.
Most metacharacters lose their special meaning inside brackets, so you don't need to escape them if you want to use them as ordinary characters. Within brackets, the three metacharacters you still need to escape are \ - ]. The hyphen (-) acquires meaning as a range specifier; to use an actual hyphen, you can also place it as the first character inside the brackets.
A caret (^) has special meaning only when it is the first character inside the brackets, but in this case the meaning differs from that of the normal ^ metacharacter. As the first character within brackets, a ^ reverses their sense: the brackets will match any one character not in the list. For example, [^a-z] matches any character that is not a lowercase letter.
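The bracket rules above can be checked with a short experiment. This sketch uses Python's re module as a stand-in, on the assumption that it handles these particular bracket lists the same way vi does. The test string is invented.

```python
import re

# Two ranges plus four separate punctuation characters, as in [:;A-Za-z()].
mixed = re.compile(r"[:;A-Za-z()]")
# A leading ^ reverses the sense: any one character that is NOT a lowercase letter.
not_lower = re.compile(r"[^a-z]")
# A hyphen placed first is an ordinary character, not a range specifier.
hyphen_or_digit = re.compile(r"[-0-9]")

sample = "a-1:(Z)"  # made-up test string

mixed_hits = mixed.findall(sample)
not_lower_hits = not_lower.findall(sample)
hyphen_hits = hyphen_or_digit.findall(sample)

print(mixed_hits)      # ['a', ':', '(', 'Z', ')']
print(not_lower_hits)  # ['-', '1', ':', '(', 'Z', ')']
print(hyphen_hits)     # ['-', '1']
```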
\(That\) or \(this\)
saves That in hold buffer number 1 and saves this in hold buffer number 2. The patterns held can be "replayed" in substitutions by the sequences \1 to \9. For example, to rephrase That or this to read this or That, you could enter:
:%s/\(That\) or \(this\)/\2 or \1/
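The same swap can be sketched in Python for experimentation. Note the assumption: Python's re module writes groups as plain ( and ) where vi writes \( and \), so the pattern below is a translation, not the vi syntax itself.

```python
import re

text = "That or this"

# (That) is saved as group 1 and (this) as group 2;
# \2 and \1 in the replacement replay them in reverse order.
swapped = re.sub(r"(That) or (this)", r"\2 or \1", text)
print(swapped)  # this or That
```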
You can also use the \n notation within a search or substitute string:
:s/\(abcd\)\1/alphabet-soup/
changes abcdabcd into alphabet-soup.[21]
[21]This works with vi, nvi, and vim, but not with elvis 2.0, vile 7.4, or vile 8.0.
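A back-reference inside the search pattern requires the saved text to appear again at that point. The sketch below illustrates this with Python's re module (an assumption: Python uses ( ) for grouping rather than \( \), but \1 behaves the same way inside the pattern).

```python
import re

pattern = r"(abcd)\1"  # "abcd" immediately followed by the same text again

doubled = re.sub(pattern, "alphabet-soup", "abcdabcd")
single = re.sub(pattern, "alphabet-soup", "abcd")

print(doubled)  # alphabet-soup
print(single)   # abcd  (no match, so the text is unchanged)
```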
[22]This is a rather flaky feature of the original vi. After using it, the saved search pattern is set to the new text typed after the ~, not the combined new pattern, as one might expect. Also, none of the clones behaves this way. So, while this feature exists, it has little to recommend its use.
Several of the clones support optional, extended regular expression syntaxes. See Section 8.4 for more information.
We have just described the use of brackets for matching any one of the enclosed characters, such as [a-z]. The POSIX standard introduced additional facilities for matching characters that are not in the English alphabet. For example, the French è is an alphabetic character, but the typical character class [a-z] would not match it. Additionally, the standard provides for sequences of characters that should be treated as a single unit when matching and collating (sorting) string data.
POSIX also formalizes the terminology. Groups of characters within brackets are called a "bracket expression" in the POSIX standard. Within bracket expressions, besides literal characters such as a, !, and so on, you can have additional components. These are:
Character classes. A POSIX character class consists of keywords bracketed by [: and :]. The keywords describe different classes of characters such as alphabetic characters, control characters, and so on (see Table 6.1).
Collating symbols. A collating symbol is a multi-character sequence that should be treated as a unit. It consists of the characters bracketed by [. and .].
Equivalence classes. An equivalence class lists a set of characters that should be considered equivalent, such as e and è. It consists of a named element from the locale, bracketed by [= and =].
All three of these constructs must appear inside the square brackets of a bracket expression. For example, [[:alpha:]!] matches any single alphabetic character or the exclamation point, and [[.ch.]] matches the collating element ch but does not match just the letter c or the letter h. In a French locale, [[=e=]] might match any of e, è, or é. Classes and matching characters are shown in Table 6.1.
Class | Matching Characters |
---|---|
[:alnum:] | Alphanumeric characters |
[:alpha:] | Alphabetic characters |
[:blank:] | Space and tab characters |
[:cntrl:] | Control characters |
[:digit:] | Numeric characters |
[:graph:] | Printable and visible (non-space) characters |
[:lower:] | Lowercase characters |
[:print:] | Printable characters (includes whitespace) |
[:punct:] | Punctuation characters |
[:space:] | Whitespace characters |
[:upper:] | Uppercase characters |
[:xdigit:] | Hexadecimal digits |
You will have to do some research to determine if you have this facility in your version of vi. You may need to use a special option to enable POSIX compliance, have a particular environment variable set, or use a version of vi that is in an unusual directory.
vi on HP-UX 9.x (and newer) systems supports POSIX bracket expressions, as does /usr/xpg4/bin/vi on Solaris (but not /usr/bin/vi). This facility is also available in nvi and in elvis 2.1. As commercial UNIX vendors become standards-compliant, expect to see this feature become more widespread.
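To see why [a-z] falls short for non-English text, consider the following sketch. It is only an analogy: Python's re module does not implement POSIX bracket classes such as [[:alpha:]], so the example uses Python's own Unicode-aware tests in their place.

```python
import re

e_grave = "\u00e8"  # the French letter e with a grave accent

# The ASCII range [a-z] does not include accented letters.
in_ascii_range = re.fullmatch(r"[a-z]", e_grave) is not None

# str.isalpha() consults Unicode character properties, which is roughly
# the job [[:alpha:]] does in a suitable locale.
is_alphabetic = e_grave.isalpha()

# \w (letters, digits, underscore) is the nearest built-in regex class;
# it is closer to [[:alnum:]_] than to [[:alpha:]].
is_word_char = re.fullmatch(r"\w", e_grave) is not None

print(in_ascii_range, is_alphabetic, is_word_char)  # False True True
```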
When you make global replacements, the regular expressions above carry their special meaning only within the search portion (the first part) of the command.
For example, when you type this:
:%s/1\. Start/2. Next, start with $100/
note that the replacement string treats the characters . and $ literally, without your having to escape them. By the same token, let's say you enter:
:%s/[ABC]/[abc]/g
If you're hoping to replace A with a, B with b, and C with c, you'll be surprised. Since brackets behave like ordinary characters in a replacement string, this command will change every occurrence of A, B, or C to the five-character string [abc].
To solve problems like this, you need a way to specify variable replacement strings. Fortunately, there are additional metacharacters that have special meaning in a replacement string.
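The [ABC] surprise is easy to reproduce. In this sketch (Python's re module, used here as an assumption-laden stand-in for the editor), the replacement string is literal text too. The second call shows one Python-specific way to make the replacement depend on what was matched; it is not vi syntax.

```python
import re

text = "CAB"

# Brackets are ordinary characters in the replacement:
# each matched letter becomes the five-character string "[abc]".
literal = re.sub(r"[ABC]", "[abc]", text)

# A variable replacement: in Python, a function computes the new text
# from each match (here, the matched letter in lowercase).
variable = re.sub(r"[ABC]", lambda m: m.group(0).lower(), text)

print(literal)   # [abc][abc][abc]
print(variable)  # cab
```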
:%s/Yazstremski/&, Carl/
The replacement will say Yazstremski, Carl. The & can also replace a variable pattern (as specified by a regular expression). For example, to surround each line from 1 to 10 with parentheses, type:
:1,10s/.*/(&)/
The search pattern matches the whole line, and the & "replays" the line between the parentheses you typed.
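For comparison, here is a sketch of both & examples in Python. One assumption to keep in mind: Python's re module has no & in replacement strings, so the equivalent notation \g<0> ("the entire match") is used in its place. The sample line is hypothetical.

```python
import re

# Counterpart of :%s/Yazstremski/&, Carl/
name = re.sub(r"Yazstremski", r"\g<0>, Carl", "Yazstremski")

# Counterpart of :1,10s/.*/(&)/ applied to one line.
# count=1 mirrors :s without the g flag (one substitution per line).
wrapped = re.sub(r".*", r"(\g<0>)", "some line of text", count=1)

print(name)     # Yazstremski, Carl
print(wrapped)  # (some line of text)
```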
[23]Modern versions of the ed editor use % as the sole character in the replacement text to mean "the replacement text of the last substitute command."
:%s/yes, doctor/\uyes, \udoctor/
This is a pointless example, though, since it's easier just to type the replacement string with initial caps in the first place. As with any regular expression, \u and \l are most useful with a variable string. Take, for example, the command we used earlier:
:%s/\(That\) or \(this\)/\2 or \1/
The result is this or That, but we need to adjust the cases. We'll use \u to uppercase the first letter in this (currently saved in hold buffer 2); we'll use \l to lowercase the first letter in That (currently saved in hold buffer 1):
:s/\(That\) or \(this\)/\u\2 or \l\1/
The result is This or that. (Don't confuse the number one with the lowercase l; the one comes after.)
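Python's re module has no \u or \l in replacement strings, so the sketch below imitates them with two small helper functions. The helper names are ours, chosen for illustration; only the first character of each saved string is changed, as with \u and \l.

```python
import re

def upper_first(s):
    """Imitate \\u: uppercase only the first character."""
    return s[:1].upper() + s[1:]

def lower_first(s):
    """Imitate \\l: lowercase only the first character."""
    return s[:1].lower() + s[1:]

def swap_and_recase(m):
    # Counterpart of the replacement \u\2 or \l\1
    return upper_first(m.group(2)) + " or " + lower_first(m.group(1))

result = re.sub(r"(That) or (this)", swap_and_recase, "That or this")
print(result)  # This or that
```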
:%s/Fortran/\UFortran/
or, using the & character to repeat the search string:
:%s/Fortran/\U&/
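The effect of \U& can be approximated in Python by uppercasing the whole match in a replacement function. This is a sketch under the assumption that Python's re module stands in for the editor; the sample sentence is invented.

```python
import re

line = "I learned Fortran first"

# Counterpart of :s/Fortran/\U&/ : the entire match is uppercased.
shouted = re.sub(r"Fortran", lambda m: m.group(0).upper(), line)
print(shouted)  # I learned FORTRAN first
```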
All pattern searches are case-sensitive. That is, a search for the will not find The. You can get around this by specifying both uppercase and lowercase in the pattern:
/[Tt]he
You can also instruct vi to ignore case by typing :set ic. See Chapter 7 for additional details.
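The three approaches to case can be compared side by side. In this sketch, Python's re.IGNORECASE flag plays the role of :set ic (an analogy, not the same mechanism), and the sample lines are made up.

```python
import re

lines = ["The cat", "the dog", "THE END"]

# A plain pattern is case-sensitive.
sensitive = [s for s in lines if re.search(r"the", s)]
# Naming both cases of the first letter finds The and the.
either = [s for s in lines if re.search(r"[Tt]he", s)]
# Ignoring case entirely also finds THE.
ignoring = [s for s in lines if re.search(r"the", s, re.IGNORECASE)]

print(sensitive)  # ['the dog']
print(either)     # ['The cat', 'the dog']
print(ignoring)   # ['The cat', 'the dog', 'THE END']
```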
You should know some additional important facts about the substitute command:
A simple :s is the same as :s//~/. In other words, repeat the last substitution. This can save enormous amounts of time and typing when you are working your way through a document making the same change repeatedly, but you don't want to use a global substitution.
If you think of the & as meaning "the same thing" (as in what was just matched), this command is relatively mnemonic. You can follow the & with a g, to make the substitution globally on the line, and even use it with a line range:
:%&g          Repeat the last substitution everywhere
The & key can be used as a vi command to perform the :& command, i.e., to repeat the last substitution. This can save even more typing than :sRETURN; one keystroke versus three.
The :~ command is similar to the :& command, but with a subtle difference. The search pattern used is the last regular expression used in any command, not necessarily the one used in the last substitute command.
For example,[24] in the sequence:
[24]Thanks to Keith Bostic, in the nvi documentation, for this example.
:s/red/blue/
:/green
:~
the :~ is equivalent to :s/green/blue/.
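The difference between :& and :~ comes down to which remembered pattern is reused. The following toy model makes that explicit. It is only an illustration: the class and method names are hypothetical, and it does not reflect how any editor is actually implemented.

```python
import re

class ExState:
    """Toy model of the patterns the editor remembers (illustration only)."""

    def __init__(self):
        self.last_regex = None        # last regular expression used in any command
        self.last_sub_regex = None    # search pattern of the last :s command
        self.last_replacement = None  # replacement text of the last :s command

    def substitute(self, line, pattern, replacement):
        """Model :s/pattern/replacement/ on one line."""
        self.last_regex = self.last_sub_regex = pattern
        self.last_replacement = replacement
        return re.sub(pattern, replacement, line, count=1)

    def search(self, pattern):
        """Model /pattern: only the last-used regex changes."""
        self.last_regex = pattern

    def repeat_amp(self, line):
        """Model :& : reuse the pattern of the last substitute."""
        return re.sub(self.last_sub_regex, self.last_replacement, line, count=1)

    def repeat_tilde(self, line):
        """Model :~ : reuse the last regex from any command."""
        return re.sub(self.last_regex, self.last_replacement, line, count=1)

state = ExState()
state.substitute("red paint", "red", "blue")  # :s/red/blue/
state.search("green")                         # :/green

line = "red and green"
amp_result = state.repeat_amp(line)      # behaves like :s/red/blue/
tilde_result = state.repeat_tilde(line)  # behaves like :s/green/blue/
print(amp_result)    # blue and green
print(tilde_result)  # red and blue
```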
Besides the / character, you may use any non-alphanumeric, non-whitespace character as your delimiter, except backslash, double-quote, and the vertical bar (\, ", and |). This is particularly handy when you have to make a change to a pathname.
:%s;/user1/tim;/home/tim;g
When the edcompatible option is enabled, vi remembers the flags (g for global and c for confirmation) used on the last substitute, and applies them to the next one.
This is most useful when you are moving through a file and you wish to make global substitutions. You can make the first change:
:s/old/new/g
:set edcompatible
After that, subsequent substitute commands will be global.
Despite the name, no known version of UNIX ed actually works this way.
Copyright © 2003 O'Reilly & Associates. All rights reserved.