In the previous lesson, we introduced regular expressions, a programming tool used to match patterns in a string. They are built into many programming languages, and JavaScript is one of them.
The previous lesson focused on how to match a pattern using the built-in methods match() and matchAll(), as well as the different matching modes that can be activated by providing the right flag. In this lesson, we are going to cover different ways to describe a pattern using a regular expression.
Matching a set of characters
In a regular expression, you can use square brackets to match any one character from a set instead of one specific character. For example,
let regex = /[01234][56789][abc]/g;

let str1 = "18b"; // matched
let str2 = "98b"; // null, because 9 is outside of [01234]
let str3 = "18z"; // null, because z is outside of [abc]

console.log(str1.match(regex));
console.log(str2.match(regex));
console.log(str3.match(regex));

[ '18b' ]
null
null
This regular expression defines the following pattern:
A digit between 0 and 4, followed by a digit between 5 and 9, followed by a letter between a and c.
As a result, str1 falls into this range, but the other two don't.
There is an easier way to define a range of characters: use a hyphen (-).
let regex = /[0-4][5-9][a-c]/g;

let str1 = "18b";
let str2 = "98b";
let str3 = "18z";

console.log(str1.match(regex));
console.log(str2.match(regex));
console.log(str3.match(regex));

[ '18b' ]
null
null
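Several ranges can also share one pair of square brackets. The sketch below is only an illustration: the names hexDigit and sample are made up for this example, and the pattern matches any single hexadecimal digit.

```javascript
// One bracket pair holding three ranges: 0-9, a-f and A-F.
const hexDigit = /[0-9a-fA-F]/g;

// Hypothetical sample text, chosen for illustration.
const sample = "3f G7 zB";

// G and z are outside every range, so they are skipped.
const hexMatches = sample.match(hexDigit);
console.log(hexMatches); // [ '3', 'f', '7', 'B' ]
```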
Shortcuts are provided for some commonly used character sets. We've seen an example of this, the \d character.
\d
\d matches any numeric digit, which is the same as defining /[0-9]/.
let regex = /\d\d\d/g;

let str1 = "123";
let str2 = "54321";
let str3 = "1a2b3c";

console.log(str1.match(regex));
console.log(str2.match(regex));
console.log(str3.match(regex));

[ '123' ]
[ '543' ]
null
Pay attention to str2. When a match is found, JavaScript continues searching right after the last character of that match, in this case the digit 3. Matches never overlap, so 432 and 321 are not considered, and the remaining 21 is too short to match.
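To see where each match starts, you can combine matchAll() from the previous lesson with the index property of each result. This is a small sketch with a made-up input string (digits) and a made-up variable name (found).

```javascript
// Hypothetical input, chosen for illustration.
const digits = "54321987";

// matchAll() returns an iterator; spread it into an array and keep
// the matched text together with the position where it starts.
const found = [...digits.matchAll(/\d\d\d/g)].map((m) => ({
  text: m[0],
  index: m.index,
}));

// The second match starts at index 3, right after the first one ends.
console.log(found); // [ { text: '543', index: 0 }, { text: '219', index: 3 } ]
```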
\w
\w matches a word character: the 26 letters of the Latin alphabet in both upper and lower case, the numeric digits, and the underscore. It is the same as defining /[A-Za-z0-9_]/.
let regex = /\w\w\w/g;

let str1 = "12345";
let str2 = "1a2b3c";
let str3 = "1a_2b_3c";

console.log(str1.match(regex));
console.log(str2.match(regex));
console.log(str3.match(regex));

[ '123' ]
[ '1a2', 'b3c' ]
[ '1a_', '2b_' ]
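One consequence worth checking for yourself is that \w only covers unaccented Latin letters. The sketch below compares \w with an explicit character set on a made-up sample string; the accented letter is written as the escape \u00e9 so the example does not depend on file encoding.

```javascript
const shortcut = /\w/g;
const explicit = /[A-Za-z0-9_]/g;

// Hypothetical sample: letters, underscore, digit, then an accented e,
// a hyphen and an exclamation mark.
const wordSample = "Ab_9 \u00e9-!";

const viaShortcut = wordSample.match(shortcut);
const viaExplicit = wordSample.match(explicit);

console.log(viaShortcut); // [ 'A', 'b', '_', '9' ]
console.log(viaExplicit); // [ 'A', 'b', '_', '9' ]
```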
\s
\s matches any white space character, including space, newline, tab, and so on.
let regex = /\s/g;

let str1 = "12 34 56";
let str2 = "12\n34\t56";
let str3 = "";

console.log(str1.match(regex));
console.log(str2.match(regex));
console.log(str3.match(regex));

[ ' ', ' ' ]
[ '\n', '\t' ]
null
The dot character (.)
The dot character matches any character except a newline character.
let regex = /.../g;

let str1 = "123\t\t\t";
let str2 = "abc   "; // abc followed by three spaces
let str3 = "\n\n\n";

console.log(str1.match(regex));
console.log(str2.match(regex));
console.log(str3.match(regex));

[ '123', '\t\t\t' ]
[ 'abc', '   ' ]
null
As we've mentioned before, by enabling the s mode (also known as the dotall mode), you can make . match the newline character as well.
let regex = /.../gs;

let str1 = "123\t\t\t";
let str2 = "abc   "; // abc followed by three spaces
let str3 = "\n\n\n";

console.log(str1.match(regex));
console.log(str2.match(regex));
console.log(str3.match(regex));

[ '123', '\t\t\t' ]
[ 'abc', '   ' ]
[ '\n\n\n' ]
Excluding a set of characters
Besides matching a specified set of characters, you can also define a regular expression that matches any character other than the specified ones, by placing a caret (^) right after the opening square bracket.
let regex = /[^1-9][^a-e]/g;

let str1 = "0b"; // null, because b falls in the range of a-e
let str2 = "0z"; // matched
let str3 = "2x"; // null, because 2 falls in the range of 1-9

console.log(str1.match(regex));
console.log(str2.match(regex));
console.log(str3.match(regex));

null
[ '0z' ]
null
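The caret only negates the set when it is the very first character inside the brackets. Anywhere else it is an ordinary character. The following sketch, with made-up variable names, shows the difference.

```javascript
const negated = /[^ab]/g; // any character except a or b
const literal = /[a^b]/g; // one of: a, ^ or b

// Hypothetical sample text.
const caretSample = "a^bc";

const notAB = caretSample.match(negated);
const withCaret = caretSample.match(literal);

console.log(notAB); // [ '^', 'c' ]
console.log(withCaret); // [ 'a', '^', 'b' ]
```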
For the shortcuts \d, \w, and \s, the inverses are \D, \W, and \S.
\D: anything but a numeric digit.
\W: anything but a word character.
\S: anything but a white space character.
let regex = /\D\D/g;

let str1 = "ab";
let str2 = "0z"; // null, because 0 is a numeric digit
let str3 = "\n\n";

console.log(str1.match(regex));
console.log(str2.match(regex));
console.log(str3.match(regex));

[ 'ab' ]
null
[ '\n\n' ]
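A common use of \D is stripping everything that is not a digit. The sketch below uses the standard String method replace(), which this lesson has not covered, and a made-up phone number.

```javascript
// Hypothetical phone number, chosen for illustration.
const phone = "(555) 123-4567";

// Replace every non-digit character with an empty string.
const digitsOnly = phone.replace(/\D/g, "");

console.log(digitsOnly); // 5551234567
```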
Start and end anchors
You can mark the start of a text using a caret (^). Yes, the same caret we just saw, but it has a different meaning outside of the square brackets.
let regex = /^xyz/g;

let str1 = "abc xyz"; // null, because xyz is not at the beginning of the text
let str2 = "xyz xyz"; // Only the first xyz is returned
let str3 = "xyz";

console.log(str1.match(regex));
console.log(str2.match(regex));
console.log(str3.match(regex));
This regular expression looks for the pattern xyz, and the letter x must be at the start of the text.
null
[ 'xyz' ]
[ 'xyz' ]
Notice that even though str1 contains the pattern xyz, it is not matched because the pattern is not at the beginning of the string.
Similarly, you can mark the end of a text using a dollar sign ($).
let regex = /xyz$/g;

let str1 = "abc xyz";
let str2 = "xyz xyz"; // Only the second xyz is matched
let str3 = "xyz abc"; // null, because xyz is not at the end of the text

console.log(str1.match(regex));
console.log(str2.match(regex));
console.log(str3.match(regex));

[ 'xyz' ]
[ 'xyz' ]
null
Of course, it is possible to use both markers at the same time.
let regex = /^xyz$/g;

let str1 = "xyz";
let str2 = "xyz xyz";
let str3 = "abc xyz abc";

console.log(str1.match(regex));
console.log(str2.match(regex));
console.log(str3.match(regex));

[ 'xyz' ]
null
null
In this case, the regular expression matches the pattern xyz, where x must be at the start of the text and z must be at the end of the text.
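Using both anchors is a common way to validate that a whole string fits a pattern. Here is a minimal sketch, assuming a hypothetical helper named isThreeDigits. It uses the standard RegExp method test(), which returns true or false, and leaves out the g flag because only a yes or no answer is needed.

```javascript
// Hypothetical helper: true only if the whole input is exactly three digits.
function isThreeDigits(input) {
  return /^\d\d\d$/.test(input);
}

console.log(isThreeDigits("123")); // true
console.log(isThreeDigits("1234")); // false, one digit too many
console.log(isThreeDigits("a123")); // false, the text does not start with a digit
```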
Lastly, after discussing the start and end anchors, we must look back at the m flag. By default, the anchors only mark the start and end of the whole text, even for a multiline text. For example,
let regex = /^xyz$/g;

let str = `xyz xyz
xyz
xyz xyz`;

console.log(str.match(regex));

null
But under the m mode, ^ and $ also mark the start and end of individual lines.
let regex = /^xyz$/gm;

let str = `xyz xyz
xyz
xyz xyz`;

console.log(str.match(regex));

[ 'xyz' ]
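The effect of the m flag is easy to see when only the start anchor is used. The sketch below runs the same three-line text through /^xyz/ with and without m; the variable names are made up for this example.

```javascript
// The same three lines as above, written with explicit \n characters.
const lines = "xyz xyz\nxyz\nxyz xyz";

const wholeText = lines.match(/^xyz/g); // ^ only matches at the start of the text
const perLine = lines.match(/^xyz/gm); // ^ also matches at the start of each line

console.log(wholeText); // [ 'xyz' ]
console.log(perLine); // [ 'xyz', 'xyz', 'xyz' ]
```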
Word boundary
First of all, recall what a word character is.
Word characters are Latin letters, numeric digits, and the underscore.
Non-word characters, on the other hand, are anything else, such as spaces, newlines, tabs, commas, periods, and so on.
In JavaScript, you can specify the boundary of a word using \b. There are three types of word boundaries:
- Between the beginning of the text and a word character.
- Between a word character and the end of the text.
- In the middle of the text, between a word character and a non-word character.
For example,
let regex = /\bJava\b/g;

let str1 = "I'm learning to code in Java";
let str2 = "I'm learning to code in JavaScript";

console.log(str1.match(regex));
console.log(str2.match(regex));

[ 'Java' ]
null
This regular expression is looking for the pattern Java, and there must be a word boundary before the letter J, and another word boundary after the last letter a.
For str1, there is a space character before the letter J, so a word boundary sits between the space and J. After the last letter a comes the end of the text, which also counts as a word boundary. So the matched result Java is returned.
For str2, there is a word boundary before the letter J, but the last letter a is followed by another word character, S, so there is no word boundary at that position and null is returned.
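Word boundaries are what make a "whole word" search possible. The following sketch compares a plain search with a bounded one on a made-up sentence.

```javascript
// Hypothetical sentence, chosen for illustration.
const sentence = "cat concatenate bobcat cat.";

// Without boundaries, cat is also found inside longer words.
const anywhere = sentence.match(/cat/g);

// With boundaries, only the standalone words are matched.
// The final period is a non-word character, so "cat." still counts.
const wholeWords = sentence.match(/\bcat\b/g);

console.log(anywhere.length); // 4
console.log(wholeWords); // [ 'cat', 'cat' ]
```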
Quantifiers
Previously, when we needed to match three number digits, this is the regular expression we made:
let regex = /\d\d\d/g;
But imagine you need to match an ID number of 11 digits. The already cryptic regular expression becomes even more difficult to read:
let regex = /\d\d\d\d\d\d\d\d\d\d\d/g;
So instead, you can append a quantifier after \d, indicating the number of digits that should be matched. For example, the previous regular expression can be rewritten as:
let regex = /\d{11}/g;

let str1 = "12345678901";
let str2 = "12345";

console.log(str1.match(regex));
console.log(str2.match(regex));

[ '12345678901' ]
null
You can also define a range. For example, this is how you can match numeric values of 3 to 5 digits.
let regex = /\d{3,5}/g;

let str1 = "12"; // null, because there are fewer than 3 digits
let str2 = "123";
let str3 = "1234";
let str4 = "12345";
let str5 = "123456"; // matched, but only "12345" is returned

console.log(str1.match(regex));
console.log(str2.match(regex));
console.log(str3.match(regex));
console.log(str4.match(regex));
console.log(str5.match(regex));

null
[ '123' ]
[ '1234' ]
[ '12345' ]
[ '12345' ]
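As the last string shows, a quantifier on its own still finds a match inside a longer run of digits. If the whole string must be 3 to 5 digits long, one option is to combine the quantifier with the start and end anchors. This is a sketch with made-up names, using the standard test() method.

```javascript
// Anchored version: the entire string must consist of 3 to 5 digits.
const threeToFive = /^\d{3,5}$/;

// Hypothetical inputs, chosen for illustration.
const candidates = ["12", "123", "12345", "123456"];
const verdicts = candidates.map((s) => threeToFive.test(s));

console.log(verdicts); // [ false, true, true, false ]
```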
Or you can define an open boundary range by omitting the upper limit.
let regex = /\d{3,}/g; // matches 3 or more digits

let str1 = "1";
let str2 = "12";
let str3 = "123";
let str4 = "1234";

console.log(str1.match(regex));
console.log(str2.match(regex));
console.log(str3.match(regex));
console.log(str4.match(regex));

null
null
[ '123' ]
[ '1234' ]
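However, remember that the opposite is not possible. You cannot omit the lower boundary. JavaScript does not read {,3} as a quantifier. Without the u flag, it is treated as the literal characters {,3}, so write {0,3} if you want to match up to three digits.
let regex = /\d{,3}/g; // Not a quantifier: this looks for a digit followed by the literal text {,3}

let str1 = "1";
let str2 = "12";
let str3 = "123";
let str4 = "1234";

console.log(str1.match(regex));
console.log(str2.match(regex));
console.log(str3.match(regex));
console.log(str4.match(regex));

null
null
null
null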
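One subtlety is worth checking for yourself: without the u flag, JavaScript does not reject {,3}, it reads the braces as ordinary text. The sketch below, with made-up variable names, shows that behaviour, the {0,3} form that does express "up to three", and the error raised when the u flag is used.

```javascript
// Without the u flag, {,3} is literal text, not a quantifier.
const literalBraces = /\d{,3}/;
const matchesDigits = literalBraces.test("123"); // false
const matchesLiteral = literalBraces.test("1{,3}"); // true

// An explicit lower bound of 0 expresses "at most three digits".
const upToThree = /^\d{0,3}$/;
const shortOk = upToThree.test("12"); // true
const longRejected = upToThree.test("1234"); // false

// With the u flag, the same pattern is rejected as a syntax error.
let unicodeError = null;
try {
  new RegExp("\\d{,3}", "u");
} catch (err) {
  unicodeError = err;
}

console.log(matchesDigits, matchesLiteral); // false true
console.log(shortOk, longRejected); // true false
console.log(unicodeError instanceof SyntaxError); // true
```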
For the most commonly used quantifiers, there are some shortcuts provided. For instance,
+: one or more, which is the same as the quantifier {1,}.
let regex = /\d+/g;

let str1 = "a"; // null, because there has to be at least one number digit
let str2 = "a1";
let str3 = "a12";
let str4 = "a123";

console.log(str1.match(regex));
console.log(str2.match(regex));
console.log(str3.match(regex));
console.log(str4.match(regex));

null
[ '1' ]
[ '12' ]
[ '123' ]
?: zero or one, which is the same as the quantifier {0,1}.
let regex = /neighbou?r/g;

let str1 = "neighbor";
let str2 = "neighbour";

console.log(str1.match(regex));
console.log(str2.match(regex));

[ 'neighbor' ]
[ 'neighbour' ]
Notice that both strings are matched, because the letter u is optional (zero or one).
*: zero or more, which is the same as the quantifier {0,}. Its main difference from + is that * allows the pattern to match zero times. For example,
let regex = /\d*/g;

let str1 = "";

console.log(str1.match(regex));

[ '' ]
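Because * is allowed to match nothing, it can produce empty matches at positions where no digit is present. The following sketch compares * and + on a made-up two-character string.

```javascript
// Hypothetical input: one letter followed by one digit.
const mixed = "a1";

// * matches an empty string before "a", then "1", then an empty string at the end.
const withStar = mixed.match(/\d*/g);

// + needs at least one digit, so only "1" is returned.
const withPlus = mixed.match(/\d+/g);

console.log(withStar); // [ '', '1', '' ]
console.log(withPlus); // [ '1' ]
```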
Character groups
In our previous examples, the quantifiers only work on a single character, but what if you need to describe a pattern with repetitions of a group of characters? For example, how can you repeat the pattern xyz five times?
To do this, you need to group this pattern inside a pair of parentheses:
let regex = /(xyz){5}/g;

let str = "xyzxyzxyzxyzxyz";

console.log(str.match(regex));
And now, the quantifier that follows will operate on xyz, and not just the letter z.
[ 'xyzxyzxyzxyzxyz' ]
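To see what the parentheses change, compare the grouped pattern with an ungrouped one. This is a sketch with made-up names; both patterns are anchored and checked with test() so that only whole-string matches count.

```javascript
const ungrouped = /^xyz{5}$/; // x, y, then z repeated five times
const grouped = /^(xyz){5}$/; // the whole sequence xyz repeated five times

const results = {
  ungroupedOnZs: ungrouped.test("xyzzzzz"), // true
  ungroupedOnRepeats: ungrouped.test("xyzxyzxyzxyzxyz"), // false
  groupedOnRepeats: grouped.test("xyzxyzxyzxyzxyz"), // true
  groupedOnZs: grouped.test("xyzzzzz"), // false
};

console.log(results);
```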
Alternation
You can also define a regular expression that matches any one of several alternative patterns, separated by a vertical bar (|). For example, if you are looking for either xyz, or abc, or \d\d\d, this is what you can do:
let regex = /(xyz)|(abc)|(\d\d\d)/g;

let str = "xyz xyz abc 123 456";

console.log(str.match(regex));

[ 'xyz', 'xyz', 'abc', '123', '456' ]
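Notice that the patterns xyz, abc, and \d\d\d are placed inside their own parentheses. This makes each alternative an explicit group. The parentheses are optional in this example, because | separates everything on its left from everything on its right, but they become necessary as soon as the alternatives are only one part of a longer pattern.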
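Grouping matters most when alternation is combined with other parts of a pattern, such as anchors. The sketch below uses made-up patterns to show how | splits an ungrouped expression in two.

```javascript
// Ungrouped: "starts with cat" OR "ends with dog".
const loose = /^cat|dog$/;

// Grouped: the whole text is exactly "cat" or exactly "dog".
const strict = /^(cat|dog)$/;

const looseCatalog = loose.test("catalog"); // true, because it starts with cat
const looseHotdog = loose.test("hotdog"); // true, because it ends with dog
const strictCatalog = strict.test("catalog"); // false
const strictDog = strict.test("dog"); // true

console.log(looseCatalog, looseHotdog, strictCatalog, strictDog); // true true false true
```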
Look ahead and look behind
Lastly, sometimes you need to look for a pattern only if it is followed or preceded by another pattern.
For example, you have a string with multiple numbers, but you only want to match those that have a dollar sign in front of them. This is called "look behind", because the regular expression looks behind the pattern you want, at the text that comes before it, to check for another pattern.
Look behind has the following syntax:
(?<=Y)X
The pattern X will be matched only if there is a Y in front of it.
let regex = /(?<=\$)\d+/g;

let str = "100 $123";

console.log(str.match(regex));
Because the dollar sign ($) has a special meaning in regular expressions, you must put a backslash (\) in front of it in order to match a regular dollar sign.
[ '123' ]
There is also a negative look behind that matches X only if there is no Y in front of it, which has the following syntax:
(?<!Y)X
For example,
let regex = /(?<!\$)\d+/g;

let str = "100 $123";

console.log(str.match(regex));

[ '100', '23' ]
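The partial match 23 appears because the digit 2 is preceded by 1, not by a dollar sign, so the negative look behind is satisfied there. If you only want complete numbers, one possible approach is to also forbid a digit in front of the match. This is a sketch of that idea, not the only way to do it.

```javascript
const priced = "100 $123";

// Original pattern: also returns the tail of the number after the dollar sign.
const partial = priced.match(/(?<!\$)\d+/g);

// Stricter pattern: the match may not be preceded by a dollar sign or a digit.
// Inside square brackets, $ is an ordinary character.
const whole = priced.match(/(?<![\d$])\d+/g);

console.log(partial); // [ '100', '23' ]
console.log(whole); // [ '100' ]
```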
The look ahead works similarly. For example, the following syntax matches X only if there is a Y afterward.
X(?=Y)
For instance,
let regex = /\d+(?=\$)/g;

let str = "100 123$";

console.log(str.match(regex));

[ '123' ]
And there is also a negative look ahead that matches X only if there is no Y afterward.
X(?!Y)
Here is an example,
let regex = /\d+(?!\$)/g;

let str = "100 123$";

console.log(str.match(regex));

[ '100', '12' ]
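Here the partial match 12 appears because \d+ first takes 123, finds a dollar sign after it, and then gives back one digit so that the look ahead sees 3 instead. If that is not what you want, one possible fix is to forbid a following digit as well. This is a sketch of that idea.

```javascript
const amounts = "100 123$";

// Original pattern: backtracks to "12" to satisfy the negative look ahead.
const shortened = amounts.match(/\d+(?!\$)/g);

// Stricter pattern: the match may not be followed by a dollar sign or a digit.
const complete = amounts.match(/\d+(?![\d$])/g);

console.log(shortened); // [ '100', '12' ]
console.log(complete); // [ '100' ]
```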