How to Define Patterns in JavaScript with Regular Expressions

In the previous lesson, we introduced the regular expression, a programming tool used to match patterns in a string. It is built into many different programming languages, and JavaScript is one of them.

The previous lesson focused on how to match a pattern using the built-in methods match() and matchAll(), as well as different matching modes that can be activated by providing the right flag. In this lesson, we are going to cover different ways to describe a pattern using the regular expression.

Matching a set of characters

In a regular expression, you can use a pair of square brackets to match any one character from a set, instead of one fixed character. For example,

javascript
let regex = /[01234][56789][abc]/g;

let str1 = "18b"; // matched
let str2 = "98b"; // null, because 9 is outside of [01234]
let str3 = "18z"; // null, because z is outside of [abc]

console.log(str1.match(regex));
console.log(str2.match(regex));
console.log(str3.match(regex));
text
[ '18b' ]
null
null

This regular expression defines the following pattern:

A digit between 0 and 4, followed by a digit between 5 and 9, followed by a letter between a and c.

As a result, str1 falls into this range, but the other two don't.

There is an easier way to define a range of characters: use a hyphen (-).

javascript
let regex = /[0-4][5-9][a-c]/g;

let str1 = "18b";
let str2 = "98b";
let str3 = "18z";

console.log(str1.match(regex));
console.log(str2.match(regex));
console.log(str3.match(regex));
text
[ '18b' ]
null
null
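
Several ranges can also share one pair of square brackets. As a small sketch (the names hexDigit, hexSample, and hexMatches are made up for this illustration), the set below accepts any single hexadecimal digit:

```javascript
// One set can hold several ranges: digits, lowercase a-f, and uppercase A-F.
const hexDigit = /[0-9a-fA-F]/g;

// Sample text invented for this example.
const hexSample = "7f-G3-zE";

// Only characters that fall inside one of the three ranges are returned.
const hexMatches = hexSample.match(hexDigit);
console.log(hexMatches); // [ '7', 'f', '3', 'E' ]
```

The hyphens in the sample string are not matched, because inside the brackets each hyphen only joins the two ends of a range.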

Shortcuts are provided for some commonly used character sets. We've seen an example of this, the \d character.

  • \d

\d matches any single numeric digit, which is the same as defining /[0-9]/.

javascript
let regex = /\d\d\d/g;

let str1 = "123";
let str2 = "54321";
let str3 = "1a2b3c";

console.log(str1.match(regex));
console.log(str2.match(regex));
console.log(str3.match(regex));
text
[ '123' ]
[ '543' ]
null

Pay attention to str2, and notice that when a match is found, JavaScript will continue looking after the last digit of the match, in this case, the digit 3, so 432 and 321 are not considered.

  • \w

\w matches a word character: any of the 26 letters of the Latin alphabet (uppercase or lowercase), a numeric digit, or the underscore. It is the same as defining /[A-Za-z0-9_]/.

javascript
let regex = /\w\w\w/g;

let str1 = "12345";
let str2 = "1a2b3c";
let str3 = "1a_2b_3c";

console.log(str1.match(regex));
console.log(str2.match(regex));
console.log(str3.match(regex));
text
[ '123' ]
[ '1a2', 'b3c' ]
[ '1a_', '2b_' ]

  • \s

\s matches all white space characters, including space, newline, tab, and so on.

javascript
let regex = /\s/g;

let str1 = "12 34 56";
let str2 = "12\n34\t56";
let str3 = "";

console.log(str1.match(regex));
console.log(str2.match(regex));
console.log(str3.match(regex));
text
[ ' ', ' ' ]
[ '\n', '\t' ]
null

  • The dot character (.)

The dot character matches any single character except a line terminator such as the newline character (\n).

javascript
let regex = /.../g;

let str1 = "123\t\t\t";
let str2 = "abc   ";
let str3 = "\n\n\n";

console.log(str1.match(regex));
console.log(str2.match(regex));
console.log(str3.match(regex));
text
[ '123', '\t\t\t' ]
[ 'abc', '   ' ]
null

As we've mentioned before, by enabling the s mode (also known as the dotall mode), you can make . match the newline character as well.

javascript
let regex = /.../gs;

let str1 = "123\t\t\t";
let str2 = "abc   ";
let str3 = "\n\n\n";

console.log(str1.match(regex));
console.log(str2.match(regex));
console.log(str3.match(regex));
text
[ '123', '\t\t\t' ]
[ 'abc', '   ' ]
[ '\n\n\n' ]
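
To see how the shortcuts relate to hand-written sets, here is a minimal sketch that runs \d and \w next to the bracket sets they stand for, and counts what \s finds. The sample string and all variable names are invented for this illustration.

```javascript
// Sample text invented for this example: letters, digits, an underscore,
// two spaces, and one tab.
const sample = "id_42 = 7\tok";

// \d next to the explicit set [0-9].
const digitsShortcut = sample.match(/\d/g);
const digitsExplicit = sample.match(/[0-9]/g);
console.log(digitsShortcut); // [ '4', '2', '7' ]
console.log(digitsShortcut.join("") === digitsExplicit.join("")); // true

// \w next to the explicit set [A-Za-z0-9_].
const wordShortcut = sample.match(/\w/g);
const wordExplicit = sample.match(/[A-Za-z0-9_]/g);
console.log(wordShortcut.join("")); // id_427ok
console.log(wordShortcut.join("") === wordExplicit.join("")); // true

// \s picks up the two spaces and the tab.
const whitespace = sample.match(/\s/g);
console.log(whitespace.length); // 3
```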

Excluding a set of characters

Besides matching for a specified set of characters, you can also define a regular expression to match all characters other than the specified characters, by placing a caret (^) right after the opening square bracket.

javascript
let regex = /[^1-9][^a-e]/g;

let str1 = "0b"; // null, because b falls in the range of a-e
let str2 = "0z"; // matched
let str3 = "2x"; // null, because 2 falls in the range of 1-9

console.log(str1.match(regex));
console.log(str2.match(regex));
console.log(str3.match(regex));
text
null
[ '0z' ]
null

For the shortcuts \d, \w, and \s, their inverses would be \D, \W, and \S.

  • \D: anything but a numeric digit.
  • \W: anything but a word character.
  • \S: anything but a white space character.

javascript
let regex = /\D\D/g;

let str1 = "ab";
let str2 = "0z"; // null, because 0 is a numeric digit
let str3 = "\n\n";

console.log(str1.match(regex)); // [ 'ab' ]
console.log(str2.match(regex)); // null
console.log(str3.match(regex)); // [ '\n\n' ]
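
The inverse shortcuts are handy for cleaning up text. The sketch below uses the built-in string method replace(), which this lesson has not covered, to delete every character that \D or \W matches. The input strings are made up for the example.

```javascript
// Remove everything that is not a digit by replacing each \D match with "".
const rawPhone = "+1 (555) 010-9999"; // made-up number for illustration
const digitsOnly = rawPhone.replace(/\D/g, "");
console.log(digitsOnly); // 15550109999

// Remove everything that is not a word character (spaces and punctuation).
const rawText = "hello, world!";
const wordCharsOnly = rawText.replace(/\W/g, "");
console.log(wordCharsOnly); // helloworld
```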

Start and end anchors

You can mark the start of a text using a caret (^). Yes, the same caret we just saw, but it has a different meaning outside of the square brackets.

javascript
let regex = /^xyz/g;

let str1 = "abc xyz"; // null, because xyz is not at the beginning of the text
let str2 = "xyz xyz"; // Only the first xyz is returned
let str3 = "xyz";

console.log(str1.match(regex));
console.log(str2.match(regex));
console.log(str3.match(regex));

This regular expression looks for the pattern xyz, and the letter x must be the start of the text.

text
null
[ 'xyz' ]
[ 'xyz' ]

Notice that even though str1 contains the pattern xyz, it is not matched because the pattern is not at the beginning of the string.

Similarly, you can mark the end of a text using a dollar sign ($).

javascript
let regex = /xyz$/g;

let str1 = "abc xyz";
let str2 = "xyz xyz"; // Only the second xyz is matched
let str3 = "xyz abc"; // null, because xyz is not at the end of the text

console.log(str1.match(regex));
console.log(str2.match(regex));
console.log(str3.match(regex));
text
[ 'xyz' ]
[ 'xyz' ]
null

Of course, it is possible to use both markers at the same time.

javascript
let regex = /^xyz$/g;

let str1 = "xyz";
let str2 = "xyz xyz";
let str3 = "abc xyz abc";

console.log(str1.match(regex));
console.log(str2.match(regex));
console.log(str3.match(regex));
text
[ 'xyz' ]
null
null

In this case, the regular expression matches the pattern xyz only when x is at the start of the text and z is at the end of the text. In other words, the whole text must be exactly xyz.

Lastly, after discussing the start and end anchors, we must look back at the m flag. By default, the anchors only mark the start and end of the whole text, even for a multiline text. For example,

javascript
let regex = /^xyz$/g;

let str = `xyz xyz
xyz
xyz xyz`;

console.log(str.match(regex));
text
null

But under the m mode, ^ and $ also mark the start and end of individual lines.

javascript
let regex = /^xyz$/gm;

let str = `xyz xyz
xyz
xyz xyz`;

console.log(str.match(regex));
text
[ 'xyz' ]
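
A common use of the two anchors together is checking that an entire string has a given shape. The following is a minimal sketch, not a definitive validator: isThreeDigitCode is a hypothetical helper, and it relies on the built-in RegExp method test(), which returns true or false instead of the matches.

```javascript
// Hypothetical helper: true only when the entire string is exactly three digits.
function isThreeDigitCode(text) {
  return /^\d\d\d$/.test(text);
}

console.log(isThreeDigitCode("123"));  // true
console.log(isThreeDigitCode("1234")); // false, one digit too many before the end
console.log(isThreeDigitCode("a123")); // false, the text does not start with a digit

// With the m flag, the same anchors are checked line by line.
const multilineText = "123\n4567\n890";
const codeLines = multilineText.match(/^\d\d\d$/gm);
console.log(codeLines); // [ '123', '890' ]
```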

Word boundary

First of all, recall what a word character is.

Word characters are Latin letters, numeric digits, and the underscore.
Non-word characters, on the other hand, are anything but Latin letters, numeric digits, and the underscore, such as spaces, newlines, tabs, commas, periods, and so on.

In JavaScript, you can specify the boundary of a word using \b. There are three types of word boundaries:

  • Between the beginning of the text and a word character.
  • Between a word character and the end of the text.
  • In the middle of the text, between a word character and a non-word character.

For example,

javascript
let regex = /\bJava\b/g;

let str1 = "I'm learning to code in Java";
let str2 = "I'm learning to code in JavaScript";

console.log(str1.match(regex));
console.log(str2.match(regex));
text
[ 'Java' ]
null

This regular expression is looking for the pattern Java, and there must be a word boundary before the letter J, and another word boundary after the letter a.

For str1, the position before the letter J sits between a space (a non-word character) and a word character, which counts as a word boundary. The position after the letter a is the end of the text, which also counts as a word boundary. So the matched result Java is returned.

For str2, there is a word boundary before the letter J, but the letter a is followed by another word character, S, so there is no word boundary at that position and null is returned.
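
The same idea can be used to count whole words only. In this sketch the sentence and the variable names are invented for illustration; it compares a plain search for cat with one that requires a word boundary on both sides.

```javascript
// Sample sentence invented for this example.
const sentence = "The cat sat near a catalog; bobcat and cat.";

// Without boundaries, "cat" is also found inside "catalog" and "bobcat".
const anyCat = sentence.match(/cat/g);
console.log(anyCat.length); // 4

// With \b on both sides, only the standalone word "cat" is matched.
// The final "cat." still counts, because the period is a non-word character.
const wholeCat = sentence.match(/\bcat\b/g);
console.log(wholeCat.length); // 2
```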

Quantifiers

Previously, when we needed to match three number digits, this is the regular expression we made:

javascript
let regex = /\d\d\d/g;

But imagine you need to match an ID number of 11 digits. The already cryptic regular expression becomes even more difficult to read:

javascript
let regex = /\d\d\d\d\d\d\d\d\d\d\d/g;

So instead, you can append a quantifier after \d, indicating the number of digits that should be matched. For example, the previous regular expression can be rewritten as:

javascript
let regex = /\d{11}/g;

let str1 = "12345678901";
let str2 = "12345";

console.log(str1.match(regex));
console.log(str2.match(regex));
text
[ '12345678901' ]
null

You can also define a range. For example, this is how you can match numeric values of 3 to 5 digits.

javascript
let regex = /\d{3,5}/g;

let str1 = "12"; // null, because there are fewer than 3 digits
let str2 = "123";
let str3 = "1234";
let str4 = "12345";
let str5 = "123456"; // matched, but only "12345" is returned

console.log(str1.match(regex));
console.log(str2.match(regex));
console.log(str3.match(regex));
console.log(str4.match(regex));
console.log(str5.match(regex));
text
null
[ '123' ]
[ '1234' ]
[ '12345' ]
[ '12345' ]

Or you can define an open-ended range by omitting the upper limit.

javascript
let regex = /\d{3,}/g; // matches 3 or more digits

let str1 = "1";
let str2 = "12";
let str3 = "123";
let str4 = "1234";

console.log(str1.match(regex));
console.log(str2.match(regex));
console.log(str3.match(regex));
console.log(str4.match(regex));
text
null
null
[ '123' ]
[ '1234' ]

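However, remember that the opposite is not possible: you cannot omit the lower bound. JavaScript does not treat {,3} as a quantifier. Without the u flag, it is read as the literal characters {,3}, so the pattern below only matches a digit followed by that exact text. To match up to three digits, write {0,3} instead.

javascript
let regex = /\d{,3}/g; // {,3} is not a quantifier, so none of the strings below match

let str1 = "1";
let str2 = "12";
let str3 = "123";
let str4 = "1234";

console.log(str1.match(regex));
console.log(str2.match(regex));
console.log(str3.match(regex));
console.log(str4.match(regex));
text
null
null
null
null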
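
A closer look at what the engine does with {,3} may help here. The sketch below (all variable names are made up for this illustration) shows that, without the u flag, the braces are matched as literal text, that the u flag rejects the pattern outright, and that {0,3} is the way to say "at most three".

```javascript
// Without the u flag, "{,3}" is not a quantifier: it is matched as literal text.
const literalBraces = /\d{,3}/;
console.log(literalBraces.test("123"));   // false
console.log(literalBraces.test("1{,3}")); // true

// With the u flag, the same pattern is rejected when the RegExp is created.
let unicodeModeError = null;
try {
  new RegExp("\\d{,3}", "u");
} catch (error) {
  unicodeModeError = error;
}
console.log(unicodeModeError instanceof SyntaxError); // true

// To say "at most three digits", spell out the lower bound as 0.
const upToThreeDigits = /^\d{0,3}$/;
console.log(upToThreeDigits.test("12"));   // true
console.log(upToThreeDigits.test("1234")); // false
```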

For the most commonly used quantifiers, there are some shortcuts provided. For instance,

  • +: one or more, which is the same as the quantifier {1,}.

javascript
let regex = /\d+/g;

let str1 = "a"; // null, because there has to be at least one number digit
let str2 = "a1";
let str3 = "a12";
let str4 = "a123";

console.log(str1.match(regex));
console.log(str2.match(regex));
console.log(str3.match(regex));
console.log(str4.match(regex));
text
null
[ '1' ]
[ '12' ]
[ '123' ]

  • ?: zero or one, which is the same as the quantifier {0,1}.

javascript
let regex = /neighbou?r/g;

let str1 = "neighbor";
let str2 = "neighbour";

console.log(str1.match(regex));
console.log(str2.match(regex));
text
[ 'neighbor' ]
[ 'neighbour' ]

Notice that both strings are matched, because the letter u is optional (zero or one).

  • *: zero or more, which is the same as the quantifier {0,}. The main difference from + is that * also allows the preceding item to match zero times. For example,

javascript
let regex = /\d*/g;

let str1 = "";

console.log(str1.match(regex));
text
[ '' ]
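
Because * is allowed to match nothing, it can report empty matches even in a string that does contain digits. The following sketch (variable names are made up for this illustration) runs * and + against the same input to make the difference visible.

```javascript
// Sample text invented for this example.
const mixed = "a1";

// * may match zero digits, so an empty match is reported at every position
// where no digit is found: before "a" and at the end of the text.
const starMatches = mixed.match(/\d*/g);
console.log(starMatches); // [ '', '1', '' ]

// + needs at least one digit, so only the real run of digits is reported.
const plusMatches = mixed.match(/\d+/g);
console.log(plusMatches); // [ '1' ]
```

When the goal is to extract values rather than to allow an optional part, + is usually the safer choice of the two.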

Character groups

In our previous examples, each quantifier worked only on the single character or shortcut right before it. But what if you need to describe a pattern with repetitions of a group of characters? For example, how can you repeat the pattern xyz five times?

To do this, you need to group this pattern inside a pair of parentheses:

javascript
let regex = /(xyz){5}/g;

let str = "xyzxyzxyzxyzxyz";

console.log(str.match(regex));

And now, the following quantifier will operate on xyz, and not just the letter z.

text
[ 'xyzxyzxyzxyzxyz' ]
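
To make the effect of the parentheses concrete, this sketch runs the same quantifier with and without a group. The sample string and variable names are invented for illustration.

```javascript
// Sample text invented for this example.
const groupSample = "ababab abbb";

// The quantifier applies to the whole group, so "ab" is repeated as a unit.
const groupedRepeat = groupSample.match(/(ab)+/g);
console.log(groupedRepeat); // [ 'ababab', 'ab' ]

// Without the group, + applies only to the letter b.
const ungroupedRepeat = groupSample.match(/ab+/g);
console.log(ungroupedRepeat); // [ 'ab', 'ab', 'ab', 'abbb' ]
```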

Alternation

You can also define a regular expression that matches any one of several alternative patterns, separated by a vertical bar (|). For example, if you are looking for either xyz, abc, or \d\d\d, this is what you can do:

javascript
let regex = /(xyz)|(abc)|(\d\d\d)/g;

let str = "xyz xyz abc 123 456";

console.log(str.match(regex));
text
[ 'xyz', 'xyz', 'abc', '123', '456' ]

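Notice that the patterns xyz, abc, and \d\d\d are each placed inside their own parentheses, which makes the three alternatives easy to tell apart. The parentheses are optional in this example, because | already separates whole alternatives, so /xyz|abc|\d\d\d/g returns the same result. They become necessary when the alternatives are only one part of a larger pattern.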
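
Grouping starts to matter once the alternatives are combined with something else, such as anchors. In the sketch below (the pattern names are made up for this illustration), the ungrouped version reads as "starts with cat, or ends with dog", while the grouped version reads as "is exactly cat or exactly dog".

```javascript
// Without a group, the two alternatives are "^cat" and "dog$".
const looseEither = /^cat|dog$/;

// With a group, both anchors apply to whichever alternative is chosen.
const strictEither = /^(cat|dog)$/;

console.log(looseEither.test("catalog"));  // true, the text starts with "cat"
console.log(looseEither.test("hotdog"));   // true, the text ends with "dog"
console.log(strictEither.test("catalog")); // false
console.log(strictEither.test("hotdog"));  // false
console.log(strictEither.test("dog"));     // true
```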

Look ahead and look behind

Lastly, sometimes you need to look for a pattern only if it is followed or preceded by another pattern.

For example, you have a string with multiple numbers, but you only want to match those that have a dollar sign in front of them. This is called "look behind", because the regular expression looks back at what comes right before the pattern you want to match.

Look behind has the following syntax:

text
(?<=Y)X

The pattern X will be matched only if there is a Y immediately in front of it. Y itself is not included in the match.

javascript
let regex = /(?<=\$)\d+/g;

let str = "100 $123";

console.log(str.match(regex));

Because the dollar sign ($) has a special meaning in regular expressions, you must put a backslash (\) in front of it in order to match a regular dollar sign.

text
[ '123' ]

There is also a negative look behind that matches X only if there is no Y in front of it, which has the following syntax:

text
(?<!Y)X

For example,

javascript
let regex = /(?<!\$)\d+/g;

let str = "100 $123";

console.log(str.match(regex));
text
[ '100', '23' ]

The look ahead works similarly. For example, the following syntax matches X only if there is a Y afterward.

text
X(?=Y)

For instance,

javascript
let regex = /\d+(?=\$)/g;

let str = "100 123$";

console.log(str.match(regex));
text
[ '123' ]

And there is also a negative look ahead that matches X only if there is no Y afterward.

text
X(?!Y)

Here is an example,

javascript
let regex = /\d+(?!\$)/g;

let str = "100 123$";

console.log(str.match(regex));
text
[ '100', '12' ]
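
The partial results '23' and '12' in the two negative examples appear because the engine is free to start or stop in the middle of a number: '23' is not preceded by a dollar sign, and '12' is not followed by one. If whole numbers are what you want, one possible fix is to also forbid a digit at that position. This is only a sketch with made-up pattern names, not the only way to solve it.

```javascript
// Negative look ahead that refuses to stop in front of a digit or a dollar sign.
// Inside square brackets, $ is an ordinary character and needs no backslash.
const notBeforeDollar = /\d+(?![\d$])/g;
const lookAheadResult = "100 123$".match(notBeforeDollar);
console.log(lookAheadResult); // [ '100' ]

// Negative look behind that refuses to start right after a digit or a dollar sign.
const notAfterDollar = /(?<![\d$])\d+/g;
const lookBehindResult = "100 $123".match(notAfterDollar);
console.log(lookBehindResult); // [ '100' ]
```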