A regular expression is a tool, supported by many programming languages, for describing patterns in strings. Its syntax is famously cryptic, but it is also extremely useful.
Creating a regular expression
JavaScript offers two ways to create a regular expression. You can either use the RegExp() constructor:
let regex = new RegExp("xyz");
Or a regular expression literal, written between a pair of forward slashes:
let regex = /xyz/;
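The two forms above produce equivalent objects. As a minimal sketch of when the constructor is the better choice: it accepts the pattern as an ordinary string, so the pattern can be assembled at runtime, plus an optional flags string as the second argument. The variable names and sample word are illustrative only, and the \d pattern at the end is introduced later in this section.

```javascript
// Both forms create an equivalent RegExp object.
const literal = /xyz/;
const constructed = new RegExp("xyz");
console.log(literal.source === constructed.source); // true

// The constructor takes the pattern as a string, so it can be built at runtime.
// Flags go in the optional second argument.
const word = "xyz"; // hypothetical value, e.g. read from user input
const dynamic = new RegExp(word, "g");
console.log(dynamic.source, dynamic.flags); // xyz g

// Inside a string literal, a backslash must itself be escaped:
// the string "\\d\\d\\d" becomes the pattern \d\d\d.
const digits = new RegExp("\\d\\d\\d");
console.log(digits.source); // \d\d\d
console.log(digits.test("abc 123")); // true
```

Forgetting to double the backslash is a common mistake with the constructor; with a literal, the pattern is written exactly as it should be interpreted.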
In both cases, the regular expression matches the pattern "an x followed by a y and then a z".
String matching
JavaScript strings have a built-in method called match(), which lets you match a string against a regular expression. For instance:
let regex = /xyz/;
let str = "This string contains the pattern xyz";

console.log(str.match(regex));
[
  'xyz',
  index: 33,
  input: 'This string contains the pattern xyz',
  groups: undefined
]
If the string contains the pattern /xyz/, match() returns an array containing the matched text along with extra information, such as the position where the match was found (index) and the original input string (input).

If the pattern is not found, the method returns null.
let regex = /xyz/;
let str = "This string does not contain the pattern.";

console.log(str.match(regex));
null
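Because match() can return either an array or null, code that uses the result should check for null before reading from it. The sketch below uses a hypothetical helper, findPattern(), which is not part of JavaScript. It also shows test(), a built-in RegExp method that simply returns true or false when only a yes/no answer is needed.

```javascript
const regex = /xyz/;

// Hypothetical helper: returns the position of the first match, or -1 if none.
function findPattern(str) {
  const result = str.match(regex);
  // match() returns null when there is no match, so check before reading .index.
  if (result === null) {
    return -1;
  }
  return result.index;
}

console.log(findPattern("This string contains the pattern xyz")); // 33
console.log(findPattern("This string does not contain the pattern.")); // -1

// If you only need a yes/no answer, test() returns a boolean instead.
console.log(regex.test("This string contains the pattern xyz")); // true
```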
Matching for numeric digits
So it finds xyz in a string. So what? On its own, that doesn't seem very useful.
Besides a hard-coded string pattern, you can also define something more flexible. For example, the special character \d matches any single numeric digit (0 through 9).
let regex = /\d\d\d/;
This regular expression matches a sequence of three numeric digits.
let regex = /\d\d\d/;
let str = "123 456 78 9";

console.log(str.match(regex));
[ '123', index: 0, input: '123 456 78 9', groups: undefined ]
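To make the behavior of \d more concrete, here is a small sketch with a few illustrative input strings (not from the original) showing when /\d\d\d/ does and does not match:

```javascript
const regex = /\d\d\d/;

// Each \d matches exactly one digit, so the pattern needs three digits in a row.
console.log("ab 12 cd".match(regex)); // null: no run of three digits
console.log("a1b2c3".match(regex)); // null: the digits are not adjacent

// In a longer run of digits, the first three consecutive digits are matched.
console.log("order 98765".match(regex)[0]); // 987
```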
The global flag
However, notice that only the first match (123) is returned. To make the regular expression return all matches, add the global flag (g) after the closing forward slash.
let regex = /\d\d\d/g;
let str = "123 456 78 9";

console.log(str.match(regex));
[ '123', '456' ]
And this time, both matches are returned.
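One practical consequence: with the g flag, match() returns a plain array of matched strings, or null when nothing matches. The hypothetical countMatches() helper below is a sketch of how you might handle both cases. Note also that this array, unlike the non-global result, carries no index property.

```javascript
// Hypothetical helper: counts how many times a global regex matches a string.
function countMatches(str, regex) {
  // With the g flag, match() returns an array of matched strings, or null if none.
  const matches = str.match(regex);
  return matches === null ? 0 : matches.length;
}

console.log(countMatches("123 456 78 9", /\d\d\d/g)); // 2
console.log(countMatches("12 3 4 56 78 9", /\d\d\d/g)); // 0

// With g, the array holds only the matched strings; there is no index property.
console.log("123 456 78 9".match(/\d\d\d/g).index); // undefined
```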
A recent addition
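Besides match(), a newer method called matchAll() was added to JavaScript in ES2020. Like match() with the global flag, it finds every match, but it returns full details (such as the index) for each one, and it requires the global flag. For example, calling it with a regular expression that lacks the g flag: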
let regex = /\d\d\d/;
let str = "123 456 78 9";

console.log(str.matchAll(regex));
This throws an error:
/Users/. . ./index.js:5
console.log(str.matchAll(regex));
                ^

TypeError: String.prototype.matchAll called with a non-global RegExp argument
    at String.matchAll (<anonymous>)
    at Object.<anonymous> (/Users/. . ./index.js:5:17)
    . . .

Node.js v21.6.0
The matchAll() method only accepts regular expressions with the global flag.
let regex = /\d\d\d/g;
let str = "123 456 78 9";

console.log(str.matchAll(regex));
Object [RegExp String Iterator] {}
Instead of an array, matchAll() returns an iterator (an iterable object), which means you can loop over the results with a for...of loop.
let matches = str.matchAll(regex);

for (const match of matches) {
  console.log(match);
}
[ '123', index: 0, input: '123 456 78 9', groups: undefined ]
[ '456', index: 4, input: '123 456 78 9', groups: undefined ]
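Each value yielded by the loop is the same kind of match array that match() returns without the g flag, so you can read the matched text and its position directly. A minimal sketch (variable names are illustrative):

```javascript
const regex = /\d\d\d/g;
const str = "123 456 78 9";

// Each item produced by matchAll() is a match array:
// match[0] is the matched text and match.index is where it starts.
for (const match of str.matchAll(regex)) {
  console.log(`${match[0]} found at index ${match.index}`);
}
// 123 found at index 0
// 456 found at index 4
```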
Or convert the result into an array:
let matches = Array.from(str.matchAll(regex));

console.log(matches);
[
  [ '123', index: 0, input: '123 456 78 9', groups: undefined ],
  [ '456', index: 4, input: '123 456 78 9', groups: undefined ]
]
This small difference is important: the result of matchAll() is an iterator, not an array, so accessing it by index (for example, matches[0]) returns undefined.
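The sketch below illustrates the point: the iterator has no indexes, spreading it into an array makes indexing work, and, because an iterator can only be consumed once, a second pass over the same iterator yields nothing.

```javascript
const str = "123 456 78 9";
const matches = str.matchAll(/\d\d\d/g);

console.log(matches[0]); // undefined: an iterator has no numeric indexes

// Spread the iterator into an array to use indexes and length.
const all = [...matches];
console.log(all.length); // 2
console.log(all[0][0]); // 123

// The iterator is now used up, so iterating it again yields nothing.
console.log([...matches].length); // 0
```

If you need to walk the results more than once, keep the array rather than the iterator.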
Lastly, when no matches are found, matchAll() returns an empty iterator instead of null.
let regex = /\d\d\d/g;
let str = "12 3 4 56 78 9";

console.log(str.matchAll(regex));
Object [RegExp String Iterator] {}
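Because the result is never null, the null check used with match() will not detect a failed search. One way to handle this, sketched below, is to convert the iterator to an array and check its length:

```javascript
const str = "12 3 4 56 78 9";
const result = str.matchAll(/\d\d\d/g);

// matchAll() never returns null, so a null check cannot detect "no matches".
console.log(result === null); // false

// Collect the results into an array and check its length instead.
const found = Array.from(result);
console.log(found.length === 0); // true
```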
Other flags
Besides the global flag, which makes the regular expression return all matches instead of just the first one, JavaScript supports several other flags. Five of them are described below:
i: makes the regular expression case-insensitive.
By default, the regular expression /xyz/ only matches the string "xyz", not "XYZ" or "Xyz". With the case-insensitive flag, x and X are both matched. For example:
let regex = /xyz/g;
let str = "Xyz xYz XYZ xyz";

console.log(str.match(regex));
[ 'xyz' ]
In this case, without the i flag, the regular expression only matches the last xyz. But when you add the i flag:
let regex = /xyz/gi;
let str = "Xyz xYz XYZ xyz";

console.log(str.match(regex));
[ 'Xyz', 'xYz', 'XYZ', 'xyz' ]
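Flags are also available as properties on the regular expression object, and they can be passed as the second argument of the RegExp() constructor. The containsWord() helper below is hypothetical and assumes the word contains no special regex characters (such as . or *), which would otherwise need escaping.

```javascript
const regex = /xyz/gi;

// Each flag is also exposed as a read-only boolean property.
console.log(regex.global); // true
console.log(regex.ignoreCase); // true
console.log(regex.flags); // gi

// Hypothetical helper: case-insensitive check for a plain word.
// Assumes `word` contains no special regex characters such as . or *.
function containsWord(text, word) {
  return new RegExp(word, "i").test(text);
}

console.log(containsWord("Hello WORLD", "world")); // true
console.log(containsWord("Hello there", "world")); // false
```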
m: multiline mode.
Multiline mode only affects the behavior of the anchors ^ and $, which normally mark the beginning and end of the string.

Without the m flag, they match only at the beginning and end of the entire text; with the m flag, they also match at the beginning and end of each individual line. We will talk more about this later.
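As a brief preview, the sketch below uses an illustrative three-line string to show the difference: with ^ and $ around a three-digit pattern, only the m flag lets each line match on its own.

```javascript
const str = "123\n456\n789";

// Without m: ^ and $ only match at the very start and end of the whole string,
// and no three-digit run spans the entire string.
console.log(str.match(/^\d\d\d$/g)); // null

// With m: ^ and $ also match at the start and end of each line.
console.log(str.match(/^\d\d\d$/gm)); // [ '123', '456', '789' ]
```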
s: dotall mode. Allows the dot (.) to match a newline character (\n).
By default, the dot matches any character except line terminators, such as a newline.
let regex = /x.y.z/g;
let str1 = `x1y2z`;
let str2 = `x\ny\nz`;

console.log(str1.match(regex));
console.log(str2.match(regex));
[ 'x1y2z' ]
null
But in s mode, the dot matches any character, including a newline.
let regex = /x.y.z/gs;
let str1 = `x1y2z`;
let str2 = `x\ny\nz`;

console.log(str1.match(regex));
console.log(str2.match(regex));
[ 'x1y2z' ]
[ 'x\ny\nz' ]
We will talk more about character sets later.
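The default rule covers all line terminators, not only \n; for example, a carriage return (\r) is also skipped by the dot. A short sketch, which also shows the read-only dotAll property that reports whether the s flag is set:

```javascript
// Line terminators include \r as well as \n, and the dot skips them by default.
console.log(/x.y/.test("x\ry")); // false
console.log(/x.y/s.test("x\ry")); // true

// Whether the s flag is enabled is exposed as the dotAll property.
console.log(/x.y/s.dotAll); // true
console.log(/x.y/.dotAll); // false
```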
u: enables full Unicode support.
As we discussed at the very beginning of this course, JavaScript strings have always stored characters as 2-byte units. But with the growth of Unicode, including emojis and many other special characters, 2 bytes quickly became insufficient.

The solution was to encode certain characters in 4 bytes instead, as a pair of 2-byte units. Because the length property counts 2-byte units, such a character has a length of 2:
let str1 = "a";
let str2 = "😀";

console.log(str1.length);
console.log(str2.length);
1
2
This causes problems for regular expressions, because by default they treat every 2-byte unit as a separate character, so the smiley face emoji is treated as two characters instead of one. This leads to strange results, as we will see later.
The u flag fixes this issue and allows the regular expression to match 4-byte characters correctly.
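Here is a minimal sketch of the kind of strange result mentioned above, using the same smiley face emoji: without u, a pattern for exactly one character fails to match it, and /./g splits it into two pieces.

```javascript
const emoji = "😀";

// Without u, the dot matches a single 2-byte unit, so the emoji counts as two characters.
console.log(/^.$/.test(emoji)); // false
console.log(emoji.match(/./g).length); // 2

// With u, the dot matches the whole emoji as one character.
console.log(/^.$/u.test(emoji)); // true
console.log(emoji.match(/./gu).length); // 1
```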
y: sticky mode. Enables searching at an exact position in the text.
This flag is used to match a pattern at a specified position. For example:
let regex = /xyz/y;
let str = "abc xyz abc xyz";

regex.lastIndex = 0;
console.log(str.match(regex));

regex.lastIndex = 4;
console.log(str.match(regex));

regex.lastIndex = 5;
console.log(str.match(regex));
null
[ 'xyz', index: 4, input: 'abc xyz abc xyz', groups: undefined ]
null
To search at an exact position, use the y flag instead of g, and set the regular expression's lastIndex property to that position.
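The difference from g is easiest to see side by side. In this sketch, exec() is used rather than match(), because match() with the g flag resets lastIndex to 0 and returns every match; the variable names are illustrative.

```javascript
const str = "abc xyz abc xyz";
const stickyRegex = /xyz/y;
const globalRegex = /xyz/g;

// Start both searches at index 5, which is in the middle of the first "xyz".
stickyRegex.lastIndex = 5;
globalRegex.lastIndex = 5;

// y only tries to match exactly at lastIndex, so it fails here.
console.log(stickyRegex.exec(str)); // null

// g scans forward from lastIndex and finds the second "xyz".
console.log(globalRegex.exec(str).index); // 12
```

In other words, g treats lastIndex as a starting point for a search, while y treats it as the only position allowed.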