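[JS] Regular Expression (regex)

'str'.match(/[0-9]+/); // one or more digits, equivalent to /\d+/
'str'.match(/[A-Za-z]+/); // one or more English letters
'str'.match(/[A-Za-z0-9_]+/); // one or more letters, digits, or underscores, equivalent to /\w+/
'str'.match(/.+/); // one or more of any character (except line terminators)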

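* means the preceding character may appear 0 or more times. For example, /ab*c/ matches ac, abc, and abbbbc.
+ means the preceding character must appear 1 or more times. For example, /a+b/ matches ab and aaaaab.
? means the preceding character may appear 0 or 1 time. For example, /colou?r/ matches color and colour.
^ matches the start of the input. For example, /^a/ matches the a in "a dog" but not the a in "cats".
$ matches the end of the input. For example, /t$/ matches the t in "eat" but not the t in "eaten".
. matches any single character except line terminators.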

regex = /\$(?!\{)/g; // negative lookahead (?!ABC): match every $ that is not followed by {
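As a quick check of the negative lookahead above, the sketch below marks every $ that is not followed by {. The template text and the # marker are made up for illustration.

```javascript
// Hypothetical input: two lone "$" signs and one "${...}" placeholder
const template = 'Price: $5, name: ${name}, total: $';

// "$" not followed by "{"; the lookahead itself consumes no characters
const loneDollar = /\$(?!\{)/g;

const marked = template.replace(loneDollar, '#');
console.log(marked); // → 'Price: #5, name: ${name}, total: #'
```

The `${name}` placeholder is left untouched because its $ is followed by {.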

References

👍 Regular expressions @ JavaScript.info
Regular Expression @ MDN - JavaScript Guides
Regular Expression @ MDN - Reference
I hate regex: a collection of commonly used regex examples
Commonly used regular expressions

Creating a regular expression

The rule written in a regular expression is called a pattern. In JavaScript, a pattern can be created either with a regular expression literal or with the RegExp constructor:

Regular expression literals

/**
 * Regular expression literal: compiled when the script is loaded.
 * Performs better when the pattern never changes.
 **/
var re = /ab+c/;

Constructor function

/**
 * RegExp constructor: compiled at runtime.
 * Slower, but suitable when the pattern may change or is not known in advance.
 **/

var re = new RegExp('ab+c');
var myRe = new RegExp('d(b+)d', 'g');

In short, use a literal when the pattern is fixed, and the constructor when the pattern is built dynamically.
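A minimal sketch of the dynamic case: when a pattern comes from a string, backslashes must be doubled, and runtime input usually needs escaping first. The escapeRegExp helper and the sample input are illustrative names, not part of the original notes.

```javascript
// Hypothetical helper: escape characters that have a special meaning in a pattern
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Inside a string the backslash itself is escaped: '\\d+' becomes the pattern \d+
const digitsFromString = new RegExp('\\d+');
const digitsFromLiteral = /\d+/;
console.log(digitsFromString.source === digitsFromLiteral.source); // → true

// Building a pattern from runtime input
const userInput = '1+1';
const unsafe = new RegExp(userInput); // "+" acts as a quantifier here
const safe = new RegExp(escapeRegExp(userInput)); // matches the literal text "1+1"

console.log(unsafe.test('1+1=2')); // → false
console.log(safe.test('1+1=2')); // → true
```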

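Using regular expressions

The following methods in JavaScript work with regular expressions:

RegExp.prototype.test(): checks whether the string contains a match; returns true or false.
RegExp.prototype.exec(): returns an array describing the match, or null if there is none.
String.prototype.match(): returns an array of the matched parts, or null if there is none.
String.prototype.replace(): finds the matching parts of the string and replaces them.
String.prototype.search(): looks for a match; returns its index, or -1 if there is none.
String.prototype.split(): splits the string into an array, using the matches as separators.

In short, use test or search when you only need to know whether a string contains a pattern; use exec or match when you need more information (at a higher performance cost).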
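To compare the methods listed above side by side, here is a small sketch that runs each of them against the same made-up sentence.

```javascript
// Sample text and pattern chosen only for illustration
const text = 'Order 66 shipped on 2020-03-04';
const digits = /\d+/;

console.log(digits.test(text)); // → true
console.log(text.search(digits)); // → 6
console.log(digits.exec(text)[0]); // → '66'
console.log(text.match(digits)[0]); // → '66'
console.log(text.replace(digits, '#')); // → 'Order # shipped on 2020-03-04'

const words = text.split(/\s+/);
console.log(words); // → ['Order', '66', 'shipped', 'on', '2020-03-04']
```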

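String.prototype.replace(): replacing content

Use String.prototype.replace(regex|substr, newSubstr) to replace content. The method returns a new string with the replacement applied and leaves the original string unchanged:

// Write the regex inline (sample input)
const html = '<p>Hello</p>';
const newHtml = html.replace(/<p>/g, '<div class="paragraph">');

// Or build the regex first (sample input)
const wordToBeReplaced = 'hello';
const regex = new RegExp(wordToBeReplaced, 'gi');
const newString = 'Hello, hello!'.replace(regex, 'wordToReplace');

/.../g: global; keep looking for further matches after the first one.
/.../i: case insensitive.

/* Without g, only the first match is replaced */
'banana'.replace(/na/, 'NA'); // 'baNAna'

/* Replace n and the character that follows it */
'banana'.replace(/n./, 'NA'); // 'baNAna'
'banana'.replace(/na/g, 'NA'); // 'baNANA'
'banana'.replace(/na/gi, 'NA'); // 'baNANA'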

With parentheses and `$n`

var re = /(\w+)\s(\w+)/;
var str = 'John Smith';
var newStr = str.replace(re, '$2, $1'); // $1 and $2 refer to the first and second capture groups
console.log(newStr); // "Smith, John"

With a `replacer function`

function replacer(match, p1, p2, p3, offset, string) {
  // p1 is non-digits, p2 digits, and p3 non-alphanumerics
  return [p1, p2, p3].join(' - ');
}
var newString = 'abc12345#$*%'.replace(/([^\d]*)(\d*)([^\w]*)/, replacer);
console.log(newString); // abc - 12345 - #$*%

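String.prototype.match: finding and extracting content

String.prototype.match @ MDN

Use String.prototype.match(regexp) to check whether a string contains content matching the regexp pattern. It returns an array if so, and null otherwise.

If the pattern does not include the g flag, str.match() returns the same result as RegExp.prototype.exec(). The returned array contains:

input property: the original string that was searched
index property: the index at which the match starts
the full match, followed by each captured group

/* Without g, the result is the same as RegExp.prototype.exec() */

let matchedResult = 'An apple a day, keeps apple away.'.match(/(a.)(p.)e/);
// [ 'apple', 'ap', 'pl', index: 3, input: 'An apple a day, keeps apple away.' ]

If the pattern includes the g flag, the returned array contains only the full matched strings, without capture groups:

/* With g, only the matched strings are returned */

let matchedResult = 'An apple a day, keeps apple away.'.match(/(a.)(p.)e/g);
// [ 'apple', 'apple' ]

If the argument is not a RegExp object, it is implicitly converted with new RegExp(obj). If no argument is passed, the result is an array containing an empty string ([""]).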
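The implicit conversion described above can be surprising with strings that contain special characters. A small sketch, using a made-up input:

```javascript
// A string argument is converted with new RegExp(), so "." means "any character"
const viaString = 'a.b'.match('.');
console.log(viaString[0]); // → 'a'

// Escape the dot in a literal to match an actual "."
const viaLiteral = 'a.b'.match(/\./);
console.log(viaLiteral[0], viaLiteral.index); // → '.' 1

// With no argument, the result is an array holding an empty string
const noArgument = 'a.b'.match();
console.log(noArgument[0] === ''); // → true
```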

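Examples

Each () becomes a numbered capture: $1, $2, and so on

let matchedResult = 'An apple a day'.match(/(a.)(p.)e/);
RegExp.$1; // ap (non-standard; matchedResult[1] holds the same value)
RegExp.$2; // pl (non-standard; matchedResult[2] holds the same value)

'banana'.match(/(.)(a)/g); // [ 'ba', 'na', 'na' ]
// across the three matches, $1 captures 'b', 'n', 'n'
// across the three matches, $2 captures 'a', 'a', 'a'

// "/" ends a regex literal, so it must be escaped with a backslash
'2017/05/16'.match(/(.*)\/(.*)\/(.*)/);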
// ['2017/05/16', '2017', '05', '16']

'2017/05/16'.match(/.*\/.*\/.*/);
// [ '2017/05/16' ]

/**
 * Extract part of a URL
 **/

let url = 'https://www.ptt.cc/bbs/CodeJob/M.1513840968.A.F93.html';
let timestamp = url.match(/\/M\.(.+)\.A/);

console.log(timestamp[1]); // 1513840968

// result of timestamp
// [
//   '/M.1513840968.A', // the full text matched by the pattern
//   '1513840968', // the content captured by the () group
//   index: 30, // the index where the match starts
//   input: 'https://www.ptt.cc/bbs/CodeJob/M.1513840968.A.F93.html' // the input string
// ]

Filtering results with filter

Combined with Array.prototype.filter, we can filter the entries of cities by what the user typed (wordToMatch):

function findMatch(wordToMatch, cities) {
  return cities.filter((place) => {
    /**
     * g: global search
     * i: case insensitive search
     **/
    let regex = new RegExp(wordToMatch, 'gi');
    return place.city.match(regex) || place.state.match(regex);
  });
}
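A usage sketch for the filter approach above. The function is repeated in a slightly simplified form so the snippet runs on its own, and the cities array is made-up sample data. This variant uses test() without the g flag, since only a yes/no answer is needed.

```javascript
// Simplified variant of findMatch; sample data below is hypothetical
function findMatch(wordToMatch, cities) {
  const regex = new RegExp(wordToMatch, 'i'); // i: case insensitive
  return cities.filter((place) => regex.test(place.city) || regex.test(place.state));
}

const cities = [
  { city: 'New York', state: 'New York' },
  { city: 'Los Angeles', state: 'California' },
  { city: 'Newark', state: 'New Jersey' },
  { city: 'Austin', state: 'Texas' },
];

const byNew = findMatch('new', cities).map((place) => place.city);
console.log(byNew); // → ['New York', 'Newark']

const byTex = findMatch('tex', cities).map((place) => place.city);
console.log(byTex); // → ['Austin']
```

Note that wordToMatch is used as a pattern as-is, so input containing characters such as ( or + would need escaping before being passed to new RegExp().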

[JS30] Day06: AJAX Type Ahead

String.prototype.search(): checking whether a string contains a match

var str = 'hey JudE';
var re = /[A-Z]/g;
var re2 = /[.]/g;
console.log(str.search(re)); // returns 4, which is the index of the first capital letter "J"
console.log(str.search(re2)); // returns -1 cannot find '.' dot punctuation

RegExp.prototype.test(): checking whether a string contains a match

// Check whether the string consists only of digits
/^[0-9]+$/.test(testString); // returns true or false
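A short sketch of test() in use, with made-up inputs. It also shows one behaviour worth knowing: on a regex with the g flag, test() continues from lastIndex, so calling it twice on the same string can give different answers.

```javascript
const onlyDigits = /^[0-9]+$/;
console.log(onlyDigits.test('12345')); // → true
console.log(onlyDigits.test('12a45')); // → false

// With the g flag, test() resumes searching from lastIndex
const globalDigits = /[0-9]+/g;
const first = globalDigits.test('42'); // matches; lastIndex becomes 2
const second = globalDigits.test('42'); // starts at index 2, finds nothing; lastIndex resets to 0
console.log(first, second); // → true false
```

For a plain yes/no check, leaving out the g flag avoids this.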

RegExp.prototype.exec: finding and extracting content

let regexp = /d(b+)d/g;
let matchedResult = regexp.exec('cdbbdbsbz');
// [ 'dbbd', 'bb', index: 1, input: 'cdbbdbsbz' ]

// Use a while loop to collect every match (the regex needs the g flag)
const fooRegexp = /foo*/g;
const sentence = 'table football, foosball';
let arr;
while ((arr = fooRegexp.exec(sentence)) !== null) {
  console.log(`Found ${arr[0]}. Next starts at ${fooRegexp.lastIndex}.`);
  // expected output: "Found foo. Next starts at 9."
  // expected output: "Found foo. Next starts at 19."
}

Using `RegExp.$1` to read captured values

This is a non-standard usage; do not rely on it in production:

var re = /(\w+)\s(\w+)/;
var str = 'John Smith';
str.replace(re, '$2, $1'); // "Smith, John"
RegExp.$1; // "John"
RegExp.$2; // "Smith"

RegExp.$1-$9 @ MDN

Groups and Named Capture Groups

Parentheses () split the matched content into groups, which are placed in the result array:

const regexp = /(\w+)\.jpg/;
const matched = regexp.exec('File name: cat.jpg');
// [
//   'cat.jpg',
//   'cat',
//   index: 11,
//   input: 'File name: cat.jpg',
//   groups: undefined
// ]
const fileName = matched[1]; // cat

The matched content is stored in an array, so destructuring assignment can pull out the parts you need:

const [match, year, month, day] = regexpWithGroup.exec(str);
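A runnable version of the destructuring line above. The pattern and input are assumptions that reuse the placeholder names regexpWithGroup and str; the `|| []` fallback is there because exec() returns null when nothing matches.

```javascript
// Hypothetical pattern and input, named after the placeholders above
const regexpWithGroup = /(\d{4})-(\d{2})-(\d{2})/;
const str = 'Released on 2020-03-04';

// Fall back to an empty array so destructuring does not throw on null
const [match, year, month, day] = regexpWithGroup.exec(str) || [];

console.log(match); // → '2020-03-04'
console.log(year, month, day); // → 2020 03 04
```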

Named groups

❗ Named groups are an ES2018 feature, so check compatibility.

ES2018 adds the (?<name>…) syntax for naming a group. All named groups are collected in an object called groups:

// Use (?<name>...) to name a group
const regexp = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = regexp.exec('2020-03-04');

console.log(match.groups); // {year: "2020", month: "03", day: "04"}
console.log(match.groups.year); // 2020
console.log(match.groups.month); // 03
console.log(match.groups.day); // 04

If a named group takes no part in the match, its property still exists on groups, but its value is undefined.
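A small sketch of that behaviour, using a made-up pattern with two alternative named groups so that only one of them can take part in a match:

```javascript
// Either a four-digit year or a lowercase word; only one group participates
const yearOrWord = /(?<year>\d{4})|(?<word>[a-z]+)/;
const result = yearOrWord.exec('abc');

console.log(result.groups.word); // → 'abc'
console.log('year' in result.groups); // → true
console.log(result.groups.year); // → undefined
```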

Use (?:...) to group part of a pattern without capturing it (a non-capturing group, also called a shy group).
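To illustrate the difference, the sketch below runs the same made-up pattern with a capturing and a non-capturing first group and compares the captured values.

```javascript
const address = 'https://example';

// Capturing: the protocol occupies slot 1
const capturing = /(https?):\/\/(\w+)/.exec(address);
console.log(capturing.slice(1)); // → ['https', 'example']

// Non-capturing: the protocol is still matched, but not stored
const nonCapturing = /(?:https?):\/\/(\w+)/.exec(address);
console.log(nonCapturing.slice(1)); // → ['example']
```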

Using with replace

replace also accepts a function as its second argument. The function receives the matched text and each captured group:

const str = 'War & Peace';

const result = str.replace(
  /(?<War>War) & (?<Peace>Peace)/,
  function (match, group1, group2, offset, string) {
    return group2 + ' & ' + group1;
  },
);

console.log(result); // → Peace & War
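Named groups can also be referenced by name when replacing. The sketch below shows two options: the `$<name>` syntax in a replacement string, and the groups object that a replacer function receives as its last argument. The group names first and second are chosen for illustration.

```javascript
const title = 'War & Peace';
const pair = /(?<first>\w+) & (?<second>\w+)/;

// Option 1: refer to named groups inside the replacement string
const viaString = title.replace(pair, '$<second> & $<first>');
console.log(viaString); // → 'Peace & War'

// Option 2: the replacer's last argument is the groups object
const viaFunction = title.replace(pair, (...args) => {
  const groups = args[args.length - 1];
  return `${groups.second} & ${groups.first}`;
});
console.log(viaFunction); // → 'Peace & War'
```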

Sample Code

Sample code @ repl.it

Special characters

Regular Expression Reference @ MDN

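Flags

regex = /hello/; // case sensitive: matches "hello", "hello123", "123hello123", "123hello", but not "hell0" or "Hello"
regex = /hello/i; // case insensitive: matches "hello", "HelLo", "123HelLO"
regex = /hello/g; // global search

dotAll Flag, `/s`

ES2018 added the s flag. Without it, . matches every character except line terminators (such as \n and \r):

// Without the s flag, . matches any character except line terminators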
console.log(/./.test('\n')); // → false
console.log(/./.test('\r')); // → false

Before the s flag existed, [\w\W] could be used to match line terminators as well, but it is not the clearest option:

console.log(/[\w\W]/.test('\n')); // → true
console.log(/[\w\W]/.test('\r')); // → true

Since ES2018, adding the s flag makes . match line terminators too:

console.log(/./s.test('\n')); // → true
console.log(/./s.test('\r')); // → true
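A slightly more practical sketch of the s flag: matching across a line break in a made-up two-line string.

```javascript
const multiline = 'first line\nsecond line';

// Without s, ".*" stops at the line break
const withoutFlag = /first.*second/.test(multiline);
console.log(withoutFlag); // → false

// With s, "." also matches "\n"
const withFlag = /first.*second/s.test(multiline);
console.log(withFlag); // → true

// The dotAll property reports whether the flag is set
console.log(/./s.dotAll); // → true
```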

Ordinary characters `//`

var regex = /a/;
var regex = /is/;

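Backslash `\`

/* A backslash before a non-special character gives that character a special meaning */
var regex = /\b/; // b is an ordinary character; \b means word boundary

/* A backslash before a special character makes it match literally */
var regex = /if\(true/; // ( is normally special; here it matches a literal (
var regex = /1\+2=3/; // + is normally special; here it matches a literal +

Any single character `.`

Matches any single character except line terminators (such as \n):

var regex = /a.man/; // a, then any one character, then man: matches "acman" and "awman", but not "a\nman"

var regex = /.a/; // any one character followed by a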

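Character sets `[]`

// lowercase a or uppercase A
var regex = /[aA]/;

// any character that is not a or A
var regex = /[^aA]/;

// any of a, e, i, o, u
var regex = /[aeiou]/;

// English letters
var regex = /[a-z]/; // any lowercase letter, a through z
var regex = /[A-Z]/; // any uppercase letter, A through Z
var regex = /[a-zA-Z]/; // any English letter

// digits 5 through 8
var regex = /[5-8]/;

Parentheses `()`: applying to every alternative

var regex = /^a|^the|^an/; // ^ has to be repeated for each alternative

var regex = /^(a|the|an)/; // equivalent: ^ applies to every alternative inside the group

Negation `^`

/* matches any character that is not a */
var regex = /[^a]/;

/* matches any character that is not a digit */
var regex = /[^0-9]/;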

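Character class shorthands

keywords: `\d`, `\w`, `\s`, `\b`, `\D`, `\W`, `\S`

\d: digit, [0-9]
\w: word character: English letters in either case, digits, and underscore, [A-Za-z0-9_]
\s: whitespace, including space, tab, form feed, line feed, and other Unicode spaces, [ \f\n\r\t\v\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]
\D: not a digit, same as [^\d]
\W: not a word character, same as [^\w]
\S: not whitespace, same as [^\s]

/* any word character followed by e */
var regex = /\we/;

/* two consecutive digits */
var regex = /\d\d/;

/* words in a sentence that end with s */
var regex = /s\b/;

var regex = /\b[a-z]/g; // the first letter of each word in a sentence

Other special characters

\t: tab
\b: word boundary, the position between a word character and a non-word character (or the start or end of the string). It matches a position, not the whitespace itself. For example, /s\b/ matches the s at the end of a word.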
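The two \b patterns above can be tried out as follows. The sentences are made up, and the second pattern adds \w+ in front so the whole word is returned rather than only the final s.

```javascript
// /\b[a-z]/g: the first letter of each word
const sentence = 'regular expressions are fun';
const capitalized = sentence.replace(/\b[a-z]/g, (letter) => letter.toUpperCase());
console.log(capitalized); // → 'Regular Expressions Are Fun'

// /s\b/ with \w+ in front: whole words that end with s
const wordsEndingInS = 'cats chase dogs daily'.match(/\w+s\b/g);
console.log(wordsEndingInS); // → ['cats', 'dogs']
```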

Word boundary `\b`, `\B`

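\b matches a word boundary: a position where a word character (\w) sits next to a non-word character, or next to the start or end of the string. It matches a position, not a character.

Note that \b and [\b] are different: [\b] matches a backspace character.

// Only the standalone word "is" is matched; the "is" inside "This" is not
let matchedResult = 'This is an apple.'.match(/\bis\b/);
// [ 'is', index: 5, input: 'This is an apple.' ]

Conversely, \B matches a non-word boundary, which includes: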

Before the first character of the string, if the first character is not a word character.
After the last character of the string, if the last character is not a word character.
Between two word characters
Between two non-word characters
The empty string

// With \B, the "is" inside "This" is matched

let matchedResult = 'This is an apple.'.match(/\Bis/);
// [ 'is', index: 2, input: 'This is an apple.' ]

Quantifiers `* + ? {} {, }`

keywords: `*`, `+`, `?`, `{n}`, `{min,max}`

*: any number of times, same as {0,}
+: at least once, same as {1,}
?: zero or one time (optional), same as {0,1}
{n}: exactly n times
{min,max}: at least min and at most max times

var regex = /abc/; // matches "abc"

var regex = /ab*c/; // * means the preceding character may appear 0 or more times, so ac, abc, and abbbbc all match

var regex = /n?a/; // the n is optional

var regex = /a{2}/; // a exactly 2 times, i.e. "aa"

var regex = /a{2,4}/; // a between 2 and 4 times

var regex = /a{2,}/; // a 2 or more times; do not put a space after the comma inside the braces

var regex = /(hello){4}/; // hello 4 times: hellohellohellohello

var regex = /\d{3}/; // 3 digits
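A quick way to check the quantifiers above is to anchor the pattern with ^ and $, so the whole string has to fit. The sample strings below are made up.

```javascript
// * allows zero or more b's between a and c
const samples = ['ac', 'abc', 'abbbbc', 'adc'];
const matching = samples.filter((s) => /^ab*c$/.test(s));
console.log(matching); // → ['ac', 'abc', 'abbbbc']

// {2,4}: between 2 and 4 repetitions when the whole string is anchored
const counts = ['a', 'aa', 'aaaa', 'aaaaa'].map((s) => /^a{2,4}$/.test(s));
console.log(counts); // → [false, true, true, false]

// Without anchors the pattern only needs to fit somewhere in the string,
// and it takes as many repetitions as allowed
const longest = 'aaaaa'.match(/a{2,4}/)[0];
console.log(longest); // → 'aaaa'
```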

Start and end

keywords: `^`, `$`

^ start
$ end

/* Only strings that start with A match */
/^A/gm.test('Abc'); // true
/^A/gm.test('bac'); // false

/* starts with He */
var regex = /^He/;

/* ends with llo */
var regex = /llo$/;

/* starts with He and ends with llo, with any characters in between, any number of times */
var regex = /^He.*llo$/;

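Alternation `|`

// and or android: alternatives are tried left to right, so in "android" this matches `and` and never reaches `android`
var regex = /and|android/;

// android is tried first; `and` still matches where android does not
var regex = /android|and/;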
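The effect of the ordering can be seen by running both patterns against the same inputs:

```javascript
// Shorter alternative first: "and" wins even though "android" is present
const shortFirst = 'android'.match(/and|android/)[0];
console.log(shortFirst); // → 'and'

// Longer alternative first: the full word is matched
const longFirst = 'android'.match(/android|and/)[0];
console.log(longFirst); // → 'android'

// The second alternative is still used when the first cannot match
const fallback = 'sand'.match(/android|and/)[0];
console.log(fallback); // → 'and'
```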

Lookaround Assertions

keywords: `x(?=y)`, `x(?!y)`, `(?<=y)x`, `(?<!y)x`

Lookahead assertions: x(?=y), x(?!y)
Lookbehind assertions: (?<=y)x, (?<!y)x

Look Ahead

?=: must be followed by
?!: must not be followed by

// foo(?=bar): foo is matched only when it is followed by bar
const regexp = /foo(?=bar)/;
regexp.exec('foo'); // null
regexp.exec('bar'); // null
regexp.exec('foobar'); // [ 'foo', index: 0, input: 'foobar', groups: undefined ]

// foo(?!bar): foo is matched only when it is not followed by bar
const regexp = /foo(?!bar)/;
regexp.exec('foo'); // [ 'foo', index: 0, input: 'foo', groups: undefined ]
regexp.exec('foo123'); // [ 'foo', index: 0, input: 'foo123', groups: undefined ]
regexp.exec('bar'); // null
regexp.exec('foobar'); // null

Look Behind

?<=: must be preceded by
?<!: must not be preceded by

// (?<=foo)bar: bar is matched only when it is preceded by foo
const regexp = /(?<=foo)bar/;
regexp.exec('foo'); // null
regexp.exec('bar'); // null
regexp.exec('foobar'); // [ 'bar', index: 3, input: 'foobar', groups: undefined ]

// (?<!foo)bar: bar is matched only when it is not preceded by foo
const regexp = /(?<!foo)bar/;
regexp.exec('foo'); // null
regexp.exec('bar'); // [ 'bar', index: 0, input: 'bar', groups: undefined ]
regexp.exec('123bar'); // [ 'bar', index: 3, input: '123bar', groups: undefined ]
regexp.exec('foobar'); // null

❗ Lookbehind assertions are ES2018 syntax, so check compatibility.
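Lookarounds are handy for extracting a value based on what surrounds it, without including the surrounding text in the match. A minimal sketch with made-up inputs:

```javascript
// Lookbehind: a number that comes right after "$"
const priceText = 'Total: $42.50 (was 50 EUR)';
const dollars = priceText.match(/(?<=\$)\d+(?:\.\d+)?/)[0];
console.log(dollars); // → '42.50'

// Lookahead: a number that comes right before "%"
const percentText = '20% of 300';
const percent = percentText.match(/\d+(?=%)/)[0];
console.log(percent); // → '20'
```

In both cases the $ and % are required for the match but are not part of the returned text.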

Backreferences

backreferences @ javascript.info

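The syntax is \N, where N is a number (for example, \1). It refers back to the text captured by group N, which is useful when something must appear symmetrically, such as a matching pair of quotes.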
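Two small sketches of \1 with made-up inputs: finding text between a matching pair of quotes, and spotting a word that is accidentally repeated.

```javascript
// (['"]) captures the opening quote; \1 requires the same character to close
const quoted = /(['"])(.*?)\1/g;
const source = `He said "it's fine" and 'done'`;
const contents = [...source.matchAll(quoted)].map((m) => m[2]);
console.log(contents); // → ["it's fine", 'done']

// \1 must equal the word captured by group 1
const doubled = 'This is is a test'.match(/\b(\w+)\s+\1\b/)[0];
console.log(doubled); // → 'is is'
```

The apostrophe in "it's" does not end the first match because the opening quote was a double quote.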

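Greedy Mode

greedy and lazy quantifiers @ JavaScript.info

Quantifiers are greedy by default: they match as much as possible. To make a quantifier lazy, so that it stops at the first possible match, append ? after *, +, and similar quantifiers, for example .*? and .+?.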
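The difference is easiest to see on a string with more than one possible ending. The markup below is a made-up example.

```javascript
const markup = '<b>one</b> and <b>two</b>';

// Greedy: .* runs to the last </b>
const greedy = markup.match(/<b>.*<\/b>/)[0];
console.log(greedy); // → '<b>one</b> and <b>two</b>'

// Lazy: .*? stops at the first </b>
const lazy = markup.match(/<b>.*?<\/b>/)[0];
console.log(lazy); // → '<b>one</b>'

// With g, the lazy version finds each element separately
const each = markup.match(/<b>.*?<\/b>/g);
console.log(each); // → ['<b>one</b>', '<b>two</b>']
```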

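Common examples

Date of birth (Gregorian calendar)

var regex = /^[1-9]\d{3}-\d{2}-\d{2}$/;

^[1-9]: the first character must be 1-9
\d{3}: followed by three digits
\d{2}: two digits each for the month and the day

National ID number

var regex = /^[A-Z]\d{9}$/;

One uppercase English letter followed by 9 digits

GMAIL

var regex = /^\w+@gmail\.com$/;

+: the part before @ cannot be empty
\.: escaped because . is a special character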
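The three patterns above can be exercised with a few made-up inputs. Note that they check the shape of the text only: the date pattern accepts any two digits for month and day, and \w+ does not allow dots before the @.

```javascript
const birthday = /^[1-9]\d{3}-\d{2}-\d{2}$/;
const idNumber = /^[A-Z]\d{9}$/;
const gmail = /^\w+@gmail\.com$/;

const results = {
  validDate: birthday.test('1990-07-15'), // true
  leadingZero: birthday.test('0990-07-15'), // false: first digit must be 1-9
  shortMonth: birthday.test('1990-7-15'), // false: month needs two digits
  impossibleDate: birthday.test('2020-99-99'), // true: shape only, not a calendar check
  validId: idNumber.test('A123456789'), // true
  lowercaseId: idNumber.test('a123456789'), // false
  validGmail: gmail.test('someone@gmail.com'), // true
  emptyLocalPart: gmail.test('@gmail.com'), // false
  dottedLocalPart: gmail.test('first.last@gmail.com'), // false: \w excludes "."
};
console.log(results);
```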

URL

/* requires the http(s) protocol */
var regex =
  /https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)/;

/* does not require the http(s) protocol */
var regex = /[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)/;

What is a good regular expression to match a url @ StackOverflow

HTML tags

Match HTML Tag

// The key is [^>]*, which matches anything other than >

var regex = /<\s*a[^>]*>(.*?)<\s*\/\s*a>/g;
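A usage sketch for the pattern above, pulling the link text out of a made-up HTML fragment with matchAll. This is a rough illustration rather than a robust HTML parser; for instance, the a[^>]* part would also accept tag names that merely start with a.

```javascript
const anchorTag = /<\s*a[^>]*>(.*?)<\s*\/\s*a>/g;

// Hypothetical fragment with two links
const page = '<p><a href="/home">Home</a> | <a class="x" href="/about">About</a></p>';

// Group 1 holds the text between the opening and closing tags
const linkTexts = [...page.matchAll(anchorTag)].map((m) => m[1]);
console.log(linkTexts); // → ['Home', 'About']
```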

Note sources

5/17 19:30 regular expression talk by 承億 @ online study group
Regular Expression Reference @ MDN - Global Object Reference
Regular Expression Guide @ MDN - JavaScript Guides
regexCheatSheet @ Gist
New JavaScript Features That Will Change How You Write Regex @ Smashing Magazine
