How do I read the regex patterns in my rejections?

What is regex?

Regex, or regular expressions, are special sequences used to find or match patterns in strings. These sequences use metacharacters and other syntax to represent sets, ranges, or specific characters. For example, the expression [0-9] matches any single digit from 0 to 9.

Regular Expression Language - Quick Reference

Some terminology:

  • pattern: regular expression pattern
  • string: test string used to match the pattern
  • digit: 0-9
  • letter: a-z, A-Z
  • symbol: !$%^&*()_+|~-=`{}[]:";'<>?,./
  • space: single white space, tab
  • character: refers to a letter, digit or symbol
  • Partial range: selections such as [a-f] or [g-p].
  • Capitalized range: [A-Z].
  • Digit range: [0-9].
  • Symbol range: for example, [#$%&@].
  • Mixed range: for example, [a-zA-Z0-9] includes all digits and all lower and upper case letters.

Note that a range only specifies the alternatives for a single character in a pattern. Ranges follow the order of characters in the ASCII table, so it helps to look at the full table to see how a range is defined.
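The following is a minimal sketch of how ranges behave, assuming Python's built-in re module; the engine behind your rejections may differ in small details. The sample strings are made up for illustration.

```python
import re

# A bracketed range stands for exactly one character in the pattern.
mixed = re.compile(r"[a-zA-Z0-9]")

# Illustrative test strings (not taken from any real rejection).
samples = ["a", "Z", "7", "#", " ", "ab"]
range_results = {s: bool(mixed.fullmatch(s)) for s in samples}
print(range_results)

# Ranges follow character-code order, so [a-f] covers a, b, c, d, e, f.
partial_hits = re.findall(r"[a-f]", "badge")
print(partial_hits)
```

The two-character string "ab" fails because the range describes a single character, not a sequence.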

Some helpful basics:

Square Brackets ([]):

The name might sound scary, but it is simply the symbol []. Square brackets define what regex jargon calls a character class: the pattern matches any single character listed inside the brackets. For instance:

Pattern          Matches
[Pp]enguin       Penguin, penguin
[0123456789]     any single digit
[0oO]            0, o, O
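As a quick check of the character classes above, here is a small sketch that assumes Python's re module. The input strings are illustrative only.

```python
import re

# [Pp] accepts either an upper or lower case P, followed by "enguin".
penguin = re.compile(r"[Pp]enguin")
bracket_hits = penguin.findall("Penguin, penguin, PENGUIN")
print(bracket_hits)

# [0oO] matches one character that is a zero or the letter o in either case.
zero_like = re.findall(r"[0oO]", "O0o-x")
print(zero_like)
```

"PENGUIN" is not matched because only the first letter is covered by the class; the remaining letters must be lower case.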


Disjunction (|):

The pipe symbol means "or": the pattern matches either the expression on its left or the expression on its right. It is helpful when any one of several strings should match. For instance:

Pattern               Matches
A|B|C                 A, B, C
Black|White           Black, White
[Bb]lack|[Ww]hite     Black, black, White, white
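A short sketch of the pipe in action, assuming Python's re module and a made-up sentence:

```python
import re

# Either alternative may match; each side can use its own character class.
colours = re.compile(r"[Bb]lack|[Ww]hite")
pipe_hits = colours.findall("Black cat, white dog, grey bird, WHITE")
print(pipe_hits)
```

"grey" matches neither alternative, and "WHITE" fails because only its first letter may be upper case.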


Question Mark (?):

The question mark symbol means that the character or group it comes after is optional: it may appear once or not at all. For instance:

Pattern      Matches
Ab?c         Ac, Abc
Colou?r      Color, Colour
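To see the optional character at work, here is a minimal sketch assuming Python's re module. fullmatch is used so that the whole string has to fit the pattern.

```python
import re

# The u is optional: zero or one occurrence is accepted, two are not.
colour = re.compile(r"Colou?r")
optional_results = [bool(colour.fullmatch(s)) for s in ["Color", "Colour", "Colouur"]]
print(optional_results)
```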


Asterisk (*):

The asterisk symbol matches 0 or more occurrences of the preceding character or group. For instance:

Pattern       Meaning                                     Matches
Sh*           0 or more of the preceding character h      S, Sh, Shh, Shhh, ...
(banana)*     0 or more of the preceding group banana     banana, bananabanana, bananabananabanana, ...

Because zero occurrences are allowed, (banana)* also matches the empty string.
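The sketch below checks the asterisk examples, assuming Python's re module. Other engines should agree on these simple cases, but that is an assumption rather than a guarantee.

```python
import re

# Sh* : an S followed by zero or more h characters.
sh_star = re.compile(r"Sh*")
sh_star_results = [bool(sh_star.fullmatch(s)) for s in ["S", "Sh", "Shhh", "h"]]
print(sh_star_results)

# (banana)* : the whole group repeated zero or more times.
banana_star = re.compile(r"(banana)*")
banana_star_results = [
    bool(banana_star.fullmatch(s))
    for s in ["", "banana", "bananabanana", "banan"]
]
print(banana_star_results)
```

The empty string is accepted by (banana)* because zero repetitions are allowed, while the incomplete "banan" is not.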


Plus (+):

The plus symbol matches 1 or more occurrences of the preceding character or group. For instance:

Pattern       Meaning                                     Matches
Sh+           1 or more of the preceding character h      Sh, Shh, Shhh, ...
(banana)+     1 or more of the preceding group banana     banana, bananabanana, bananabananabanana, ...
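Here is a small sketch of the plus symbol, assuming Python's re module and illustrative inputs:

```python
import re

# Sh+ needs at least one h, so a lone S is skipped.
plus_hits = re.findall(r"Sh+", "S Sh Shh")
print(plus_hits)

# (banana)+ needs at least one full repetition of the group.
banana_plus_results = [
    bool(re.fullmatch(r"(banana)+", s)) for s in ["", "banana", "bananabanana"]
]
print(banana_plus_results)
```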


Difference between Asterisk (*) and Plus (+):

The difference between the asterisk and the plus confuses many people; even experts sometimes have to look it up. However, there is an easy way to remember the distinction.

Imagine you have the number 1 and you multiply it by 0:
1*0 = 0, so * means 0 or more occurrences of the preceding character or group.

Now suppose you have the same number 1 and you add 0 to it:
1+0 = 1, so + means 1 or more occurrences of the preceding character or group.

It is that simple when you approach it intuitively.

Negation (^):

The caret (^) has two everyday use cases:

1. As the first character inside square brackets, it negates the class: the pattern matches any single character that is not listed in the brackets. For instance:

Pattern            Matches
[^Aa]              any character that is not A or a
[^0123456789]      any character that is not a digit

2. Outside square brackets, it acts as an anchor that matches expressions at the start of a line only. For instance:

Pattern              Matches
^Apple               every Apple that is at the start of any line in the text
^(Apple|Banana)      every Apple and Banana that is at the start of any line in the text
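Both uses of the caret can be sketched as follows, assuming Python's re module. In Python the caret only anchors at the start of every line when the re.MULTILINE flag is set; whether your engine behaves this way by default is an assumption worth checking. The text is made up for illustration.

```python
import re

# Use 1: inside brackets, ^ negates the class.
non_digits = re.findall(r"[^0123456789]", "a1b2-3")
not_a = re.findall(r"[^Aa]", "Banana")
print(non_digits, not_a)

# Use 2: outside brackets, ^ anchors the match to the start.
text = "Apple pie\nGreen Apple\nBanana split"

# Without re.MULTILINE, ^ matches only at the very start of the string.
start_only = re.findall(r"^(Apple|Banana)", text)

# With re.MULTILINE, ^ also matches right after each newline.
each_line = re.findall(r"^(Apple|Banana)", text, flags=re.MULTILINE)
print(start_only, each_line)
```

The Apple in "Green Apple" is never matched because it does not sit at the start of its line.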


Dollar ($):

The dollar sign is an anchor that matches expressions at the end of a line. It is written after the expression it anchors. For instance:

Pattern            Matches
[0123456789]$      any digit at the end of any line in the text
([Pp]anda)$        every Panda and panda at the end of any line in the text
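The end-of-line anchor can be sketched like this, assuming Python's re module with the re.MULTILINE flag so that $ applies to every line. The text is illustrative. Note that the $ goes after the expression: a $ placed in front of it asks for a character after the end of the line, which cannot match.

```python
import re

text = "Room 7\nRed panda\nPanda 42x"

# A digit immediately followed by the end of a line.
end_digits = re.findall(r"[0123456789]$", text, flags=re.MULTILINE)

# Panda or panda immediately followed by the end of a line.
end_pandas = re.findall(r"([Pp]anda)$", text, flags=re.MULTILINE)

# With $ in front, the digit would have to come after the line end.
misplaced = re.findall(r"$[0123456789]", text, flags=re.MULTILINE)

print(end_digits, end_pandas, misplaced)
```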