Beginner’s Guide to Regex Expressions

Regex expressions, or regular expressions, are sequences of characters that define a search pattern for text. The sheer number of regex expressions may seem overwhelming to a beginner programmer, but once broken down, these concepts are quite simple and can be very useful when applied effectively.

Before we get into regex expressions, here are the common types of characters and symbols you will encounter with most regex applications:

Letters- abcdefg.. ABCDEFG (note: Regex is case sensitive! We will want to watch out for this as we search through our patterns).

Numbers- 0123456789

MetaCharacters- .[{()\^$|?*+ (note: to match one of these characters literally, escape it with a backslash. For instance, to search for a period we would type \.)
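
To see what escaping changes, here is a minimal sketch using Python's built-in re module. Python is our choice for illustration (the guide is not tied to one language), and the sample sentence is made up.

```python
import re

# Made-up sample text for illustration.
text = "Version 3.14 is not 3x14."

# Unescaped, the dot is a metacharacter and matches any character.
unescaped = re.findall(r"3.14", text)

# Escaped with a backslash, the dot matches only a literal period.
escaped = re.findall(r"3\.14", text)

print(unescaped)  # ['3.14', '3x14']
print(escaped)    # ['3.14']
```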

Types of regex expressions

The following types of regex expressions can be used to sort through data and find specific attributes (letters, numbers, symbols, etc.). I’ll begin by introducing the different types of regex expressions and their functionality, then apply these expressions to real-world scenarios.

Regex Expressions

. - Matches any character except a newline.

^ - Matches only at the beginning of a string.
$ - Matches only at the end of a string.

\d - Matches any digit (0-9).
\D - Matches any character that is not a digit.

\w - Matches any word character (a-z, A-Z, 0-9, _). Does not match any MetaCharacters.
\W - Matches any non-word character (including all MetaCharacters, spaces, and punctuation).

\s - Matches any whitespace character (space, tab, newline).
\S - Matches any non-whitespace character.

\b - Matches at a word boundary, such as the position between a word character and a non-word character.
\B - Matches at any position that is not a word boundary.

[ ] - Matches any one of the characters inside the brackets.
[^ ] - Matches any one character that is not inside the brackets.
| - Either/or: matches the expression on its left or the one on its right.
( ) - Groups regex expressions together.
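
A few of these expressions can be tried out with a short Python sketch. The sample string and variable names below are made up for illustration, and the results shown are what Python's re module returns for this particular string.

```python
import re

# Made-up sample text for illustration.
sample = "Order 66 shipped to cat_lady on 2020-05-04"

four_digits = re.findall(r"\d\d\d\d", sample)    # four digits in a row
symbols = re.findall(r"[^\w\s]", sample)         # neither word characters nor whitespace
starts_with_order = bool(re.search(r"^Order", sample))
ends_with_order = bool(re.search(r"Order$", sample))
# "_" counts as a word character, so there is no boundary right after "cat".
whole_word_cat = bool(re.search(r"\bcat\b", sample))
# With one group, findall returns what the group matched.
small_words = re.findall(r"\b(to|on)\b", sample)

print(four_digits)        # ['2020']
print(symbols)            # ['-', '-']
print(starts_with_order)  # True
print(ends_with_order)    # False
print(whole_word_cat)     # False
print(small_words)        # ['to', 'on']
```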

Quantifiers:

* - Matches 0 or more
+ - Matches 1 or more
? - Matches 0 or 1
{2} - Matches an exact number (here, exactly 2)
{2,3} - Matches a range of counts {min,max} (here, 2 or 3)
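
The quantifiers can be compared side by side with a small Python sketch. The helper function matching and the list of candidate strings are hypothetical names made up for this example. Each quantifier applies to the letter a directly before it.

```python
import re

# Made-up candidate strings with zero to four "a" characters.
candidates = ["ct", "cat", "caat", "caaat", "caaaat"]

def matching(pattern):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

print(matching(r"ca*t"))      # ['ct', 'cat', 'caat', 'caaat', 'caaaat']
print(matching(r"ca+t"))      # ['cat', 'caat', 'caaat', 'caaaat']
print(matching(r"ca?t"))      # ['ct', 'cat']
print(matching(r"ca{2}t"))    # ['caat']
print(matching(r"ca{2,3}t"))  # ['caat', 'caaat']
```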

To search for a range:
[a-z]
To search for multiple ranges create them back to back:
[a-zA-Z]
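
As a quick check of how ranges behave, here is a Python sketch on a made-up string. It also shows the case sensitivity mentioned earlier: a lowercase-only range skips the capital letters.

```python
import re

# Made-up sample text for illustration.
greeting = "Hello World 2024"

lower_only = re.findall(r"[a-z]+", greeting)            # lowercase letters only
any_letter = re.findall(r"[a-zA-Z]+", greeting)         # two ranges back to back
letters_digits = re.findall(r"[a-zA-Z0-9]+", greeting)  # three ranges back to back

print(lower_only)      # ['ello', 'orld']
print(any_letter)      # ['Hello', 'World']
print(letters_digits)  # ['Hello', 'World', '2024']
```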

Regex Expression Application

Let’s begin with a simple email example. Suppose we have a block of text that includes the names, numbers, and emails of all site users. We would like to create a regex expression that extracts all of our users’ email addresses to be used for a newsfeed.

Let’s begin with a sample record:

Mark Antonio

MarkAntonio@email.com

123-456-7890

We want to create a regex expression to extract the email information above.

We will begin by specifying a range of letters (both uppercase and lowercase) that may exist in our email.

[a-zA-Z]+

We can use the + quantifier here to match one or more of these characters.

So far we’ve matched the following characters of our email: MarkAntonio

As we know that the next symbol is an @, we can simply specify:

[a-zA-Z]+@

Following the @ sign, there are a series of characters which can be matched using the same range of characters used for the initial character matching.

[a-zA-Z]+@[a-zA-Z]+

So far we’ve matched: MarkAntonio@email

Now we can simply specify the domain ending, with a backslash before the . MetaCharacter so that it matches a literal period:

\.com

Overall expression: [a-zA-Z]+@[a-zA-Z]+\.com
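
Here is a minimal Python sketch that applies the overall expression to the sample record. The variable names are made up, and the phone number is written with plain hyphens.

```python
import re

# The sample record from above, as one block of text.
record = """Mark Antonio
MarkAntonio@email.com
123-456-7890"""

email_pattern = re.compile(r"[a-zA-Z]+@[a-zA-Z]+\.com")

# findall returns every non-overlapping match in the text.
emails = email_pattern.findall(record)
print(emails)  # ['MarkAntonio@email.com']
```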

Great job! We were able to use the regex expression to match a simple email address pattern. Now let's consider a scenario with more complex email addresses:

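1. What about an email that mixes lowercase and uppercase letters? Our existing expression already handles it:
Martian@email.com
[a-zA-Z]+@[a-zA-Z]+\.com

2. What about a student email, or an email with numbers and special characters? Here we add digits, periods, and hyphens to the character sets, and replace com with a character set so that endings such as edu or net also match:
Mar.Tian@email.edu
Martian1-23@yahoo.net
[a-zA-Z0-9.-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+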
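
The broader expression can be checked against all three addresses with a short Python sketch. The pattern below is typed with plain ASCII hyphens, the list of addresses (including the non-email string) is made up for the test, and the expression is a teaching pattern rather than a complete email validator.

```python
import re

broad_email = re.compile(r"[a-zA-Z0-9.-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+")

addresses = [
    "Martian@email.com",
    "Mar.Tian@email.edu",
    "Martian1-23@yahoo.net",
    "not-an-email",  # has no @ sign, so it should be rejected
]

# fullmatch requires the whole string to fit the pattern.
matched = [a for a in addresses if broad_email.fullmatch(a)]
print(matched)
# ['Martian@email.com', 'Mar.Tian@email.edu', 'Martian1-23@yahoo.net']
```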

Next, let’s match the phone number from our sample:

123-456-7890

A first attempt uses \d for each digit and a . for each separator:
\d\d\d.\d\d\d.\d\d\d\d

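As a reminder, \d matches any single digit from 0 to 9. The unescaped . matches any character, so this expression would also accept something like 123x456y7890.

To allow only specific separators, create a character set that lists the exact characters you want to match (here, a dash or a period):
\d\d\d[-.]\d\d\d[-.]\d\d\d\d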

A more compact way to write this uses quantifiers:
\d{3}[-.]\d{3}[-.]\d{4}
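
The difference between the loose first attempt and the character-set version shows up in a short Python sketch. The sample sentence and variable names are made up for illustration.

```python
import re

# Made-up sample text: two real-looking numbers and one impostor.
text = "Call 123-456-7890 or 321.555.1234, not 123x456y7890."

# The unescaped dot accepts any separator, including letters.
loose = re.findall(r"\d\d\d.\d\d\d.\d\d\d\d", text)

# The character set [-.] accepts only a dash or a period.
strict = re.findall(r"\d{3}[-.]\d{3}[-.]\d{4}", text)

print(loose)   # ['123-456-7890', '321.555.1234', '123x456y7890']
print(strict)  # ['123-456-7890', '321.555.1234']
```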

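You do not need to escape most MetaCharacters inside brackets, because they lose their special meaning there. That is why the . in [-.] matches only a literal period. The exceptions are ], \, a ^ placed first, and a - placed between two characters.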
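
A brief Python sketch of how MetaCharacters behave inside brackets, using made-up strings:

```python
import re

# Inside brackets, . + * ? $ are treated as literal characters.
literal_symbols = re.findall(r"[.+*?$]", "2+2=4. Cost: $5*3?")

# A hyphen placed last in the brackets is literal, while a-c is still a range.
range_and_hyphen = re.findall(r"[a-c-]", "a-d")

print(literal_symbols)   # ['+', '.', '$', '*', '?']
print(range_and_hyphen)  # ['a', '-']
```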

To find a phone number whose area code is either 800 or 900, we can use the following:

800-123-4567 or 900-123-4567

[89]00[-.]\d\d\d[-.]\d\d\d\d
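
A quick Python sketch of this expression on a made-up list of numbers. The 700 number is included only to show a rejection.

```python
import re

area_code_pattern = re.compile(r"[89]00[-.]\d\d\d[-.]\d\d\d\d")

numbers = ["800-123-4567", "900-123-4567", "700-123-4567"]

# [89] accepts exactly one character: either 8 or 9.
accepted = [n for n in numbers if area_code_pattern.fullmatch(n)]
print(accepted)  # ['800-123-4567', '900-123-4567']
```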

Other uses of regex expressions:

Suppose we are searching for the following names:
Mr. James
Mr Robert

Mr\.? matches Mr with or without a period, because the ? makes the escaped period optional.

Full match: Mr\.?\s[A-Z]\w+

What if we wanted to match an individual with a one-character name? Because \w+ requires at least one character after the capital letter, we switch to \w*:

Mr\.?\s[A-Z]\w*

What about Mrs. James and Ms Brown as well?
M(r|s|rs)\.?\s[A-Z]\w*
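
The grouped expression can be tried in Python as sketched below, with a made-up sentence. One detail specific to Python's re module: findall returns only the contents of the group when a pattern contains one, so this sketch uses finditer and group(0) to get each whole match.

```python
import re

title_pattern = re.compile(r"M(r|s|rs)\.?\s[A-Z]\w*")

# Made-up sample text for illustration.
text = "Mr. James met Mr Robert, Mrs. James, Ms Brown and Mr T."

# group(0) is the full text of each match, not just the (r|s|rs) group.
names = [m.group(0) for m in title_pattern.finditer(text)]
print(names)
# ['Mr. James', 'Mr Robert', 'Mrs. James', 'Ms Brown', 'Mr T']
```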

Working backwards:

Congratulations, you solved some pretty complex regex expressions. The hardest part of working with regex is reading a pattern and understanding what is going on. The best way to get better at regex patterns is to read other people’s examples and work backwards. Let’s try this with the following example:

Mr\.?\s[A-Z]\w*

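The expression begins with the literal text Mr

Followed by \.? which is an optional period, so both Mr and Mr. match

Then \s, implying a single whitespace character such as a space

Finally, [A-Z] matches one uppercase letter, and \w* matches 0 or more word characters after it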
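
One way to practice reading a pattern piece by piece is Python's re.VERBOSE flag, which ignores whitespace in the pattern and allows a comment beside each part. This is a sketch of the same expression in that style. The test strings are made up.

```python
import re

pattern = re.compile(r"""
    Mr        # the literal letters M and r
    \.?       # an optional literal period
    \s        # one whitespace character
    [A-Z]     # one uppercase letter
    \w*       # zero or more word characters
""", re.VERBOSE)

print(bool(pattern.fullmatch("Mr. James")))  # True
print(bool(pattern.fullmatch("Mr T")))       # True
print(bool(pattern.fullmatch("Mr. james")))  # False: the name must start uppercase
print(bool(pattern.fullmatch("Mr.James")))   # False: the whitespace is missing
```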

Conclusion

Overall, regex expressions may seem like a complex subject that is quite overwhelming to beginner programmers. Hopefully through the use of the guide above, you can get started matching and understanding different regex expressions and identifying their patterns.

Written by

Software Engineer focused on Full Stack development with MERN stack and Ruby experience. Interested in sharing my learning journey with aspiring developers.
