Regular Expressions or RegEx Explained


Simply put, a regular expression (RegEx) is a pattern, built from ordinary and special characters, that is used to search strings. So, when you want to programmatically find a certain substring of a string, you’ll use RegEx. For RegEx to function at all, we need an engine that supports it; most programming languages and many command-line tools include one. To explain RegEx, we need to understand the scenarios in which we use it. Before tackling any scenario, though, we need to approach it as a project:

  1. Understand the scenario and review edge cases
  2. Create a plan of attack using an online tool for testing RegEx patterns (I personally use regex101)
  3. Apply/Deploy

Understanding the scenario

This part of the strategy is the most important, and it goes hand in hand with reviewing edge cases. It is crucial to understand what goals we are aiming for, what type of data we are searching, and how the input might deviate from the initial requirement. Once we have a full understanding of the scenario and a list of potential edge cases, we can start creating a plan of attack.
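To make the idea of an edge case concrete, here is a minimal Python sketch. The sample strings and the two patterns are illustrative assumptions, not a real requirement; the point is only that a pattern which looks right on the expected input can still match input it should reject.

```python
import re

# Hypothetical inputs: the first is the expected case, the rest are edge cases.
samples = [
    "http://bluegrid.io/edu/",   # expected input
    "http://bluegridXio/edu/",   # look-alike: "X" where the dot should be
    "http://BlueGrid.io/edu/",   # different letter case
]

naive = re.compile(r"bluegrid.io")    # unescaped dot matches any character
strict = re.compile(r"bluegrid\.io")  # escaped dot matches only a literal "."

naive_hits = [s for s in samples if naive.search(s)]
strict_hits = [s for s in samples if strict.search(s)]

print(naive_hits)   # the first two samples
print(strict_hits)  # only the first sample
```

Whether the third sample should match depends on the scenario. If it should, compiling the pattern with the re.IGNORECASE flag is one way to allow it.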

Creating a plan of attack

With a full understanding of the scenario, we can gather the weapons needed to tackle it. This is where the set of special characters mentioned earlier comes in. Below are the tools you can use to create a good pattern. They are simply characters that we can combine into the most effective search pattern for our scenario.

RegEx rules/tools:

Character/tool    Description
.                 Matches any single character (in most engines, any character except a line break by default).
\w                Matches any alphanumeric character or an underscore. In many engines it is shorthand for [a-zA-Z0-9_].
\d                Matches any digit. In many engines it is shorthand for [0-9].
\s                Matches any single whitespace character, such as a space, a tab, or a line break. To match a run of whitespace, combine it with a quantifier, for example \s+.
\W                Matches any character that is neither alphanumeric nor an underscore. The counterpart of \w, shorthand for [^a-zA-Z0-9_].
\D                Matches any non-digit character. The counterpart of \d, shorthand for [^0-9].
\S                Matches any single non-whitespace character. The counterpart of \s.
[]                Matches any one of the characters listed inside the brackets. Most special characters (the RegEx characters/tools in this table) are treated as regular characters inside the brackets, not as operands.
[^]               Matches any one character that is not listed inside the brackets after the “^”.
(…)               Groups part of a pattern and captures the substring it matches in the target string.
\                 Escape character. Placed before a special character (one of the RegEx characters/tools), it makes that character match literally. For example, \. matches a dot.
|                 Logical OR. Of the patterns on either side of it, at least one needs to match.
^                 Matches the beginning of the target string.
$                 Matches the end of the target string.
(?=…)             Positive lookahead. Used if we want to match A only when it is followed by a certain pattern. The lookahead itself is not part of the match.
(?!…)             Negative lookahead. Used if we want to match A only when it is not followed by a certain pattern.
+                 Repeats the preceding element one or more times. Used when we want to match something that exists at least once.
*                 Repeats the preceding element zero or more times. Used when we need to match something that can exist but doesn’t necessarily have to be there.
{X}               Matches the preceding element (the character, set, or group before the {} brackets) exactly X times.
{X,Y}             Matches the preceding element between X and Y times.
{X,}              Matches the preceding element X or more times.
RegEx operands
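
A few of the tools above can be tried out in a short Python sketch. The sample strings and variable names are made up for illustration, and the behavior shown is that of Python's re module; other engines may differ in details.

```python
import re

text = "Order 66 shipped on 2024-01-15 to user_42"

# \d+ : one or more digits in a row.
numbers = re.findall(r"\d+", text)

# {X} : exact repetition counts describe a YYYY-MM-DD shape.
dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)

# ^ and $ anchor the pattern to the start and the end of the string.
starts_with_order = re.search(r"^Order", text) is not None
last_word = re.search(r"\w+$", text).group()

# (...) with | : a group holding two alternatives.
status = re.findall(r"(shipped|delivered)", text)

# [] and [^] : a character set and its negation.
vowels = re.findall(r"[aeiou]", "regex")
non_vowels = re.findall(r"[^aeiou]", "regex")

# Lookaheads check what follows a match without consuming it.
sample = "Iraq quit"
q_before_u = [m.start() for m in re.finditer(r"q(?=u)", sample)]
q_not_before_u = [m.start() for m in re.finditer(r"q(?!u)", sample)]

print(numbers)            # ['66', '2024', '01', '15', '42']
print(dates)              # ['2024-01-15']
print(starts_with_order)  # True
print(last_word)          # user_42
print(status)             # ['shipped']
print(vowels, non_vowels) # ['e', 'e'] ['r', 'g', 'x']
print(q_before_u)         # [5]
print(q_not_before_u)     # [3]
```

The two lookahead results are character positions: the "q" in "quit" is followed by "u", while the "q" in "Iraq" is not.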

Having all of this in mind, we can build and test a pattern for the scenario we are facing using any of the online tools. I use regex101.com for quick and easy RegEx tests.

Deployment of the RegEx pattern

Eventually, the plan of attack is implemented. There are a number of different approaches to this, so we are going to mention just a few. It is your responsibility as a programmer to find the best deployment method for your project. The examples below are meant as no more than a glimpse of how to match the string bluegrid.io in the URL http://bluegrid.io/edu/:

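Using RegEx in PHP

<?php
// Capture "bluegrid.io" (the escaped dot matches a literal dot) and record where it was found.
preg_match('/(bluegrid\.io)/', 'http://bluegrid.io/edu/', $found, PREG_OFFSET_CAPTURE);
// With PREG_OFFSET_CAPTURE, each entry of $found is a [matched text, byte offset] pair.
print_r($found);
?>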

Using RegEx with Python

import re

target = "http://bluegrid.io/edu/"
# A raw string (r"...") keeps the backslash intact for the RegEx engine.
found = re.findall(r"bluegrid\.io", target)
print(found)
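
re.findall returns only the matched text. If the position of the match is also needed, similar to what PREG_OFFSET_CAPTURE provides in the PHP example, re.search returns a match object that carries it. A minimal sketch follows; the use of re.escape to build the pattern is my own choice for illustration, not something the original example requires.

```python
import re

target = "http://bluegrid.io/edu/"

# re.escape backslash-escapes RegEx special characters, so the dot is literal.
pattern = re.compile(re.escape("bluegrid.io"))

match = pattern.search(target)
if match:
    # group() is the matched text; start() and end() are its offsets in target.
    print(match.group(), match.start(), match.end())  # bluegrid.io 7 18
else:
    print("no match")
```

Building the pattern with re.escape can be handy when the literal text to search for comes from a variable rather than being typed into the pattern by hand.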

Using RegEx with JavaScript

const target = 'http://bluegrid.io/edu/';
// The g flag makes match() return every occurrence, not just the first one.
const pattern = /(bluegrid\.io)/g;
const found = target.match(pattern);

console.log(found);

Using RegEx on the command line of Linux/Unix-like operating systems

# Print every line of file.txt that contains "bluegrid.io".
grep -e 'bluegrid\.io' file.txt

In conclusion, we need to be thorough when working with RegEx. We need to know what our scenario is and what the edge cases are; with that knowledge, we can create a plan.

In a related article, we show different scenarios, and you will be able to see how edge cases dictate the plan of attack.
