Regular Expression or RegEx examples


In one of our educational articles, we covered the basics of RegEx and showed how to approach a RegEx problem by following a plan of attack. Now, we’ll focus only on RegEx examples that show why edge cases matter and how to handle them.

First, let’s establish the rules of engagement. We’ll be using the online tool regex101.com, and we’ll go through several different scenarios, breaking each one down individually.

Example #1 – Match a specific URI

  • URI: /1/a/image.png
  • Task: Match URIs that start with /number/, followed by a letter, and end with an image file name.

Creating a plan

The URI part of the URL contains three slashes (URI levels), and the file in question is an image. We can start building the pattern by following the instructions from the “Task”:

  1. ^\/[\d]\/ – Starts with /number/
  2. [a-zA-Z] – Followed by a letter
  3. \/[\w-]+\.[a-z]{3,4} – Ending with an image file name (assuming the image extension is lower case). The dot is escaped as \. because an unescaped . matches any character, not just a period.
    • Why 3 to 4 characters for an image extension?
      • Because some image extensions, such as webp, are four characters long.

Overall, we’ve got the pattern ^\/[\d]\/[a-zA-Z]\/[\w-]+\.[a-z]{3,4}$
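If you want to check the pattern outside regex101.com, Python’s re module is a quick option. The sketch below is only an illustration: it uses the pattern with the dot escaped as \. and, apart from /1/a/image.png from the task, the sample URIs are hypothetical inputs chosen to hit the edge cases.

```python
import re

# Example #1 pattern; "\." matches a literal dot before the extension
IMAGE_URI = re.compile(r"^\/[\d]\/[a-zA-Z]\/[\w-]+\.[a-z]{3,4}$")

samples = [
    "/1/a/image.png",      # from the task: expected to match
    "/2/B/my-photo.webp",  # hypothetical: 4-letter extension, expected to match
    "/1/a/imagexpng",      # hypothetical: no dot, rejected because of "\."
    "/12/a/image.png",     # hypothetical: [\d] allows a single digit only
    "/1/a/image.PNG",      # hypothetical: upper-case extension is rejected
]

for uri in samples:
    # match() anchors at the start of the string; "$" anchors the end
    print(uri, "->", bool(IMAGE_URI.match(uri)))
```

Note that with an unescaped dot, /1/a/imagexpng would be accepted ("x" standing in for the dot), which is exactly the kind of edge case worth testing.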

RegEx example #1

Example #2 – Match only values from the query string in the URL

URLs:

  • http://bluegrid.io/?v=123
  • http://bluegrid.io/?a=1&b=2&c=3
  • http://bluegrid.io/?url=http%3A%2F%2Fbluegrid.io&a=1

Task: Get all the query string values without corresponding keys (from “a=1” get only “1”).

Creating a plan

Our focus on the strings (URLs) above should start at the “?” character, which indicates that a query string follows it. A query string is made of key: value pairs: each key is joined to its value with “=”, and the pairs are joined with “&” (ex: a=1&b=2).

Edge cases:

  • There can be any number of query string key: value pairs.
  • The key always consists of lowercase letters.
  • The values are expected to consist of letters, digits, the “.” character and the “%” character only.
  1. ([a-z]+=) – Matching a key
  2. ([a-zA-Z0-9%.]+) – Matching the key’s value
  3. &? – Matching the end of the key: value pair. If an “&” character follows, there is another key: value pair; if not, we’ve reached the end of the query string. The “?” after “&” makes it optional: there might be an “&” after the previous key: value pair, but there doesn’t have to be one.

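Finally, the RegEx pattern we’ve got is: ([a-z]+=)([a-zA-Z0-9%.]+)&? The key is still part of each full match, but the value on its own is captured by the second group, ([a-zA-Z0-9%.]+).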
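Here is a minimal Python sketch of how the values could be pulled out with this pattern, assuming we read the second capturing group of every match. The URLs are the ones from the task; the variable names are just for illustration.

```python
import re

# Example #2 pattern: group 1 = "key=", group 2 = the value
QUERY_PAIR = re.compile(r"([a-z]+=)([a-zA-Z0-9%.]+)&?")

urls = [
    "http://bluegrid.io/?v=123",
    "http://bluegrid.io/?a=1&b=2&c=3",
    "http://bluegrid.io/?url=http%3A%2F%2Fbluegrid.io&a=1",
]

for url in urls:
    # finditer() yields every non-overlapping match; group(2) is the value alone
    values = [m.group(2) for m in QUERY_PAIR.finditer(url)]
    print(url, "->", values)
```

The values stay percent-encoded (for example http%3A%2F%2Fbluegrid.io); if decoded values are needed, urllib.parse.unquote can be applied to each one afterwards.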

RegEx example #2

Example #3 – Match a substring in the URI

URLs:

  • http://bluegrid.io/edu/what-is-the-secondary-dns-slave-dns/
  • http://bluegrid.io/edu/cache-control-headers-explained/
  • http://bluegrid.io/edu/what-are-conditional-http-request-headers/
  • http://bluegrid.io/edu/how-https-protocol-works/
  • http://bluegrid.io/edu/what-is-dns-service-and-how-does-it-work/
  • http://bluegrid.io/edu/how-symmetric-encryption-works/

Task: Match URLs that use the HTTP or HTTPS protocol and contain a substring that starts with “/edu/what-is” or “/edu/what-are” and ends with “/”.

Creating a plan

First, we need to match the protocol and the domain in the URL; then we can focus on the URI part of interest, “/edu/…”, and the trailing “/”. Let’s check the edge cases and rules we need to follow:

Edge cases:

  • URIs consist of letters and hyphens only; no digits are expected
  • No other “/” characters are expected between “/edu/what-is” or “/edu/what-are” and the “/” at the end
  • There is no limit on the total number of characters in the URI
  1. https?:\/\/ – Match http:// or https://
  2. ([\w.])* – Match a potentially existing subdomain
  3. ([\w\d-]*) – Match the domain
  4. \.([a-z]*) – Match Top Level Domain (.com, .rs, .io…)
  5. \/edu\/what-(is|are) – Match “/edu/what-is” or “/edu/what-are”
  6. (.*)\/$ – Match the rest of the URL until it ends with “/”

Eventually, the pattern looks like this: https?:\/\/([\w.])*([\w\d-]*)\.([a-z]*)\/edu\/what-(is|are)(.*)\/$
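To see which of the URLs the pattern accepts, a short Python sketch like the one below could be used. The six bluegrid.io URLs come from the task; the last two are hypothetical additions to exercise HTTPS with a subdomain and a URL without the trailing “/”.

```python
import re

# Example #3 pattern, unchanged from the text
EDU_URL = re.compile(
    r"https?:\/\/([\w.])*([\w\d-]*)\.([a-z]*)\/edu\/what-(is|are)(.*)\/$"
)

urls = [
    "http://bluegrid.io/edu/what-is-the-secondary-dns-slave-dns/",
    "http://bluegrid.io/edu/cache-control-headers-explained/",
    "http://bluegrid.io/edu/what-are-conditional-http-request-headers/",
    "http://bluegrid.io/edu/how-https-protocol-works/",
    "http://bluegrid.io/edu/what-is-dns-service-and-how-does-it-work/",
    "http://bluegrid.io/edu/how-symmetric-encryption-works/",
    "https://www.bluegrid.io/edu/what-is-dns/",  # hypothetical: HTTPS + subdomain
    "http://bluegrid.io/edu/what-is-dns",        # hypothetical: no trailing "/"
]

# search() scans the whole string, since the pattern has no ^ anchor
matching = [url for url in urls if EDU_URL.search(url)]
print("\n".join(matching))
```

Only the “what-is” and “what-are” URLs that end with “/” should be printed. Keep in mind that (.*) also accepts extra “/” characters; if the “no other /” rule had to be enforced rather than assumed, ([^\/]*) would be a stricter choice for that part.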

RegEx example #3

Example #4 – Match only keys in the query string (ignore values)

In Example #2, we matched only the values of the key: value pairs in the query string; now the task is the opposite.

  • http://bluegrid.io/?v=123
  • http://bluegrid.io/?a=1&b=2-1&c=3
  • http://bluegrid.io/?1=2&b=23
  • http://bluegrid.io/?x=@23
  • http://bluegrid.io/?url=http%3A%2F%2Fbluegrid.io&a=1

Task: Match only the keys of the key: value pairs in the query string.

Creating a plan

Our point of interest is the query string in the URI, that is, anything after the “?” character. We need to isolate the border items (“?” at the beginning and “&” at the end) and then use matching groups to split keys from values:

Edge cases:

  • There can be any number of query string key: value pairs.
  • The key always consists of lowercase letters.
  • The values are expected to consist of letters, digits, the “.” character and the “%” character only.
  1. ([a-z]+=) – Match the key
  2. ([a-zA-Z0-9%.]+) – Match the value
  3. &? – Match the joining character “&” if there is one. It can follow a key: value pair, but if there are no more pairs, there will be no “&”.

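Eventually, the final pattern is: ([a-z]+=)([a-zA-Z0-9%.]+)&? This time, the keys are captured by the first group (together with the trailing “=”). Because of the edge-case rules above, the key “1” in http://bluegrid.io/?1=2&b=23 is not matched (it is not a letter), and neither is the key “x” in http://bluegrid.io/?x=@23 (its value “@23” contains a character outside the allowed set).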
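A minimal Python sketch of reading the keys from the first group might look like this. Stripping the trailing “=” with rstrip is just one illustrative choice; moving the “=” outside the group, as in ([a-z]+)=, would be another. Printing every URL makes it easy to see which keys the edge-case rules leave out.

```python
import re

# Example #4 pattern (same as Example #2); group 1 holds the key plus "="
QUERY_PAIR = re.compile(r"([a-z]+=)([a-zA-Z0-9%.]+)&?")

urls = [
    "http://bluegrid.io/?v=123",
    "http://bluegrid.io/?a=1&b=2-1&c=3",
    "http://bluegrid.io/?1=2&b=23",
    "http://bluegrid.io/?x=@23",
    "http://bluegrid.io/?url=http%3A%2F%2Fbluegrid.io&a=1",
]

for url in urls:
    # Drop the trailing "=" from group 1 to keep only the key name
    keys = [m.group(1).rstrip("=") for m in QUERY_PAIR.finditer(url)]
    print(url, "->", keys)
```

For b=2-1, only “b=2” is matched, but the key b is still found; the match for c starts again after the “-1”.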

RegEx example #4

There you go, folks! For more educational articles, feel free to visit our EDU page!
