Tag Archives: regular

Simple Regular Expression Tutorial for Excel VBA

I’ll go through a simple tutorial for using regular expressions for VBA.  This is a great quick start guide for people who have used regular expressions before in other languages.  Just as a review, regular expressions are used to match patterns in strings.  This is a powerful tool that can help you data cleanse and data mine.  The directions are the same for Microsoft Access as well.

First thing first, let’s enable the reference in Excel so that we can use this functionality.

Go to Tools > References

Then find Microsoft VBScript Regular Expressions 5.5

Click Ok, and now you’ll be able to create a RegExp object to use in your VBA.

Here’s a basic function that finds 5 letter words in a string.

Here’s a copy of the Excel file that I am using in the example: RegEx Excel Tutorial XLS File

Function findFiveLetter(sentence)
    Dim regEx As New VBScript_RegExp_55.RegExp
    Dim matches, s
    regEx.Pattern = "\W\w{5}\W"
    regEx.IgnoreCase = True 'True to ignore case
    regEx.Global = True 'True matches all occurances, False matches the first occurance
    s = ""
    If regEx.Test(sentence) Then
        Set matches = regEx.Execute(sentence)
        For Each Match In matches
            s = s & " Position: " & Match.FirstIndex
            s = s & " Word: " & Match.Value & " "
            s = s & Chr(10)
        Next
        findFiveLetter = s
    Else
        findFiveLetter = ""
    End If
End Function

Now let’s break it down.

First you need to declare your regEx obejct with:

Dim regEx As New VBScript_RegExp_55.RegExp

Next you give your object some parameters.

The most important is .Pattern, which is the pattern of characters that you want your function to find.  Then there are a couple of settings like Global, if you want it to search for all occurrences (TRUE) or just the first occurrence (FALSE), and also IgnoreCase.

Here’s a simple guide to get you started on writing regex patterns.

regEx.Pattern = "\b\w{5}\b"
    regEx.IgnoreCase = True 'True to ignore case
    regEx.Global = True 'True matches all occurances, False matches the first occurance

Now to test if the pattern has been found in a string you can use

regEx.Test(sentence)

To get an array of matches from your regex you simply have the regex execute.

Set matches = regEx.Execute(sentence)

To get the results from this array object, you can reference matches(n) or use a For Each loop. There are two important pieces of information returned in the array, the index of the match and the actual match.

I used a For Each:

For Each Match In matches
            s = s & " Position: " & Match.FirstIndex 'Position of the match
            s = s & " Word: " & Match.Value 'The actual match
            s = s & Chr(10) 'Prints a new line
        Next

Here’s the example output:

Line Matches
What began as a small group of protesters expressing their grievances  Position: 5 Word: began
Position: 16 Word: small
Position: 22 Word: group
Position: 53 Word: their
about economic inequities last month from a park in New York City  Position: 0 Word: about
Position: 31 Word: month
has evolved into an online conversation that is spreading across the harry  Position: 69 Word: harry
country on social media platforms.  Position: 18 Word: media

Here’s a copy of the Excel file that I used in the example: RegEx Excel Tutorial XLS File
The regex function is called findFiveLetter(“put a string in here”)

Basic Regular Expression Patterns for Beginners

This is super simplified, but it’s enough to get started with. I remember being confused by the gigantic tables with all the special clauses, so when you’re beginning keep it simple.

Remember if you want to match a literal character you can just type it out, unless there’s a special character. Special characters have meanings in regex patterns, so you need to type a \ in front of them if you want to specifically match them in the pattern.

Common special characters are: + . \ – [] {} $ ? |

Example: If I wanted to match taste, taster, tasters, I would type a pattern like taste\w{0,2}

If I wanted to match taste!, I would write a pattern like taste\!

Code What does it Match?
\d Numbers
\w Word characters like letters of any case
\s Spaces
\b Boundary of a word character
\D Not a number
\W Not a word character
\S Not a space
\n Line break
\r Line break
\t Tab
? Previous code is optional
* Repeat previous code 0 to infinity
+ Repeat previous code once or more
{1} Repeat previous code once (can use any number)
{1,2} Repeat previous code 1 to 2 times (can use any numbers)