
Regular Expressions for Google Analytics

^[Rr]ead th(is|ese|eir) .* blogs?$

If you want to create filters, perform searches or set up goals in Google Analytics or Angelfish Software, you need a basic understanding of regular expressions. This article is a basic refresher. You can also use our free regex tester to test your own regular expressions.

What are Regular Expressions?

Regular expressions (also known as regex) are used to find specific patterns in a list. In Google Analytics, regex can be used to find URLs (or almost anything else) that match a certain description. For example, you can find all pages within a subdirectory, or all pages with a query string more than ten characters long.

Regular expressions provide a powerful and flexible way to describe what the pattern should look like, using a combination of special characters and alphanumerics.

For example, typing html into the search box in the content reports will return all URLs that contain "html" anywhere in the path. It would return "/index.html" as well as "/html-definitions.php".
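
The "anywhere in the path" behavior can be sketched outside of Google Analytics. The snippet below uses Python's re module as a stand-in (an assumption: the engine behind Google Analytics may differ in its details), and the list of paths and the matching_paths helper are hypothetical.

```python
import re

# Hypothetical page paths, standing in for rows of a content report.
paths = ["/index.html", "/html-definitions.php", "/about.php", "/contact"]

def matching_paths(pattern, candidates):
    """Return the candidates in which the pattern occurs anywhere."""
    # IGNORECASE mirrors the case-insensitive default described in this article.
    regex = re.compile(pattern, re.IGNORECASE)
    return [path for path in candidates if regex.search(path)]

html_matches = matching_paths("html", paths)
print(html_matches)
```

Because the pattern is not anchored, it matches in the middle of "/index.html" and near the start of "/html-definitions.php" alike.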

Escaping Special Characters

Regular expressions use a set of special characters that carry specific meanings. The list below covers the most common characters that carry a non-literal meaning, though it is not comprehensive.

^ $ . ? * + | [] () {} \

You can see that a period is a special character. If you wanted to match a literal period in regex, you would need to "escape" it by adding a backslash before it. For example, \.html would match a dot followed by the string "html".

If you want to match a series of special characters in a row, you need to escape each one individually. To match "$?", you would type \$\?.

As you may have guessed, since a backslash itself is a special character, you would need to type two backslashes into regex in order to match a single literal backslash.
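
As a rough illustration of the escaping rules above, here is a minimal sketch in Python's re module (an assumption: Python is used only as a convenient test bed, and the sample strings and the found helper are made up). Raw strings (r"...") keep Python itself from interpreting the backslashes before the regex engine sees them.

```python
import re

def found(pattern, text):
    """Return True if the pattern occurs anywhere in the text."""
    return re.search(pattern, text) is not None

# An unescaped period matches any character, so "Xhtml" qualifies.
unescaped_dot = found(r".html", "/pageXhtml")
# An escaped period only matches a literal dot.
escaped_dot_miss = found(r"\.html", "/pageXhtml")
escaped_dot_hit = found(r"\.html", "/index.html")

# Each special character in a row is escaped individually.
dollar_question = found(r"\$\?", "price=$?")

# Two backslashes in the pattern match one literal backslash in the text.
single_backslash = found(r"\\", r"C:\temp")

# re.escape is a Python convenience that adds the backslashes for you.
auto_escaped = re.escape("$?")

print(unescaped_dot, escaped_dot_miss, escaped_dot_hit)
print(dollar_question, single_backslash, auto_escaped)
```

Note that re.escape is a feature of Python, not of Google Analytics; in a report filter you would type the backslashes yourself.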

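If you're unsure whether a punctuation character is special, you can escape it without any negative consequences. Don't escape letters or digits, though: in most regex flavors, a backslash followed by a letter or digit (such as \d or \w) carries its own special meaning.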

Anchors

Regular expressions match the pattern you specify if it occurs anywhere in the string: beginning, middle or end. You can use anchors to specify that a pattern should only match at the beginning or end.

Use the caret symbol (^) to anchor a pattern to the beginning. Use a dollar sign to anchor a pattern to the end. Use both to match only the pattern, with nothing preceding or following.

^car will match "car", "carpet" and "cartoon". It won't match "scar", "red car" or "new cars".

car$ will match "car", "scar" and "red car", but not "cars", "carpet" or "cartoon".

^car$ will match only "car" and ^$ will match only empty strings.
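
The anchor examples above can be checked with a short script. This is a sketch using Python's re module, which is assumed here to treat ^ and $ the same way for single-line strings; the word list and the matches helper are made up for the demonstration.

```python
import re

words = ["car", "carpet", "cartoon", "scar", "red car", "new cars", "cars", ""]

def matches(pattern):
    """Return the words in which the pattern is found."""
    return [word for word in words if re.search(pattern, word)]

starts_with_car = matches(r"^car")   # anchored to the beginning
ends_with_car = matches(r"car$")     # anchored to the end
exactly_car = matches(r"^car$")      # anchored at both ends
empty_only = matches(r"^$")          # nothing between start and end

print(starts_with_car)
print(ends_with_car)
print(exactly_car)
print(empty_only)
```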

Ranges of Characters

Regex can also be used to match ranges or combinations of characters. Square brackets allow you to specify a variety of characters that can appear in a certain position in the string.

For example, [eio] would match either "e", "i" or "o".

You could conceivably include a long list of characters in square brackets. It's sometimes easier to match a range of characters, though, with a hyphen. For example, [a-z] will match any lowercase letter. (Google Analytics is case-insensitive by default.) [a-z0-9] will match any letter or digit. [a-dx-z] will match a, b, c, d, x, y, or z.

Square brackets look at each individual character, not whole words. [word] would match an instance of "w", "o", "r" or "d". To match one of several specific strings, enclose them in parentheses and use a pipe as an "or" character. For example, to match an instance of "cat" or "dog", you would type (cat)|(dog) or (cat|dog).

Finally, if you wanted to match any character, you would use a period. car.s will match "carrs", "car?s", "car5s", etc.
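
To see square brackets, the pipe and the period side by side, here is a minimal sketch in Python's re module (an assumption: the sample strings are invented, and Python is case-sensitive unless told otherwise, unlike the Google Analytics default mentioned above).

```python
import re

# Square brackets match one character at a time from the listed set.
vowels_found = re.findall(r"[eio]", "position")
range_found = re.findall(r"[a-dx-z]", "abcdefwxyz")
letters_of_word = re.findall(r"[word]", "sword")

# Parentheses with a pipe match whole alternatives.
animals = [s for s in ["cat", "dog", "bird", "hotdog"]
           if re.search(r"(cat|dog)", s)]

# A period stands for exactly one character of any kind.
dot_matches = [s for s in ["carrs", "car?s", "car5s", "cars"]
               if re.search(r"car.s", s)]

# Python is case-sensitive by default; IGNORECASE switches that off.
case_sensitive = re.findall(r"[a-z]", "Ab1")
case_insensitive = re.findall(r"[a-z]", "Ab1", re.IGNORECASE)

print(vowels_found, range_found, letters_of_word)
print(animals, dot_matches)
print(case_sensitive, case_insensitive)
```

"cars" is not matched by car.s because the period must consume one character between "car" and "s".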

Repeating Patterns

That's all fine and dandy, but we haven't even touched on the thing that makes regex so powerful. You can specify the number of times a pattern should occur.

Using a question mark after a character will match zero or one occurrence of the character. a? will match "a" or "".

Using a plus sign matches one or more occurrences. a+ will match "a", "aa", "aaaaaaaaaa", etc.

Using an asterisk matches any number of occurrences (including zero). a* will match all of the above, including the empty string.

Using curly brackets you can match a specific range of occurrences. You specify the minimum and maximum number of occurrences. ca{3,5}t will match "caaat", "caaaat", "caaaaat", but not "cat" or "caaaaaaaaat".

These quantifiers can likewise be applied to patterns. .* matches any number of any character. [a-z]{1,3} matches one, two or three letters in a row.
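
The quantifiers above can be tried out with the sketch below, written in Python's re module. One assumption to note: it uses re.fullmatch, which behaves as if the pattern were wrapped in ^ and $, so that each result reflects the whole string rather than a match somewhere inside it. The candidate lists and the full_matches helper are made up.

```python
import re

def full_matches(pattern, candidates):
    """Return the candidates that the pattern matches in their entirety."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

a_strings = ["", "a", "aa", "aaaaaaaaaa"]

zero_or_one = full_matches(r"a?", a_strings)   # question mark
one_or_more = full_matches(r"a+", a_strings)   # plus sign
any_number = full_matches(r"a*", a_strings)    # asterisk

# Curly brackets: between three and five occurrences of "a".
bounded = full_matches(
    r"ca{3,5}t", ["cat", "caaat", "caaaat", "caaaaat", "caaaaaaaaat"]
)

# Quantifiers also apply to character classes.
short_words = full_matches(r"[a-z]{1,3}", ["a", "ab", "abc", "abcd"])

print(zero_or_one, one_or_more, any_number)
print(bounded, short_words)
```

Without the implied anchors, a pattern such as a? would be found in every string, since it is allowed to match zero characters.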
