
Regular Expression (RegEx): the Guidebook


Regular Expressions (RegEx) are a very powerful tool for your analysis. They let you work more efficiently and isolate pieces of information faster.

At first glance, RegEx can be scary because they look like an unknown foreign language. Once you start using them, though, you'll realize how useful they are, and you won't be able to do without them.

RegEx and Google Analytics

Regular Expressions can be used in Google Analytics in two ways:

  1. In the Analytics User Interface;
  2. Through the GA APIs.

In this post I'll explain what each RegEx means and give concrete examples to help you understand it. Some expressions can be used in the GA UI, while others are only available for filtering data when you work with the API in Sheets.


1. User Interface Google Analytics

In the Analytics UI you can use RegEx for several purposes:

  • Creating View Filters;
  • Filtering data in the Reports (every dimension can be filtered, such as search terms, page title, etc.);
  • Creating Segments;
  • Creating Content Groupings;
  • Creating Goals;
  • Creating Channels;
  • Creating Custom Reports.

2. Google Analytics APIs

When you download GA data directly to Google Sheets, you often need to filter the data and isolate part of it.

To make the best use of the filter, Regular Expressions are your best friend.


Regular Expressions (RegEx) List

Now I'll show you the Regular Expressions you can use in the Google Analytics User Interface.

. (dot)

The dot stands for any single character in the position where it is placed.

Example: cas.a

  • RegEx: cas.a
  • Valid: cassa, casta, etc.
  • Invalid: casa, cassandra, etc.

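The words “casa” and “cassandra” are invalid because the dot (.) stands for exactly one character. In “casa” there is no character between the “s” and the final “a”, while in “cassandra” the letters “cas” are followed by six letters (s, a, n, d, r, a), so the whole word does not fit the regular expression “cas.a”. Keep in mind that this holds when the whole value has to match: Google Analytics usually looks for the pattern anywhere in the value, so without the ^ and $ anchors described further down, “cassandra” would still be picked up because it contains “cassa”.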
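The difference between matching the whole word and matching a part of it can be tried out with a few lines of Python. This is only a sketch: Python's re module is used here as a stand-in for the Google Analytics regex engine (an assumption, although the dot behaves the same way), and the word list is the one from the example above.

```python
import re

pattern = re.compile(r"cas.a")
words = ["cassa", "casta", "casa", "cassandra"]

# fullmatch: the whole word must fit the pattern (like writing ^cas.a$)
whole = [w for w in words if pattern.fullmatch(w)]

# search: the pattern may be found anywhere inside the word
partial = [w for w in words if pattern.search(w)]

print(whole)    # ['cassa', 'casta']
print(partial)  # ['cassa', 'casta', 'cassandra']
```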

* (asterisk)

The asterisk matches zero or more occurrences of the element that precedes it.

Example: cas*a

  • RegEx: cas*a
  • Valid: casa, cassssa, etc.
  • Invalid: cassandra, casta, etc.

The words “cassandra” and “casta” are invalid because the asterisk is placed after the letter “s” (cas*a). That means only the “s” can be repeated: it may be missing, appear once or appear several times (e.g. cassssa).

Both “casta” and “cassandra” contain letters other than “s” before the final “a”: respectively “t” (casta) and “andr” (cassandra).
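Here is a small Python check of the asterisk, again only as a sketch with the re module standing in for Google Analytics. The word “caa” is not in the example above; it is added here as an assumption to show the “zero occurrences” case.

```python
import re

pattern = re.compile(r"cas*a")
words = ["casa", "cassssa", "caa", "casta", "cassandra"]

# fullmatch: the word must be "ca", then any number of "s" (even none), then "a"
matches = [w for w in words if pattern.fullmatch(w)]

print(matches)  # ['casa', 'cassssa', 'caa']
```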

| (pipe)

This symbol means “or”.

It's a very useful RegEx: it lets you put more than one word into a single string.

You can use it for several reasons, such as selecting some search terms on your website and analyzing their trend.

Example: analytics|tag manager

  • RegEx: analytics|tag manager
  • Valid: analytics marketing, google analytics, tag manager, google tag manager, etc.
  • Invalid: google, ga4, gtm, ga, etc.
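To see the pipe at work on a list of search terms, here is a short Python sketch. The re module is an assumption standing in for the Google Analytics engine, and the search terms are illustrative values taken from the example above.

```python
import re

pattern = re.compile(r"analytics|tag manager")
search_terms = [
    "google analytics",
    "google tag manager",
    "analytics marketing",
    "gtm",
    "ga4",
    "google",
]

# Keep the terms that contain either "analytics" or "tag manager"
matching = [term for term in search_terms if pattern.search(term)]

print(matching)  # ['google analytics', 'google tag manager', 'analytics marketing']
```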

\ (backslash)

The backslash escapes the next character, so that it is read literally and not as a Regular Expression.

It is typically used to make the dot be read as an actual “dot”.

Example: 192\.168\.1

In this example, thanks to the backslash, the value is read as “192.168.1”.

If we didn't use the \ (backslash), the . (dot) would be read as a RegEx, as described in the dot section above.
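The effect of the backslash is easy to see by comparing the escaped and the unescaped pattern. The following is a minimal Python sketch (re module used as a stand-in for Google Analytics); the value “192x168y1” is a made-up string, added only to show what the unescaped dot lets through.

```python
import re

escaped = re.compile(r"192\.168\.1")   # dots are read as literal dots
unescaped = re.compile(r"192.168.1")   # dots are read as "any character"

candidates = ["192.168.1", "192x168y1"]

escaped_hits = [c for c in candidates if escaped.search(c)]
unescaped_hits = [c for c in candidates if unescaped.search(c)]

print(escaped_hits)    # ['192.168.1']
print(unescaped_hits)  # ['192.168.1', '192x168y1']
```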

^ (caret)

This regular expression means that the value must begin with the characters written immediately after it.

Example: ^search

  • RegEx: ^search
  • Valid: searching, search terms, etc.
  • Invalid: test search, s earch, etc.

$ (dollar sign)

The dollar sign works like the ^ (caret), but it looks at the end of the value: the value must end with the characters written immediately before it.

Example: search$

  • RegEx: search$
  • Valid: term search, test search, etc.
  • Invalid: search term, searching, etc.
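The two anchors, ^ and $, can be compared side by side on the same values. This is a sketch in Python, assuming the re module behaves like Google Analytics for these two symbols; the values are the ones used in the two examples above.

```python
import re

values = ["searching", "search terms", "test search", "term search", "s earch"]

# ^search: the value must start with "search"
starts_with = [v for v in values if re.search(r"^search", v)]

# search$: the value must end with "search"
ends_with = [v for v in values if re.search(r"search$", v)]

print(starts_with)  # ['searching', 'search terms']
print(ends_with)    # ['test search', 'term search']
```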

? (question mark)

This Regular Expression makes the character before the question mark optional.

Example: goo?gle

  • RegEx: goo?gle
  • Valid: google analytics, gogle, etc.
  • Invalid: ga, gmail, etc.
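A quick way to check the optional character is the following Python sketch (re module as a stand-in for Google Analytics). The misspelling “gooogle” is an extra value, not part of the example above, added to show that the question mark allows at most one occurrence.

```python
import re

pattern = re.compile(r"goo?gle")
values = ["google analytics", "gogle", "gooogle", "gmail", "ga"]

# The second "o" is optional: "gogle" and "google" match, "gooogle" does not
matching = [v for v in values if pattern.search(v)]

print(matching)  # ['google analytics', 'gogle']
```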

() (parentheses)

They allow you to group multiple elements within them.

Example: google\.(it|com)

  • RegEx: google\.(it|com)
  • Valid: google.it, google.com
  • Invalid: googleit, googlecom
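The grouping can be verified with a short Python sketch, assuming the re module as a stand-in for Google Analytics. The hostname “google.fr” is an extra illustrative value that is not part of the example above.

```python
import re

pattern = re.compile(r"google\.(it|com)")
hosts = ["google.it", "google.com", "googleit", "googlecom", "google.fr"]

# The group (it|com) accepts either ending, but only after a literal dot
matching = [h for h in hosts if pattern.search(h)]

print(matching)  # ['google.it', 'google.com']
```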

[] (square brackets)

The square brackets contain a list of characters: any one of them is accepted in that position.

Example: c[yi]clette

  • RegEx: c[yi]clette
  • Valid: cyclette, biciclette, ciclette, etc.
  • Invalid: bici, cico, cyp, etc.
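The same example can be run in Python as a rough check (re module assumed as a stand-in for Google Analytics). Note that “biciclette” is accepted because the pattern is searched anywhere in the word.

```python
import re

pattern = re.compile(r"c[yi]clette")
words = ["cyclette", "biciclette", "ciclette", "bici", "cico", "cyp"]

# [yi] accepts either "y" or "i" in the second position
matching = [w for w in words if pattern.search(w)]

print(matching)  # ['cyclette', 'biciclette', 'ciclette']
```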

{} (curly brackets)

The values inside the curly brackets indicate how many times the last element must be repeated.

Example: bici{2}

  • RegEx: bici{2}
  • Valid: bicii, bicii donna, etc.
  • Invalid: bici, bici donna, etc.

By typing the number “2” within the curly brackets, the results will be the words where the “i” is repeated twice (“bicii”).

This Regular Expression can be useful for finding misspellings or for looking up numerical values, such as IP addresses.

For example, if I want to capture the IP addresses from 192.168.1.0 to 192.168.1.99, I have to write this RegEx: 192\.168\.1\.[0-9]{1,2}$

This way I say that at least one digit must be captured (the numbers from 0 to 9 have only one digit) and at most two digits (the numbers from 10 to 99 have two digits). The IP address 192.168.1.100 will not be captured, simply because 100 has three digits.
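Both uses of the curly brackets can be tested with a few lines of Python. This is a sketch under the assumption that the re module behaves like Google Analytics for these patterns; the address 192.168.1.42 is an arbitrary value inside the range.

```python
import re

# {2}: the preceding "i" must appear exactly twice
double_i = re.compile(r"bici{2}")
words = ["bicii", "bicii donna", "bici", "bici donna"]
with_double_i = [w for w in words if double_i.search(w)]

# {1,2}: one or two digits in the last block of the address
ip_range = re.compile(r"192\.168\.1\.[0-9]{1,2}$")
addresses = ["192.168.1.0", "192.168.1.42", "192.168.1.99", "192.168.1.100"]
captured = [ip for ip in addresses if ip_range.search(ip)]

print(with_double_i)  # ['bicii', 'bicii donna']
print(captured)       # ['192.168.1.0', '192.168.1.42', '192.168.1.99']
```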

- (dash)

Inside square brackets, separating two values with the - (dash) symbol indicates a range of characters or numbers.

Here are some examples, which can also be used in the Google Analytics Filters:

  • [a-z]: any lowercase letter
  • [A-Z]: any uppercase letter
  • [0-9]: any digit
  • [a-zA-Z0-9]: any lowercase letter, uppercase letter or digit
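The ranges listed above can be compared on a few sample characters. This is just a Python sketch (re module as a stand-in for Google Analytics), and the sample characters are arbitrary.

```python
import re

character_classes = ["[a-z]", "[A-Z]", "[0-9]", "[a-zA-Z0-9]"]
samples = ["q", "Q", "7", "_"]

# For each range, list which of the sample characters it accepts
accepted = {
    cls: [s for s in samples if re.fullmatch(cls, s)]
    for cls in character_classes
}

for cls, chars in accepted.items():
    print(cls, chars)
# [a-z] ['q']
# [A-Z] ['Q']
# [0-9] ['7']
# [a-zA-Z0-9] ['q', 'Q', '7']
```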

+ (plus)

This RegEx matches one or more occurrences of the character preceding the + (plus) sign.

Example: cycle+

  • RegEx: cycle+
  • Valid: cycle, cycleee, etc.
  • Invalid: cyclinge, bicycling, etc.
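A short Python sketch of the plus sign, with the re module assumed as a stand-in for Google Analytics. The value “cycl” is an extra one, added to show that at least one “e” is required.

```python
import re

pattern = re.compile(r"cycle+")
values = ["cycle", "cycleee", "cycl", "cyclinge", "bicycling"]

# "cycl" must be followed directly by one or more "e"
matching = [v for v in values if pattern.search(v)]

print(matching)  # ['cycle', 'cycleee']
```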

Regular Expressions with Analytics API

In addition to all the RegEx seen above, which can be used in the GA User Interface, there is a series of filter operators that can only be used with the Google Analytics API, for example when working in Google Sheets.

These operators are useful when I want to filter my report by some dimensions or metrics.

In this Google resource you can find the list of all the dimensions and metrics available in the Core Reporting API.

Below are some practical examples.

; (semicolon)

The meaning of this symbol is “and”.

When I join several conditions with the ; (semicolon), all of them must be true.

Example: I want to see the sessions from mobile devices and from Italy.

Filter: ga:deviceCategory==mobile;ga:country==Italy

, (comma)

The comma sign means “or”.

When I join several conditions with the comma, at least one of them must be true.

Example: I want to filter the sessions from mobile or desktop devices.

Filter: ga:deviceCategory==mobile,ga:deviceCategory==desktop
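If you build these filter strings in a script instead of typing them by hand, the semicolon and the comma can be wrapped in two small helpers. This is only a sketch: and_filter and or_filter are hypothetical names invented for this example, not part of any Google library, and the code simply produces the strings shown above.

```python
def and_filter(*conditions: str) -> str:
    """Join conditions with ';' so that all of them must be true."""
    return ";".join(conditions)


def or_filter(*conditions: str) -> str:
    """Join conditions with ',' so that at least one must be true."""
    return ",".join(conditions)


mobile_from_italy = and_filter("ga:deviceCategory==mobile", "ga:country==Italy")
mobile_or_desktop = or_filter("ga:deviceCategory==mobile", "ga:deviceCategory==desktop")

print(mobile_from_italy)  # ga:deviceCategory==mobile;ga:country==Italy
print(mobile_or_desktop)  # ga:deviceCategory==mobile,ga:deviceCategory==desktop
```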

=~ (equal and tilde)

Together, these two symbols mean “matches the regular expression”.

Example: I want to see the transaction IDs that contain the text “test”.

Filter: ga:transactionId=~test

!~ (exclamation mark and tilde)

The meaning of this operator is “does not match the regular expression”.

Example: I want to exclude the transactions whose ID contains the text “test”.

Filter: ga:transactionId!~test

== (double equal)

Two equals signs mean “exactly matches”.

Example: I want to filter the sessions from the organic channel.

Filter: ga:channelGrouping==Organic Search

!= (exclamation mark and equal)

This operator means “does not match”.

Example: I want to see the sessions whose medium is not organic.

Filter: ga:medium!=organic

Greater than, less than

The symbols > and < mean “greater than” and “less than” respectively, and they can be combined with the = (equal) symbol.

To indicate “greater than or equal to” I write: >=

Vice versa, to indicate “less than or equal to” I write: <=

<> (between values)

By typing the <> operator I filter the data that fall between one value and another.

Example: I want to see the transaction costs between 6 and 9.

Filter: ga:costPerTransaction<>6_9

=@ (equal and at)

The =@ operator means “contains substring”.

Example: I am looking for the campaigns that contain the value “social”.

Filter: ga:campaign=@social

!@ (exclamation mark and at)

This operator means “does not contain substring”.

Example: I am looking for all the campaigns except those that contain the value “social”.

Filter: ga:campaign!@social
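To make the meaning of the text operators concrete, here is a local emulation in Python. It is only a sketch and does not call the Analytics API: the keep function and the campaign names are hypothetical, the “does not contain substring” operator is written !@ as in the Core Reporting API reference, and details such as case sensitivity and the numeric operators (>, <, <>) are left out.

```python
import re

# Each API operator mapped to an equivalent Python test (local emulation only)
OPERATORS = {
    "==": lambda value, expr: value == expr,                      # exactly matches
    "!=": lambda value, expr: value != expr,                      # does not match
    "=~": lambda value, expr: re.search(expr, value) is not None, # matches the regex
    "!~": lambda value, expr: re.search(expr, value) is None,     # does not match the regex
    "=@": lambda value, expr: expr in value,                      # contains substring
    "!@": lambda value, expr: expr not in value,                  # does not contain substring
}


def keep(value: str, operator: str, expression: str) -> bool:
    """Return True when the value passes the given condition."""
    return OPERATORS[operator](value, expression)


campaigns = ["social_spring", "newsletter", "paid_social"]  # made-up campaign names
with_social = [c for c in campaigns if keep(c, "=@", "social")]
without_social = [c for c in campaigns if keep(c, "!@", "social")]

print(with_social)     # ['social_spring', 'paid_social']
print(without_social)  # ['newsletter']
```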

Filter By Segment With the API

There is another interesting method that can be used in Google Sheets, and it works in the same way as segments in the Google Analytics User Interface.

Example: I want to filter the sessions from Italy.

If I were working in the Analytics UI, I would create a segment like this: Sessions – Include – Country – contains – Italy

In Google Sheets I write the following expression in the row that refers to the segment:

sessions::condition::ga:country=~Italy

I can simply add more conditions, like in the Analytics UI, by typing the semicolon symbol (;) and repeating “sessions::condition::”, like in this example:

sessions::condition::ga:country=~Italy;sessions::condition::ga:deviceCategory==mobile

The string above has the same meaning as the following segment in the Analytics User Interface:

[Image: Google Analytics segment]
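The segment strings shown above follow a regular structure, so they can also be assembled by a small helper. This is a sketch in Python: session_segment is a hypothetical function name made up for this example, and it only reproduces the “sessions::condition::” prefix and the semicolon separator described in this section.

```python
def session_segment(*conditions: str) -> str:
    """Prefix every condition with 'sessions::condition::' and join them with ';'."""
    return ";".join(f"sessions::condition::{condition}" for condition in conditions)


segment = session_segment("ga:country=~Italy", "ga:deviceCategory==mobile")

print(segment)
# sessions::condition::ga:country=~Italy;sessions::condition::ga:deviceCategory==mobile
```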

Practical Examples: RegEx and Google Analytics

Let's now see some practical examples of Regular Expressions in action in the Google Analytics User Interface.

a) Report Filters

You can filter every report in GA, but with RegEx everything is simpler and faster.

I can type the regular expression either in the simple filter box or in the advanced options (by clicking on “advanced” and selecting “Matching RegExp”):

[Image: regular expression in a Google Analytics report filter]

b) Custom Reports

I can create several custom reports in GA in order to improve my analysis.

I can also filter my custom reports, so that I see only the data I need. With RegEx this becomes simpler.

In the image below I've created a custom report that shows some steps of my funnel. The steps are named STEP_1, STEP_2 and so on up to STEP_5. With this regular expression everything is easier: STEP_[1-5]

[Image: custom report filtered with a regex]
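The STEP_[1-5] expression can be checked against a list of labels before using it in a report. This is a rough Python sketch (re module as a stand-in for Google Analytics); the labels STEP_0, STEP_6 and STEP_7 are hypothetical and serve only to show what is left out.

```python
import re

step_pattern = re.compile(r"STEP_[1-5]")

# STEP_0 ... STEP_7: only STEP_1 to STEP_5 belong to the funnel in the example
labels = [f"STEP_{n}" for n in range(0, 8)]
funnel_steps = [label for label in labels if step_pattern.search(label)]

print(funnel_steps)  # ['STEP_1', 'STEP_2', 'STEP_3', 'STEP_4', 'STEP_5']
```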

c) View Filters

In the traditional Universal Analytics version we have Views, where we can apply filters to see only some of the data.

View Filters can be created easily by using regular expressions. In the image below there's a filter that captures all the values of the Referral dimension, simply by using the regex: (.*)

[Image: view filter with a regex in Google Analytics]

d) Goals

It's easier to create a goal using regular expressions.

As you can see in the example below, a Goal has been set up on the destination page that starts with the URL string “/confirmation” (Regex: ^/confirmation) and that follows a funnel of several other pages:

[Image: goal with a regex in Google Analytics]

e) Segments

Segments are a fundamental tool for analysing data. With Regular Expressions it's possible to create more refined segments, as you can see in the example below. The segment in the image catches the sessions of users who have selected at least one Store (the Store value is collected by the custom dimension with index 25):

[Image: segment in Google Analytics]

f) Channel Grouping

In the admin section of Google Analytics, under the View column, it is possible to set up several channels. With regular expressions it is simpler to create a new one, as you can see in the image below, where the values “l.facebook.com” and “m.facebook.com” are collected under the Channel called “Facebook”:

[Image: channel grouping in Google Analytics]

g) Content Grouping

When you create a new Content Grouping for the content of your website (read the post to know more about what Content Groupings are in GA), Regular Expressions can help you build meaningful groups of content.

You can define sets of rules using RegEx.

[Image: content grouping with a regular expression]

Conclusions

Regular Expressions are an important aid that we need to know in the web analytics field. With them we can quickly filter data and also set up our GA account.

If you are just starting out with RegEx, don't be afraid! I suggest you start with the simpler Regular Expressions like the | (pipe) or ^ (caret), perhaps in the simplest reports, such as the most searched terms on the website.

Now, what else to say… sit back, relax and enjoy your RegEx!

Happy analysing!

