
Regular Expression (RegEx): the Guidebook


Regular Expressions (RegEx) are a very powerful tool for your analysis. They help you work more efficiently and isolate pieces of information faster.

At first glance, RegEx can be scary because they look like an unknown foreign language, but once you start using them you’ll realize how useful they are and you won’t be able to do without them.

RegEx and Google Analytics

Regular Expressions can be used in Google Analytics in two ways:

  1. In the Analytics User Interface;
  2. Through the GA APIs.

In this post I’ll explain what each RegEx means and give concrete examples to make them easier to understand. Not all expressions work everywhere: some can be used only in the GA UI, and others are used to filter data when you’re working with the API in Google Sheets.


1. Google Analytics User Interface

You can use RegEx for different purposes in the Analytics UI:

  • To create View Filters;
  • To filter data in the Reports (every dimension can be filtered, such as search terms, page title, etc.);
  • To create Segments;
  • To create Content Groupings;
  • To create Goals;
  • To create Channels;
  • To create Custom Reports.

2. Google Analytics APIs

When you download GA data directly to Google Sheets, you often need to filter the data and isolate part of it.

To get the most out of these filters, Regular Expressions are your best friend.


Regular Expressions (RegEx) List

Now I’ll show you the Regular Expressions you can use in the Google Analytics User Interface.

. (dot)

The dot matches any single character in the position where it appears.

Example: cas.a

RegEx   Valid   Invalid
cas.a   cassa   casa
        casta   cassandra
        etc.    etc.

Invalid words include “casa” and “cassandra”, because the dot (.) stands for exactly one character. In “casa” there is no character between the “s” and the final “a”, while in “cassandra” six letters follow the first “s” (s, a, n, d, r, a), so neither word as a whole corresponds to our regular expression “cas.a”.
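To try this outside Google Analytics, here is a minimal sketch in Python. It assumes that the whole word has to match, which is how the table above reads the examples, so it uses `re.fullmatch`. Python’s regex dialect is also not guaranteed to behave exactly like the one in Google Analytics.

```python
import re

# Assumption: the whole word must match, as in the table above.
pattern = re.compile(r"cas.a")

words = ["cassa", "casta", "casa", "cassandra"]
matches = {word: bool(pattern.fullmatch(word)) for word in words}

for word, ok in matches.items():
    print(f"{word}: {'valid' if ok else 'invalid'}")
```

With `pattern.search` instead of `pattern.fullmatch`, “cassandra” would also count as a match, because it contains “cassa”.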

* (asterisk)

The asterisk matches zero or more occurrences of the element that precedes it.

Example: cas*a

RegEx   Valid     Invalid
cas*a   casa      cassandra
        cassssa   casta
        etc.      etc.

The words “cassandra” and “casta” are invalid because the asterisk is placed after the letter “s” (cas*a). That means only the element “s” can be repeated: it can be missing, appear once or appear several times (e.g., cassssa).

Both “casta” and “cassandra” contain elements other than “s” before the final “a”: “t” (casta) and “andr” (cassandra), respectively.
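The same check can be sketched in Python. As before, `re.fullmatch` is an assumption that mirrors the whole-word reading of the table. The word “caa” is an extra example of mine to show the “zero occurrences” case.

```python
import re

# Assumption: the whole word must match, as in the table above.
pattern = re.compile(r"cas*a")

# "caa" is a made-up extra word: it has zero "s" characters.
words = ["casa", "cassssa", "caa", "casta", "cassandra"]
results = {word: bool(pattern.fullmatch(word)) for word in words}

for word, ok in results.items():
    print(f"{word}: {'valid' if ok else 'invalid'}")
```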

| (pipe)

This symbol means “or”.

It’s a very useful RegEx: it allows you to put more than one word into a single string.

You can use it for several reasons, for example to select some search terms on your website and analyze their trend.

Example: analytics|tag manager

RegEx                   Valid                                                Invalid
analytics|tag manager   analytics marketing, google analytics, tag manager   google, ga4
                        google tag manager                                   gtm, ga
                        etc.                                                 etc.
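Here is a small Python sketch of the same filter. It uses `re.search`, which looks for the pattern anywhere in the value, because the valid examples above contain extra words around the match. The list of search terms is taken from the table.

```python
import re

pattern = re.compile(r"analytics|tag manager")

terms = [
    "analytics marketing", "google analytics", "tag manager",
    "google tag manager", "google", "ga4", "gtm", "ga",
]

# Keep the terms that contain either alternative.
matched = [term for term in terms if pattern.search(term)]
print(matched)
```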

\ (backslash)

The backslash escapes the next character, so that it is read literally.

It is typically used to make the dot be read as a plain “dot” and not as a Regular Expression.

Example: 192\.168\.1

In this example, thanks to the backslash, the value is read literally as “192.168.1”.

If we didn’t use the \ (backslash), the . (dot) would be read as a RegEx, as described in the section on the dot above.
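The difference is easy to see in a short Python sketch. The string “192x168y1” is a made-up value that only the unescaped pattern should accept.

```python
import re

escaped = re.compile(r"192\.168\.1")   # dots are literal dots
unescaped = re.compile(r"192.168.1")   # dots match any character

real_ip = "192.168.1.10"
fake_value = "192x168y1"  # made-up value for illustration

escaped_real = bool(escaped.search(real_ip))
escaped_fake = bool(escaped.search(fake_value))
unescaped_fake = bool(unescaped.search(fake_value))

print(escaped_real, escaped_fake, unescaped_fake)

# re.escape adds the backslashes for you.
auto_escaped = re.escape("192.168.1")
print(auto_escaped)
```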

^ (caret)

This regular expression means that the value must begin with the characters written immediately after it.

Example: ^search

RegEx     Valid          Invalid
^search   searching      test search
          search terms   s earch
          etc.           etc.

$ (dollar sign)

The dollar sign works like the ^ (caret), but it looks at the end: the value must end with the characters written immediately before it.

Example: search$

RegEx     Valid         Invalid
search$   term search   search term
          test search   searching
          etc.          etc.
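The two anchors, ^ (caret) and $ (dollar sign), can be compared side by side in a short Python sketch. The values come from the two tables above.

```python
import re

starts_with = re.compile(r"^search")
ends_with = re.compile(r"search$")

values = ["searching", "search terms", "test search", "term search", "search term"]

starts = [value for value in values if starts_with.search(value)]
ends = [value for value in values if ends_with.search(value)]

print("begin with 'search':", starts)
print("end with 'search':", ends)
```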

? (question mark)

This Regular Expression makes the character before the question mark optional.

Example: goo?gle

RegEx     Valid              Invalid
goo?gle   google analytics   ga
          gogle              gmail
          etc.               etc.
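A quick Python sketch of the optional character. The value “gooogle” is an extra example of mine, to show that “optional” means zero or one time, not more.

```python
import re

pattern = re.compile(r"goo?gle")

# "gooogle" (three "o") is a made-up extra value.
values = ["google analytics", "gogle", "gooogle", "ga", "gmail"]
matched = [value for value in values if pattern.search(value)]

print(matched)
```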

() (parentheses)

They allow you to group multiple elements within them.

Example: google\.(it|com)

RegEx              Valid        Invalid
google\.(it|com)   google.it    googleit
                   google.com   googlecom
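In Python the same group can be tested like this. As a small extra, the sketch also reads back which alternative inside the parentheses was found, using `match.group(1)`.

```python
import re

pattern = re.compile(r"google\.(it|com)")

hosts = ["google.it", "google.com", "googleit", "googlecom"]

# Map each matching host to the alternative captured by the parentheses.
tlds = {}
for host in hosts:
    match = pattern.search(host)
    if match:
        tlds[host] = match.group(1)

print(tlds)
```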

[] (square brackets)

The square brackets define a list of characters: any one of them can appear in that position.

Example: c[yi]clette

RegEx         Valid        Invalid
c[yi]clette   cyclette     bici
              biciclette   cico
              ciclette     cyp
              etc.         etc.
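A minimal Python sketch with the words from the table. `re.search` is used because “biciclette” is valid only as a partial match: it contains “ciclette”.

```python
import re

pattern = re.compile(r"c[yi]clette")

words = ["cyclette", "biciclette", "ciclette", "bici", "cico", "cyp"]
matched = [word for word in words if pattern.search(word)]

print(matched)
```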

{} (curly brackets)

The values inside the curly brackets indicate how many times the last element must be repeated.

Example: bici{2}

RegEx     Valid         Invalid
bici{2}   bicii         bici
          bicii donna   bici donna
          etc.          etc.

By typing the number “2” within the curly brackets, only the values that contain “bic” followed by two “i” letters are displayed.

This Regular Expression can be useful for finding misspellings or for looking up numerical values such as IP addresses.

For example, if I want to capture IP addresses from 192.168.1.0 to 192.168.1.99, I can write this RegEx: 192\.168\.1\.[0-9]{1,2}$

In this way, I say that at least one digit must be captured (the numbers from 0 to 9 have only one digit) and at most two digits (the numbers from 10 to 99 have two digits). The following IP address will not be captured: 192.168.1.100, simply because 100 has three digits.
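The IP range above can be checked with a few lines of Python. This is only a sketch with four sample addresses, not a complete IP validator.

```python
import re

ip_range = re.compile(r"192\.168\.1\.[0-9]{1,2}$")

addresses = ["192.168.1.0", "192.168.1.9", "192.168.1.99", "192.168.1.100"]

# Only addresses whose last part has one or two digits are captured.
captured = [ip for ip in addresses if ip_range.search(ip)]

print(captured)
```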

- (dash)

Separating values with the - (dash) symbol inside square brackets allows me to indicate a range of characters or numbers.

Here are some examples, which can also be used in the Google Analytics Filters:

  • [a-z]: indicates all the lowercase letters
  • [A-Z]: indicates all the uppercase letters
  • [0-9]: indicates all the digits
  • [a-zA-Z0-9]: indicates all the lowercase and uppercase letters and the digits
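To see what each range accepts, here is a small Python sketch that tests the four classes above against the characters of a made-up sample string, “gA4-”.

```python
import re

classes = ["[a-z]", "[A-Z]", "[0-9]", "[a-zA-Z0-9]"]
sample = "gA4-"  # made-up sample: lowercase, uppercase, digit, symbol

# For each class, list the characters of the sample it accepts.
found = {cls: [ch for ch in sample if re.fullmatch(cls, ch)] for cls in classes}

for cls, chars in found.items():
    print(cls, "->", chars)
```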

+ (plus)

The plus sign matches one or more occurrences of the element that precedes it.

Example: cycle+

RegEx    Valid     Invalid
cycle+   cycle     cyclinge
         cycleee   bicycling
         etc.      etc.
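A short Python sketch for the plus sign. The value “cycl” is an extra example of mine: it shows that, unlike the asterisk, the plus sign needs at least one “e”.

```python
import re

pattern = re.compile(r"cycle+")

# "cycl" is a made-up extra value with no "e" at all.
values = ["cycle", "cycleee", "cycl", "cyclinge", "bicycling"]
matched = [value for value in values if pattern.search(value)]

print(matched)
```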

Regular Expressions with Analytics API

In addition to all the RegEx seen above, which can be used in the GA User Interface, there is a series of filter operators that can only be used with the Google Analytics API, for example when working in Google Sheets.

These operators are useful when I want to filter my report by some dimensions or metrics.

In this Google resource you can find the list of all the dimensions and metrics available in the Core Reporting API.

Below are some practical examples.

; (semicolon)

The meaning of this symbol is “and”.

When I want to combine several conditions and I type ; (semicolon), I set the condition “and”.

Example: I want to see the sessions from mobile devices and from Italy.

Regex: ga:deviceCategory==mobile;ga:country==Italy


, (comma)

The comma means “or”.

When I type the comma into a string, the condition I set is “or”.

Example: I want to filter the sessions from mobile or desktop devices.

Regex: ga:deviceCategory==mobile,ga:deviceCategory==desktop
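If you build these filter strings in a script, a tiny Python sketch like the one below can help. The helper names `all_of` and `any_of` are hypothetical, not part of any Google library: they only join conditions with the ; (semicolon) and , (comma) symbols.

```python
def all_of(*conditions: str) -> str:
    """Join conditions with ';' (and). Hypothetical helper."""
    return ";".join(conditions)


def any_of(*conditions: str) -> str:
    """Join conditions with ',' (or). Hypothetical helper."""
    return ",".join(conditions)


mobile_from_italy = all_of("ga:deviceCategory==mobile", "ga:country==Italy")
mobile_or_desktop = any_of("ga:deviceCategory==mobile", "ga:deviceCategory==desktop")

# Both together: (mobile or desktop) and Italy.
combined = all_of(mobile_or_desktop, "ga:country==Italy")

print(mobile_from_italy)
print(mobile_or_desktop)
print(combined)
```

As far as I know, the Core Reporting API evaluates the comma before the semicolon, so the last string reads as “(mobile or desktop) and Italy”. It is worth checking this in the official documentation before relying on it.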


=~ (equal and tilde)

These two symbols together mean “matches the regular expression”.

Example: I want to see the transaction IDs that contain the name “test”.

Regex: ga:transactionId=~test


!~ (exclamation mark and tilde)

This operator means “does not match the regular expression”.

Example: I want to exclude the transactions with the value “test”.

Regex: ga:transactionId!~test


== (double equal)

Writing two equals signs means “exactly matches”.

Example: I want to filter the sessions from the organic channel.

Regex: ga:channelGrouping==Organic Search


!= (exclamation mark and equal)

This operator means “does not match”.

Example: I want to see the sessions whose medium is not organic.

Regex: ga:medium!=organic


Greater than, less than

The symbols > and < mean “greater than” and “less than” respectively, and they can be combined with the = (equal) symbol.

Writing >= means “greater than or equal to”.

Vice versa, writing <= means “less than or equal to”.

<> (between values)

With the <> operator I filter the data that fall between one value and another.

Example: I want to see the transaction costs between 6 and 9.

Regex: ga:costPerTransaction<>6_9


=@ (equal and at)

The =@ operator means “contains substring”.

Example: I search for the campaigns that contain the value “social”.

Regex: ga:campaign=@social


!@ (exclamation mark and at)

This operator means “does not contain the substring”.

Example: I search for all the campaigns except those that contain the value “social”.

Regex: ga:campaign!@social
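To make the meaning of the dimension operators more concrete, here is a Python sketch that imitates them on a local value. The function `dimension_filter` is hypothetical and is not how the API is implemented. It ignores details such as case sensitivity. The “does not contain” operator is written `!@`, which is how I understand the Core Reporting API documentation spells it.

```python
import re


def dimension_filter(value: str, operator: str, expression: str) -> bool:
    """Hypothetical local imitation of the dimension filter operators."""
    if operator == "==":   # exactly matches
        return value == expression
    if operator == "!=":   # does not match
        return value != expression
    if operator == "=@":   # contains substring
        return expression in value
    if operator == "!@":   # does not contain substring
        return expression not in value
    if operator == "=~":   # matches the regular expression
        return re.search(expression, value) is not None
    if operator == "!~":   # does not match the regular expression
        return re.search(expression, value) is None
    raise ValueError(f"Unknown operator: {operator}")


# Made-up sample values for illustration.
print(dimension_filter("test-123", "=~", "test"))
print(dimension_filter("spring_social_2021", "=@", "social"))
print(dimension_filter("organic", "!=", "organic"))
```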


Filter By Segment With the API

There is another interesting method that can be used in Google Sheets, and it works like segments in the Google Analytics User Interface.

Example: I want to filter the sessions from Italy.

If I worked in the Analytics UI, I would create a segment like this: Sessions – Include – Country – contains – Italy

In Google Sheets I write the following expression in the row referring to the segment:

sessions::condition::ga:country=~Italy

I can simply add several conditions, like in the Analytics UI, by typing the semicolon symbol (;) and rewriting “sessions::condition::”, like in this example:

sessions::condition::ga:country=~Italy;sessions::condition::ga:deviceCategory==mobile
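When there are many conditions, the segment string can be assembled with a short Python sketch. The helper `session_condition` is a hypothetical name of mine: it only adds the “sessions::condition::” prefix shown above.

```python
def session_condition(expression: str) -> str:
    """Prefix a condition for a session-level segment. Hypothetical helper."""
    return f"sessions::condition::{expression}"


conditions = ["ga:country=~Italy", "ga:deviceCategory==mobile"]

# Each condition gets its own prefix, and they are joined with ';' (and).
segment = ";".join(session_condition(condition) for condition in conditions)

print(segment)
```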

The string above has the same meaning as the following segment in the Analytics User Interface:

[Image: Google Analytics segment]

Practical Examples: RegEx and Google Analytics

Let’s now see some practical examples of Regular Expressions in action in the Google Analytics User Interface.

a) Report Filters

You can filter every report in GA, and with RegEx it is all simpler and faster.

I can type the regular expression either in the simple filter box or in the advanced options (by clicking on “advanced” and selecting “Matching RegExp”):

[Image: regular expressions in a Google Analytics report filter]

b) Custom Reports

I can create several custom reports in GA in order to improve my analysis.

I can also filter my custom reports, so that I see only the data I need. With RegEx everything becomes simpler.

In the image below I’ve created a custom report that shows me some steps of my funnel. Those steps are named STEP_1, STEP_2 and so on up to STEP_5. With this regular expression everything is easier: STEP_[1-5]

[Image: custom report filtered with a regex]
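The STEP_[1-5] pattern can be tried in Python before using it in a report. The names “STEP_6” and “STEP_10” are made-up extras. The anchored variant `^STEP_[1-5]$` is my own suggestion for the case where a name like STEP_10 exists and should be left out.

```python
import re

loose = re.compile(r"STEP_[1-5]")      # pattern from the post
strict = re.compile(r"^STEP_[1-5]$")   # assumption: anchored variant

# STEP_6 and STEP_10 are made-up names for illustration.
names = ["STEP_1", "STEP_3", "STEP_5", "STEP_6", "STEP_10"]

loose_hits = [name for name in names if loose.search(name)]
strict_hits = [name for name in names if strict.search(name)]

print(loose_hits)
print(strict_hits)
```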

c) View Filters

In the traditional Universal Analytics version we have Views, where we can apply filters to see only some of the data.

View Filters can be created easily by using regular expressions. In the image below there’s a filter that shows us all the values for the Referral dimension, simply by using the regex: (.*)

[Image: view filter with a regex in Google Analytics]

d) Goals

It’s easier to create a goal by using regular expressions.

As you can see in the example below, a Goal has been set up on the destination page that starts with the URL string “/confirmation” (Regex: ^/confirmation) and follows a funnel of several other pages:

[Image: goal with a regex in Google Analytics]
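Here is a minimal Python sketch of how the ^/confirmation pattern behaves. The page paths are made up for illustration.

```python
import re

goal_pattern = re.compile(r"^/confirmation")

# Made-up page paths for illustration.
pages = [
    "/confirmation",
    "/confirmation?order=123",
    "/checkout/confirmation",
    "/cart",
]

# Only paths that begin with "/confirmation" count for the goal.
goal_pages = [page for page in pages if goal_pattern.search(page)]

print(goal_pages)
```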

e) Segments

Segments are a fundamental tool for analysing data. With Regular Expressions it’s possible to create more refined segments, as you can see in the example below. The segment in the image captures sessions from users who have selected at least one Store (the Store value is collected by the custom dimension with index 25):

[Image: segment with a regex in Google Analytics]

f) Channel Grouping

In the Admin section of Google Analytics, under the View column, it is possible to set up several channels. Using regular expressions makes it simpler to create a new one, as you can see in the image below, where the values “l.facebook.com” and “m.facebook.com” are collected under the Channel called “Facebook”:

[Image: channel grouping in Google Analytics]

g) Content Grouping

When you create a new Content Grouping for the content of your website (read the post to know more about what Content Groupings are in GA), Regular Expressions can help you build meaningful groups of content.

You can define sets of rules using RegEx.

[Image: content grouping with a regular expression]

Conclusions

Regular Expressions are an important aid that we need to know in the web analytics field. They let us quickly filter data and also set up our GA account.

If you are just starting out with RegEx, don’t be afraid! I suggest you start with the simpler Regular Expressions like the | (pipe) or ^ (caret), perhaps on the simplest reports such as the most searched terms on the website.

Now, what else is there to say… sit back, relax and enjoy your RegEx!

Good analysis!

