
SQL RegEx : Capturing groups 101


“Capturing groups” are parts of a RegEx that group one or more characters together so that the text they match can be retrieved separately. Let me try to explain with a simple example.


Email address as Source Text : abc123@qsys400.com

Here is a simple RegEx to match this email address : [a-zA-Z]\w+@\w+\.\w+

This RegEx matches the given email address completely. If we look closely, we can break this email address into the following parts:

  1. User name “abc123” : a group of word characters (letters, digits or underscore) starting with an English letter. =>  [a-zA-Z]\w+
  2. Address sign “@” : a string literal that separates the user name from the domain name. => @
  3. Domain name “qsys400” : a group of word characters. => \w+
  4. Dot “.” : a string literal dot that separates the domain name from the top level domain name. => \.
  5. Top level domain “com” : a group of word characters. => \w+

So if we rewrite the above RegEx with each of these parts wrapped in its own group, it looks like this: ([a-zA-Z]\w+)(@)(\w+)(\.)(\w+)

Now this RegEx has 5 groups:

  1. ([a-zA-Z]\w+)
  2. (@)
  3. (\w+)
  4. (\.)
  5. (\w+)
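
To make the five groups concrete, here is a minimal sketch. It uses Python and its standard `re` module, which is my assumption for illustration only (the groups behave the same way in most engines, but the calls used to read them differ). The variable names are hypothetical.

```python
import re

# The RegEx from the text, with each of the five parts in its own group.
pattern = re.compile(r"([a-zA-Z]\w+)(@)(\w+)(\.)(\w+)")

# fullmatch() succeeds only if the whole string matches the pattern.
match = pattern.fullmatch("abc123@qsys400.com")

# groups() returns the values of Group 1 to Group 5 as a tuple.
parts = match.groups()
print(parts)  # ('abc123', '@', 'qsys400', '.', 'com')
```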

What is the use of these groups?

Programming languages that support RegEx provide a way to get the value matched by each group in the RegEx.

For example, let's say I want to get the user name and the domain name (including the top level domain) from the email address.

RegEx :  ([a-zA-Z]\w+)@(\w+\.\w+)

This RegEx has the following 2 groups:

  1. ([a-zA-Z]\w+) : for the user name (like abc123)
  2. (\w+\.\w+) : for the domain name (like qsys400.com)

“@” is not a part of any of the groups.

So when a programming language processes this RegEx, it lets you get (or capture) the values of GROUP 1 and GROUP 2. The value of GROUP 1 will be the user name from the email address and the value of GROUP 2 will be the domain name.
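
As a rough sketch of what "getting the value of a group" looks like in practice, the snippet below reads GROUP 1 and GROUP 2 with Python's `re` module. Python is my choice for illustration and the variable names are hypothetical; other languages expose the same idea through their own functions.

```python
import re

# Two groups: user name before "@", domain name (with top level domain) after it.
pattern = re.compile(r"([a-zA-Z]\w+)@(\w+\.\w+)")

match = pattern.search("abc123@qsys400.com")

user_name = match.group(1)    # value of GROUP 1
domain_name = match.group(2)  # value of GROUP 2

print(user_name)    # abc123
print(domain_name)  # qsys400.com
```

The “@” is matched but not captured, so it does not show up in either group.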

 

Capturing Groups Numbering

Each capturing group in a RegEx gets a unique number, starting from 1. The rule is very simple: reading from the left, give the next number to each opening parenthesis “(”.

(A)(B) ==> Group 1 = (A), Group 2 = (B)

The same rule applies to nested capturing groups:

(A(B))(C) ==> Group 1 = (A(B)), Group 2 = (B), Group 3 = (C)
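
The numbering rule can be checked with a small sketch. I am assuming Python's `re` module here; `pattern.groups` is the number of capturing groups that Python counted in the pattern.

```python
import re

# Nested groups: numbers follow the order of the opening parentheses.
pattern = re.compile(r"(A(B))(C)")
match = pattern.search("ABC")

print(pattern.groups)  # 3

for number in range(1, pattern.groups + 1):
    print(number, match.group(number))
# 1 AB
# 2 B
# 3 C
```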

Example:

RegEx (\w{3}) : creates a group of 3 word characters

Source Text : “abcdef ghi”

Without capturing groups, the RegEx \w{3} finds 3 matches in this text:

  1. abc
  2. def
  3. ghi

With a capturing group, the RegEx (\w{3}) finds the same 3 matches and also captures one group value for each match:

Match#  Full Match  Group#  Group value
1       abc         1       abc
2       def         1       def
3       ghi         1       ghi
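
The table above can be reproduced with a short sketch, assuming Python's `re` module (my choice for illustration). `finditer()` walks over every match, and calling `group()` with no argument returns the full match.

```python
import re

pattern = re.compile(r"(\w{3})")

# One row per match: (match number, full match, value of Group 1).
rows = [
    (number, m.group(), m.group(1))
    for number, m in enumerate(pattern.finditer("abcdef ghi"), start=1)
]

for row in rows:
    print(row)
# (1, 'abc', 'abc')
# (2, 'def', 'def')
# (3, 'ghi', 'ghi')
```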

Group ZERO

  1. Group numbering starts with number 1.
  2. In some RegEx engines, there is also a GROUP ZERO, which contains the text of the complete RegEx match.

Example

RegEx (\d{3})\w+

Source Text : “123abc%def”

  1. The complete RegEx matches “123abc” in the source text (the match stops at “%”, which \w does not match), i.e. GROUP ZERO = “123abc”
    1. Within this match, Group 1 contains the value “123”
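
Python's `re` module is one engine that has a GROUP ZERO, so it makes a convenient sketch of this example (using Python is my assumption; check how your own engine exposes the full match).

```python
import re

match = re.search(r"(\d{3})\w+", "123abc%def")

group_zero = match.group(0)  # the complete match
group_one = match.group(1)   # only the part inside the parentheses

print(group_zero)  # 123abc
print(group_one)   # 123
```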

Numbering with Quantifiers

Adding a quantifier to a group does not change the number of groups in the RegEx. In other words, “quantifiers do not impact the number of groups”.

(A){3} ==> It says that there must be exactly 3 occurrences of the capturing group (A)

But the number of groups in this RegEx is still one. So you will get a value for GROUP 1 only (there is no GROUP 2 or GROUP 3). With every repetition of the group, its previous value is overwritten by the new value.

Example

RegEx (\w{3})+ : The last character “+” adds a quantifier (ONE or MORE occurrences) to the group, which itself matches 3 word characters

Source Text : “abcdef ghi”

  1. Because of the quantifier, the first match consumes “abcdef” (it stops at the space before “ghi”)
    1. The text “abcdef” contains 2 sets of 3 characters
      1. “abc”
      2. “def”
    2. So the first match contains 2 repetitions of the group.
    3. But there is only 1 group defined in this RegEx, i.e. (\w{3})
    4. Here is how it works
      1. While consuming “abcdef”, the RegEx engine sets the group value on each repetition (it has only 1 group)
        • 1st repetition at “abc” ===>    Group 1 = “abc”
        • 2nd repetition at “def” ===>    Group 1 = “def”
          • The 2nd repetition overwrites the value that the 1st repetition stored in GROUP 1

Match#  Full Match  Group#  Group value
1       abcdef      1       def
2       ghi         1       ghi
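
Here is a small sketch of the same example, assuming Python's `re` module (my choice for illustration). In Python a repeated group keeps only the value from its last repetition, which matches the table above.

```python
import re

pattern = re.compile(r"(\w{3})+")

# One row per match: (full match, value of Group 1).
rows = [(m.group(0), m.group(1)) for m in pattern.finditer("abcdef ghi")]

print(pattern.groups)  # 1
print(rows)            # [('abcdef', 'def'), ('ghi', 'ghi')]
```

If you need every 3-character piece, one option is to drop the quantifier and iterate over the matches of (\w{3}) instead.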

Capturing Groups Names

  1. Some RegEx engines allow you to give a name to a capturing group. [Still working on this part]
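
Until this part is written, here is a minimal sketch of a named group in one engine. It assumes Python's `re` module, where the syntax is `(?P<name>...)`; other engines may use a different syntax or may not support names at all. The names `user` and `domain` are hypothetical, chosen only for this example.

```python
import re

# Same two groups as the email example, but each one has a name.
pattern = re.compile(r"(?P<user>[a-zA-Z]\w+)@(?P<domain>\w+\.\w+)")
match = pattern.search("abc123@qsys400.com")

print(match.group("user"))    # abc123
print(match.group("domain"))  # qsys400.com

# A named group still gets a number, so Group 1 is the same as "user".
print(match.group(1))         # abc123

named_values = match.groupdict()
print(named_values)           # {'user': 'abc123', 'domain': 'qsys400.com'}
```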

IBM i developer.
