SQL RegEx

SQL RegEx : Basic


A regular expression (aka RegEx) is a way to define a pattern for text. The pattern can describe the complete text or only a part of it. RegEx is a powerful and handy tool for working with strings, especially for search and replace functions.

Almost all modern programming languages support RegEx in a similar manner, and with recent releases IBM has introduced RegEx support in IBM i SQL.

Here is an example of how RegEx can be useful.

The pattern for a simple email address (like abc123@gmail.com) is something like this:

  1. The 1st character must be an English letter.
  2. After the 1st character, there can be any number of alphanumeric characters (in some cases, a few special characters are also allowed).
  3. Then there must be one “@” character.
  4. The @ is followed by a domain name, which has a pattern of its own:
    • one or more alphanumeric characters (usually the company name, like gmail)
    • followed by a ‘.’ (dot)
    • followed by at least two English letters (like com, net, etc.), which is called the top-level domain

Now, a RegEx rule for this pattern can be defined as below:

[a-zA-Z][a-zA-Z0-9]*@[a-zA-Z0-9]+\.[a-zA-Z]{2,}

At first, the above string looks like a run of random characters with no meaning (like a foreign language). But after understanding regular expressions (explained below), it will make perfect sense.
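Before going through the pieces, it can help to see the rule in action. The following is a minimal sketch that uses Python's `re` module purely as a convenient RegEx engine; it is not IBM i SQL, and the names `EMAIL_RE` and `is_simple_email` are made up for this illustration.

```python
import re

# The RegEx rule from the text, compiled once for reuse.
EMAIL_RE = re.compile(r"[a-zA-Z][a-zA-Z0-9]*@[a-zA-Z0-9]+\.[a-zA-Z]{2,}")


def is_simple_email(text: str) -> bool:
    """Return True when the WHOLE string fits the simple email pattern."""
    # fullmatch() requires the pattern to cover the complete text.
    return EMAIL_RE.fullmatch(text) is not None


if __name__ == "__main__":
    candidates = [
        "abc123@gmail.com",  # fits the pattern
        "1abc@gmail.com",    # starts with a digit
        "abc@gmail",         # no dot and top-level domain
        "abc@gmail.c",       # top-level domain has only one letter
    ]
    for candidate in candidates:
        print(candidate, is_simple_email(candidate))
```

Note the choice of `fullmatch`: as said at the start, a pattern can describe the complete text or only a part of it. A partial search (`EMAIL_RE.search`) would still find `abc@gmail.com` inside `1abc@gmail.com`, so validation should match the complete text.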

To start with, let us look at the different entities involved in this RegEx.

  1. Character classes : written as a set of characters within square brackets [ ]
    1. One character class matches (validates) exactly 1 character.
      1. [abc] : This character class will match any one of “a”, “b”, or “c”.
      2. [146] : This will match any one of 1, 4, or 6.
    2. Use a hyphen (-) to define a range (called the range separator).
      1. [a-f] : will validate any one character from the range a to f (lower case) of the English alphabet (i.e. a, b, c, d, e, or f).
      2. [a-z] : similarly, it will validate any one lower case character from a to z.
      3. [a-zA-Z] : validates any one character from (lower case a to z) or (upper case A to Z).
      4. [a-zA-Z0-9] : validates any one character from (lower case a to z), (upper case A to Z), or (digits 0 to 9).
    3. We will discuss a few more cases for character classes later.
  2. Meta characters
    1. Meta characters have a special meaning in RegEx.
    2. For example, as seen in the first point, a hyphen (-) inside a character class is a meta character that defines a range, and square brackets are meta characters that define a character class.
    3. We used another meta character, * (asterisk), in our example.
      1. * is a quantifier that defines the quantity ZERO or MORE for the preceding part of the RegEx.
      2. It means that the preceding part of the RegEx can be repeated ZERO or MORE times.
    4. Another meta character in our example is + (plus).
      1. Like *, it is a quantifier, and it defines the quantity ONE or MORE.
      2. Using a ‘+’ means that the preceding part of the RegEx must appear at least ONCE and can be repeated any number of times.
  3. Meta character escaping
    1. In the above example, we used one more meta character, “.” (DOT). A “.” (DOT), as a meta character, can match any single character, such as a letter, a digit, or a special character (in most engines, any character except a line break).
    2. But for an email, we need exactly a “.” (DOT), not any other character.
    3. To do this, we need to tell RegEx not to use DOT as a meta character, so we escape the DOT with a backslash “\”.
    4. If a meta character is escaped with a backslash “\”, it loses its special meaning and RegEx treats it as a simple literal.
  4. String literals
    1. The “@” sign in the example is a string literal.
    2. It has no special meaning; it must match as it is, in the same position and the same number of times.
  5. {n,}
    1. Curly brackets are also meta characters of the quantifier type.
    2. “n,” (a number n followed by “,”) defines that the preceding part of the RegEx must match at least n times or MORE.
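Each of the entities above can be tried in isolation. The sketch below again uses Python's `re` module only as a stand-in engine (it is not IBM i SQL, and `matches` is a hypothetical helper written for this illustration); these basic features should behave the same way in most RegEx engines.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True when the pattern covers the complete text."""
    return re.fullmatch(pattern, text) is not None


if __name__ == "__main__":
    checks = [
        # Character class: exactly one character out of the set.
        (r"[abc]", "b"),
        (r"[abc]", "ab"),      # two characters, one class: no match
        # Range inside a character class.
        (r"[a-f]", "e"),
        (r"[a-f]", "g"),       # outside the range: no match
        # * quantifier: ZERO or MORE of the preceding part.
        (r"[0-9]*", ""),
        (r"[0-9]*", "2024"),
        # + quantifier: ONE or MORE of the preceding part.
        (r"[0-9]+", ""),       # at least one digit is required: no match
        (r"[0-9]+", "7"),
        # Unescaped DOT matches any character; escaped DOT only a literal dot.
        (r"a.c", "abc"),
        (r"a\.c", "abc"),      # literal dot expected: no match
        (r"a\.c", "a.c"),
        # {n,} quantifier: at least n of the preceding part.
        (r"[a-z]{2,}", "c"),   # only one letter: no match
        (r"[a-z]{2,}", "com"),
    ]
    for pattern, text in checks:
        print(f"{pattern!r:14} {text!r:8} {matches(pattern, text)}")
```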

 

Now let us break our RegEx into its different parts:

[a-zA-Z][a-zA-Z0-9]*@[a-zA-Z0-9]+\.[a-zA-Z]{2,}

  1. [a-zA-Z]
    1. It has square brackets [ ] and a hyphen for the range, so any one character from a to z or A to Z is valid.
    2. So this part matches the first part of the email and makes sure that the first character is an English letter.
  2. [a-zA-Z0-9]*
    1. It has square brackets [ ] and a hyphen for the range.
      • So, as per this: one character that is an English letter or a digit is valid.
    2. The last character of this part (“*”) is a meta character, which means that an English letter or a digit (as defined by the preceding RegEx [a-zA-Z0-9]) can be repeated ZERO or MORE times.
    3. This defines the 2nd part of the email, which says that after the 1st character, there can be any number of alphanumeric characters.
  3. @
    1. This is a string literal and must match as it is.
    2. One @ is required.
  4. [a-zA-Z0-9]+
    1. The square bracket part [a-zA-Z0-9] is the same as defined earlier (one character that is an English letter or a digit is valid).
    2. The last character, +, is another meta character.
      1. So there must be at least one English letter or digit, as defined by the preceding RegEx [a-zA-Z0-9].
    3. This defines the 4th part of the email (the company name in the domain).
  5. \.
    1. It is an escaped meta character “.” (DOT), so it is treated as the string literal “.” (DOT) and must match as it is.
    2. It means that one “.” (DOT) is required after the company name.
  6. [a-zA-Z]{2,}
    1. The square bracket part [a-zA-Z] is the same as defined earlier, i.e. one English letter (a to z or A to Z).
    2. {2,} : This part defines that there must be at least 2 English letters (as defined by the preceding RegEx [a-zA-Z]).
    3. This defines the final part of the email (the top-level domain).
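The breakdown above can be mirrored in code by giving each part of the RegEx a name. This is only a sketch in Python's `re` module, not IBM i SQL: named groups `(?P<name>...)` are an addition that the original pattern does not use, and the group names and the `split_email` helper are invented for this illustration.

```python
import re

# Same RegEx as in the text, split into its parts; each part is wrapped
# in a named group so that the text it matched can be inspected.
EMAIL_PARTS_RE = re.compile(
    r"(?P<first>[a-zA-Z])"        # part 1: one English letter
    r"(?P<rest>[a-zA-Z0-9]*)"     # part 2: zero or more letters or digits
    r"@"                          # part 3: the literal @
    r"(?P<company>[a-zA-Z0-9]+)"  # part 4: one or more letters or digits
    r"\."                         # part 5: the escaped (literal) dot
    r"(?P<tld>[a-zA-Z]{2,})"      # part 6: at least two English letters
)


def split_email(text: str):
    """Return a dict of the named parts, or None if the text does not fit."""
    match = EMAIL_PARTS_RE.fullmatch(text)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(split_email("abc123@gmail.com"))
    print(split_email("abc@gmail"))
```

For `abc123@gmail.com`, the parts come out as `a`, `bc123`, `gmail`, and `com`; for `abc@gmail`, the helper returns `None` because the dot and top-level domain are missing.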

 

You can use the website http://www.regexr.com/ to verify our RegEx. In the footer of the webpage there is an “Explain” link, which displays how each part of the RegEx is used to match the given text.

 


IBM i developer.
