Getting Started with Adobe After Effects - Part 6: Motion Blur


Upload Image Close it
Select File

Browse by Tags · View All
#tsql 3
brh 2
#xml 1
views 1
SQL Server 1
regular expressions 1

Archive · View All
March 2012 1
June 2011 1
February 2011 1

Extracting integers from strings

Feb 24 2011 5:28PM by IGO   

Usually, if we have to extract numbers from a string, in any programming language, the first idea is to use regular expressions. How could we use regular expressions to extract integers from strings in TSQL?

Let’s suppose we have a table with a string column in which we can have letters and numbers. We want to extract the numbers, the possible situations are:

  • numbers can be at any possition in the string column
  • the numbers are integers but they can have different length
  • in a string there can be one or many numbers

We want to obtain a column that will contain only the numbers in the string column, if there are many numbers in a cell, we will have all of them separated by comma (for example). We will define the following test data:

DECLARE @sample TABLE(
                      id INT,    
                      data VARCHAR(200) ) 
INSERT @sample
       SELECT 1, 'bar 1245 foo' UNION ALL
       SELECT 2, 'foo 35234 bar 38383' UNION ALL
       SELECT 3, 'bar 126831' UNION ALL
       SELECT 4, '72348707 foo ' UNION ALL
       SELECT 5, 'foo bar';

We want to use regular expressions, and we have to find an algorithm to solve the following problems:

  • how to identify where it starts a number and where it ends
  • how to find one or many occurences in the same string

For the first problem, identifing a number in a string can be done using regular experssions like [0-9][^0-9] which means “one or no number followed by not a number”. We can use this regular expression in PATINDEX and manage the special case of a number at the end of the string. 
For the second problem, we can use a recursive CTE to find all occurences of an integer number in the string.

So, a possible solution can be:

WITH numberList AS (
      SELECT
            id, data, nEnd, nStart,
            SUBSTRING(data, nStart,
                            CASE
                                WHEN nEnd = 0 THEN LEN(data)
                                ELSE nEnd
                            END - nStart + 1) AS numbers
      FROM
          (SELECT
                 data, id,
                 PATINDEX('%[0-9]%', data) AS nStart,
                 PATINDEX('%[0-9][^0-9]%', data) AS nEnd
           FROM 
               @sample
           WHERE 
                data LIKE '%[0-9]%') process    
      UNION ALL
      SELECT
            id, data, nEnd, nStart,
            SUBSTRING(data, nStart, 
                            CASE
                                WHEN nEnd = 0 THEN LEN(data)
                                ELSE nEnd    
                            END - nStart + 1) AS numbers
      FROM
          (SELECT
                 numbers, id,
                 SUBSTRING(data, nEnd + 1, LEN(data) - nEnd) AS data,
                 PATINDEX('%[0-9]%', SUBSTRING(data, nEnd + 1, LEN(data) - nEnd)) AS nStart,
                 PATINDEX('%[0-9][^0-9]%', SUBSTRING(data, nEnd + 1, LEN(data) - nEnd)) AS nEnd
           FROM 
               numberList
           WHERE 
                nEnd > 0 AND 
                data LIKE '%[0-9]%') process
      WHERE 
           nStart > 0)
SELECT distinct
      id, STUFF((SELECT 
                       ', ' + numbers 
                 FROM 
                      numberList nl2
                 WHERE 
                      nl2.id = nl.id
                 FOR XML PATH('')), 1, 2, '' ) AS numbers 
FROM 
    numberList nl

The most important points in this algorithm are:

  • to have inside the CTE a select that keeps the start and end positions of the numbers found in the string
  • the condition WHERE nEnd > 0 is used to exclude rows from CTE processing once they have been searched
  • the SUBSTRING functions uses for the last parameter a CASE instruction that takes care of the special case of having numbers at the end of the string
  • the result contains two columns, the id and the concatenated numbers found in the initial string column. To concatenate strings a combination of two instruction have been used: STUFF and FOR XML PATH('')

 

This example only works for integers, decimals present the extra difficulty of interpreting the decimal separator ('.' or ',') in different ways depending on its position.  

Tags: regular expressions, #tsql, SQL Server, brh,


IGO
204 · 1% · 228
0
Liked
 
0
Lifesaver
 
0
Refreshed
 
0
Learned
 
0
Incorrect



Submit

Your Comment


Sign Up or Login to post a comment.

    Copyright © Rivera Informatic Private Ltd Contact us      Privacy Policy      Terms of use      Report Abuse      Advertising      [ZULU1097]