SQL Server 2005 2008 Integration Services (SSIS) information, products, free scripts, tasks, components, productivity
Got a SQL Server or .NET question? Discuss it in the forums. (SQL Server Forums | Dot NET Forums)
First Time? You can support us by signing up. It takes only 5 seconds. Click here to sign up. If you already have an account, click here to login.

TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Here is a challenge that takes you away from those repetitive boring type of queries that you write over and over again, several times a day. All of us, the database people, are familiar with thinking in set based manner as well as row by row style. Here is something that is very interesting where you might need to process records in a 'three-line-at-a-time' fashion.

For the purpose of this challenge, imagine that you are working for a bank which just decided to scan all the banking documents. Assume that they have an old fashioned scanner which scans the documents and produces a text file with the customer number. So far so good. Well, not really! Unfortunately the scanner produces a graphical representation of the customer number using three lines of symbols: space, unerscores and pipe characters. Here is an example of the output produced by the scanner.

    _  _     _  _  _  _  _
  | _| _||_||_ |_   ||_||_|
  ||_  _|  | _||_|  ||_| _| 
 

Here are the rules to keep in mind while reading and recognizing the output generated by the scanner.

  • Each digit is represented using 9 cells (3x3)
  • Only spaces, underscores and pipe characters are used
  • The number of digits in each account number may vary.
  • The Scanner is not 100% reliable and it might produce some digits that are invalid
The Challenge

Your job is to read the output produced by the scanner and identify the the customer number represented by each image. Remember that the scanner is not very reliable and it might produce invalid digit representations. For each digit that is not valid, set the value to 'X'

Sample Data

Here is the sample data for this challenge. Please take care with spaces, tabs and carriage returns as each digit is represented by three lines of text and if a space, tab or carriage return is misplaced, the whole image will be distorted.

Id          ScanNumber
----------- ---------------------------
1            _  _  _  _  _  _  _  _  _ 
            | || || || || |  || ||_ |_|
            |_||_||_||_||_|  ||_| _| _|
                           
2               _  _  _  _  _  _     _ 
            |_||_|| || ||_   |  |  ||_ 
              | _||_||_||_|  |  |  | _|
                           
3            _  _  _     _  _  _  _  _ 
            |_ |_|| || ||_ |_| _|  ||_|
            |_||_||_||_||_||_||_   | _|
                           
4               _  _  _  _  _  _     _ 
            |_||_|| ||_||_   |  |  ||_ 
              | _||_||_||_|  |  |  ||_|
                           
5               _  _  _  _  _  _     _ 
            | ||_|| ||_||_   |  |  ||_ 
              | _||_||_||_|  |  |  ||_|
                           
6            _     _  _     _  _  _  _ 
            | |  | _| _||_||_ |_   ||_|
            |_|  ||_  _|  | _||_|  ||_|

 

Expected Results

Based on the sample input and the rules discussed earlier, here is the expected output.

Id          Value
----------- ---------
1           000007059
2           490067715
3           680X68279
4           490867716
5           X90867716
6           012345678
Sample Scripts

Use the following script to generate the sample data for this challenge.

DECLARE @t TABLE (Id int, ScanNumber NVARCHAR(116))

INSERT INTO @t
SELECT	1,--> 000 007 059
' _  _  _  _  _  _  _  _  _ 
| || || || || |  || ||_ |_|
|_||_||_||_||_|  ||_| _| _|
                           
' UNION 
SELECT  2, --> 490 067 715
'    _  _  _  _  _  _     _ 
|_||_|| || ||_   |  |  ||_ 
  | _||_||_||_|  |  |  | _|
                           
' UNION
SELECT	3, --> 680 X68 279
' _  _  _     _  _  _  _  _ 
|_ |_|| || ||_ |_| _|  ||_|
|_||_||_||_||_||_||_   | _|
                           
' UNION
SELECT	4, --> 490 867 716
'    _  _  _  _  _  _     _ 
|_||_|| ||_||_   |  |  ||_ 
  | _||_||_||_|  |  |  ||_|
                           
'  UNION
SELECT	5, --> X90 867 716
'    _  _  _  _  _  _     _ 
| ||_|| ||_||_   |  |  ||_ 
  | _||_||_||_|  |  |  ||_|
                           
' 
UNION 
SELECT 6, --> 012 345 678
' _     _  _     _  _  _  _ 
| |  | _| _||_||_ |_   ||_|
|_|  ||_  _|  | _||_|  ||_|
                           
' 

Notes

  1. Each record may have more than three lines of data (each line is separated by a CR and LF). Your code should consider only the first three lines.
  2. The length of the first three lines of each recrd will always be the same and will be divisible by three.
  3. There may be 3x3 blocks of spaces in the string. In such a case, you should generate an empty space in the output. If a 3x3 block does not create a valid digit (except for the case of a 3x3 block of spaces), you should generate an "X".
  4. The number of 3x3 blocks in each record may vary
  5. Read the Submission Guidelines and make sure that your solution follows them.
  6. The solution should work on SQL Server 2005, 2008 or later versions
  7. Use this forum for any questions related to TSQL Challenge #21 

Last date for submitting your solutions

We will close this challenge for evaluation on 25 January 2010 Midnight GMT. All the solutions we receive till that date will be processed and the results will be announced in the format given here. However, you can continue to send us your solutions till the day we announce the evaluation results.

How to submit a solution after the results are published?

You can submit solutions for challenges even after we publish the results. However, there are a few things you should do before submitting your solutions.

  • Do a basic testing of your solution using the sample data we post along with the results of the evaluation. See an example here.
  • If your solution passes the basic testing, do a logic testing of your solution. We will publish the data to be used for the logic testing along with the challenge results. You can see an example here.
  • If your solution passes the logic testing, do a load testing of your solution using the load testing data that we publish along with the challenge results. You can see an example here.
  • After load testing, match your results with the results we have published for that challenge. (see an example here). If the results of your solution comes in the top 10 solutions listed on the results page, go ahead and send us your solution. Make sure that you send us the results of your load testing along with your submission.
  • We will re-evalute your solution and if it comes in the top 10, we will list it in the results page and you will be added to the winners page (see an example here) and will get a certain number of SQL Stars based on the grade of your solution.

About the Author


Challenge Evaluation Details

Evaluation QueueAll SubmissionsAccepted Submissions
Basic Testing ResultsLogic Testing ResultsPerformance Testing Comparision
Winners

Though the evaluation of this TSQL Challenge is completed and winners are announced, you can still submit a solution and we will be very happy to review it. Before submitting, make sure that your solution passes the basic testing and logic testing and the performance statics are good. See the submission guidelines before submitting your solution.

Previous Challenge

Current Challenge

Next Challenge

TSQL Challenge 20 - Identify repeating digits in Fibonacci SeriesTSQL Challenge 37 - Calculate the downtime and duration of servers based on the monitoring logTSQL Challenge 22 - Filling balls into boxes based on rules and preferences

Share

Comments

# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Monday, January 11, 2010 2:47 AM by vc.yap

Hi Rui,

The result for Sample Scripts seems doesn't produce the sample data. it seems missing a space for each of the first row .

there is also a missing underscore in line 30, to form the digit "4"

Thanks.


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Monday, January 11, 2010 3:05 AM by Peso

No, the missing underscore character is there for a purpose. Compare the source data and the expected output.


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Monday, January 11, 2010 3:06 AM by Jacob Sebastian

Hi vc.yap,

There was a missing space and I have corrected it. Thank you for pointing it out.

regards

Jacob


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Monday, January 11, 2010 3:47 AM by batchakamal

Can we write a function or stored procedure? or even it should be through one TSQL statement?


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Monday, January 11, 2010 4:11 AM by Jacob Sebastian

This challenge does not allow creating a function or stored procedure. The solution should be a single query that starts with either SELECT or WITH.

For those who dislike the 'one-query' requirement, there is something interesting coming up in the next challenge :-)

regards

Jacob


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Monday, January 11, 2010 4:48 AM by Aurelien Verla

Hello Rui, Jacob,

9 digits is a fixed value or should we take care of other digits length ?

Does each line have exactly the length of 3 * [digits length] or could it have some trailing spaces before the carriage return ?

In the sample data, with copy/paste, the last id does not produce a full spaces last line, is this the behavior or does this come from the copy/paste ?

Regards,

Aurelien


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Monday, January 11, 2010 5:30 AM by Jacob Sebastian

Hi Aurelien,

#1 - Your code should take care of more than 9 digits.

#2 - There may be spaces (leading or trailing) in the string values.

#3 - I could not see the problem you mentioned about 'full spaces last line'. could you explain what is the problem you found?


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Monday, January 11, 2010 6:18 AM by Sergejack

The rules are chaning again?!

I give up ^^

Challenge 15 went to interresting to trivial.

And this one goes from interresting to tricky at the moment being but who knows how trivial/tricky it'll end.

BTW: a "Delete" button would do good in the media gallery.


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Monday, January 11, 2010 6:29 AM by Pragnesh Patel

Hi Jacob,

Still it seems there is missing one space at end of each digit. As you mentioned Each digit is represented using 9 cells (3x3), but if you can see below example its not.

eg.

' _  _  _  _  _  _  _  _  _' ----> 1 space is missing

'| || || || || |  || ||_ |_|'

'|_||_||_||_||_|  ||_| _| _|'

'    _  _  _  _  _  _     _' ----> 1 space is missing

'|_||_|| || ||_   |  |  ||_' ----> 1 space is missing

'  | _||_||_||_|  |  |  | _|'

-- Pragnesh


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Monday, January 11, 2010 6:34 AM by Pragnesh Patel

Jacob,

Continue in above example...

1st 3 lines are from ID = 1

and next 3 lines are from ID = 2

--Pragnegsh


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Monday, January 11, 2010 7:05 AM by Jacob Sebastian

@Pragnesh,

I checked the sample code and see that the first line has 27 characters in all the rows.

Try running the following query and see if you are getting "28" on all rows.

"select

charindex(CHAR(13), ScanNumber) FirstCRPosition

from @t"

If you are getting 28, it means that there are no missing characters. If you are getting 27, please do a CTRL+F5 on your browser and copy the code again. We had fixed an issue with spacing this morning and it could be that your page is loaded from the cache.

Let me know if there is still a problem.


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Monday, January 11, 2010 7:15 AM by Jacob Sebastian

@Sergejack,

[quote]

The rules are changing again?!

[/quote]

Well, length of the string being processed and the possible leading and trailing spaces are two common questions that would come across when processing strings. The "Logic" testing part of the evaluation process tries to see how much care the author has taken to handle unexpected cases such as longer strings, or strings having extra spaces etc.

We are not changing the rules here, but trying to add more clarification to the logic.

[quote]

Challenge 15 went to interesting to trivial.

[/quote]

Agreed/accepted!

[quote]

And this one goes from interesting to tricky at the moment being but who knows how trivial/tricky it'll end.

[/quote]

At the moment, it does not look like we will have a problem similar to challenge #15 on this one :-)

[quote]

BTW: a "Delete" button would do good in the media gallery.

[/quote]

The problem is setting up the permission so that each user can delete only his/her own submissions. The site is built on community server and if some one knows how to set up this correctly, please let us know.

regards

Jacob


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Monday, January 11, 2010 7:16 AM by dishdy

Jacob,

When I execute the data loading script I end up with 6 rows with the first 5 lines each of length 118 and the last line of length 91. The first 5 rows also contain 27 trailing blanks. All rows end with a final cr/lf.

So are we dealing with 6 rows or 18 rows?


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Monday, January 11, 2010 7:21 AM by dishdy

Oh yes, and there a 3 cr/lf in each row separating the 3 blocks of characters.


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Monday, January 11, 2010 7:55 AM by Jacob Sebastian

@dishdy,

I see your point. The last row does not have the extra line that other rows have.

I have updated the sample code with the additional line to make it uniform.

These extra lines should be ignored by your processing logic. Note that the data will be in the first three lines of the text stored in the column. Your code should ignore/remove the extra lines if present.


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Monday, January 11, 2010 8:20 AM by Aurelien Verla

@jacob,

Thanks for all your answers.

Another question now regarding leading and trailing spaces !

Should we trim them or should we detect a X when a 9x9 spaces square is detected ?

Example:

 _     _ 

| |   | |

|_|   |_| => 0X0

What about:

    _    

   | |   

   |_|    => X0X or 0 ?

(I hope formating will be ok!)


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Monday, January 11, 2010 8:25 AM by Mike Lewis

#1 - Your code should take care of more than 9 digits.

 - This is not a requirement set out in the question.  Based purely on the rules in the question, I think any solution looking for more than 9 digits would be over-engineered.  If extra digits exist, then I would expect any solution to simply ignore them.  Is this a fair assessment?

Also, when I copied the code to produce the test table, I only received line feed characters at the end of each row, not the cr/lf combination.  Thought that was worth pointing out.


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Monday, January 11, 2010 8:43 AM by dishdy

Jacob,

OK, so you are confirming 6 rows in the sample data broken up by cr/lf all ending with 27 blanks followed by a cr/lf. Totale length each 116 or (27+2)x4.

Earlier you responded to Aurelien sayinging that 'your code should take care of more than 9 digits'. I submit that this is impossible to do in a dynamic way. The solution must somehow be told, at least via a parameter, how many digits to expect.

After all, if you start throwing garbage at the solution with the data rows of different lengths there is no way for the solution to know whether this is an error without also knowing how many digits to expect.


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Monday, January 11, 2010 11:22 AM by Jacob Sebastian

@Aurelien,

I see there are some confusing parts when it comes to the logic testing part of the code.

From what I understood so far, the sample data and expected results are clear to every one. The question now is, how should the code behave when it is tested against the 'tricky'/'unexpected' data.

The tricky test 'data' is usually decided after the challenge is closed. But we see that it is a better idea to decide the tricky testing before the challenge is published, because this type of questions can be easily answered if we have the tricky data ready. I am taking this as another point to our task list to improve.

For this challenge, we will follow the guidelines given below when building the tricky data.

1. Each string may have more than 3 lines of data.If you come across such a string, you should take only the first three lines. You can skip the rest of the string.

2. The length of the first three lines of each row will always be the same, and will be always divisible by three.

3. The number of 3x3 blocks in each record may vary. Not all records will have 9 digits. It is also possible that some records will have more than nine 3x3 blocks.

4. It is possible to have a 3x3 block of spaces in the string. If you come across such a block, it should be translated to a single space in your output. For any non-recognizable character block, produce an "X" in the output. So your output can have digits 0-9, "X" or a space (" ")

Does this answer all the open questions?

thanks

jacob


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Monday, January 11, 2010 11:35 AM by Jacob Sebastian

@Michael,

Each record may have different number of digits. All the rows in the sample data have 9 digits each. However, your code should be able to handle lesser or more number of digits.

Please also refer to the comments I posted above (in response to Aurelien's question)

Each line within a record is separated by a CR and LF. I think the code formatter removed the CR character from the sample code listing. But your code should expect a CR/LF pair.


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Monday, January 11, 2010 11:35 AM by Mike Lewis

I'm happy with the requirements now, although it does mean I'll have to change my solution...


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Monday, January 11, 2010 11:39 AM by Jacob Sebastian

@dishdy,

Total length of each row may vary. Please see my comments above on this.

I agree that it is easier to write a solution if the maximum length is known in advance. However, for the sake of this challenge, your solution should not assume any fixed number of digits in the string.


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Monday, January 11, 2010 12:54 PM by Leszek Gniadkowski

Can we check only single cr (as first char of delimiter) instead of cr/lf pair?


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Monday, January 11, 2010 1:16 PM by Ramireddy

Jacob,

 Is any extra space is there in for every top row of each record for the sample data???

Since, when i copy the data as it is and run against the query, its not matching any record... when i delete one extra space at the top row of every record then the numbers are matching and the query giving output.....


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Monday, January 11, 2010 3:17 PM by whitneje

[quote]

3. The number of 3x3 blocks in each record may vary. Not all records will have 9 digits. It is also possible that some records will have more than nine 3x3 blocks.

[/quote]

This seems to contradict one of the original stated rules: Each account number is 9 digits long.

If the account number is going to have a variable number of digits, can you give us a range (eg. 0-20 digits)?

Can you update the article with the rule clarifications in the comments. It is getting difficult to understand the problem as stated and then read through user comments to pick up changes to the original rules.


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Monday, January 11, 2010 10:14 PM by Jacob Sebastian

@Ramireddy,

The extra space given on some rows are required and the character blocks will be correctly printed with the sample data.

You might notice formatting problem if you do a select * from @t. However, if you select only the "scannumber" column, it will print correctly.

Let me know if that is not happening on your end.


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Monday, January 11, 2010 10:19 PM by Jacob Sebastian

@whitneje,

we have updated the challenge description with more clarification as per the discussion done in this thread.


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Monday, January 11, 2010 10:21 PM by Jacob Sebastian

I would like to invite some of you as 'early challenge reviewers' where you can review a challenge before it is published. This might give you a chance to storm us with questions and comments and we could use it to fine-tune the challenge description and fix the areas that we missed while setting up the challenge rules.

Those of you are interested to help us with this role, please send me a private message.

regards

Jacob


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Monday, January 11, 2010 10:36 PM by Jacob Sebastian

@leszek_g,

All the data that we use for the testing will always have a CR + LF pair as line delimiter.


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Tuesday, January 12, 2010 2:19 AM by dishdy

For the performance part of the challenge, you should specify an upper limit for the number of digits (as suggested by whitneje) or you should specify that there is 'essentially' no upper limit. The approach that needs to be taken with one rather than the other can have a notable impact on performance.


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Tuesday, January 12, 2010 7:13 AM by Rui Carvalho

to all,

We paid a lot of attention to publishing problemns due to printing plain text in html but sometimes mistakes should appear when you try to download the sources.

In all the cases I suggest to use the provided function of the syntax highlighter to copy/paste the content (mouse over on the right top corner of the displayed source, then copy to clipboard). I just check it and it is correct right now.

As dishdy said the format of the scanned digits should have a length of 116 -> (27+2)x4. Each cell 3x3 should be complete with its exact size and you should not trim spaces.

For the size of the scanned number it should fit exactly 9 cells digits but your code should handle any number of digits. But that's more an information on how you should produce your query than a requirement! If your query produce exactly the values in output provided with the provided sample data it will  be valid!

Again, I see sometimes this remark, the printed output represend the values expected you are not supposed to make a query to produce characted by character the printed output. This is only a textual representation of the table data expected .It is easier for publishing reasons to set output in Management studio to text then make a copy/paste for edition than set the output to standard table and make a screen capture of the results. If this is a trouble point we can change this in our future challenges.

As Jacob said we should also provide in the future early access to the challenges for a few of you as reviewers (2-3 different people for each challenge), contact us if you are interested.

Cheers!

;-)


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Tuesday, January 12, 2010 1:44 PM by whitneje

Is the output required to be sorted by ID?


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Tuesday, January 12, 2010 5:28 PM by Rui Carvalho

@whitneje,

yes even if it was not explicitely mentionned you should sort it by ID as in the expected output.


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Thursday, January 14, 2010 7:38 AM by dishdy

I still think for the performance part of this challenge you should give us un upper limit for the number of digits your code should handle.

Or simply stated, can we assume that ScanNumber is stored in a varchar(n) column and not a varchar(max) column?

Thus we can assume n to be at most 8000 and take it from there? Meaning that we need to deal with at most (8000-8)/4 or 1998 digits?

On the other hand, if n=max then we need to potentially deal with over 500 million digits.


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Thursday, January 14, 2010 8:16 AM by Jacob Sebastian

@dishdy,

There is no upper limit for the number of characters in the string. Your solution should not assume a fixed number of characters in the string.


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Thursday, January 14, 2010 9:02 AM by dishdy

Jacob,

I'm new to this TSQL challenge thing. Thus each solution will go through the basic, logic and load testing. We have the data for the basic testing but you keep secret the data you use for the logic and load testing. The winner has to pass the first two tests and be first in the load testing.

I presume that it's in the logic testing that you will have variable length rows plus at least one row with a huge number of digits. And if it can't handle it I'm out. Is that correct?


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Thursday, January 14, 2010 9:10 AM by Jacob Sebastian

@dishdy,

Your understanding is correct. The solutions need to pass the basic testing and logic testing to get to the load testing.

It is at the logic testing phase that we test the solution with all kind of tricky data. The purpose of the tricky testing is to see how well the logic is written to handle unexpected cases. There is nothing 'secret' about the data. However, currently we are not able to provide the tricky data along with the challenge description.

We are working on improving the way challenges are managed and probably we will be able to present the tricky data in advance for the future challenges.


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Thursday, January 14, 2010 9:35 AM by dishdy

My I suggest the following for the tricky data. A string containing just 'ABRACADABRA'. An empty string. That should get heads rolling.

Naturally, I'm joking. :-))


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Thursday, January 14, 2010 12:05 PM by karinloos

"There is no upper limit for the number of characters in the string. Your solution should not assume a fixed number of characters in the string. "

Well in the example provided, the scannumber is Nvarchar(116)..making thus 116 chars in the string the maximum.

Following this the maximum number of "digits" that can be placed in the string on a 3x3 basis is 12

ie ( (116 / 3 ) - 2 ) / 3  = 12

This could be then relevant in terms of neat formatting of end result.


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Thursday, January 14, 2010 3:40 PM by dishdy

karinloos,

From what I have understood so far, I think you should expect the logic test data that will be applied to your solution to NOT be constraint by a mear nvarchar(116) column. Thus your solution should handle at least a nvarchar(4000) column. In fact, there is no need to use nvarchar for representing this data. Thus you might get hit with a varchar(8000) column or even a varchar(max) column. In the latter case, you need to deal with a number with a potential 500 million digits.

Maybe this challenge is trying simply to solicit a solution based on a VARIABLE number of digits without the need to really support 500 million digits implied by a varchar(max) column.

Until this is made clear, the load testing part will be absolutely meaningless.


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Thursday, January 14, 2010 4:52 PM by daxzeal

Should ScanNumber = char(10)+char(13)+char(10)+char(13)+char(10)+char(13) (zero 3x3 digits (zero is divisible by 3)) result in 'X' or '' or NULL?  


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Thursday, January 14, 2010 6:23 PM by Rui Carvalho

We should suppose that scanner will produce errors sometimes but always by putting a '|','_',' ' in the wrong place. In the case of a problem where no character is provided it will move the characters to left or right depending on the missing characters! then in this case, your query should focus on the 3x3 cells, if a character is missing it will produce the wrong digit and you put X. If the scan is really empty with only the CRLF, there is no digit, so '' or NULL is ok for me. That's not a tricky point.

Hope it helps


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Friday, January 15, 2010 6:44 AM by Serg54

Does the script which produces the following results calculates code length correctly? (it's based on the length of the substring between 1st and 2nd CR)

DECLARE @t TABLE (Id int, ScanNumber NVARCHAR(216))

INSERT INTO @t

SELECT 7, --> 012 345 678 X90 867 716

'                               _     _  _     _  _  _  _     _  _  _  _  _  _     _

| |  | _| _||_||_ |_   ||_|| ||_|| ||_||_   |  |  ||_

|_|  ||_  _|  | _||_|  ||_|  | _||_||_||_|  |  |  ||_|

'

UNION SELECT 8,

'_

'

UNION SELECT 9,

' '

UNION SELECT 10,

' _     _  _     _  _  _  _     _  _  _  _  _  _     _

| |  | _| _||_||_ |_   ||_|| ||_|| ||_||_   |  |  ||_

|_|  ||_  _ ||_|  | _||_||_||

';

Results:

id number

-- ----------------------

7 012345678X90867716

8 NULL

9 NULL

10 012XXXXXXXXXXXXXXX


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Friday, January 15, 2010 7:34 AM by dishdy

Serg54,

I think you must assume the presence of at least one cr/lf. If not you should return an empty or null string. This is also true if the first cr/lf is found at position one.

Once you know the location of the first cr/lf you can assume the location of the second cr/lf even if it is not there. From this, everything else can be determined.


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Saturday, January 16, 2010 7:27 AM by Jacob Sebastian

@Serg54,

I think dishdy's post answers your question.

As mentioned in the challenge description, your code should process the first three lines from the input text. The length of the first three lines will be the same and will divisible by 3. So it will not be hard to correctly break the blocks of digits.


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Monday, January 18, 2010 6:50 AM by Serg54

Sebastian,

"2. The length of the first three lines of each recrd will always be the same and will be divisible by three."

" #2 - There may be spaces (leading or trailing) in the string values."

Do you mean the spaces (leading or trailing) are never added to the lines, but only can substitute some symbols not changing the line lehgth?


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Monday, January 18, 2010 7:10 AM by Jacob Sebastian

@Serg54,

What I meant to say that it is possible to have spaces at the beginning or at the end of the data. If there are leading spaces, they should be considered as part of the first line.

It is possible that the input data contains more than 3 lines of text. In such a case, you can ignore everything after the first 3 lines.


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Monday, January 18, 2010 10:56 AM by sgmunson

These SQL Challenges are starting to look more like hopeless pursuits than reasonable challenges, and I'm going to not bother participating in any more of them until the authors realize how simple it is to eliminate 99% of the repeated problems by doing the simplest of things - namely: 1) provide a link to download the sample data SQL file, and 2) TEST the sample data and SQL file to ensure that selecting from it's table produces the correct result.

The more serious problem would still remain, however, and that's the fact that the authors just don't realize how thorough they need to be in constructing these challenges.   There should never EVER be any questions of the nature of those being repeatedly asked on these things.   The guidelines should NOT be on a separate page, but included in every challenge page, or perhaps available in a pop-up window, and easily printed.   I don't mind doing the work to solve the problems offered, but you as authors have to at least take the responsibility of ensuring there can be no unanswered questions, and that probably means that these problems need to be "vetted" through someone who's primary language is English, and is the kind of person who can be anally thorough in ensuring the problems are properly constructed.

I was already a good hour into working on this when I then discovered that once again, the rules had, for all practical purposes, changed completely, and I was going entirely in the wrong direction.

Unfortunately, the comments to date have yet to provide sufficient detail as to how to parse the digit data, as there's a statement indicating the possibility of leading and/or trailing spaces, but without saying whether each digits data or only the entire line of digit data is what that applies to, and if it's not just the entire line that this applies to, then this problem is pointless, regardless of whether it's solvable or not.  No person with 2 ounces of common sense would make the attempt to get this data if it were so inconsistent as to have possible leading or trailing spaces for every digit, and also the possibility of no intervening spaces, as then there'd be no way to guarantee that you could accurately parse the digits, as there'd be no way to determine where a given digit starts and finishes.   Yes, these are the kinds of issues associated with unstructured data, but some of them CANNOT be solved, and based on what's been said so far, it's not clear to me that your problem is in the category of being able to be solved.

Come to think of it, I'm not even sure you can reasonably allow leading spaces on the line as well as trailing spaces, because then you need a strict rule that says exactly where to begin looking at characters because 3 spaces is considered a valid

piece of data and needs to be represented in the output as a space.   Admittedly, you'd need 3 lines of it, but when parsing one line, you'd need to know if the other lines for that same digit are also 3 spaces in the same position, and at that point, dealing with unstructured data is a problem that's at the very least, extremely difficult for the most skilled string manipulators, who would no doubt opt for better data validation rather than invest in continuing down a road where the most likely outcome is reaching the conclusion that parsing the data is actually impossible.  I've been parsing unstructured string data for the last 30 years, so please give this problem a LOT more thought than it's been given.  I think you'd be better off withdrawing this one and re-creating it from scratch, but that's just me.


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Monday, January 18, 2010 11:58 AM by dishdy

sgmunson,

As far as I'm concerned, this problem is very clear now. Yes it took a bit of hackling backwards and forwards and some of the specs are now in the comments.

Why should the presence of spaces (leading or trailing) be confusing? A space is just like a digit.

In general, give me one sample line of data that you think has an ambiguous answer given the current specs.

Regards.


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Monday, January 18, 2010 12:16 PM by sgmunson

Now that I've had some time to re-analyze, I've concluded you have zero chance at a solution if you allow both leading and traling spaces and no specific number of digits to be found.  There is no way you can find the starting point for all possible inputs under those conditions.

You're going to have to specify some very exacting conditions or have no solution.  Just one leading AND one traling space, with either or both being optional, destroys your chance at the problem being solved.   Think it through, folks.  This one is dead as a doornail.   There's no way you can consider 3 consecutive spaces a valid part of the digit 1 and also allow for leading and trailing spaces, and for all inputs, have a valid solution, even if the number of digits is specified, and the scanner be considered "less than 100%".

If you want to prove otherwise, you'll have quite a task ahead of you, as you'll have to generate ALL POSSIBLE INPUTS, including all possible complete 100% screwups by the scanner (pretend it can be broken in all possible ways, including absolutely no consistency in which text elements it outputs based on its input.


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Monday, January 18, 2010 12:31 PM by sgmunson

Ok, now I see where my problem with this began.  A re-read of the original problem guarantees no leading or trailing spaces in the digit data, yet that became a significant concern within the comments, that I misunderstood - the wording used was vague, and my interpretation wasn't helping.  I'm not even going to try to re-interpret, as that has nothing to do with the challenge solution.

My apologies for any ruffled feathers.  I see now what needs to be done.


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Tuesday, January 19, 2010 3:53 AM by Rui Carvalho

@sgmunson,

I am happy that you are now in touch with the challenge ;-)

about your first comment, please note that we are a small team of volunteers trying to do their best to provide great challenges, but sometimes things that seem clear for us are not finally so obvious and some mistakes can also appear.

That's why our team is growing and that's why we purpose in the last newsletter to people to get an early acces to challenges in order to beta test it.

hope it is clearer now and that you will enjoy challenges!

I am very interested in your story about parsing unstructured data ;-)


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Tuesday, January 19, 2010 4:09 PM by kongstad126

Very good challenge this time!

I cannot se anything in the status about tally tables. Can we use the taly table your defined in an earlier challenge, tsqlc_Tally, or must we make our own?


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Tuesday, January 19, 2010 5:26 PM by dishdy

kongstad126,

Looking at some of the solutions to past challenges (very interesting for a new-comer to this challenge site like me), it appears that table master.dbo.spt_values for type='P' is used for a tally table. Thus I presume it can be used for this challenge.

But the size of your tally table will probably impact the performance of your solution and it will probably also impact the actual maximum number of digits your solution can handle. The specifications say that there should be no upper limit on the number of digits your solution can handle. For this to be true, your solution should handle a varchar(max) column containing more or less 500 million digits. And I presume their logic testing data will have a column containing 500 million digits.

Sebastian,

Thus can you please confirm or clarify this issue of 'upper limit' for the number of digits.


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Tuesday, January 19, 2010 6:02 PM by Rui Carvalho

@kongstad126, @dishdy

yes, you can use spt_values or an inline query or anything else you need for you query if it fit the rules ;-)

About the upper limit for a load testing, remember some few points:

- first of all, we test the logic, before thinking about load testing ensure you have a solution that works with the sample data.

- then you should try your solution with a significant rumber of rows. This should be from 1000 to 10000. It is not enough for a very hard load testing but it is quite relevant to check the bad solutions and see differences between them.

- if you can do a high load testing with 1M rows, this  is far enough. Maybe we go ahead, but its kept secret ;-)

Hope it helps


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Wednesday, January 20, 2010 1:51 AM by dishdy

Rui,

If the approach you use is a tally table to handle a variable number of digits then you necessarily need to place an upper limit on the number of rows this tally table holds. Thus if you use a tally table with 2048 rows (master.dbo.spt_values for type='P') then your solution will handle at most 2048 digits. But if your logic test data has a row with 2049 digits, you're out.

In practice, you can always increase the number of rows of the tally table by cross joining it with itself as many times as is necessary. But increasing the number of rows of the tally table to handle some unknown maximum will impact the performance. If you set your tally table to 1 million rows and your logic test data has at most 100 digits then you're wasting a lot of cpu. That's why I think the specification should indicate a maximum.

On the other hand, if you guys have a solution which does not make use of a tally table and rather makes use of some technique (e.g. recursive CTE or something very clever) which is truly dynamic then my request for specifying an upper limit is not valid.

Please note, this is has nothing to do with the number of rows your load testing data will have.


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Wednesday, January 20, 2010 4:38 AM by Sergejack

I agree with the interest of having an upper limit of digits (and I consider it actually implied within the rules as they are stated).

I used a limit of 64 digits and (as commented in my query code) : I hope it will be enough.

You can't have more than 12 "digits" in a NVARCHAR(116), anyway.


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Wednesday, January 20, 2010 6:05 AM by dishdy

Sergejack,

This is true for the sample data. I don't know if you can assume their logic test data and load test data to have ScanNumber defined as nvarchar(116). But if this is the case, then your solution needs to handle a variable number of digits but it can assume a maximum of 12 digits.


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Wednesday, January 20, 2010 9:59 AM by kongstad126

My solution using a tally table eats 10000 rows with 1000 numbers in no time.

Using only recursive CTE's the same query is slow (I don't know how slow, I am impatient, but more than 10 minutes anyway, which means I suck at doing recursive CTE's, or just that I should test my code on something else than a virtualbox virtual machine running on a ubuntu laptop ;)

One way to make sure my solution is not canned bacause my tally table is to small is to submit multiple solutions. One each with a tally table of perhaps 1000, 10.000, 1.000.0000.

This ensures that my solution can be evaluated on the largest set of digits they through at it, but still with the best performance I can get with a tally table.

It does seem a waste of my time, and more important a wate of the judges time.

Its just that it given the description of the challenge it would seem silly to go so far given that we are supposed to be scanning customer numbers, any test with more than 20 digits would be silly, and a million digits is just stupid. I can see the scanner messing up the format, but not suddenly multiplying the number of digits by several orders of magnitude.

So all in all I will join the choir of voices asking for an upper limit on the number of digits, if not 20 then perhaps 1000 should be more than enough to test the wits of us all?


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Thursday, January 21, 2010 1:04 AM by Jacob Sebastian

Hi All,

I agree with many of you with your comments that it makes sense to have an upper limit. It is also that in a real-world scenario the customer number may not be 1 million character long.

However, one of the goals of TSQL Challenges is to expose new/better ways of solving problems and one of the problems here is 'how to handle the query when the upper limit is not known' and at the same time 'come up with an optimized solution'.

regards

Jacob


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Thursday, January 21, 2010 3:33 AM by dishdy

Sebastian,

Well done and said! Now that we all understand that the 'no upper limit' thing is part of the challenge, the solution jumps right at me. I think the whining of the upper-limit choir will now stop.


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Monday, January 25, 2010 8:02 PM by Jeremy Giaco

I submitted a solution that starts with "WITH", but not ";WITH"...I assume that this will be ok...since i either had to break the "a single query that starts with "WITH" or "SELECT" rule or risk the query throwing an error if you didnt add and ";" to the sample data.  I  know this is being nitpicky, but i also assume that you will have many submissions and will automate the validation, and one such validation might have been checking that the first character is a "W" or "S"...


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Tuesday, January 26, 2010 12:32 AM by Jacob Sebastian

Hi Jeremy,

Your solution is fine. Both WITH and ;WITH will be accepted.


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Sunday, January 31, 2010 11:23 PM by Parth Patel

Hi Jacob,

I have one question regarding how you compare the result. For an example,

Id          Value

----------- ---------

1           000007059

2           490067715

3           680X68279

4           490867716

5           X90867716

6           012345678

Id          Value

----------- ---------

2           490067715

1           000007059

3           680X68279

4           490867716

5           X90867716

6           012345678

both are valid results? If NOT then please add in the rule to sort the result by Id column.


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Monday, February 01, 2010 4:15 AM by Rui Carvalho

Hi Parth,

If you have the right value associated with the right id, and as we didn't explicitely specified an order rule, it should pass the logic test.

But usually when we said "this is the expected output", even if there is no explicit rule, keep in mind that we hope people give solutions to fit exactly the values and orders. And most of the time in real life, you don't have complete specifications and only 'this is the result I want to get'...

So, consider that it should fit in logic test but may have a lower note that the same solution with equal performance (as soon as we can consider that doing the final order to show the results as expected is a difficulty).

Hope it helps


# re: TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.

Monday, February 01, 2010 5:52 AM by Parth Patel

Thanks Rui. I understood your point.


Copyright © Beyondrelational.com