TSQL Challenge 21 - Reading, recognizing and processing graphical representation of digits.
Here is a challenge that takes you away from those repetitive boring type of queries that you write over and over again, several times a day. All of us, the database people, are familiar with thinking in set based manner as well as row by row style. Here is something that is very interesting where you might need to process records in a 'three-line-at-a-time' fashion.
For the purpose of this challenge, imagine that you are working for a bank which just decided to scan all the banking documents. Assume that they have an old fashioned scanner which scans the documents and produces a text file with the customer number. So far so good. Well, not really! Unfortunately the scanner produces a graphical representation of the customer number using three lines of symbols: space, unerscores and pipe characters. Here is an example of the output produced by the scanner.
_ _ _ _ _ _ _
| _| _||_||_ |_ ||_||_|
||_ _| | _||_| ||_| _|
Here are the rules to keep in mind while reading and recognizing the output generated by the scanner.
- Each digit is represented using 9 cells (3x3)
- Only spaces, underscores and pipe characters are used
- The number of digits in each account number may vary.
- The Scanner is not 100% reliable and it might produce some digits that are invalid
The Challenge
Your job is to read the output produced by the scanner and identify the the customer number represented by each image. Remember that the scanner is not very reliable and it might produce invalid digit representations. For each digit that is not valid, set the value to 'X'
Sample Data
Here is the sample data for this challenge. Please take care with spaces, tabs and carriage returns as each digit is represented by three lines of text and if a space, tab or carriage return is misplaced, the whole image will be distorted.
Id ScanNumber
----------- ---------------------------
1 _ _ _ _ _ _ _ _ _
| || || || || | || ||_ |_|
|_||_||_||_||_| ||_| _| _|
2 _ _ _ _ _ _ _
|_||_|| || ||_ | | ||_
| _||_||_||_| | | | _|
3 _ _ _ _ _ _ _ _
|_ |_|| || ||_ |_| _| ||_|
|_||_||_||_||_||_||_ | _|
4 _ _ _ _ _ _ _
|_||_|| ||_||_ | | ||_
| _||_||_||_| | | ||_|
5 _ _ _ _ _ _ _
| ||_|| ||_||_ | | ||_
| _||_||_||_| | | ||_|
6 _ _ _ _ _ _ _
| | | _| _||_||_ |_ ||_|
|_| ||_ _| | _||_| ||_|
Expected Results
Based on the sample input and the rules discussed earlier, here is the expected output.
Id Value
----------- ---------
1 000007059
2 490067715
3 680X68279
4 490867716
5 X90867716
6 012345678
Sample Scripts
Use the following script to generate the sample data for this challenge.
DECLARE @t TABLE (Id int, ScanNumber NVARCHAR(116))
INSERT INTO @t
SELECT 1,--> 000 007 059
' _ _ _ _ _ _ _ _ _
| || || || || | || ||_ |_|
|_||_||_||_||_| ||_| _| _|
' UNION
SELECT 2, --> 490 067 715
' _ _ _ _ _ _ _
|_||_|| || ||_ | | ||_
| _||_||_||_| | | | _|
' UNION
SELECT 3, --> 680 X68 279
' _ _ _ _ _ _ _ _
|_ |_|| || ||_ |_| _| ||_|
|_||_||_||_||_||_||_ | _|
' UNION
SELECT 4, --> 490 867 716
' _ _ _ _ _ _ _
|_||_|| ||_||_ | | ||_
| _||_||_||_| | | ||_|
' UNION
SELECT 5, --> X90 867 716
' _ _ _ _ _ _ _
| ||_|| ||_||_ | | ||_
| _||_||_||_| | | ||_|
'
UNION
SELECT 6, --> 012 345 678
' _ _ _ _ _ _ _
| | | _| _||_||_ |_ ||_|
|_| ||_ _| | _||_| ||_|
'
Notes
- Each record may have more than three lines of data (each line is separated by a CR and LF). Your code should consider only the first three lines.
- The length of the first three lines of each recrd will always be the same and will be divisible by three.
- There may be 3x3 blocks of spaces in the string. In such a case, you should generate an empty space in the output. If a 3x3 block does not create a valid digit (except for the case of a 3x3 block of spaces), you should generate an "X".
- The number of 3x3 blocks in each record may vary
- Read the Submission Guidelines and make sure that your solution follows them.
- The solution should work on SQL Server 2005, 2008 or later versions
- Use this forum for any questions related to TSQL Challenge #21
Last date for submitting your solutions
We will close this challenge for evaluation on 25 January 2010 Midnight GMT. All the solutions we receive till that date will be processed and the results will be announced in the format given here. However, you can continue to send us your solutions till the day we announce the evaluation results.
How to submit a solution after the results are published?
You can submit solutions for challenges even after we publish the results. However, there are a few things you should do before submitting your solutions.
- Do a basic testing of your solution using the sample data we post along with the results of the evaluation. See an example here.
- If your solution passes the basic testing, do a logic testing of your solution. We will publish the data to be used for the logic testing along with the challenge results. You can see an example here.
- If your solution passes the logic testing, do a load testing of your solution using the load testing data that we publish along with the challenge results. You can see an example here.
- After load testing, match your results with the results we have published for that challenge. (see an example here). If the results of your solution comes in the top 10 solutions listed on the results page, go ahead and send us your solution. Make sure that you send us the results of your load testing along with your submission.
- We will re-evalute your solution and if it comes in the top 10, we will list it in the results page and you will be added to the winners page (see an example here) and will get a certain number of SQL Stars based on the grade of your solution.
About the Author
Challenge Evaluation Details
Though the evaluation of this TSQL Challenge is completed and winners are announced, you can still submit a solution and we will be very happy to review it. Before submitting, make sure that your solution passes the basic testing and logic testing and the performance statics are good. See the submission guidelines before submitting your solution.