
DECODING AND TRANSLATING DATA

WHITE PAPER

 

Specially designed tools are available that translate data from its existing format or representation to a selected format or representation.  This paper discusses these tools in general and identifies the specific skills required to perform data translation effectively.  Other terms that generally refer to data translation include data migration, data conversion, data manipulation and data mining.

 

To use data translation tools effectively, the user must have a good understanding of various data formats and encodings, e.g. packed, signed, floating point, hexadecimal, binary, text, EBCDIC, ANSI, etc.  The data translation Specialist also needs an accurate map of the source data that identifies the data fields, the length and type of each field, and special identifiers such as a byte that holds a field length.
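As an illustration of two of the formats named above, the following is a minimal sketch in Python of decoding EBCDIC text and a packed decimal number.  The function names are hypothetical, and the code page cp037 (US/Canada EBCDIC) is an assumption; real source data may use a different country code page.

```python
from decimal import Decimal


def decode_ebcdic(raw: bytes, codepage: str = "cp037") -> str:
    """Decode EBCDIC bytes to text. cp037 is an assumed code page."""
    return raw.decode(codepage)


def unpack_packed_decimal(raw: bytes, scale: int = 0) -> Decimal:
    """Decode packed decimal: two digits per byte, sign in the last nibble.

    `scale` is the number of implied decimal places taken from the data map.
    """
    if not raw:
        raise ValueError("empty field")
    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)
        digits.append(byte & 0x0F)
    digits.append(raw[-1] >> 4)
    sign_nibble = raw[-1] & 0x0F
    # Digit nibbles must be 0-9 and the sign nibble must be A-F.
    if any(d > 9 for d in digits) or sign_nibble < 0x0A:
        raise ValueError("field is not valid packed decimal")
    value = int("".join(str(d) for d in digits))
    if sign_nibble in (0x0B, 0x0D):  # B and D mark a negative value
        value = -value
    return Decimal(value).scaleb(-scale)


print(decode_ebcdic(bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6])))
print(unpack_packed_decimal(bytes([0x12, 0x34, 0x5D]), scale=2))
```

The validity check matters in practice: a field that fails it is often a sign that the data map gives the wrong type or offset for that field.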

 

Armed with this information, the data translation Specialist can then map fields in the source data record to translation routines in the data translation tool.  When properly mapped, the data translation tool will display or print the information in readable format, or create a file on the medium of choice.  At this point, adjustments need to be made to correct for erroneous mapping.  This step is required because even the best data maps available have some inaccuracies and may be missing critical data elements. 
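The mapping step described above can be sketched as a table that ties each field in the source record to a translation routine.  This is only an illustration in Python: the record layout, the field names, FIELD_MAP and translate_record are all hypothetical, and the cp037 code page is an assumption.

```python
# Hypothetical data map: (field name, offset, length, translation routine).
FIELD_MAP = [
    ("customer_id", 0, 4, lambda b: int.from_bytes(b, "big")),
    ("name", 4, 10, lambda b: b.decode("cp037").rstrip()),
    ("status", 14, 1, lambda b: b.decode("cp037")),
]


def translate_record(record: bytes, field_map=FIELD_MAP) -> dict:
    """Apply each mapped routine to its slice of the source record."""
    result = {}
    for name, offset, length, routine in field_map:
        chunk = record[offset:offset + length]
        if len(chunk) != length:
            raise ValueError(f"record too short for field {name!r}")
        result[name] = routine(chunk)
    return result


# Build a sample 15-byte record that matches the assumed layout.
sample = (
    (1001).to_bytes(4, "big")
    + "SMITH".ljust(10).encode("cp037")
    + "A".encode("cp037")
)
print(translate_record(sample))
```

Keeping the map as data rather than code makes the later adjustments cheap: correcting an offset, a length or a routine is a one-line change followed by another look at the output.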

 

The data translation Specialist must evaluate the results of his/her best guess of the data formats and make adjustments, some of which are themselves guesses.  One example is an incorrect field length for a numeric field.  When given an incorrect field length, the data translation tool will produce an unrecognizable translation.  Another example is a text field that has a length byte at the beginning.  If this length byte is not identified to the data translation tool, the tool will treat the length byte as part of the data and produce inaccurate results.  One can readily see that this is an iterative process that can go quickly or painfully slowly.  Among other things, the complexity and duration of the iterative process depend on the quality of the data mapping information and the skill of the data translation Specialist.
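Both mapping errors described above are easy to reproduce.  The sketch below, in Python, uses a hypothetical record made of a 4-byte binary count followed by a 10-byte text field with a leading length byte; the function names and the layout are assumptions made for illustration only.

```python
def read_unsigned(record: bytes, offset: int, length: int) -> int:
    """Read a big-endian unsigned binary field of the mapped length."""
    return int.from_bytes(record[offset:offset + length], "big")


def read_text_ignoring_prefix(field: bytes) -> str:
    """Wrong mapping: treats the whole field, length byte included, as text."""
    return field.decode("latin-1")


def read_length_prefixed_text(field: bytes) -> str:
    """Correct mapping: the first byte gives the number of text bytes."""
    if not field:
        raise ValueError("empty field")
    length = field[0]
    if length > len(field) - 1:
        raise ValueError("length byte exceeds field size")
    return field[1:1 + length].decode("latin-1")


# Hypothetical record: 4-byte count, then length byte + text + padding.
record = (1001).to_bytes(4, "big") + bytes([5]) + b"NORTH" + b" " * 4

print(read_unsigned(record, 0, 4))   # mapped with the correct length
print(read_unsigned(record, 0, 2))   # mapped too short: the value is lost
print(repr(read_text_ignoring_prefix(record[4:14])))
print(repr(read_length_prefixed_text(record[4:14])))
```

With the wrong numeric length the count no longer resembles the stored value, and with the length byte unidentified the text gains a stray control character and trailing padding.  Neither mistake raises an error, which is why the output has to be inspected by someone who knows what to look for.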

 

The average programmer (Visual Basic or other current technology programmer) does not have the skills to do this work.  The missing skills concern understanding and manipulating data formats, data representations and methods of data storage.  Today’s technology does not require programmers to have or use these skills.  Consequently, few programmers have them.  However, programmers whose experience dates back to Assembler, COBOL and other early-generation languages tend to have basic data manipulation skills and to understand how data is stored and represented on various technology platforms.  While data translation tools are excellent for transforming data from/to a vast array of data formats, using them effectively requires an individual skilled in the many ways data can be stored, formatted and represented.

 

Most data entering a data conversion resides on old and poorly documented systems.  Even when the documentation is sufficiently detailed AND accurate, good programmers have difficulty describing the data to the data translation tool, or to their own programs, with enough accuracy to deliver a reliable conversion.  However, an individual skilled in data formats and data manipulation can get accurate data from poorly defined and even undefined databases.

 

Knowing the data translation work to be done and focusing only on that job makes an onerous task quite doable.  Quality and proficiency take special experience, unique training, discipline, well-defined processes and rigorous quality assurance tools that are designed specifically to trap data transformation errors.  Armed with these tools, the trained data migration Specialist can take information from virtually any source and deliver accurately translated conversion data.

 

Some examples of the many data formats commonly used in legacy and new technology applications are listed below.  All of these and many more are handled by good data translation tools that are commercially available.

 

DATA TYPES

32 bit Floating Point

80 bit Floating Point (and 8 other Floating Point types)

Binary (multiple lengths with or without implied decimal points)

Packed Decimal

Display number with embedded sign

Pascal String

Date and Time (multiple formats)
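Three of the data types listed above can be sketched in Python as follows.  These are illustrations under stated assumptions, not the method of any particular tool: the float is assumed to be IEEE 754 in big-endian byte order (IBM hexadecimal floating point needs a different routine), the embedded sign is assumed to use the common ASCII overpunch convention, and all function names are hypothetical.

```python
import struct
from decimal import Decimal

DIGITS = "0123456789"
POSITIVE = "{ABCDEFGHI"   # overpunch characters for +0 through +9
NEGATIVE = "}JKLMNOPQR"   # overpunch characters for -0 through -9


def decode_float32(raw: bytes) -> float:
    """Decode a 4-byte IEEE 754 float, assuming big-endian byte order."""
    return struct.unpack(">f", raw)[0]


def decode_implied_decimal(raw: bytes, scale: int) -> Decimal:
    """Decode a signed big-endian binary integer with implied decimals."""
    return Decimal(int.from_bytes(raw, "big", signed=True)).scaleb(-scale)


def decode_overpunch(text: str, scale: int = 0) -> Decimal:
    """Decode a display number whose last character carries the sign."""
    if not text:
        raise ValueError("empty field")
    body, last = text[:-1], text[-1]
    if last in POSITIVE:
        sign, digit = 1, POSITIVE.index(last)
    elif last in NEGATIVE:
        sign, digit = -1, NEGATIVE.index(last)
    elif last in DIGITS:
        sign, digit = 1, int(last)      # unsigned display number
    else:
        raise ValueError(f"unexpected sign character {last!r}")
    if any(ch not in DIGITS for ch in body):
        raise ValueError("non-digit character in display number")
    return Decimal(sign * int(body + str(digit))).scaleb(-scale)


print(decode_float32(bytes([0x41, 0x20, 0x00, 0x00])))
print(decode_implied_decimal((12345).to_bytes(4, "big"), scale=2))
print(decode_overpunch("1234N", scale=2))
```

In each case the number of implied decimal places comes from the data map, not from the data itself, which is one reason an accurate map is so valuable.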

 

DATABASE TYPES: (about 100)

 

Access
Act
Acucobol 85
ASCII (delimited, fixed)
Btrieve
Clipper
DataEase
DBase (all)
FoxPro
Paradox
SAS

 

DATA ENCODING FORMATS:

 

ASCII
EBCDIC (by country)
ANSI
MS-DOS
 

 

FILE CHARACTERISTICS:

Fixed-length blocks
Variable-length blocks
Labeled files
Unlabeled files
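
The difference between fixed-length and variable-length layouts can be sketched in Python as below.  The 2-byte big-endian length prefix used for the variable-length case is an assumption chosen for illustration; actual platforms use differing descriptor layouts, and the function names are hypothetical.

```python
def split_fixed(data: bytes, record_length: int) -> list:
    """Split data into records of one known, fixed length."""
    if record_length <= 0:
        raise ValueError("record length must be positive")
    if len(data) % record_length:
        raise ValueError("data ends with a partial record")
    return [
        data[i:i + record_length]
        for i in range(0, len(data), record_length)
    ]


def split_variable(data: bytes) -> list:
    """Split data where each record starts with an assumed 2-byte length."""
    records = []
    pos = 0
    while pos < len(data):
        if pos + 2 > len(data):
            raise ValueError("truncated length prefix")
        length = int.from_bytes(data[pos:pos + 2], "big")
        pos += 2
        if pos + length > len(data):
            raise ValueError("record runs past end of data")
        records.append(data[pos:pos + length])
        pos += length
    return records


print(split_fixed(b"AAAABBBBCCCC", 4))
print(split_variable(b"\x00\x02hi\x00\x05world"))
```

A wrong record length in the fixed case, or a misread prefix in the variable case, shifts every following record, so these checks are worth running before any field-level translation begins.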

 

 

by K.W. Norris

Technology Consultants

2006