Transform - Using Tokens and Text Strings

This topic explains the use of Tokens with text operators.


A token is one or more characters that is found in a text field between token separator characters. Any character may be specified as a token separator character in the Tool Properties pane. By default, a variety of brackets and other punctuation marks are listed as token separators. For example, the semicolon ; character is listed as a token separator by default. If you don't like the default list of token separators, you can specify your own separators by entering them into the Separate tokens with box in the Tool Properties pane.


Using the default token separator characters, the string John;Paul;George;Ringo would contain the tokens "John", "Paul", "George", and "Ringo". If this string occurred in a field in the value box, the text operator Copy First Token to would copy the string John into the subject field.


Suppose we have a table where dates are stored as text strings in the form 09/21/98 and we wish to extract separate month, day and year fields. Or, perhaps we have a table that stores text strings in the US format of month/day/year and we wish to convert to European format of day/month/year. Or, perhaps we wish to extract the year field so we can prepend a "20" to year dates such as "00" and "01". All of these cases can be managed using a slash, /, character as a token separator. In this case, the string 09/21/98 would consist of three tokens, "09", "21" and "98". (Note that if the data were stored as a Microsoft Access Date field, we could use Copy Day To and similar Commander operators to extract the date, year, etc.)


Token separators may be combined. Order is not significant. A token is parsed as one or more characters between any two (or preceding or following at the beginning and end of the string) characters listed as a token separator in the Options dialog. Extra token separators are ignored when they occur together.


So, if / and * are listed as token separators, /John*/*Paul/*George*Ringo is still parsed as the Fab Four tokens. Constructions such as 30°42'13.0" can be parsed as three separate tokens if the °, ' and " characters are listed as token separators.


Caution: If we wish to parse as a single token numeric values that are written as strings using a decimal point, we must make sure to remove the period, ., from the list of token separators if we have added it to the list as a token separator. Otherwise, text strings such as "3.1415" will get parsed as two tokens, "3" and "1415"


Tip: how to extract the "nth" token? Repeatedly use the Move First Token To command n times to get to the nth token. For example, suppose we have a field called location that contains values in the form 30°42'13.0" and we wish to extract this information into three separate fields called degrees, minutes, and seconds. We would perform the following commands, where the Target box is in blue, the operator is in black and the source / argument box is in violet.


location Move first token to degrees

location Move first token to minutes

location Move first token to seconds


Tip: Don't be intimidated by the use of tokens. It's a lot easier than it appears from the above. Just dig in and try a few examples. Also, don't hesitate to change the characters listed in the Tool Properties pane as token separators. One often wishes to use only a single character as a token separator. This takes but a moment to change to suit any need.


See the Extract Last Names using Tokens example for a simple step-by-step procedure using tokens.