Extract Last Names using Tokens

This example uses Transform toolbar token operators to create a new column in a table that extracts the last name from another column that contains a full name. We will work with a table associated with a drawing that shows US congressional districts.

 

images\eg_tokens_01.gif

 

The table's Name field lists members of the 109th Congress in the US. We plan on using this drawing and table with the Manifold map server in a web page. To make it easy for browsers to find a member of Congress with a simple Find command, we would like to have a column that lists the last name only for each member. We feel it will be easier for browsers to find by entering "Peterson" than to have to try to remember the first name as well.

 

Our task is therefore to extract just the last name from the Name field and to put it into its own column, called Last Name. We can use tokens and the transform toolbar to accomplish this step.

 

Step 1: Get ready

 

We open the table in a table window. Next, we make ready with a few housekeeping items:

 

images\eg_tokens_02a.gif

 

We open the Tool Properties pane using SHIFT-ALT- T (or by clicking View - Panes - Tool Properties in the main menu). The default setting of the Separate tokens with box lists the space character (the blank location to the left of the \t), a tab, a carriage return and a newline character as the default token separator characters. Thus, every group of characters separated by these whitespace characters will be treated as a token. If the Separate tokens with box does not include a space character, add it.

 

images\eg_tokens_02.gif

 

It will be more convenient to work with the table if we hide all fields that are not necessary. We click on View - Columns and in the Columns dialog uncheck all the boxes except Name. This will hide all columns except Name.

 

images\eg_tokens_03.gif

 

That's better. We will now add two text fields to the table. Right click onto the Name column and choose Add - Column from the context menu.

 

images\eg_tokens_04.gif

 

In the Add Column dialog we create a new column called temp that is an ANSI variable length text column. Press OK. We will use this column as a temporary working space.

 

We also need to create the Last Name column that will hold our results. Right click onto the Name column again and choose Add - Column once more to create another column.

 

images\eg_tokens_05.gif

 

In the Add Column dialog we create one more column called Last Name that is also an ANSI variable length text column. Press OK.

 

images\eg_tokens_06.gif

 

We now have two new columns in our table.

 

Step 2: Remove last token

 

We observe that the names can be thought of consisting as a sequence of tokens. For example, "Collin C. Peterson (D)" consists of four tokens: "Collin", "C.", "Peterson", and "(D)". We would like to move the next-to-last token into the Last Name column. Unfortunately, within Manifold there is no "Copy next-to-last token" command. There is, however, a Copy Last Token from operator. If we get rid of the "(D)" and "(R)" last tokens we can then use the Copy Last Token from operator to extract the last names.

 

We begin by copying all of the values from the Name column into the temp column.

 

images\eg_tokens_07a.gif

 

To do so, load up the transform toolbar as seen above and press Apply.

 

images\eg_tokens_07.gif

 

Like magic, Manifold copies all of the Name values into the temp column.

 

images\eg_tokens_08a.gif

 

To delete the "(D)" and "(R)" tokens at the end of the values in the temp column we use the Delete last token operator in the transform toolbar as seen above. Press Apply.

 

images\eg_tokens_08.gif

 

We now have a column of names in temp where the final token is the last name we seek.

 

Step 3: Copy last token into last name field

 

images\eg_tokens_09a.gif

 

To copy the last token we use the Copy Last Token from operator in the transform toolbar to copy the last token from the temp field into the Last Name field. Press Apply.

 

images\eg_tokens_09.gif

 

It seems we are done… but are we?

 

Step 4: Correct mistake

 

Unfortunately, we did not notice that the table contains names of the form "Henry E. Brown, Jr. (R)". For such records, copying the last token in the step above results in a Last Name field that contains the value "Jr.". No need to panic: we will find all such records and repair them.

 

To find all such records, we click on the Last Name column header to sort the column and then we scroll down to the section where all of the "Jr." records are located. We want our repair action to operate only selected records, so we will select all of the fields with "Jr." in the Last Name field. A fast way to do this is to click on the record handle of the first such record to select it.

 

images\eg_tokens_10.gif

 

We can then SHIFT-click on the record handle of the last "Jr." record.

 

images\eg_tokens_11.gif

 

That will select all records from the previously selected record to the record where the SHIFT-click occurred. We now have selected all the records with "Jr." in the Last Name field. We could have used the Query toolbar to make this selection, but since we have all the records together in view in the table it was easier just to click and then shift-click.

 

The transform toolbar for tables uses autoscope: if any records are selected, it will automatically restrict the action of any transform operator to only the selected records. So long as only these records are selected, any operations we do in the transform toolbar will apply only to them.

 

images\eg_tokens_12a.gif

 

We will begin by getting rid of the "Jr." tokens at the ends of the temp column values using the Delete Last Token operator as seen above. Press Apply.

 

images\eg_tokens_12.gif

 

This gets rid of the final "Jr." tokens at the ends of the fields in the selected records. Note that no changes were made to unselected records. However, we have a slight problem: there is a comma at the end of the selected names in the temp column. If we simply copy the last token into the Last Name field we will end up with some last names that have an unwanted comma at the end. We must first get rid of the commas.

 

images\eg_tokens_13a.gif

 

To do this, we load up the toolbar with the Delete Right operator with a parameter of 1. This operator removes the last n characters on the right where n is the value given in the parameter box. Press Apply.

 

images\eg_tokens_12.gif

 

Nothing happens! The commas are still there…. what's going on? What's happened is that after we deleted the terminal "Jr." tokens a space character was left behind in the values in the temp field. When we used Delete Right with a parameter of 1 it deleted the space characters at the end of the strings. We need to apply Delete Right one more time.

 

images\eg_tokens_13a.gif

 

As before, we load up the toolbar with the Delete Right operator with a parameter of 1. Press Apply.

 

images\eg_tokens_13.gif

 

Much better… we have now removed the final commas and are ready to copy the last names.

 

images\eg_tokens_14a.gif

 

The final operation is to use Copy Last Token from to copy the last tokens from the temp field to the Last Name fields for the selected records.

 

images\eg_tokens_14.gif

 

The table is now almost in final form. All that remains is to right click onto the temp column head and choose Delete to delete the temporary field. We also deselect all records.

 

images\eg_tokens_15.gif

 

Our table now contains a Last Name field that has been extracted from the Name field.

 

Notes

 

This example used the default list of separators, where a space character denotes white space. A more sophisticated use of tokens might add other, non-whitespace characters to the token separator characters in the Tool Properties pane.

 

For example, suppose we have an address field containing text values such as "123 Main Street, Sioux Falls, South Dakota". We could remove the space character from the token separator list and add the comma character "," as a token separator. We could then copy the different parts of the address to different fields. In this case "123 Main Street" would be the first token, "Sioux Falls" would be the second and "South Dakota" would be the third token.