Subscribe to this thread
Home - General / All posts - Converting text lat/lon dd mm ss to decimal degrees

642 post(s)
#16-May-24 02:52

Working in M9, I have a text file with two fields lat and lon. They contain text strings dd mm ss. I wish to create two new fields styled as decimal degrees so as to create a drawing of the points.

My M8 muscle memory was to create two new fields of type lat/lon, then copy paste the records over. Solution discussed way back in time here.

I'm stuck doing the same in 9.


7,452 post(s)
#16-May-24 07:01

Easy. For decimal degree latitude and longitude fields create two computed fields of type FLOAT64, and use expressions for those fields which a) extract the dd, mm, and ss substrings of the lat or lon strings, b) CAST them into FLOAT64 numbers, and then c) divide them by 60 or 3600 and sum them (using d, m, and s for degrees, minutes, seconds numeric values):

decimal_degrees = d + m/60 + s/3600

The details of extracting the substrings depend on the precise text format you have for the lat and lon strings. For example, often the seconds part isn't literally ss, that is, two characters, but is usually meaning two characters plus a decimal point (or , in Europe) with additional characters past the decimal point.

Also, it's often the case that the degree substring isn't just dd but dd followed by an N S E or W character to indicate north/south and east/west values, or it is prefixed with a "-" to indicate south or west values, or it might use 0 to 360 numbering instead of 0 to +/- 180.

Or, substrings might not always be mm but might be just m if the minutes substring is less than 10 minutes, or there might be additional ' or " characters tacked on to indicate minutes or seconds. It's almost a rule of thumb with non-decimal degree text representations of lat / lon values that there will be something unique going on.

All that can be handled easily with regular expressions. Suppose we are extracting substrings from a longitude value that is

-71 6 59.1

That's approximately the longitude at the middle of Harvard Yard in Cambridge, Massachusetts. It's easy to get the various substrings using the StringRegexpSubstringNth SQL string function (see the doc). I'll give examples below as ? examples so they can be tested easily with a copy and paste into the command window, highlighting the ? expression and pressing Alt-Enter. For example, get the first substring using:

? StringRegexpSubstringNth('-71 6 59.1', '[^ ]* ', 0, 'i')

For reasons discussed in the "The Repetition Count * Character is Greedy" section of the Regular Expressions topic (always an exciting read... don't start it late at night or you won't be able to sleep for wanting to read it some more...) the regular expression used is

'[^ ]* '

Note that there's a space between the ^ and the ], and also a space between the * and the final '. That expression matches any sequence of characters that are not a space character, with the sequence of non-space characters followed by a space. The "nth" value of 0 in zero-based numbering means to get the first such substring found.

Try that example and it returns the string -71 (with a space character at the end). To make a floating point number out of that you can CAST it:

? CAST( StringRegexpSubstringNth('-71 6 59.1', '[^ ]* ', 0, 'i') AS FLOAT64 )

To get the second substring, the mm substring, you can use

? StringRegexpSubstringNth('-71 6 59.1', '[^ ]* ', 1, 'i')

It's the same regular expression but with an "nth" value of 1, meaning the second such substring found using zero-based numbering. You can CAST it and convert into decimal minutes for adding up into an overall decimal degrees value:

? CAST( StringRegexpSubstringNth('-71 6 59.1', '[^ ]* ', 1, 'i') AS FLOAT64 ) / 60

To get the seconds substring at the end, you can use:

? StringRegexpSubstringNth('-71 6 59.1', ' [^ ]*', 1, 'i')

The regular expression pattern used is

' [^ ]*'

With a space character between the initial ' and [, and no space character between the * and final '. So that matches any sequence of characters that begins with a space character followed by one or more non-space characters. The "nth" value is 1, meaning the second such substring found. The first such substring will be the mm substring, and the second will be the ss substring, which is what we want.

CAST that and make it ready to add to the total decimal degrees value using:

? CAST( StringRegexpSubstringNth('-71 6 59.1', ' [^ ]*', 1, 'i') AS FLOAT64 ) / 3600

The final decimal degrees value will be the sum of the CAST and readied values for the three substrings, written all on one line:

? CAST( StringRegexpSubstringNth('-71 6 59.1', '[^ ]* ', 0, 'i') AS FLOAT64 )

+ CAST( StringRegexpSubstringNth('-71 6 59.1', '[^ ]* ', 1, 'i') AS FLOAT64 ) / 60

+ CAST( StringRegexpSubstringNth('-71 6 59.1', ' [^ ]*', 1, 'i') AS FLOAT64 ) / 3600

In the expression written for the new longitude computed field, you'd refer to the [lon] text field, as in:

CAST( StringRegexpSubstringNth([lon], 0, 'i') AS FLOAT64 )

+ CAST( StringRegexpSubstringNth([lon], 1, 'i') AS FLOAT64 ) / 60

+ CAST( StringRegexpSubstringNth([lon], ' [^ ]*', 1, 'i') AS FLOAT64 ) / 3600

I've just copied and pasted the above without testing it, so apologies if any typos.

All the above, of course, is "easy" if you're familiar with 9 tools like regular expressions and writing computed fields. If that's new, the first time through will take more attention. But in exchange you get a very powerful way to handle any text format that is used for latitude and longitude values written as text.

For example, if the text field uses degree characters or adds characters like "m" for minutes or "W" or "S" all those are very easy to add into the regular expression used to extract those substrings. That gives the above approach a universality and power that the 8 approach lacks. 8's copy and paste approach is very easy for that subset of such text formats that it can parse, but it won't work or can make mistakes when formats are different.

The 9 approach also is a lot fasterthan copy/paste when working with bigger data (surprisingly, modern data sets often use absurdly inefficient text formats even for large data). Instead of using a computed field (also very fast) you can do a one-time expression in a Transform operation to copy data out of text fields, CAST and sum it and place the result into FLOAT64 fields that have been created.

You can also use expressions in the Transform pane to do the above job in a sequence of manual steps, if you want to break it down to better learn how to use regular expressions, to see what happens with each step, while using the Preview pane to figure out the right regular expression and to avoid surprises.


690 post(s)
#16-May-24 10:19

The manual has a topic that matches your problem. The easiest solution can be found at the bottom. search for "All in one query".


642 post(s)
#16-May-24 10:24

Thanks Dimitri, Chris,

perfect, exactly what I'd missed.

Manifold User Community Use Agreement Copyright (C) 2007-2021 Manifold Software Limited. All rights reserved.