To explain the specific case you raise... For example, [Geol_symb] = Pg2_vas_ap was correctly matched to the pattern '_v',
No, in the fragment WHEN '\_v' THEN 'Igneous-volcanic' the string '\_v' is not a pattern. It's just a two character string value, consisting of an underscore character followed by a v, which you've listed as one of the possible substring results you're picking out from the substrings that may be produced by your use of the StringRegexpSubstring function. In your use of '\_v' the backslash character is redundant. It just means "take the next character literally" which would happen anyway since the underscore character has no special meaning in a string value that is not being used as a pattern by a LIKE operator. It would be like writing the string 'abc' as '\a\b\c' where the backslashes are also redundant. The only pattern in the CASE expression you cited is the argument to the StringRegexpSubstring function, the pattern string '\_.' That string is processed by SQL to feed the two character pattern _. (an underscore followed by a dot) as a regular expression pattern argument to the StringRegexpSubstring function. That regular expression matches any two character sequence of an underscore character followed by any single character (which is what the dot . character wildcard means in regular expressions). As you've correctly noted, the underscore character is not a wildcard in regular expressions. It's just an underscore character. When the StringRegexpSubstring function uses that pattern to match against the tech character text value Pg2_vas_apit picks out the first two character substring that matches the regexp pattern, in this case consisting of _v, that is, an underscore followed by a v. In the CASE statement the only WHEN that matches that substring is the one that calls out a substring _v, that is, an underscore followed by a v. Note that it may be risky to use regular expression pattern strings like '\_.' with your data set and CASE statement because that pattern matches the first occurrence of any two character sequence of an underscore followed by any other character, and so what is picked out by the WHEN cascade depends on the positioning of possibilities within the string being matched if the underscore character appears more than once followed by some other character. For example, suppose the string you're matching against is the text value Pg2_vas_p and the SQL is CASE StringRegexpSubstring([Geol_symb], '\_.', 'i') WHEN '\_o' THEN 'Reclaimed' WHEN '\_s' THEN 'Sedimentary' WHEN '\_v' THEN 'Igneous-volcanic' WHEN '\_p' THEN 'Igneous-plutonic' WHEN '\_m' THEN 'Metamorphic' END The result of the CASE test will depend on whether that two character sequence _p comes before or after the sequence _v in the text value to be matched (and not when it occurs in the WHEN cascade). The StringRegexpSubstring function reports the first match to the regular expression, so in the Pg2_vas_ptext value the substring _v is reported and that WHEN is matched. But if the text value was Pg2_pas_vthe substring _p would be reported, because it occurs first. If the '_' had been interpreted as a wildcard for any single character, then the substring extracted would presumably have been the first two characters in [Geol_symb]; ie. 'Pg' in the example above?
That doesn't happen because the underscore character is not a wildcard in either the regular expression pattern used with the StringRegexpSubstring function or in the explicit strings used to match WHENs. A few more comments... Possibly misleading?
No, because it's talking about SQL dealing with backslash escapes in all strings. That's a different matter than wildcards which are different in regular expressions or the LIKE operator. because '_' is not a special character in the case of regular expressions?
Yes, the underscore character is not a wildcard when used within a regular expression pattern. See the Regular Expressions topic for all wildcards used in regular expressions. Would '_.' worked just as well as '\_.' in my substring expression???
Yes, because the string '_.' is the same as the string '\_.' when used as a pattern string for a regular expression. There's nothing special about an underscore character in regular expression pattern strings. (Note that referring to the pattern string as a "substring expression" is incorrect terminology in the context of a pattern argument to a regexp function.) --- A few words on the evolution of all this. It's understandable if anybody is confused by this stuff at a first meeting because the evolution of SQL, the LIKE operator, and the use of regular expressions are a blending of separate technologies. At the risk of simplifying a bit... SQL string literals, sequences of characters delimited by single ' or double " quote characters, had to have a way of being able to specify a quote character in them so you could test for words like don't and such, and that's where backslash escaping came from. So it made sense to have a simple rule that when a backslash is encountered in a string then take that backslash not as a literal character but as a command to take whatever follows literally, for example, taking literally a single ' quote character and not interpreting that single ' quote character as meaning the end of the string. Regular expressions are a UNIX thing coming out of tools like GREP. They weren't as far as I know a consideration in SQL syntax way back when, and tools like GREP had a very elaborate set of wildcards with the need to have character escaping to be able to specify patterns that also included what otherwise might be command characters. But if you have functions in SQL that use regular expressions, you get the complication that SQL still wants to preprocess any string for the special meaning of a backslash character used as an escape. And then there's the third complication that comes in from the LIKE operator, which seems like an attempt to get the general benefit of regular expressions but using a simpler wildcard system, using the underscore and percent characters, but which are different wildcard characters than the dot and asterisk wildcard characters used to do similar things in regular expression patterns. Add all three complications together and you can have an SQL statement that uses a regexp function with patterns meaning one thing, and a LIKE operator with patterns meaning other things, and all that using literal strings as patterns within which SQL is going to insist on preprocessing backslashes. Big fun. That's why Manifold introduced the @ innovation for strings, to make things a lot simpler. Alas, as useful as that is it's not at all portable.
|