New topic and downloadable project showing pandemic map example using Edit - Join
Dimitri


7,413 post(s)
#01-Apr-20 03:45

There's a useful new topic showing the new Edit - Join dialog in action to create a pandemic map that gets data from a CSV server. Using Edit - Join makes it a lot easier than writing out a JOIN in SQL.

The project used in that example can be downloaded as pandemic.map.

Stay safe!

wvayens

108 post(s)
#01-Apr-20 13:43

Thanks for this! Examples like this that I can download and see how things can be done in M9 really help my transition from M8!

vincent

1,972 post(s)
#01-Apr-20 21:23

[solved] - The 169 build was still on my computer ... It worked fine with 171.0.

vincent

1,972 post(s)
#01-Apr-20 22:22

"Last Report" value seems odd for many countries (Canada for example), although the "cases" number is good.

vincent

1,972 post(s)
#02-Apr-20 00:13

The datetime format is not handled well by the Max and Min operators.

Dimitri


7,413 post(s)
#02-Apr-20 02:05

"Last Report" value seems odd for many countries

the datetime format is not well handled by the Max and Min operators.

The problem appears to be that the dates are being parsed inconsistently by the csvserver dataport. The Max and Min operators are OK.

Looking at the actual text stream served by the server:

17/03/2020,17,3,2020,5,0,Afghanistan,AF,AFG,37172386
16/03/2020,16,3,2020,6,0,Afghanistan,AF,AFG,37172386
15/03/2020,15,3,2020,3,0,Afghanistan,AF,AFG,37172386
11/03/2020,11,3,2020,3,0,Afghanistan,AF,AFG,37172386

Open the csv file in the data source and scroll down a bit in the Afghanistan entries to see how the above was parsed. You'll see dates of 3/17/2020, 3/16/2020, 3/15/2020, and then... 11/3/2020. That seems wrong: it is as though the csvserver dataport insisted on converting the date strings into month/day/year, but then for the 11/03/2020 date it didn't, and left it in day/month/year.

I've reported it.
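The flip can be reproduced with a small sketch (Python here, purely for illustration; the try-month/day-first-then-day/month heuristic is an assumption about the observed behavior, not Manifold's actual code):

```python
from datetime import datetime

def parse_lenient(s):
    """Try month/day/year first; fall back to day/month/year when the
    first token cannot be a month. Mimics the behavior seen above."""
    for fmt in ("%m/%d/%Y", "%d/%m/%Y"):
        try:
            return datetime.strptime(s, fmt)
        except ValueError:
            continue
    raise ValueError("unparseable date: " + s)

# The dates from the Afghanistan rows above:
for s in ("17/03/2020", "16/03/2020", "15/03/2020", "11/03/2020"):
    print(s, "->", parse_lenient(s).date())
# "17/03/2020" only fits day/month/year (month 17 is impossible), but
# "11/03/2020" fits month/day/year first and parses as 3 Nov 2020.
```

Any per-string guessing scheme of this kind will flip formats exactly at the point where the first number crosses 12.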

I updated the update query

Yes. You could also do that in Edit - Join, which can update an existing field in the original table from the joined table.

vincent

1,972 post(s)
#02-Apr-20 02:17

The date format changes depending on whether the first number is greater than 12 or not.

adamw


10,447 post(s)
#02-Apr-20 14:51

On parsing dates, this is not a bug, as strange as it may seem.

The problem is that dates on some lines parse as MDY and on others as DMY. But that's actually desired behavior, because we want to allow dates of different formats - e.g., if one line has a date spelled as '12/31/2019' and a different line has a date spelled as '12-Dec-2019', we want the data source to parse them both. Given that each date string is considered separately, the question is - how to parse '11/03/2019'? Is it 03-Nov-2019 or 11-Mar-2019? We try to detect the language used by the date field and use whatever is the default for that language. But even then the default MDY will switch to DMY on strings where the first token is clearly not a month because it is bigger than 12.

The solution is to (a) import the field as text, then (b) look at the format it uses with a human eye, then (c) convert the text strings to dates using a fixed format. Right now you can do all of that, but (c) requires writing a script function, which is too complex. We will add built-in conversion functions that take a fixed format to the query engine, and will add transforms for them to the Transform pane.
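Step (c) can be sketched as follows (Python used purely for illustration; the built-in fixed-format conversions described above would live in the query engine):

```python
from datetime import datetime

# Step (c): once the field is imported as text and a human eye has
# confirmed the server always uses day/month/year, convert with one
# fixed format instead of guessing per line.
FIXED_FORMAT = "%d/%m/%Y"

raw = ["17/03/2020", "11/03/2020", "01/04/2020"]
dates = [datetime.strptime(s, FIXED_FORMAT).date() for s in raw]
print(dates)  # every string parsed as DMY, so the ambiguity is gone
```

With a fixed format, '11/03/2020' can only mean 11-Mar-2020, which is the point of importing as text first.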

vincent

1,972 post(s)
#01-Apr-20 23:55

I have the feeling that the population data are more recent in the csv, so I updated the update query:

UPDATE (
  SELECT
    t.[mfd_id] AS tkey0,
    t.[LastReport] AS t0, t.[cases] AS t1, t.[deaths] AS t2, t.[POP_EST] AS t3,
    s.sjoinkey, s.s0, s.s1, s.s2, s.s3
  FROM [countries Table] AS t LEFT JOIN (
    SELECT
      [countryterritoryCode] AS sjoinkey,
      Max([dateRep]) AS s0,
      Sum([cases]) AS s1,
      Sum([deaths]) AS s2,
      Max([popData2018]) AS s3
    FROM [data]
    GROUP BY [countryterritoryCode]
  ) AS s ON t.[ISO_A3] = s.sjoinkey
) SET
  t0 = s0, t1 = s1, t2 = s2, t3 = s3;

This way, statistics per million people are more accurate.

dyalsjas
157 post(s)
#02-Apr-20 12:25

Johns Hopkins University is maintaining an ESRI-based dashboard of COVID information that aggregates data from multiple servers. The underlying information is stored as five .csv files in a GitHub repository.

Here is a link:

https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series

I tried pointing at one of the .csv links using the webserver csvserver dataport but haven't been able to get it to work.

Any interest or suggestions?

dyalsjas
157 post(s)
#02-Apr-20 12:45

As a side note, with respect to the date formats above: I've noticed that the .csv files from the JHUCC GitHub don't seem to follow a standard date format, switching between m/dd/yy and yyyy-mm-dd, with some obvious parsing errors (COVID-19 wasn't reported between 2003-01-20 and 2003-12-20).

It's trivial to download the file, open it in Excel, convert the first five or six dates to the Excel number format, then autofill incrementally across the time series.
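The same cleanup can be sketched outside Excel (Python here; the two formats tried are the ones mentioned above, and the helper name is made up):

```python
from datetime import datetime

def normalize_date(s):
    """Normalize the mixed date spellings described above
    (m/dd/yy and yyyy-mm-dd) to a single ISO yyyy-mm-dd form."""
    for fmt in ("%m/%d/%y", "%Y-%m-%d"):
        try:
            return datetime.strptime(s, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError("unrecognized date: " + s)

print([normalize_date(s) for s in ("1/22/20", "2020-12-20", "3/1/20")])
```

Because the two formats use different separators, each string matches exactly one of them, so there is no MDY/DMY ambiguity here.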

In the five W's (Who, What, Where, When, Why), time follows location.

Manifold is quite elegant with the "Where"; I wish the datetime ("When") capability in Manifold were at least as elegant as it is in Excel.

Dimitri


7,413 post(s)
#02-Apr-20 13:28

Hopkins doesn't use datetime in the data set, at least not for the global deaths link (the only one I've looked at in detail). Instead, they just name fields using a string that means a date to US readers.

So, those don't get imported as datetime data. They're just imported as column names, following the usual CSV convention that the first row in a CSV contains the field names. That's a really poor way to represent key data and to organize the data set. In this particular data set, there are no datetime values to convert to any numeric data type.

If datetime values were imported into Manifold, you could do pretty much anything you can imagine in terms of converting them to numbers, as the datetime data type in Manifold is precise, standard, and freely open to the full power of SQL / scripting / conversions / etc.

It can be intricate, I grant, using various DateTime... SQL functions to extract the numeric parts of what you want from a datetime value into other numeric forms, but Excel also uses functions, like YEAR() and MONTH() to extract constituent components of datetime values.

It's pretty much the same deal if you are using DateTimeMonth([date]) to get the month from a date in Manifold or MONTH(A2) in Excel. In Manifold, if you want to extract that for ten million rows you can do it easily using the Transform pane.

Excel, being a spreadsheet, takes a different UI approach to some pre-packaged visualization tasks like time series, which is very good for users seeking a good spreadsheet experience. But those approaches tend to work only in very limited situations, which is why Excel basically stops functioning at larger data scales, and they can sometimes lead to constructions that make data much less useful than it should be. Representing essential data values as the names of columns (!) instead of as values within records might be just one such example. I get the feeling Hopkins dumped their data out of Excel and not out of a database.

I don't think anyone is opposed to grabbing whatever clever ideas can be harvested from Excel and implementing them for ease of use within Manifold. An example might be a "repair dates" transform that takes dates which were entered in some incorrect format way back when and offers a list of possibilities that might have been intended. But those clever ideas might end up taking a somewhat different form, or, in some cases, might already be in there, so they can build upon ways of working with data that are at once more precise and easier to reliably reuse in different ways.

Dimitri


7,413 post(s)
#02-Apr-20 12:56

You have to use the GitHub link from the Raw button... the other links open the data within GitHub viewing pages.

The link for global deaths is:

https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv

Attached is a .map with the data source using the above, that opens up and shows the table.

The Johns Hopkins data is a bit awkward to work with, because you have to create a query that sums all the data in the various columns to get a total. That is not easy to do, given that the number of columns keeps increasing. The data would be much easier to use if it were organized more like the ECDC data, with separate rows for each date for each country.
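That reshaping, from one column per date to one row per (country, date), can be sketched as follows (Python for illustration; the column names and values are invented stand-ins for the actual JHU file):

```python
import csv, io

# Sketch: reshape the JHU wide layout (one column per date) into
# long, ECDC-style rows of (country, date, value).
wide = io.StringIO(
    "Country/Region,1/22/20,1/23/20,1/24/20\n"
    "Afghanistan,0,0,1\n"
    "Albania,0,2,3\n"
)
reader = csv.reader(wide)
header = next(reader)
date_cols = header[1:]  # every column after the first is a date

long_rows = [
    (row[0], d, int(v))
    for row in reader
    for d, v in zip(date_cols, row[1:])
]
print(long_rows[:3])  # the three Afghanistan rows
# With one row per (country, date), a plain GROUP BY / SUM gives
# totals without the query knowing how many date columns exist.
```

The point of the long layout is exactly what the post says: the query no longer has to change as new date columns are appended.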

Attachments:
johns_hopkins_data_source.map

volker

1,086 post(s)
#02-Apr-20 17:37

Maybe I'm too stupid for MFD 9...

How do I change the decimal places in a column?

Like I do it in Mfd8 -> Column -> Format -> Decimal places

This is how the source currently looks:

https://www.worldometers.info/coronavirus/#countries


http://www.thegisservicesector.de

Dimitri


7,413 post(s)
#02-Apr-20 18:36

8 allows you to specify an output format for numbers in a table (basically, "styling" them) regardless of the underlying data type. 9 does not do that yet, but simply prints numbers in tables using whatever data type they are. So, if you don't want decimal places, use integers, not floating point numbers.

You can also round. See the Converting between Different Data Types section of the Edit - Join topic for an example.

pslinder1
228 post(s)
#04-Apr-20 00:46