Manifold System 9.0.163.6
adamw


10,447 post(s)
#21-Oct-17 19:10

9.0.163.6

manifold-future-9.0.163.6-x64.zip

SHA256: 96eb35b14938f07ccf18f4a17b27a91be288145d1faacfc1c03072236dda118b

manifold-viewer-9.0.163.6-x64.zip

SHA256: 3bb9397732b0f9f338dfc2a86f5617a8d8f01e8bb4b32e89db368bfb16a6d77a

adamw


10,447 post(s)
#21-Oct-17 19:11

Tables

Table window fills itself differently.

There are three fill strategies:

  • Tables in MAP files and MFD_xxx tables in file / web data sources use an advanced fill method which stores very little data per record and is super-fast. Record values are taken directly from the table and so what is displayed on the screen is always what the table has.
  • Tables on data sources other than MAP files and tables returned by queries (regardless of the data source) use a regular fill method which stores full record values. Records are fetched in bulk and without order, which all data sources optimize for.
  • Tables on web data sources that have tiles use the regular fill method but delay fetching tile values for each record until it becomes visible in the table window. Since these tables are virtual and all fields except the tiles are synthetic, the table window fills very fast and then starts fetching tile values for individual records. If a record that has been scheduled for fetching is scrolled out of view, the request to fetch it is removed from the queue.

The table window puts a limit onto the number of records for all three methods. The current limit does not depend on the method and is set to 50,000 records. (We may allow editing this limit in the future and setting it to, say, 1,000,000 records or even higher. However, we find that in practice 50,000 is already plenty. If the table is larger than that, it is not very practical to work with it without applying some filtering using a query anyway. In fact it is much more common to see such limits in other tools set to 5,000 records or so.)

The scrollbar is no longer dynamic. The size of the scrollbar handle correctly reflects the relative size of the screen to the fetched portion of the table (at 50,000 records the handle is already about as small as it can go). The position of the scrollbar handle correctly reflects the relative position of the screen in the fetched portion of the table.

The table window displays up to two special records:

  • The new record placeholder. Displayed at the very bottom of the table, and only if the table supports adding new records.
  • The fill record placeholder. Displayed right above the new record placeholder. The fill record indicates where the records fetched from the table go. If the table is fetching records, the fill record is the bluish preview color. If the table finishes fetching records before hitting the limit of 50,000 records, the fill record is removed. If the table hits the limit of 50,000 records and the table contains more records than that, the fill record is kept and its color changes to gray. The fill record also has an icon on the record handle.

When the table is filling, if the cursor is put onto either the fill record or the new record, the cursor will stay on that record and the table will grow upwards. Otherwise, the cursor will stay on the regular record and the table will grow downwards.

The benefits of the new design, apart from the better scrollbar:

  • Tables from slow data sources are now slow in the background, not in the UI. Previously, a slow table (e.g., a table from a web source) would hang the UI with "Program unresponsive"-style messages. Now the table window stays responsive, and the worst that can happen is that it is slow to close while waiting for a single outstanding request.
  • Tables returned by queries always use the fastest reading strategy and never become slow just because the result table managed to expose an index. This is different from the above in that the table window is genuinely faster at displaying these queries (frequently much faster), not just more responsive.
  • Tables from Google Maps and the like are no longer prone to fetching way too much data from the web, running into issues with quotas. (Many of the big image servers like Bing display NULL values for tiles in the first fetched portion of the humongous virtual table - these NULL values are normal and occur because web servers do not store data for the extreme values of X / Y.)
  • Records with invalid values no longer stop the fill. They display NULL values in all fields and the fill continues. (We couldn't do this before if the table was being filled via an index that disallowed NULLs; now we can.)

Table window is much better at updating after changes. Previously, many types of changes forced the table window to refill. Now small operations like inserting a new record or editing values of an existing record never involve a refill regardless of the type of the table, there are big optimizations for updates to tables in MAP files, and so on.

Editing a value of an existing record that has not been put into the edit mode by clicking on its handle applies the edit immediately. (Editing the first value of a new record still puts it into the edit mode, which requires committing changes via Ctrl-Enter or the context menu.)

adamw


10,447 post(s)
#21-Oct-17 19:13

Selections

Windows can and do share selections whenever this makes sense.

The rules:

  • Selection for a table in a data source is kept until the data source is closed. If the user opens a MAP file, opens a drawing, selects an object, then closes the drawing and reopens it again, the drawing will have the object selected. If the user closes the MAP file, then reopens it and reopens the drawing, the object will not be selected because closing the MAP file threw it away.
  • Selection for a component based on a table uses the same selection as the table. This applies to all drawings based on the same table; they all share the same selection. Opening a table, a drawing made on that table, and another drawing made on the same table will allow selecting in any of the three windows, and when the selection is altered in one window, the other two will update to show the changes.
  • Selection for a query or a component based on a query is per-window and is not shared. This is because (a) multiple runs of the same query produce different tables and it might sometimes be desired to keep an older table and a newer table, and because (b) queries might have parameters and so running the exact same query will produce different results because of different parameter values.

Changes to query functions:

  • Selection(table, selectionID, selected) -> table is reduced to Selection(table, selected) -> table, which is used for tables and components based on tables. SelectionForWindow(window, layer) -> selectionID is reworked to SelectionWindow(table, window, layer, selected) -> table, which is used for queries and components based on queries. The selection IDs are eliminated.
  • SelectionKeys(table, selectionID) -> table is reduced to SelectionKeys(table) -> table, plus there is a new function for queries: SelectionKeysWindow(table, window, layer) -> table.
  • Same for SelectionIsEmpty, which is now also two functions.
  • Same for SelectionIsInverted, also two functions.

(All dialogs and all generated queries have been adjusted to use the new functions, automatically choosing whether to use the variant for table or for query.)
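As a rough illustration, the new table-variant functions compose like this in a query (a minimal sketch with a hypothetical table name; it mirrors the pattern of the generated queries rather than their exact output):

-- selected records of a table, using the table variant (no selection ID involved)
SELECT * FROM CALL Selection([Drawing Table], TRUE);

-- the same set of records expressed via the keys of the selection
SELECT * FROM [Drawing Table]
WHERE [mfd_id] IN CALL SelectionKeys([Drawing Table]);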

tjhb
10,094 post(s)
#21-Oct-17 21:48

Ability to use the name of a saved selection in a WHERE filter in SQL is also a very nice bit of design!

E.g. if I save a selection with the name '< 2' then this query will return all of the member records.

SELECT * FROM [table]
WHERE [< 2];

(Thank you Edit Query button.)

tjhb
10,094 post(s)
#21-Oct-17 21:56

I think making selections per-table instead of per-window was a good idea. It was more flexible to allow different selections in each drawing from the same table--but probably hardly used? More intuitive now.

(And we can still achieve the previous behaviour using queries--where it makes most sense.)

apo
171 post(s)
#25-Oct-17 13:35

All those new functions for defining selections from comparisons of columns (even though that was already available through SQL), and the ability to name them and make them callable from SQL, work like a charm and are a dream to work with. Just a little stupid question.

Is there any chance that the saved selections could be injected into the mfd_meta? That would greatly empower some of the analysis we do, by automating the injection of saved selections on new tables.

This question can be pushed a bit further, to styling that is filtered on a selection set only. I know we can do that by adding a new drawing on a table using the selection in the link query, but...

Thanks a lot for the impressive new possibilities you are bringing to our domain... I would call this innovation compared to the others. Great!

adamw


10,447 post(s)
#25-Oct-17 14:48

We store selections in boolean fields and we consider any boolean field a saved selection. If you want to compose a list of all saved selections in a component, that's a list of all boolean fields in the schema of its table. No additional information in MFD_META required.

Now, we might extend the concept of saved selections to go beyond boolean fields because some file formats store such fields inefficiently, and, say, use integer fields and bit masks. When and if we do this, we will save the required info in MFD_META, although perhaps we will just set up a virtual table around these integer fields which will list individual bits as boolean fields.
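For illustration only, such a virtual table could in principle be a query that derives a boolean from one bit of an integer field (the [flags] field, the [selection b] alias, and the BITAND operator here are hypothetical, not a description of current or planned syntax):

-- expose bit 2 (mask value 4) of an integer [flags] field as a boolean 'saved selection'
SELECT [mfd_id], ([flags] BITAND 4) <> 0 AS [selection b]
FROM [table];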

apo
171 post(s)
#25-Oct-17 15:02

Thanks for the explanation. So, if my understanding is correct, it means that the result of the selection is saved but not the selection formula itself. In this case I understand that the info is not required in the MFD_META. If it had been the reverse, with the schema or formula being saved rather than the result, then storing this in the MFD_META would have interested me.

adamw


10,447 post(s)
#25-Oct-17 15:07

Yes, the result of performing a selection (a selected / unselected flag per record) is saved, and how this selection was established is not saved.

You can save the latter by clicking Edit Query; this produces a query which returns the same records as the selection operation. You can save that as a query component and run it whenever you want.

adamw


10,447 post(s)
#21-Oct-17 19:13

Sorting

Table window can sort records.

Clicking on a field handle sorts records using values in that field. Clicking on binary fields (geom / tile / varbinary) has no effect - the query engine can sort on them, but the produced sort order does not make a lot of sense for normal uses, so we just ignore attempts to sort on binary values. Values in text fields are compared without case. The field handle displays an icon to indicate that there is sorting.

Clicking on a field handle of a sorted field again reverses the sort order. The field handle changes the icon to indicate that the order has changed.

Shift-clicking a field handle adds it to the sort order *after* the already sorted fields. (Click on X to sort by X, Shift-click on Y to keep the sorting by X and sort values with the same X by Y.) If there are no sorted fields yet, Shift-clicking a field sorts on that field normally. Shift-clicking a field handle for an already sorted field changes the sort direction on that field. If there are multiple sorted fields, all but the first field display gray-ish icons.
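In SQL terms, the sort built up this way corresponds to an ORDER BY that lists the fields in the order they were added (a conceptual sketch with hypothetical field names; the table window does not literally run this query):

-- click on [State], then Shift-click on [City], then Shift-click [City] again to reverse it
SELECT * FROM [table]
ORDER BY [State], [City] DESC;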

Starting to sort values by clicking field handles stops filling the table if the process of filling it was still ongoing. (We might change this in the future and allow the incoming records to appear unsorted at the bottom.)

There is a new submenu named View - Order with the following commands:

  • Clear Order - clears current order (does not reorder records into the 'native' order if there was some order before), followed by
  • current fields used for order - this allows seeing the position of each field in the current order; clicking on a field excludes it from the order (by 'unchecking' it conceptually), followed by
  • fields not used for order - clicking on a field adds it to the current order; for tables with many fields the menu lists the first 32 unused fields.

adamw


10,447 post(s)
#21-Oct-17 19:14

Filtering

Table window can filter records.

There is a new submenu named View - Filter with the following commands:

  • Clear Filter - clears current filter, followed by
  • All / Selected / Unselected - show only selected or only unselected records, mutually exclusive, disabled if records cannot be selected because the table does not have a unique index, followed by
  • current value filters, which are added by clicking on table cells and using commands in the new Add Filter submenu; clicking the item for an existing filter removes it.

Value filters are used to quickly filter fetched records to only those with a value, e.g., equal to a specific cell value, without involving the selection. Value filters are designed to be very quick to use and require no typing.

Available functions: NULL / not NULL (all types), = / <= / >= / <> (all types except binary), same geom type (geom types).

The Add Filter submenu for a cell first lists filters that will keep the current record visible and then filters that will hide it.

Adding a filter automatically removes redundant filters on the same field. For example, if the user first filters by POP1990 : not NULL and then adds POP1990 = 500, the not NULL filter will be removed because = automatically filters out NULLs. If the user filters by POP1990 >= 100, then adds POP1990 <= 500, the window will keep both filters and they will filter a range of values. If the user then adds POP1990 >= 200, the POP1990 >= 100 filter will be removed because it is no longer needed.
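In SQL terms, the set of value filters that survives this pruning behaves like a minimal conjunction of WHERE conditions (a sketch of the example above, not the literal mechanism used by the table window):

-- after adding POP1990 >= 100, then POP1990 <= 500, then POP1990 >= 200,
-- the effective filter is equivalent to:
SELECT * FROM [table]
WHERE [POP1990] >= 200 AND [POP1990] <= 500;
-- the >= 100 condition is dropped because >= 200 already implies it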

artlembo


3,400 post(s)
#21-Oct-17 20:54

Love the filtering capabilities. This makes it very easy to see a selected set of features. Perfect for the big data stuff that 9 will move into.

However, I would argue that it is somewhat cumbersome to use (View -> Filter (wait for the submenu to come up) -> Selected).

I wonder if it would be better to include the little filter icon that 8 had, or even that Access has, so you click on the icon, and can select or deselect the selected features. Of course this binary approach doesn't allow you to see just the unselected features, but I think that is something that can be put in.

What I like about the icon approach is that it is one click, and you don't have to jump through three hoops to get what you want. With the quiet cockpit design, you can just have the filter icon show up when appropriate.

tjhb
10,094 post(s)
#21-Oct-17 23:15

Something I initially find confusing.

There is a new submenu named View - Filter with the following commands: ...

  • All / Selected / Unselected - show only selected or only unselected records, mutually exclusive, disabled if records cannot be selected because the table does not have a unique index, followed by ...

Currently, filtered records will only be shown if they are within the first 50000 records. The 50000 limit is applied first, then the filter(s).

I think it would be better if the 50000 limit were applied to the result of the filter. (I.e. show up to 50000 filtered records.)

Say there are 4 selected records, but only one of them happens to fall within the first 50000 in the table. We can see 4 selected objects in the drawing/map window, but in the table window the filtered result will show only one. (It does show the continuation button at the bottom, which helps.)
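Sketched in SQL, using the FETCH clause and the selection functions mentioned elsewhere in this thread (illustrative only, not what the table window actually runs), the two orderings look like this:

-- limit first, then filter (roughly the current behaviour):
SELECT * FROM (SELECT * FROM [table] FETCH 50000)
WHERE [mfd_id] IN CALL SelectionKeys([table]);

-- filter first, then limit (the suggestion):
SELECT * FROM (SELECT * FROM [table] WHERE [mfd_id] IN CALL SelectionKeys([table]))
FETCH 50000;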

Attachments:
Image.png

tjhb
10,094 post(s)
#22-Oct-17 00:19

The same design choice applies for value filters as for filtering selected/unselected records.

Let's say I have a table with millions of rows. They might be areas, LiDAR points, contours, whatever.

As Adam says,

If the table is larger than [50000 records], it is not very practical to work with it without applying some filtering using a query anyway.

But isn't it more useful if UI filtering is a means to limit the record set to a practical size, rather than being limited by it in advance?

Putting it another way: is it better to have a defined subset of an arbitrary subset (limit, then filter), or an arbitrary subset of a defined subset (filter, then limit)?

I think the latter, if it is possible (and can be fast).


In the same vein (sort of): it might be nice to have an Edit Query item at the bottom of the View > Filter menu, to write us a command to apply the current set of filters to the source table. This would not be redundant, since compared to the Edit Query button in the Select panel, it would supply code for all current filters, not just one.

A similar option might be good at the bottom of the View > Order menu, to write code to reproduce the current ordering.

artlembo


3,400 post(s)
#22-Oct-17 01:07

I can confirm this. I have a large drawing, and I selected two features: one which was object 1,200 and one that was object 86,000. When I use the filter to see the selected features, it only shows 1, because the second object is outside of the 50,000 range.

I think Tim is right on here: apply the 50,000 limit to the result of the filter. This is actually somewhat misleading to the user. If I select a bunch of features based on a query (spatial or otherwise), and then go to look at the table to see information about the selected features, I will have the wrong impression because any feature that is beyond the 50,000th object will not show up. This is one of those examples where the utility of the filter might cause more harm than good if one expects to see the selected features. At this point, my confidence that the filter is showing me what I want is genuinely in question.

So, I agree that if a feature is selected, it should show up in the filter. Now, if there are over 50,000 features selected, well, it doesn't matter much because a human operator can't get their head around that. But, if there are only a few, and one wants to scroll through and examine the records, it should include them.

Dimitri


7,413 post(s)
#22-Oct-17 09:40

Putting it another way: is it better to have a defined subset of an arbitrary subset (limit, then filter), or an arbitrary subset of a defined subset (filter, then limit)?

I think the latter, if it is possible (and can be fast).

I think that is something to consider, but the key question is how it works against all the many data sources, some of which are slow or really huge. Let me think out loud for a bit...

Tools like interactive Selection (ctrl-click) on a row or sorts on column handles have utility in two cases:

1. Where tables are so small we can treat the table window as the table itself.

2. When we use the table window as a visual proxy, an approximation for something too large to grasp interactively.

The first case is how small GIS, the non-DBMS-aware GIS that has existed for decades, has trained most of us to think. It is a highly convenient way of working with data in a very fluid way, shifting between interactive selections and manipulations of data in table windows and visual moves in map windows. It's truly wonderful.

But a hallmark of the habits that experience with small GIS tables breeds is that they fall short of the skills required for working with data at the scale it exists in 2017, not 1997.

You might say "well, that is the job of computer hardware and software, to catch up, so I can work with my LiDAR data of billions of records the way I have become accustomed to working with itty-bitty shapefiles of just a few million points."

There is a point (ouch) to that, but data is increasing in size far more rapidly than computers are increasing in power. Think about it... Intel's flagship CPU is still a Core i7, and a Core i7 today is not all that different from what went on sale nine years ago. But big data today, exactly the kind of data to which Future routinely connects, is petabytes in size (Google tile tables, for example).

Consider the discussion Tim and Art have rightfully launched: OK, suppose for rational reasons we want to limit what the table window proxy tries to show to, say, 50000 records or a million records or whatever. We really do not want our system to try to pull down all petabytes in a Google tile table or all the terabytes in an Oracle corporate database. We just want a rational sample, enough to give us a feel for what we have on hand, so that after some ad hoc exploration with things like sorts and selections and filters we can use the right tool, SQL, to work the magic we want.

We might all be happy with 50,000 records as a sample: too big to browse rationally but small enough so that scroll bars and other conventional interactive tools still have some meaning. Great. Now, we use the Select panel in the Contents pane (modeless! always on! cool!) to highlight all records where a text field includes the phrase "my cat is modal, my dog is modeless".

Using the table window as a proxy highlights those records in the 50000 record sample for which that expression is true. That's how it should be, because "selection" in this sense is not SELECT in SQL sense. It is simply the interactive highlighting of what is before you that meets the conditions you picked out. If you were to ctrl-click on records you could see in the table window by scrolling around you wouldn't think twice about it. You'd think, "well, sure, I'm selecting from those records I see in front of me by ctrl-clicking on the ones I want. I can't see the billions of other records in the database, just these 50000, so that's what I'm working with."

Note by the way...

When I use the filter to see the selected features, it only shows 1, because the second object is outside of the 50,000 range.

... although it is showing only the one selected in the visual proxy preview in the table window, *all* of the records that meet the criteria have been selected. You can see that using, say, the Australia Hydro example of about 800,000 lines showing rivers in Australia. Open the drawing in a window and also open the drawing's table. The drawing shows all 800,000 lines even though the table window shows only a sample of 50000. Use the Select panel to select some records and the selected records - all of them - will light up in red selection color in the drawing window whether or not they occur in the sample shown in the table window.

I think in some ways Manifold brought confusion into the picture because we used the word "selection" for the interactive method. We should have used "highlighted" or "pick" or some other word that does not automatically cause people to think it is the SQL thing when it is not.

People are accustomed to using Manifold "selection" as if it were SQL SELECT, and indeed the picking out of items or sets of items using selection can be used in such an analogous way that it is natural to think of it as an interactive version of SQL SELECT. So it makes perfect sense to me that both Tim and Art naturally raise the notion of using it that way: when you launch a select template in the Select panel of the Contents pane, it reaches out into the *entire* data set and not just the proxy of 50,000 records in front of you, and then it reconstructs the data it displays if there is a filter command to show only the selected records.

But if you keep in mind it is still just an interactive thing, just an assisted interactive way of not having to literally look at each record and decide to ctrl-click on it or not, well, then it may be more apparent why what is displayed in the table window is pulled from the proxy sample before you and not the entire database.

It is something to think about if it should apply to the entire database at all. When the 50000 record sample (make it a million if you think 50000 is too small) is a proxy for billions or trillions of records, then telling the Select pane to go search through a trillion records spread out all over the world in Google's server farms is one heck of a gigantic command. Even if Google were willing to play along with you and ignore all quotas for access to their server farms, you might not want to wait for the year or two such a command might require. Manifold has some "safeties" in place for that right now, like limiting what is grabbed from Google to only those tiles already in play, but still, it is something to think about.

That is where I think Tim has a really good idea:

it might be nice to have an Edit Query item

That could be the best of both worlds: use the interactive, ad hoc form for exploration against a proxy set of data in the table window that is human sized, and then use a generated query and full SQL to hammer away at very big data in the table itself to extract a custom, filtered, view regardless of what was the original proxy. So maybe an "edit query" item or "launch as query" or "filter and re-fill" item would be just the ticket.

artlembo


3,400 post(s)
#22-Oct-17 14:30

good discussion, thanks for clarifying some of the thinking.

I hear you with regard to the big data exploration. But, I don't think the approach/examples you lay out are as sound as they could be. I think (1) it is operationally possible to get us data outside of the first 50,000 records, and (2) the way you describe this above is not necessarily a good example of data exploration.

(1) The reality is, if you perform a SELECT Count(*) FROM..., you are obviously working on the entire table, just as any other selection is working on it. So, in this case, you have to hit that billion record database anyway. We are just saying, if you are then returning values to the user, don't limit it to the first 50,000 records of the original table. BTW, this is another example of why the EXECUTE [... statement is so good - let Oracle do that heavy lifting, and return the results back to us.

(2) The first n records of a table are usually not representative of that table. For example, I might have information on every doctor in the USA, but if the original table was entered in order by the zip code, then the first 50,000 records would only be the doctors in the Boston area and into central MA. I would never see the California doctors when using the filter. You are making an assumption that the data is distributed evenly throughout the table. In fact, if I wanted to select the doctors in Palo Alto, I would never see them in the selection filter!

In this case, for big data exploration, what you might want is to return a random sample of 50,000 from the 1,000,000 selected records (1 in 20). Or, a systematic sample of the 1,000,000 selected records (in this case, you would select every 20th record to get an even distribution of 50,000 records). This, by the way, would be a killer optional addition for big data analytics - returning random or systematic results from an enormous table (you probably wouldn't want 50,000 records, but perhaps 300 records for sampling).

BTW, if you don't build the random and systematic approach, I may do so myself - it really isn't hard if you include a random number in the selection, and then sort by the random number.
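For illustration, those two sampling approaches might look like this in generic SQL (ROW_NUMBER, RANDOM() and LIMIT as in mainstream databases; this is not necessarily Manifold 9 syntax, and the [doctors] table is hypothetical):

-- systematic sample: keep every 20th record of an (arbitrarily) ordered scan
SELECT *
FROM (
  SELECT *, ROW_NUMBER() OVER (ORDER BY [mfd_id]) AS rn
  FROM [doctors]
) AS numbered
WHERE rn % 20 = 0;

-- random sample of roughly 300 records
SELECT * FROM [doctors] ORDER BY RANDOM() LIMIT 300;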

Dimitri


7,413 post(s)
#22-Oct-17 16:11

(1) it is operationally possible to get us data outside of the first 50,000 records

Agree 100% and that is what Future does. Don't confuse how Manifold connects to all the data with how the data is presented in different settings.

Take a look at the new video just published. At the end it shows how fast selection is using the Australian waterways sample data set. That's about 1.2 million lines, many of which are very intricate objects. Obviously, if they are all displayed in the map Future is getting all of the data. You can see that in action in the video when the selection is done and objects are selected from all of the data, not just from the sample shown in the table window.

Future always connects to all the data but it does not make sense to feed all that data to a user in a form the user cannot consume.

Manifold can effortlessly feed the user billions of records. But the user cannot effortlessly browse tables that are larger than the Earth. In fact, you can't really browse 50,000 records interactively. Go ahead and try it. Take a half-second look at each record. That will take you about seven hours just to glance for half a second at each record.

Now, I know there are going to be some speed readers who are reviewing this thread who right about now are thinking "heck, in half a second I can easily browse ten records, not just one, reading all of them to comprehend what they contain." OK. So maybe you'll browse that 50,000 record table in 40 minutes. I doubt that, but if you really think you can do that, fine. Let's dial up the size of the table window to one million records. That's now 140 hours at half a second per record - no breaks, no sleep.

We are just saying, if you are then returning values to the user, don't limit it to the first 50,000 records of the original table.

Returning the result of an SQL query is not limited to the first 50000. What is displayed in a table window is, because beyond that an interactive browsing interface doesn't make sense. If you think it does, what's the number you want returned? A million? A billion? Just how are you going to use interactive controls like scroll bars when to make sense they have to be a millionth of a pixel high? Touch them in a table that is larger than the Earth and you move them more than a continent... how exactly are you going to use that control to zero in on one screen's worth of records?

You can't... instead, you're going to use the same SQL techniques that everybody uses with larger data. Everybody uses those techniques because they work great. By the way, you don't need to just limit yourself to letting Oracle do the heavy lifting either as all those techniques work perfectly well within a huge Radian project as well.

(2) The first n records of a table are usually not representative of that table.

That's often true of smaller data but not in bigger data where data is not stored in ordered form nor is it ever assumed to be ordered outside of how you command it to be ordered with SQL. When Future stores data it does not store it in ordered form. When Future pulls data from external data sources it also pulls it without order.

In fact, if I wanted to select the doctors in Palo Alto, I would never see them in the selection filter!

The point of sample data is to see what the fields are like. If you can see there are towns and things like that in there, you can SELECT for whatever town you want.

You would never find them in a table that takes 140 hours nonstop to browse either. Suppose you have an Oracle table with every doctor in the US. Are you going to try to open that table and page screen after screen until you find an entry for Palo Alto and then try to click on that so you don't have to keyboard 'Palo Alto' into a trivially small query? Nope. You're going to use SQL: SELECT * FROM [doctors] WHERE [town] = 'Palo Alto'; Done.

It could be I've missed what you intended to convey. If what you are saying is that you'd like to use the infrastructure of Manifold's nifty selection capabilities as an alternative to SQL, that's a compliment for sure. But, I think the way to do that is how I closed my other post, to use Tim's suggestion to have a query built for you that implements what the selection does.

In terms of returning random samples, that's what you get when you pull data from databases that are not ordered. It doesn't really matter whether you get the first 50,000 or a 50,000 pulled one-in-20 out of a million. "Not ordered" means all of it is not ordered. That's one reason why big DBMS systems that provide table browsers typically limit those to only 5000 records, since if you can't get a feel for what you are looking for in 5000 records you're not going to get any better feel by getting five million.

My approach, by the way, to all this is not to argue with people but to give them the option to do what they want. If somebody wants to see a million records with the idea that interactive browsing of a million records is something they want to do, I say, fine, provide an option that allows the user to do that.

artlembo


3,400 post(s)
#22-Oct-17 16:44

It could be I've missed what you intended to convey.

Or, more likely, I didn't convey it well myself. Since a picture tells a thousand words, let's try illustrating one aspect of what I was saying with a short video (attached).

In this video you can see that a bunch of records in the south were selected. And, at that point I'm real eager to see information about those points. Unfortunately, I can't do it because those records were originally outside of the first 50,000 records. I hope this makes further sense.

Also, one other thing, not to belabor the point (I may in fact not be understanding you):

(2) The first n records of a table are usually not representative of that table.

That's often true of smaller data but not in bigger data where data is not stored in ordered form nor is it ever assumed to be ordered outside of how you command it to be ordered with SQL. When Future stores data it does not store it in ordered form. When Future pulls data from external data sources it also pulls it without order.

Actually, the video shows data that was pulled in from a gdb by Future. It is certainly in order. Similarly, a very large database of all the health care professionals in the US is stored in some order (i.e. typed in by thousands of workers in India who enter it in zip code order). In my experience, rarely is a structured database put together and then stored in random order.

Notice in the example above, I am not interested in looking at 50,000 records. I am interested in looking at about 50 or 60, but, I can't because they don't show up in the filter. In fact, I might want to copy those records and then paste them into a new table. Currently, in Future, we can't do that even from the drawing window (unless I've missed something).

Hope the video helps.

Attachments:
Untitled.mp4

Dimitri


7,413 post(s)
#22-Oct-17 16:59

Aha... now I understand. I had completely misunderstood what you were doing earlier, thinking you meant a query result.

I am interested in looking at about 50 or 60, but, I can't because they don't show up in the filter. In fact, I might want to copy those records and then paste them into a new table. Currently, in Future, we can't do that even from the drawing window

Yes, there's the new copy/paste implementation coming to a Future near you. In Radian and earlier Future builds, copy/paste required consistency with coordinate systems, etc. That requires thinking, when it would be better to just do a copy/paste with the system doing the thinking for you, converting coordinate systems on the fly and such.

There is also more coming in the synchronization and refresh of how filters work, basically re-applying filters and doing a re-fill to cover situations like you've shown in the video.

adamw


10,447 post(s)
#22-Oct-17 17:58

Replying to the entire subthread.

1. Yes, filtering / ordering is performed on the fetched portion of the table, not on the entire table. To perform filtering / ordering on the entire table, use a query.

2. This is specific to the table window and the tools that it has that other windows don't have. Tools shared between windows, like the Select and Transform pane, Edit - Select All / Select Invert / Select Clear, etc, operate on the entire table.

In the video, there is a big drawing with the number of objects bigger than the record limit for the table window. When you select some objects that happen to be at the "bottom" of the table, then open the table and filter the displayed records to only include the selected records, the table window does not display all of the selected records because they are "after" the fetched limit, but it indicates that there are more records in the table that may contain the data you want by showing the fill record.

We agree this might be confusing. Some options:

(a) let the limit apply after the filter - this will force a refill of the table on each change to the filter; this is kind of fine for tables in MAP files, but much less fine for tables in databases / web sources, where tables are much slower due to network communication and frequently refetching records is a bad idea,

(b) allow increasing the limit for the window by, say, clicking the fill record - we will have to refetch the records the table window already has, but that's probably fine, because these records are likely a small portion of what the table has as a whole,

(c) allow creating a query with the current filter options, say, by clicking the fill record and choosing Edit Query in the context menu (nod to Tim) - the query will apply the limit after the filter, essentially doing (a), but the potentially slow action will be done with the user's consent and there will be no refills afterwards.

There might be other options.

We think (c) is the best way to go. (And it is part of the original vision: to operate on the entire table, use a query; the quick tools for filters / orders in the table window are for the fetched portion of the table, which can be made large, but not infinitely large.) With some effort we might allow doing Edit Query for such a filter query as well if its table manages to run into the limit, too. We might have to make the text of the query read-only though.

tjhb
10,094 post(s)
#22-Oct-17 21:56

This is a great discussion, fun to think through together.

There's something to like about all of Adam's options (a), (b), (c).

(1) After reading them, I first noticed that we already have a rough analogue in SQL. A SELECT query normally computes and returns enough records to fill the screen (plus a bit more); we can switch on the !fullfetch option to require all records to be computed and then shown.

So my first thought was that a mix of Adam's (a) and (b) might allow clicking the fill record to turn on something similar to "fullfetch" for filtering a table.

(2) On second thoughts, though, I think I was partly wrong above in saying

The same design choice applies for value filters as for filtering selected/unselected records.

The design questions may be the same, but the answers can be slightly different.

Selections, after all, are special. First because they are binary/boolean, whereas value filters can be multi-criteria. Secondly because we can see them with the naked eye. (That is why it can be confusing to see multiple objects selected in a map window, while a selection filter in the table returns, say, one.) Also they are native Manifold structures, so while long fetches can still be an issue (if a child datasource is accessed over a slow network), that is less likely than for value filters.

So I would add two more suggestions, one for selection filters, one for value filters. Hopefully they could feel roughly the same, while doing slightly different things.

(d) For selection filters (positive and negative), filter first, then limit records for display. Or more accurately, fetch all selected/unselected records up to arbitrary limit N (currently 50000), show these, and wait. If the user now clicks on the fill record, drop the current batch of records, fetch a new batch starting at an offset.

For a positive selection, the second run would look something like

SELECT * FROM t
WHERE [mfd_id] IN CALL SelectionKeys(t) XOR SelectionIsInverted(t)
OFFSET 50000 FETCH 50000;

If the user clicks on the fill record a second time, then the same thing but with OFFSET 100000, and so on.

(e) For value filters, limit first, then filter (as now). That is, scan the first N records, apply the value filter(s), list just the filtered subset. If the user now clicks on the fill record, scan a second batch of N records (with offset), apply the same filter(s), display. And so on.

For both (d) and (e) I don't think it matters that the current filtered batch would be lost when fetching the next batch. That seems better than refetching the current batch and adding more records each time. It is a cursor.

As to an Edit Query option, it doesn't much matter for (d), since we can already get the same thing through the Select panel. But it would be really handy for (e), since value filters can have multiple criteria, which might be found by trial and error, and may be difficult to reproduce accurately from memory. Really helpful if we can get the equivalent SQL to filter the entire table into a subset.

Dimitri


7,413 post(s)
#23-Oct-17 08:08

Great ideas Tim... this captures the essence of much of it for me:

Really helpful if we can get the equivalent SQL to filter the entire table into a subset.

The original idea of selections was casual, ad-hoc, interactive browsing, which implies small data sets. Can't browse 50000 records casually by ctrl-clicking those you want. For producing subsets, then, SQL is the ticket so you can pull a browsable subset of a few hundred or a few thousand records from many.

That boils down to: SQL should be easy, and where it is not easy or could be assisted, the task is to provide helpers. If people like the style of selection and value filters as an interface, so be it, and let that be something that is a helper whether or not it is SQL that does the work behind the scenes.

A table window is just a glimpse. We could lie to people and tell them the table window fills with all the records and they'd never know it didn't, if we had a few sneaky tricks in play, such as having Ctrl-End fetch a few screenfuls of records from the "end" of a big table, and having every "selection" template run as an on-the-fly constructed SQL query that fetches just enough records to fill out the tiny few a human can browse in the window.

It's almost like a Turing test for "is the whole table filled into the window or not?" The Turing test for whether artificial intelligence is achieved is if you have a conversation with some unknown entity on the other side of the monitor and you can't tell if you are conversing with a human or an AI. Just so, a Turing test for whether you are working with the entire table or just a fill with a representative subset is whether, by browsing, doing Ctrl-End, Ctrl-Home, selection templates, value filters, etc., you can tell if the window is filled with a big subset or the entire table.

If we didn't tell people "there's more" with that fill record icon, but just on the fly loaded in a few thousand more records, as you suggest in (d) above, they'd never know. So it could be a marketing mistake we don't just do that. :-)

(e), since value filters can have multiple criteria, which might be found by trial and error, and may be difficult to reproduce accurately from memory.

We plan on introducing history mechanisms where that gives good rewards for the effort. For example, recording a list of all value filters that were tried during a session takes near-zero space. The table data isn't changing, just the filters you use to look at it, so at any point it would be effortless to go backwards and forwards in the stack of what was done.

We want to do that with the application of Style as well, where some pleasant combinations also might be found by trial and error, so you could always go back and forth between prior "looks" without relying on memory. Style is just a JSON string, also requiring near-zero storage space and also not changing the data, so you could save very much trial and error tinkering within a history list that could be accessed instantly.

adamw


10,447 post(s)
#23-Oct-17 11:13

A quick note regarding trying to fetch additional records on top of what the table window already has with OFFSET / FETCH - this does not work. First, the table might change between the time the first portion was fetched and the second fetch, so some records before OFFSET might be new and some records after OFFSET might be duplicates. Second, SELECT without ORDER is free to return records in whatever order it wants, so even if the table doesn't change, the order might be different. Third, because of the two things above, OFFSET already has to fetch however many records it skips; it just does not put them into the output. There is no other way to do it in the general case. There's no way to avoid re-fetching what the table window already has without fetching records in a specific order, unfortunately, and having to force a specific order has other issues; it is a bad trade-off.

If you meant doing something special just for tables in MAP files, then yes, we can do that (and for tables in file data sources with partial cache kept in .MAPCACHE, etc).

All in all, we understand the desire (there should be an easy way to use filters with tables of any size, not just those that fit into the window limit), we agree it makes total sense, we will think about how best to implement it.

bclement
275 post(s)
#23-Oct-17 20:20

In my experience, confusion in computer systems is often a function of the computer helping you without your knowledge. It seems to me that the issue of a drawing that shows you millions of records but a table that is limited to thousands of seemingly random records is just another example. Yes Future can show you more features on the drawing than you can humanly peruse in a table so it makes sense for the table results to be limited. But what does not make sense is that you don’t have any way of knowing which of the features you can see in the drawing made it into the records that you can see in the table because the computer was “helping” you.

How much of a performance hit would it be to visually grid/tile the drawing into 50,000-record-sized chunks, so that if we select one feature in this tile and one in that tile, we would know/expect that the table will only show the one selection in the current table view since the table is restricted to that tile? Then, if we want to see the other selected record, we could cycle the table view to that tile.

adamw


10,447 post(s)
#24-Oct-17 07:46

How much of a performance hit would it be to visually grid/tile the drawing into 50,000-record-sized chunks, so that if we select one feature in this tile and one in that tile, we would know/expect that the table will only show the one selection in the current table view since the table is restricted to that tile? Then, if we want to see the other selected record, we could cycle the table view to that tile.

This is doable, but I am not sure it is a good idea.

In the previous paragraph, you said:

But what does not make sense is that you don’t have any way of knowing which of the features you can see in the drawing made it into the records that you can see in the table because the computer was “helping” you.

There is no need to know which of the features made it into the table. You only have to know that the table contains more records than you are seeing. This is what the fill record is for. The fill record signals that the table contains more records than what you see above it. Maybe we should put some text onto it and make it more prominent.

Say you apply a filter and you see a couple of screens of records. You scroll through them. When you come to the bottom you will see the fill record and this is a signal that there might be more records matching your filter. What we think we will do is allow you to right-click the fill record and select 'Show More Records' or something like that and after you click it, we will rescan the table and reapply the filter and stop after 50,000 (or whatever the limit is going to be) filtered records. (In a new window, allowing to apply more filters on top of the already filtered data if you want to, etc.)

tjhb
10,094 post(s)
#23-Oct-17 22:35

Thanks again Adam. I don't want to clutter this up much more than I have, so will be brief.

On the "quick note... this does not work", I don't think the first two points matter much in themselves, if the user can be trusted to understand that that SQL tables are in principle unordered. All we would be asking (by pressing the fill record) is to see N more unordered records--yes, they might sometimes overlap with the previous set(s) returned and shown, and we might never get all records, no matter how many times we clicked. Neither of those things would be wrong in principle. But your third point (the necessary re-fetching before OFFSET) seems to matter more (fetches would get successively slower, no free lunch, at least for the general case).

If you meant doing something special just for tables in MAP files, then yes, ...

Yes, I think it would be great to do the best (most intuitive) thing you can with your own infrastructure (there's no harm in encouraging people to use it!), with a good fallback for the general case.

Brian (whose first paragraph I think nails the issue nicely) reminded me of an idea from a conversation with Dan. How about this:

If there is at least one open window showing data for the current table, then the data for that window (or those windows) has already been streamed and cached (probably a bit more data besides, if we have recently zoomed or panned). Why not start with that data?

(f) For the selection filter, first show all selected records for the current window(s) and related data in current cache, without limiting the number arbitrarily. For a value filter, first apply the value filter to all data for the current window(s) and related data in cache. (Possibly, after the initial display, allow some means to fetch more data, not in cache. But possibly not--after all, the user can pan and zoom, refreshing the cached subset.)

That only applies where it applies--it can't be a universal rule--but seems an efficient speed-up for those cases, and is very intuitive, since there is, I think, a strong sense in which the data that we are currently looking at (or have recently looked at) in geometry form is also the data we are most immediately interested to see (filtered) in the table.

Anyway I like where all of your ideas are going on this and I promise I will shut up now!

tjhb
10,094 post(s)
#23-Oct-17 23:01

[Sorry: Ben not Brian, my apologies. Now shutting up.]

Dimitri


7,413 post(s)
#24-Oct-17 07:56

Interesting conversation!

I agree 100% in theory that if you select something in a drawing the corresponding table should show the selection.

In practice, I have to ask the question... suppose you open Australia Hydro and select 1.2 million lines... Suppose, to limit your work, you flick on Filter - Selected to just a mere 1.2 million rows in the table. How are you going to browse a table that consists of 1.2 million red rows? Are you really going to sit there for hundreds of hours looking at each row?

Probably not. You're probably not even going to be able to further refine that using value filters because you're not going to sit there for ten hours at a stretch until, through page up / down scrolling (1.2 million records ensure the scroll bar is unusable), you find the cell with a value that you right-click on to invoke a filter.

Instead, you're going to use the Select pane or SQL. But heck, if you are going to do that, what's the point of scorching your eyeballs trying to page through 1.2 million records in a table? Just select in the drawing as you see fit, apply the Select pane or SQL and go forth and multiply.

Thinking out loud a bit more...

if the user can be trusted to understand that SQL tables are in principle unordered.

Well, if the user knows that, he or she probably already is using SQL. I'm not putting down selection, Manifold style, as it is a wonderful thing. But, for the most part, it is not the way people who have enough contact with SQL to understand the above go about it. Usually they just use SQL.

then the data for that window (or those windows) has already been streamed and cached

Ah, but quite often (depends on the data source) it might only seem that way. Drawings are fair game for illusion because whatever you see on your monitor of a million or so pixels is fakery compared to what is in a data set with billions of vertices. Can't draw them all, since you would need to draw thousands of vertices within the space of a single pixel.

So whatever you see on the monitor is an abstraction, with the whole salt of the art being in the cleverness with which you abstract from millions of features the rendition shown, so it is a plausible representation. With some data sources that is easier than with others, but it is never simple and it is never 1:1.

The general issue here, by the way, often is not really performance, which is where folks have a tendency to bark up the wrong tree. In most cases (excluding truly slow and stupid data sources) Manifold is happy to fetch millions more records than you could look at in your lifetime, and to do that instantly. The issue is about rational user interfaces that can, in real life, be used.

So consider where we started, with the proposition that "what is selected in the drawing should be shown as selected in the table."

OK. What you see in the drawing is not the data. It is only a tiny fraction, an abstraction, of the data. What you selected in the drawing based on mouse motions somewhere in the data really is selected, but those red dots you see in the monitor are likewise an abstraction. You don't see the whole data, and you might not even be seeing 50,000 objects worth of the data.

Just like you can't cram billions of vertices into a million pixels, just so it's pointless to cram a list of records that literally is larger than the Earth into a series of page displays that each fit on a computer screen. Whatever you think you are looking at, in reality you are looking at a microscopic abstraction. The idea of working with the whole table is cool and is the way Manifold does things, but the idea of "loading the whole table" is simply absurd. Heck, far from browsing the entire Earth one screen at a time you wouldn't even be able to browse from where you are sitting to the nearest convenience store one screen at a time.

So... you look at a big drawing and make a selection. From a limited sample of the data, the abstraction you see on screen, you selected a real subset of the data, but what is shown to you on the monitor is red dots that again is a subset. Click open the table window. Again you see a sample of the data, not the whole thing (can't browse the Earth one screen at a time), and some of those records are selected. With big data you might be sitting there for days, browsing one page at a time, before you hit a selected record.

Click on the "Selected" filter and you get a table that is all red, but still, you could browse for days before hitting a particular record of interest.

I don't think it is a good idea to get hung up on how we can hang a handle on data that is too big to see all at once, whether it is in a drawing or in a table. Instead, we have to ensure that the instruments which can work with that data, such as the Select pane, can do so rationally at all times.

For that in table windows I would go for an analogy to map windows: As you zoom into a map window you get less abstraction until finally you are operating on a scale where the number of pixels on screen suffice to show actual objects and vertices, and not an abstracted representation. Just so, if you zero in with selection in tables (perhaps using the Select pane, or filters) what you get is what makes sense within the "resolution" of human scale.

So if you select 1.2 million records it shows you a sample. Call it 50,000 or a million or whatever. If you use the Select pane in a way that chooses only 5000 records, then those will be displayed when you Filter - Selected regardless of whether they were in the original abstracted sample.

tjhb
10,094 post(s)
#24-Oct-17 10:00

I hesitate not to shut up, and any detail will have to wait till tomorrow for me, but...

I agree 100% in theory that if you select something in a drawing the corresponding table should show the selection.

That is not something I would agree with even in theory, since it can’t work for a streaming data model. I didn’t suggest it and I wouldn’t.

I would agree with this: if you have M objects selected in a drawing, N of which you can plainly see, and you go to the table and say “show records for selected objects”, you should see records for at least the visible N.

(Whether you should see all M is more open, mainly a matter of performance.)

It makes no difference whatsoever whether you can or will browse those N (or even M) objects. That is a complete red herring, not relevant.

Because consider what you might reasonably do next. For example, add a second filter, or just sort the objects to see the largest or smallest records of the N (or M). Or maybe just notice, gosh, there are lots, let’s try something else.

It’s my data, I’m in charge, I ask a simple question, and I expect an answer to that question (even if it turns out to be a stupid question), not an answer to something else. (If I understood him correctly then that is just what Ben said in different words.)

ColinD

2,081 post(s)
#24-Oct-17 11:52

if you select something in a drawing the corresponding table should show the selection.

That is not something I would agree with even in theory

I use table and drawing in M8 in both directions. Today I spent some time determining the most likely lat/lon for a few in a bunch of images where the camera didn't record them. I matched the photo time with the time on the GPS track points. I needed to see where those points sat over the aerial image to confirm it was in approximately the correct location. Much of what I do is not big data, but necessary data nevertheless.


Aussie Nature Shots

Dimitri


7,413 post(s)
#24-Oct-17 13:12

Let's drill into this to see how much we agree or disagree:

if you have M objects selected in a drawing, N of which you can plainly see, and you go to the table and say “show records for selected objects”, you should see records for at least the visible N.

For the sake of argument, let's say you've selected more than a few dozen records. Say, more than 100 but less than 1 million. In that case, you will not see records for at least the visible N. You'll only see the few dozen records at a time that fit into a single table window on your monitor.

I make that distinction to emphasize how no matter what you think is in the table window or what is really in there, you only see what is there one screen at a time.

The question is what happens when you want to see more selected records than can fit into a single screen. I agree that if the tools you use have effect against the entire data set, and Manifold plays tricks to show you what you want out of all the data, then whatever internal fill strategy is used doesn't matter.

Applying my Turing Test of whether the table window is filled with the entire table or if it is just filled on the fly with a few dozen records at a time, on demand, to make you think it is filled with the entire table, well, you'll never know which it is. You might guess sometimes based on lags for certain operations against big data in slow data sources, but you'd never really know.

It makes no difference whatsoever whether you can or will browse those N (or even M) objects. That is a complete red herring, not relevant.

I strongly disagree. Data that is small enough to browse interactively allows a very different style of work. You can actually scroll through it, for example, to pick a cell you can right-click on to invoke a filter without typing. Consider Art's example of doctors listed in order of zip code. If there are only a few thousand of them you could actually scroll through using page up / down or the scroll bars to zero in on the section where there are a few in Palo Alto to right click on one of them for a field filter.

But you cannot do that if there are a million results. In cases where you cannot browse you must use a more formal style, perhaps with SQL and perhaps with the Select pane, but not browsing. That picks up on what you wrote in the next paragraph:

Because consider what you might reasonably do next. For example, add a second filter, or just sort the objects to see the largest or smallest records of the N (or M). Or maybe just notice, gosh, there are lots, let’s try something else.

If the data is too big to browse you can't add a second filter with a right-click because too big to browse means too big to scroll to the cell you want for that filter. Filters in that sense are an interactive convenience for relatively small tables, those which are the results of something else or which are data sets small enough to browse interactively.

What's going to happen if the table is too big to browse is that you will go straight to the last sentence, noticing there are lots and it is time to try something else.

At the end of the day, if table windows automatically fill in with some results in situations where those can be seamlessly presented, such as when working with big data in a map file, so no one ever can tell what the fill strategy was, well, then all operations happen against the entire set of data all the time and nobody ever knows or needs to know what the fill strategy is.

If there are exceptions to that because of very slow data or exceptionally big data (petabytes of Google tile records), then some hybrid approach may be required where to a greater extent the display fill is a limited proxy and where in some cases the operation might not work against all the data.

For an example of the latter, suppose somebody has opened the entire world as a Google street map image and they select, say, all of Eurasia. That's still probably petabytes of records as tile data so in that case the usual tricks of jump to the beginning or jump to the end are not going to be realistic. That's a case where that last phrase "lots, let's try something else" applies right from the very first interaction.

artlembo


3,400 post(s)
#24-Oct-17 14:32

I guess (c) is the best option of the three you presented. But, as my short video shows, selecting features for something important (i.e. the areas that are the most flooded, or those near the hospital) that is beyond the original fetch count, would not allow one to peruse the selected records. This would be maddening to any data exploration analyst. If I lasso a series of roads or buildings in the map, the very next, intuitively human thing I'll want to do is discover what is going on in those locations. Allowing people to see those selected records is critical to their analysis.

Secondly, I made it easy in the video, as the answer was binary: the records were either shown in the table, or they weren't. The more dangerous case is if I straddled the 50,000 record limit (some records below the limit, some above). As I explore the data, I might quickly determine the flooding only seems to affect expensive homes (assuming the expensive homes are the ones that are within the 50,000 limit). It could be that low income or mid income housing is also part of the selection set, but I would never know it by using the filter.

This is setting people up for making mistakes, and as I said, natural human intuition is to see something interesting and then click on it to see what is there (which you guys have implemented with the Alt-click), but the next thing is to select a group of objects and be interested in what is there.

So, I think the problems are one of being frustrated you can't see the data, or worse, making erroneous conclusions because you don't realize all the data isn't there, or that it is severely biased due to the order the data was entered in the table.

Dimitri


7,413 post(s)
#24-Oct-17 17:47

that is beyond the original fetch count, would not allow one to peruse the selected records.

Consider that a temporary limitation. I think we're all agreed that no matter what is the original fetch count one should be able to peruse records selected by other means (in the drawing window, using the select pane, etc.).

artlembo


3,400 post(s)
#24-Oct-17 18:28

thanks. In light of your other discussion of changes that get made to Manifold vs. other systems, I just showed this thread to a friend. He was shocked that this issue came up about 3 days ago, and the software is already being changed to address it. He could not believe the responsiveness. Nor could he believe the respect that Manifold gives its user community to argue with them over an issue, make the case, and then they go off and fix it.

I agree. This wasn't a bug (which is expected to be fixed). This was an implementation decision, and a few of us didn't agree with it in totality so we made our case. Here we are in the 3rd day, and you've assured us that this is only a temporary limitation.

pslinder1
228 post(s)
#24-Oct-17 23:55

It would also be nice to show at the bottom of each table the total number of records in the dataset (whether fetched or not), the total number of records fetched, the total number of selected records, and the total number of filtered records. That would tell the user the status of the table. It might be good to have this at the bottom of a drawing based on that table as well (except for the filtered number, unless a filter in a table 'disappears' objects from the drawing).

Will Future be adding the ability to filter by some inputs a la Excel? I love the new fast filtering options but if those are not adequate it is a little bit cumbersome to go to the selection pane to make a selection then go back to the menu bar View>Filter>Selected. I think it would be better to be able to do more filtering (as much as you can do in Excel) without having to go to the selection pane.

Why can you only do selections in drawings, and not filters? I would have thought hiding some data might be just as useful in drawings/maps as it is in tables.

adamw


10,447 post(s)
#25-Oct-17 07:28

Agree that the record counts are useful, we will show what we can (not all counts are always available, but we should show all that are available).

What specific filters from Excel do you want to see added?

We don't think the Select pane is all that cumbersome, but of course using filters is easier, because the filters have been specifically designed to be easier to use (just click / select, no typing, no modes, no multiple controls). Still, is there anything we can do to make the Select pane easier to use as well? Because no matter how much we put into the filters, they won't be able to match what the queries can do.

We filter tables and not drawings because with tables you can tell the window quickly what specific filter you want to apply by clicking a value and choosing the filter operation. With drawings you can't do that, the values aren't visible. We are going to have a way to establish a filter in a table window and carry it to all other components with the table window exposing filters as a query.

pslinder1
228 post(s)
#25-Oct-17 19:00

I like the idea of filtering the table and having it carry over to the other components.

In terms of mimicry of Excel I would say that there should be a separate filter facility that could provide all of the basic filter capability that is in Excel's Custom Autofilter pop-up window. This requires and allows typing an input but is still pretty simple. This basic "input-required" filter capability would be a half measure between your current no-typing filter (which I love) and the more powerful but I think more cumbersome (practically and conceptually) selection-pane plus the View>Filter>Selected method.

A new thought below:

In general I think that we sometimes conceptually conflate selection and filtering and I think that is a mistake. The purpose of selection is to do something with the selected elements (delete, copy, edit); the purpose of a filter is to just look at those elements easily. This can be just as useful in a drawing as it is in a table. So why not do filtering in drawings?

In a drawing you can effectively do a filter by selecting elements within a buffer of a line (just a single example) and then copying just those selected elements into a new drawing. But that is cumbersome and not really the point if you are just trying to filter.

Instead of doing a select why not use the same 'selection-style procedure' to do a filter and hide the elements that are not in the 'filter selection'? Conceptually this is no different than in a table. If filtering is valid to do in a table then it is just as valid to do in a drawing. The only difference is that in a drawing you are more likely to want to filter based on position as well as values, than you will in a table.

I would go further and allow the user to do a value filter from within the drawing or map without having to go back to the original table first. For this to work effectively though I think that there needs to be some simple filter facility where you can input the values (or edit them) a la Excel as opposed to just click on them.

Selection and Filtering are very similar but as they have different purposes they should have different facilities to implement them even if the methods are very similar. Just a thought. I would be interested to hear what others have to say.

Mike Pelletier

2,122 post(s)
#25-Oct-17 19:51

I like your thinking and it seems ESRI does this with what they call a definition query, which is a function within the setup for each layer. Your thoughts on the difference between filtering and selecting are good and I'd just add that it would be best if we can quickly switch between doing one or the other without having to restart.

Also, I'm a fan of right clicking to get to more tools because it minimizes having to move the cursor across the screen, which is one downside of the contents pane. It does seem like a right click on a layer tab or a layer in a layer pane could bring up a filtering option. The layer name could change shading to indicate a filter is on.

pslinder1
228 post(s)
#25-Oct-17 20:21

I like that idea. Maybe two context menu options one titled "Select All Filtered" and the other "Filter on Selection"?

Mike Pelletier

2,122 post(s)
#26-Oct-17 15:04

Maybe ctrl-A covers "Select All Filtered" and tools like in Mfd 8 to filter the selection and unselect all cover the latter?

In thinking about how to organize tools generally, I'm thinking simple commands like these get implemented with shortcuts and right clicks or possibly in the header bar of the table, whereas the contents pane is for more complex stuff.

Anyway regarding your idea of filtering a drawing, maybe just a right click option on a drawing layer that says to filter the drawing to match the current filter in its table. If you want to make it more permanent then you select all records in the filter and save the selection or create a new drawing altogether.

adamw


10,447 post(s)
#27-Oct-17 07:21

With 9.0.163.7 one can:

1. Open the table. Add some filters. Do View - Filter using Query to put the filters into a query.

The last command opens a command window which allows applying the filters and seeing the first 50,000 records that pass them. If the underlying data changes, eg, the filter is using a selection and the selection changes, re-running the query returns the records that pass the filter now.

The filter can be carried to a drawing like this:

2. Create a query and set its text to the filter query. Copy and paste the drawing without the table and edit the Table property in the new drawing to point to the created query.
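For illustration, here is a minimal sketch of the kind of filter query the new drawing's Table property could point to ([Parcels Table], [Flood Depth] and [Zoning] are made-up names, and the exact text Filter using Query generates may differ):

-- keep only the records that pass the filter; SELECT * preserves the geometry field the drawing needs
SELECT * FROM [Parcels Table]
WHERE [Flood Depth] > 1 AND [Zoning] = 'residential';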

This is not as easy as we would like, but it is a fair first step.

We can make the process simpler by allowing the command window to be saved as a query (obvious and in the wishlist) and allowing a drawing to be created by clicking a geometry field in a table (also obvious and also in the wishlist).

On top of that, we are interested in ways of applying a temporary filter - the above establishes a persistent filter, not a temporary one - to an existing layer in a map window. Maybe we should have something like Copy / Paste Filter similar to Copy / Paste Format in Office apps. That is, the user would open the table, establish a filter, do Copy Filter, right-click a map layer for a drawing based on the same table, and do Paste Filter.

tjhb
10,094 post(s)
#27-Oct-17 07:43

I am just really perplexed by the changes to the table window in 9.0.163.6 and 163.7.

I don't know what they were for.

Overall, I more-than-dislike them, compared to what we had in 9.0.163.5 and previous builds.

They are really awful to use. We just can't get at our data! It's like being in a sheep pen.

With one exception: scroll bars make more sense. That's good. And quick filters would be fine... if they weren't plain misleading for tables of ordinary size and larger.

What was the purpose in restricting table listings to (say) 50000 records? Or even if it were 10 million? Why limit them at all? (I know there is a reason, and if I could see it I would start to understand the tradeoffs.)

Why was the situation in 9.0.163.5 and before, using bottomless tables, no good? (Surely it was not just scroll bars--there must be heaps of alternative designs to make scroll bars work intuitively with infinite tables.)

None of this has been explained. What's the purpose?

(I still need to properly digest the changes for table windows in 9.0.163.7. I can't see what they were for yet, after just one day, and don't fully understand the release notes. I might be being dumb.)

adamw


10,447 post(s)
#27-Oct-17 08:18

The changes are for making the table window more usable and faster.

Examples of more usable: scrollbar, zero-reload edits, etc.

Examples of faster: fetches are unordered, which all data sources optimize for and which allows queries to choose the way they return records, fetches moved to background, etc.

The bottomless tables can only be bottomless if there is a unique index; fetching via a unique index has its own big problems, which we think are worse than the limit.

What specifically is the issue?

Applying a filter to a big table makes the table list whatever records from the fetched portion pass the filter and list the fill record to indicate that the table may contain more records passing the filter. Right-clicking the fill record and selecting Filter using Query opens a command window which applies the filter to the whole table. Is that really so bad?

tjhb
10,094 post(s)
#27-Oct-17 08:56

Thanks Adam I can’t reply fully now—tomorrow I will try. But to answer the very last question, I really do think it is bad, or I wouldn’t hint so strongly that I did. [Added: the particular disappointment there is that the query on the “whole table” will still list only up to 50000 records! That is useless, it’s still an arbitrary subset.]

It’s about being able to see, and analyse, my data. (That’s most of what the software is there for.)

This misgiving is only about the shift from bottomless tables to arbitrarily limited tables. I don’t get why that is worth the cost.

I wish we could all have a face-to-face summit on this, that would be really good.

adamw


10,447 post(s)
#27-Oct-17 09:26

... only up to 50000 records! That is useless ...

What are you going to do with a table that lists or imitates listing a billion records that you can't do now? You can't work with huge numbers of records using a list, that's the entire thing. When you scroll to a random place, then scroll away from it, you cannot even scroll back to it, it's a needle in a haystack. You quickly skim through a list, you think you saw something and it takes you twenty PgUp's to realize that maybe you saw it, maybe not, and you will have to spend an hour of more PgUp's to tell for sure, because the table is just too big. It's in everything.

If you apply the filter and the filter still runs into the limit, that just means that the filter isn't specific enough and failed to return the number of records which you can reasonably work with in a list. (If you think 50,000 is too low for that, fine, we'll provide an option to up that to 1,000,000 or whatever but at some point it *will* become unworkable from the user's point of view.) Adjust the filter to be more restrictive.

I don’t get why that is worth the cost.

Performance and usability.

adamw


10,447 post(s)
#27-Oct-17 09:51

Adding to the last post (ran out of editing time):

I don’t get why that is worth the cost.

Very roughly, we traded the ability to theoretically scroll to any record (as long as the table has a unique index, etc) to the ability to scroll several times faster and do sorts / other things. The entire reason we did filters was because we could do them without re-fetching data from a table on the already fetched portion. If we didn't have that fetched portion, we wouldn't have done filters like we did them because that would just be an invitation to click and wait, click and wait again, click and wait some more, and it would be unusable on many tables (like even on slightly slow queries). We'd have to do filters like some designer for queries. Which we have now too with Filter using Query, but you can build the criteria progressively while seeing intermediate results (fast) and doing backtracks and changes (fast), etc. There are other things like that, too.

We understand the appeal of bottomless tables and being able to say - "hey, the data is all there, look" no matter what the table size. But this doesn't come for free, there are many drawbacks. We think the new design is a better tradeoff.

tjhb
10,094 post(s)
#27-Oct-17 13:27

Let’s say I have 50 million records, not huge. I know that I really only care about 50000 of them (anyway that’s all I *can* care about, realistically)—the rest to me is background or acquisition noise.

I already know a simple criterion that can take my records down to 1 in 100, but I need to experiment (with raw attributes, plus some alternative synthetic measures I can add through SQL really quickly, almost on the fly in SQL9 thank the Lord!) to get that down to 1 in 1000. At the same time, I want to make sure that I don’t obviously skew my selection geographically too much.

Or I might be looking on the same scale for data that are not typical, but possibly errors, and I don’t quite know yet how to separate errors from good data. I need to make enquiries, partly SQL, partly just looking, and check different subsets.

You’ve given us the ideal tool for that—on almost any scale. It’s like a hot knife through butter (or many knives through butter). Except that currently our first step, and every subsequent step, must be to take an arbitrary-though-not-random-nor-representative subset of 50000 records. That means we really can’t explore the data openly.

Mike Pelletier

2,122 post(s)
#27-Oct-17 17:03

Adam, could the subset of 50000 records be made random without a significant performance hit for table viewing?

Seeing the filter on all the filtered records in the drawing would allow for the geographical visual check (since we can see the distribution of over 50000 records in a drawing). Perhaps make Select All work on the filtered records instead of all records. Then we can see the distribution and create a new drawing from there if desired.

Still it would be nice to see the filtered records in a drawing without them selected and I'm happy to see Adam's earlier post indicating they like it as well. Seems like showing it temporarily with a right click on layer > dropdown > show table filter would be good and leave creating a new one for when you're more sure you want to keep it.

adamw


10,447 post(s)
#28-Oct-17 10:04

Adam, could the subset of 50000 records be made random without a significant performance hit for table viewing?

Not really. We could do a quick random sample for tables in MAP files, but for tables on other data sources or for queries it would be awfully hard to do a random sample without spending as much time as it takes to read the whole data set.

Overall:

Regarding filters / orders in tables applying only to the fetched portion - we hear the concerns. We added a couple of things to 163.7 based on the feedback already, we are planning a few more changes and are open for more. I suggest we look at the specific scenarios and the specific issues.

Filters in map layers will work on the whole component.

adamw


10,447 post(s)
#28-Oct-17 09:49

Let’s say I have 50 millon records, not huge. I know that I really only care about 50000 of them (anyway that’s all I *can* care about, realistically)—the rest to me is background or acquisition noise.

I already know a simple criterion that can take my records down to 1 in 100, but I need to experiment (with raw attributes, plus some alternative synthetic measures I can add through SQL really quickly, almost on the fly in SQL9 thank the Lord!) to get that down to 1 in 1000. At the same time, I want to make sure that I don’t obviously skew my selection geographically too much.

Since you know a simple criterion that you know you want to use, put it into a query and start with displaying all records that pass that filter. This will get you 500,000 records. Construct the rest of the filter from the data displayed for these records.

How to do that given that you don't see the whole 500,000 records and just see 50,000? But you can't "see" 500,000 records; the limit of 50,000 is there because anything larger than that is unusable (again, we will make the limit editable, because different users might like to set it higher or lower). You can scroll into the middle and see some values, but you can't tell how characteristic those values are for the whole 500,000 records, whether they are "small" or "large".

So, how to filter the 500,000 records. To do this sensibly, you will need some statistics, like top ten unique values in field X and how often they occur, or the range of values in field Y and how many occur in various intervals. Create some queries to get this data. After you do this you will have a better understanding of what the data is and can construct a reasonable filter.
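For anyone who wants a concrete starting point, a rough sketch of such statistics queries (table and field names are made up; add a FETCH / TOP / LIMIT clause, whichever your SQL dialect supports, if you only want the top ten):

-- how often each unique value of X occurs
SELECT [X], Count(*) AS [Records]
FROM [Big Table]
GROUP BY [X]
ORDER BY [Records] DESC;

-- overall range of values in Y
SELECT Min([Y]) AS [Y Min], Max([Y]) AS [Y Max]
FROM [Big Table];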

I don't see how bottomless tables help with any of this. They don't help filtering. About the only thing they help with is determining min / max value in a roundabout way - if you order by field and then scroll all the way up or down you can see what the min / max value is with respect to the whole table and not just to the fetched portion. Although even this doesn't always fully work because one of the ends frequently contains NULLs. And I am saying that this is a roundabout way because this spends significant time putting a billion records into an order you won't be able to make use of in the list anyway (because the list is too big) in order to merely find min / max and without filtering for NULLs. Finding min / max directly with whatever filtering is significantly faster.

joebocop
514 post(s)
#30-Oct-17 19:01

On the topic of table statistics, does it make sense for Future to keep track of column content statistics, per table?

In my post here

http://www.georeference.org/forum/t138923.2#138924

I had thought it would be useful to be able to use column headers as a means of quickly seeing what data you've got in a table. The screenshots show that clicking on a column header shows a list of DISTINCT values, as well as their COUNT(), for the entire table.

If those statistics were maintained by Future, and updated on creates/updates/deletes, then those stats could be useful, even when it's only feasible to display 50k rows in the table itself.

Icing on that particular cake, for me at least, is the ability to then use that statistical "view" (doesn't need to be a dropdown from the actual column header, could instead be something over in the contents pane or whatever) to multi-select the distinct values you are interested in "seeing" in the actual table window's display. Also, Future needn't keep statistics on every column, necessarily, but perhaps only those columns having a btreedupnull index.

With that filter applied, right-click the fill record, generate the sql, if desired.

That, for me, alleviates the "unwieldy to navigate 1m rows by scrolling" problem, and gives a fast interface to the entire table's data, via column statistics, for at least those columns you've chosen to index/keep stats on.
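For what it's worth, a sketch of the kind of query the multi-select of distinct values described above could boil down to (table and field names are made up):

-- keep only records whose [Road Type] is one of the chosen values
SELECT * FROM [Big Table]
WHERE [Road Type] IN ('primary', 'secondary');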

adamw


10,447 post(s)
#31-Oct-17 05:26

That's an interesting idea.

Yes, we could perhaps collect frequency data in an index and keep it updated there. It would be useful not only for filters, but also for the query engine.

Thanks!

joebocop
514 post(s)
#31-Oct-17 15:56

I believe the jquery table functionality used by Fulcrum uses such statistics, and then fetches the requested data asynchronously. Their tables also use the slick jquery Query Builder (http://querybuilder.js.org/); an interface which quickly abstracts the basic but powerful filtering provided by a SQL WHERE clause.

Delightfully, the "statistics" displayed to the user are based on the current "filter", so as you continue to add criteria to the filter, the statistics update. This actually cuts both ways in Fulcrum, where panning the map view adds a spatial constraint to the table filter. Again, the statistics for the resultant records are displayed, and allow fast and easy further filtering.

Another potentially interesting application here could be to synchronize a Map or Drawing window's extent to the bounding rectangle of the filtered table dataset (provided those data have geometry, obviously). This would allow a drilling-down through table data based on attributes, via the contents pane ideally, while getting to see what that filtered subset "looks like" at the same time. Especially powerful with an undocked map/drawing window. For those filtered results returning vast spatial expanses, ok, not as useful, but perhaps one expects that a specific attribute should only appear in data within a certain area, and this live updating would very quickly prove/disprove that assumption.

I am rambling, but I see real potential in this Future release, and especially in the attention Manifold staff appear to be giving the community suggestions in this forum. Kudos to you all on your efforts; we're all benefitting.

Dimitri


7,413 post(s)
#31-Oct-17 16:25

I am rambling,

More rambling, please. This is wonderful stuff!

I strongly agree with bi-directional filtering using map extent. What somebody is looking at in the viewport is a natural way to know their area of interest, at least for that moment. Zooming in and out and panning in a visual window is a highly intuitive way to declare what you are interested in. Love it!

joebocop
514 post(s)
#31-Oct-17 21:06

Here's a pretty crude video (Sorry Art, sorry Dimitri; I'm not quite at your level in this regard) showing the Fulcrum table editor in action.

https://youtu.be/3Y_AQTwdbzw

pslinder1
228 post(s)
#31-Oct-17 22:48

Those are the same type of filtering capabilities that are available in Excel; I like the addition of the stats. The problem with a very large data set could potentially be that the number of unique values in a column becomes simply too large to be able to effectively use even if it is sorted. I wonder if that could be solved using a nested list of unique values where the upper levels are ranges for the lower levels?

adamw


10,447 post(s)
#01-Nov-17 05:46

Fields with numeric / date values can be filtered using (automatic) ranges, like you say.

Fields like "Road Type" with not many unique values can be filtered using those values.

Fields like "Street Name" with tons of unique values are hard to filter on and nothing will likely make them useful for a global exploratory filter. They might become useful however when other filters reduce the subset enough so that some values of the "Street Name" start dominating.

There could also be filtering by location (show only objects visible in window X) and filtering by selection.
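To make the range idea above concrete, a rough sketch of how automatic ranges for a numeric field might be built (names are made up; assumes the SQL dialect has a Floor function and allows expressions in GROUP BY):

-- count records falling into buckets of width 100
SELECT Floor([Value] / 100) * 100 AS [Range Start], Count(*) AS [Records]
FROM [Parcels]
GROUP BY Floor([Value] / 100) * 100
ORDER BY [Range Start];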

Not bad.

Dimitri


7,413 post(s)
#30-Oct-17 11:09

must be to take an arbitrary-though-not-random-nor-representative subset of 50000 records.

Adam has discussed this but let me take a cut at it as well...

Suppose you have a table with 50 million records or a few billion records. Whatever the number, just make it big enough so everyone agrees (like in that thread about tables the size of the Earth) that it is completely out of the question to get your head around the table by browsing. Let's call that a "big table."

What is "representative" of a big table? Can you tell by browsing a window that claims to include all of the records in that big table? No, because by definition you cannot characterize a big table by browsing it.

So, how do you get a "representative" sample of a big table if browsing won't do the trick? That depends upon what you mean by "representative." "Representative" means different things to different people, usually at different times based upon the different things they happen to be doing. Quite often, when engaged in a particular analytic task a key part of that task itself is to decide exactly what is "representative" of the data, distilling the essence of it. In such cases you don't know what is "representative" unless you accomplish much of the task you have set out to do.

People might apply various sophisticated means, like Adam discusses, to construct what they consider to be a "representative" subset. But with a big table they're not going to do that by interactive browsing. By definition a big table is too big for that. Reduce it by some means, such as SQL, to a mere 500,000 records and you still won't know by browsing if that is "representative" or not. You just cannot humanly browse enough records to say.

The idea of tables being so big that methods we as ordinary humans expect to be useful are not useful at all, goes against our ordinary human experience. It really is impossible to get your head around a big table to any degree at all by browsing it, and that is one of the issues in this thread. People simply won't accept that.

They naturally think, all their prior experience working against them, that "well, maybe I can't completely understand it but I can get some useful impressions by browsing it. Suppose it has a random sample, etc." But that isn't true at all. The only impressions you can get by browsing it are fake impressions that are as likely to mislead you as be truthful.

For example, you might look at Art's data set that on the basis of a nano-scale browsing sample seems to be ordered by zip code. But that nano-scale sample is just as likely to miss huge swaths of the table where it is not ordered by zip code, and thus mislead you.

Tables are ordered only for that brief moment when you construct a result using ORDER BY. Put that ordered result into a table and then the next millisecond you don't know if that order holds as a result of other people or other processes deleting/editing/etc records in that new table. The beginner makes that mistake and despite reading a hundred times "tables are not ordered" will proceed to write scripts and other workflow that assumes they are ordered, forever. The experienced DBMS person knows that tables are not ordered and routinely applies order when that is desired.
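To spell out the habit in query form (names are made up): order is something you ask for each time you need it, not something you assume the stored table keeps.

SELECT * FROM [Providers] ORDER BY [ZIP];
-- the order exists only in this result; the stored table itself promises nothing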

Getting back to grabbing a "representative" subset of a table which you can use to make decisions as to how you might use the data in the table or systematically modify it, learn from it, etc.: As you get into thinking what you consider to be "representative" in whatever particular task you are doing, with big tables you are going to use queries so you can take advantage of the highly refined, endlessly powerful toolset that queries provide.

---

About randomness and order and teleporting into a "sample" table that is skewed by pseudo-order of the table that is a latent effect of how it was first loaded:

The fundamental notion not to lose sight of is that if you are working with a big table whatever you see in a table window means nothing in terms of order. If you think whatever you see by browsing a big table means something about the order of that table, you haven't gotten your head around what "big table" really means. Tables the size of the Earth really are a different deal than tables which can be listed on a roll of paper only a few hundred feet long.

In a database of people giving a location for each person, if all of the records for people happen to be from the state of Alabama in the first page of the display of 150 million records that doesn't mean anything at all about the content of the table; for example, it does not imply that all 150 million records should be assumed to be in Alabama. Big tables are unordered. Assuming they are ordered in any way is a mistake, and concluding they are ordered because of what is seen browsing a few screens, a few feet of screens out of a table that is larger than the Earth, as if that were somehow a proof, is simply a blunder. If you want to see order use ORDER BY.

I understand that some data sets in some formats as a side effect of the format or the way they were initially loaded, really do appear to be ordered, as in Art's excellent example of health care data originally stored by zip code. But no matter if the data started that way or however it appears when you first touch it, the moment that data gets loaded into any modern DBMS (Oracle, DB2, etc, etc.) or modern data handling software like Radian that can work with big tables, it no longer can be assumed to be ordered.

There are exceptions, of course, but big tables tend to come from data sources that do not store them as ordered data. For order you use ORDER BY, and you don't take it for granted that you will get reliably lucky with pseudo-order as in the health care data originally stored by zip code, because that's apparently how the records were first loaded into the database system.

I note that even in Art's example, counting on that ordering could be bad medicine because if there are any changes to the table, and new or edited records are inserted in an unordered fashion that is different from the original load, the DBMS could just as easily put a swath of Lost Angeles (that was a typo, but since it seems to fit let's leave it...) records intermixed with zip codes for San Francisco, which is a few hundred kilometers to the North of LA.

Because some apparent order can persist through what seem to be large swaths of records the trap for beginners is to browse a few sections of the table and declare, "well, of course it is ordered! Everywhere I look it is ordered!"

If you really understand what a big table is you know that is nonsense. A big table is like a paper printout of records that stretches from San Francisco all the way across America, all the way across the Atlantic, all the way across Europe, Eastern Europe, Russia, Central Asia and all the way across China to Beijing... a huge, long roll of paper with several records per inch for all of those many thousands of miles. Are you really going to crawl on your knees from San Francisco to New York scanning a few records on that printout every inch of the way? Be honest with yourself and admit you'd give up after a few blocks of crawling on your knees and would never even make it to the on-ramp of the Bay Bridge to Oakland, let alone get past Oakland, the Livermore Valley or into the central valley of California. And then you'd still have a few years of crawling on your knees to get to New York, not much progress toward Beijing.

Even if you didn't crawl every inch of the way, just think how long it would take you to walk the land part of that and to row the ocean part. That is what a big table is. You might think, "well, I'm not going to crawl on my hands and knees to look at every inch of that printout. I'll just sample a bit here and there," thinking you can get your head around that by looking at a few feet of printout in San Francisco, a few feet near a favorite barbecue restaurant in Kansas City, a few feet in the industrial swamps near Newark, and then a few feet of printout in Dusseldorf, Minsk, Rostov, Alma Ata, Tashkent and so on. But that is just crazy... what about all those records in the many, many thousands of kilometers in between?

But that's not what is at the forefront of somebody's mind who has looked at a few dozen screens and thinks he understands the big table. The beginner then goes on to write code that assumes it is ordered and the results might even look OK. If the results are also a big table, you might never know you have trash mixed in, because you won't be able to tell by browsing that trash is in there. And, if you make the initial conceptual mistake of thinking a big table is ordered, despite the frequently-repeated admonitions of DBMS masters that assuming it to be ordered is a mistake, well, you will never write the queries to determine if there is trash mixed in.

---

Shifting gears away from big tables: Suppose you have a table window showing a selected set of records that is small enough to browse. Let's call that a "small table," meaning it is small enough so that user interface controls like scroll bars make sense, you can browse realistically with page up / down and so on. With small tables you can indeed get your head around the table by browsing it using casual tools.

That is where our experience as GIS people works against us in this new world of big tables. We have been trained by our experience with small tables that things like scroll bars should always make sense. Manifold has helped train people into expecting that with fine, convenient tools in Release 8 that really do make life very simple and convenient, so long as you don't notice the implied deal with the devil that they assume the tables involved are "small tables."

It's like people who have been trained within simple, small scale economies to always use cash. They have tools to help them, like wallets that can carry different denominations of bills, strong boxes and safes, and even tools like money-counting machines where if you need to count out ten thousand in twenties you just set up the machine, push a button and it counts out the requisite number of bills. Easy.

And then one day something changes, like you become a Google billionaire or you are an ordinary person in a country where hyperinflation hits. Suddenly, instead of dealing with hundreds or thousands you must deal with billions. You discover that to finance that new space exploration company you've wanted, it is not really physically realistic to be shipping around warehouses full of paper money, or in the hyperinflation case, to go shopping for a loaf of bread with a wheelbarrow full of stacks of paper cash.

The solution is electronic cash, which is not comforting to those who have been trained to not trust what they don't have "filled" into their strongbox or wallet in the form of tangible bills they can touch, just like people getting hung up on what an interactive interface can show them in the form of tangible records that have filled a particular window. But just like it is not realistic for a hyper-rich person to personally count out a billion dollars for his next hyper-yacht, it makes no sense to think you can browse a billion record table with interface tools, like scroll bars, that will work beautifully at the smaller scales for which they are designed.

artlembo


3,400 post(s)
#30-Oct-17 13:51

thanks for the detail. I agree with you on most points, but I do have two caveats:

1. The case of viewing a small subset. My concern was not with big tables per se, but rather with the video example I gave where I lassoed a bunch of areas in the southern part of a location and wanted to explore what those things were. It could not be done since those area features were beyond the 50,000 limit. So, it would be very frustrating to be in a meeting, have the big boss say,

hey, what's with all those areas down there? What are the flood heights and the value of the properties

and not be able to select those records and browse them.

So, I agree with you that it is insane to scroll through a table of millions of records - but in this case, I've found my smaller number of oddities, and want to now explore what is going on with them. In the current implementation, this cannot be done.

2. The case of randomness. I work with data in the billions. I'm not at the trillions yet :-) But, for geographic data collection, you better believe it is ordered. Lidar data is ordered from where the sensor starts, to where it ends. Soil data is ordered from one corner of the county to the other. Census data is ordered from one State to the next. They are not random.

I agree with you that when you have really large data sets, a sample is the best way to make an inference to the population. This, however, assumes the data is randomly distributed. Now, calls coming in to a 911 center are generally random if you were to want to ask about what percentage of callers to the 911 center are male vs. female, black vs. white. But, even that is a biased sample because your calls start at day 1, and continue to day N. We make an assumption that the characteristics of day 1 should be similar to day 100 - that isn't necessarily true, and it isn't necessarily sound from an inferential statistics standpoint. That is why systematic sampling is often done. I know a lot of your colleagues have math degrees, so you might want to run what I'm saying by them to see if they agree. I say that because while I understand the statistics, there may be something I'm missing in terms of big data tables.

BTW, all of this inferential stuff I'm talking about is fascinating in that it works (assuming you have a random sample). For those interested, I just popped up a video from my course Statistical Problem Solving in Geography that discusses the Central Limit Theorem. It is my favorite lecture to give in class, because it truly is magic. It works. It always works, even if your data is skewed. But, it really does require that you randomly select the data, and not have bias in the selection. And, when you don't have that randomness, then it doesn't work.

Simply stated, if you are going to only return a subset of a table and make an inference of the entire table's population, then you better be sure that you've removed bias from the selection, and I think the examples I showed earlier would indicate that the subset most likely will not be an acceptable random distribution of the population.

For anyone who looked at the central limit theorem video above, and you are interested in the entire statistics course, you can find it here.

adamw


10,447 post(s)
#30-Oct-17 14:29

In the current implementation, this cannot be done.

In 163.7: open the table -> establish a filter -> see the fill record in the results -> right click the fill record and select Filter using Query (or invoke View - Filter - Filter using Query) -> press F5 to run the query, this applies the filter to the whole table and not just to the portion in the first table window.

artlembo


3,400 post(s)
#30-Oct-17 16:52

thanks, Adam. That works fine. Only one follow up:

It is nice to have the right-click the select Filter using Query. However, if this is something I'm going to do on a daily basis, it might get cumbersome to have a query component pop up, and now I have to run the query. This adds two additional windows (tabs). I wonder if for this situation it might be possible to do it all with a single right-click. That way the query component doesn't pop up, and then I don't have to hover to the run button to run it.

That said, having the query is nice because it starts with SELECT *, but it is really useful to change * to something else, or even add another table:

SELECT * FROM CALL Selection([flood], TRUE) AS A, [parcelsdraw]

WHERE GeomTouches(a.geometry, [parcelsdraw].geometry, 1);

so, that is really nice.

Dimitri


7,413 post(s)
#30-Oct-17 15:46

but in this case, I've found my smaller number of oddities, and want to now explore what is going on with them. In the current implementation, this cannot be done.

I think we are in close agreement on most things, but there is room for exploration in that interesting area of overlap between table window habits we would like to keep and the new world of big data where our prior habits may work against us. Consider the quote above. You didn't lasso that bunch of areas using the table window. You lasso'd them using a drawing window. So there is an example where for bigger data the table window is not as good as other interfaces.

I don't think there is any disagreement that once - through other means - you have located some subset that is small enough for a "small table" user interface to be productive that you should be able to use that small table interface. I'm just saying that when tables are big tables and thus, by definition, cannot be comprehended by casual browsing, well, then in that case to reduce down to the smaller subset you use the right tools for that.

I'm also not suggesting that table windows of any kind will allow you to get your head around big tables which, by definition, are so big they cannot be effectively browsed. I'm not suggesting that pulling 50,000 records suddenly makes that table window a great interface, by proxy, for getting a handle on the big table. If you have a big table you need analytic means that are not the same as user interfaces, such as scroll bars, which are just the ticket to get your head around smaller data.

for geographic data collection, you better believe it is ordered. Lidar data is ordered from where the sensor starts, to where it ends. Soil data is ordered from one corner of the county to the other. Census data is ordered from one State to the next. They are not random.

If you bear with me for just a few dozen more pages, let me riff a bit on order as it might apply to tables in databases and table windows that claim to show records from those tables.

Well, first, I don't say records as they are stored in a DBMS are random in how they are listed. I don't use that word casually. I said records were unordered. The difference is not one of pedantic terminology: I note that data can easily be unordered in the sense it is not in increasing or decreasing order numerically or lexicographically and thus be non-random but still be outside the reach of some obvious ordering rule. If the data is in the table ordered by some rule that is difficult for you to discover, such as the order of vertex geometry implied by points arranged in a spiral or circle, well, for the sake of interactive table browsing it is unordered.

Be that as it may, whatever you call "ordered" I respectfully disagree that all geographic data is ordered. While I certainly agree it often is, even in the case of simple geographic data that appears to be ordered it is often the case that it is only ordered in some places but not in others, or that the apparent order looking at a portion of it breaks down as you see the entire data set is a patchwork, a mosaic, of data acquired at different times that was later blended together.

Consider a buildings layer for a city, where a sweat shop somewhere digitized the buildings based on a combination of aerial photos and archival property records. They will probably have rectangular regions where the buildings were digitized in some sort of pattern like left to right starting with the top left corner, but overall you cannot count on a table full of buildings being ordered from beginning to end as some sort of regular pattern.

Consider soil data for a county. Given the irregular shapes of many counties, what, exactly, is an order that consists of going from "one corner of the county to the other"? There are plenty of geographic regions where it is not so easy to discern any "corners" or where there are so many "corners" you can't just say "one corner" to mean a starting point. Where's the "one corner" of France, for example?

Because of irregular patterns in the geography (states and countries have highly irregular borders, for example) despite a loose, arm-waving description of a regular pattern as starting in one corner and then working down to the rest, that doesn't really describe an order for records that can be exploited, say, using a script.

Tell me soil data for counties is "ordered" from one corner of a county to the other and I'll find you dozens of counties where no two people will be able to agree what that "one corner" is or be able to detect in the table of records some sort of order from "one corner to the other."

That matters because the lack of such order means that even from the very beginning, even before records are loaded into a DBMS, the data is not ordered. There can be sudden discontinuities in what might otherwise be a regular progression of, say, X or Y coordinates. For that matter, if you talk about LiDAR data you will have discontinuities from one raster sensed row to the next, that is, a fairly regular delta in X from one point to the next until the next row starts, at which record the X value discontinuously jumps back to a lower value. At that next point also, the heights are likely to be discontinuous as well.

If you teleport into a table to see one screen full of records, looking at that boundary you're going to say, well, maybe there is some order here but it is not a regular progression...

But all that is within the original format. However "ordered" or unordered the data was in the original format, the moment you load it into a DBMS you can no longer assume it has that original order. When a DBMS like Oracle says that records are not stored in ordered fashion in a table they are telling the truth.

Suppose you had a LiDAR data set that originally had some sort of order. For example, suppose it was a rectangular region that perfectly aligns with X and Y axes so that it has a regular raster order where from left to right to the bounds of the box it is the same Y and monotonically increasing X. Then comes another line, with a regular interval added to Y and a repeating, monotonically increasing X, and so on.

Setting aside that such things don't happen in real life (there is irregularity from oblique aspect, that is, a rotation which does not align precisely to X and Y geography, plus effects from projections used), the moment you import that data into, say, Oracle when you list the table of records, one record for each point, you cannot expect to have the points come out in the same order. Plot them using their geometry and yes, they will fall into position. But list them as records in a table and no, they cannot be assumed to have the same order. That's true of every big-time DBMS that has the muscle to handle things like LiDAR data.

I agree with you that when you have really large data sets, a sample is the best way to make an inference to the population.

Here is where I must emphasize I am not saying that interactive windows are an effective way to browse big tables because big tables are by definition so big you cannot gain comprehension by interactively browsing them. Whether or not the right tools (not interactive windows) can be used to create subsets of big tables which thereafter can be used as proxies to explore in different ways, such as making an inference to a population, is a very different question.

My own view is that if you know how to use statistical tools (which, to emphasize once more, are *not* interactive browsing tables but are other tools) to construct such a representative population, you might be able to get it down to a small enough size where as a second step the use of browsing windows might be productive. I write "might" because more likely you'll just continue using whatever tools you were using to make that subset to refine the population further.

Simply stated, if you are going to only return a subset of a table and make an inference of the entire table's population,

100% agree, which is why I say that using interactive browsing windows to try to get your head around big tables is a lost cause.

People insist on using that paradigm because they have become so used to it from small tables that they demand to use it even in settings when trying to use it is crazy. The question before us now is how that interactive tool can provide at least some utility even when launched in wildly inappropriate settings. Right now, you can open a table window for any table - what should Manifold do when people open table windows in settings where it is insane to do so?

An example might be opening a table window for a Google tileset for the entire world, not realizing that if you really do want that "bottomless" table it will take your network connection many years to deliver those petabytes of data to you, even if Google were willing to do that and even if you had the storage capacity on your computer. So what should Manifold do for such folks?

For example, should Manifold just start loading the table using whatever order tiles come in from Google, faithfully growing the table as tiles come in, even if that takes decades? Should the scroll bars be eliminated, since obviously they could only be moved in quanta of a single pixel (probably representing billions of records per pixel)... not exactly a good use of a scroll bar to get to what you want to see, plus or minus a billion screens?

The current approach is simple: gross it up to whatever is visible on screen and take the first 50,000 records Google provides. That is still far more than anybody can use interactively, so nothing is lost.

People then immediately say, "well, suppose I select... or use a query... or" whatever, but they are not talking about interactive means, they are talking about controls other than point-and-click browsing. Even things like filters depend on browsing to the extent that if you can't scroll the screen to see a record value of interest, you're not entering it in text form into the filter. That's the whole point of the filter, a point which makes it a superb tool for small tables that can be browsed, but not as good a tool as a query (however you get to that query) for big tables.

pslinder1
228 post(s)
#27-Oct-17 16:02

Could you handle this dynamically? Leave it as it is now as a default, but allow users to toggle on the bottomless table feature in a table window if they want it.

adamw


10,447 post(s)
#28-Oct-17 10:09

We could do this; it's a question of where best to put effort. That's why we are interested in the specific scenarios and how they apply to what we already have / how they would work with bottomless tables (or whatever else, really).

Dimitri


7,413 post(s)
#27-Oct-17 09:14

[deleted... cross posted and now unnecessary]

tjhb
10,094 post(s)
#27-Oct-17 09:34

I would like to see what you would have said, Dimitri, all the same.

Purpose is really important. (That almost sounds silly.)

I think there is serious goal displacement going on here, which is close cousin to death by entropy.

Dimitri


7,413 post(s)
#30-Oct-17 11:51

To continue in something of a fork from my other lengthy post today (in the thread a few posts up from here), I think it boils down to managing the intersection between two very different worlds that are now overlapping.

The first world, where casual browsing of tables in a window can tell you something and be a highly effective user interface, is for the most part the world of our experience with GIS, where to date tables have been small enough for that small-table, casual-browsing user interface to be a very good thing. We're not stupid people, so we've absolutely gotten used to the insight and convenience that things like the Release 8 table window's constellation of capabilities provide. We want to be able to continue having that convenience in the work we do now.

The other world is that of big tables and DBMS. People in that world know perfectly well that for what they want to do, whether it is to gain insight by constructing some "representative" or "random" sample, or manipulate tables or do other analytics, well, all that is done using the hyper-powerful tools that queries provide. Things like PGadmin provide a limit of 5000 records in their interactive table windows for exactly that reason. Nobody in their right minds in the DBMS world expects to use scroll bars to navigate around the big tables with which they are comfortable. Heck, their results tables are so big they don't expect to use scroll bars there, either.

Those big table, DBMS people usually want to see more tools that work with big tables. They want more tools that are like the Command Window with enhanced ability to construct and launch queries. For them, SQL is the weapon of choice and they want to see new methods that let them pull that weapon from the scabbard even faster to cut through whatever task they confront.

None of that makes the desire for "personal database" style tools, casual and interactive tools like the table window Release 8 provides, a bad thing. That desire is a very good thing, because those tools really are very useful when applied at the scales for which they were designed.

What puzzles those of us who have been trained to think those tools are the way to go all the time is that our world has changed. Instead of working with many thousands or at most a million or so records we now routinely encounter data sets with billions or even trillions of records. The tools we expect to use all the time now can mislead us instead of helping us. They need some adjustment, to say the least.

All this talk about working with "bottomless" tables is somewhat off the mark. Suppose Manifold just told you, "hey, you're always working with a bottomless table," but, for the sake of convenience and speed, offered a switch with every command, filter, etc., in a table window, labeled "quick view" or "apply to all" or perhaps "draft sample" or "all records"? Add a few tricks like Ctrl-End jumping to some random, disordered part of the table ("End" has no meaning when tables have no order...), and you'd never know if a table window was "filled" with a subset or with the whole, bottomless table.

The real thing that is going on is a decision whether you want to apply some command to just some sample of the data that is small enough to browse interactively, or to all of it. If you have that choice, it doesn't matter what you think you are seeing when you look at what, in effect, is just a few feet of paper printout that could be as long as your local city is wide, or as long as crawling on your knees from the North Pole to the Southern tip of South America.

Could Manifold help the situation by having two different user interfaces? Suppose Manifold banished the term "table window" and instead had two other terms, one for the rigorous case and one for a casual case?

Suppose the rigorous case was called the Table Console. It's like a Command window with a few lines of records for some simple previews, aimed at providing some additional assistance for query building. But clearly the focus would be on using queries and the vast power of SQL to cut through whatever task was involved using big tables. Having such a console would make it clear that any confusion about nomenclature or preference for user interface does not mean you in any way give up the ability to have total mastery over every record in every table. If bottomless is what you want, bottomless is what you have at your fingertips.

The other case could be called the Table Browser or some other term giving the connotation of casualness and ad hoc viewing or dipping into something to take a look. That would be more like the current table window, an interactive instrument that is fine for looking at all of small tables or fine for looking at parts of big tables.

Those of us who work with small tables and prefer to have a casual interface could use that, just like we have become used to doing in GIS given the smaller data that has characterized GIS in the past. As we encounter bigger tables we might continue to use that casual interface, while taking advantage of tools like the ability to generate queries that take it a step beyond, albeit at the cost of stepping away from a super casual, interactive workflow. But that would be a transition to learning about using more powerful tools, tools designed to facilitate cutting through all of bottomless tables, like the Table Console.

I'm just thinking out loud here because I get the strong sense that what is going on is a collision between two worlds, and one of the things we can all help out on is how to manage that collision so the result is constructive and an opening of doors to better ways for both worlds.

bclement
275 post(s)
#30-Oct-17 15:38

Dimitri,

I appreciate the fact that you folks have been so forthright in explaining your thinking and the necessity of moving into the “real” world of big data. For the manager of a small county GIS, it has been enlightening to say the least. I believe that I understand that tradeoffs are necessary. But I think what you are getting here is pushback for what we (GIS folks) see as more of a sell-off than a tradeoff.

To make sure I wasn’t completely off the mark, I talked to my staff and thought carefully about the matter of traditional GIS vs Big Data, and I think the question from my most recent hire is the one most GIS folks are asking. He asked, “If I have all of the roads shown for the US, and I want to look at attributes of the roads in Utah but the table only has records for the roads in New York, why even have the software?”

I believe (and please don’t let me erroneously speak for others here) that the magic of GIS is data visualization. If we can’t visualize the data (i.e. if we can't trust that what we are looking at drawing-wise on the screen is what we see in the table), then we are no longer GIS practitioners. That is why I suggested that you synchronize what we see in the drawing with what we see in the table with some type of selection or tiling scheme. I hear you loud and clear that there are big data sets out there and that you cannot show them to us all at once. And to that, I answer, if you cannot show me all the roads in the US in a table, then don’t show me those roads in the drawing because the ability to visually query that table is what makes me a GIS person.

I mean no malice in the above. I have met both you and Adam personally over the years and I have nothing but respect for both of you. But I do not believe it to be an overstatement that the idea of what I see in the drawing not necessarily being in the table makes me panic. Maybe I am overreacting. If so, please show me where I am wrong.

Sincerely,

Ben

adamw


10,447 post(s)
#30-Oct-17 16:06

He asked, “If I have all of the roads shown for the US, and I want to look at attributes of the roads in Utah but the table only has records for the roads in New York, why even have the software?”

Let's go through this specific example.

(Just in case, please don't take any of my posts as "here, told you that what we did is the best way"; I am just trying to walk through the scenario each time, looking for things that seem bad, exploring as much as every other reader.)

We open a table and the table is big. If we see no records for Utah to click on, we are probably looking for a place to write the word "Utah" and ask the software to search for it. There are several such places - we can use the Select pane (the Equals template) or, say, write a SELECT query and put "Utah" into the WHERE section. Say we use the Select pane. We select the records, then filter the table window to show just the selection. The table window shows only a couple of records for Utah, or none at all, plus it shows the fill record. The fill record is a signal that the table contains more records that might match the filter which we don't see. In 163.7 we can right-click that record and select Filter using Query. This will pop up a command window with the query that filters for Utah. We run the query by pressing F5 and see the records for Utah.
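As a rough sketch with hypothetical names - say the table is called [Roads] and it has a [State] text field - the query that ends up doing the work is equivalent to something like:

-- keep only the records whose State value is 'Utah'
SELECT * FROM [Roads] WHERE [State] = 'Utah';

Run against the whole table rather than the fetched portion, that returns every matching record, not just the ones the table window happened to fill in.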

That's the current workflow. The only bridge between the "normal" tables and the "big" tables is in right-clicking the fill record and choosing to Filter using Query. We don't do this automatically because this can take significant time and we want the user to confirm that he wants to spend that time before spending it. Is this acceptable or does it break the workflow / expectations / whatever too much?

Mike Pelletier

2,122 post(s)
#30-Oct-17 16:19

The fill record indicator probably needs to be more apparent, as it can easily be forgotten when you don't work with big data often. In ArcMap, they tell you at the bottom of the table the number of records provided vs the total records in the table. Perhaps that is doable, or maybe just say "50,000 out of * records" if the total isn't known.

adamw


10,447 post(s)
#30-Oct-17 16:39

Yes, that's something we are considering: making it clear in this and other ways when the results are partial, and allowing the user to get to the full results from where the indicators are.

Dimitri


7,413 post(s)
#30-Oct-17 19:56

The fill record indicator probably needs to be more apparent

Good idea. If table windows were docked all the time I'd say put a "fill record" button in the toolbar that is enabled. But when table windows are undocked they don't have a toolbar as part of the window, which helps provide an uncluttered window. I'm not sure about adding info to the title caption, or a context menu, or a status (fill record) icon in the upper left corner where the record and field handles intersect.

bclement
275 post(s)
#30-Oct-17 17:22

Adam,

I appreciate the fact that you are spending so much time trying to communicate on this feature of the software. I find that I am not able to communicate off the top of my head why I feel the need so strongly to have the drawing and the table synchronized. I think the reason is that we have a “strange loop” situation here where I keep taking off in what I think is a different direction but keep winding up at the same starting point. I believe that the reason is that I can’t explain what I don’t know.

There have been several times in my career where I was able to find something surprising to me at the time of discovery about data (usually my own data which I should know well) because I was able to make some selections in the table or in the drawing and “see” something in the other window that caught my attention. That is what I call “visualizing” the data. I cannot explain it because as soon as I do, you may counter that if I wanted that data, all I had to do was to ask for it. As in my poor example above. It is painfully clear that if I want to see the Utah roads, I just need to ask for them. And the same will be the case for any example that I use regardless of the example because as soon as I define “it”, the answer will be a simple query which is even within my poor capabilities.

So my real question is what do I do when I don’t know the question? To which you will likely respond, “How are we to create software when you don’t know what you want?” The only answer I have is that it seemed that the thing that allowed me to find those gems in the past was the fact that the drawing and table were synchronized. I do realize that I am now painting some of my work as art and not science. Frankly, I am not sure how I feel about that!

Dimitri


7,413 post(s)
#30-Oct-17 20:37

the thing that allowed me to find those gems in the past was the fact that the drawing and table were synchronized. I do realize that I am now painting some of my work as art and not science. Frankly, I am not sure how I feel about that!

You should feel good about that! I don't know about everybody else, but for me the fact that GIS is often very beautiful, real art, is part of the fun.

Now, about that part where the drawing and table being synchronized is helpful for discovery: I agree totally, but with the caveat that often they only seem to be synchronized, and what helps discovery is when the program helps you along by doing summaries.

Consider a data set with all the roads in the US as lines. Each road is a record that has attributes and a geom that gives the geometry of that road line. Can we say the drawing is really synchronized with the table? Well, sometimes yes and sometimes no, depending on what we mean by "synchronized."

Open the drawing and zoom to fit. Looking at the whole US full of roads you see a pattern of dark and light regions as pixels for millions of lines are averaged together, with hardly anywhere that anything resembling a line can be discerned. Open the table in a table window. You see 40 or 50 records. So right there, there is a connection, but there isn't visual synchronization in the sense that the table window shows everything the drawing window does.

Select all of the roads in Utah. Are you going to do that within the drawing window? You could sort of average it out and select with a box based on partially transparent layering using state boundary lines or Bing or whatever as a guide, but to do that right you'd probably use the Select pane or some other tool, not trying to do it freehand in either the drawing window or the table window, because there are just too many roads and to repeatedly zoom in / out to select only those that go right up to the edge of Utah interactively, etc., is too much of a pain. You could do it with filters and launch the query from the fill record icon, like Adam discussed, but it would be way easier / quicker just to use the Select pane.

You're not going to find anybody who agrees with you more than me on the insights you can get by going back and forth between visual displays and row-and-column displays. You are totally right about that and I strongly agree. But you only get those insights so long as you retain the visual agility to flick back and forth between what you've distilled into the visual display and an analogous table window display that is digestible through casual browsing. When what you see in a table window becomes so overwhelmingly huge that you can't digest it or browse it fast enough, and enough of it, to correlate it with what is going on in the visual window, that smooth insight conveyor belt breaks down.

What I'm recommending is a hybrid workflow where you use the heavy-lifting tools to cut the task down to a size where your intelligence and insight can go to work with agile browsing tools. I guess it is like exploring a big archaeological site or fossil site: if it is partly buried under half a mountain of overburden you don't try to remove a hundred thousand tons of rock and dirt with toothbrushes and dental picks. Get out the bulldozers and get it down to where you can use more agile, human-scale tools like trowels and dental picks for delicate interactive discovery.

Can the transition from huge data and the tools that work for that into human-sized, interactively browseable data and interactive tools like scroll bars and ad hoc table window viewers be made easier, in both directions? Sure. For example, the idea that if a selection in a drawing window results in, say, 50,000 or fewer records (whatever you deem to be the reasonable limit of interactive tools) you can launch a table window with all the infrastructure already pre-done to show you those records, that is, without the workflow Adam mentioned of seeing the fill record and launching a query on that. That's an example of how the convenience factor we all love in casual interactive work can be more tightly coupled with a selection made in a drawing window that just happens to be mostly outside what would be filled into the table by default.

I'm just thinking out loud here. But just as it makes sense to show a US full of roads as light and dark areas when the whole US is zoomed to fit, an abstraction, so to speak, that helps us gain insight even if it does not attempt to show every line as a distinct object (in a case where visually, instinctively we understand that to be not right), just so it makes sense for interactive browsing purposes to abstract what a table window represents because that is more useful for the interactive browsing for which that interface is intended. In both cases when data is big the abstraction makes it more useful for human browsing.

But, just like US roads zoomed all the way out to show the US as a pattern of light and dark areas doesn't lose any roads - they're all still there ready to be seen when you zoom in - just so because we look at data in the table through a table window that abstracts huge data into a fill that makes sense for that table window interface doesn't mean we lose any roads in the table either. They're all still there just waiting for their chance to play.

bclement
275 post(s)
#30-Oct-17 22:53

Thank you for the hybrid workflow suggestion. That helps.

dyalsjas
157 post(s)
#31-Oct-17 01:44

I'm beginning to wrap my head around the Radian Studio / Manifold Future focus on big data. I've realized that I have a lot of assumptions to reconsider as I move forward. I've read (avidly) the discussions on the boards about bottomless tables, representative sets, and the number of lines in a table that might be effectively browsed. The need to step back from years of GIS assumptions is challenging. The recognition that table and drawing selections are not automatically synchronized and that I'll need to make a deliberate choice to reflect a selection between table and drawing is a bit disconcerting. In that I prefer a graphic interface, perhaps a right-click option in the project pane, when I have both a drawing and its associated table selected, that lets me link selections between the two. Other options to link data entities in the project window might simplify discovery of options... e.g. a right-click / new query selection when two tables are highlighted in the project pane might bring up the query window with some single-table query options grayed out.

I did appreciate the directory tree like association of related elements in the project pane I found in prior Manifold versions.

Right now, Manifold Future hasn't moved to the status of "Geographic Information System" for me; it's still the Radian Studio big data analysis system with Manifold Future's improved geographic drawing capabilities. I'm ready for the geoGRAPHIC data discovery capability that will be provided by the powerful multi-threaded, GPU-enabled Manifold GIS to come.

On a separate note, while it's still in draft with the OGC, it strikes me that adding support for Discrete Global Grid systems might be another way to move ahead with Manifold, (but it's probably too early).

adamw


10,447 post(s)
#31-Oct-17 05:47

The recognition that table and drawing selections are not automatically synchronized ...

If we are talking about selections that are painted in red color, respond to Copy / Paste, etc, then these are synchronized for all components based on the same table / the table itself as of 163.6.

The subthread here talks about something else (the table window limiting the number of records it reads from the table and performing several new operations on these records instead of on the whole table by default).

dyalsjas
157 post(s)
#31-Oct-17 16:38

I misread the topic, and was also encountering an unexpected result. I'll describe it, but I'm not sure that it can be replicated.

I have been working with a Radian Map file (made with the release version of Radian Studio) and have tried the various updated functionalities as they're described in the Cutting Edge threads. I had previously imported a shapefile of points from our county GIS.

Selections made in the drawing did not link to the associated table. I scrolled through all records and used the new Filter function, but had no luck.

After your posting specifying that selections should synchronize between associated components, I deleted the table and drawing, saved, reimported the data and tried a selection again.

The points selected in the drawing synchronized to the associated table. All worked as I expected; I am happier now.

Obviously I'll need to make a point to try working on a clean data set if I'm getting unexpected results.

Dimitri


7,413 post(s)
#31-Oct-17 17:07

No worries. There is a new video on shared selections, just published today at https://www.youtube.com/watch?v=fkrKvDK78Uk

adamw


10,447 post(s)
#31-Oct-17 06:11

There have been several times in my career where I was able to find something surprising to me at the time of discovery about data (usually my own data which I should know well) because I was able to make some selections in the table or in the drawing and “see” something in the other window that caught my attention.

This is only a bit more difficult now than it was in the past and the difference is largely unrelated to bottomless vs non-bottomless tables.

Here, interactions between an opened drawing window and table window:

1. Just to get this out of the way - when you select records in the table, the drawing objects highlight / unhighlight automatically. No changes from 8 here.

2. When you select records in the drawing, the table records also highlight / unhighlight automatically. No changes from 8 so far. But:

3. When you select records in the drawing and the table window is set to display just the selected records, the table window does not refill records automatically. This is where the change is. That the table window can potentially not display all of the records in the selection is largely irrelevant; you take care of this by invoking View - Filter - Filter using Query once and using the new window instead of the former table window. The change from 8 is that after you select some objects in the drawing window, you have to switch to the table window and invoke View - Run to re-run the query, or press F5 to the same effect. That's the change, and it's unrelated to bottomless vs non-bottomless tables; it's the requirement to press a button to refresh the table window after changes to the selection.

Now, why is the change there and could it be removed? The change is there because we operate with potentially much bigger tables from many more different data sources than 8 does, and so we have to be more careful with launching automatic refreshes because they can take unpredictable time. Still, we *can* add some button to auto-refresh the table window on changes to the selection, or even on changes to table data which affect filters. As long as it is not turned on automatically, it's perhaps fine: if you turn it on and it's doing too many refreshes which are too slow, you can just turn it off and go back to manual refreshes. Maybe we should do this. There will be a few quirks we don't quite like, like the user editing a value in a record and this record immediately vanishing because the new value does not pass the filter, but, well, we can think of something (ie, in this case, display a note in the status bar that the edit has been accepted).

And one more thing:

Maybe we should have a table window filter for "objects displayed in window X". That way you wouldn't need to select anything in the drawing and can just zoom around.

Dimitri


7,413 post(s)
#30-Oct-17 16:44

He asked, “If I have all of the roads shown for the US, and I want to look at attributes of the roads in Utah but the table only has records for the roads in New York, why even have the software?”

Let me add to what Adam has commented. The right answer to the above comment is:

"I am puzzled you would ask that question because the scenario you describe is not Future. Perhaps I did not explain the matter well, so let me try using different words, perhaps repeating a bit to make sure nothing is overlooked.

If you have a GIS data set for all the roads in the US, say, as a shapefile, and you import that or link it into Future, then you have all the roads in the US in the table that is created in your Future project. Period. You don't get some roads but not others. You get all of the roads. They're all in that table.

All data in Future is stored in a table. There is no such thing as a drawing that has data which is not in the drawing's table. A drawing contains no data at all. It is just a viewport that shows geometric data from a table.

If you can see roads in Utah in your drawing, as well as roads from New York and other states, then you know for sure that the table has roads for Utah as well as roads from New York. There is no need to worry that Utah roads are not in the table. You can be totally sure all of those roads are in the table, because if they were not in the table they could not be shown in the drawing.

If you doubt my word on the above, fire up Command Window and launch an SQL query. Say your table is called Roads Table and it has a field called State that contains the name of the state in which each road is located. You would write a simple query like:

SELECT * FROM [Roads Table] WHERE [State] = 'Utah';

And you'll see that yes, indeed, it does return all the records for Utah roads from that table, complete with all attributes. The Utah roads are there anytime you want them, complete with all attributes. Just say the word and Future will snap to attention and fetch them for you."
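(As an aside, a quick count works just as well if all you want is reassurance that the Utah records are there without listing them - using the same example names, something like:

-- count the Utah records rather than listing them
SELECT Count(*) FROM [Roads Table] WHERE [State] = 'Utah';

would report how many Utah roads the table holds.)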

I hear you loud and clear that there are big data sets out there and that you cannot show them to us all at once.

What? I never said that. Future is happy to show them to you. What I said is that humans cannot digest billions of records via interactive browsing, so it is unwise to use the wrong interface.

Let's not beat around the bush. If you have a big table that takes a paper printout from Utah to New York to print out, Future is happy to feed that to you so fast your eyeballs will melt. In fact, Future is happy to show you that table even if it is so big it reaches from Utah to Uzbekistan. I never said Future cannot show it to you at once. What I said was that you won't be able to drink from that fire hose.

Interactive browsing of a sheet of paper that stretches from Utah to New York is a lousy interface for humans. You need more sophisticated tools. Which is why Future provides them. Our task is weaning people off the idea of using dune buggy interfaces to operate a cargo plane. Dune buggies are fun. I'm not putting them down. But what is super for a dune buggy is pretty darned useless for a transcontinental cargo plane.

if you cannot show me all the roads in the US in a table, then don’t show me those roads in the drawing because the ability to visually query that table is what makes me a GIS person.

The ability to visually query the table is not the same as being shown all the roads in the US in a table. If you want to visually query a table you don't get anywhere if you have a million records in front of you. I'll prove my point with a thought experiment:

if you cannot show me all the roads in the US in a table

I presume you mean, more accurately, "if you cannot show me all the roads in the US in a table window."

No problem! Happy to do so... are you ready for that? Download and launch the production Radian release and let it rip: one screen's worth of table is about 40 records. Say your screen is one foot high. All the roads in the US add up to over a million records. Say it is just a million. One million divided by 40 is 25,000. That's 25,000 screens. 25,000 screens at one foot per screen is, roughly, five miles worth of screens.

I don't sense any malice in the notion that you are asking to see all roads in the US in a table window. I see, instead, a disconnect in the conversation since I know perfectly well you do not really intend to page through five miles of computer monitor screens one screen at a time. You may have said it, but I don't believe that once you think it through in those terms that you really want to be shown all the roads in the US in a table window.

Instead, what you probably will conclude is that you want to see only those screens, a few at a time, that interest you. That's a big difference, because now we can get to the interesting part of the conversation: just what is it that interests you, and can you see what you want to see? That's the key, not being fed a fire hose full of data you can't drink from and which you don't care about anyway.

When a man is dying of thirst and really needs a drink of water you do him no favors by sticking a full-power fire hose in his mouth and turning it on full. That will just kill him. What he wants is a glass of water that he can drink as he pleases.

There would be no surer way of preventing you from being able to visually query a table than to show you all the roads in the US in that table using a table window interface that presents you with miles and miles of records to scroll through. That's millions of roads. If you really are interested in seeing all of the roads at once, a table window is not the right user interface. The drawing is better for that, and you'll need a monitor the size of a very large wall to be able to distinguish one road from another at that, if you really want to see all the roads together at the same time as separate entities.

I make that point to emphasize that even with drawing windows you don't really see all the roads in the US at the same time. You see an impression and then as you zoom further and further in you start seeing separate roads, a limited number of roads on each screen when they finally resolve into separate lines.

What you want is to see those roads that interest you. Having the tools to show you those roads is a good thing. Don't get hung up on how it is done. Instead, focus on mastering the tools that let you get done what you want.

When a man dying of thirst needs a drink you are doing the right thing to keep his shaking hands off the fire hose, saying "whoa! partner... the fire hose may seem like a good idea but this is better..." and putting a glass of water in his hands. If that glass can be filled from a limitless reservoir so he can drink however much water he likes, that's not a limitation. It is a better delivery system for what he wants.

Nothing at all about Future prevents you from working with ALL of the roads in that table. You have every danged record at your fingertips. The only question is what are the best methods to accomplish what you want in the fastest and most efficient way. All I'm saying is that to get the best results with the least effort from bigger data you can do better by taking advantage of tools that people with a lot of experience working with bigger data have evolved to make the slicing and dicing of that data much easier. We didn't invent tools like queries, but we respect the immense expertise that went into evolving them and we can see how bringing such power and convenience to GIS people can help us all take our game to a higher level with bigger data.

bclement
275 post(s)
#30-Oct-17 21:43

Dimitri

I think we are now reduced to arguing semantics. I sense the frustration.

There are still some facts to discuss. Fact number one from my point of view is that I never asked for death by fire hose. Your buggy analogy cuts both ways. I am a dune buggy driver. Altimeters and air speed indicators have no impact on my ability to drive my buggy. What I asked for rather succinctly was that the drawing and the data shown in the table window be somehow synchronized for us dune buggy drivers. The answer I keep hearing is that those who want this are using the wrong tool and perhaps this is the case and I just don’t want to admit it. Maybe MF is the wrong tool for me which makes me sad because I know there are no updates coming for Mfd8.

Fact number two. I do not have big data nor is there any in my foreseeable future. Further, there is no humanly possible way for people to do the things that I am regularly asked to do with my data with “Big Data.” I for one am not convinced that traditional GIS is directly applicable to Big Data. I get from your mantra of big data that you see a world beyond what I can see from where I sit. I trust that the direction of MF is designed to get me there. I am grateful to Manifold for all that it has taught me in the past that I never would have learned from other software vendors who insist that I have and use training wheels. But by the same token, I am struggling to see what to do with the cargo plane.

I will even make a place for the possible fact that you can’t explain the significance of the cargo plane to me because of my buggy driver status. If that is the case, I have a copy of the cargo plane and I will wander around and see if I can make sense of it in the future.

Dimitri


7,413 post(s)
#31-Oct-17 09:07

I don't think this is about semantics at all.

I think it is about a choice of three things: when to use old habits that are very good at what they do, when to form new habits which are very good at new things, and how to get our heads around the transition between the two sets of habits in gray area situations where it might not be clear which set of habits or which *blend* between the two sets of habits is best.

Those are complex matters that are worth discussing at length from all angles. In this thread, along with other threads, all of us who participate are building a framework, making decisions on directions and emphasis, that can have impact on our fellow users for years. However tedious we may think it to talk endlessly and at times in what seem to be circles, it is worth it for us to talk things through many times, to measure many times, so that as we cut once, or twice, or however many times we cut, we can shape the thing for the better.

What I asked for rather succinctly was that the drawing and the data shown in the table window be somehow synchronized for us dune buggy drivers.

Well, that's a bit unfair, in that you are glossing over the hard part that drives the discussion. The hard phrase in the quotation above is "be somehow synchronized" - all of the essence of the matter is in that "somehow" word. That is what we are discussing. Exactly how should it be synchronized so that you can do what you want? If everybody knew exactly what they meant by "somehow" in all cases, including those they hadn't realized mattered, and if everybody agreed on that "somehow," we wouldn't be in this thread.

If it already is perfectly synchronized, so that you already have at your fingertips everything you want, but that is not obvious and easy to use, then we switch gears from building the machinery of synchronization to building the machinery of teaching (documentation, videos, etc.), so that if it really is a simple matter then everybody can see that and think "gosh... all this talk over such a simple matter that is so easy to apply..."

I realize it is asking a lot of people in a beta program to grab their kit and soldier forward into a brave new release for which documentation consists of a few terse paragraphs in release notes. It is easy to forget that the Future process is fundamentally an open beta. While interfaces are in flux and being adjusted based on user feedback there is not going to be the finished, massively detailed, step 1-2-3 ensemble of topics, worked examples and videos that has emerged for interfaces that are more committed to final form.

So that certainly has an effect. Also an important effect is that if the controls work with brilliant logic and power in a way that only 10% of the user base can get their heads around but a small adjustment would expand that to 95% of the user base, then that too has impact. If the synchronization is already there but it is not sufficiently simple to use, that needs to get adjusted.

It is an effect which should be resolved not by trying to train 85% of the user base into brilliant logic machines but by making the small adjustment that greatly broadens accessibility. We all have day jobs on which we focus and I, for one, am not too proud to admit that on many days I do not have the time to learn how to be a brilliant logic machine to use some new thing. There is nothing wrong - and everything right - with making software more accessible via a shorter, simpler learning curve. That's why we have beta builds, to uncover such opportunities.

The answer I keep hearing is that those who want this are using the wrong tool

I don't know from whom you are hearing that. It's certainly not me, since I've been repeating over and over that the tool provides massively deep, broad and powerful synchronization, which is perfectly easy to use if you just apply the few, simple clicks to use it.

It's like your example with the employee who wondered how he could ever see roads in Utah if the tool only allowed him to see roads in New York. I spent many words to write what could have been distilled into... "Of course Utah is there as well, at your fingertips. Didn't you read the instructions?"

Look, if you don't know a capability is there because the documentation is poor, that's fair game to ask for the training materials to be improved. I'm with you there. If you do not like how a capability is implemented because you think it is too cumbersome or not sufficiently obvious without reading documentation, in a case where some adjustments could make it more automatic or obvious without needing to learn anything, that too is fair game. It is even fair game if you think it is OK but as a matter of taste you would prefer it to be different. Good taste is very important, so it is critical nobody is ever shy about saying "ah... it works fine but I think it would be better like this..."

It is a completely fair and productive thing to say "well, OK, maybe the synchronization is there but it is too hard to learn." That puts the focus on what needs to be done, which is to make the power of the tool easier to learn and to use. It puts the focus on where to strike the balance between interfaces which provide maximum power for experts, albeit at greater investment into learning, and those which provide necessary power for harried normal folks.

If a capability is there and super solid for 10% of users but is too hard for 85% of users, that is a very different discussion than if the capability were not there at all. So please put your focus on understanding how the capability is used today and how you would like it to be made easier for you to utilize.

Fact number two. I do not have big data nor is there any in my foreseeable future

Well, if that's true, then why did you give the example of a table window that showed you all the roads in the US? When you ask for a table window to show you all the roads in the US, you are far, far beyond tooling around in a dune buggy. To echo the quote above...

I never asked for death by fire hose.

Sure you did. You asked for a table window to show you all of the roads in the US. That's a really big fire hose you asked for! That you didn't realize it meant "death" in the sense of being terrible for productivity, well, that's one reason the delivery system provides more productive controls.

If the data is too big for you to get your head around by browsing in a table window, it is big data. If you do not have big data, then you'll never see any difference between 8 and Future. If you see a difference then you know you have crossed over into working with big data.

Plenty of people, companies, universities, consortia and so on have worked with big data for decades before Manifold set out to give ordinary GIS users the ability to work with larger data sets. All of those folks evolved very useful mechanisms to deal with larger data sets and we would have been very foolish not to study carefully what had been evolved and to learn from it. Much of that stuff is with us today because real life experience shows that it works very well, so of course Radian would also utilize those proven ideas.

Part of the march of modern technology is that now it is very easy for ordinary GIS people to be working with really big data without realizing it. Have you ever used an imageserver layer in a map? You're working with very big data. Doing LiDAR? Likewise. Working with any of the modern terrain elevation data sets whether from LiDAR or not? Same thing: big data. Asking your staff about a data set that includes all roads in the US? Hey, partner, that's big data, too.

There isn't some sort of alien world of big data out there beyond the stratosphere that only oligarchs with private space companies care about. If you are doing GIS today, that big data world is your world and Radian technology is far and away easier and better at that than Release 8.

Please trust me on this: the huge growth in the scale of routine GIS data is now part of your normal GIS life whether you like it or not, the size of that routine data will continue to increase, and the way to work with that increased scale of data will require all of us old dogs to learn new tricks. With luck and lots of joint effort we have a very good chance of re-cycling our dune buggy happiness into bringing agility even with very large data. It can even be effortless.

The tools are there today within Future to work with big data, remarkably effortlessly in most cases. I ask that you make a maximum effort to learn how to use those tools so you understand how to do what you want. If then you don't like how the workflow goes, by all means speak up on how you want it to be different, where you think it is too roundabout or cumbersome or not obvious. But don't give up, throw up your hands and say "oh, I don't get this, it is not for me." Make the second effort and it will be worth it.

If you doubt my word, try the example you gave to your staff, all the roads in the US, in Release 8. That's the GDT data set, which is about what? 10 GB? as a vector data set. Release 8 takes a very long time to open it and is so slow at it you can't do much. Future opens it instantly and allows you to do real work with it, like the hypothetical you posed to your staff, with New York, Utah and all the other states at your fingertips.

bclement
275 post(s)
#31-Oct-17 16:24

Alright. Sometimes a person just has to sleep on a problem to get any clarity. I am a little slow so sometimes I have to sleep on something a few times. This morning, I finally know what I want.

First the semantics. I’ll grant you that the hard edge of “Big Data” is a little difficult to define. I will also stipulate that there are only a couple of things in this world that I know that you don’t and, more importantly, that they probably don’t really matter. What data resides on my servers is one of those things. I do not have big data.

I do understand that there are good reasons why you cannot show in a single table what people like me are used to seeing, but when I go back and review Art’s video clip near the top of this thread, I still have to wonder what benefit there is to making a selection on the drawing and then switching to the table only to find that it is empty. Most users, I think, would interpret this as being broken. BTW, the US roads dataset was a thought exercise. My world ends at the county boundaries.

But that gets me to your totally fair criticism. Yes, I totally glossed over what should be shown in that table in Art’s video clip. I have no idea!! I thought I had a good idea. That is why I piped up in the first place. But I was told that it was not and probably for good reason. People who really do big data probably don’t need the things you have expressed in this thread explained to them. They probably know that this is a reality of their world. I appreciate the fact that you have taken the time to explain the problems to the rest of us so that we are aware of them and aware of the reasons you are developing the software in the ways that you are. However, I will argue that there is a large cohort of your users for whom such things are above their pay grade. Of course, history will attest to the fact that you are correct in predicting that many, if not all of us, will eventually be dragged to big data kicking and screaming someday. And that gets me to what I want.

In the meantime, I want you to let us know somehow when we have crossed the line between the two colliding worlds of traditional vs big data. Maybe it is already there. Maybe that is what the fill table button is for and it only shows up when the possibly-in-the-future user-configured limit of allowable table records is reached. But whatever it is, I would just like to know that below that number of records, things behave as they always have. If the data set is smaller, then all records are always shown and if I accidentally hook into a larger data set, the system will alert me that I have done so and let me know that what I see may not be what I am expecting because I have now crossed over into the deep end of the pool.

Please do not interpret this as me giving up or not making an effort. I want to pass my swimming test so that I can swim in the deep end of the pool some day. I am just asking that you rope it off for now so that I don't accidentally drown before then.

Thanks,

Ben

adamw


10,447 post(s)
#30-Oct-17 17:04

Two random things on the topic - not as a reply to anything in particular:

1. Why is the map window happy to show billions of objects while the table window isn't? Because when a map contains billions of objects they tend to be small on the screen and tend to overlap and hide each other, so the map window can still show the rough shape of those objects, which makes some sense even though the number of objects is large. The table window can't do that; there is no concept of zoom which makes records smaller / visually simpler / overlapping.

2. There is a very direct illustration of the UI for lists becoming totally unusable due to lists getting large, and evolving to adapt - the Event Viewer in Windows. The Event Viewer used to show a simple list of events, but the number of services writing log records kept increasing and it all grew to the point where the list interface just became a (pretty poor) starting point for writing queries for admin tools that could filter / categorize. Microsoft had to change the UI to perform grouping and only show the parts of the huge logs that you click on to make it usable again.

pslinder1
228 post(s)
#30-Oct-17 18:46

You probably do not want to hear this, but I think the real problem here is SQL. Yes, it is fast, relatively easy to learn and incredibly powerful. But it is not easy enough to learn. So without a robust and simple method for filtering, many people will just never explore their data. Not most people on this forum, but most people generally.

The no-typing filters you implemented in Future 163.6 are incredibly cool and very useful, but they practically presuppose that you can scroll through your data to find the value you want to filter on. The only other type of filtering then is SQL, and that is daunting to most users. I think that a more robust but still simple to use filtering facility would alleviate some of the concerns.

Dimitri


7,413 post(s)
#30-Oct-17 21:16

So without a robust and simple method for filtering many people will just never explore their data. Not most people on this forum but most people generally.

Well I hear you on that. Who could disagree with the idea that making it easier to explore data is a good idea? But really you have to aim that at the target user set. Things like Future are for GIS practitioners and folks who work with spatial data. Those people are competent far and away beyond the general population. The task is to provide more useful tools for them, not to try to help the general population.

The only other type of filtering then is SQL

Well, no, there's the Select pane, a huge capability for selecting / filtering other than SQL. What do you think of the Select pane for filtering data? How is it useful, and how is it cumbersome? What improvements would you like to see?

Beyond filtering and getting into actions like editing and changing data, combining the Select Pane with the Transform pane is a really flexible, easy and powerful way to do what you want, all without SQL.

As a general rule, I find those folks who think SQL is not easy enough to learn usually prefer putting together very simple steps. They often do very well with the idea of selecting a bunch of things and putting them into a bin, and then selecting a different bunch of things and adding one bunch to the other or subtracting stuff from an existing bunch and thus in a series of very simple steps getting what they want.

You can do all that with the Select pane, of course. It has a rich collection of one-liner templates that can be combined and recombined into pretty much as complicated a set of filters as you like, all without SQL. How can it be made even more easy and convenient to use?

pslinder1
228 post(s)
#31-Oct-17 17:12

In the post below Adam makes a lot of my points for me. On Oct 24th of this same thread Mike and I made some suggestions around filtering that I think could be helpful.

Dimitri, in the post above you illustrate what I think is a conceptual problem with Manifold's filtering. Although filtering and selecting are similar, they are not the same because their purposes are different. If you use the Select pane you then need to go to View > Filter > Filter Selection to turn your selection into a filter (and it is still confusing because you still have the red shading artefact on the drawing and the table that indicates something is selected). I think that is a little cumbersome but it is also a conceptual kluge. I think we (a bit presumptuous, I know) need to provide logically separate facilities for selection and filter.

When you select things you should highlight those things you have chosen in preparation for copy, editing or deleting. When you filter things you should "disappear" all things not chosen. I realize that could lead to duplication of the interface but it is worth it unless you can come up with some clear way to demarcate between the two functions in the same facility.

What do I think are the other issues with the current Select pane? I think that more readily available help would be a great benefit. There should be an explanation of what each function or template does, an explanation of each value, and an example. It would be good if this help information could remain viewable when building the expression or using the template. I would like this directly available while I am working in Future; I do not want to have to toggle back and forth with the manual.

To make the construction of expressions easier, I would say that once you select a function, a statement, an operator, etc., it would be better to get rid of the angle brackets for the values and replace them with dropdown box fields where you can choose relevant fields or type in values. I think that would make it easier to build expressions, but the bigger problem is the tyranny of SQL's syntax. Any automation that reduces the burden of syntax errors would be hugely helpful to less expert users.

Finally in the template tab I think that your wording is odd. But I admit there may well be a reason for it. For instance you have a template for "Less" and in the inputs you have a field for "Value" and another below it for "Compare To". Why not have the first titled "Field" and the second "Less Than"? Clear English seems more obvious to me.

I realize that a lot of these things are aesthetic and may not be universal, but they might be.

Dimitri


7,413 post(s)
#31-Oct-17 19:45

All great points!

Although filtering and selecting are similar they are not the same because their purposes are different.

Agree 100%. The question is how to package access to them in a way that allows back and forth use. For example, apply one or more filters and often you want to select what you now see.

In an ideal world, we would also like to reduce the number of duplicated interfaces so that someone can learn something once and apply it in similar fashion when possible.

I would like this directly available while i am working in Future; I do not want to have to toggle back and forth with the manual.

Agree as well. When we get past betas and into production builds that can be added. Right now we want to keep the interface as open to changes as possible.

Any automation that eases the burden of making syntax errors would be hugely helpful to less expert users.

Absolutely, 100% agree. I personally hate having to put single quotes around text values when using the Text Contains template. :-)

I realize that a lot of these things are aesthetic and may not be universal, but they might be.

Aesthetics are extremely important. The more feedback, the better.

Your comments are extremely helpful. More, please!

pslinder1
228 post(s)
#02-Nov-17 00:43

I came across this quote in the Radian manual and I thought it nicely illustrated my point about the 'tyranny of SQL syntax':

It would be easier for beginners if SQL syntax kept the GROUP BY together with the aggregate function in a construction such as:

SELECT Job, {Sum(Payment) GROUP BY Job} AS Total FROM Expenses;

Alas, that is not how SQL syntax works and there are no magic curly { } brackets in Manifold. We just need to remember that despite coming last the GROUP BY is evaluated first and applies to some aggregate function that appears earlier in the statement.

Why do I need to remember that? If it is a rule, why can't Radian just remember it for me? I am not trying to write SQL code; I am just trying to query my data. I should only have to worry about logic, not syntax.
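For reference, the way that same aggregate actually has to be written, with the GROUP BY trailing at the end instead of sitting next to the Sum, is roughly:

-- GROUP BY comes last in the text but is evaluated before the aggregate
SELECT Job, Sum(Payment) AS Total FROM Expenses GROUP BY Job;

which is exactly the kind of ordering rule a query builder could keep track of on the user's behalf.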

tjhb
10,094 post(s)
#02-Nov-17 03:48

I should only have to worry about logic, not syntax.

Provided that you are only interfacing with your own head. (I think your head is pretty good.)

Otherwise you do need syntax.

Syntax is like a model head out loud.

Dimitri


7,413 post(s)
#02-Nov-17 08:22

Syntax is like one's modal cat. It demands to be fed in a particular way, or it won't eat.

But, on the other side of the coin...

If it is a rule, why can't Radian just remember that for me?

... can be understood in the context of the never-ending desire for "wizard" interfaces that purport to give you the benefits of SQL without the inconveniences of SQL. If you read the above in that light it becomes not a claim that SQL should not have syntax but rather that Radian could provide a "wizard" interface that manages the less obvious parts of SQL syntax for beginners.

I've ranted in the past that such "wizard" interfaces can become traps. Satan always comes with a smile. He reaches out a friendly hand on which you can lean when taking those first small steps, and many will go for years not realizing that they now depend on that hand and have become limited to only what Satan allows. They must buy their hand-holding and even their ability to walk from whatever upgrades Satan directs. That's part of the marketing art of getting people hooked on your one-of-a-kind query system instead of learning SQL.

But with all that in mind I still fully agree there is a role for a hybrid interface where some parts are very simple and convenient, like Edit - Find, while others are intermediate, like the Select and Transform templates, yet full SQL is always there, either in the Command Window from first principles or written for you by the more packaged interfaces.

Nothing is perfect, including SQL. But when you trade off the less obvious aspects of SQL against its very many good things, on balance I don't know of any alternative query system which is "better" than SQL to a sufficient degree to justify jettisoning the manifold advantages of SQL ubiquity. Therefore, when discussing just where to strike a balance between simplified "wizard" interfaces and using SQL, I tend to lean more towards greater use of SQL sooner, while keeping simplified interfaces only for very simple things.

pslinder1
228 post(s)
#02-Nov-17 14:50

Over the years I have read your rants and enjoyed them. Because of them I have taken the time to teach myself a bit of SQL, and it has been useful. My problem is that I need it too infrequently for it to become second nature. When I have time-critical data crunching (as opposed to GIS) to do, I reach for one of those Satanic tools - Alteryx. And that annoys me, not because Alteryx is too simple, but because I would prefer just to stay in Manifold (or Radian).

I like the approach you are taking with Radian of having the query builder while being able to see the full SQL behind it. I think that ultimately that might do more for your mission of having people become familiar with SQL than anything else. Keep making the Query Builder simpler and simpler to use. Maybe if you stop thinking of Satan and instead try to be Prometheus you will feel better about this :)

For the last 20 years I have had a simmering resentment that I do not have a single data management tool but instead need a database, a spreadsheet, a GIS, Tableau (I do not know the generic name for this), and an ETL tool. I think Radian could, in a few years, become that single tool. It is priced right, capable, extremely fast, and flexible. It just needs to get easier for normal people to tap all that power and agility.

I have noticed in the last decade that you have cooled your rhetoric about making GIS available for the masses. I hope you do not give up on that goal; it is a noble one. The world needs a single, ubiquitous and simple data management platform.

artlembo


3,400 post(s)
#02-Nov-17 15:08

Just a quick couple of comments, I hope they don't come across the wrong way:

1. If you have been using Manifold for many years, I don't think it is unreasonable to learn some of the SQL. For a cheat-sheet guide, I have a book that shows how to do all the classic GIS tasks in SQL here. Lots of people who have purchased the book have said they keep it right next to their computer to refer to it.

2. SQL is actually very easy. In fact, I teach undergraduates SQL every year; they learn it in a couple of weeks, and many of them start doing their own consulting work the same semester. It really is that easy. I also have a bunch of online courses on spatial SQL, but I won't post them here so as not to be accused of trolling.

3. If you have been involved in GIS and data analytics for 20 years, I think you should consider learning something simple like SQL. The Wizards really are a glass ceiling.

One thing I would say about the term "normal people" is that it might not be the best term to use. We are not normal people (just ask my family!). Seriously, we are involved in data science. Manifold 8's ability to work with data sources and even perform data analytics isn't much more complicated than using Excel. And if we want to be in this game, we have to be data scientists. You mentioned Tableau. I actually love some of the features of Tableau - very cool. I was really excited when I first looked at it and thought I would introduce a course in it here at the University. As I started to prepare things, I quickly learned that the wizards were extremely limiting, and I longed for the ability to do even simple SQL.

I know a lot of people who have become very functional with spatial SQL in a matter of a couple of weeks (shameless plug - they use my book or my course). I also know a lot of people who say "I don't want to learn SQL, it intimidates me." One, two, even three years later I hear them still saying "I don't want to learn SQL, it intimidates me," while the other person who spent a few weeks using SQL is cleaning up in their career. I say to myself, if only that person had bitten the bullet, done themselves a favor, and learned SQL, they would be so far ahead.

Again, I hope this doesn't sound harsh. SQL is very easy. You can do it. It is very English-like. In a short time, you start thinking in SQL. Really.

Also, with MF, since they generate SQL from some of the Transforms, it is easier to learn how things work. So I really would encourage you to take the next couple of weeks to learn this; then you will have what you are looking for: ETL, database analytics, spreadsheet capabilities, etc.

Best of luck.

pslinder1
228 post(s)
#02-Nov-17 16:01

Art,

I agree with what you have said. It is actually not particularly hard. But unlike you I am not really a data scientist (in this one sense I am probably more normal than you); I work intensely with data maybe 3 or 5 times a year for a total of no more than about 30 days. The problem is not learning it but retaining it.

My experience is that it is easier to remember the simplistic workflow 'wizards' in Alteryx than it is to remember SQL. I think a large part of that is that their syntax seems less arbitrary and more 'obvious'. I assume that if my work involved working with data full time I could keep SQL front and center, but the fact is that with my relatively infrequent needs SQL is not the best option.

I suspect (I am not sure) that there are a lot of people like me, perfectly numerate but with non-full-time data needs, who could make a lot more use of Radian if the data analysis were simpler.

BTW, I use Alteryx because my company has a few licenses, but they cost 4K per year per seat, which is outrageous. Maybe I would spend more time with SQL if I did not have the luxury of having Alteryx. But my point holds: Alteryx is just easier, and I reach for it first for big-data non-visualisation tasks. I think I would also probably just do less analysis if I did not have Alteryx - not because I shouldn't do it but because it might not be worth the time investment.

pslinder1
228 post(s)
#02-Nov-17 15:04

Syntax is a curse. Yes, it is necessary, but we are overburdened with it.

A language that is as natural and as straightforward as interfacing with your own head is exactly the goal we should be trying to achieve. Let's get as close as we can, wherever we can, to minimal syntax and maximum logic!

adamw


10,447 post(s)
#31-Oct-17 06:37

You probably do not want to hear this but i think the real problem here is sql. Yes it is fast, relatively easy to learn and incredibly powerful. But it is not easy enough to learn. So without a robust and simple method for filtering many people will just never explore their data. Not most people on this forum but most people generally.

We understand this. We are adding things that don't require the use of SQL, and we are adding things that pre-generate SQL so that the user can customize it, if he feels like doing so, instead of writing it from scratch every time.

For example, if you are using interval formatting in the Style pane, we automatically display the percentile for each interval break; that's useful data, and you get it with no SQL knowledge needed.

I understand what you are saying about filtering needing to be able to scroll, but that issue exists with or without SQL; it's simply that if the table is big, you can't really scroll, so you can't filter sensibly. We have means to pre-filter without SQL. We might want to add something, not requiring SQL, aimed specifically at exploring the data - some stats / histograms - we agree. There's an interesting suggestion along the same lines up in the thread.
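
In the meantime, a plain GROUP BY query already gives that kind of rough overview without scrolling. A minimal sketch, with hypothetical table and field names:

-- Count records per distinct value of a field instead of scrolling the table
-- ('Parcels' and 'Zoning' are hypothetical names).
SELECT Zoning, Count(*) AS Records FROM Parcels GROUP BY Zoning;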

adamw


10,447 post(s)
#21-Oct-17 19:15

Select / Transform panes

The Select dialog is reworked into the Select pane under Contents.

The Transform dialog is reworked into the Transform pane under Contents.

Both Select and Transform panes use cleaner layouts and controls (the combo box for fields in the Transform pane displays field icons - different for regular vs. computed fields - instead of the 'Target:' readout, etc.). The Window tab in the Select pane is removed because it is no longer necessary (the selections are shared automatically and don't need to be synchronized).

The list of templates in the Select pane and the list of templates in the Transform pane use a new grid control. Both lists contain an item for 'no action' at the top. Performing a selection or a transform automatically switches to that item, allowing you to see the effect of the operation.

The list of saved selections in the Select pane uses the new grid control. To add a new saved selection, edit the name of the 'new' item at the bottom of the list. This adds a new field and copies the current selection into it. To rename an existing saved selection, edit its name. To delete a saved selection, select it (eg, with Ctrl-Space or Ctrl-click on the record handle) and press Delete in the toolbar. It is possible to delete multiple saved selections at once. To update a saved selection and set it to be the same as the current selection, select it and then press Capture in the toolbar. It is possible to capture the current selection into multiple saved selections at once.

The Transform pane defaults to Update Field whenever possible. (Previously, accidentally clicking a template that only allows Add Component would switch the choice to Add Component, and that choice was then kept forever because all transforms allow it.)

The Transform pane lists all fields, including computed fields that cannot be modified. If the selected field is readonly, the Transform pane removes Update Field as a choice for the transform (but allows Add Component and allows Edit Query).

artlembo


3,400 post(s)
#21-Oct-17 21:07

The transform pane is definitely tighter. I like it. However, I notice that when looking at the Templates, on my Surface, I can only see about 5 transforms. I have to scroll to get to the transform I'm interested in. So, if I want to use Triangulate All, I have a long way to go.

It would be nice to be able to expand the size of the window, allowing me to see more transforms. Also, once in the Transform window, it would be nice to press the letter 'T', and have it jump to the Transforms that begin with that letter.

adamw


10,447 post(s)
#22-Oct-17 17:23

We are planning to add a filter box, similar to the ones in the query builder / Project pane / properties dialog, etc.

adamw


10,447 post(s)
#21-Oct-17 19:17

Other

New Data Source dialog defaults to per-session cache for web data sources and persistent cache for file data sources.

Dropping a layer into a map window makes it the active layer. Dropping multiple layers into a map window makes one of them (the one that became leftmost) the active layer.

WMS / WMTS / WFS / other servers that use XML as part of the exchange tolerate DTD data inside XML.

All web data sources explicitly allow using TLS 1.1 and TLS 1.2 for HTTPS connections.

Ctrl-Shift-A invokes Edit - Select Clear / unselects all records in the new grid control.

Ctrl-A selects all text in the log portion of the query window / script window.

(There are also many bugfixes, including the really nasty one in the query engine found by Tim earlier, which fairly frequently made a join return no data if the join was done on non-spatial criteria and one or both of the joined parts had a spatial index.)
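
For context, the affected pattern was a join of roughly this shape, where the join condition is non-spatial but a joined table also carries a spatial index (a minimal sketch; the table and field names are hypothetical):

-- Non-spatial join between tables where one or both have a spatial index on a geometry field.
SELECT a.Name, b.Population FROM Cities AS a INNER JOIN CityStats AS b ON a.CityID = b.CityID;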

tjhb
10,094 post(s)
#21-Oct-17 22:00

I wish you could slow the splash screen down a bit! I want to look at it!

hugh
200 post(s)
#21-Oct-17 23:16

yes!

Sloots

678 post(s)
#22-Oct-17 08:49

There it is!

Attachments:
splash.png


http://www.mppng.nl/manifold/pointlabeler

hugh
200 post(s)
#22-Oct-17 19:47

Thanks! Art is going to want to be able to access tabulated info from lower eyeballs -- seems it might be only a bit: http://www.cam.ac.uk/research/news/surprising-solution-to-fly-eye-mystery

tomasfa
182 post(s)
#26-Oct-17 18:36

What a great discussion and what great support from Manifold. This is exactly why Manifold rules above all!!! Two-way communication and responsiveness are real and fast. I learned a lot by reading it all. Thank you, Manifold staff and Jedi Masters.
