"Many of us have bumped up against the limit even in routine analysis."
To clarify, display is not analysis. There's no 50,000 record limit in analysis, and thinking there is indicates a misunderstanding of topics like Table Windows and Big Data. Don't confuse the conveniences provided for display of small data with the analytic tools that are necessary for working with bigger data. For example: "Then I had to search and find the owner for all the soil parcels and they were not visible."
Nothing stops you from searching the entire table. Use the Select pane for point-and-click searches, or use SQL: SELECT * FROM [Parcels] WHERE [owner] = 'John Doe' will select from all ten billion records in a table, if that's what you have. But if you want to search visually, by scrolling a table, you're not going to see all of the records in a table in a single screen. You have to scroll screen after screen to see all of the records. That doesn't mean the other records aren't there. You just don't see them in the one screen you're looking at. Whether you're looking at one screenful or a 50,000 record chunk, in either case what you're looking at isn't the entire table. You're just displaying some of the records from the table. "Seems ages ago now, but the real problem comes when you have 8 to 10 billion LiDAR points in a table."
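To make the display-versus-analysis distinction concrete, here is a minimal sketch in Python using sqlite3 as a stand-in database. The table and column names ("parcels", "owner") and the data are invented for illustration; the point is that a SQL SELECT scans every record in the table, no matter how few of them fit on a screen:

```python
import sqlite3

# Build an in-memory table with far more rows than fit on one screen.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parcels (id INTEGER PRIMARY KEY, owner TEXT)")
conn.executemany(
    "INSERT INTO parcels (owner) VALUES (?)",
    [("John Doe" if i % 1000 == 0 else f"Owner {i}",) for i in range(100_000)],
)

# The SELECT considers all 100,000 rows, not just the displayed ones.
rows = conn.execute(
    "SELECT id FROM parcels WHERE owner = ?", ("John Doe",)
).fetchall()
print(len(rows))  # finds every match in the entire table
```

The search works identically whether the table has a hundred thousand rows or ten billion; only the time it takes changes, not the completeness of the result.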
The only problem you get with 8 to 10 billion LiDAR points in a table is people thinking that manually scrolling through a table that's 30,000 miles long, nearly 8 times the radius of the Earth, is somehow an effective way to work with that data. It's not. Manually scrolling through 10 billion records is a mistake made by people who try to apply paradigms that work fine with tiny data to big data, instead of learning how to use the tools that provide effective workflow with big data.

10 billion records is a lot. It's a fundamental conceptual mistake, at a cosmic level of error, to think you can manually scroll through such a table and learn anything significant. If you take one second per screen and stare at the screen for eight hours a day, seven days a week, it will take you 15 and a half years to get through that table. After 15 years of staring at one screen per second are you going to remember anything? Nope. Spend a few hours scrolling through that table at 1 second per screen and you won't see even a ten thousandth of it. It's not the way to try to understand bigger data.

That also goes for the results of any analysis you do. If you do a selection on a 10 billion record table and the result table is, say, 500 million records, you're not going to get any significant understanding of those results by scrolling through them manually. As a practical matter, you're not going to get significant understanding by scrolling through results tables of even a few thousand records. That's why the database community thinks it's nuts to visually cruise through more than a few thousand records, and why DBMS packages like Oracle use a viewing chunk of 5,000 records for interactive viewing. They have a lot of experience at such things, and it is better to learn from that experience than to blow it off. Manifold provides ten times more, with a viewing chunk of 50,000 records.
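The figures above check out with some back-of-the-envelope arithmetic. The row height (about 4.8 mm, a typical on-screen table row) and the rows-per-screen count (about 61) are my assumptions; the post states only the conclusions:

```python
RECORDS = 10_000_000_000

# "30,000 miles long": assume each displayed row is about 4.8 mm tall.
row_height_m = 0.0048
table_length_miles = RECORDS * row_height_m / 1609.34
earth_radius_miles = 3959
print(round(table_length_miles))        # roughly 30,000 miles
print(round(table_length_miles / earth_radius_miles, 1))  # nearly 8 Earth radii

# "15 and a half years": assume ~61 rows visible per screenful,
# one second per screen, eight hours a day, seven days a week.
rows_per_screen = 61
screens = RECORDS / rows_per_screen
seconds_per_day = 8 * 3600
years = screens / seconds_per_day / 365
print(round(years, 1))                  # roughly 15 and a half years
```

With any reasonable choice of row height and screen size the conclusion is the same order of magnitude, which is the point.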
A 50,000 record chunk is used in Manifold because the Manifold community has more beginners when it comes to bigger data, so they are more likely than the more experienced DBMS community to push ineffective paradigms beyond where they make sense. That's OK, as a transitional way to learn how to work with bigger data when coming from a smaller data world and smaller data tools. So how to help that transition for people who don't RTFM, so they can learn that casual approaches that might work with Excel don't make sense with bigger data? They don't RTFM, so publishing helpful guides doesn't work.

What you can do, I suppose, is provide sensible defaults and then take the "safeties" off by letting people shoot themselves in the foot. One such move would be to let people browse big tables not only by scrolling a screen at a time, but also by fetching the "next" 50,000 record chunk for visual display. I write "next" in quotes because big tables aren't ordered, so fetching the "next" chunk is meaningless in terms of order. It would be more accurate to say "fetch another chunk of randomly ordered records". After fetching a few chunks and scrolling through them, it won't take long before there is an "Oh, right... this isn't sensible..." moment. You could really take the safeties off by allowing people to dial the chunk size up to 100,000 records, or a billion records if they want, until they discover, "oh, yeah... the way everybody talks about in the user manual that I didn't read really is better..."

I refer to such measures as "safeties" because Manifold doesn't want the package to get banned from the bigger data community for allowing users to be un-neighborly. It's highly uncool to provide a tool that can connect to databases with massive parallelism and start pulling tens of billions of records because some beginner thinks it makes sense to visually scroll through a table that's 30,000 miles long.
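The "fetch another chunk" idea is easy to sketch. Here is a minimal illustration in Python with sqlite3 standing in for the data source; the table name, sizes, and the fetch_chunks helper are all made up for the example. Note there is deliberately no ORDER BY, so each chunk is just "some more records" in whatever order the engine happens to return them:

```python
import sqlite3

CHUNK = 50_000  # illustrative chunk size, echoing the viewing chunk above

def fetch_chunks(conn, table):
    """Yield successive CHUNK-sized lists of rows, in arbitrary engine order."""
    cur = conn.execute(f"SELECT * FROM {table}")  # no ORDER BY: order is arbitrary
    while True:
        rows = cur.fetchmany(CHUNK)
        if not rows:
            break
        yield rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pts (x REAL, y REAL)")
conn.executemany("INSERT INTO pts VALUES (?, ?)",
                 [(i * 0.5, i * 0.25) for i in range(120_000)])

sizes = [len(chunk) for chunk in fetch_chunks(conn, "pts")]
print(sizes)  # three chunks: two full ones and a 20,000 record remainder
```

Scrolling through a few such chunks makes the futility obvious in practice: each chunk is an arbitrary sample with no meaningful sequence connecting it to the last one.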
Administrators don't like users (or software that enables them) who pull huge data from their databases for no good purpose. So Manifold would have to invest in some safety measures behind the scenes, like staggered reads, or disallowing bigger chunks on anything except local .map files. Such measures might support the illusion that trying to scroll through a table 30,000 miles long is not a gross error, while not trashing the efficiency of data warehouses.

All that's perfectly possible, but I don't think it should be a priority, because highly effective methods for working with bigger data, the way experts do it, are in 9 right now. Prioritizing ineffective ways of working with bigger data doesn't sound like a good idea. The real issue is an educational one, where lack of experience with bigger data leads people coming from a smaller data world to think in terms of ineffective paradigms. The tools are there to work rapidly and efficiently with very large data. It's better to learn them and to use them.

What I write above comes from the DBMS community's many years of experience. I think it's important, but I'm not dogmatic about it, especially when it comes to thinking outside the box about how to help people transition from smaller data thinking to bigger data thinking. I think it would be perfectly OK to do things in Manifold that the DBMS community will roll their eyes at, so long as it is limited to data in local .map files, where un-neighborly user choices won't get either the user or Manifold banned from DBMS communities. 9 is so fast that you could certainly pull millions of records from a table in a local .map file for scrolling, so if that's what some people want to do, why try to talk them into a better way? A few hundred thousand records is small data for 9, so may as well let people view it however they want.
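A "staggered read" of the kind mentioned could be sketched very roughly like this. Everything here is an assumption for illustration (Python with sqlite3 as a stand-in database; the chunk size, pause length, and helper name are invented, not anything Manifold actually does):

```python
import sqlite3
import time

def staggered_fetch(cursor, chunk_size=50_000, pause_s=0.2):
    """Yield chunks of rows, sleeping briefly between fetches so a client
    pulling a big table spreads its load over time instead of hammering
    the server with one giant continuous read."""
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            return
        yield rows
        time.sleep(pause_s)  # be neighborly: stagger the reads

# Tiny demonstration with small numbers so it runs quickly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(2_500)])
cur = conn.execute("SELECT n FROM t")

chunks = list(staggered_fetch(cur, chunk_size=1_000, pause_s=0.01))
print([len(c) for c in chunks])  # two full chunks plus a remainder
```

A real implementation would presumably throttle by bandwidth or server feedback rather than a fixed sleep, but the shape of the idea is the same.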