I have around 30,000 vehicles sending GPS positions and status updates every 30 seconds. The status updates include speed, availability, date/time, location, etc. Today we process the data on the fly and push it to web apps for 'real-time' visualization on a map. That has been working very well for the last 3 years.

At the end of each day the data is aggregated by combining the path for each vehicle and calculating the distance as straight-line distance between points. We also do a point-in-polygon test against 1,000 underlying areas to aggregate where each vehicle was at each time of day, and we snap the GPS location to the nearest road. That is 80 million points per day, 24/7.

We are now evaluating how to create an approximate route using OSRM between key points, to get a more accurate real distance traveled during a shift. From the distance we can derive the income earned and know more accurately which toll fares were paid based on the route, so we need to calculate which tolls the route passed and add the costs. We also need to know how long each job was, where the vehicle started and stopped each job, whether it was a paid job, how much they earned, etc. All of that can be calculated from the raw data attributes.

We can do this quite well in real time for the 80 million points collected in a 24-hour shift, but it becomes very cumbersome to do aggregates and spatial analysis over a year, or over the 3 years of data we have. 3 * 365 * 80 million -> ~90 billion records becomes a pain, so we are now looking at setting up a Hadoop/Spark cluster to distribute the aggregation calculations and clustering, with custom spatial processing in Scala. We would like to do more spatial queries on huge data sets of this type. The customer is looking at the ESRI Hadoop geoprocessing solution that ESRI promises for the next version. We have yet to see it, but the logic is sound.
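To make the point-in-polygon workload concrete, here is a minimal sketch of the classic ray-casting test we run against the underlying areas. This is pure illustrative Python, not our production code; the area names and coordinates are made up, and a real system would add a spatial index so each fix is only tested against nearby polygons.

```python
# Illustrative sketch: ray-casting point-in-polygon, the kind of test run
# ~80 million times per day against ~1,000 areas. Polygons are lists of
# (x, y) vertices; all names and coordinates here are hypothetical.

def point_in_polygon(px, py, polygon):
    """Return True if point (px, py) falls inside the polygon (ray casting)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal ray cast from (px, py)?
        if (y1 > py) != (y2 > py):
            # x-coordinate where the edge crosses the ray's height
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside

def assign_area(px, py, areas):
    """Return the name of the first area containing the fix, else None."""
    for name, polygon in areas.items():
        if point_in_polygon(px, py, polygon):
            return name
    return None

areas = {"zone_a": [(0, 0), (4, 0), (4, 4), (0, 4)]}
print(assign_area(2, 2, areas))   # -> zone_a
print(assign_area(9, 9, areas))   # -> None
```

At this volume the per-fix cost of the polygon loop is what a cluster-side spatial index or partitioning scheme would need to amortize.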
We would like to see a good story for Radian/Manifold working well with big data platforms like Hadoop, Hive and Spark, being able to read from and save to HDFS as JSON, GeoJSON, CSV and ORC/Avro/Parquet. In a world where IoT (a new fancy word for old-fashioned device data collection like GPS tracking and SCADA) becomes more pervasive, the need to quickly process huge amounts of data with SQL and spatial tools becomes very important. I have not even mentioned similar Lidar requirements. I can see Radian as a middleware tool in the ETL chain, between ingestion of the data and saving it to HDFS/SQL/disk.

I would also like to see Radian run in a highly parallel fashion on a cluster, Hadoop or otherwise, in the cloud as Azure HDInsight or Azure Batch HPC jobs. Now that .NET runs on Linux, I think it would be a great idea to package a small part of Radian to be distributed for map/reduce jobs, similar to Microsoft Mobius. Then I could use Manifold/Radian to create advanced spatial SQL queries that are quickly distributed to the workers and processed, with the result seamlessly sent back to my desktop/web version of Manifold/Radian. This is also the idea behind Hive, which lets you send normal SQL commands to a Hadoop cluster and get a consolidated result set back; it hides much of the headache of figuring out how to read millions of text files stored in HDFS.

I also work on another project where I need to create routes from A to B and aggregate the distances of neighboring routes into accessibility scores. I also need to generate detailed isochrones for each location. This is done for up to 100,000 locations per simulation, which translates to millions of complex route calculations and aggregations. The user can then change the underlying data and ask us to recalculate it all again; we can't control when a user requests a new simulation or which area is affected.
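The kind of query I would want such a layer to distribute is very simple in shape: bucket fixes by vehicle ("map"), then sum the leg distances per shift ("reduce"). A minimal single-machine sketch, with made-up vehicle IDs and fixes, using haversine for the straight-line legs:

```python
# Illustrative map/reduce-shaped aggregation: per-vehicle straight-line
# (haversine) distance for a shift. Pure Python here; on a cluster each
# partition of fixes would be mapped and the partial sums reduced.
# Vehicle IDs, timestamps and coordinates are hypothetical.

from math import radians, sin, cos, asin, sqrt
from collections import defaultdict

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))

def shift_distances(fixes):
    """fixes: iterable of (vehicle_id, timestamp, lat, lon).
    Returns {vehicle_id: total straight-line distance in km}."""
    by_vehicle = defaultdict(list)
    for vid, ts, lat, lon in fixes:          # "map": bucket by vehicle
        by_vehicle[vid].append((ts, lat, lon))
    totals = {}
    for vid, pts in by_vehicle.items():      # "reduce": sum leg distances
        pts.sort()                           # order legs by timestamp
        totals[vid] = sum(
            haversine_km(a[1], a[2], b[1], b[2])
            for a, b in zip(pts, pts[1:]))
    return totals

fixes = [
    ("cab_1", 0, 59.33, 18.06), ("cab_1", 30, 59.34, 18.07),
    ("cab_2", 0, 59.30, 18.00), ("cab_2", 30, 59.30, 18.00),
]
totals = shift_distances(fixes)
print(totals["cab_2"])   # -> 0.0 (cab_2 never moved)
```

The appeal of a Hive/Mobius-style layer is that this exact logic could be expressed once as spatial SQL and pushed to the workers, instead of being hand-coded per job.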
We would also like a fast simulation response, not telling the user that they will get their answer next week. Today I have to use C# parallel code and a custom A* implementation to run this on 40 cores, and it takes days for a single run. So another user has to wait for the first run to complete, or both runs take longer if we divide the compute resources. I did a brute-force test spinning up 2,400 cores in Azure Batch to run it faster, but that costs a lot of $$$ per run.

So routing on a dense grid/mesh network using parallel A* would be very beneficial. Features to create good convex meshes, and to convert them to navigable graphs that can be used for parallel route calculations, would be great. That would make it very easy to take a network of lines and polygons that is not normally navigable by routing engines from Manifold/ESRI/OSRM etc., and still get accurate results out of it. It can be used for drainage, irrigation, flood calculations, etc. As the technique uses regular 2D meshes, the logic can be applied to 3D meshes without much change, to add the volume of flow after the direction and route are calculated.

To summarize, key features and examples we would like to see:

- Fast parallel routing algorithms on large mesh/graph based networks
- Parallel spatial processing on a Hadoop/Spark/distributed cluster, managed from the user's Manifold/Radian application
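The routing kernel behind the first wish-list item is small; the cost is running it millions of times. Here is a minimal A* sketch on a regular grid mesh, the kind of per origin-destination kernel that would be fanned out in parallel. The grid, unit edge costs and Manhattan heuristic are illustrative, not any engine's actual API.

```python
# Illustrative A* on a 4-connected grid mesh: 0 = passable cell, 1 = blocked.
# Returns the shortest path length, or None if the goal is unreachable.
# Grid layout and costs are hypothetical examples.

import heapq

def astar(grid, start, goal):
    rows, cols = len(grid), len(grid[0])
    # Manhattan distance: admissible heuristic for unit-cost 4-connected grids
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_set = [(h(start), 0, start)]        # priority queue of (f, g, node)
    best_g = {start: 0}
    while open_set:
        f, g, node = heapq.heappop(open_set)
        if node == goal:
            return g
        if g > best_g.get(node, float("inf")):
            continue                         # stale queue entry, skip
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    heapq.heappush(open_set, (ng + h((nr, nc)), ng, (nr, nc)))
    return None

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
print(astar(grid, (0, 0), (2, 0)))   # -> 6 (detour around the blocked row)
```

Since each origin-destination pair is independent, runs like this parallelize embarrassingly well across cores or cluster workers, which is exactly why a built-in parallel routing feature would matter at our scale.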