For many operations Manifold will automatically use multiple processors or multiple processor cores if they are installed in a computer system. In addition to this basic multiprocessing capability, Manifold can also utilize massively parallel processing in the form of NVIDIA CUDA-enabled products, such as the NVIDIA GPU plug-in card seen below that provides 480 stream processors and supercomputer performance for under $500.
It is not an exaggeration to say that NVIDIA CUDA technology could well be the most revolutionary thing to happen in computing since the invention of the microprocessor. It is that fast, that inexpensive, and it has that much potential. NVIDIA CUDA is so important that all Manifold users should insist that computer hardware they procure is CUDA-enabled. In fact, given the ubiquity of NVIDIA products it is quite likely that a recently-procured computer will already include a CUDA-enabled NVIDIA GPU of some sort.
NVIDIA is best known for motherboard chipsets as well as for outstanding graphics processors that have become popular as the basis for graphics cards. In the quest for maximum speed, NVIDIA's GPUs (Graphics Processing Units) have evolved far beyond single processors. Modern NVIDIA GPUs are not single processors but rather parallel supercomputers on a chip, consisting of very many, very fast processors. Contemporary NVIDIA GPUs range from 16 to 480 stream processors per card, delivering enormous computational throughput. The card shown above, for example, provides 480 stream processors.
Although the market impetus behind the creation of such supercomputers on a plug-in board has been the computational demands of the PC gaming market, such "graphics" boards have become so powerful that the scientific computing community has begun using them for general purpose computing. It turns out that many mathematical computations, such as matrix multiplication and transposition, which are required for complex visual and physics simulations in games are also exactly the same computations that must be performed in a wide variety of scientific computing applications, including GIS.
NVIDIA has supported this trend by releasing the CUDA (Compute Unified Device Architecture) interface library to allow applications developers to write code that can be uploaded into an NVIDIA-based card for execution by NVIDIA's massively parallel GPUs. This allows applications developers to plug in a teraflop-class, 480-processor, NVIDIA-based card and upload applications to run within the NVIDIA GPU at far greater speed than possible on even the fastest general purpose CPU on the motherboard. For a mere few hundred dollars we can use CUDA to achieve true, supercomputer performance on the desktop.
CUDA offers such tremendous performance gains that many functions within Manifold have been re-engineered to execute as parallel processes within CUDA if such a card is available. If we have a CUDA-capable NVIDIA graphics card installed in our system, Manifold can take advantage of the phenomenal power of massively parallel NVIDIA stream processors to execute many tasks at much greater speed.
Because NVIDIA technology benefits from enormous economies of scale in the gaming market, CUDA-enabled cards have become very inexpensive for the performance they provide. At the present writing CUDA-enabled cards can be purchased for less than $100 for an entry-level CUDA-capable card and easily under $350 for a high performance CUDA-capable card. It is easy and inexpensive to choose a card with the balance between performance and cost desired (more stream processors running at faster clock rate with more memory gives better performance).
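One rough way to compare cards when balancing performance against cost is peak arithmetic throughput: stream processors times shader clock times two floating point operations per cycle (a multiply-add). The following Python sketch is a back-of-envelope estimate only; the 1.4 GHz shader clock figure is an approximation for a 480-processor card, not a specification quoted from this documentation:

```python
def peak_gflops(stream_processors, shader_clock_ghz):
    """Rough peak single-precision throughput in GFLOPS.

    Assumes each stream processor retires one multiply-add
    (2 floating point operations) per clock cycle -- a simplification
    that ignores memory bandwidth and other real-world limits.
    """
    return stream_processors * shader_clock_ghz * 2.0

# A 480-processor card with a roughly 1.4 GHz shader clock is
# teraflop-class, consistent with the description above:
print(round(peak_gflops(480, 1.4)))   # 1344 GFLOPS, about 1.3 TFLOPS
```

More stream processors, a faster clock and more memory all push this figure (and real throughput) higher, which is why the choice of card is a straightforward performance-versus-cost tradeoff.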
Dozens of vendors provide graphics cards based upon CUDA-capable NVIDIA GPUs, and it is nearly impossible for a high-performance PC or motherboard vendor to introduce a product that does not do a good job of hosting such GPU cards. The insatiable demand of gamers for more performance has also spawned an industry of vendors offering ever-faster memory, more powerful power supplies and other system components that are perfect for creating outstanding GIS desktop and server machines.
The easiest way to see if a particular graphics card is CUDA-enabled is to first check which NVIDIA GPU it utilizes. Next, visit the NVIDIA web site at http://www.nvidia.com and in the web site's search box enter "CUDA-Enabled GPU" to find pages that list NVIDIA GPU products which can be utilized for CUDA-enabled parallel processing. Almost all contemporary NVIDIA GPUs are CUDA-enabled.
If anything, the surprise is discovering that quite a few NVIDIA GPUs aimed at motherboard chipsets or mobile applications like portable computers are also CUDA-enabled, albeit with a smaller number of processors per GPU.
NVIDIA at the present writing provides three families of CUDA-enabled GPU products. All three families may be used with Manifold and CUDA:
· GeForce - The GeForce line of NVIDIA GPUs is sold primarily through a wide variety of graphics card and motherboard manufacturers that incorporate the NVIDIA chips into their own graphics cards. Performance tends to be high and prices are kept low by fierce competition in gaming markets.
· Quadro - The Quadro line of NVIDIA GPUs is manufactured and sold directly by NVIDIA into very high end professional workstation graphics markets. Quadro cards provide extraordinarily high resolutions and massive graphics memory for the most demanding workstation applications. Some Quadro products also appear in high end portable computers, and some Quadro products are also provided in external cabinets similar to Tesla packaging.
· Tesla - The Tesla line of NVIDIA GPUs is also manufactured and sold directly by NVIDIA to support high performance computing where supercomputer performance through parallel processing is required. Although they are also available as plug-in cards, Tesla GPUs are best known for being packaged into external cabinets (either desktop or rack mount) that provide two or four GPUs per cabinet.
The external cabinets used for Tesla and some Quadro products attach to desktop computers using a special cable that plugs into an interface card plugged into a standard PCI-E slot. This allows the external Tesla or Quadro configuration to appear to software as if it were a plugged-in PCI-E card just like typical GeForce cards. However, because the actual GPUs are hosted in an external cabinet that provides power and cooling, the host computer does not need to be retrofitted with additional power and cooling.
CUDA Limitations and Requirements
There are several important constraints on CUDA use within Manifold:
· We must have a CUDA-enabled NVIDIA card installed in our system. 200 and 400 series NVIDIA cards at the present writing are the best-known CUDA-enabled cards, but other NVIDIA GPUs are also CUDA-capable (check the NVIDIA web site and your graphics card vendor's web site to see if a particular card is CUDA-capable). Hardware evolves so rapidly under the pressure of gaming-industry economies of scale that almost before this documentation can be published there will be even faster CUDA-capable cards. manifold.net recommends getting Fermi-class (400 series) GPUs. May as well get the best!
· The rest of our PC system must have sufficient speed and power to support the NVIDIA card. For example, memory must be fast enough to handle CUDA bandwidth and power supplies must provide enough power to run the NVIDIA card (or cards) with extra PCI-E power cables. Consult any technology-obsessed 14-year-old gamer for advice on configuring a suitably "hot" system.
· We must have installed NVIDIA's most recent set of drivers for Windows, which may be downloaded from the nvidia.com web site. NVIDIA's latest drivers automatically install software required for CUDA use by CUDA-capable NVIDIA-based cards.
· If we are running a 64-bit Windows system we must have installed NVIDIA's 64-bit, CUDA-enabled drivers for our 64-bit Windows system.
· Writing massively parallel algorithms to implement spatial functions is extremely difficult, even for manifold.net. Therefore, at the present time only a few dozen functions have been implemented within Manifold that can leverage CUDA. Many more are on the way.
· Existing CUDA-enabled functions within Manifold are Surface - Transform dialog operators for surfaces. The Surface - Transform dialog is part of the optional Surface Tools extension for Manifold (and also a built-in part of some Manifold System editions such as Universal Edition and Ultimate Edition). If we do not have the Surface Tools extension we will not have the ability to use this dialog and hence no ability to leverage CUDA. New updates and future Manifold releases will likely add many more usages of CUDA in addition to the Surface - Transform dialog operators.
· Functions executed within CUDA cards run virtually instantaneously compared to execution on the main processor. However, the NVIDIA stream processors execute tasks so rapidly that it is difficult to provide data fast enough from disk and memory to keep the processors occupied. The resulting performance in most "real life" applications therefore tends to be limited not by processor speed but by the speed with which data can be fetched from hard disk or other memory. In addition, a good portion of many tasks is not bound by computation but instead involves overhead such as writing out results to disk, re-computing levels and other necessary but mundane chores that are not accelerated by CUDA processors. The net result is that, as a practical matter, CUDA-enabled processors will visibly increase speeds for many tasks, almost always by a factor of two to ten and at times by a factor of ten to fifty, but not usually by factors of hundreds for the overall task even if the actual computation of parts of the task goes hundreds of times faster.
· We can get the most out of CUDA if the rest of our machine does not slow down the ability to feed the insatiable power of NVIDIA stream processors. For maximum speed we should use 64-bit Windows on at least a quad core machine with lots of RAM and large, fast disk drives. Before configuring a new 64-bit system, check the NVIDIA web site to make sure that 64-bit drivers are available for the Windows operating system you plan to install. At the present writing, Windows XP x64 has been used as the baseline common denominator for manifold.net's x64 development, with development also done on Vista and Windows 7.
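The overhead limitation described in the list above, where overall speedup is capped by the portion of a task that CUDA cannot accelerate, is an instance of Amdahl's law. A minimal sketch in Python (the fractions used below are hypothetical, chosen only for illustration):

```python
def overall_speedup(compute_fraction, compute_speedup):
    """Amdahl's law: overall speedup when only part of a task accelerates.

    compute_fraction -- share of total run time spent in the accelerated
                        computation (0..1); the rest is overhead such as
                        disk fetches and writing out results.
    compute_speedup  -- factor by which that computation alone speeds up.
    """
    overhead_fraction = 1.0 - compute_fraction
    return 1.0 / (overhead_fraction + compute_fraction / compute_speedup)

# Even if CUDA runs the math 300 times faster, a task that is 90%
# computation and 10% overhead speeds up only about 10x overall:
print(round(overall_speedup(0.90, 300.0), 1))   # 9.7
```

This is why the factors of two to ten, or ten to fifty, cited above are what most tasks actually see even when the computational core runs hundreds of times faster.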
Despite the above limitations it is clear that CUDA is a revolutionary technology. NVIDIA GPUs are so fast that a routine comment from developers is that NVIDIA renders the main processor almost superfluous, as if even the fastest multi-core Intel chip is relegated to being nothing but an accessory processor to handle the keyboard and mouse. That is not hyperbole given that NVIDIA GPUs can run jobs 200 or even 300 times faster than even the fastest Intel CPUs. See the demo below for an example.
Such speed advantages are not a competitive challenge that traditional processor vendors can afford to ignore. CUDA is the first of what is likely to be a new wave of massively parallel architectures from competitors such as Intel and AMD. Manifold's parallel code has been expressly written to allow easy implementation on future "many-core" processor solutions from Intel and AMD that will compete with NVIDIA CUDA.
Installation and Configuration
Once we have installed CUDA-capable NVIDIA hardware and NVIDIA drivers there is no need for any other configuration. Note that you must install 64-bit NVIDIA drivers when operating 64-bit Windows.
Make sure to download and install the latest NVIDIA drivers. Current generation NVIDIA software installs CUDA capability as part of the main NVIDIA driver installation. Manifold looks for current NVIDIA software. Because driver installation discs provided within graphics board packages might have been mastered many months ago, it is important to download and install the latest drivers from NVIDIA's web site.
Make sure to download and install the latest Manifold update. Although the nature of CUDA means that few changes need be made to support new NVIDIA GPUs, some changes are required to keep up with rapid evolution. Recent updates have added support for Fermi-series NVIDIA cards, for example. If you don't install the latest Manifold update you won't be able to use Fermi-series cards.
When launched, Manifold will automatically detect and utilize CUDA-enabled hardware. The Use GPGPU technologies (NVIDIA CUDA) option in the Tools - Options - Miscellaneous dialog is turned on by default.
When a CUDA-enabled card is present Manifold will report finding the card in the Help - About dialog in the GPU value. Manifold System Release 8.00 reports CUDA-enabled GPUs either as Fermi series devices or, for all earlier CUDA-enabled GPUs, as pre-Fermi.
The above illustration shows a Manifold Help - About report in a 64-bit Windows system in which two CUDA-enabled cards have been installed, both of which use NVIDIA GPU devices prior to the Fermi series.
If a CUDA-enabled card has not been installed or if current NVIDIA drivers have not been installed or if the Use GPGPU technologies (NVIDIA CUDA) option in the Tools - Options - Miscellaneous dialog has been turned off, the Help - About dialog will report Graphics only for the GPU, as seen above.
If a Fermi series card is present, it will be reported as seen above. All Fermi series cards (for example, GTX 480 or GTX 470 GeForce cards or Tesla C2050 or C2070 cards) will be reported as a Fermi device. The above shows a Help - About display for a 64-bit Windows system that has one Fermi card, a GTX 480, in it.
Functions Utilizing CUDA
At the present writing the following Manifold Surface - Transform dialog functions utilize CUDA if available: Aspect, AvgValue, Blur, CurvGauss, CurvMean, CurvPlan, CurvProfile, DifferenceE, DifferenceN, DifferenceNE, DifferenceNW, DifferenceS, DifferenceSE, DifferenceSW, DifferenceW, Diversity, DiversityIndex, HighPass1, HighPass2, HighPass3, Laplace1, Laplace2, LowPass1, LowPass2, LowPass3, MajValue, MaxValue, MedianCross, MedianSquare, MedianSquare5, MedValue, MinValue, Sharpen, SharpenMore, Slope, SumValue, Tile and TileMedian functions.
Additional functions and use of CUDA are expected to be added with each new Manifold release. New products such as the successor to Release 8 expected in 2010 will include new internal architectures to support CUDA even more effectively.
We often will be asked to demonstrate the speed of CUDA for colleagues, so it is handy to have a familiar example available that can be run quickly to show the power of GPU computing. The demo uses the example Montara Mountain surface in an easily-remembered sequence of steps.
This example assumes we are running 64-bit Windows XP on a machine with at least one CUDA-capable NVIDIA graphics card installed. It will work fine on 32-bit Windows as well, but why use old-fashioned Windows when 64-bit Windows has been available for over seven years? It also will work fine on Windows 7 or Vista or other Windows versions. In this particular example we've installed one GTX 480 card, which uses a Fermi series GPU that provides 480 processing cores.
The demo also requires a Manifold installation that includes the Surface Tools extension (automatically enabled when Universal Edition or Ultimate Edition are installed) so that the Surface - Transform dialog is available.
The greatest performance difference visible with CUDA appears when the Surface - Transform dialog is used for a complex calculation on a surface that is not too large. This shows off the intense computational performance delivered by CUDA without requiring many disk accesses (which take a proportionately larger amount of time for large files) slowing down the works.
For an example of a complex calculation, suppose we have a surface called MySurface. Launch the Surface - Transform dialog and execute a formula such as:
Slope([MySurface]) + Slope([MySurface] * 2)/2 +
Slope([MySurface] * 3)/3 + Slope([MySurface] * 4)/4
This is a nonsensical but complex formula that will execute much faster using CUDA than without CUDA. Although this and the other formulas shown in this example are artificial examples, they share many of the mathematical characteristics of sophisticated "real life" operations on surfaces so they are genuinely representative of real performance gains available through CUDA.
The more complex the formula the greater the advantage from using CUDA. For example, we can compute aspect as well as slope and also use an optional window size parameter for both functions in a formula (using a window size of 5) such as:
Slope([MySurface], 5) + Aspect([MySurface], 5) +
Slope([MySurface] * 2, 5) + Aspect([MySurface] * 2, 5) +
Slope([MySurface] * 3, 5) + Aspect([MySurface] * 3, 5) +
Slope([MySurface] * 4, 5) + Aspect([MySurface] * 4, 5)
Let's apply the above to a specific example.
We will use the Montara Mountain sample surface to measure computing speed with and without CUDA. We begin by importing the Montara Mountain surface, as is illustrated in examples such as the Combine a Surface and a Drawing in a Map topic.
We rename the surface to a very short, one letter name, s, to facilitate quick keyboarding.
In the Tools - Options dialog's Miscellaneous page we verify the Use GPGPU technologies (NVIDIA CUDA) option has been turned on.
Manifold can log reports to the history pane such as the time required to execute functions. We will also turn on the Log transform time option in the Logging page in the Tools - Options dialog.
Launching the Help - About dialog we can see that Manifold has detected a Fermi series CUDA-capable device installed in the system.
No doubt even a fire-breathing GTX 480 with 480 processors per device, as used in this example, will soon seem slow given how fast NVIDIA introduces newer and more powerful devices, but the example remains highly instructive for the tremendous gains achievable through GPU computing at very low cost.
With the surface open in a window we launch the Surface - Transform dialog and enter the following formula (which may be copied from this Help window and pasted into the dialog):
slope(s,5) + aspect(s,5) +
slope(s*2,5) + aspect(s*2,5) +
slope(s*3,5) + aspect(s*3,5)
We check the Save result as new component box for two reasons: first, because that uses slightly less overhead than applying the results to the subject surface and, second, because not altering the subject surface preserves it unmodified for the next trial.
Press OK and Manifold launches into action using CUDA to compute the formula, reporting the time required in the History pane:
The History pane reports how long the computation takes, in this case approximately 1.5 seconds. Times may vary slightly depending on the precise system configuration. A Fermi series GPU providing 480 stream processors is so incredibly fast that it computes the task almost instantaneously; the time required overall is mostly overhead, such as fetching data from disk and getting it through the CPU to the GPU, rather than actual calculation. Running the same task again will therefore cut the time slightly, by a tenth of a second or a few hundredths, as Windows caches the data used for more efficient provision to the GPU.
The result of the Surface - Transform formula computation is a new surface, automatically named S 2.
Let's now see how long the computation takes without using CUDA. We can measure this on exactly the same system for an "apples to apples" comparison by simply switching off the Use GPGPU technologies (NVIDIA CUDA) option in the Tools - Options dialog.
In the Tools - Options dialog's Miscellaneous page we uncheck the Use GPGPU technologies (NVIDIA CUDA) option to instruct Manifold not to use CUDA.
Launching the Help - About dialog again we can now see that Manifold no longer uses CUDA and reports Graphics only for the installed GPU.
Once more we run the same formula in the Surface - Transform dialog. As before, the time required for the computation is reported in the History pane, appearing below the last time logged:
Without CUDA the computation takes dramatically longer, over 424 seconds. Wow! Using the CPU instead of the NVIDIA Fermi GPU took about 284 times longer. It takes so much longer because the computation must happen on the Intel main CPU, which even when run as a true 64-bit device by 64-bit application code in 64-bit Windows is far, far slower than the phenomenal supercomputer speed of the NVIDIA Fermi device. In this case, the main CPU really is insignificant compared to the speed of the GPU.
It's true the machine used in this example is not the latest, greatest machine: it has an Intel Core 2 Quad CPU and not the latest Core i7 quad. An Intel Core i7 is indeed a faster CPU than the prior generation Intel Core 2 Quad used in this example. However, most users do not yet have a Core i7, and even if they did the Core i7 is not remotely as fast as the NVIDIA GPU. If we spend a lot more money to buy a really fast Core i7 we can drop the time required from about 424 seconds to about 261 seconds, that is, only about 1.6 times faster than the Core 2 Quad instead of being 284 times faster with the NVIDIA GPU. That's no comparison, especially when faster Core i7 processors cost more than the $300 to $500 cost of a Fermi card. Even a less expensive Fermi card like a GTX 470, selling for under $350 as of this writing, will execute the job in only 1.7 seconds, about 250 times faster than the CPU alone.
More complex formulas will require yet longer times. For example, in the same system used above, computing the formula...
slope(s,5) + aspect(s,5) +
slope(s*2,5) + aspect(s*2,5) +
slope(s*3,5) + aspect(s*3,5) +
slope(s*4,5) + aspect(s*4,5) +
slope(s*5,5) + aspect(s*5,5)
...requires 2.389 seconds with CUDA and 740.955 seconds without CUDA, about 310 times faster with CUDA.
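The speedup factors quoted above follow directly from the reported timings; the arithmetic can be checked in a few lines of Python (timings taken from the text, so results are only as precise as those figures):

```python
# Timings reported in the text, in seconds.
simple_cuda, simple_cpu = 1.5, 424.0        # shorter formula, with and without CUDA
complex_cuda, complex_cpu = 2.389, 740.955  # longer formula above

# Speedup factor = time without CUDA / time with CUDA.
print(round(simple_cpu / simple_cuda))    # 283 ("about 284" reflects a CPU time slightly over 424 s)
print(round(complex_cpu / complex_cuda))  # 310
```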
The comparison is especially dramatic when considering that the hardware used for the above example is a typical, highly capable base system with a 64-bit, quad core processor and 8 GB of RAM. It's faster than the dual core CPUs used by most people. A fast system makes for a good demo, because it minimizes the time required for overhead chores accomplished by Manifold as part of the demo. The actual computation of surface values using CUDA is very fast with most of the time required for the CUDA-enabled timings going to overhead such as setting up the job and writing out the resulting surface.
Very Important: After doing this demo, don't forget to turn on the Use GPGPU technologies (NVIDIA CUDA) option in the Tools - Options dialog so that future work can take advantage of CUDA!
Experienced demonstrators will usually create the above demo in advance as a Manifold .map project file that has the surface already imported and example computations for the Surface - Transform dialog saved in Comments components as text (a .map file with all that done may be downloaded from the manifold.net web site).
To do the demo, the Comments component can be quickly opened and the desired text copied and then pasted into the Surface - Transform dialog. Showmanship is an important part of good demos so it is important not to allow the audience to get bored while we keyboard a formula into a dialog. Using copy and paste also eliminates the need to remember the exact syntax of slope or aspect functions. The Surface - Transform dialog will "remember" the last formula used in a .map project, but just in case someone changes the formula it is a good idea for demos to have a spare copy of the formula in a Comments component.
Part of showmanship is launching the Help - About dialog after changing the option to use or not use CUDA so that the audience can see for themselves that Manifold is or is not using CUDA. This is the computer demo equivalent of a magician showing the audience "there is nothing up my sleeve."
It is also important to remember to turn on the Log transform time option in the Logging page in the Tools - Options dialog and to have the History pane open, so that the audience can see for themselves the exact timing of each trial.
Although longer demos can show the very much longer periods of time required for non-CUDA performance, it is important not to bore the audience. A comparison of 1.5 seconds to seven minutes usually conveys the intended message. 1.5 seconds goes by instantly, especially if the demonstrator says a few words about what is going on after pressing the OK button, while seven minutes seems endless and unendurable in comparison.
[The unendurable seven minutes make for an especially memorable demo for those who want to contrast Manifold's CUDA-enabled supercomputer speed with legacy GIS software products that do not have CUDA-enabled supercomputer speed but which do cost several thousand dollars per license to run hundreds of times slower.]
The easiest approach is to do the CUDA-enabled trial first, so the audience sees it go rapidly, and then to launch the interminably long process without CUDA on the second trial as the demonstrator discusses CUDA architecture, the wide availability of NVIDIA-based hardware, the breadth and depth of CUDA-capable devices and the extraordinary economies of scale attained by leveraging mass market interest in massively parallel GPUs for gaming. While all this talk is going on the audience will see with their own eyes that in the background the non-CUDA computation is still painfully crawling along for many minutes to do what the CUDA computation accomplished in a second or two.
More experienced demonstrators will often choose to perform the non-CUDA trial first. This requires greater skill with timing of presentations so that the speaker's commentary coincides more or less with the end of the computation. In this case the demonstrator will launch the transform operation and then spend the next six minutes or so talking about CUDA, describing what the Surface - Transform dialog is about, talking about the Montara surface, describing how even with a quad core main processor this is a long and complex calculation and every now and then directing the audience's attention to the lengthy computation with a comment such as, "Nope, not done yet...". After such a set-up audiences are inevitably dazzled by the amazing speed of CUDA.
This particular demo uses a completely artificial formula that serves no practical purpose except to illustrate speed of computation. But the formula is very representative in that it provides an easily-understood example of a legitimately complex calculation that makes about the same computational demands as the sophisticated mathematics typical of significant computations on surfaces.
"Real life" computations on surfaces tend to use very complex formulas that are difficult to explain to an audience that does not consist of remote sensing or GIS experts. However, just about everyone can understand what slope and aspect are.
Slope gives the degree of inclination of a surface. It is a way of finding which parts of a surface are relatively flat and which are steep. Aspect gives the directional orientation of a surface, for example, to find which parts of a surface are facing south. Computing either slope or aspect involves a large number of local calculations over the entire extent of a surface to find the slope or aspect for each individual pixel of the surface. The five-pixel window simply controls how many neighboring pixels are considered when computing slope or aspect. The use of a window is a way to demonstrate additional computational complexity by forcing an interpolation of sorts within each local computation of either slope or aspect.
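Since slope and aspect are so easy to understand, it may also help to see what the per-pixel calculation involves. The Python/NumPy sketch below uses a generic textbook finite-difference formulation; the function name, grid conventions and use of the immediately adjacent pixels are illustrative assumptions, not Manifold's actual implementation:

```python
import numpy as np

def slope_and_aspect(z, cell=1.0):
    """Per-pixel slope and aspect of a surface grid z, computed on the
    interior pixels with simple central differences.

    Conventions assumed here (for illustration only): columns run east,
    rows run north, and aspect is the compass bearing of steepest
    ascent. GIS packages differ; many report steepest descent instead.
    """
    dz_dx = (z[1:-1, 2:] - z[1:-1, :-2]) / (2.0 * cell)  # east-west gradient
    dz_dy = (z[2:, 1:-1] - z[:-2, 1:-1]) / (2.0 * cell)  # north-south gradient
    slope = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))  # inclination, degrees
    aspect = np.degrees(np.arctan2(dz_dx, dz_dy)) % 360.0  # bearing, degrees
    return slope, aspect

# A tilted plane rising one unit per cell toward the east:
z = np.tile(np.arange(5.0), (5, 1))
s, a = slope_and_aspect(z)
print(f"{s[0, 0]:.0f} {a[0, 0]:.0f}")   # 45 90: a 45 degree slope whose uphill bearing is east
```

A windowed version such as the five-pixel window used in the demo would simply gather more neighbors into each local calculation, increasing the arithmetic per pixel, which is exactly the kind of work that parallelizes well across stream processors.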
The Help - About dialog seen in this example shows that this particular system has one CUDA-capable device of Fermi class installed. The demo was run using an EVGA GTX 480 card providing a total of 480 stream processors. An outstanding benefit of NVIDIA technology is that speeds are doubling almost every year while costs are remaining constant or are going down.
The Help - About dialog reports the number of CUDA-enabled GPUs found. Some cards have more than one GPU in the "card." For example, some double-wide cards (such as the original GTX 295 reference design) are really two circuit cards packaged within the same double-wide fan housing, with a GPU on each card for a total of two GPUs. Such cards will be reported as two GPU devices. Plug in two such cards and Manifold will report four GPU devices. Some cards (such as the GTX 295 "Co-op" series) have a single circuit card with two GPU devices mounted on the card. Installing one such card will likewise be reported as two GPU devices.
Experienced Manifold users will see from the screen shot of the Montara Mountain surface that it has been projected and that a palette coloring the surface by height has been applied using the surface display options dialog. Projections and surface display options have no effect on performance of Surface - Transform computations with or without CUDA.
CUDA capability is enabled by default. To turn it off, turn off the Use GPGPU technologies (NVIDIA CUDA) option in the Tools - Options - Miscellaneous dialog. Remember to turn the option back on after doing a demo!
No CUDA card reported in Help - About dialog
· Has a CUDA-capable NVIDIA card been installed in the computer? Note that not all NVIDIA-based graphics cards have CUDA capability. You must have an NVIDIA-based card that supports CUDA. Almost all reasonably contemporary NVIDIA GPUs now support CUDA.
· Have you installed the most recent NVIDIA drivers downloaded directly from NVIDIA's web site? Older drivers, such as those often found on installation DVDs packaged with graphics cards, may not work even if they appear to install some sort of CUDA software.
· Have you changed a graphics card recently? Swapping cards, even within the same series (like switching from a GTX 470 to a GTX 480) may require re-installation of NVIDIA display drivers and restarting the system.
· If you are working with 64-bit Windows have you installed a 64-bit Manifold license? Have you launched the 64-bit Manifold installation? Recall that 64-bit Manifold installations will install both a 64-bit and a 32-bit Manifold executable so that a 32-bit Manifold version can be launched for compatibility with older, 32-bit software (such as Access); however, if you want to work with 64-bit CUDA in 64-bit Windows you should launch the 64-bit Manifold version. See the 32-bit and 64-bit Manifold Editions topic.
· If you are working with 64-bit Windows have you installed 64-bit NVIDIA CUDA drivers? Did you install the right NVIDIA drivers for your Windows system? For example, if you are running Vista x64 you should probably not expect that a driver package provided specifically for Windows XP x64 will work. Note that as of this writing not all NVIDIA driver packages for all possible Windows versions include CUDA support. Drill down into the nvidia.com web site using search terms such as "CUDA downloads" to see if your Windows version is supported.
· Check to make sure that the Use GPGPU technologies (NVIDIA CUDA) option has been enabled in the Tools - Options - Miscellaneous dialog.
No Surface - Transform dialog
· Have you licensed the optional Surface Tools extension for Manifold and activated it? Launch the Help - About dialog to see if the extension is reported as an installed extension. The Surface Tools extension is a built-in part of Universal Edition and Ultimate Edition and does not require activation if you are using either of those two editions. If you are using some other edition and have licensed Surface Tools, you must activate it by following the instructions in the Installing and Activating a Manifold Extension topic.
· Do you currently have a surface open as the active window or as the active layer in a map? The Surface - Transform dialog is not available if the focus is not on a surface window or a surface layer in a map.
Performance gain not observed
· Launch the Help - About dialog: does Manifold report any CUDA-capable devices installed in the system? If not, see the troubleshooting section above.
· Are you executing functions within the Surface - Transform dialog? CUDA at the present writing works only with functions within that dialog. It makes no difference in other functions or other parts of the system, except of course that since CUDA cards are also very fast graphics cards the performance of 3D terrain rendering will be very good.
· Are you executing functions that are listed as supporting CUDA? Writing formulas in the Surface - Transform dialog that utilize functions not listed as supporting CUDA will not benefit from CUDA.
· Is the data set very small? Some cases of small data compute so rapidly that there is little to gain by using CUDA since even a base system without CUDA will execute the task very rapidly.
· Is the formula very simple? CUDA speeds up computation but not overhead like fetching data from disk. If a computation involves a simple formula there is not much to speed up because the computation will get done very rapidly in any event so the time for the job is mostly overhead.
· Is the data set very large? As with the case of simple formulas, if data sets are very large then overhead tasks like fetching data from disk become proportionately larger compared to the time spent on computation. Since CUDA increases the speed of computation, if computation is a proportionately smaller part of the task compared to overhead there will be less visible effect from CUDA.
· Does your project involve slow system resources? Read the Performance Tips topic carefully to make sure your project and your system are structured for maximum performance. If a project does something that cripples performance, such as using a linked component that must come into the project from a very slow network link, then overhead delays will be very great with or without CUDA.
· Are you sure CUDA has not improved performance? Try timing the task with CUDA on and with CUDA off by turning on and off the option to use CUDA in the Tools - Options - Miscellaneous dialog, as illustrated in the demo above. You may be surprised to find that something which seems to take forever, ten minutes or so, with CUDA indeed takes far longer, like hours or days, without CUDA.
Help - About
Surface - Transform
Tools - Options