* -# The \link cudpp_kernel Kernel-Level API\endlink comprises functions
* that run entirely on the GPU across an entire grid of thread blocks.
* These functions may call into the \link cudpp_cta CTA-Level API\endlink
* below them.
* -# The \link cudpp_cta CTA-Level API\endlink comprises functions that run
* entirely on the GPU within a single Cooperative Thread Array (CTA,
* also known as a thread block). These are low-level functions that
* implement core data-parallel algorithms, typically by processing data
* within shared (CUDA \c __shared__) memory.
*
* Programmers may use any of the lower three CUDPP layers in their own
* programs by building the source directly into their application. However,
* the typical way to use CUDPP is to link against the library and call
* functions in the CUDPP \link publicInterface Public Interface\endlink, as
* in the \ref example_simpleCUDPP "simpleCUDPP", satGL, and cudpp_testrig
* application examples included in the CUDPP distribution.
*
* In the future, if and when CUDA supports building device-level libraries, we
* hope to enhance CUDPP to ease the use of CUDPP internal algorithms at all
* levels.
*
* \subsection uses Use Cases
* We expect CUDPP to be used in one of two ways:
* -# Linking another application against the CUDPP library.
* -# Running our test application, cudpp_testrig, which exercises
* CUDPP functionality.
*
* \section references References
* The following publications describe work incorporated in CUDPP.
*
* - Mark Harris, Shubhabrata Sengupta, and John D. Owens. "Parallel Prefix Sum (Scan) with CUDA". In Hubert Nguyen, editor, <i>GPU Gems 3</i>, chapter 39, pages 851–876. Addison-Wesley, August 2007. http://graphics.idav.ucdavis.edu/publications/print_pub?pub_id=916
* - Shubhabrata Sengupta, Mark Harris, Yao Zhang, and John D. Owens. "Scan Primitives for GPU Computing". In <i>Graphics Hardware 2007</i>, pages 97–106, August 2007. http://graphics.idav.ucdavis.edu/publications/print_pub?pub_id=915