Tensor cores for vectored multiple double arithmetic in PHCv2.4.93.

The simpleTensorCoreGemm.cu is a simplification of the dmmaTensorCoreGemm.cu
of the CUDA samples collection, keeping only the simple_wmma_gemm_kernel
and replacing the random numbers by an example to motivate higher precision,
using the sequences 2**k, for k from 0 to n.  The sum of 2**k equals
2**(n+1) - 1, which needs n+1 significant bits, so once n reaches 53,
the sum can no longer be represented exactly by 64-bit doubles,
which have a 53-bit significand.

Vectored multiple double arithmetic makes it possible to apply tensor cores
and still obtain accurate results.  Data staging algorithms are defined
for this purpose.

The code benefitted from collaboration with Howard Chen.

------------------------------------------------------------------------------
file name                     : short description
------------------------------------------------------------------------------
simpleTensorCoreGemm          : simple wmma kernel for matrix multiplication
smDMMA_dims                   : defines the dimensions of the multiplication
smDMMA_host                   : code on the host for the multiplication
smDMMA_kernels                : code on the device for the multiplication
test_smDMMA                   : main test program for the simple kernel
------------------------------------------------------------------------------
bits_of_pi                    : test on the bits of an approximation for pi
splitting_doubles             : halving and quartering fractions of doubles
test_splitting_doubles        : tests on halving and quartering doubles
double_matrix_multiplications : matrix-matrix multiplications with doubles
test_mmm                      : tests double matrix-matrix multiplications
------------------------------------------------------------------------------
ddmm_host                     : CPU double double matrix multiplications
ddmm_kernels                  : GPU double double matrix multiplications
test_ddmm                     : tests double double matrix multiplications
------------------------------------------------------------------------------
vectored_double_doubles       : defines vectored double double arithmetic
test_vdd                      : tests vectored double double arithmetic
vectored_quad_doubles         : defines vectored quad double arithmetic
test_vqd                      : tests vectored quad double arithmetic
vectored_octo_doubles         : defines vectored octo double arithmetic
test_vod                      : tests vectored octo double arithmetic
vectored_hexa_doubles         : defines vectored hexa double arithmetic
test_vhd                      : tests vectored hexa double arithmetic
------------------------------------------------------------------------------
test_dd_setup                 : tests setup for double double multiplication
------------------------------------------------------------------------------
