Fast CinemaDNG Processor

High performance software for CinemaDNG processing on GPU

Fast CinemaDNG Processor on CUDA

3D LUT color grading on NVIDIA GPU

The 3D LUT transform is widely used for color grading and toning. To solve the task of 3D LUT grading, we have developed high-performance CUDA kernels that run on existing NVIDIA GPU hardware. We have implemented several 3D LUT formats and achieved very high performance for color grading.
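As an illustration of what a 3D LUT transform computes per pixel, here is a minimal CPU sketch in Python. It assumes trilinear interpolation between the eight surrounding lattice points, which is a common choice but is not confirmed by the text above (tetrahedral interpolation is another common one). The names `apply_lut3d` and `identity_lut` and the nested-list layout `lut[r][g][b]` are hypothetical and are not part of Fast CinemaDNG Processor.

```python
def apply_lut3d(rgb, lut, size):
    """Map one RGB triple in [0, 1] through a size^3 LUT (size >= 2).

    Assumption of this sketch: lut[r][g][b] holds an (R, G, B) output
    triple and lookups use trilinear interpolation.
    """
    idx, frac = [], []
    for c in rgb:
        # Clamp to [0, 1], then scale to lattice coordinates.
        x = min(max(c, 0.0), 1.0) * (size - 1)
        i = min(int(x), size - 2)  # keep i + 1 inside the cube
        idx.append(i)
        frac.append(x - i)
    (r0, g0, b0), (fr, fg, fb) = idx, frac

    out = [0.0, 0.0, 0.0]
    # Blend the 8 corners of the enclosing cell.
    for dr in (0, 1):
        for dg in (0, 1):
            for db in (0, 1):
                w = ((fr if dr else 1.0 - fr) *
                     (fg if dg else 1.0 - fg) *
                     (fb if db else 1.0 - fb))
                corner = lut[r0 + dr][g0 + dg][b0 + db]
                for k in range(3):
                    out[k] += w * corner[k]
    return tuple(out)


def identity_lut(size):
    """Build a LUT that maps every color to itself (a sanity check)."""
    step = 1.0 / (size - 1)
    return [[[(r * step, g * step, b * step) for b in range(size)]
             for g in range(size)]
            for r in range(size)]
```

With an identity cube, `apply_lut3d((0.25, 0.5, 0.9), identity_lut(17), 17)` returns the input color up to floating-point rounding. A GPU kernel would perform the same lookup for every pixel in parallel.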

Fast CUDA kernels require all initial data to be placed in GPU shared memory. This is possible for 3D LUT cubes with dimensions up to 17×17×17 (float) and 33×33×33 (integer). Each point of the 3D cube consists of three int or float values, which means that even on the latest NVIDIA GPUs not every 3D LUT fits into GPU shared memory.
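The memory footprint of a cube follows directly from its dimensions: points × 3 values × bytes per value. The sketch below works through that arithmetic. It assumes 4-byte floats; the width of the integer values is not stated above, so it is left as a parameter, and the actual kernels may pack the cube differently. The helper name `lut_bytes` is hypothetical.

```python
def lut_bytes(size, bytes_per_value, channels=3):
    """Bytes needed for a size^3 cube with `channels` values per point."""
    return size ** 3 * channels * bytes_per_value


# Assumed element width: 4-byte float. Other widths can be passed in.
for size in (17, 33, 65):
    print(f"{size}x{size}x{size} float: {lut_bytes(size, 4):,} bytes")
```

The footprint grows with the cube of the dimension, so going from 17 to 65 points per axis multiplies the size by roughly 56. The shared memory budget per block differs between GPU generations, so it is best compared against the value reported by the device in use.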

To use 3D LUTs for color grading, we have integrated Fast CinemaDNG Processor with the 3DLUT Creator software. The user can choose any frame from a raw sequence and send the processed 16-bit TIFF to 3DLUT Creator. That software prepares a 3D LUT, which is then sent back to Fast CinemaDNG Processor for realtime color grading on the GPU.

3DLUT grading features

  • Input data: 16 bits per color channel, arbitrary width and height
  • 2.5D and 3D LUT formats: cube
  • Color representation: RGB, HSV
  • Color grading with arbitrary dimensions of the 3D cube
  • Internal color cube resolution up to 65×65×65
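For readers unfamiliar with the cube format listed above, here is a minimal parsing sketch in Python. It assumes the common text-based .cube layout: optional comment lines starting with `#`, keyword lines such as `TITLE` and `LUT_3D_SIZE N`, then N³ lines of three numbers with the red index changing fastest. Whether Fast CinemaDNG Processor reads exactly this dialect is an assumption, and `parse_cube` and `cube_entry` are hypothetical helper names.

```python
def parse_cube(text):
    """Parse the text of a 3D .cube file into (size, rows)."""
    size = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        parts = line.split()
        if parts[0] == "LUT_3D_SIZE":
            size = int(parts[1])
        elif parts[0][0].isalpha():
            continue  # other keywords (TITLE, DOMAIN_MIN, ...) are ignored here
        else:
            rows.append(tuple(float(p) for p in parts[:3]))
    if size is None:
        raise ValueError("missing LUT_3D_SIZE")
    if len(rows) != size ** 3:
        raise ValueError(f"expected {size ** 3} entries, got {len(rows)}")
    return size, rows


def cube_entry(rows, size, r, g, b):
    """Return the output triple at lattice point (r, g, b); red varies fastest."""
    return rows[r + size * (g + size * b)]
```

The flat `rows` list keeps the file order, which is convenient for uploading the table to the GPU as one contiguous buffer.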

Hardware and software

  • CPU Intel Core i7-5930K (Haswell-E, 6 cores, 3.5–3.7 GHz)
  • GPU NVIDIA GeForce GTX 1080 (Pascal, 20 SMs, 2560 cores, 1.6–1.7 GHz)
  • OS Windows 7/8/10 SP1 (x64)
  • CUDA Toolkit 9.1

Performance of 2.5D (HSV) and 3D LUT (RGB) Transforms on GPU

Test images: 16-bit RGB, 2432×1366 (2.5K) and 4032×2192 (4K)
Test info: all data (input and output) resides in GPU memory; timing measurements include GPU computations only. Timings are listed as 2.5K / 4K.

  • 2.5D LUT (HSV, 90×30 points) – 0.27 ms / 0.66 ms
  • 2.5D LUT (HSV, 90×117 points) – 0.69 ms / 1.33 ms
  • 3D LUT (HSV, 36×8×8 points) – 0.57 ms / 0.83 ms
  • 3D LUT (HSV, 36×29×16 points) – 1.3 ms / 2.4 ms
  • 3D LUT (HSV, 36×56×61 points) – 8.4 ms / 24 ms
  • 3D LUT (RGB, 33×33×33) – 1.6 ms / 2.8 ms
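To compare these timings across image sizes, they can be converted to pixel throughput. The sketch below is plain arithmetic on the figures listed above; the helper name `megapixels_per_second` is hypothetical, and the result covers the LUT stage only, not a full processing pipeline.

```python
def megapixels_per_second(width, height, time_ms):
    """Convert one per-frame timing into throughput in megapixels per second."""
    return width * height / (time_ms * 1e-3) / 1e6


# 3D LUT (RGB, 33x33x33) timings from the list above.
rate_2_5k = megapixels_per_second(2432, 1366, 1.6)  # about 2076 MP/s
rate_4k = megapixels_per_second(4032, 2192, 2.8)    # about 3156 MP/s
print(f"2.5K: {rate_2_5k:.0f} MP/s, 4K: {rate_4k:.0f} MP/s")
```

The higher throughput on the 4K frame suggests that fixed per-launch costs are amortized better on larger images, although the text does not state the cause.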