HW / GPU Tone Mapping with FFMPEG
alxscott
9th May 2020, 10:18
Hi,
Quite simply a cry for help!
I'm running a Linux media server and recently bought a cheap NVIDIA P400 (Linux stream patch applied) for hardware-accelerated transcoding when away from home.
I want to do HDR -> SDR tone mapping on the few 4K HDR movies I don't also have the Blu-ray for. I've compiled FFMPEG with the appropriate flags and managed to run:
ffmpeg -hwaccel nvdec -init_hw_device opencl=ocl -filter_hw_device ocl -threads 4 -extra_hw_frames 3 -i INPUT.mkv -vf "format=p010,hwupload,tonemap_opencl=t=bt709:r=tv:p=bt709:m=bt709:tonemap=hable:format=p010,hwdownload,format=p010" OUTPUT
The problem is that it is no faster, if not slower, than software tone mapping.
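One way to check whether the OpenCL path really is slower is to time decoding plus tone mapping alone, sending frames to FFmpeg's null muxer so encoder cost doesn't hide the filter cost. This sketch assumes a file called `INPUT.mkv`, an arbitrary 60-second sample (`-t 60`), and the commonly used zscale + tonemap software chain rather than the poster's own software command. The OpenCL chain is the one from the first post, changed to emit 8-bit `nv12`. Both runs use `-hwaccel nvdec`, so decoding is the same in each.

```shell
#!/usr/bin/env bash
# Sketch: time OpenCL vs software tone mapping with the output discarded.
# INPUT and the 60 s sample length (-t 60) are placeholder assumptions.
set -euo pipefail

INPUT="${INPUT:-INPUT.mkv}"

# Chain from the first post, but emitting 8-bit nv12 and writing to the null muxer.
ocl_cmd=(ffmpeg -hide_banner -benchmark -hwaccel nvdec
  -init_hw_device opencl=ocl -filter_hw_device ocl
  -t 60 -i "$INPUT"
  -vf "format=p010,hwupload,tonemap_opencl=t=bt709:r=tv:p=bt709:m=bt709:tonemap=hable:format=nv12,hwdownload,format=nv12"
  -f null -)

# Common software chain: linearise, tone map in float RGB, convert back to BT.709.
sw_cmd=(ffmpeg -hide_banner -benchmark -hwaccel nvdec
  -t 60 -i "$INPUT"
  -vf "zscale=t=linear:npl=100,format=gbrpf32le,zscale=p=bt709,tonemap=tonemap=hable:desat=0,zscale=t=bt709:m=bt709:r=tv,format=yuv420p"
  -f null -)

# Run only when ffmpeg and the input exist; otherwise just print the commands.
if command -v ffmpeg >/dev/null 2>&1 && [ -f "$INPUT" ]; then
  "${ocl_cmd[@]}"   # -benchmark prints user/system/real time at the end
  "${sw_cmd[@]}"
else
  echo "ffmpeg or $INPUT not found; commands would be:"
  printf '%s ' "${ocl_cmd[@]}"; echo
  printf '%s ' "${sw_cmd[@]}"; echo
fi
```

Comparing the `speed=` and `rtime=` figures from the two runs isolates the filter stage.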
Can anyone give me a simple hardware-accelerated 10-bit HDR -> 8-bit SDR command using either Hable or Mobius? FYI: Linux server, CLI only.
I've spent over a week searching and trying... mainly failing!
Thanks. Alex
richardpl
9th May 2020, 13:16
Why do you expect miracles from a cheap card?
alxscott
9th May 2020, 15:26
Can you please explain how I am expecting miracles?
Software tone mapping via zscale is limited to single-threaded processing.
GPUs are by their very nature designed for parallel processing... hence why they are used for graphics processing.
The P400 has two cores which should outperform a single CPU core.
I will happily be corrected on any of the above.
Also, if someone can confirm my command is correct and as optimised as it can be, I will thank them and be content.
EDIT: Just to add, the GPU isn’t at 100% utilisation, so improvements are possible.
Sagittaire
9th May 2020, 15:33
Well, it's as simple as a resize... a single thread can do it at a good speed (certainly much faster than the source's real-time speed).
videoh
9th May 2020, 15:41
Any gain could be dwarfed by the overhead of sending the frame up to the GPU and back for your card. Your card has 256 CUDA cores, though. Not so terrible.
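For a rough sense of the data involved in that round trip, this sketch works out how many bytes the first post's chain moves across PCIe. It assumes a 3840x2160 P010 source at 24 fps and counts only the `hwupload` and `hwdownload` steps. It leaves out the decoder's own copy to system memory and any driver or synchronisation overhead, and those may matter more than raw bandwidth.

```shell
#!/usr/bin/env bash
# Rough PCIe traffic estimate for the upload/download steps of the first post's chain.
# Assumed: 3840x2160 source, P010 frames in both directions, 24 fps.
set -euo pipefail

w=3840; h=2160; fps=24

# P010 = 16-bit luma plane + half-height plane of interleaved 16-bit Cb/Cr.
luma=$(( w * h * 2 ))
chroma=$(( w * (h / 2) * 2 ))
frame_bytes=$(( luma + chroma ))

# hwupload (host -> GPU) plus hwdownload (GPU -> host), both at P010.
round_trip=$(( 2 * frame_bytes ))
per_second=$(( round_trip * fps ))

echo "frame: $frame_bytes B, round trip: $round_trip B, at $fps fps: $per_second B/s"
```

This prints about 24.9 MB per frame, or roughly 1.19 GB/s at 24 fps for the two explicit copies.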
alxscott
9th May 2020, 16:12
Originally Posted by videoh: Any gain is dwarfed by the overhead of sending the frame up to the GPU and back for your card. richardpl is quite right. For comparison, my 2080 Ti has 4352 cores.
Well, that’s a few more cores! Lol
I was under the impression that, because of the flags in the command, everything stayed in GPU memory: hardware decoded, tone mapped, then hardware encoded?
No zscale or similar filter that would need the frames transferred back? That’s what I understood from the NVIDIA guide: https://devblogs.nvidia.com/nvidia-ffmpeg-transcoding-guide/
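As far as I know, that isn't what the first command does. `-hwaccel nvdec` without `-hwaccel_output_format cuda` returns decoded frames in system memory, `hwupload` then copies them to the OpenCL device, `hwdownload` brings them back, and the command doesn't name an NVENC encoder. (I'm not aware of a way to hand CUDA frames straight to FFmpeg's OpenCL filters, which would explain why the upload is needed at all.) The helper `count_transfers` below is hypothetical, invented here for illustration: it walks a `-vf` chain and labels where each step runs. It splits naively on commas, so it won't handle escaped commas inside filter arguments.

```shell
#!/usr/bin/env bash
# Hypothetical helper: label each filter in a -vf chain by where the frame lives.
set -euo pipefail

count_transfers() {
  local chain="$1" f n=0
  local -a filters
  IFS=',' read -r -a filters <<< "$chain"   # naive split on commas
  for f in "${filters[@]}"; do
    case "$f" in
      hwupload*)   echo "host -> GPU : $f"; n=$((n + 1)) ;;
      hwdownload*) echo "GPU -> host : $f"; n=$((n + 1)) ;;
      *_opencl*)   echo "on GPU      : $f" ;;
      *)           echo "on host     : $f" ;;
    esac
  done
  echo "explicit transfers: $n"
}

# The -vf chain from the first post.
chain="format=p010,hwupload,tonemap_opencl=t=bt709:r=tv:p=bt709:m=bt709:tonemap=hable:format=p010,hwdownload,format=p010"
count_transfers "$chain"
```

For the posted chain it reports two explicit transfers around a single GPU step; the decoder's own download is not visible in `-vf` and is not counted.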
videoh
9th May 2020, 16:24
I looked up your card and it is not so bad: 256 CUDA cores, PCIe 3.0, GDDR5, 64-bit memory interface. Seems like you should be able to get some gain. I don't know enough about ffmpeg to know if it can properly tap the card's power in your use case. Is it using CUDA cores for the tone mapping and/or resizing?
alxscott
9th May 2020, 16:32
As far as I know, the command posted should only be using the CUDA cores for tone mapping, with no resizing. Resizing is on my “would be nice but not essential” list, as I’m just trying to have an SDR version of the media for remote viewing when Plex transcodes due to a slow connection. At the minute the server can cope with the transcode, but there’s no tone mapping, so the colours are washed out.
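If resizing is added later, where it happens matters. Scaling after `hwdownload` means the OpenCL filter still tone maps full 4K frames. Resizing in the decoder first cuts the pixels per frame before tone mapping; the `hevc_cuvid` decoder's `-resize` option, which the poster ends up using later in the thread, does this. This sketch shows both placements under assumed names (`INPUT.mkv`, `OUTPUT.mkv`, the Hable curve). The commands are only built, not run, and the block ends with the pixel arithmetic.

```shell
#!/usr/bin/env bash
# Sketch: two places to downscale 4K -> 1080p around OpenCL tone mapping.
# INPUT.mkv / OUTPUT.mkv and the Hable curve are placeholder assumptions.
set -euo pipefail
INPUT="${INPUT:-INPUT.mkv}"

# A: tone map at 4K on the GPU, then scale on the CPU after hwdownload.
scale_after=(ffmpeg -hwaccel nvdec -init_hw_device opencl=ocl -filter_hw_device ocl -i "$INPUT"
  -vf "format=p010,hwupload,tonemap_opencl=tonemap=hable:t=bt709:p=bt709:m=bt709:r=tv:format=nv12,hwdownload,format=nv12,scale=1920:-2"
  -c:v libx264 -c:a copy OUTPUT.mkv)

# B: let the hevc_cuvid decoder resize first, so tone mapping sees 1080p frames.
scale_before=(ffmpeg -c:v hevc_cuvid -resize 1920x1080 -init_hw_device opencl=ocl -filter_hw_device ocl -i "$INPUT"
  -vf "format=p010,hwupload,tonemap_opencl=tonemap=hable:t=bt709:p=bt709:m=bt709:r=tv:format=nv12,hwdownload,format=nv12"
  -c:v libx264 -c:a copy OUTPUT.mkv)

# Pixels per frame the OpenCL filter has to process in each case.
px_4k=$((3840 * 2160))
px_fhd=$((1920 * 1080))
echo "A: $px_4k px/frame, B: $px_fhd px/frame ($((px_4k / px_fhd))x fewer)"
```

Option B gives the tone-mapping filter a quarter of the pixels, at the cost of resizing before the HDR -> SDR conversion rather than after it.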
videoh
9th May 2020, 16:36
It's using OpenCL, and who knows what that is doing?
If you have Avisynth+ installed you could use my CUDA DGHDRtoSDR() filter and play the script in your player (if it supports that), or transcode by opening the script with your encoder.
EDIT: Oops, you are on linux.
alxscott
9th May 2020, 16:52
Haha, I'd seen your filter and Avisynth+ earlier in the week and then had the same realisation! Thanks for the suggestion though! :)
RanmaCanada
9th May 2020, 16:59
AFAIK, Plex, Emby, Jellyfin, et al. do NOT support this. In fact, none of them can get it right because it's not currently possible. Intel is adding this feature in Ice Lake.
If you want to watch these movies, watch them on the proper hardware.
alxscott
9th May 2020, 17:06
Plex, Emby, Jellyfin etc. definitely do not support server-side tone mapping.
Plex, when using mpv as its player, can support client-side tone mapping to SDR (using the usual algorithms: Hable, Mobius, Linear, etc.).
This is why I’m using FFMPEG separately to generate an HDR -> SDR version. I know this is possible, as I have done it using both software and hardware, as shown above. My question is about a more optimised command, as hardware transcoding is currently running slower than CPU transcoding.
EDIT: Just to mention, I am using proper hardware: a 2019 NVIDIA Pro and Dolby Atmos-capable surround sound. This is for when I need to view remotely.
richardpl
9th May 2020, 19:01
Well, if other, much simpler OpenCL filters, like avgblur_opencl, are also slower than their CPU variants, then nothing can be done.
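richardpl's check could be run roughly like this: time the CPU `avgblur` against `avgblur_opencl` on the same input, with both runs discarding output to the null muxer. The blur size (`sizeX=10`), the 60-second sample and `INPUT.mkv` are arbitrary assumptions. The OpenCL run pays the same upload/download round trip as the tone-mapping chain.

```shell
#!/usr/bin/env bash
# Sketch: CPU avgblur vs avgblur_opencl, output discarded, timed with -benchmark.
# INPUT, sizeX=10 and the 60 s sample are placeholder assumptions.
set -euo pipefail
INPUT="${INPUT:-INPUT.mkv}"

cpu_blur=(ffmpeg -hide_banner -benchmark -t 60 -i "$INPUT"
  -vf "format=yuv420p,avgblur=sizeX=10" -f null -)

# Upload, blur on the GPU, download: the same round trip the tone-map chain pays.
ocl_blur=(ffmpeg -hide_banner -benchmark -init_hw_device opencl=ocl -filter_hw_device ocl
  -t 60 -i "$INPUT"
  -vf "format=nv12,hwupload,avgblur_opencl=sizeX=10,hwdownload,format=nv12" -f null -)

if command -v ffmpeg >/dev/null 2>&1 && [ -f "$INPUT" ]; then
  "${cpu_blur[@]}"
  "${ocl_blur[@]}"
else
  echo "ffmpeg or $INPUT not found; commands would be:"
  printf '%s ' "${cpu_blur[@]}"; echo
  printf '%s ' "${ocl_blur[@]}"; echo
fi
```

If the OpenCL blur is also slower here, the bottleneck is likely the transfers or the OpenCL setup rather than the tone-mapping kernel itself.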
videoh
9th May 2020, 19:19
Why not screw OpenCL and use real CUDA? Too hard for the ffmpeg devs? Not as lucrative as AC4 decoding?
alxscott
9th May 2020, 21:19
That’s a really good shout; I hadn’t tried a different filter to compare! I’ll see how avgblur performs!
alxscott
12th May 2020, 14:05
I managed to find a good compromise between quality and speed, with results that are satisfactory for me: FHD SDR tone-mapped output.
For anyone else, this is the command I've decided to use going forward:
ffmpeg -vsync 0 -hwaccel cuda -init_hw_device opencl=ocl -filter_hw_device ocl -extra_hw_frames 3 -threads 16 -c:v hevc_cuvid -resize 1920x1080 -i INPUT -vf "format=p010,hwupload,tonemap_opencl=tonemap=mobius:param=0.01:desat=0:r=tv:p=bt709:t=bt709:m=bt709:format=nv12,hwdownload,format=nv12" -c:a copy -c:s copy -c:v libx264 -max_muxing_queue_size 9999 OUTPUT
This converted a 4K UHD disc rip MKV in 67m27s, against a file runtime of 2h01m49s, so an average conversion rate of 1.8x.
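The 1.8x figure follows from the two durations: 2h01m49s is 7309 s, 67m27s is 4047 s, and 7309 / 4047 ≈ 1.81. This is a small sketch of that arithmetic; the helper name `to_seconds` is made up for illustration.

```shell
#!/usr/bin/env bash
# Speed factor = file runtime / wall-clock conversion time.
set -euo pipefail

# Convert H:M:S to seconds; 10# avoids octal parsing of values like "07".
to_seconds() {
  local h m s
  IFS=':' read -r h m s <<< "$1"
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

runtime=$(to_seconds 2:01:49)   # file runtime from the post
elapsed=$(to_seconds 1:07:27)   # 67m27s conversion time
ratio=$(awk -v r="$runtime" -v e="$elapsed" 'BEGIN { printf "%.2fx", r / e }')
echo "runtime ${runtime}s / elapsed ${elapsed}s = $ratio"
```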
Just as a note, I decided to use software encoding (libx264) because it gave a significant quality increase for only a small increase in encoding time.
Thank you all.
foxyshadis
17th May 2020, 18:23
Have you tried without hwupload? I don't know for sure, but that might be forcing a download and re-upload, or it might be a no-op since you didn't explicitly download it.
alxscott
20th May 2020, 08:31
Hey! I did try removing the hwupload/hwdownload filters, especially when I was using the GPU encoder... but it kept throwing an error. As you can tell, I'm no FFmpeg expert, but I think it was because I was changing colour space???
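This explanation is hedged because the exact error isn't quoted. As far as I know, FFmpeg's `*_opencl` filters only accept frames that are already in OpenCL memory. In the posted command the frames reach the filter graph in system memory, since the chain starts with `format=p010,hwupload`. So removing `hwupload` probably leaves the filter graph unable to negotiate formats, and the error is likely not a colour-space issue. The `check_opencl_chain` helper is hypothetical, invented here for illustration. It flags an OpenCL filter that isn't preceded by `hwupload` or `hwmap`.

```shell
#!/usr/bin/env bash
# Hypothetical check: every *_opencl filter must see frames that were uploaded
# (hwupload or hwmap) and not yet downloaded again.
set -euo pipefail

check_opencl_chain() {
  local chain="$1" f on_gpu=0
  local -a filters
  IFS=',' read -r -a filters <<< "$chain"   # naive split on commas
  for f in "${filters[@]}"; do
    case "$f" in
      hwupload*|hwmap*) on_gpu=1 ;;
      hwdownload*)      on_gpu=0 ;;
      *_opencl*)
        if [ "$on_gpu" -eq 0 ]; then
          echo "missing upload before: ${f%%=*}"   # filter name without options
          return 1
        fi
        ;;
    esac
  done
  echo "ok"
}

with_upload="format=p010,hwupload,tonemap_opencl=tonemap=mobius:format=nv12,hwdownload,format=nv12"
without_upload="format=p010,tonemap_opencl=tonemap=mobius:format=nv12,hwdownload,format=nv12"
check_opencl_chain "$with_upload"
check_opencl_chain "$without_upload" || true
```

The first chain passes; the second is flagged at `tonemap_opencl`. This suggests `hwupload` is doing real work in this pipeline and is not a no-op.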