9th May 2020, 10:18   #1
alxscott
Registered User
 
Join Date: Aug 2013
Posts: 9
HW / GPU Tone Mapping with FFMPEG

Hi,

Quite simply a cry for help!

I'm running a Linux media server, and recently bought a cheap NVIDIA P400 (Linux stream patch applied) for hardware-accelerated remuxing when away from home.


I want to do an HDR -> SDR tonemap on the few 4K HDR movies I don't also have the Blu-ray for. I've compiled FFmpeg with the appropriate flags and managed to run:
Code:
ffmpeg -hwaccel nvdec -init_hw_device opencl=ocl -filter_hw_device ocl -threads 4 -extra_hw_frames 3 -i INPUT.mkv -vf "format=p010,hwupload,tonemap_opencl=t=bt709:r=tv:p=bt709:m=bt709:tonemap=hable:format=p010,hwdownload,format=p010" OUTPUT
The problem is that it is no faster, if not slower, than software tone mapping.


Can anyone give me a simple 10-bit HDR -> 8-bit SDR hardware-accelerated command using either Hable or Mobius? FYI: Linux server, CLI only.

I've spent over a week searching and trying... mainly failing!

Thanks. Alex
9th May 2020, 13:16   #2
richardpl
Registered User
 
Join Date: Jan 2012
Posts: 272
Why do you expect miracles from a cheap card?
9th May 2020, 15:26   #3
alxscott
Registered User
 
Join Date: Aug 2013
Posts: 9
Quote:
Originally Posted by richardpl View Post
Why do you expect miracles from a cheap card?
Can you please explain how I am expecting miracles?

Software Tonemapping via zscale is limited to single thread processing.

GPUs are by their very nature designed for parallel processing, which is why they are used for graphics processing.

The P400 has two cores which should outperform a single CPU core.

I will happily be corrected on any of the above.

Also, if someone can confirm my command is correct and as optimised as it can be, then I will thank them and be content.

EDIT: Just to add, the GPU isn't at 100% utilisation, so improvements should be possible.

Last edited by alxscott; 9th May 2020 at 15:50.
9th May 2020, 15:33   #4
Sagittaire
Testeur de codecs
 
Join Date: May 2003
Location: France
Posts: 2,484
Quote:
Originally Posted by alxscott View Post
Can you please explain how I am expecting miracles?

Software Tonemapping via zscale is limited to single thread processing.

GPUs are by their very nature designed for parallel processing, which is why they are used for graphics processing.

The P400 has two cores which should outperform a single CPU core.

I will happily be corrected on any of the above.

Also, if someone can confirm my command is correct and as optimised as it can be, then I will thank them and be content.
Well, it's a simple resize ... a single thread can do that at adequate speed (certainly much faster than the real-time speed of the source).
__________________
Le Sagittaire ... ;-)

1- Ateme AVC or x264
2- VP7 or RV10 only for anime
3- XviD, DivX or WMV9
9th May 2020, 15:41   #5
videoh
Useful n00b
 
Join Date: Jul 2014
Posts: 1,667
Quote:
Originally Posted by alxscott View Post
The P400 has two cores which should outperform a single CPU core.
Any gain could be dwarfed by the overhead of sending the frame up to the GPU and back for your card. Your card has 256 CUDA cores. Not so terrible.
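
To put the transfer overhead mentioned here into numbers, a rough back-of-the-envelope sketch in Python follows. It assumes 3840x2160 frames at 24 fps (neither figure is stated in the thread) and the usual 4:2:0 semi-planar layouts of P010 (2 bytes per sample) and NV12 (1 byte per sample). The helper name frame_bytes is made up for illustration.

```python
def frame_bytes(width: int, height: int, bytes_per_sample: int) -> int:
    """Size of one 4:2:0 semi-planar frame (NV12: 1 byte/sample, P010: 2)."""
    luma = width * height
    # Cb and Cr are interleaved in one plane at half resolution in each dimension.
    chroma = 2 * (width // 2) * (height // 2)
    return (luma + chroma) * bytes_per_sample


FPS = 24  # assumption: frame rate of the source, not given in the thread

uhd_p010 = frame_bytes(3840, 2160, 2)   # 4K frame as uploaded to the GPU
fhd_nv12 = frame_bytes(1920, 1080, 1)   # 1080p 8-bit frame, for comparison

one_way_mb_per_s = uhd_p010 * FPS / 1e6

print(f"4K P010 frame:    {uhd_p010 / 1e6:.1f} MB")
print(f"1080p NV12 frame: {fhd_nv12 / 1e6:.1f} MB")
print(f"4K P010 at {FPS} fps, one direction: {one_way_mb_per_s:.1f} MB/s")
```

Under these assumptions each 4K P010 frame is about 24.9 MB, so every upload or download step in the filter chain moves roughly 600 MB/s at real-time speed. Whether that is the actual bottleneck on a P400 would have to be measured.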

Last edited by videoh; 9th May 2020 at 19:39.
9th May 2020, 16:12   #6
alxscott
Registered User
 
Join Date: Aug 2013
Posts: 9
Quote:
Originally Posted by videoh View Post
Any gain is dwarfed by the overhead of sending the frame up to the GPU and back for your card. richardpl is quite right. For comparison, my 2080 Ti has 4352 cores.
Well, that's a few more cores! Lol

I was under the impression that, due to the flags in the command, everything was kept in the GPU memory buffer as it was hardware decoded, tonemapped, then hardware encoded?

No zscaling or the like that would need it to be transferred back? That's what I understood from the NVIDIA guide https://devblogs.nvidia.com/nvidia-f...scoding-guide/
9th May 2020, 16:24   #7
videoh
Useful n00b
 
Join Date: Jul 2014
Posts: 1,667
I looked up your card and it is not so bad: 256 CUDA cores, PCIe 3.0, GDDR5, 64-bit memory interface. Seems like you should be able to get some gain. I don't know enough about ffmpeg to know if it can properly tap the card's power in your use case. Is it using CUDA cores for the tone mapping and/or resizing?

Last edited by videoh; 9th May 2020 at 16:27.
9th May 2020, 16:32   #8
alxscott
Registered User
 
Join Date: Aug 2013
Posts: 9
Quote:
Originally Posted by videoh View Post
I looked up your card and it is not so bad: 256 CUDA cores, PCIe 3.0, GDDR5, 64-bit memory interface. Seems like you should be able to get some gain. I don't know enough about ffmpeg to know if it can properly tap the card's power in your use case. Is it using CUDA cores for the tone mapping and/or resizing?
As far as I know, the command posted should just be using the CUDA cores for tone mapping, with no resizing. Resizing is on my "would be nice but not essential" list, as I'm just trying to have an SDR version of the media for remote viewing when Plex transcodes due to a slow connection. At the minute the server can cope with the transcode, but there's no tonemapping, so the colours are washed out.
9th May 2020, 16:36   #9
videoh
Useful n00b
 
Join Date: Jul 2014
Posts: 1,667
It's using opencl and who knows what that is doing?

If you have Avisynth+ installed you could use my CUDA DGHDRtoSDR() filter and play the script in your player (if it supports that), or transcode by opening the script with your encoder.

EDIT: Oops, you are on linux.

Last edited by videoh; 9th May 2020 at 16:40.
9th May 2020, 16:52   #10
alxscott
Registered User
 
Join Date: Aug 2013
Posts: 9
Quote:
Originally Posted by videoh View Post
It's using opencl and who knows what that is doing?

If you have Avisynth+ installed you could use my CUDA DGHDRtoSDR() filter and play the script in your player (if it supports that), or transcode by opening the script with your encoder.

EDIT: Oops, you are on linux.
Haha, I'd seen your filter and Avisynth+ earlier in the week and then had the same realisation! Thanks for the suggestion though!
9th May 2020, 16:59   #11
RanmaCanada
Registered User
 
Join Date: May 2009
Posts: 328
AFAIK, Plex, Emby, Jellyfin, et al. do NOT support this. In fact none of them can get it right because it's not currently possible. Intel is adding this feature into Ice Lake.

If you want to watch these movies, watch them on the proper hardware.
9th May 2020, 17:06   #12
alxscott
Registered User
 
Join Date: Aug 2013
Posts: 9
Quote:
Originally Posted by RanmaCanada View Post
AFAIK, Plex, Emby, Jellyfin, et al. do NOT support this. In fact none of them can get it right because it's not currently possible. Intel is adding this feature into Ice Lake.

If you want to watch these movies, watch them on the proper hardware.
Plex, Emby, Jellyfin etc. definitely do not support server-side tonemapping.

Plex, where it uses mpv as its player, can tone map to SDR client side (using the usual filters: Hable, Mobius, Linear etc.).

This is why I'm using FFmpeg separately to generate an HDR -> SDR version. I know this is possible as I have done it using both software and hardware, as shown above. My question is for a more optimised command, as hardware transcoding is currently running slower than CPU transcoding.

EDIT: Just to mention I am using proper hardware; 2019 NVIDIA Pro and Dolby Atmos capable surround sound. This is for when I need to remote view.
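
For readers unfamiliar with the Hable operator named in this thread, here is a minimal Python sketch of the curve as John Hable published it (the "Uncharted 2" filmic curve), normalised so that the assumed peak maps to 1.0. The constants are Hable's published ones; that tonemap_opencl uses exactly these values, and the peak of 10.0 used in the example, are assumptions rather than something stated in the thread.

```python
def hable(x: float) -> float:
    """Hable filmic curve with John Hable's published constants."""
    a, b, c, d, e, f = 0.15, 0.50, 0.10, 0.20, 0.02, 0.30
    return (x * (a * x + c * b) + d * e) / (x * (a * x + b) + d * f) - e / f


def tonemap_hable(signal: float, peak: float) -> float:
    """Map a linear-light value in [0, peak] to [0, 1]."""
    return hable(signal) / hable(peak)


PEAK = 10.0  # assumption: signal peak relative to SDR reference white

for value in (0.0, 0.5, 1.0, 5.0, PEAK):
    print(f"{value:5.1f} -> {tonemap_hable(value, PEAK):.3f}")
```

The curve compresses highlights smoothly instead of clipping them, which is why it is a common choice for HDR -> SDR conversion. Mobius, by contrast, leaves values below a knee untouched and only compresses above it.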
9th May 2020, 19:01   #13
richardpl
Registered User
 
Join Date: Jan 2012
Posts: 272
Well, if other, much simpler OpenCL filters, like avgblur_opencl, are also slower than the CPU variant, then nothing can be done.
9th May 2020, 19:19   #14
videoh
Useful n00b
 
Join Date: Jul 2014
Posts: 1,667
Quote:
Originally Posted by richardpl View Post
Well, if other, much simpler OpenCL filters, like avgblur_opencl, are also slower than the CPU variant, then nothing can be done.
Why not screw opencl and use real CUDA? Too hard for ffmpeg devs? Not as lucrative as AC4 decoding?

Last edited by videoh; 9th May 2020 at 20:03.
9th May 2020, 21:19   #15
alxscott
Registered User
 
Join Date: Aug 2013
Posts: 9
Quote:
Originally Posted by richardpl View Post
Well, if other, much simpler OpenCL filters, like avgblur_opencl, are also slower than the CPU variant, then nothing can be done.
That's a really good shout; I hadn't tried a different filter to see! I'll see how avgblur performs!
12th May 2020, 14:05   #16
alxscott
Registered User
 
Join Date: Aug 2013
Posts: 9
Solved

I managed to find a good compromise between quality and speed, with satisfactory results, which for me meant FHD SDR tonemapped output.

For anyone else; this is the command I've decided to use going forward:

Code:
ffmpeg -vsync 0 -hwaccel cuda -init_hw_device opencl=ocl -filter_hw_device ocl -extra_hw_frames 3 -threads 16 -c:v hevc_cuvid -resize 1920x1080 -i INPUT -vf "format=p010,hwupload,tonemap_opencl=tonemap=mobius:param=0.01:desat=0:r=tv:p=bt709:t=bt709:m=bt709:format=nv12,hwdownload,format=nv12" -c:a copy -c:s copy -c:v libx264 -max_muxing_queue_size 9999 OUTPUT
This completed a conversion from a 4K UHD disc rip MKV in 67m27s, compared to a file runtime of 2h01m49s, so an average conversion rate of 1.8x.
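
The 1.8x figure can be checked with a couple of lines of Python. This only uses the two durations quoted in the post; the helper name to_seconds is made up for illustration.

```python
def to_seconds(hours: int, minutes: int, seconds: int) -> int:
    """Convert an h/m/s duration to whole seconds."""
    return hours * 3600 + minutes * 60 + seconds


runtime_s = to_seconds(2, 1, 49)    # file runtime: 2h01m49s
encode_s = to_seconds(0, 67, 27)    # conversion time: 67m27s

speed = runtime_s / encode_s        # > 1.0 means faster than real time
print(f"{runtime_s} s / {encode_s} s = {speed:.2f}x")
```

That works out to 7309 s / 4047 s, or about 1.81x real time.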

Just as a note, I decided to use software encoding because of the significant quality increase for only a minimal increase in encoding time.

Thank you all.
17th May 2020, 18:23   #17
foxyshadis
ангел смерти
 
foxyshadis's Avatar
 
Join Date: Nov 2004
Location: Lost
Posts: 9,558
Have you tried without hwupload? I don't know for sure, but that might be forcing a download and re-upload, or it might be a no-op since you didn't explicitly download it.
20th May 2020, 08:31   #18
alxscott
Registered User
 
Join Date: Aug 2013
Posts: 9
Quote:
Originally Posted by foxyshadis View Post
Have you tried without hwupload? I don't know for sure, but that might be forcing a download and re-upload, or it might be a no-op since you didn't explicitly download it.
Hey! I did try removing the hwupload/hwdownload arguments, especially when I was using the GPU encoder, but it kept throwing an error. As you can tell, I'm no FFmpeg expert, but I think it was because I was changing colour space?
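
To make the role of hwupload and hwdownload easier to see, the -vf chain from the command in post #16 can be rebuilt stage by stage. The Python sketch below only assembles the string and runs nothing; the option values are copied from that post, and the comments reflect the documented purpose of each filter. That the decoder hands back system-memory frames in this particular command is an assumption, not something confirmed in the thread.

```python
# Options passed to tonemap_opencl, in the order used in post #16.
tonemap_opts = {
    "tonemap": "mobius",  # tone-mapping operator
    "param": "0.01",      # operator tuning parameter
    "desat": "0",         # highlight desaturation strength
    "r": "tv",            # output range (limited)
    "p": "bt709",         # output primaries
    "t": "bt709",         # output transfer characteristics
    "m": "bt709",         # output matrix
    "format": "nv12",     # 8-bit output pixel format
}
tonemap = "tonemap_opencl=" + ":".join(f"{k}={v}" for k, v in tonemap_opts.items())

stages = [
    "format=p010",   # ask for 10-bit software frames
    "hwupload",      # copy system-memory frames to the OpenCL device
    tonemap,         # HDR -> SDR on the GPU
    "hwdownload",    # copy the result back to system memory
    "format=nv12",   # software pixel format handed to the encoder
]
vf = ",".join(stages)
print(vf)
```

Since tonemap_opencl operates on OpenCL hardware frames, dropping hwupload would feed it frames of the wrong type whenever the decoder output lives in system memory, which would be consistent with the error described above.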

Tags
hdr, nvidia, opencl, sdr, tonemap
