H264 and fps [Archive] - Doom9's Forum

morris_horesh

26th March 2012, 19:10

Hi,

Let us assume that we configure the H264 encoder with the following settings:
- disable all rate control features (no auto/min/max bitrate, no auto/min/max GOP, no VBV lookahead and so forth)
- configure a fixed quantization value (say 25)
- configure a fixed GOP size (>1, say 10 frames)

Let us assume that for every frame (audio and video), the container/wrapper adds a timestamp relative to the beginning of the video as the frame's metadata (if I'm not mistaken, FLV uses these timestamps).

We will get a stream with the following properties:
1. a stream that the encoder can encode and the decoder can decode.
2. a stream that has a constant quality (fixed quantization)
3. a stream that has a variable bitrate (no rate control)
4. a stream whose bitrate is low compared to just sending I-frames with the same quantization value (we do have P-frames because GOP > 1)
6. a stream that a player can play and sync correctly (we have timestamps for every frame)
7. a stream that requires neither the encoder nor the decoder to know or care about the frame rate.

Am I correct?

Brother John

26th March 2012, 21:58

1. a stream that the encoder can encode and the decoder can decode.
Yes, if you have a non-broken encoder and decoder. But that is not a result of your settings; that’s generally true. If an encoder creates an H.264-compliant video stream with a certain profile/level, an H.264-compliant decoder that supports that profile/level is able to decode it.

2. a stream that has a constant quality (fixed quantization)
No. Fixed quantizer means fluctuating quality.

3. a stream that has a variable bitrate (no rate control)
Yes. Though that doesn’t imply that enabling rate control would be any less VBR.

4. a stream whose bitrate is low compared to just sending I-frames with the same quantization value (we do have P-frames because GOP > 1)
Depends a bit on the video contents. But in all likelihood: yes.

6. a stream that a player can play and sync correctly (we have timestamps for every frame)
Assuming a non-broken player that knows how to handle your chosen container and time stamp scheme and supports the necessary H.264 profile/level: yes.

7. a stream that requires neither the encoder nor the decoder to know or care about the frame rate.
In a way timestamps and framerate are the same thing.

What are you trying to get at?

morris_horesh

26th March 2012, 22:14

Thanks for the reply.

Why does a fixed quantizer mean fluctuating quality?

"In a way timestamps and framerate are the same thing" - not necessarily. If the source does not produce frames at a constant rate then there IS no frame rate. At least not in practical terms.

The question at this point is theoretical. I am thinking of a source that produces frames at a rate that is not constant.
In any case, at this point I am not interested in whether or not it would be a good idea to remove the frame rate and work with constant quantization and GOP values. I am interested in whether or not it can be done and whether or not I can expect a lower bitrate than just sending I-frames.
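A small Python sketch of Morris's point, using made-up millisecond timestamps of the kind a container such as FLV might store (the values and helper names are hypothetical): per-frame durations follow directly from the timestamps, yet no single frame rate describes the stream.

```python
from fractions import Fraction

# Hypothetical capture times in milliseconds from a source with no fixed rate
timestamps_ms = [0, 40, 80, 3080, 9080, 9120]

def frame_durations(ts):
    """Display duration of each frame = gap to the next timestamp (the last frame has none)."""
    return [b - a for a, b in zip(ts, ts[1:])]

def instantaneous_fps(ts):
    """Exact frames-per-second value implied by each frame's duration."""
    return [Fraction(1000, d) for d in frame_durations(ts)]

print(frame_durations(timestamps_ms))  # [40, 40, 3000, 6000, 40]
print(sorted(set(instantaneous_fps(timestamps_ms))))
```

A player only needs each timestamp to schedule presentation; the implied "rate" here ranges from 25 fps down to one frame every six seconds.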

LoRd_MuldeR

26th March 2012, 22:33

Why does a fixed quantizer mean fluctuating quality?

Because constant quantizers, as the name implies, can not adapt to the content of the image at all.

Encode the same clip with x264 twice: once in CRF (constant rate factor, average quantizer) mode and once in CQ (constant quantizer, no adaptive quantization) mode!

The difference should be quite obvious ;)

morris_horesh

26th March 2012, 22:38

Guest

26th March 2012, 22:58

Do the encoder and decoder have to know what the frame rate is?

No. The frame rate can *optionally* be signalled using VUI data in the elementary stream (ES).
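For reference, when the optional VUI timing fields are present (timing_info_present_flag set), a nominal frame rate is commonly derived as time_scale / (2 × num_units_in_tick), since a progressive frame normally spans two clock ticks. A minimal Python sketch of that arithmetic, assuming progressive video (the function name is hypothetical and this is not a bitstream parser). For VFR content the timing fields can be left out, or fixed_frame_rate_flag left at 0.

```python
from fractions import Fraction

def vui_frame_rate(num_units_in_tick: int, time_scale: int) -> Fraction:
    """Nominal frame rate implied by H.264 VUI timing info.

    One clock tick lasts num_units_in_tick / time_scale seconds; assuming
    progressive frames that each span two ticks gives the factor of 2.
    """
    return Fraction(time_scale, 2 * num_units_in_tick)

print(vui_frame_rate(1, 50))        # 25
print(vui_frame_rate(1001, 60000))  # 30000/1001 (about 29.97)
```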

Dark Shikari

26th March 2012, 23:01

"Because constant quantizers, as the name implies, can not adapt to the content of the image at all." - that would mean, I think, various frame SIZES (in bytes) not perceptual image qualities.

Quantizer != perceptual image quality.

Often the quantizer may have to be varied to maintain constant perceptual image quality.
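A toy Python model of that idea, as a simplified stand-in for variance-based adaptive quantization (loosely in the spirit of x264's VAQ, not its actual formula; the function name, strength and block variances are all hypothetical): flat blocks, where artifacts are most visible, get a lower QP than busy texture. Nothing in it depends on the frame rate.

```python
import math

def aq_offsets(block_variances, strength=1.0):
    """Toy variance-based AQ: QP offset per block relative to the frame average.

    Low-variance (flat) blocks get negative offsets (finer quantization),
    high-variance (detailed) blocks get positive ones.
    """
    logs = [math.log2(max(v, 1.0)) for v in block_variances]
    mean_log = sum(logs) / len(logs)
    return [strength * (l - mean_log) for l in logs]

base_qp = 25
variances = [4.0, 64.0, 1024.0]  # made-up: flat sky, medium texture, busy foliage
offsets = aq_offsets(variances)
qps = [round(base_qp + o) for o in offsets]
print(qps)  # [21, 25, 29]
```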

26th March 2012, 23:08

Do the encoder and decoder have to know what the frame rate is ?

Constant framerate is not required. x264 supports variable framerate encoding and there shouldn't be many problems playing back VFR MP4 or MKV files on computers and devices that support them. FLV might work too.

However, many common devices (Blu-ray players, TVs, smartphones, ...) have very specific feature sets and they don't necessarily play VFR video properly.

LoRd_MuldeR

26th March 2012, 23:09

"Because constant quantizers, as the name implies, can not adapt to the content of the image at all." - that would mean, I think, various frame SIZES (in bytes) not perceptual image qualities.

Look at these:

Constant Quantizers:
http://img195.imageshack.us/img195/6871/parkruncq.png

Adaptive Quantizers:
http://img804.imageshack.us/img804/2082/parkrunaq.png

Or watch it in motion:
http://www.mediafire.com/file/jhsh617h6s6976q/CQ-vs-AQ.rar

morris_horesh

26th March 2012, 23:28

Well, I do see the difference but in the context of what I'm doing this means the same perceptual image quality. If I did to human beings what I normally do to images I would end up behind bars for quite a while :)

Thanks

LoRd_MuldeR

26th March 2012, 23:34

Well, I do see the difference but in the context of what I'm doing this means the same perceptual image quality.

Well, there are various ways of objectively measuring "quality" indeed (PSNR, SSIM, etc). Some of these metrics predict the perceived quality better than others.

But "perceptual image quality" is quality as perceived by a human being. And, from the example, it is clear that the AQ version has much better perceptual image quality than the CQ version.

I doubt that you will find a single person (with "normal" visual capacity) who wouldn't prefer the AQ version over the CQ version here... ;)

morris_horesh

26th March 2012, 23:40

I can still use AQ without knowing the frame rate though, can't I?

LoRd_MuldeR

26th March 2012, 23:43

I can still use AQ without knowing the frame rate though, can't I?

Rate-control, as implemented in x264 (qcomp/MB-Tree), does take into account the frame rate. But I think the VAQ algorithm itself can be applied without knowing frame rate.
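A deliberately simplified Python sketch of why timing matters to bitrate-based rate control but not to a fixed quantizer (hypothetical helpers; this is not how x264's qcomp/MB-tree rate control works internally): under a bitrate target, the bits available for a frame scale with how long it stays on screen, whereas a constant QP never looks at timestamps.

```python
def bit_budgets(durations_ms, target_kbps):
    """Bits each frame 'owns' under a bitrate target: kbit/s x ms = bits."""
    return [target_kbps * d for d in durations_ms]

def constant_qp(durations_ms, qp):
    """Constant QP: every frame gets the same quantizer; durations are ignored."""
    return [qp for _ in durations_ms]

durations = [40, 40, 3000, 6000]     # hypothetical VFR frame durations (ms)
print(bit_budgets(durations, 1000))  # [40000, 40000, 3000000, 6000000]
print(constant_qp(durations, 25))    # [25, 25, 25, 25]
```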

davidbitton

28th March 2012, 10:57

hello

Morris talked about me (David) without really knowing what I said.

First, here is what Morris really said:

"I took the time to download the x264 code from videolan. If you look at the code you realize that the fps is only needed for rate control (look for i_fps_num). If you disable rate control features then the fps value is ignored throughout the code.

By disabling rate control I mean no average or max bitrate, no automatic min GOP size (also referred to as min key frame interval), no vbv lookahead and so forth.

And yes - the encoder does create P frames in this mode.
"

Morris thinks that if he encodes frames at unequal time intervals, he will get I P P P frames. Now imagine that you take a first frame at time T, a second frame at time T + 3 seconds and another frame at T + 9 seconds: with such intervals the encoder will always detect a scene change and force an I-frame. So how can you agree with Morris when he says that you will get IPPP frames?

Second, the GOP: if you force a GOP while frames arrive at unequal time intervals, how can you set the GOP? So how can Morris be right in taking frames at unequal intervals and forcing IPPPP or IPB frames...

28th March 2012, 12:09

First, here is what Morris really said:

"I took the time to download the x264 code from videolan. If you look at the code you realize that the fps is only needed for rate control (look for i_fps_num). If you disable rate control features then the fps value is ignored throughout the code.

By disabling rate control I mean no average or max bitrate, no automatic min GOP size (also referred to as min key frame interval), no vbv lookahead and so forth.

But there's no need to disable any rate control features to encode variable framerate video with x264. Just use constant QP mode if that's what you want.

Morris thinks that if he encodes frames at unequal time intervals, he will get I P P P frames. Now imagine that you take a first frame at time T, a second frame at time T + 3 seconds and another frame at T + 9 seconds: with such intervals the encoder will always detect a scene change and force an I-frame

Why would it always do so? If you encoded a static scene, there certainly wouldn't be a scenecut in that interval with the default keyint setting of 250 frames.

Second, the GOP: if you force a GOP while frames arrive at unequal time intervals, how can you set the GOP? So how can Morris be right in taking frames at unequal intervals and forcing IPPPP or IPB frames...

GOP size (keyframe interval) is determined in number of frames, so I don't see how using VFR would matter, unless you want to specify GOP size as a time interval?
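A short Python sketch of that point, assuming a fixed GOP with no scenecut detection (hypothetical helper names and timestamps): keyframe positions depend only on frame indices, so both GOPs below hold four frames even though one spans about 9 seconds and the other 160 ms.

```python
def keyframe_indices(n_frames, keyint):
    """Frame indices that start a GOP when keyframes are placed every `keyint` frames."""
    return list(range(0, n_frames, keyint))

def gop_durations_ms(timestamps_ms, keyint, end_ms):
    """Wall-clock length of each GOP; `end_ms` is when the last frame stops being shown."""
    starts = [timestamps_ms[i] for i in keyframe_indices(len(timestamps_ms), keyint)]
    bounds = starts + [end_ms]
    return [b - a for a, b in zip(bounds, bounds[1:])]

# Hypothetical VFR timestamps: eight frames with irregular gaps
ts = [0, 40, 80, 3080, 9080, 9120, 9160, 9200]
print(keyframe_indices(len(ts), 4))   # [0, 4]
print(gop_durations_ms(ts, 4, 9240))  # [9080, 160]
```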

davidbitton

28th March 2012, 13:25

As you know, video encoders, and especially H.264, use intra and inter frames, and H.264 works in slice mode too!!!
Inter frames use motion estimation and compensation....

When you take a first frame and the second one is taken 10 seconds later, for example, there is a good chance that these frames are really different. As you know, motion estimation searches for the same macroblock in the current frame and the previous one; if there is a lot of change between the frames, the encoder decides to encode the frame as an I frame...

On this subject I point you to http://forum.doom9.org/showthread.php?t=141791, for example, and if you want an excellent paper, go to

http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=5604298

Scene Change Aware Intra-Frame Rate Control for H.264/AVC

ABSTRACT

Most of rate-control research focuses on inter-coded frames, instead of intra-coded frames which are more possible to cause the problem of buffer overflow. This letter presents a rate control algorithm for intra-frame coding. We propose a Taylor series-based rate-QS model and a scene-change aware rate-QS model to determine quantization parameters for general intra frames and scene-change frames, respectively. Simulation results show that compared to competed approaches, the proposed method achieves better and stable quality with low buffer fullness.

for example ...

28th March 2012, 13:35

When you take a first frame and the second one is taken 10 seconds later, for example, there is a good chance that these frames are really different. As you know, motion estimation searches for the same macroblock in the current frame and the previous one; if there is a lot of change between the frames, the encoder decides to encode the frame as an I frame...

Sure, if you are encoding a moving scene. But you said nothing about the content, so I gave a counterexample of encoding static scenes (webcam footage of your yard, for example), where a scenecut probably wouldn't get detected that soon. Such a source would also be a more likely candidate to be encoded at your <1fps framerates.

And you can always set a minimum GOP size if you want to force those P frames there. In the original post Morris talked about fixed GOP size where you wouldn't have scenecut detection anyway.
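nm's argument can be shown with a toy decision rule in Python. It assumes a scenecut test that compares the estimated cost of coding a frame with prediction against coding it as intra (the function, threshold and costs are hypothetical, a simplified stand-in rather than x264's actual code). The time gap between the frames is not an input; only how much the picture changed matters. With a fixed GOP and scenecut detection disabled, as in Morris's original setup, this decision is not made at all.

```python
def choose_frame_type(inter_cost, intra_cost, threshold=0.4):
    """Toy scenecut decision: pick an I-frame when motion-compensated prediction
    saves less than `threshold` of the intra cost. Simplified; a real encoder
    also weighs the distance from the last keyframe and other factors."""
    return "I" if inter_cost >= (1.0 - threshold) * intra_cost else "P"

# Hypothetical cost estimates for frames captured 10 seconds after their predecessor
static_yard = choose_frame_type(inter_cost=1200, intra_cost=10000)  # little changed
busy_street = choose_frame_type(inter_cost=9000, intra_cost=10000)  # mostly new content
print(static_yard, busy_street)  # P I
```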

davidbitton

28th March 2012, 13:56

OK,
first of all, Morris did not talk about me!!!
I apologize to Morris;
I just joined the discussion.
For example, a security scene can be a good candidate: for a long time you have zero movement and then suddenly a lot of movement.

Several projects have tried to find a solution along these lines, and I refer you to University College London, which did amazing work with the vic video conferencing tool and RAT (Robust Audio Tool).

One of the points of vic is the ability to change the fps, I mean adaptively (I won't go into the Berkeley RTP they use!!!), and vic uses x264, which you can configure with a minimum and maximum GOP.

davidbitton

29th March 2012, 19:41

Dear nm

I was a little busy, but I really don't understand your approach; you, Morris and LoRd_MuldeR think the same.
So first, I give you this link:

http://forum.doom9.org/showthread.php?t=113040

Second, from this link, for example: http://webcache.googleusercontent.com/search?q=cache:92zf9DhnBOsJ:htexmexh.byethost13.com/linux/ffmpeg-howto+gop+fps&cd=2&hl=en&ct=clnk&gl=il
you can read:
Intra frames per sec = GOP / FPS of input file
IPS = GOP / FPS

10 = 300 / 30

More GOP = More IPS = More compression = Less quality
Less GOP = Less IPS = Less compression = More quality

The default is 300 for many programs. But if you want a constant IPS, then do: GOP = IPS * FPS (input) = 10 * FPS (input)

So for best quality GOP = 10 * input file FPS

and this link too:

http://www.unified-streaming.com/support/documentation/encoding-profiles/

and finally this one:

open-gop
Default: none
Open-GOP is an encoding technique which increases efficiency. Some decoders don't fully support open-GOP streams, which is why it hasn't been enabled by default. You should test with all decoders your streams will be played on, or (if that's impossible) wait until support is generally available.
There's an explanation of Open-GOP here.

So now your friend Morris says that you and Morris disable rate control, but then you need to pick some arbitrary quantizer value; by setting a quantizer you get lossless compression, and your P frames will depend on the interval at which you take each sample, and so will your GOP. So how can you say "yes, I disable the fps and set the GOP"?

The second point is that you talk about a video encoder. Perhaps you and Morris think that you can use a video encoder for streaming still images; I think that is really strange.

Also, your friend Morris says no fps, so no timing information is sent. This is video, no!!!! People created video encoders to encode video... Perhaps that seems strange to you, Morris and others in this group, but video means fps, and it means using motion compensation, you know, that little invention used in video encoders. From your point of view this module is obsolete, so before trying to prove our friend Morris right at any price, try to think first...

30th March 2012, 16:20

I was a little busy, but I really don't understand your approach; you, Morris and LoRd_MuldeR think the same.
So first, I give you this link:

http://forum.doom9.org/showthread.php?t=113040

That's about GOP size of MPEG-2 video on DVD. Hardly relevant for our discussion.

second from this link for example :http://webcache.googleusercontent.com/search?q=cache:92zf9DhnBOsJ:htexmexh.byethost13.com/linux/ffmpeg-howto+gop+fps&cd=2&hl=en&ct=clnk&gl=il
you can read
Intra frames per sec = GOP / FPS of input file

That's an error. GOP size / FPS = keyframe interval in seconds, not keyframes per second.
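Plugging in the quoted example (GOP of 300 frames at 30 fps), a quick Python check: the ratio is the number of seconds between keyframes, and its reciprocal is keyframes per second. Exact fractions are used so that rates such as 30000/1001 also work; the function names are illustrative only.

```python
from fractions import Fraction

def keyframe_interval_s(gop_frames, fps):
    """Seconds between keyframes for a fixed GOP length at a constant frame rate."""
    return Fraction(gop_frames) / Fraction(fps)

def keyframes_per_s(gop_frames, fps):
    """Keyframes per second: the reciprocal of the interval above."""
    return 1 / keyframe_interval_s(gop_frames, fps)

print(keyframe_interval_s(300, 30))  # 10
print(keyframes_per_s(300, 30))      # 1/10
```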

More GOP = More IPS = More compression = Less quality
Less GOP = Less IPS = Less compression = More quality

Misleading. I guess that's a suggestion for MPEG-2 and MPEG-4 ASP encoding where rounding errors propagate and very long keyframe intervals should be avoided. But short GOPs are also bad for quality.

H.264 doesn't have quality problems with very long GOPs but they are impractical when seeking or starting playback from the middle of a stream.
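To put a number on the seeking cost, a simplified Python sketch (hypothetical helper; it assumes a decoder must start at the nearest preceding keyframe, with closed GOPs and no B-frame reordering): with keyframes every 250 frames, showing the last of 3457 frames needs a couple of hundred decoded frames, while a single I-frame at the start means decoding the whole stream.

```python
def frames_to_decode_for_seek(target, keyframes):
    """Frames to decode before `target` can be shown, starting from the nearest
    preceding keyframe. Simplified: assumes closed GOPs, ignores B-frame reordering."""
    last_key = max(k for k in keyframes if k <= target)
    return target - last_key + 1

n = 3457  # same frame count as the construction-site encode later in the thread
print(frames_to_decode_for_seek(n - 1, range(0, n, 250)))  # keyint 250 -> 207
print(frames_to_decode_for_seek(n - 1, [0]))               # one I-frame -> 3457
```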

So now your friend Morris says that you and Morris disable rate control

1. I don't have anything to do with Morris or his project.
2. Apparently he thought that disabling rate control altogether helps with encoding VFR video.
3. I suggest not disabling any rate control features for VFR encoding unless you really know what you are doing. Just use the mode that is suitable for your purpose. CQP is a good choice for starting experiments.

but then you need to pick some arbitrary quantizer value; by setting a quantizer you get lossless compression

CQP (>0) is lossy, not lossless.
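The reason is that quantization rounds transform coefficients to coarser levels, and that rounding cannot be undone. A simplified scalar Python sketch, assuming the commonly cited approximation that the H.264 step size is about 0.625 at QP 0 and doubles every 6 QP (real encoders use integer transforms and scaling tables; the helper names and coefficients are hypothetical): at QP 25 the small coefficients are rounded away entirely.

```python
def qstep(qp):
    """Approximate H.264 quantizer step size: about 0.625 at QP 0, doubling every 6 QP."""
    return 0.625 * 2 ** (qp / 6)

def quantize_roundtrip(coeffs, qp):
    """Quantize transform coefficients to integer levels, then reconstruct them."""
    step = qstep(qp)
    levels = [round(c / step) for c in coeffs]  # what would be entropy-coded
    return [lvl * step for lvl in levels]       # what the decoder gets back

coeffs = [100.0, 13.0, -7.0, 3.0, 1.0]  # made-up coefficient values
recon = quantize_roundtrip(coeffs, 25)
print([round(r, 2) for r in recon])     # the two smallest coefficients come back as 0.0
```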

and your P frames will depend on the interval at which you take each sample, and so will your GOP. So how can you say "yes, I disable the fps and set the GOP"?

Again, x264's keyframe interval is specified in number of frames, not in seconds. It doesn't matter how far from each other those frames are in time.

The second point is that you talk about a video encoder. Perhaps you and Morris think that you can use a video encoder for streaming still images; I think that is really strange.

If you are referring to my example of "a static scene", I meant ordinary webcam footage where most of the frame usually doesn't change much even in minutes or hours. Webcam and security cam footage are very common applications for streaming video. Probably the majority of all video footage captured and stored today falls into this category.

davidbitton

31st March 2012, 12:32

You really don't answer. What is a GOP? A GOP is a group of pictures!!! A GOP can be IPPPPPPPPPPPP frames or IPB... frames. Do you know why people created P frames and B frames?
P frames were created to send the difference between the first and the second frame, etc...

You and Morris think that P frames are nothing; for you and Morris a P frame can be anything. This means that for you and Morris you can take an I frame at time T and the first P frame at time T + 2 hours. This is wrong!!!

B frames were created for frames that find the same macroblock in the previous frame and the next frame.

For you and Morris, the first frame can be taken at time T, the second at time T + 2 hours and the third at time T + 5 hours.

In a security system, no one takes the first frame at time T, the second at time T + 2 hours and the third at time T + 5 hours.

The GOP is a sort of FPS, because it forces you to create IPPPPPPPPPPPPPPPPPPPPPPP frames, for example, and thereby forces you to use regular time intervals.

I do not agree with you and Morris, and if you are so sure of your way, show me an example. You can write, write and write whatever you want; show me an example that works!!!!

davidbitton

31st March 2012, 13:22

I have done several video security projects, and you know, a camera sends you video, and a camera sends you a certain number of frames per second... so how can you decide to take the first frame at time T, the second at time T + 3 hours, etc.?

This is not realistic. Video security generally doesn't work the way you and Morris write; it works with video motion detection: when the video motion detection finds motion, the video encoder starts working with a special camera that runs at a set FPS.

Big companies work like this.

So Morris, who thinks he is the king of video encoders, wrote something wrong!!!

About long IPPPPPPPPPPP sequences, you are wrong, because you can find a lot of scientific papers that explain that each P frame introduces a granular error... and at low bitrates this creates big problems.

So Morris can write anything, but this is really wrong...

morris_horesh

31st March 2012, 20:15

David,

I asked whether or not the H.264 encoder MUST know the FPS in order to work properly and create P frames. The answer the forum members gave (an answer that I must say I expected) was no, it doesn't have to know that. You, I think, are dealing with a different question altogether - whether or not it is a good idea to use the encoder that way. To that the forum members (and I) answer - in most cases no, it is not a good idea, largely for the reasons you mentioned.

1st April 2012, 22:47

@morris_horesh: Agreed.

if you are so sure of your way, show me an example

Consider this video: http://vimeo.com/19660834
The original file (MJPEG within MOV) can be downloaded by registering and logging in to the site.

In that video, frames have been captured at 20 minute intervals (nights are cut out). Let's throw out two of every three frames just for fun to get a 1 frame/hour video (--vf select_every:3,0).
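What that filter does to the capture timeline can be modeled in a few lines of Python, assuming a frame every 20 minutes (hypothetical timestamps; the helper imitates the selection that select_every:3,0 performs, it is not x264 code): keeping one frame out of every three turns 20-minute gaps into 60-minute gaps.

```python
def select_every(frames, step, offset=0):
    """Keep the frame at position `offset` within every group of `step` frames."""
    return frames[offset::step]

capture_min = [20 * i for i in range(9)]  # hypothetical: one frame every 20 minutes
kept = select_every(capture_min, 3, 0)
print(kept)                                     # [0, 60, 120]
print([b - a for a, b in zip(kept, kept[1:])])  # [60, 60] minutes between kept frames
```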

1. Encode in 2-pass mode with SSIM tunings to get usable quality metrics:

x264 --vf select_every:3,0 --tune ssim --ssim --pass 1 --bitrate 6000 -o construction.default.pass1.mkv construction.mov
x264 --vf select_every:3,0 --tune ssim --ssim --pass 2 --bitrate 6000 -o construction.default.pass2.mkv construction.mov

Second pass log:
lavf [info]: 1000x668p 0:1 @ 30/1 fps (vfr)
x264 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 FastShuffle SSE4.1 Cache64
x264 [info]: profile High, level 3.1
x264 [info]: frame I:187 Avg QP:19.30 size:104270
x264 [info]: frame P:2678 Avg QP:22.02 size: 78494
x264 [info]: frame B:592 Avg QP:23.01 size: 50015
x264 [info]: consecutive B-frames: 65.8% 34.0% 0.2% 0.0%
x264 [info]: mb I I16..4: 9.3% 24.4% 66.4%
x264 [info]: mb P I16..4: 6.8% 19.1% 35.5% P16..4: 19.8% 12.8% 5.8% 0.0% 0.0% skip: 0.1%
x264 [info]: mb B I16..4: 3.8% 6.0% 15.9% B16..8: 24.6% 15.4% 6.1% direct:24.7% skip: 3.5% L0:35.0% L1:30.9% BI:34.1%
x264 [info]: 8x8 transform intra:29.8% inter:38.0%
x264 [info]: coded y,uvDC,uvAC intra: 89.2% 69.3% 18.8% inter: 88.6% 66.0% 2.4%
x264 [info]: i16 v,h,dc,p: 12% 32% 39% 17%
x264 [info]: i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 10% 28% 17% 5% 6% 4% 13% 5% 12%
x264 [info]: i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 13% 26% 13% 6% 8% 5% 14% 5% 10%
x264 [info]: i8c dc,h,v,p: 45% 33% 16% 6%
x264 [info]: Weighted P-Frames: Y:55.3% UV:44.2%
x264 [info]: ref P L0: 46.7% 27.2% 17.8% 5.5% 2.8%
x264 [info]: ref B L0: 85.8% 14.2% 0.0%
x264 [info]: ref B L1: 99.7% 0.3%
x264 [info]: SSIM Mean Y:0.9834460 (17.811db)
x264 [info]: kb/s:5999.18

encoded 3457 frames, 6.28 fps, 5999.20 kb/s

Note that on average x264 decided to use about 17 interframes (P or B) in a GOP.

2. Encode with same settings but force the encoder to use I-frames only:

x264 --vf select_every:3,0 --tune ssim --ssim --pass 1 --bitrate 6000 --keyint 1 -o construction.intra.pass1.mkv construction.mov
x264 --vf select_every:3,0 --tune ssim --ssim --pass 2 --bitrate 6000 --keyint 1 -o construction.intra.pass2.mkv construction.mov

Second pass log:

lavf [info]: 1000x668p 0:1 @ 30/1 fps (vfr)
x264 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 FastShuffle SSE4.1 Cache64
x264 [info]: profile High, level 3.1
x264 [info]: frame I:3457 Avg QP:24.73 size: 75014
x264 [info]: mb I I16..4: 8.6% 47.6% 43.8%
x264 [info]: 8x8 transform intra:47.6%
x264 [info]: coded y,uvDC,uvAC intra: 91.4% 66.0% 15.0%
x264 [info]: i16 v,h,dc,p: 11% 28% 42% 19%
x264 [info]: i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 9% 26% 14% 6% 7% 4% 16% 5% 13%
x264 [info]: i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 14% 27% 12% 6% 7% 5% 14% 5% 10%
x264 [info]: i8c dc,h,v,p: 46% 32% 16% 5%
x264 [info]: SSIM Mean Y:0.9757943 (16.161db)
x264 [info]: kb/s:5999.41

encoded 3457 frames, 15.60 fps, 5999.42 kb/s

With I-frames only, this 1 frame/hour video would need over 20% higher bitrate to reach the same SSIM quality as with interframes enabled. At lower bitrates the difference is even higher: 1000 kbps with interframes vs. over 1500 kbps intra-only.
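The frame counts in the two second-pass logs can be cross-checked with a few lines of Python (numbers copied from the logs; the dictionary layout is just for illustration). Both encodes contain 3457 frames, the default encode has about 17.5 interframes per I-frame, and at the same bitrate the weighted average frame size comes out at roughly 75 kB in both cases, so the two encodes spend the same bytes per frame and differ only in how efficiently those bytes are used.

```python
# (frame count, average size in bytes) per frame type, copied from the two logs
default = {"I": (187, 104270), "P": (2678, 78494), "B": (592, 50015)}
intra_only = {"I": (3457, 75014)}

def total_frames(stats):
    return sum(count for count, _ in stats.values())

def mean_frame_size(stats):
    """Average frame size in bytes, weighted by how many frames of each type there are."""
    return sum(count * size for count, size in stats.values()) / total_frames(stats)

interframes_per_gop = (default["P"][0] + default["B"][0]) / default["I"][0]
print(total_frames(default), total_frames(intra_only))  # 3457 3457
print(round(interframes_per_gop, 1))                     # 17.5
print(round(mean_frame_size(default)), mean_frame_size(intra_only))
```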

All I'm saying is that long intervals in time do not necessarily break interframe encoding for all types of content. P and B frames are useful whenever parts of the frame can be predicted from previous frames. That's often the case with statically mounted cameras even when frame capture intervals are very long.

About long IPPPPPPPPPPP sequences, you are wrong, because you can find a lot of scientific papers that explain that each P frame introduces a granular error... and at low bitrates this creates big problems.

Not with H.264. Let's use a single I frame and 3456 P/B frames to encode the same video used above:

x264 --vf select_every:3,0 --tune ssim --ssim --pass 2 --bitrate 6000 -I infinite --no-scenecut -o construction.inter.2.mkv construction.mov

lavf [info]: 1000x668p 0:1 @ 30/1 fps (vfr)
x264 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 FastShuffle SSE4.1 Cache64
x264 [info]: profile High, level 3.1
x264 [info]: frame I:1 Avg QP:19.27 size: 58733
x264 [info]: frame P:2864 Avg QP:21.85 size: 79974
x264 [info]: frame B:592 Avg QP:22.86 size: 50992
x264 [info]: consecutive B-frames: 65.8% 34.0% 0.2% 0.0%
x264 [info]: mb I I16..4: 30.0% 26.2% 43.8%
x264 [info]: mb P I16..4: 6.9% 19.4% 37.1% P16..4: 18.7% 12.3% 5.5% 0.0% 0.0% skip: 0.1%
x264 [info]: mb B I16..4: 3.8% 6.0% 16.1% B16..8: 24.6% 15.4% 6.2% direct:24.7% skip: 3.2% L0:35.0% L1:30.7% BI:34.3%
x264 [info]: 8x8 transform intra:30.0% inter:37.4%
x264 [info]: coded y,uvDC,uvAC intra: 89.0% 69.1% 18.3% inter: 88.9% 66.8% 2.6%
x264 [info]: i16 v,h,dc,p: 13% 32% 38% 17%
x264 [info]: i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 10% 28% 17% 5% 6% 4% 13% 5% 12%
x264 [info]: i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 13% 26% 13% 6% 8% 5% 14% 5% 10%
x264 [info]: i8c dc,h,v,p: 45% 33% 16% 6%
x264 [info]: Weighted P-Frames: Y:54.6% UV:44.0%
x264 [info]: ref P L0: 43.6% 29.0% 18.5% 5.8% 3.2%
x264 [info]: ref B L0: 85.8% 14.2% 0.0%
x264 [info]: ref B L1: 99.7% 0.3%
x264 [info]: SSIM Mean Y:0.9835787 (17.846db)
x264 [info]: kb/s:5998.62

encoded 3457 frames, 6.56 fps, 5998.63 kb/s

Note that SSIM is even higher than in the encode with default settings where x264 used 187 I-frames.
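The dB figure x264 prints next to the mean SSIM is consistent with -10·log10(1 − SSIM). This Python snippet reproduces the three values from the logs (17.811, 16.161 and 17.846 dB), which makes the comparison easier to read: the single-I-frame encode is marginally ahead of the default encode (about 0.035 dB), while intra-only trails by about 1.65 dB.

```python
import math

def ssim_db(ssim):
    """Express a mean SSIM value in decibels: -10 * log10(1 - SSIM)."""
    return -10 * math.log10(1 - ssim)

# Mean Y SSIM values copied from the three logs above
encodes = {"default GOPs": 0.9834460, "intra only": 0.9757943, "single I-frame": 0.9835787}
for name, ssim in encodes.items():
    print(f"{name}: {ssim_db(ssim):.3f} dB")
```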