You would only use 2-Pass mode to hit a target average bitrate, e.g. to fit a specific file size. For quality-based encoding, you should use 1-Pass CRF mode.
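To make the "target average bitrate" idea concrete, here is a rough sketch of the arithmetic behind a 2-Pass target. The function name and the example numbers are made up for illustration, and container overhead is ignored, so treat the result as an upper bound rather than an exact figure.

```python
def target_video_bitrate_kbps(size_mib: float, duration_s: float,
                              audio_kbps: float = 0.0) -> float:
    """Average video bitrate (kbit/s) that fills size_mib MiB in duration_s seconds.

    Hypothetical helper: container overhead is NOT accounted for.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    # MiB -> bytes -> bits -> kbit (1 kbit = 1000 bit)
    total_kbit = size_mib * 1024 * 1024 * 8 / 1000
    # Whatever the audio track uses is not available for video.
    return total_kbit / duration_s - audio_kbps


# Example: 90 minutes of video, 700 MiB target, 128 kbit/s audio.
bitrate = target_video_bitrate_kbps(700, 90 * 60, audio_kbps=128)
print(round(bitrate))
```

The resulting number is what you would pass as the bitrate for both passes of a 2-Pass encode.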
Once you have found the highest CRF value that still gives acceptable quality to your eyes (I recommend you start at ~20), you can use that CRF value for all future encodes.
And if you need to limit the "local" bitrate (usually only needed for hardware decoders), you can use CRF+VBV. Note that VBV needs both a BufSize and a MaxRate.
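As a small sketch of what CRF+VBV looks like on the x264 command line: the helper below only builds the option list and refuses to set one VBV value without the other. The helper name is hypothetical and the numbers in the example are placeholders, not recommendations; pick MaxRate and BufSize according to what your target decoder allows.

```python
from typing import List, Optional


def x264_crf_args(crf: float,
                  vbv_maxrate: Optional[int] = None,
                  vbv_bufsize: Optional[int] = None) -> List[str]:
    """Build x264 CLI options for CRF mode, optionally constrained by VBV.

    vbv_maxrate is in kbit/s, vbv_bufsize in kbit. Hypothetical helper.
    """
    # VBV only makes sense with both values, so reject a half-specified pair.
    if (vbv_maxrate is None) != (vbv_bufsize is None):
        raise ValueError("VBV needs both vbv_maxrate and vbv_bufsize")
    args = ["--crf", str(crf)]
    if vbv_maxrate is not None:
        args += ["--vbv-maxrate", str(vbv_maxrate),
                 "--vbv-bufsize", str(vbv_bufsize)]
    return args


# Placeholder values, for illustration only.
print(" ".join(["x264"] + x264_crf_args(20, 25000, 30000)
               + ["-o", "output.mkv", "input.avs"]))
```

Without the two VBV arguments the same helper gives a plain CRF encode, which is all you need when no hardware decoder is involved.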