[UPDATED] Audio FAQ

KpeX · 7th January 2004, 22:46

[tebasuna51 edit]
First of all, thanks to the original owner of this thread, KpeX (Last Activity: 4th March 2008 04:19). I want to preserve it but update many things to keep this FAQ useful.
Many links to pages or downloads are broken; I will mark them with ().
My edits always go between []. When I finish the job I'll rename the thread as UPDATED.
[edit end]

A/V Audio FAQ

Welcome to the reorganization of the Audio FAQs. These FAQs address many commonly asked questions. Your first resource in converting should be to follow the Doom9 guides or the more specific audio guides [I can no longer recommend these old guides]. Before posting in the audio forum, make sure you've read over any relevant guides and the FAQ in question. Several people worked very hard to make these FAQs an excellent source of information; see the last post in this thread for their credits.

The goal of these FAQs is to inform users about how to use the most common audio formats in the AV world and how to encode, decode, and play back their audio with the highest quality, most elegant, and most efficient procedures. Feel free to PM me or post to this thread regarding any corrections or additions.

If you're new to the Doom9 forums, it is strongly recommended to read these threads about our community's netiquette:
Message about Free Pies
Caps and Special Characters
Why a Good Title is Important

As always, search before posting and remember the forum rules; breaking them is a quick way to get your thread closed or deleted. Cheers and good luck with your audio.

General Audio FAQ

BeSweet FAQ

SVCD/MP2 Audio FAQ

MP3 FAQ

OGG Vorbis FAQ

AC3 & SPDIF FAQ

DTS FAQ

AAC FAQ

[Recommended software]

When this FAQ was written, many solutions were based on BeSweet, but its last version (1.5b31) dates from 2005-9-8 and the last DSPguru post was on 2005-10-7.
It is time to update the software recommendations, covering solutions born in this forum and tools from other free software developers.
BTW, BeSplit can still be used for some operations. Download the last version here: BeSplit v09b8

Dimzon made plugins for BeSweet, and also created (2005-12-27) a new tool, BeHappy, which uses AviSynth decoders and filters.
There is a big thread explaining it. It can still be useful because the GUI is easy to update whenever the AviSynth tools are updated.

After that (2007-5-18) madshi created eac3to, the subject of the biggest thread in this sub-forum (more than 15,000 replies and 10,000,000 views).
It can extract and manage audio streams from the multimedia containers VOB/EVO/(M2)TS and MKV.
Like BeSweet, it is a command line tool, but some GUIs are listed in the first post, and there is also UsEac3to.
However, the last version, v3.34, dates from 2017-11-17 and we can't expect new improvements (read the first post carefully for small updates).

I can't forget Foobar2000, a recommended audio player that also has many plugins to manage and encode audio.
It is supported at https://hydrogenaud.io/, a site fully recommended for audio discussions and tests.

And the free, open source audio editor Audacity, needed for the hard work.
The recommended DirectShow filters are now the LAV Filters, and they are included in a recommended player, MPC-HC (of course there are others: VLC, MPC-BE...).
To obtain info from multimedia files we need MediaInfo.
The recommended container for multimedia files is Matroska.
To rip DVDs (Video or Audio), and also BDs, use MakeMKV.

But the tool that brings together many free and open source multimedia projects is ffmpeg.
It supports many containers, AV formats and filters, and we can do many things with it.
Here we only talk about audio management. We can use a GUI like FFMPEG Audio Encoder or UsEac3to, but I want to give a short intro to ffmpeg CLI usage.
Maybe it can help new users understand part of the power of this tool. The command line syntax is:

Quote:

ffmpeg [global_options] {[input_file_options] -i input} ... {[output_file_options] output} ...

Where:

* [global_options] are generic options, for instance (only a few samples):

Code:

-v loglevel     set logging level (quiet, error, warning, info, ...)
-y              overwrite output files
-n              never overwrite output files
-stats          print progress report during encoding
-hide_banner    do not show ffmpeg version and libs included

* The {}... means that ffmpeg accepts many inputs (to mux, mix, merge, ...)
and can produce many outputs (to demux, split, ...)

* [input_file_options] are applied to each input. Just one useful sample:

Code:

-drc_scale 0        the AC3/DTS decoder must ignore DRC metadata to output the original volume
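
As a rough sketch of how the global and input options fit together, assuming a POSIX shell (sh/bash) and an AC3 input. The function name decode_nodrc and the file names are only illustrative, they are not part of ffmpeg:

```shell
# decode_nodrc IN OUT
# Decode IN to OUT (the extension of OUT picks the format, e.g. .wav).
#   -hide_banner   global option: skip the version/libs banner
#   -n             global option: never overwrite an existing OUT
#   -drc_scale 0   input option: placed before -i, so it applies to IN
decode_nodrc() {
    ffmpeg -hide_banner -n -drc_scale 0 -i "$1" "$2"
}

# Usage (hypothetical file names):
#   decode_nodrc input.ac3 output.wav
```

The position matters: an option applies to the next file named on the command line, so an input option has to be written before the -i it belongs to.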

* [output_file_options] apply to each output file. The syntax can be complex, composed of 4 parts:

[source to be decoded] [filters over uncompressed audio] [codec to encode] [encoder parameters]

All parts are optional; without them ffmpeg applies its defaults, for instance:

Quote:

ffmpeg -i input.mkv output.mp3

With no -map, ffmpeg selects one audio track of the mkv (the one with the most channels, or the first one if they tie; for instance a DTS-MA-7.1), decodes it, downmixes the 7.1 to 2.0 and encodes with libmp3lame at 128 kb/s (CBR unless another mode is requested).
But what if we want to decode the second audio track?

* [source to be decoded] This requires knowing the track order of the input files; most of the time the first track [0] is the video track and the audio tracks come after it.

Code:

-map i:t        where 'i' is the input file order (0 first) and 't' is the track order inside the file (0 first)
-bsf:a X_core   with -acodec copy, extract only the core, where X = dca/eac3/truehd
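
If the track order is not known, ffprobe (shipped with ffmpeg) can list it. A minimal sketch, assuming a POSIX shell; the function names list_audio_tracks and extract_core are made up for this example:

```shell
# list_audio_tracks FILE
# Print one line per audio stream: index,codec_name,channels
# The index printed is the 't' to use in -map 0:t
list_audio_tracks() {
    ffprobe -v error -select_streams a \
        -show_entries stream=index,codec_name,channels \
        -of csv=p=0 "$1"
}

# extract_core FILE TRACK X OUT
# Copy track TRACK without decoding, keeping only the core.
# X is dca, eac3 or truehd (the bitstream filter is named X_core).
extract_core() {
    ffmpeg -hide_banner -i "$1" -map "0:$2" -acodec copy -bsf:a "${3}_core" "$4"
}

# Usage (hypothetical file names):
#   list_audio_tracks input.mkv
#   extract_core input.mkv 1 dca core.dts
```

The core filters work on the compressed stream, which is why the sketch pairs them with -acodec copy instead of a decoder.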

Quote:

ffmpeg -i input.mkv -map 0:2 output.mp3

Decode the second audio track. But maybe we want to override the default downmix method, or resample the audio to create an audio CD, or ...
Now we need to apply filters to select our choices:

* [filters over uncompressed audio] can be simple or very complex, for instance:

Code:

-ar rate            set audio sampling rate (in Hz)
-ac channels        set number of audio channels (applies the default downmix method)
-vol volume         change audio volume (256=normal; newer builds prefer -af "volume=...")
-af filters         set audio filters, see examples below.
-filter_complex fc  when a complex syntax is required. Examples below.

Simple audio filter examples:

Code:

-af "aformat=sample_fmts=s16"           Reduce the bit depth to 16 bits
-af "adelay=delays=1600:all=1"          Delay 1600 milliseconds by inserting silence
-af "atrim=1.1:8.2"                     Keep only the audio between 1.1 and 8.2 seconds
-af "atempo=0.959041"                   Slow down audio (25->23.976) preserving the pitch
-af "aresample=50050, asetrate=48000"   Slow down audio (25->23.976) changing the pitch
-af "pan=stereo|FL=.3254c0+.2301c2+.2818c4+.1627c5|FR=.3254c1+.2301c2-.1627c4-.2818c5"  dplII downmix

As you can see, we can chain two simple filters with a comma ','
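
The 0.959041 and 50050 values in the slowdown examples are not magic numbers; they come from the frame rate ratio. A small sketch of the arithmetic, assuming a POSIX shell with awk, a 48000 Hz source, and taking 23.976 as exactly 24000/1001 (the variable names are only illustrative):

```shell
# Slowdown 25 -> 23.976 fps (23.976 is exactly 24000/1001)
# atempo factor = new_fps / old_fps
atempo=$(LC_ALL=C awk 'BEGIN { printf "%.6f", (24000/1001)/25 }')
# Intermediate rate for the pitch-changing method = 48000 * old_fps / new_fps
rate=$(LC_ALL=C awk 'BEGIN { printf "%.0f", 48000*25/(24000/1001) }')

echo "-af \"atempo=$atempo\""
echo "-af \"aresample=$rate, asetrate=48000\""
```

In the second method aresample only creates more samples for the same duration; asetrate then relabels them as 48000 Hz, which is what stretches the audio and lowers the pitch.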

Complex audio filter examples:

Quote:

-filter_complex "asplit [f][s]; [f] atrim=0:11 [ff]; [s] atrim=11, adelay=delays=2000:all=1 [ss]; [ff][ss] concat=v=0:a=1 [a]" -map "[a]"
Insert 2 seconds of silence after the first 11 seconds

-filter_complex "asplit [f][s]; [f] pan=3.1|c0=c0|c1=c1|c2=c2|c3=c3 [r]; [s] pan=stereo|c0=0.5*c4+0.5*c6|c1=0.5*c5+0.5*c7, compand=attacks=0:decays=0:points=-90/-84|-10/-4|-6/-2|-0/-0.3, aformat=channel_layouts=stereo [d]; [r][d] amerge [a]" -map "[a]"
Downmix 7.1 -> 5.1, preserving the volume and quality of the first four channels and downmixing only the surround channels, while trying to preserve the front/surround balance most of the time.

Let me explain the last, more complex one part by part (see also):

Code:

"asplit [f][s];                                                   The input is used twice, named [f] and [s]
[f] pan=3.1|c0=c0|c1=c1|c2=c2|c3=c3 [r];                          The first four channels stay untouched, named [r]
[s] pan=stereo|c0=0.5*c4+0.5*c6|c1=0.5*c5+0.5*c7,                 The last four channels are downmixed at half volume
compand=attacks=0:decays=0:points=-90/-84|-10/-4|-6/-2|-0/-0.3,   and amplified by 2 at low volumes
aformat=channel_layouts=stereo [d];                               The output is formatted as stereo and named [d]
[r][d] amerge [a]"                                                Merge the first 4 channels [r] with the 4 surround channels downmixed to 2 [d], named [a]
-map "[a]"                                                        Select [a] to be encoded afterwards

Here the semicolon ';' separates filter chains, each with its own named inputs and outputs

After processing the uncompressed samples, we select the output format or codec

* [codec to encode] See the docs for the full list.

Code:

-strict -2      Some experimental encoders need this parameter: mlp, truehd, dts
-acodec codec   And: aac, ac3, alac, copy(to extract), eac3, flac, mp2, mp3, opus, pcm_s24le, ...
-f format       For instance to specify the header of a pcm_s24le, we can select wav or w64

* [encoder parameters] See the docs for all the parameters allowed

Code:

-aq quality             set audio quality (VBR codec-specific)
-ab bitrate             set audio bitrate (CBR or ABR)
...
-center_mixlev 0.707    codec specific example for ac3 (recommended)
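
To see the parameters in context, here is a small sketch assuming a POSIX shell. The function names and the chosen values (640k, quality 2) are only examples, not recommendations from the original FAQ:

```shell
# encode_ac3 IN TRACK OUT
# Bitrate-based encode: AC3 at 640 kb/s with the center mix level set.
encode_ac3() {
    ffmpeg -hide_banner -i "$1" -map "0:$2" -acodec ac3 -ab 640k -center_mixlev 0.707 "$3"
}

# encode_mp3_vbr IN TRACK OUT
# Quality-based encode: the -aq scale is codec specific
# (for libmp3lame a lower number means higher quality).
encode_mp3_vbr() {
    ffmpeg -hide_banner -i "$1" -map "0:$2" -acodec libmp3lame -aq 2 "$3"
}

# Usage (hypothetical file names):
#   encode_ac3 input.mkv 1 output.ac3
#   encode_mp3_vbr input.mkv 1 output.mp3
```

Use -ab or -aq, not both: the first asks for a bitrate, the second for a quality level, and which one an encoder honours depends on the codec.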

But if you need a specific encoder or parameter, remember you can always use the 'pipe' method, for instance:

Code:

ffmpeg -i INPUT -map 0:T  FILTERS -acodec pcm_s24le -f wav - | qaac --ignorelength --adts --no-delay -V 91 -o OUTPUT.aac -

Remember to select a codec and format supported by the encoder's STDIN.

I hope this post works as an intro to ffmpeg; in the rest of the FAQ I add other examples of its use.
