Add a "Who's talking" display to video
Smite
20th March 2014, 02:25
I've been attempting this on my own but haven't gotten anything to work, and I haven't found a discussion that points me in the right direction. I'm looking for a way to show visually when there is sound in an audio clip. I'd appreciate any advice anyone might have. If you've ever seen a Mumble or Skype HUD showing who is speaking during a large call, that is essentially what I'm looking to achieve.
Background: I put together a lot of video game footage of multiple players in a grid format; the audio is their commentary. (example 4-person layout (http://i.imgur.com/ZaDb6ah.png)) Each player's audio is a separate file until I mix them together, and I think keeping them separate is necessary for what I want to achieve here.
So I've been trying to find a way with avisynth to do a simple effect for when an individual is talking, such as temporarily changing the color of their name or displaying a speaker icon. This would be beneficial because I generally create groups of 4-8 and it would be useful to see who is speaking visually.
I figure that if I can find a way to check an audio clip for dramatic changes in volume, I can possibly apply these periodic effects. Is there a way? I've tooled around with avisynth for a while now but I think what I'm looking to do is beyond me to work out.
Guest
20th March 2014, 02:45
That sounds hard to do in Avisynth. If I had to accomplish this to save my life, I would first preprocess the separate audio files to create a "talking map". Then I would write an avisynth filter to use the talking map to overlay a sprite(s) on the video in the appropriate position(s).
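As a rough illustration of the "talking map" preprocessing step, here is a minimal Python sketch (Python is an assumption; the post does not name a tool). It splits one speaker's audio into chunks of one video frame each and flags a frame as "talking" when the chunk's RMS level exceeds a threshold. The function names, the -50 dB default threshold, and the 16-bit mono WAV restriction are all hypothetical choices made for the example.

```python
import math
import sys
import wave
from array import array


def read_wav_mono16(path):
    """Read a 16-bit mono PCM WAV file; return (samples in -1..1, sample_rate)."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2 or w.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        rate = w.getframerate()
        pcm = array("h")
        pcm.frombytes(w.readframes(w.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    return [s / 32768.0 for s in pcm], rate


def rms_db(chunk):
    """RMS level of a chunk in dB relative to full scale (-inf for silence)."""
    if not chunk:
        return -math.inf
    mean_sq = sum(s * s for s in chunk) / len(chunk)
    return 10.0 * math.log10(mean_sq) if mean_sq > 0 else -math.inf


def talking_map(samples, sample_rate, fps, threshold_db=-50.0):
    """Return one bool per video frame: True where the audio exceeds the threshold."""
    per_frame = sample_rate / fps          # audio samples per video frame
    n_frames = int(len(samples) / per_frame)
    flags = []
    for n in range(n_frames):
        start = int(n * per_frame)
        end = int((n + 1) * per_frame)
        flags.append(rms_db(samples[start:end]) > threshold_db)
    return flags
```

The resulting list has one True/False entry per video frame and could be written to a text file for a filter or script to read; how the map is consumed on the AviSynth side is left open here.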
poisondeathray
20th March 2014, 03:24
Is it noise/speech vs silence ?
One way to do it is with conditionalfilter() and minmaxaudio.dll using AudioRMS()
You need 4 individual sets of audio & video, and you can combine them afterwards with stackhorizontal/stackvertical
The replacement "layer" can be anything: you can overlay a logo, make a pointer, overlay a border like Skype, or change the color whenever audio is detected above a threshold value
In this example, I used the alternating audio on/off from colorbars() channel 2 as the audio source, and the "replacement" whenever audio is detected is just a darkened version
colorbars(pixel_type="yv12")
trim(0,300)
getchannel(2)
orig=last
replace=orig.levels(0,0.2,255,0,255,false)
ConditionalFilter(orig,replace,orig,"AudioRMS(0)", ">", "-50", show=true)
raffriff42
20th March 2014, 08:43
@poisondeathray, good idea, but I have some refinements: variable transparency and a little decay time to cut down on flickering.
LoadPlugin("MinMaxAudio\Release\MinMaxAudio.dll")
A1=WavSource("a1.wav") ## uncompressed audio is a lot faster due to runtime analysis!
A2=WavSource("a2.wav")
#A3=...
AviSource("v.avi")
debug=true ## set to true for adjusting the mask windows
overlay_1 = Subtitle("VOICE ONE", x=24, y=32, size=56)
AudioLevelOverlay(A1, overlay_1,
\ 16, 26, 512, 80, showmask=debug)
overlay_2 = Subtitle("VOICE TWO", x=Width-524, y=32, size=56)
AudioLevelOverlay(A2, overlay_2,
\ Width-532, 26, 512, 80, showmask=debug)
#overlay_3 = ...
#AudioDub(final_audio_mix)
return Last
##################################
### show overlay clip only when there is audio
### http://forum.doom9.org/showthread.php?p=1674312#post1674312
##
## @ C - base clip
## @ A - audio
## @ O - overlay
## @ x, y, wid, hgt - position & size of mask window
## @ boost - overall level boost (fudge factor) (default 18)
## @ gate - ignore audio under (-gate+boost) dB; (default 20)
## NOTE "boost" and "gate" are shared among all instances of
## this function (thanks Gavino).
## * if the overlay does not get fully opaque, increase boost;
## * if the overlay shows up when it shouldn't, increase gate.
## For example, I used boost=24, gate=12 on a muddy source.
## @ showmask - for setting window size & position
##
function AudioLevelOverlay(clip C, clip A, clip O,
\ int x, int y, int wid, int hgt,
\ int "boost", int "gate", bool "showmask", string "mode")
{
Assert(O.Width==C.Width && O.Height==C.Height,
\ "AudioLevelOverlay: overlay must be same size as base clip")
global boost = Min(Max( 0, Default(boost, 18)), 24)
global gate = Min(Max(0, Default(gate, 20)), 60)
showmask = Default(showmask, false)
mode = Default(mode, "blend")
AudioDub(C, A.AmplifyDB(-6).AudioEcho.Normalize(1))
S = ScriptClip(Last.Crop(0, 0, wid, hgt), """
x = Min(Max(0, Round(AudioRMS(0))+gate+boost), 255)*255/gate
return Last.BlankClip(color=to_rgb(x))""")
M = Overlay(C.BlankClip, S, x=x, y=y).ConvertToY8
return (showmask)
\ ? C.Overlay(M, opacity=0.5, mode="add")
\ : C.Overlay(O, mask=M, mode=mode)
}
function AudioEcho(clip A, float "delay", float "mix") {
delay = Min(Max(0.01, Float(Default(delay, 0.33))), 5.0)
mix = Min(Max(0.0, Float(Default(mix, 0.33))), 1.0)
return A.MixAudio(A.AudioTrim(0, delay)+A, (1.0-mix), mix)
}
function to_rgb(int r, int "g", int "b") {
r = Min(Max(0, r), 255) ## thanks Gavino
g = Min(Max(0, Default(g, r)), 255)
b = Min(Max(0, Default(b, r)), 255)
return (r*65536) + (g*256) + b
}
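The script above appears to get its decay from the AudioEcho helper. For comparison, here is a minimal Python sketch of a different and simpler anti-flicker technique: a hold timer applied to a per-frame on/off list, so the indicator stays lit for a few frames after the voice drops out. The function name apply_hold and the default of 8 frames are hypothetical; this is not what the AviSynth script does internally.

```python
def apply_hold(flags, hold_frames=8):
    """Keep the indicator on for `hold_frames` extra frames after each True.

    `flags` is a list of bools, one per video frame (True = audio detected).
    """
    out = []
    remaining = 0  # frames of hold time left
    for talking in flags:
        if talking:
            remaining = hold_frames  # re-arm the hold on every active frame
            out.append(True)
        elif remaining > 0:
            remaining -= 1           # coast through a short gap
            out.append(True)
        else:
            out.append(False)
    return out
```

Short gaps between words are bridged, while a pause longer than the hold time still switches the indicator off.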
Gavino
20th March 2014, 11:37
Neat idea, raffriff42 (and poisondeathray for the basic method).
Note that the global variables will cause problems if you ever want to call AudioLevelOverlay() more than once in a script with different values of 'gate' and/or 'boost'. A better way to pass arguments into a run-time script is to use the 'args' parameter of the GRunT run-time filters.
Also, the setting of the mask levels seems to be incorrect.
x = Max(0, Round(AudioRMS(0))+gate+boost)*255/gate
If I understand, the intention is to make the overlay start to become visible for audio levels above -gate, and fully opaque when it reaches -boost. However, with this code it becomes visible at -(gate+boost) and the opacity overshoots 255 (and hence wraps around to zero) at -boost.
(Perhaps function to_rgb() should limit its arguments to 255).
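The arithmetic in the point above can be checked with a small worked example. The sketch below re-creates the quoted expression in Python (a stand-in for AviSynth's integer arithmetic, which is an assumption of this example) using the script's documented defaults gate=20 and boost=18. Integer dB inputs are used so that differences in rounding rules between the two languages do not matter. The function names are hypothetical.

```python
def mask_level(rms_db, gate=20, boost=18):
    """Unclamped mask level from: Max(0, Round(rms)+gate+boost)*255/gate."""
    return max(0, round(rms_db) + gate + boost) * 255 // gate


def clamped_level(rms_db, gate=20, boost=18):
    """Same level, limited to the valid 0..255 range."""
    return min(mask_level(rms_db, gate, boost), 255)


if __name__ == "__main__":
    # Walk through the interesting levels: below and at -(gate+boost),
    # at -boost, and just above it.
    for db in (-60, -38, -37, -18, -17, -10):
        print(db, mask_level(db), clamped_level(db))
```

With these defaults the level is 0 up to -38 dB, that is -(gate+boost), reaches exactly 255 at -18 dB, that is -boost, and exceeds 255 for anything louder unless it is clamped.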
ajk
20th March 2014, 12:44
An alternate suggestion - You could use a plugin such as AudioGraph() to add a visual representation of the audio on each of the clips. With a bit of other AviSynth scripting you could make it as visible or unobtrusive as you prefer.
raffriff42
20th March 2014, 23:43
Thanks for the help, Gavino. I've fixed (in blue) some of the issues you mention, but not this one:
A better way to pass arguments into a run-time script is to use the 'args' parameter of the GRunT run-time filters.
I confess I have not tried GRunT. How do I get started with it?
For now, I have left the globals as they are, with a caveat.
Re: gate & boost, I messed around trying to meet your specification, but then I realized I was originally thinking of a microphone "boost" switch on an audio mixer, active *before* the noise gate. So your second wording describes what the settings do, except that the wraparound problem is fixed.
Gavino
21st March 2014, 00:32
I confess I have not tried GRunT. How do I get started with it?
Basically, loading (or auto-loading) GRunT.dll replaces the built-in run-time filters (ScriptClip, etc) with extended versions that have a couple of extra arguments, providing (among other things) a simple, natural and robust way to pass variables into a run-time script from 'outside'.
In your function, you could use:
S = ScriptClip(Last.Crop(0, 0, wid, hgt), """
x = Min(Max(0, Round(AudioRMS(0))+gate+boost)*255/gate, 255)
return Last.BlankClip(color=to_rgb(x))""", args="gate,boost")
For more details, see the description in the GRunT thread (and the supplied doc), which should hopefully tell you all you need to know.
wonkey_monkey
21st March 2014, 17:37
An alternate suggestion - You could use a plugin such as AudioGraph()
Or waveform (http://forum.doom9.org/showthread.php?t=165703), which supports more (and looks nicer with some) colour spaces and has a few extra features :)
David
StainlessS
21st March 2014, 19:38
+1 on Waveform, much nicer.
raffriff42
22nd March 2014, 04:56
Waveform works nicely - for example you could size & move each audio waveform under the matching speaker's names.
Here's a demo video for the script above: I'm no animator, but I managed to prepare a still "base" image and 2 "glow" versions, one for each speaker.
https://www.dropbox.com/s/xtvr6xm1gt5wbnr/A%2BC-cartoon-2shot-568x320-C1.jpg?raw=1 (http://youtu.be/X4Bkx0HosPw)
"Who's talking" effect - avisynth (youtu.be) (http://youtu.be/X4Bkx0HosPw)
This task required Overlay mode="lighten" because the glow effects overlap one another, with no mask rectangle used (i.e., rectangle = full screen), so the script has a new "mode" option.
Another idea - modulate *position* instead of opacity; make an overlay "sprite" jiggle when someone talks.