Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules. |
12th April 2024, 13:28 | #1 | Link |
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 3,040
Multichannel 5.1 speed up + pitch adjustment
Hi there,
I have a 24p movie that I'm trying to speed up by ~4% and pitch adjust so that it becomes 25p. This is nothing new, as I've done it before for stereo content using TimeStretch(). For stereo content, to avoid having the left and right channels go out of phase and generate a flanger effect (i.e. the audio oscillating constantly from left to right), I use MergeChannels() before calling TimeStretch() so that both channels are processed together and TimeStretch() keeps them in phase. In other words: Code:
#Indexing 24p video & audio
video=LWLibavVideoSource("M11340.m2v")
Left=WAVSource("M11340_Left.wav")
Right=WAVSource("M11340_Right.wav")
audio=MergeChannels(Left, Right)
AudioDub(video, audio)

#Speed up + pitch adjustment 4% 48000Hz
ConvertAudioToFloat()
AssumeFPS(25, 1, false)
TimeStretch(tempo=100.0*25.0/(24000.0/1000.0))
SSRC(48000)
ConvertAudioTo24bit()

So far so good, but the question is: what happens with 5.1? Can I just do: Code:
#Indexing 24p video & audio
video=LWLibavVideoSource("M11340.m2v")
FL=WAVSource("M11340_01.wav")
FR=WAVSource("M11340_02.wav")
CC=WAVSource("M11340_03.wav")
LFE=WAVSource("M11340_04.wav")
LS=WAVSource("M11340_05.wav")
RS=WAVSource("M11340_06.wav")
audio=MergeChannels(FL, FR, CC, LFE, LS, RS)
AudioDub(video, audio)

#Speed up + pitch adjustment 4% 48000Hz
ConvertAudioToFloat()
AssumeFPS(25, 1, false)
TimeStretch(tempo=100.0*25.0/(24000.0/1000.0))
SSRC(48000)
ConvertAudioTo24bit()

And if so, what is TimeStretch() actually doing with the 5.1? Do I have to specify the audio layout in the frame properties? Is it gonna know that it's 5.1, or is it only gonna process the stream as 3 stereo pairs (i.e. FL FR // CC LFE // LS RS)? Last but not least, given that the output PCM 24-bit 48000Hz audio file muxed as an RF64 .wav is gonna be fed to a DolbyE encoder (Dolby DP600) which needs the 5.1+2.0, do I have to filter them separately, 'cause otherwise TimeStretch() is gonna make a boo boo, or can I just do something like: Code:
#Indexing 24p video & audio
video=LWLibavVideoSource("M11340.m2v")
FL=WAVSource("M11340_01.wav")
FR=WAVSource("M11340_02.wav")
CC=WAVSource("M11340_03.wav")
LFE=WAVSource("M11340_04.wav")
LS=WAVSource("M11340_05.wav")
RS=WAVSource("M11340_06.wav")
Left=WAVSource("M11340_Left.wav")
Right=WAVSource("M11340_Right.wav")
audio=MergeChannels(FL, FR, CC, LFE, LS, RS, Left, Right)
AudioDub(video, audio)

#Speed up + pitch adjustment 4% 48000Hz
ConvertAudioToFloat()
AssumeFPS(25, 1, false)
TimeStretch(tempo=100.0*25.0/(24000.0/1000.0))
SSRC(48000)
ConvertAudioTo24bit()

Thank you in advance. I'm currently trying to test stuff but I don't really trust my ears, you know...
12th April 2024, 19:26 | #3 | Link |
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 3,040
To be fair, the point isn't to try other things manually, but rather to know whether I should correct my AVS script or not, 'cause most of the time these are part of a completely automated workflow in FFAStrans, so... yeah... I kinda need to know.
Also, I'm extremely curious about the inner workings of TimeStretch() and the SoundTouch library.
13th April 2024, 09:38 | #4 | Link |
Registered User
Join Date: Mar 2011
Posts: 4,903
I thought the TimeStretch multichannel issues had been fixed. Is there still a phasing problem?
https://forum.doom9.org/showpost.php...2&postcount=20 The SoundTouch changelog says multichannel support was added in version 1.8.0, which was a fair while ago. http://www.surina.net/soundtouch/README.html
19th April 2024, 20:20 | #6 | Link |
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 3,040
Ok, so after a few tests, after checking with a Tektronix waveform monitor, and after sending the files out to the AVID ProTools guys, I came to the following conclusions:
1) In single-channel mode (i.e. if you have several mono tracks and you use TimeStretch() on each one of them individually), the filter doesn't work correctly, as it's unaware of the other channels. This means we get a flanger effect, with the audio oscillating from left to right due to the very small pitch variations between the left and the right channel.

2) In multi-channel mode those things don't happen. Effectively, if you have a stereo track, you must use MergeChannels(left, right) after indexing left and right, before you perform the speed up with pitch adjustment.

Now, this has worked remarkably well, and it also works with 5.1. Does that mean SoundTouch is aware of the audio channel layout? Nope. Does SoundTouch support the new Avisynth ChannelMask for audio? Nope. Nonetheless it gets it right, and there's one very simple reason: any multi-channel mode is actually a stereo mode under the hood. This means that if you have a stereo stream, SoundTouch will perform the speed up + pitch adjustment on left and right at the same time and keep them in phase with one another, thus producing a correct output. If you feed it a 5.1 input, it will take 2 channels at a time and process them as if they were stereo. This works well 'cause it applies the same logic to keep the channels in phase, so you'll have:

FL FR -> in phase
CC LFE -> in phase
LS RS -> in phase

The same goes if you feed it a 5.1+2.0 stream, as SoundTouch will once again filter the channels in pairs:

FL FR -> in phase
CC LFE -> in phase
LS RS -> in phase
Left Right -> in phase

This also works for 7.1, where you have:

FL FR -> in phase
CC LFE -> in phase
SL SR -> in phase
LS RS -> in phase

What if you have a 5.1 and another 5.1?
No problem: once again it will filter them in pairs and therefore produce the right result:

FL FR -> in phase
CC LFE -> in phase
LS RS -> in phase
FL FR -> in phase
CC LFE -> in phase
LS RS -> in phase

I can't screenshot the Tektronix waveform monitor 'cause it's literally standalone hardware, but I can show you pictures of it: As you can see at the bottom right, under the Lissajous graph, we have the phase between FL and FR, and they're perfectly in phase with one another. When I listened to it on my headphones while looking at my own VideoTek() - which is the closest thing I have to a Tektronix in software - I also didn't notice anything unusual throughout the entire movie. The AVID ProTools guys were happy too, and they also flagged the mono-processed file as wrong and the multichannel one as good. Once I sent the multichannel file to QC, it also ended up with a QC PASS, so at the moment I'm as happy as Larry. All's well that ends well. When I have some time I'll update the TimeStretch() wiki page too, to make the info readily available to everyone. Last edited by FranceBB; 20th April 2024 at 16:20.
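To make the pairing behaviour concrete, here is a rough Avisynth sketch of what SoundTouch is effectively doing internally: splitting a 5.1 track into stereo pairs and stretching each pair together. This is an illustration only, not something you need to do yourself (TimeStretch() already handles it); the clip variable, the channel order FL FR CC LFE LS RS and the 25/24 tempo are my assumptions, and the script is untested.

```
# Illustration: a 5.1 clip processed as three explicit stereo pairs.
# "c" is assumed to hold 6 audio channels in FL FR CC LFE LS RS order.
c = ConvertAudioToFloat(last)
pair1 = GetChannels(c, 1, 2).TimeStretch(tempo=100.0*25.0/24.0)  # FL / FR
pair2 = GetChannels(c, 3, 4).TimeStretch(tempo=100.0*25.0/24.0)  # CC / LFE
pair3 = GetChannels(c, 5, 6).TimeStretch(tempo=100.0*25.0/24.0)  # LS / RS
AudioDub(c, MergeChannels(pair1, pair2, pair3))  # back to one 6ch stream
```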
20th April 2024, 10:08 | #7 | Link |
Registered User
Join Date: Oct 2002
Location: France
Posts: 2,369
The old-fashioned way of creating a PAL telecine consisted of "just" speeding things up from 24fps to 25fps. That does indeed result in a slightly higher pitch, audible when you compare against an audio reference, but not really audible without the reference (exceptions could be concerts/music, or someone whose voice you know really well). Isn't that enough?
It was done for decades, and several decades ago they didn't have all this digital audio processing: the PAL telecine was just a basic, simple speed-up, and people watched movies on TV for decades without noticing it. The PAL DVDs were the same. It's just a simple question: back then (at least a decade ago as well...), when I was making my own anime PAL DVDs from Japanese NTSC DVDs - doing an inverse telecine to retrieve the original film frames and then a PAL telecine - I tried some of the digital processes, the result was bad from my point of view, and so I used the basic, usual speed-up.
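For comparison, the plain speed-up being described can be sketched in Avisynth roughly like this (a sketch under my own assumptions about file names; SSRC is the resampler already used elsewhere in this thread, and the script is untested):

```
# Plain PAL speed-up, no pitch correction: audio pitch rises by 25/24 (~4.17%).
video = LWLibavVideoSource("movie.m2v")   # hypothetical 24p source
AudioDub(video, WAVSource("movie.wav"))
AssumeFPS(25, 1, true)   # sync_audio=true scales the audio rate with the video
SSRC(48000)              # resample back to 48 kHz; the raised pitch remains
```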
__________________
My github. |
20th April 2024, 12:37 | #8 | Link |
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 3,040
I wouldn't have any issue with a slightly elevated pitch, but the problem is that when you receive a movie, you've got to have the major (i.e. the production studio) sign off on your processing. For instance, you couldn't just get a movie at 23.976, apply linear interpolation to 50p, divide it into fields to get 25i and encode it, as it wouldn't get approved. For the same reason, blending it to 50p and dividing it into fields wouldn't work.

Typically, the way these things work is that you receive the master file, encode it once to a TX-ready file, then send it back to the major who gave you the original master, and they either approve or reject the encode. Once they approve it, you become a trusted broadcaster and you're good to encode all the subsequent files you receive on your own. Of course, some majors are stricter than others and different majors want different things done, so it's never easy to please everyone. In this case, they specifically asked for a speed up with pitch adjustment on both the stereo and the Dolby tracks, hence the original question about the updated SoundTouch library.

Streaming platforms are very lucky, as they don't have to go through this process at all: they can put out a 23.976fps movie, a 25p one and a 29.970p program just fine, without worrying about frame rate conversion.
20th April 2024, 13:11 | #9 | Link |
Registered User
Join Date: Oct 2002
Location: France
Posts: 2,369
If I said that, it's just because when I tried it, I found the result absolutely horrible... it made a slightly elevated pitch seem just wonderful!
But again, that was at least 15 if not 20 years ago, so... things are probably better now; at least, I hope so for you.
__________________
My github. |
20th April 2024, 13:17 | #10 | Link | ||
...?
Join Date: Nov 2005
Location: Florida
Posts: 1,445
Quote:
https://forum.doom9.org/showpost.php...postcount=2734 Quote:
20th April 2024, 16:19 | #11 | Link | |
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 3,040
Quote:
About using the version of SoundTouch shipped with the system: I know you can do that on Linux, but that's because the OS has a completely different approach, where everything is shared across multiple programs. That makes sense, but it also introduces some complexity, in the sense that everything must be updated together. I don't actually have anything against still bundling SoundTouch with Avisynth on Windows, to be fair, 'cause at least we're shipping a version that we know is gonna work perfectly. I mean, currently I can see that in the source we're including version 2.3.1, since you updated it last time, while the currently available version of SoundTouch outside of Avisynth is 2.3.3. So... you might make the argument that on Linux they could just get version 2.3.3 without waiting for a new Avisynth release, which is fine, but I wouldn't have updated it anyway, as I would consider it "dangerous". I mean, if it's shipped with Avisynth, it means 2.3.1 has been tested and is known to be working, while if I updated to a new version I might run into API changes that could break the way the TimeStretch() function interacts with it. Anyway, I think the current approach is fine. p.s. thank you for keeping this updated! Last edited by FranceBB; 20th April 2024 at 16:24.
11th June 2024, 16:29 | #12 | Link | |||||||
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 3,040
Hey guys, I've run into a new issue.
Essentially, I've got a new file with 18 audio channels in it, namely 5.1 ITA, 2.0 ITA, 5.1 ORI, 2.0 ORI, and then a 2.0 Music & Effects track, for a total of 18 channels (6+2+6+2+2 = 18). I blindly ran it through TimeStretch(), however it errored out with "Illegal number of channels". I tested a bit and realized that the maximum number of channels supported by TimeStretch() is 16; in fact, this works: Quote:
Quote:
Since I was applying the pitch adjustment manually to already sped-up content, I just divided the audio into "sections", applied the pitch adjustment individually to each track and then combined the whole thing back together: Quote:
but this made me think about how many other times I might face this, so I tried to use the following logic with a for loop: Quote:
This is because this logic is FLAWED. Effectively, if I do the above, what's gonna happen is that my CH.1-2 will become "last", and therefore when it tries to access CH.3-4 it will fail. At that point I thought "well, I could just use "video" in GetChannels()", like: Quote:
Now, to understand this: when I replace the GetChannels(i, i+1) thingie with Subtitle(), it prints the right channels. In other words: Quote:
As we can see, we get 1, 3, 5, 7, 9, 11, 13, 15, 17, which means that in theory GetChannels(i, i+1) should translate to: GetChannels(1,2), GetChannels(3,4), GetChannels(5,6), GetChannels(7,8), GetChannels(9,10), GetChannels(11,12), GetChannels(13,14), GetChannels(15,16), GetChannels(17,18). Indeed, if I use: Quote:
The problem, however, is that the moment I use GetChannels() I get two channels to filter, but then I don't know how to merge all the stereo pairs back together into the 18 channels as part of the loop.
11th June 2024, 17:33 | #14 | Link | |
Avisynth language lover
Join Date: Dec 2007
Location: Spain
Posts: 3,433
Quote:
Code:
result = video.KillAudio()
for (i = 1, my_channel_number-1, 2) {
    GetChannels(video, i, i+1)
    ResampleAudio(48000)
    ConvertAudioToFloat()
    TimeStretch(96)
    ResampleAudio(48000)
    result = MergeChannels(result, last)
}
return result
11th June 2024, 17:36 | #15 | Link | ||||||||
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 3,040
Quote:
This is because TimeStretch() can work in two modes: mono mode and stereo mode. By default, if it's fed a single channel it works in mono mode, while if it's fed 2 or more channels it works in stereo mode. For instance, here it's working in mono mode: Quote:
Quote:
This means that stereo mode is always desired unless you have unrelated tracks. Now, suppose you have a 5.1 track; then this is gonna work: Quote:
Quote:
This scales all the way up to 16 channels; in fact, this works: Quote:
Quote:
What I'm trying to do, then, is divide the tracks into stereo pairs myself, and for that I've created the for loop above. Now, the problem I'm facing is getting the filtered stereo pairs, from CH.1-2 all the way up to CH.17-18, out of the for loop. My latest attempt has been the following: Quote:
The problem is that once again I'm getting a clip with 2 audio channels instead of the 18 filtered channels. So... how do I get the 18 channels I filtered in the for loop out of the loop and into the final result? I'm sure I'm doing something stupid and it's gonna be obvious to you guys, but I'm really struggling with this...
11th June 2024, 17:53 | #16 | Link | ||
Avisynth language lover
Join Date: Dec 2007
Location: Spain
Posts: 3,433
Quote:
Quote:
11th June 2024, 17:56 | #17 | Link | |||
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 3,040
Quote:
Thank you so much! I tested it with: Quote:
and initially it basically said that I couldn't use MergeChannels() at the end, 'cause you can't merge a clip without audio with one that has audio, therefore I changed it to: Quote:
And it worked!! Thank you so much, Gavino! I owe you a beer! If you were here I would have offered you one straight away! I really love this community and I'm so happy to be part of it.
11th June 2024, 18:12 | #18 | Link | |
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 3,040
Uh, small revision: when you use MergeChannels() with different sample types, it merges to the lowest sample type instead of the highest one, therefore I changed the BlankClip to sample_type="float", like so:
Quote:
so that the output is 32-bit float. Now it truly is perfect.
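Putting the descriptions from the last few posts together, the final loop might look something like this. This is a hypothetical reconstruction, not the poster's actual script: the seed-clip handling, tempo value, resampler and channel indices are my assumptions, and it is untested.

```
# Sketch: stretch an 18-channel clip in stereo pairs, then merge back together.
# "video" is assumed to carry 18 channels of audio.
# Seed with a silent float stereo clip so MergeChannels() has audio to start
# from, and so the merge isn't demoted below 32-bit float.
result = BlankClip(video, audio_rate=48000, channels=2, sample_type="float")
for (i = 1, 17, 2) {
    GetChannels(video, i, i+1)
    ConvertAudioToFloat()
    TimeStretch(tempo=100.0*25.0/24.0)
    SSRC(48000)
    result = MergeChannels(result, last)
}
# Drop the two silent seed channels, keeping the 18 processed ones (3..20).
result = GetChannels(result, 3, 4, 5, 6, 7, 8, 9, 10, \
                     11, 12, 13, 14, 15, 16, 17, 18, 19, 20)
return AudioDub(video, result)
```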
18th September 2024, 22:45 | #19 | Link |
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 3,040
Good news everyone!
The developer of the SoundTouch library pushed the limit up from 16 channels to 32, so now it's just a matter of integrating the new SoundTouch version into Avisynth, and you won't need the workaround above to divide the channels into stereo pairs. Check here for the relevant commit: Link
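Once an Avisynth build ships with the raised limit, an 18-channel file like the one discussed above could presumably go straight through the original pipeline without pair-splitting. A sketch, assuming the same filter chain used earlier in the thread:

```
# Hypothetical: with 32-channel SoundTouch support, no pair-splitting needed.
# "last" is assumed to be a clip carrying all 18 audio channels.
ConvertAudioToFloat()
AssumeFPS(25, 1, false)
TimeStretch(tempo=100.0*25.0/24.0)  # all 18 channels in one call
SSRC(48000)
ConvertAudioTo24bit()
```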