Auto Target Encoder - an AV1 GUI based on Machine Learning [Archive]

Kurt.noise

17th August 2025, 14:21

https://github.com/Snickrr/Auto-Target-Encoder

A sophisticated, GUI-based encoding tool designed for automated batch processing of your videos that do not require comprehensive fine-tuning. It leverages machine learning to create high-quality, efficient AV1 video encodes. This application automates the entire workflow for large batches of files: it learns from past encodes to predict optimal quality settings, intelligently analyzes each video's complexity, and displays the progress of all parallel jobs in a real-time dashboard.

This tool moves beyond single-file, trial-and-error encoding by building persistent knowledge. A RandomForest machine learning model predicts the exact CQ/CRF value needed to hit a target quality score (VMAF, SSIMULACRA2, BUTTERAUGLI), while other models provide highly accurate ETA predictions by learning your hardware's real-world performance across hundreds of encodes.

https://github.com/Snickrr/Auto-Target-Encoder/raw/main/demo.gif

----
Neither tested nor created by me.

Blue_MiSfit

18th August 2025, 06:03

Fascinating! Thank you for sharing

Z2697

18th August 2025, 14:48

This feels like madness to me.
Or maybe AV1 encoders are just so bad that they need AI assist. (which I don't think is the case)
The whole README feels AI-generated, just from the excessive use of emojis... And it's too "well written" for a repo that contains just 1 source file and where 7 out of 9 commits are README edits.
The whole project might be AI generated as well.

EDIT: it is.

This project was created by someone with no prior coding experience, using AI assistance (Claude Opus 4.1 and Gemini 2.5 Pro) for advanced mathematics and coding implementation. The core ideas and extensive debugging/fine-tuning were done manually.

Now don't get me wrong, AI can be a useful tool to help coding, but I don't think this one is.

Leo 69

19th August 2025, 01:47

Now don't get me wrong, AI can be a useful tool to help coding, but I don't think this one is.

Could you explain in detail why you think there's something wrong with this tool just because it was created with the help of AI? Which bugs or flaws have you already found after using the tool?

Z2697

19th August 2025, 02:01

Could you explain in detail why you think there's something wrong with this tool just because it was created with the help of AI? Which bugs or flaws have you already found after using the tool?

What I mean is just that it's created entirely by AI, not created with help of AI.

Leo 69

19th August 2025, 07:37

What I mean is just that it's created entirely by AI, not created with help of AI.

You're overestimating the capabilities of modern LLMs. They can't create anything on their own without a user's input, which I'd say has to be rather specific. So, to create such a tool the human does need to actually know what he's doing in order to make correct prompts, do the quality assessment of the LLM's output, wrap up the complete package and publish it on the forum. LLMs can't do that alone. There's obviously a lot of human work that was done to make this tool possible. You're wrong.

Z2697

19th August 2025, 16:19

You're overestimating the capabilities of modern LLMs. They can't create anything on their own without a user's input, which I'd say has to be rather specific. So, to create such a tool the human does need to actually know what he's doing in order to make correct prompts, do the quality assessment of the LLM's output, wrap up the complete package and publish it on the forum. LLMs can't do that alone. There's obviously a lot of human work that was done to make this tool possible. You're wrong.

The human is helping the AI, then.
But I agree the concept of this tool should work, at least in theory.

RanmaCanada

19th August 2025, 18:07

This looks interesting, but I don't know if it's practical as we know automated metrics can be bad.

benwaggoner

19th August 2025, 23:38

Huh. We need a lot more details about what it is actually generating, based on what training data! Is it generating command lines? Trained on what? And with what input data? What source attributes go into the generation?

AI and ML certainly have applications for video encoding, but I don't see enough details to know what this is doing, or how much better it could do than a skilled human.

Leo 69

19th August 2025, 23:53

Here is a stage-by-stage description of what it does:

Stage 1: Initialization and Learning
When the application starts, it first loads all user settings from a config.ini file. It then initializes its Machine Learning (ML) models by training them on historical data stored in a local SQLite database. This allows the app to learn from previous encodes to make smarter predictions about encoding speed and quality settings for new videos.
Stage 2: File Queuing and Pre-Filtering
The user selects a folder of videos to process. The script scans this folder and populates a queue. Before any heavy processing begins, it performs a fast pre-filter, automatically skipping files that are too short, too small, or have a bitrate lower than user-defined thresholds.
Stage 3: Video Analysis
For each valid file, the application performs a detailed analysis. It uses ffprobe to get technical details (resolution, frame rate, etc.) and performs a complexity analysis to understand the video's content (e.g., detecting scene changes). This data is compiled into a set of "features" that the ML models can understand.
Stage 4: ML-Driven Quality Search
This is the core of the application. To find the perfect quality setting (CQ/CRF value) that meets the user's target (e.g., a VMAF score of 95):
It first creates a short, high-quality "master sample" by stitching together representative clips from the video.
It uses its trained Quality Model to predict the best CQ value needed to hit the target score.
Based on the model's confidence, it intelligently tests one or two CQ values by encoding only the small sample file, which is extremely fast.
If the prediction is wrong or the model is not confident, it falls back to an efficient search to find the optimal CQ value. All test results are cached in the database to avoid re-doing work.
Stage 5: Final Encoding
Once the optimal CQ value has been found, the script proceeds to encode the full-length original video using that setting. It monitors the encoding process in real-time to provide progress updates and detect if the process has stalled.
Stage 6: Finalization, Logging, and Learning
After the encode is complete, the script verifies the new file. If it meets the criteria (e.g., sufficient size reduction), it saves the final file and can optionally delete the original. Critically, it logs the performance data (how long it took, the final file size, etc.) back into the SQLite database. This act of logging completes the feedback loop, ensuring that the ML models become more accurate with every video it processes.
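
The Stage 4 flow described above (test the predicted CQ on the sample, fall back to a search if it misses, cache every result) can be sketched roughly as follows. This is a minimal illustration and not the tool's actual code: `find_cq`, `toy_score`, the CQ bounds, and the tolerance are all hypothetical, and the sketch assumes the quality score falls monotonically as CQ rises.

```python
from typing import Callable, Dict, Optional, Tuple


def toy_score(cq: int) -> float:
    """Hypothetical stand-in for 'encode the sample at this CQ and measure it'."""
    return 100.0 - cq / 4


def find_cq(
    predicted_cq: Optional[int],
    score_sample: Callable[[int], float],
    target: float,
    cq_min: int = 15,
    cq_max: int = 45,
    tolerance: float = 0.5,
) -> Tuple[int, int]:
    """Return (chosen CQ, number of sample encodes actually run)."""
    cache: Dict[int, float] = {}  # the thread says results are cached in a database

    def score(cq: int) -> float:
        if cq not in cache:
            cache[cq] = score_sample(cq)
        return cache[cq]

    # Step 1: accept the prediction if a single sample encode confirms it.
    if predicted_cq is not None and cq_min <= predicted_cq <= cq_max:
        if target <= score(predicted_cq) <= target + tolerance:
            return predicted_cq, len(cache)

    # Step 2: fallback binary search for the highest CQ that still reaches
    # the target. If nothing reaches it, cq_min is returned (an assumption).
    lo, hi, best = cq_min, cq_max, cq_min
    while lo <= hi:
        mid = (lo + hi) // 2
        if score(mid) >= target:
            best, lo = mid, mid + 1
        else:
            hi = mid - 1
    return best, len(cache)


print(find_cq(20, toy_score, 95.0))  # (20, 1): good prediction, one sample encode
print(find_cq(30, toy_score, 95.0))  # (20, 5): bad prediction, search takes over
```

With a good prediction the sample is encoded once; with a bad one the cost is about the same as a plain binary search, which is where any time saving would have to come from.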

GeoffreyA

20th August 2025, 07:22

So, if I understand correctly, it extends the approach of Av1an, reducing the CQ/CRF search space by inferring what would be the right value. The updating of the database and retraining of the model should, in theory, lead to improved accuracy in that selection.

benwaggoner

27th August 2025, 18:49

So, if I understand correctly, it extends the approach of Av1an, reducing the CQ/CRF search space by inferring what would be the right value. The updating of the database and retraining of the model should, in theory, lead to improved accuracy in that selection.
Although what ground truth data set of optimal decisions it would be trained on is a big question.

GeoffreyA

27th August 2025, 19:54

Although what ground truth data set of optimal decisions it would be trained on is a big question.

Perhaps it trains the model from scratch. By resorting to ordinary searches when prediction fails, which will be the case for early encodes, it works normally until, hitting critical mass, the model starts predicting accurately.

Z2697

29th August 2025, 23:36

So I guess you haven't tested it.
Weeks later, I finally gave it a try.

The concept works, until it doesn't.
The model needs 50 trials to train. After that it can predict a high-confidence CRF value and a medium-confidence CRF range. If both fail, the program won't fall back to a full search: it encodes with the lowest value in the CRF "search range", or you can configure it to skip the file.
VMAF can't be predicted very well, and I assume the same goes for the other two metrics, so this will happen quite often.
If you give it similar videos it does work... but then I could just use the same settings, with no need for prediction...
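
The behaviour reported here (high-confidence value first, then the medium-confidence range, then either the lowest CRF of the search range or a skip) could look roughly like the sketch below. All names (`Prediction`, `choose_cq`, `meets_target`) are hypothetical, and the order in which the range is tested is an assumption; the sketch only mirrors the description in this post, not the tool's source.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Prediction:
    best_cq: int               # high-confidence point estimate
    cq_range: Tuple[int, int]  # medium-confidence range, inclusive (low, high)


def choose_cq(
    prediction: Prediction,
    meets_target: Callable[[int], bool],
    search_range: Tuple[int, int],
    skip_on_failure: bool = False,
) -> Optional[int]:
    """Return the CQ/CRF to encode with, or None if the file should be skipped."""
    # 1. Try the high-confidence value.
    if meets_target(prediction.best_cq):
        return prediction.best_cq

    # 2. Try the medium-confidence range, highest CQ (smallest file) first.
    #    The test order is an assumption made for this sketch.
    low, high = prediction.cq_range
    for cq in range(high, low - 1, -1):
        if meets_target(cq):
            return cq

    # 3. Both failed: no full search. Skip, or use the lowest CQ allowed.
    return None if skip_on_failure else search_range[0]


# Toy check: pretend only CQ <= 24 reaches the target score.
ok = lambda cq: cq <= 24
print(choose_cq(Prediction(28, (25, 30)), ok, (18, 40)))        # 18
print(choose_cq(Prediction(28, (25, 30)), ok, (18, 40), True))  # None
```

The first call shows the complaint in concrete terms: CQ 24 would have hit the target, but because the predicted range missed it, the encode runs at 18 and spends far more bits than needed.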

Encoding results are saved in the database, and the script won't let you encode the same file again.
VMAF runs in a single thread... (technically FFmpeg's fault, because the tool uses FFmpeg's libvmaf filter)
Well, it's open source, so I could fix it, but I'm feeling fooled enough.
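
On the single-thread point: FFmpeg's libvmaf filter has an `n_threads` option, so one possible fix is simply to pass it. The helper below is a hypothetical sketch that only builds the command line. It assumes an FFmpeg build with libvmaf enabled and that both inputs already match in resolution and frame rate; how the tool itself invokes FFmpeg is not shown in this thread.

```python
import os
from typing import List, Optional


def vmaf_command(distorted: str, reference: str, threads: Optional[int] = None) -> List[str]:
    """Build an ffmpeg command that scores `distorted` against `reference`."""
    n = threads or os.cpu_count() or 1
    return [
        "ffmpeg", "-hide_banner",
        "-i", distorted,   # first input: the encode under test
        "-i", reference,   # second input: the original
        "-lavfi", f"libvmaf=n_threads={n}",
        "-f", "null", "-",  # discard output; the score is printed in the log
    ]


print(" ".join(vmaf_command("encode.mkv", "source.mkv", threads=8)))
```

The list can be handed to `subprocess.run` as is; the file names are placeholders.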

Z2697

29th August 2025, 23:44

So, if I understand correctly, it extends the approach of Av1an, reducing the CQ/CRF search space by inferring what would be the right value. The updating of the database and retraining of the model should, in theory, lead to improved accuracy in that selection.

Av1an encodes scene-based chunks, which is better than this tool's approach. Prediction could be added there too, but again, good luck predicting that.
Even Av1an is too much for me. This goes to show how disappointing the AV1 encoders are: low performance or mediocre RDO, and they even refuse to add scenecut keyframes.

GeoffreyA

30th August 2025, 07:13

So I guess you haven't tested it.
Weeks later, I finally gave it a try.

The concept works, until it doesn't.
The model needs 50 trials to train. After that it can predict a high-confidence CRF value and a medium-confidence CRF range. If both fail, the program won't fall back to a full search: it encodes with the lowest value in the CRF "search range", or you can configure it to skip the file.
VMAF can't be predicted very well, and I assume the same goes for the other two metrics, so this will happen quite often.
If you give it similar videos it does work... but then I could just use the same settings, with no need for prediction...

Encoding results are saved in the database, and the script won't let you encode the same file again.
VMAF runs in a single thread... (technically FFmpeg's fault, because the tool uses FFmpeg's libvmaf filter)
Well, it's open source, so I could fix it, but I'm feeling fooled enough.

It does seem to be an over-engineered approach for choosing CRF, which is not a problem to begin with.

Av1an encodes scene-based chunks, which is better than this tool's approach. Prediction could be added there too, but again, good luck predicting that.
Even Av1an is too much for me. This goes to show how disappointing the AV1 encoders are: low performance or mediocre RDO, and they even refuse to add scenecut keyframes.

Yes, this tool works at the file rather than scene level.

I tried Av1an last year and it was too slow on my system. Much effort for little gain. The anime-encoding community makes a big deal about Av1an.

AV1 has been disappointing. Whether that's because it descends from TrueMotion and VPx, others more familiar with those codecs will know better. The endless tuning needed and cargo culting do not help. FGS has also proved to be a gimmick in its current implementation. Perhaps what's needed is an innovative breakthrough in encoding noise, if that's scientifically possible.

benwaggoner

3rd September 2025, 17:42

So I guess you haven't tested it.
Weeks later, I finally gave it a try.

The concept works, until it doesn't.
The model needs 50 trials to train. After that it can predict a high-confidence CRF value and a medium-confidence CRF range. If both fail, the program won't fall back to a full search: it encodes with the lowest value in the CRF "search range", or you can configure it to skip the file.
VMAF can't be predicted very well, and I assume the same goes for the other two metrics, so this will happen quite often.
If you give it similar videos it does work... but then I could just use the same settings, with no need for prediction...

Encoding results are saved in the database, and the script won't let you encode the same file again.
VMAF runs in a single thread... (technically FFmpeg's fault, because the tool uses FFmpeg's libvmaf filter)
Well, it's open source, so I could fix it, but I'm feeling fooled enough.
And a mean VMAF for a whole file really isn't that informative either, as it doesn't account for quality variations. VMAF itself wasn't tuned for AV1, and doesn't have that high a subjective correlation in the first place. VMAF was absolutely the least mediocre metric we had at that point, and a real step forward. But it's just an ML predictor of subjective MOS rating, with plenty of blind spots. Training another ML model on the output of a first, only somewhat accurate ML model just compounds the AI drift from what a human expert can do.

With just that feedback loop, I would expect most Doom 9 readers to be able to outperform the AI by just picking appropriate parameters for the content using human brains.
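
To make the point about whole-file means concrete, here is a small sketch with two invented per-frame score lists. The numbers are made up for illustration and are not measurements from any encoder; the helper `summarize` and the choice of a nearest-rank 5th percentile are assumptions of this example.

```python
import statistics
from typing import Dict, List


def summarize(scores: List[float]) -> Dict[str, float]:
    """Whole-file summaries of per-frame quality scores."""
    ordered = sorted(scores)
    # Nearest-rank 5th percentile, computed with integer arithmetic.
    p5_index = max(0, (5 * len(ordered) + 99) // 100 - 1)
    return {
        "mean": statistics.mean(ordered),
        "p5": ordered[p5_index],
        "min": ordered[0],
    }


steady = [95.0] * 100                # every frame scores 95
uneven = [99.0] * 90 + [59.0] * 10   # mostly excellent, one bad stretch

print(summarize(steady))  # {'mean': 95.0, 'p5': 95.0, 'min': 95.0}
print(summarize(uneven))  # {'mean': 95.0, 'p5': 59.0, 'min': 59.0}
```

Both files report a mean of 95, yet the second contains a stretch that would look clearly worse. A model trained to hit a mean target cannot tell them apart.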

CruNcher

7th September 2025, 14:50

Yeah, that model was really the first at that time. It's so outdated by now, and we can see that VAQ brought more psychovisually than the whole VMAF tuning.
But it doesn't come as a real surprise; we see it happen all the time once the encoder's AQ implementation stage and psy-tuning evaluation have begun.

Pretty important stage now for AV1

charliebaby

13th December 2025, 13:30

This thing is excellent! I used a modified FFmpeg with VMAF 97, here's the result.

https://www.mediafire.com/file/ohkb25sdqc7k349/videotest.mkv/file