Detecting the original resolution of an upscale image DCT


LStation
4th January 2025, 18:23
I'd like assistance replicating for video what a Python script does for images.

# dct-r1.py, based on the GitHub project
# https://github.com/joaocarvalhoopen/Detecting_the_original_resolution_of_an_upscale_image_DCT/blob/master/original_resolution_of_upscalled_image.ipynb
# using python 3.12 today
# pip3 install numpy
# pip3 install pillow   (the package that provides the PIL module)
# pip3 install scipy
import os, sys
import numpy as np
from PIL import Image
from scipy.fftpack import dct

def rgb2gray(rgb):
    # Weighted sum of the R, G and B channels
    return np.dot(rgb[...,:3], [0.2989, 0.5870, 0.1140])

def process_image(image_path):
    # Force RGB so grayscale or palette images don't break rgb2gray
    im = Image.open(image_path).convert("RGB")
    im_gray = rgb2gray(np.array(im)).astype(float)
    # 2-D DCT: transform along axis 0, then along axis 1
    t1 = dct(dct(im_gray, axis=0, norm='ortho'), axis=1, norm='ortho')
    # Cast straight to 8-bit without rescaling, then save as grayscale
    dct_img = Image.fromarray(t1.astype(np.uint8)).convert("L")
    directory, filename = os.path.split(image_path)
    new_filename = "dct_" + filename
    dct_img.save(os.path.join(directory, new_filename))

for image_path in sys.argv[1:]:
    process_image(image_path)

# dct.cmd ; drag n drop image(s) onto this to run the python script on them
# @echo off
# start /min cmd /c for %%i in (%*) do ( python dct-r1.py %%i )

Here's a clip(video, google drive) (https://drive.google.com/file/d/1meB6ayC7nuidSDeHLZupkbLEF9IDakT2/view?usp=drive_link), and a still taken from it and processed(image, google drive) (https://drive.google.com/file/d/1yAYcN94RwGbhVVs4qGS9iGhcNZz-m8Cc/view?usp=drive_link). The black lines in the noise give an impression of what the image's original resolution was before it was scaled (the image is 960x540, but it appears to have been scaled by 2).
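
As a rough illustration of where such black lines can come from, here is a minimal sketch that assumes plain nearest-neighbour scaling of a synthetic random frame (the `nearest_upscale` helper and the 96x54 size are made up for the example, and `scipy.fft.dctn` stands in for the two nested `dct` calls). In that idealised case the DCT row and column at the native height and width come out numerically zero; real scalers and compressed video presumably only give darker lines, not exact zeros.

```python
import numpy as np
from scipy.fft import dctn

def nearest_upscale(img, factor):
    """Repeat every pixel `factor` times along both axes (nearest neighbour)."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

rng = np.random.default_rng(0)
low = rng.random((54, 96))           # stand-in for a 96x54 "native" frame
up = nearest_upscale(low, 2)         # 192x108 upscaled frame
coeffs = dctn(up, norm="ortho")      # 2-D DCT over both axes

# Row 54 and column 96 (the native height and width) hold the "black line".
row_peak = np.abs(coeffs[54, :]).max()
col_peak = np.abs(coeffs[:, 96]).max()
typical = np.median(np.abs(coeffs))  # size of an ordinary coefficient
print(row_peak, col_peak, typical)
```

The two peaks should be many orders of magnitude smaller than the typical coefficient, which is what reads as a black line once the result is saved as an image.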

I'm new to VapourSynth and have some of the basics working. I've commented out the lines that don't do anything yet but hopefully convey the idea; in the end grayscale_process() should return dct_img.

# dct-r1.vpy with ffms2 and fmtc plugins from vsdb.top
import vapoursynth as vs
#import numpy as np
#from scipy.fftpack import dct
#from PIL import Image

def grayscale_process(clip):
    clip = clip.fmtc.matrix(mat="601")
    clip = clip.std.ShufflePlanes(planes=[0], colorfamily=vs.GRAY)
#    im_gray = np.array(clip).astype(float)
#    t1 = dct(dct(im_gray, axis=0, norm='ortho'), axis=1, norm='ortho')
#    dct_img = Image.fromarray(t1.astype(np.uint8)).convert("L")
    return clip

core = vs.core

clip = core.ffms2.Source(source='P:/ATH.mp4')
grayscale_clip = grayscale_process(clip)
grayscale_clip.set_output()

Selur
4th January 2025, 19:07
Maybe more something like:
def apply_dct(clip):
    def dct_process(n, f):
        # Extract the frame as a NumPy array
        frame_array = np.asarray(f[0], dtype=np.float32)

        # Apply DCT on both axes
        dct_result = dct(dct(frame_array, axis=0, norm='ortho'), axis=1, norm='ortho')

        # Normalize DCT result to 8-bit range and clip values
        dct_result = np.clip(dct_result, 0, 255).astype(np.uint8)

        # Create a new frame for output
        fout = f.copy()
        np.copyto(np.asarray(fout[0]), dct_result)
        return fout

    return clip.std.ModifyFrame(clips=clip, selector=dct_process)

# Ensure input is grayscale (convert if necessary)
if clip.format.color_family != vs.GRAY:
    clip = core.std.ShufflePlanes(clip, planes=0, colorfamily=vs.GRAY)

# Apply DCT to each frame
dct_video = apply_dct(clip)

# Output the processed video
dct_video.set_output()

doesn't seem to work for me, but maybe it helps you further,...

LStation
4th January 2025, 21:21
Maybe more something like:
...
doesn't seem to work for me, but maybe it helps you further,...

I appear to have replicated it(image, google drive) (https://drive.google.com/file/d/1CTMv6cT2kj0bX2sBczNi0HmGNFTVUQaI/view?usp=drive_link)
# dct-r1.vpy with ffms2 and fmtc plugins from vsdb.top
import vapoursynth as vs
import numpy as np
from scipy.fftpack import dct

core = vs.core
clip = core.ffms2.Source(source='g:/title.mp4')

def apply_dct(clip):
    def dct_process(n, f):
        # Extract the frame as a NumPy array
        frame_array = np.asarray(f[0], dtype=np.float32)

        # Apply DCT on both axes
        dct_result = dct(dct(frame_array, axis=0, norm='ortho'), axis=1, norm='ortho')

        # Clip to [-1024, 1024], then cast to 8-bit; the cast does not
        # rescale, so values outside 0..255 are not mapped into range
        dct_result = np.clip(dct_result, -1024, 1024).astype(np.uint8)

        # Create a new frame for output
        fout = f.copy()
        np.copyto(np.asarray(fout[0]), dct_result)
        return fout

    return clip.std.ModifyFrame(clips=clip, selector=dct_process)

# Ensure input is grayscale (convert if necessary)
if clip.format.color_family != vs.GRAY:
    clip = core.std.ShufflePlanes(clip, planes=0, colorfamily=vs.GRAY)

# Apply DCT to each frame
dct_video = apply_dct(clip)

# Output the processed video
dct_video.set_output()

To clarify for anyone finding this in the future: I wasn't able to replicate their 'lena-color' results in how the noise was distributed, but went forward anyway since the indicators were still present.
According to a pixel color counter, the .py output had 256 unique colors and the .vpy output had 220, so it's not an exact replica at the moment.
About (dct_result, -1024, 1024) ... as long as it's (dct_result, -1, 1) ... it blows out the scale. I like how easy it is to see this anomaly visually, when it works. I'm aware of one example where a title is obviously upscaled but it isn't detected (this vpy script will make it so much easier to sift).
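
For comparison, here is a sketch of a different way to get the coefficients into 8 bits: a log-magnitude mapping instead of the direct `astype(np.uint8)` cast used in the scripts above. It is a swapped-in technique, not what either script does, so it will not reproduce their exact look or colour counts; `dct_to_u8` is a hypothetical helper name and `scipy.fft.dctn` is used in place of the nested `dct` calls.

```python
import numpy as np
from scipy.fft import dctn

def dct_to_u8(gray):
    """Map the 2-D DCT of a grayscale frame to 0..255 via log magnitude."""
    coeffs = dctn(np.asarray(gray, dtype=np.float64), norm="ortho")
    mag = np.log1p(np.abs(coeffs))   # compress the large low-frequency values
    peak = mag.max()
    if peak == 0:                    # all-black input: nothing to scale
        return np.zeros(mag.shape, dtype=np.uint8)
    return np.round(mag / peak * 255).astype(np.uint8)

rng = np.random.default_rng(1)
frame = rng.integers(0, 256, size=(108, 192)).astype(np.float64)
out = dct_to_u8(frame)
print(out.dtype, out.shape, out.min(), out.max())
```

Because sign is discarded and the range is compressed, every coefficient lands inside 0..255, at the cost of no longer being comparable pixel-for-pixel with the cast-based output.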

yuygfgg
5th January 2025, 10:31
I'm not quite sure about DCT, but I normally use FFT to help find native resolution. You can install https://github.com/Beatrice-Raws/FFTSpectrum to get the FFT frequency spectrum of a video clip.
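
For anyone who wants to see the idea without installing a plugin, a centred log-magnitude FFT spectrum of a single grayscale frame can be sketched in plain NumPy. This is only an approximation of what a spectrum view shows, not the FFTSpectrum plugin's implementation (its exact scaling is not covered here); `fft_spectrum_u8` is a made-up helper name.

```python
import numpy as np

def fft_spectrum_u8(gray):
    """Centred log-magnitude FFT spectrum of a 2-D array, scaled to 0..255."""
    spectrum = np.fft.fftshift(np.fft.fft2(np.asarray(gray, dtype=np.float64)))
    mag = np.log1p(np.abs(spectrum))
    peak = mag.max()
    if peak == 0:                    # all-black input: nothing to scale
        return np.zeros(mag.shape, dtype=np.uint8)
    return np.round(mag / peak * 255).astype(np.uint8)

rng = np.random.default_rng(2)
frame = rng.random((48, 64))         # stand-in for one grayscale frame
spec = fft_spectrum_u8(frame)
print(spec.shape, spec[24, 32])      # the DC term sits at the centre after fftshift
```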

For some sources (mostly Japanese anime) you can use getnative/getfnative (https://github.com/YomikoR/GetFnative, https://github.com/Infiziert90/getnative)

LStation
6th January 2025, 06:01
I'm not quite sure about DCT, but I normally use FFT to help find native resolution. ...

I went with an avisynth port(github) (https://github.com/Asd-g/AviSynth-FFTSpectrum) by Asd-g
# FFTS.avs
ConvertToPlanarRGB(ImageSource("G:\%02d.png", 1, 67))
z_ConvertFormat(pixel_type="YUV444P8", colorspace_op="rgb:709:709:full=>709:709:709:limited")
FFTSpectrum(grid=false)

# avsresize to get planar YUV http://avisynth.nl/index.php/Avsresize
# FFTSpectrum http://avisynth.nl/index.php/FFTSpectrum

My impression is that the DCT and FFT spectrum methods could help identify when something's been upsampled by looking at how empty an area is(image, googledrive) (https://drive.google.com/file/d/19eOKpvG6Ta_Gh_gv9HUBGBS2ReWcAPG_/view?usp=drive_link).
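
One way to put a number on "how empty" is to measure the share of DCT energy that falls outside a low-frequency block. The sketch below is only an assumption about how that could be done: `outer_energy_fraction` is a hypothetical helper, the test frames are synthetic, and the "upscale" is an idealised one built by zero-padding the DCT of a smaller image, which real resizers only approximate.

```python
import numpy as np
from scipy.fft import dctn, idctn

def outer_energy_fraction(gray, inner_h, inner_w):
    """Share of non-DC DCT energy outside the top-left inner_h x inner_w block."""
    x = np.asarray(gray, dtype=np.float64)
    energy = dctn(x - x.mean(), norm="ortho") ** 2   # subtracting the mean removes DC
    total = energy.sum()
    if total == 0:                                   # flat image: no energy at all
        return 0.0
    return float(1.0 - energy[:inner_h, :inner_w].sum() / total)

rng = np.random.default_rng(3)
native = rng.random((64, 64))                        # detail at every frequency

# Idealised 2x upscale: zero-pad the DCT of a 32x32 image to 64x64.
padded = np.zeros((64, 64))
padded[:32, :32] = dctn(rng.random((32, 32)), norm="ortho")
upscaled = idctn(padded, norm="ortho")

frac_native = outer_energy_fraction(native, 32, 32)
frac_upscaled = outer_energy_fraction(upscaled, 32, 32)
print(frac_native, frac_upscaled)
```

A native frame keeps a large share of its energy in the outer area, while the idealised upscale leaves it essentially empty; real footage would land somewhere in between.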

For fixed-resolution assets like drawn art, e.g. SkullGirls(image, googledrive) (https://drive.google.com/file/d/1jGxE7BuqT8mtP9h7QQOogG6ZjOUj89FZ/view?usp=drive_link) or Uni2(image, googledrive) (https://drive.google.com/file/d/1aLI33scAF6zznBRghajmi6A0yB9T0OvC/view?usp=drive_link), the FFT spectrum appears to excel.

Mixed assets like sprites on a drawn background, e.g. MvsC2(image, googledrive) (https://drive.google.com/file/d/1j8Qffotv8dqhBI-86bWRsS82nj_Lc62G/view?usp=drive_link), appear to do just as well.

Scalable assets like pixel art, e.g. Footsies(image, googledrive) (https://drive.google.com/file/d/15e-aKAB-ixp7tBXNxFvbTUOFb4UNBpWA/view?usp=drive_link), ShovelKnight(image, googledrive) (https://drive.google.com/file/d/1t9QfbhfF5IpD-W7NSu2q_uA4ZF1EbuUq/view?usp=drive_link) and TMNT(image, googledrive) (https://drive.google.com/file/d/163kLTArEmFnPN4fFU1Y9RhczwktnF-CZ/view?usp=drive_link), can be a bit of a mixed bag. For ShovelKnight, for example, only 900p didn't have aliasing, so its unscaled resolution doesn't appear to be available.

A YouTube user, Brazil Pixel, inspired me to look into DCT methods. These attempt to identify the resolution from the first set of lowest sums on each axis, which is what the top left of their video shows(video, youtube) (https://www.youtube.com/watch?v=z48DY2QBXU4). Their focus is on identifying dynamic resolution changes(translated article) (https://www-pcmanias-com.translate.goog/a-metodologia-do-tio-hildo-para-a-deteccao-de-resolucoes-nativas/?_x_tr_sl=pt&_x_tr_tl=en&_x_tr_hl=pt-PT&_x_tr_pto=wapp).
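
The "lowest sum on each axis" idea can be sketched as follows. This is only one reading of it, not Brazil Pixel's actual method: the helper names and the `rel` threshold are hypothetical, and the test frame is a synthetic random image scaled 3x with exact nearest-neighbour repetition, where the dips are numerically zero. Filtered scalers and compressed video will need a much looser threshold, if the dips are visible at all.

```python
import numpy as np
from scipy.fft import dctn

def first_quiet_index(profile, rel=1e-6):
    """First index >= 1 whose value is below rel * median(profile), else None."""
    threshold = rel * np.median(profile)
    hits = np.flatnonzero(profile[1:] < threshold)
    return int(hits[0]) + 1 if hits.size else None

def guess_native_size(gray):
    """Return (width, height) guesses from per-axis sums of |DCT| coefficients."""
    mag = np.abs(dctn(np.asarray(gray, dtype=np.float64), norm="ortho"))
    col_sums = mag.sum(axis=0)       # one value per horizontal frequency
    row_sums = mag.sum(axis=1)       # one value per vertical frequency
    return first_quiet_index(col_sums), first_quiet_index(row_sums)

rng = np.random.default_rng(4)
low = rng.random((20, 30))                             # 30x20 "native" frame
up = np.repeat(np.repeat(low, 3, axis=0), 3, axis=1)   # 90x60, nearest neighbour
guess = guess_native_size(up)
print(guess)
```

With 3x scaling the sums also dip at twice the native size, which is why the sketch takes the first dip and not the global minimum.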

you can ogle or use those source images(zip, googledrive) (https://drive.google.com/file/d/1sgRYdaFaLblKZAxDFTiCzLIBnOtuxhlO/view?usp=drive_link) above.

Jamaika
6th January 2025, 23:17
This is a very old project. I don't know if it works properly. Don't use RGB input.

ffplay_avx2.exe FFTSpectrum.avs
https://limewire.com/?referrer=48k1433kko

LWLibavVideoSource("input_v210.avi",fpsnum=30000,fpsden=1001)
LanczosResize(640, 480)
z_ConvertFormat(pixel_type="YUV420",colorspace_op="709:709:709:l=>709:709:709:f", dither_type="none")
FFTSpectrum(grid=true)

Z2697
7th January 2025, 18:10
Some weird sh*t I wrote
https://github.com/Mr-Z-2697/z-vsPyScripts/blob/39199e3b6a12f8541f13480ff51ae358b500266c/zvs.py#L694

Since I want them "invertible" the values are not normalized.
I chose opencv. I think I looked at scipy as well, but cv2 is faster, IIRC (I can't even remember what I had for dinner this morning).

# needed when these are used outside the linked module
import vapoursynth as vs
core = vs.core

def dct(src):
    if not src.format.id==vs.GRAYS:
        raise ValueError('I thought only GRAYS input was supported.')
    import numpy as np
    import cv2
    def dct(n,f):
        fout=f.copy()
        tr=np.asarray(fout[0])
        tr=cv2.dct(tr)
        np.copyto(np.asarray(fout[0]),tr)
        return fout
    return core.std.ModifyFrame(src,src,dct)


def idct(src):
    if not src.format.id==vs.GRAYS:
        raise ValueError('I thought only GRAYS input was supported.')
    import numpy as np
    import cv2
    def idct(n,f):
        fout=f.copy()
        tr=np.asarray(fout[0])
        tr=cv2.idct(tr)
        np.copyto(np.asarray(fout[0]),tr)
        return fout
    return core.std.ModifyFrame(src,src,idct)
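
To illustrate the "invertible" point outside VapourSynth, here is a small round-trip sketch. It uses `scipy.fft` in place of cv2 (a swapped-in library, so the coefficient scaling is not guaranteed to match `cv2.dct`) and a synthetic float32 array as a stand-in for a GRAYS plane. Keeping the coefficients as floats lets the inverse transform recover the frame, while squeezing them into 8 bits first does not.

```python
import numpy as np
from scipy.fft import dctn, idctn

rng = np.random.default_rng(5)
frame = rng.random((36, 64)).astype(np.float32)    # stand-in for a GRAYS plane

# Float in, float out: the inverse transform recovers the frame.
coeffs = dctn(frame, norm="ortho")
restored = idctn(coeffs, norm="ortho")
roundtrip_err = float(np.abs(restored - frame).max())

# Squeezing the coefficients into 8 bits first (as a viewable image would)
# drops the sign and the fractional part, so the inverse no longer matches.
squeezed = np.clip(coeffs, 0, 255).astype(np.uint8).astype(np.float32)
lossy_err = float(np.abs(idctn(squeezed, norm="ortho") - frame).max())
print(roundtrip_err, lossy_err)
```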