Doom9's Forum - View Single Post

ReinerSchweinlin · 18th February 2023, 13:37

Quote:

Originally Posted by benwaggoner

16 GB? That can fit a lot of frames.

Yes, 16 GB. But it's not a "normal" L3 cache; it's referred to as a "remote L2 cache". Its bandwidth is higher than the 8-lane DDR4 access, but it's not as fast as a modern L3 cache. It can be configured to act as a normal transparent cache (like an L3 cache), but it can also be accessed with a separate driver (or in a hybrid mode). Too bad there are no motherboards in Europe for these Xeons. I know that it's probably not really worth it, but for a small amount of money, I'd satisfy my curiosity and get one.

Quote:

The source code would be the definitive resource. There may be a higher level doc somewhere, but I couldn't find one with a quick search. But "a very small subset" is likely not compatible.

Thanx for checking though. I am not deep enough into all this to simply look up the source code and get my answer.

On a side note: when I was tinkering with CPU feature sets yesterday on a 1950X, I found odd performance differences between runs; depending on the run, turning AVX2 off seemed to speed things up... There seems to be some potential in individually tweaked binary compiles, tailored to a CPU (of course not worth it if one wants to distribute the binary publicly, but tweaking a personal encoding server this way would be fun), so I will probably have to learn to compile stuff like this properly after all...
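
Since the differences showed up between runs, one way to separate a real effect from noise is to repeat each configuration several times and compare medians. A minimal sketch, assuming the encoder can be driven from a command line; `time_command` and `summarize` are hypothetical helper names, and the command shown is only a placeholder:

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=5):
    """Run cmd `runs` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (median, spread), where spread is max - min across the runs."""
    return statistics.median(durations), max(durations) - min(durations)


if __name__ == "__main__":
    # Placeholder command: swap in the real encoder invocation per build or feature set.
    cmd = [sys.executable, "-c", "pass"]
    median, spread = summarize(time_command(cmd, runs=3))
    print(f"median {median:.3f}s, spread {spread:.3f}s")
```

If the spread within one configuration is as large as the gap between two configurations, the "AVX2 off is faster" result is probably noise.
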
ok, back to topic...

Quote:

The lack of strong single-threaded perf would be the big bottleneck anyway.

I think so, too... These Atom cores really are weak... Even a Core 2 Duo has more oomph per core.

Quote:

Although, I just recalled that WPP might allow some WPP parallelization; nominally 1 thread per 64 pixels high, although probably only 2x better given overhead. WPP certainly allows for decoder parallelization. Even still, an Atom core is many times slower for CABAC-like operations than a modern Xeon core, so that's already factored into comparisons.

I remember quality penalties from too much parallelization - is it worth thinking about it or are we talking a few percent difference in efficiency here?
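
To make the "1 thread per 64 pixels high" figure from the quote concrete, here is a small sketch of the arithmetic. It assumes a 64-pixel CTU height (taken from the quote) and treats the row count as a nominal upper bound only; `wpp_rows` is a hypothetical helper, not an encoder API:

```python
import math

CTU_SIZE = 64  # assumed CTU height in pixels, matching the "64 pixels high" figure


def wpp_rows(frame_height, ctu_size=CTU_SIZE):
    """Number of CTU rows, i.e. the nominal upper bound on WPP threads per frame."""
    return math.ceil(frame_height / ctu_size)


for height in (720, 1080, 2160):
    print(height, wpp_rows(height))
# prints:
# 720 12
# 1080 17
# 2160 34
```

So a single 1080p frame nominally offers 17 rows, far fewer than the 256 logical CPUs in the listing further down.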

Quote:

Modern video encoding is stressful in pretty much every way, so Amdahl's Law prevents any big improvement in one area from helping all that much.

Maybe at this point it's worth mentioning that getting a Xeon Phi is of course purely for academic research and interest, tinkering with old stuff, etc... For anyone reading along: simply getting a modern desktop CPU is a much better idea.
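
To put a number on the Amdahl's Law point quoted above, here is a minimal sketch of the formula. The 90% fraction is purely an illustrative assumption, not a measurement of any encoder, and `amdahl_speedup` is a hypothetical helper name:

```python
def amdahl_speedup(fraction, factor):
    """Amdahl's Law: overall speedup when `fraction` of the work gets `factor` times faster."""
    untouched = 1.0 - fraction
    return 1.0 / (untouched + fraction / factor)


# Illustrative assumption: 90% of the runtime benefits from the improvement.
for factor in (4, 16, 64, 256):
    print(factor, round(amdahl_speedup(0.90, factor), 2))
# prints:
# 4 3.08
# 16 6.4
# 64 8.77
# 256 9.66
```

Even with a 256x improvement in that one area, the overall gain stays below 10x, because the remaining 10% is untouched.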

Quote:

As I've mentioned before, some years back Intel discovered that x265 pushed Xeon thermals hotter than Intel's own internal thermal test tool's theoretical worst case.

Haha... Whenever something like this happened back in my days at university, people resorted to introducing the "correction factor": simply multiply the whole equation by something that sounds reasonable (I hope Intel does better, and to be fair, this was one user group of... not so scientific members).

Quote:

Circa 1996, it took about 80 minutes to encode 1 minute of 320x240p15 on my then rocket-fast PowerMac 8100/80 workstation. I was able to charge $80/minute for a tape-to-file conversion with a $20/min surcharge for VHS (mainly to encourage the client to find the Beta SP master).

Ah, I remember these machines; I did some service on them back then... Good times... I still remember some encoding adventures: the good old Abit BP6 with two P3 Celeron Tualatin CPUs was able to do realtime MPEG-2 encoding for SVCD.

Edit:

Phoronix has some CPU info which might be interesting:

Code:

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 87
model name	: Intel(R) Xeon Phi(TM) CPU 7210 @ 1.30GHz
stepping	: 1
microcode	: 0x1b0
cpu MHz		: 1168.239
cache size	: 1024 KB
physical id	: 0
siblings	: 256
core id		: 0
cpu cores	: 64
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl est tm2 ssse3 fma cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ring3mwait cpuid_fault epb pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms avx512f rdseed adx avx512pf avx512er avx512cd xsaveopt dtherm ida arat pln pts
bugs		: cpu_meltdown spectre_v1 spectre_v2 mds msbds_only
bogomips	: 2600.01
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

Code:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          256
On-line CPU(s) list:             0-255
Thread(s) per core:              4
Core(s) per socket:              64
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           87
Model name:                      Intel(R) Xeon Phi(TM) CPU 7210 @ 1.30GHz
Stepping:                        1
CPU MHz:                         1192.466
CPU max MHz:                     1500.0000
CPU min MHz:                     1000.0000
BogoMIPS:                        2600.01
L1d cache:                       2 MiB
L1i cache:                       2 MiB
L2 cache:                        32 MiB
NUMA node0 CPU(s):               0-255
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Vulnerable: Clear CPU buffers attempted, no microcode; SMT mitigated
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Spec store bypass: Not affected
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full generic retpoline, STIBP disabled, RSB filling
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl est tm2 ssse3 fma cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ring3mwait cpuid_fault epb pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms avx512f rdseed adx avx512pf avx512er avx512cd xsaveopt dtherm ida arat pln pts
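
For anyone wanting to check a flags line like the one above against a list of instruction-set extensions, here is a minimal sketch. The helper names `parse_flags` and `missing_features` are hypothetical, the sample string is a shortened excerpt of the listing, and the "wanted" list is only an example, not a statement of what any encoder requires:

```python
def parse_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line of cpuinfo/lscpu text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "flags":
            return set(value.split())
    return set()


def missing_features(flags, wanted):
    """Return the wanted features that are absent, preserving the order given."""
    return [name for name in wanted if name not in flags]


# Shortened sample from the listing above; on Linux, read /proc/cpuinfo instead.
sample = (
    "model name\t: Intel(R) Xeon Phi(TM) CPU 7210 @ 1.30GHz\n"
    "flags\t\t: fpu sse2 avx avx2 avx512f avx512pf avx512er avx512cd\n"
)
flags = parse_flags(sample)
print(missing_features(flags, ["avx2", "avx512f", "avx512bw", "avx512dq", "avx512vl"]))
# prints: ['avx512bw', 'avx512dq', 'avx512vl']
```

The full listing shows the same picture: avx512f, avx512pf, avx512er and avx512cd are present, while avx512bw, avx512dq and avx512vl do not appear.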