A New Character Encoding [Archive]


wswartzendruber

24th August 2022, 17:29

EDIT: Updated on 2022-08-27 @ 07:30 UTC to reflect an updated spec.

I was thinking about how zombies could rise up at any moment and attack us, and then I realized that all we would then have are 7-bit teletype machines for communication. So, being the selfless individual that I am, I have created a new 7-bit encoding scheme for Unicode.

STF-7: The Silly 7-bit Transformation Format (https://wswartzendruber.net/lets-talk-about/stf7/index.html)

Please mail my Nobel Prize to my U.S. address. After all, I will be the reason people can continue sending poop emojis to each other after the world ends.

FranceBB

24th August 2022, 18:21

Funny enough, if there was a zombie apocalypse and the human race slowly faced its end and extinction, the only thing left would be the Arctic Code Vault with all open source projects in it, including Avisynth, all its functions, plugins, all my stuff, all your stuff, Linux, libav etc.
It would be quite something to pass on to aliens when they visit Earth. xD

Ok, we both had one drink too many at this point.

filler56789

24th August 2022, 19:26

UTF-7 has been invented already. :)

https://en.wikipedia.org/wiki/UTF-7

Seriously now:
it sucks that even in the 21st century the so-called I.T. world still hasn't abandoned its roots from the 7-bit era :–/

If it had, everybody and everything would be always using pure Unicode by now, not UTF-8 or anything...

"respecting" ASCII plus creating/using more and more "languages" which are always subsets of English means:

United-States-centrism and nothing else -_-

wswartzendruber

24th August 2022, 19:29

UTF-7 has been invented already. :)

https://en.wikipedia.org/wiki/UTF-7
I know what UTF-7 is. And it sucks. It has none of the advantages listed in the Features section where I compare STF-7 to UTF-7.

Seriously now:
it sucks that even in the 21st century the so-called I.T. world still hasn't abandoned its roots from the 7-bit era :–/

If it had, everybody and everything would be always using pure Unicode by now, not UTF-8 or anything...
You mean UTF-16? I see its advantages for totally enclosed systems. Keeping text exchange in UTF-8, though, makes things easier. Lots of text handling runtimes don't like null bytes, for example.
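
To make the null-byte point concrete, here is a small Python sketch. The helper `c_style_length` is a made-up name for illustration; it only imitates how a C-style, zero-terminated string routine would see the data, and is not taken from any particular runtime.

```python
# Illustrative only: encode the same ASCII text both ways and look at the bytes.
text = "Hi"

utf8_bytes = text.encode("utf-8")       # one byte per ASCII character
utf16_bytes = text.encode("utf-16-le")  # two bytes per ASCII character, high byte is zero


def c_style_length(data: bytes) -> int:
    """Length as seen by code that stops at the first zero byte (like C's strlen)."""
    end = data.find(b"\x00")
    return len(data) if end == -1 else end


print(utf8_bytes, c_style_length(utf8_bytes))
print(utf16_bytes, c_style_length(utf16_bytes))
```

The UTF-8 form contains no zero bytes, so zero-terminated handling sees the whole string. The UTF-16 form has a zero byte right after the first character, so the same handling would stop there.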

"respecting" ASCII plus creating/using more and more "languages" which are always subsets of English means:

United-States-centrism and nothing else -_-
What are these new languages that are subsets of English? U.S. English is a fork of British English, but aside from that, I'm not seeing how this is true.

For characters, English uses Latin writing, same as South America, most of Europe, and much of Africa.

FranceBB

24th August 2022, 20:22

U.S. English is a fork of British English

and knowing Brits, they'll never merge it back to master LMAO

wswartzendruber

24th August 2022, 21:38

And we made the language MIT-licensed so they could pull it back into master, too!

Fookin' arseholes.

filler56789

25th August 2022, 08:20

You mean UTF-16?

No, I meant what I said, PURE Unicode.
No distinction between "binary" and "text", no distinction between "printable" and "non-printable".
No backward compatibility with software, firmware and hardware from the 7-bit era and/or for the 7-bit era, when the 8th bit of each byte was used for "parity-check" because the geniuses from the 7-bit epoch had to find some use for it.

What are these new languages that are subsets of English?
C and C++ are entirely made of English words, this is what I said and meant.
HTML, XML, PostScript, C#, Java, Perl, Ruby, Python, etc Etc ETC, too.
Oh, yes, BASIC, Pascal, RPG, COBOL, Fortran and Algol as well,
but maybe you are much too young and have never heard of them. :)

Since dreaming is still free :) ,
I keep dreaming of a future where nobody and nothing has to depend on escape-sequences, percent-encoding, UUE /XXE /MIME /yEnc, &amp;, &#0161;, whatever.

wswartzendruber

25th August 2022, 14:15

No, I meant what I said, PURE Unicode.
No distinction between "binary" and "text", no distinction between "printable" and "non-printable".
This is your level of understanding on the topic? You are only worth responding to inasmuch as others may learn from it.

EDIT: I should be more clear. Your ignorance combined with your condescending attitude is what makes you not worth dealing with.

If you want to store text in a binary form that's not an image, you need an encoding. UTF-16 is Unicode's native encoding. Its maximum capacity is defined by UTF-16's BMP + surrogate addressing limit (U+10FFFF).
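
The arithmetic behind that limit can be sketched in a few lines of Python. The surrogate ranges are the standard ones; the constant names and the helper `to_surrogate_pair` are just labels chosen for this illustration.

```python
# Sketch of why the Unicode code space ends where UTF-16 surrogate pairs run out.
BMP_SIZE = 0x10000                      # U+0000..U+FFFF
HIGH_SURROGATES = 0xDBFF - 0xD800 + 1   # 1024 lead values
LOW_SURROGATES = 0xDFFF - 0xDC00 + 1    # 1024 trail values

supplementary = HIGH_SURROGATES * LOW_SURROGATES  # code points reachable via pairs
code_space = BMP_SIZE + supplementary             # 0x110000 code point values in total
max_code_point = code_space - 1                   # 0x10FFFF


def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary code point into (high, low) UTF-16 code units."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


print(hex(max_code_point))
print([hex(unit) for unit in to_surrogate_pair(0x1F4A9)])  # U+1F4A9 PILE OF POO
```

The pair for U+1F4A9 comes out as D83D DCA9, which matches what Python's own `"\U0001F4A9".encode("utf-16-be")` produces.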

No backward compatibility with software, firmware and hardware from the 7-bit era and/or for the 7-bit era, when the 8th bit of each byte was used for "parity-check" because the geniuses from the 7-bit epoch had to find some use for it.
Using the eighth bit as a parity check was a common practice in 8-bit teletype. It helped ensure that encoded characters made it over intact.

You'd rather they didn't?
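
For anyone who hasn't seen it, a parity bit works roughly like this. The sketch assumes even parity carried in the top bit of each byte; real links were also configured for odd, mark, or space parity, and the function names here are invented for the example.

```python
def add_even_parity(ch7: int) -> int:
    """Return an 8-bit value: the 7-bit character plus an even-parity bit in bit 7."""
    if not 0 <= ch7 < 0x80:
        raise ValueError("expected a 7-bit value")
    parity = bin(ch7).count("1") & 1   # 1 if the character has an odd number of 1 bits
    return ch7 | (parity << 7)         # total number of 1 bits is now even


def parity_ok(byte: int) -> bool:
    """True if the received byte still has an even number of 1 bits."""
    return bin(byte & 0xFF).count("1") % 2 == 0


sent = add_even_parity(ord("C"))   # 'C' is 0x43, three 1 bits, so the parity bit is set
corrupted = sent ^ 0x04            # simulate a single flipped bit on the wire

print(hex(sent), parity_ok(sent), parity_ok(corrupted))
```

A single flipped bit makes the count odd, so the receiver can tell the character was damaged. Two flipped bits cancel out and go unnoticed, which is the usual limitation of a single parity bit.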

C and C++ are entirely made of English words, this is what I said and meant.
Ah.

Since dreaming is still free :) ,
I keep dreaming of a future where nobody and nothing has to depend on escape-sequences, percent-encoding, UUE /XXE /MIME /yEnc, &amp;, &#0161;, whatever.
Dreaming is free. It costs nothing, and you'll receive nothing.

filler56789

26th August 2022, 14:29

@wswartzendruber,

thanks for confirming that, just like let's say 97% of the programmers "and similar",
you are exceedingly narrow-minded and just an elaborate tr0ll.

You like to believe you are "logical", "rational" :rolleyes: + "well-informed", but evidently you are neither.

{{
"Must Follow Standards freezes the Cybernetic Sector into existing functionality. In some cases, e.g., ASCII that's an undeniable good ... for a while. ASCII has a range of coding to support teleprinters. Who the heck uses teleprinters these days? But there they sit, hogging space that these days could be used for other, more important, purposes.
It's possible to state ASCII is obsolete; it was designed for 8 bit systems in a 64 bit world."

The technological change over the past 40 years continues today. Much of it is not reaching the consumer market because of existing "standards," e.g., WinTel. And the fact 95% of the people on the software side know bugger-all about hardware, its design, architectural trade-offs between hardware and software, and hardware/software integration.
Putting it simply, COMPUTER SYSTEMS AVAILABLE IN 2013 ARE SQUARELY BASED ON THE LIMITATIONS OF 1975 HARDWARE USING PARADIGMS AND HEURISTICS DEVELOPED IN 1956.

source: https://eurotrib1.eurotrib.com/comments/2013/6/7/05318/71420/2
}}

TRUTH: computer systems available in 2013 are squarely based on the limitations of 1975 hardware using paradigms and heuristics developed in 1956.

But according to you and several other computer ""scientists"" :rolleyes: and software ""engineers"" :rolleyes: ,
the actual problem is the people who stick to Windows 7 or Windows XP because "they hate progress" :rolleyes: and are unable to see how much Micro$oft plus the inventors of Python and Qt are soooo good-intentioned and always know what's best for every user. :)

filler56789

26th August 2022, 14:43

If you want to store text in a binary form that's not an image, you need an encoding.

And why is that?
It's because "things have been designed so".
ASCII and programming languages which use English keywords are not a natural phenomenon. This is what I said and meant.
Yes, it would be possible to create hardware, firmware and software designed to make no distinction between "text" and "machine-bytes". After all, nobody uses mechanical typewriters anymore, and the computers of today should not be designed for serving the American businessmen from the 50s.
But you regard certain human limitations created by human conventions as "facts of Nature".
*shrugs*

wswartzendruber

26th August 2022, 16:23

Yes, it would be possible to create hardware, firmware and software designed to make no distinction between "text" and "machine-bytes".
Don't let me stop you from explaining how this would work...

filler56789

26th August 2022, 18:06

IF ↑↑you↑↑ were actually smart, you would already know that I won't read your smart-a§§ed reply because I had already sent you to my IgnoreList. :)

wswartzendruber

26th August 2022, 18:09

With all that settled, we can move on.

I'm going to create some reference implementations of STF-7 in at least a few languages:

1. Java
2. C#
3. C

Java and C# I'm pretty clear on, but how to do this in C has some obvious hurdles to overcome.

Namely: Has anyone ever encountered a platform that doesn't define uint32_t?