Old 4th July 2010, 23:11   #41  |  Link
stax76
Registered User
 
 
Join Date: Jun 2002
Location: On thin ice
Posts: 6,837
Cyrillic chars covered by ANSI are not my problem; the problem is Unicode Cyrillic chars, which I could possibly convert to ANSI. Maybe I'm just writing nonsense, I don't know.
Old 4th July 2010, 23:16   #42  |  Link
Groucho2004
 
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
Quote:
Originally Posted by stax76 View Post
Cyrillic chars covered by ANSI are not my problem; the problem is Unicode Cyrillic chars, which I could possibly convert to ANSI. Maybe I'm just writing nonsense, I don't know.
Are you talking about file names or the encoding of the script? If you mean the content of the script then yes, you can easily convert UTF-16 or UTF-8 to ANSI (CP-1251).
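
The conversion mentioned above can be sketched in a few lines. This is only an illustration in Python, not code from StaxRip or Avisynth; the function name utf8_to_cp1251 and the sample script line are made up for the example. The same idea applies with the Win32 conversion functions.

```python
# Sketch only: re-encode script text from UTF-8 to the Cyrillic ANSI code page.
# utf8_to_cp1251 and the sample script line are hypothetical, for illustration.

def utf8_to_cp1251(data: bytes) -> bytes:
    # "utf-8-sig" decodes UTF-8 and also drops a leading BOM if one is present.
    text = data.decode("utf-8-sig")
    # Raises UnicodeEncodeError if a character does not exist in CP-1251.
    return text.encode("cp1251")

script_utf8 = 'AviSource("фильм.avi")'.encode("utf-8")
script_ansi = utf8_to_cp1251(script_utf8)

# Each Cyrillic letter takes two bytes in UTF-8 but one byte in CP-1251.
print(len(script_utf8), len(script_ansi))
```

The conversion is only "easy" as long as every character in the text exists in the target code page; anything else raises an error here and would need an explicit replacement policy.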
Old 4th July 2010, 23:22   #43  |  Link
stax76
Registered User
 
 
Join Date: Jun 2002
Location: On thin ice
Posts: 6,837
Script file names are my problem. StaxRip generates those based on the name of the source file. The source file name will of course also appear inside the scripts, but that doesn't seem to be a problem.
Old 4th July 2010, 23:27   #44  |  Link
Groucho2004
 
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
Quote:
Originally Posted by stax76 View Post
Script file names are my problem. StaxRip generates those based on the name of the source file. The source file name will of course also appear inside the scripts, but that doesn't seem to be a problem.
So, as far as I understand, the report you got about Russian file names states that there is no problem, right? If that is the case, what exactly is the problem?
Old 4th July 2010, 23:58   #45  |  Link
stax76
Registered User
 
 
Join Date: Jun 2002
Location: On thin ice
Posts: 6,837
StaxRip rejects Unicode file names. It was requested that I remove this check, but I don't believe it can be removed because it cannot work: it can work with Cyrillic ANSI but not with Cyrillic Unicode. Using alternative script names would be a big task, as scripts are generated in countless locations with complicated code. At least I can say it's not my fault.
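
One way to read the "Cyrillic ANSI" versus "Cyrillic Unicode" distinction is whether a file name can be represented in the active code page at all. A minimal sketch of such a check follows, in Python for illustration only. The helper fits_code_page is hypothetical and is not how StaxRip actually decides.

```python
# Sketch only: test whether a file name survives conversion to a given
# 8-bit code page. fits_code_page is a hypothetical helper, not StaxRip code.

def fits_code_page(name: str, code_page: str) -> bool:
    try:
        name.encode(code_page)
        return True
    except UnicodeEncodeError:
        # At least one character has no slot in this code page.
        return False

# A Russian name fits the Cyrillic code page but not the Western one.
print(fits_code_page("фильм.avs", "cp1251"))
print(fits_code_page("фильм.avs", "cp1252"))
```

Under this reading, a name only needs to be rejected when it does not fit the code page of the system the script will run on.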
Old 5th July 2010, 00:29   #46  |  Link
Groucho2004
 
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
Quote:
Originally Posted by stax76 View Post
StaxRip rejects Unicode file names. It was requested that I remove this check, but I don't believe it can be removed because it cannot work: it can work with Cyrillic ANSI but not with Cyrillic Unicode. Using alternative script names would be a big task, as scripts are generated in countless locations with complicated code.
Windows always stores file names internally as Unicode (UTF-16) in order to support file names in all languages. The system code page determines how the characters of the file names are interpreted by the API (if you don't override the default code page in your file and string operations).

I simply don't understand your problem. Since Avisynth doesn't support UTF-8 or UTF-16 you have to create your scripts with the appropriate MBCS encoding using the correct code page. There are Win32 API functions you can use to ensure the proper encoding. Even dummy programming languages like VB should support this.
I'm doing this in plain C without problems, one would think that .NET (which I assume you're using) should make this very simple.

Quote:
Originally Posted by stax76 View Post
At least I can say it's not my fault.
I suggest you do some reading on the matter and re-think that statement.
Old 11th July 2010, 10:30   #47  |  Link
HayateYuki
Registered User
 
Join Date: Mar 2008
Location: Hong Kong, China
Posts: 5
Actually, IMHO, for present-day programs on Windows, native Unicode support should be a baseline requirement.

That means using only the Unicode versions of the API, the functions with a "W" at the end of their names...

It could be very difficult, but it may be the only complete way to avoid this kind of problem...

For the AVS script parser, as foxyshadis said, a BOM could be the best way to determine whether the script is UTF-8, UTF-16 or MBCS encoded; then convert it to UTF-16 with MultiByteToWideChar() and carry on from there...
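
The BOM check described above might look roughly like this. It is a Python sketch of the idea only, not Avisynth's parser; decode_script and the fallback_code_page parameter are assumptions made for the example, and a real implementation would take the fallback from the system code page.

```python
import codecs

# Sketch only: pick a decoder from the byte order mark, otherwise fall back
# to an 8-bit code page. decode_script and fallback_code_page are hypothetical.

def decode_script(data: bytes, fallback_code_page: str = "cp1252") -> str:
    if data.startswith(codecs.BOM_UTF8):
        # Skip the three BOM bytes, then decode the rest as UTF-8.
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        # The "utf-16" codec consumes the BOM and uses it to pick the byte order.
        return data.decode("utf-16")
    # No BOM: treat the bytes as MBCS/ANSI text in the given code page.
    return data.decode(fallback_code_page)

print(decode_script("я".encode("cp1251"), "cp1251"))
```

A script without a BOM stays ambiguous: the bytes alone do not say which code page they were written in, which is why the fallback has to be supplied from outside.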

Please, throw away the ...A() functions in future versions.
Old 11th July 2010, 14:42   #48  |  Link
stax76
Registered User
 
 
Join Date: Jun 2002
Location: On thin ice
Posts: 6,837
Quote:
then convert it to UTF-16 with MultiByteToWideChar() and carry on from there...
And that is the problem as I understand it: the core uses 8-bit chars instead of 16-bit ones, and it's not easy to port it to 16 bit.
Old 11th July 2010, 15:11   #49  |  Link
kemuri-_9
Compiling Encoder
 
 
Join Date: Jan 2007
Posts: 1,348
Quote:
Originally Posted by HayateYuki View Post
Please, throw away the ...A() functions in future versions.
it's not as simple as just tossing the 'A' from the API functions and having it magically work.

avisynth is heavily designed around the use of char *, all of which would have to be changed to wchar_t * if unicode were to be used and supported within scripts.
this is by no means a trivial change.
let's not forget that this core change would break every current filter there is as well, and it's no small undertaking to fix them all either.

the avisynth project is even configured to build as MBCS, which maps 'A'-less API calls back to their 'A' versions through the preprocessor.

Last edited by kemuri-_9; 11th July 2010 at 15:15.
Old 11th July 2010, 15:40   #50  |  Link
Midzuki
Unavailable
 
 
Join Date: Mar 2009
Location: offline
Posts: 1,480
Change the title of this thread ?

Just a nit-pick, but, IMHO,

"Foreign Language characters in filenames"

should be replaced by

"Non-ASCII characters in filenames"

(or: "Non-ANSI characters in filenames")

((because "foreign" is just a point-of-view)).

Last edited by Midzuki; 11th July 2010 at 15:45.
Old 11th July 2010, 22:50   #51  |  Link
Groucho2004
 
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
Quote:
Originally Posted by HayateYuki View Post
For the AVS script parser, as foxyshadis said, a BOM could be the best way to determine whether the script is UTF-8, UTF-16 or MBCS encoded; then convert it to UTF-16 with MultiByteToWideChar() and carry on from there...
Why would anyone create a script in UTF-8 or UTF-16 when it's known that Avisynth doesn't support it? Am I missing something?

The only "inconvenience" you have with Avisynth not supporting Unicode is that you have to choose the proper encoding for your script according to the system code page it's supposed to be used on. It's very simple.
Old 12th July 2010, 17:32   #52  |  Link
krieger2005
Registered User
 
 
Join Date: Oct 2003
Location: Germany
Posts: 377
It's important if you want to use paths in AVS which contain non-ASCII characters.
Old 12th July 2010, 17:44   #53  |  Link
Groucho2004
 
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
Quote:
Originally Posted by krieger2005 View Post
It's important
What's important?
Old 18th July 2010, 21:15   #54  |  Link
krieger2005
Registered User
 
 
Join Date: Oct 2003
Location: Germany
Posts: 377
Quote:
Originally Posted by Groucho2004 View Post
Why would anyone create a script in UTF-8 or UTF-16 when it's known that Avisynth doesn't support it?
Answer:

Quote:
Originally Posted by krieger2005
It's important if you want to use paths in AVS which contain non-ASCII characters.

Last edited by krieger2005; 18th July 2010 at 21:18.
Old 18th July 2010, 22:26   #55  |  Link
Groucho2004
 
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
@krieger2005
You obviously don't know anything about encoding standards.

First of all, I recommend that you look up what ASCII stands for. Wikipedia is your friend.

ASCII is only a 7 bit subset of an 8 bit encoding scheme. AVISynth however supports 8 bit encodings based on the common Windows code pages.

If you are for example using code page 1252 it covers all Western languages like English, German, French, Norwegian, Italian, etc including all accented characters these languages have.
If you set your Windows to Russian (you don't need Russian Windows for that) it will support all Cyrillic languages and so on.

Sorry, but your argument is nonsense. There is only a problem when people don't know how to create a script properly.
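
The coverage described in this post (7 bit ASCII, code page 1252 for Western languages, code page 1251 for Cyrillic) can be checked with a short sketch. It is written in Python purely for illustration; the helper names and sample words are made up for the example.

```python
# Sketch only: which encodings can hold which sample words.
# is_ascii, can_encode and the samples are hypothetical illustration names.

def is_ascii(text: str) -> bool:
    # ASCII is 7 bit, so every code point must be below 128.
    return all(ord(ch) < 128 for ch in text)

def can_encode(text: str, code_page: str) -> bool:
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False

samples = {"German": "Größe", "French": "déjà", "Norwegian": "blåbær", "Russian": "привет"}

for language, word in samples.items():
    row = {cp: can_encode(word, cp) for cp in ("cp1252", "cp1251")}
    print(language, word, "ascii:", is_ascii(word), row)
```

Each sample fits exactly one of the two code pages, so a script has to be written in the code page that matches the system it will run on.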
Old 19th July 2010, 01:13   #56  |  Link
mariush
Registered User
 
Join Date: Dec 2008
Posts: 589
And which code page contains the Euro character (€)? Don't answer: several code pages exist that contain it, but the point is that it's not in ASCII, and not everyone who gets a file with such a character will actually have that code page installed.
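
To make the Euro point concrete, here is a small Python sketch (illustration only; euro_byte is a made-up helper name) that asks a few encodings for their byte for €.

```python
# Sketch only: look up the byte that represents the Euro sign in an encoding.
# euro_byte is a hypothetical helper for illustration.

def euro_byte(code_page: str):
    try:
        return "€".encode(code_page)
    except UnicodeEncodeError:
        # The encoding has no Euro sign at all.
        return None

for cp in ("ascii", "latin-1", "cp1252", "cp1251"):
    print(cp, euro_byte(cp))
```

ASCII and Latin-1 have no Euro sign, while code pages 1252 and 1251 both have one but at different byte values, so the same byte means different things depending on the code page in use.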

I won't even get into the new currency symbol for the Indian Rupee, for example, which is planned to be included in Unicode. The rupee is represented by the Unicode character U+20A8 (₨) right now, but it's going to receive another position for the new symbol... the point is that code pages are antique technology.

The sensible way is to move all the way to UTF-8, which is perfectly fine and can represent, if not all, then almost all characters out there. There are free libraries like IBM's Unicode one (ICU) which would make the job easier, but I'm not a good enough programmer to get involved.

PS: some people work as freelancers and therefore receive files with various names from various parts of the world, sometimes packed in WinRAR archives. Don't assume that a video is created on one computer and ends its life on the same computer, under the same code page that Windows is running.

Last edited by mariush; 19th July 2010 at 01:16.
Old 19th July 2010, 03:37   #57  |  Link
Groucho2004
 
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
Quote:
Originally Posted by mariush View Post
And which code page contains the Euro character (€) ?
Hm, I tried but can't figure out why one would use the Euro symbol in an AVISynth script.

Quote:
Originally Posted by mariush View Post
PS: some people work as freelancers and therefore receive files with various names from various parts of the world, sometimes packed in WinRAR archives. Don't assume that a video is created on one computer and ends its life on the same computer, under the same code page that Windows is running.
I'm not saying it's perfect and I certainly agree that Unicode AVISynth would be the best way to go, but we have to work with what's given. I find it hard to believe that anyone would go through the trouble of porting AVISynth (and all plugins in existence) to Unicode.
Old 19th July 2010, 10:59   #58  |  Link
Gavino
Avisynth language lover
 
Join Date: Dec 2007
Location: Spain
Posts: 3,431
Quote:
Originally Posted by Groucho2004 View Post
Hm, I tried but can't figure out why one would use the Euro symbol in an AVISynth script.
In a subtitle - just tried it and it works fine for me.
Old 19th July 2010, 20:07   #59  |  Link
krieger2005
Registered User
 
 
Join Date: Oct 2003
Location: Germany
Posts: 377
Actually, I work as a software developer and know exactly what I'm talking about: I have used the libraries that convert characters from code pages to Unicode and back, so I have to know the theory behind them. Using code pages is a possible solution and has been in use for decades. But when you start using characters that live in two different code pages, you can't use code pages, or you must search for a very long time for a code page which supports such character combinations (for example German ä and Russian я).
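
The ä and я example can be tried out directly. The following Python sketch is only an illustration; code_pages_that_fit is a hypothetical helper and the candidate list is limited to the two code pages discussed in this thread.

```python
# Sketch only: find which of the candidate code pages can hold a whole string.
# code_pages_that_fit is a hypothetical helper for illustration.

def code_pages_that_fit(text: str, candidates) -> list:
    fitting = []
    for code_page in candidates:
        try:
            text.encode(code_page)
            fitting.append(code_page)
        except UnicodeEncodeError:
            # Some character is missing from this code page.
            pass
    return fitting

candidates = ["cp1252", "cp1251"]
print(code_pages_that_fit("ä", candidates))
print(code_pages_that_fit("я", candidates))
print(code_pages_that_fit("äя", candidates))

# A Unicode encoding such as UTF-8 has no such restriction.
print("äя".encode("utf-8"))
```

Each letter fits one of the two code pages on its own, but the combined string fits neither of them, while UTF-8 encodes it without trouble.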

Why should someone use such a mix? Just because it's possible!!!! Don't ask why people do something, because people just do it.
Old 19th July 2010, 21:57   #60  |  Link
foxyshadis
Angel of Night
 
 
Join Date: Nov 2004
Location: Tangled in the silks
Posts: 9,559
To avoid recompiling plugins, and since AVISYNTH_INTERFACE_VERSION isn't passed into any constructors, you could either attempt to find the value within the plugin, or have two copies of AddFunction, a char * and a wchar_t * version, the plugin gets whichever environment it tries to use. Kind of nasty, but that's baked code for you. Actually, you could even modify the header to point AddFunction (ThrowError, etc) to an internal AddFunctionW when recompiled, since the filter only gets the functionality if recompiled, but never loses any; that way you don't need to hack in a char * check.

No matter what happens, there will have to be char->wchar conversion functions, because not all plugins are ever going to be recompiled.
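
The char->wchar conversion functions mentioned above boil down to a pair of code page conversions. The sketch below shows the idea in Python rather than in Avisynth's C++; narrow_to_wide and wide_to_narrow are hypothetical names, and the code page argument stands in for whatever the system code page would be.

```python
# Sketch only: convert between 8-bit code page strings and UTF-16LE, the
# layout of wchar_t strings on Windows. Both function names are hypothetical.

def narrow_to_wide(raw: bytes, code_page: str) -> bytes:
    # char -> wchar direction: always possible for valid code page text.
    return raw.decode(code_page).encode("utf-16-le")

def wide_to_narrow(wide: bytes, code_page: str) -> bytes:
    # wchar -> char direction: lossy, characters missing from the code page
    # are replaced with "?" instead of raising an error.
    return wide.decode("utf-16-le").encode(code_page, errors="replace")

wide = narrow_to_wide("я".encode("cp1251"), "cp1251")
print(wide, wide_to_narrow(wide, "cp1251"))
```

The asymmetry is the important part: an old char * plugin can always be fed into a wide core, but passing wide strings back to such a plugin can lose characters.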