control characters incorrectly ignored in OSC string #5536
Replies: 8 comments
-
We took the state handling from vt100.net, which lists for OSC_STRING state:
Paul Williams has tested that against multiple DEC VT devices, so I tend to believe that xterm might be slightly off here.
-
The de-facto standard is xterm. It is more important to match xterm than essentially-non-existent hardware. The project is called xterm.js, not vt100.js. The demo sets the
-
Sorry, I strongly disagree. The parser is the best-tested thing we have in terms of continuity, thanks to Paul Williams, whereas xterm had to reverse engineer most things back in the 80s/90s, when literally no docs were available. So there are still flaws hidden there.
Yes, I am aware of the name conflict. I really don't like the name and have wished more than once that the project did not carry it.
I have voted more than once for a proper terminfo entry and better overall TE option discovery. I literally never got any serious or constructive response to any of that.
-
Perhaps, but that ship has sailed. Almost all who want to use term.js will want to have modern extensions. Nobody wants a vt100. Modern extensions have standardized on xterm or a superset thereof. See my new example. The Maybe we need the equivalent of gcc's For what it's worth, if you run
-
We cannot easily change the parser rules, as that would have unforeseen side effects on the shape of sequences. OSC is specced to safely transmit only ASCII printables, so no control chars. If your HTML snippets contain meaningful chars outside of that range (even higher UTF-8 or any other 8-bit or multibyte encoding), you will need a transport encoding like base64. That's the gist of it, and it is how all the more complex OSC sequences do it, such as iTerm's IIP. Btw, DCS is a bit more liberal in its payload and lets a few more control chars pass, but in general the same rule applies there for arbitrary bytes. ECMA-35 allows specifying custom sequence payload protocols with a special modifier sequence (was it called DOCS?), but that's definitely dead and not supported by most TEs I have tested (I played around with it when trying to create a sound transport sequence). For best support across TEs we are more or less bound to OSC with ASCII printables (and to DCS, with less support across current TEs).
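To make the transport-encoding point concrete, here is a minimal sketch of wrapping arbitrary text in base64 so that only ASCII printables travel inside the OSC string. The OSC number 7777 and the helper names (utf8Bytes, base64, buildOsc) are hypothetical and only for illustration; they are not part of xterm.js or of any registered sequence.

```typescript
// Hypothetical helpers: none of these names come from xterm.js.
const B64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// UTF-8 encode via encodeURIComponent, which percent-escapes non-ASCII bytes.
function utf8Bytes(text: string): number[] {
  const encoded = encodeURIComponent(text);
  const bytes: number[] = [];
  for (let i = 0; i < encoded.length; i++) {
    if (encoded.charAt(i) === '%') {
      bytes.push(parseInt(encoded.substring(i + 1, i + 3), 16));
      i += 2; // skip the two hex digits just consumed
    } else {
      bytes.push(encoded.charCodeAt(i));
    }
  }
  return bytes;
}

// Plain base64 with '=' padding; the output uses ASCII printables only.
function base64(bytes: number[]): string {
  let out = '';
  for (let i = 0; i < bytes.length; i += 3) {
    const has1 = i + 1 < bytes.length;
    const has2 = i + 2 < bytes.length;
    const b0 = bytes[i];
    const b1 = has1 ? bytes[i + 1] : 0;
    const b2 = has2 ? bytes[i + 2] : 0;
    out += B64.charAt(b0 >> 2);
    out += B64.charAt(((b0 & 3) << 4) | (b1 >> 4));
    out += has1 ? B64.charAt(((b1 & 15) << 2) | (b2 >> 6)) : '=';
    out += has2 ? B64.charAt(b2 & 63) : '=';
  }
  return out;
}

// ESC ] <id> ; <base64 payload> ESC \  (string terminated by ST).
function buildOsc(id: number, text: string): string {
  return '\x1b]' + id + ';' + base64(utf8Bytes(text)) + '\x1b\\';
}

// 7777 is a made-up OSC number used only for this sketch.
const sequence = buildOsc(7777, '<b>hi</b>\n');
```

The LF in the snippet survives the round trip because it never appears as a raw byte inside the OSC string, so the result does not depend on how a given TE treats control characters there.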
-
I believe we have to allow general UTF-8 printable characters. There are a number of existing "standard" OSC escape sequences that take raw text without base64-encoding. For example OSC 2 (change window title) and OSC 8 (set link URL). We cannot restrict these to ASCII without breaking applications and reasonable expectations. OSC and DCS payloads cannot pass through arbitrary binary content without encoding, but they should handle all valid "text" content. Pedantic compatibility with some obsolete pre-Unicode specification is not a good argument against. "Text content" (as may appear in "text files") includes printable characters, joiners, and valid whitespace. Portable valid whitespace includes at minimum space, tabs, and standard line-endings (LF or CR+LF). Might as well also allow at least VT, NEL, and FF, even if not as "portable" or well-defined.
I think it is unlikely that any application would misbehave if LF is passed through by OSC rather than ignored. If one does, either the application or the OSC handler can easily be fixed, and should be. However, having LF ignored rather than passed through is more likely to cause misbehavior, and is more complicated to work around.
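As an illustration of how small such a handler-side fix could be, here is a sketch of a title handler that tolerates control characters in its payload by collapsing them to spaces. The function name sanitizeTitle is hypothetical and does not refer to an existing xterm.js API.

```typescript
// Hypothetical handler-side cleanup for a single-line field such as a
// window title: every run of C0 controls (and DEL) becomes one space.
function sanitizeTitle(payload: string): string {
  return payload.replace(/[\x00-\x1f\x7f]+/g, ' ').trim();
}

const title = sanitizeTitle('build\r\nlog');
```

A handler that wants to keep line breaks (for example one that receives multi-line text) would simply skip this step, which is only possible if the parser passes LF through in the first place.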
-
I just checked ECMA-48: it also allows 00/08-00/13 (BS, HT, LF, VT, FF, CR) in OSC. It seems DEC had a stricter idea about OSC, idk why. So I am not totally opposed to allowing those, as it would bring us closer to ECMA conformance. We still need to check all OSC handlers to make sure they don't break badly (warning: the fact that DCS also allows LF caused us a CVE in the past). I think your argument about UTF-8 is a "false friend" for your HTML purpose. HTML itself can be encoded in any text encoding, not only UTF-8. Since you use system tools to generate those snippets, it is not even guaranteed that a custom system locale will not interfere here. Imho, as an OSC sequence creator you should be prepared for that, e.g. by making the sequence more robust in that regard. In summary: if you don't want other TEs to choke on your OSC sequence, it should stay in ASCII printables.
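The difference between the two readings can be sketched as a filter over the C0 controls found inside an OSC payload. This is a simplification, not the actual parser code: the names OscPolicy, keepsControl and filterOscPayload are made up, and string terminators (BEL, ST) as well as aborting characters are deliberately not modeled.

```typescript
// 'strict'  : every C0 control inside the OSC string is ignored, which is
//             the behavior reported at the start of this thread.
// 'ecma48'  : 00/08-00/13 (BS, HT, LF, VT, FF, CR) are kept as payload.
type OscPolicy = 'strict' | 'ecma48';

function keepsControl(code: number, policy: OscPolicy): boolean {
  return policy === 'ecma48' && code >= 0x08 && code <= 0x0d;
}

// Returns the payload a handler would see under the given policy.
function filterOscPayload(raw: string, policy: OscPolicy): string {
  let out = '';
  for (let i = 0; i < raw.length; i++) {
    const code = raw.charCodeAt(i);
    const isC0 = code < 0x20;
    if (!isC0 || keepsControl(code, policy)) {
      out += raw.charAt(i);
    }
  }
  return out;
}

const strictPayload = filterOscPayload('a\r\nb\x01c', 'strict'); // CR, LF and SOH dropped
const ecmaPayload = filterOscPayload('a\r\nb\x01c', 'ecma48');   // only SOH dropped
```

Under the second policy every OSC handler has to expect CR and LF in its input, which is why the handlers need to be reviewed before the parser rule changes.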
-
Converted this to a discussion since it's not clear what exactly we should do. I'm trying to make the issues more actionable and am moving discussions here.
-
Most control characters (including CR and LF) are ignored in OSC_STRING.
This is inconsistent with xterm, which I think we should follow in this respect. See the sos_table in VTPrsTbl.c in the xterm sources: "The CASE_IGNORE entries correspond to the characters that can be accumulated for the string function (e.g., OSC)." I also verified this using the gdb debugger.
A simple fix in EscapeSequenceParse.ts is to replace:
by:
Optionally, we can also tweak the handling of ParserAction.OSC_PUT in line 769:
My suggestion is to not change that line.
I will submit a pull request if this looks reasonable.