Discussion:
[savannah-help-public] [sr #109423] Non-BMP characters truncate comments
David Corbett
2017-11-27 21:09:12 UTC
URL:
<http://savannah.gnu.org/support/?109423>

Summary: Non-BMP characters truncate comments
Project: Savannah Administration
Submitted by: dscorbett
Submitted on: Mon 27 Nov 2017 09:09:10 PM UTC
Category: Savannah trackers - bugs, tasks, etc.
Priority: 5 - Normal
Severity: 3 - Normal
Status: None
Assigned to: None
Originator Email:
Operating System: None
Open/Closed: Open
Discussion Lock: Any

_______________________________________________________

Details:

When a comment containing a non-BMP Unicode character is submitted, it will be
truncated before that character. See, for example, the original submissions of
bug #51672 and bug #52538.
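
Editorial illustration, not part of the original report: a minimal Python sketch of the symptom described above. "Non-BMP" means a code point above U+FFFF, which UTF-8 encodes as four bytes. The function name truncate_before_non_bmp is hypothetical and only mimics the observed behaviour; it is not Savane code.

```python
def truncate_before_non_bmp(text: str) -> str:
    """Mimic the reported symptom: keep only the text before the first non-BMP character."""
    for index, char in enumerate(text):
        if ord(char) > 0xFFFF:  # outside the Basic Multilingual Plane
            return text[:index]
    return text


sample = "begin \U0001F600end"  # U+1F600, the character in the attached sample
print(len("\U0001F600".encode("utf-8")))  # number of UTF-8 bytes for one non-BMP character
print(repr(truncate_before_non_bmp(sample)))
```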




_______________________________________________________

Reply to this item at:

<http://savannah.gnu.org/support/?109423>

_______________________________________________
Message sent via/by Savannah
http://savannah.gnu.org/
Ineiev
2017-11-28 13:02:39 UTC
Update of sr #109423 (project administration):

Assigned to: None => ineiev

_______________________________________________________

Follow-up Comment #1:

Could you attach a sample text in a file?

David Corbett
2017-11-28 15:41:52 UTC
Additional Item Attachment, sr #109423 (project administration):

File name: 1f600.txt Size: 0 KB


Ineiev
2017-11-28 17:45:56 UTC
Follow-up Comment #2, sr #109423 (project administration):

Does it reproduce?

begin 😀end

Ineiev
2017-11-28 17:52:43 UTC
Follow-up Comment #3, sr #109423 (project administration):

Thanks for the sample; curiously, preview seems to be unaffected.

Ineiev
2017-12-08 10:15:33 UTC
Follow-up Comment #4, sr #109423 (project administration):

This looks like a MySQL bug <https://bugs.mysql.com/bug.php?id=67297>: the
string breaks when inserted. FYI: Savannah runs on top of 'mysql Ver 14.14
Distrib 5.5.58, for debian-linux-gnu (x86_64) using readline 6.3'.

Bob Proulx
2017-12-08 20:59:21 UTC
Post by Ineiev
This looks like a MySQL bug <https://bugs.mysql.com/bug.php?id=67297>: the
string breaks when inserted. FYI: Savannah runs on top of 'mysql Ver 14.14
Distrib 5.5.58, for debian-linux-gnu (x86_64) using readline 6.3'.
Excellent debugging!

Bob
Ineiev
2017-12-11 16:19:03 UTC
Post by Bob Proulx
Post by Ineiev
This looks like a MySQL bug <https://bugs.mysql.com/bug.php?id=67297>: the
string breaks when inserted. FYI: Savannah runs on top of 'mysql Ver 14.14
Distrib 5.5.58, for debian-linux-gnu (x86_64) using readline 6.3'.
Excellent debugging!
Thank you!

The question is what to do. We could wait for MySQL, or store new messages
in something like base64, with a magic prefix.
Bob Proulx
2017-12-12 07:30:42 UTC
Post by Ineiev
The question is what to do. We could wait for MySQL, or store new messages
in something like base64, with a magic prefix.
Upgrading quickly is problematic. And trying to upgrade just the
database system and not the clients is also problematic. Or perhaps
the reverse if that is the problem. At least not without testing. In
another set of systems I tried upgrading just half of things and then
the clients and server were out of sync with options they were trying
to pass. It caused a lot of problems.

How much of a problem is this? Can we scan the data before storing it
and detect when we will have the problem and then encode it then?

Bob
Ineiev
2017-12-12 12:40:15 UTC
Post by Bob Proulx
Post by Ineiev
The question is what to do. We could wait for MySQL, or store new messages
in something like base64, with a magic prefix.
Upgrading quickly is problematic. And trying to upgrade just the
database system and not the clients is also problematic. Or perhaps
the reverse if that is the problem. At least not without testing. In
another set of systems I tried upgrading just half of things and then
the clients and server were out of sync with options they were trying
to pass. It caused a lot of problems.
How much of a problem is this?
Some characters break the comment they are in.
Post by Bob Proulx
Can we scan the data before storing it
and detect when we will have the problem and then encode it then?
We probably could. I think some UTF-8 parsing would be needed, which is
somewhat harder than encoding unconditionally.
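
Editorial illustration, not part of the original message: a minimal sketch of the scan discussed above. It assumes that the problematic characters are exactly the non-BMP ones and that the input is already valid UTF-8. Under those assumptions, a four-byte sequence is the only place a byte of 0xF0 or higher can appear, so a byte-level check is enough and no full decoder is needed. The name needs_encoding is hypothetical.

```python
def needs_encoding(raw: bytes) -> bool:
    """Return True if valid UTF-8 input contains a four-byte (non-BMP) sequence.

    In valid UTF-8, bytes 0xF0..0xF4 occur only as the lead byte of a
    four-byte sequence, so a single pass over the bytes suffices.
    """
    return any(byte >= 0xF0 for byte in raw)


print(needs_encoding("begin \U0001F600end".encode("utf-8")))
print(needs_encoding("plain caf\u00e9 \u20ac".encode("utf-8")))
```

The check says nothing about malformed input; validating the encoding would be a separate step.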
Bob Proulx
2017-12-12 18:05:30 UTC
Post by Ineiev
Post by Bob Proulx
Can we scan the data before storing it
and detect when we will have the problem and then encode it then?
We probably could. I think some UTF-8 parsing would be needed, which is
somewhat harder than encoding unconditionally.
If we encode unconditionally does that mean that previous entries
already stored would need to be encoded?

Bob
Ineiev
2017-12-13 13:32:11 UTC
Post by Bob Proulx
Post by Ineiev
Post by Bob Proulx
Can we scan the data before storing it
and detect when we will have the problem and then encode it then?
We probably could. I think some UTF-8 parsing would be needed, which is
somewhat harder than encoding unconditionally.
If we encode unconditionally does that mean that previous entries
already stored would need to be encoded?
My idea was: new entries are prefixed with a magic string no old entries
begin with; when extracting, Savane checks if the string begins with
the prefix and decodes it when it does.
Ineiev
2018-03-03 17:06:06 UTC
Post by Ineiev
My idea was: new entries are prefixed with a magic string no old entries
begin with; when extracting, Savane checks if the string begins with
the prefix and decodes it when it does.
I've just pushed and installed a workaround.
Ineiev
2018-03-03 16:55:05 UTC
Follow-up Comment #5, sr #109423 (project administration):

I've just pushed and installed a workaround: whenever a new comment is
written, Savane reads back what was actually inserted into the database, and
if that differs from the original text, it base64-encodes the comment and
stores it with a magic prefix.

When extracting comments from the database, Savane checks for that prefix and
base64-decodes when needed:

test: 😀
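
Editorial illustration, not part of the original comment: a minimal Python sketch of the read-back workaround described above. Savane's actual code is not shown in this thread, so every name here (LossyStore, MAGIC_PREFIX and its value, save_comment, load_comment) is hypothetical. LossyStore is only a stand-in that imitates the truncating database.

```python
import base64

MAGIC_PREFIX = "<*base64*>"  # hypothetical marker; the real prefix is not given in the thread


class LossyStore:
    """Stand-in for the database: truncates text before the first non-BMP character."""

    def __init__(self):
        self.rows = {}

    def insert(self, key, text):
        for index, char in enumerate(text):
            if ord(char) > 0xFFFF:
                text = text[:index]
                break
        self.rows[key] = text

    def select(self, key):
        return self.rows[key]


def save_comment(store, key, text):
    """Insert the comment, read it back, and re-insert it encoded if it was altered."""
    store.insert(key, text)
    if store.select(key) != text:
        encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
        store.insert(key, MAGIC_PREFIX + encoded)


def load_comment(store, key):
    """Decode entries that carry the magic prefix; return all others unchanged."""
    stored = store.select(key)
    if stored.startswith(MAGIC_PREFIX):
        return base64.b64decode(stored[len(MAGIC_PREFIX):]).decode("utf-8")
    return stored


store = LossyStore()
save_comment(store, 1, "test: \U0001F600")
print(load_comment(store, 1) == "test: \U0001F600")
```

Comparing the read-back value avoids any UTF-8 parsing, and entries stored before the workaround remain readable as long as none of them begins with the prefix.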

Ineiev
2018-03-03 16:56:11 UTC
Update of sr #109423 (project administration):

Status: None => Works For Me


Ineiev
2018-03-03 16:59:37 UTC
Update of sr #109423 (project administration):

Status: Works For Me => In Progress

_______________________________________________________

Follow-up Comment #6:

The notifications still arrive broken.

Ineiev
2018-07-28 12:29:18 UTC
Update of sr #109423 (project administration):

Status: In Progress => Done
Open/Closed: Open => Closed

_______________________________________________________

Follow-up Comment #7:

It turns out that my terminal wasn't capable of displaying it; mutt with
gnome-terminal shows that character in notifications correctly.

Closing the request.
