[savannah-help-public] [sr #109439] Commit notification hook mishandles non-ASCII author names
Ludovic Courtès
2018-01-08 08:32:00 UTC
URL:
<http://savannah.gnu.org/support/?109439>

Summary: Commit notification hook mishandles non-ASCII author names
Project: Savannah Administration
Submitted by: civodul
Submitted on: Mon 08 Jan 2018 09:31:59 AM CET
Category: Source code repositories - developer access
Priority: 5 - Normal
Severity: 3 - Normal
Status: None
Assigned to: None
Originator Email:
Operating System: None
Open/Closed: Open
Discussion Lock: Any

_______________________________________________________

Details:

Hello,

The email notification hook behind the Guix repositories incorrectly handles
non-ASCII commit author names.

See for instance:

https://lists.gnu.org/archive/html/guix-commits/2018-01/msg00197.html
https://lists.gnu.org/archive/html/guix-commits/2018-01/msg00198.html
https://lists.gnu.org/archive/html/guix-commits/2018-01/msg00199.html
https://lists.gnu.org/archive/html/guix-commits/2018-01/msg00200.html
https://lists.gnu.org/archive/html/guix-commits/2018-01/threads.html

It also leads to invalid headers like this:

Mail-Followup-To: "guix-***@gnu.org, Ludovic Court"@savannah.gnu.org,
"ès <***@gnu.org>"@savannah.gnu.org

Could it be that the notification hook is running in a non-UTF8 locale?

TIA,
Ludo'.




_______________________________________________________

Reply to this item at:

<http://savannah.gnu.org/support/?109439>

_______________________________________________
Message sent via/by Savannah
http://savannah.gnu.org/
Glenn Morris
2018-01-09 04:26:16 UTC
Follow-up Comment #1, sr #109439 (project administration):

This was reported to git-multimail (which is still what Savannah uses, I
think?) years ago, and closed wontfix:

https://github.com/git-multimail/git-multimail/issues/70

Presumably the Savannah user database is still Latin-1.

Bob Proulx
2018-01-09 19:22:10 UTC
Follow-up Comment #2, sr #109439 (project administration):

I spent some time looking into this problem and the issue is much too
complicated to type into a web page text area. I thought about dragging this
conversation over to the mailing list but decided to give it a shot here
anyway. Glenn is correct about the Latin1 encoding being the problem.

There are many problems. One is that Savannah's web interface is designed
around Latin1, not UTF-8. I don't know what needs to be done to migrate the
web UI from Latin1 to UTF-8. I haven't tried it, but I am pretty sure that if
I updated the database to contain UTF-8 content instead of Latin1 content, the
web page would then mangle the name in the reverse direction.

https://savannah.gnu.org/users/civodul

Oh, and there is also a lot of content stored in the database as UTF-8, even
though the database character encoding is specified as Latin1. Assaf has an
entry describing this problem in the TODO list. That mismatch is also a
problem for other data, in the other direction.

In any case here are some data factoids just as general information. I will
dump some data from the MySQL database.


vcs0:~# getent passwd civodul | awk -F: '{print$5}' | od -tx1 -c
0000000  4c  75  64  6f  76  69  63  20  43  6f  75  72  74  e8  73  0a
          L   u   d   o   v   i   c       C   o   u   r   t 350   s  \n

vcs0:~# getent passwd civodul | awk -F: '{print$5}' | iconv -f LATIN1 -t UTF-8 | od -tx1 -c
0000000  4c  75  64  6f  76  69  63  20  43  6f  75  72  74  c3  a8  73
          L   u   d   o   v   i   c       C   o   u   r   t 303 250   s
0000020  0a
         \n


This shows that indeed the content from the database is returned in a Latin1
encoding. This is then used by git-multimail and onward. If it were UTF-8
then from here onward through the email it should all work okay.
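
For readers more at home in Python than in od and iconv, the same comparison can be sketched as follows. This is only an illustration built from the bytes in the dump above (without the trailing newline). It assumes Python 3 and is not taken from the actual hook.

```python
# Bytes of the name as getent returned them in the dump above.
latin1_bytes = bytes.fromhex("4c75646f76696320436f757274e873")

# Interpreting the bytes as Latin-1 yields the intended name.
name = latin1_bytes.decode("latin-1")

# Re-encoding as UTF-8 turns the single byte e8 into the pair c3 a8,
# which is what the iconv pipeline shows.
utf8_bytes = name.encode("utf-8")

# Reading the Latin-1 bytes as if they were UTF-8 fails outright, which
# is the kind of mismatch that can corrupt the mail headers downstream.
try:
    latin1_bytes.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

print(name)
print(utf8_bytes.hex())
print(decodes_as_utf8)
```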

At the moment I think a reasonable workaround would be handling this in the
git-multimail wrapper that we are already using with git-multimail. It's all
Python and I am a Perl guy so please forgive me if I don't know Python well
enough to make the changes myself. But if someone were to propose patches to
the Python code then I think this could be fixed there. Here is raw access to
the git repository including config for git-multimail. The file needing
patching is post-receive. Looking at that file should give a Python person
enough information on the process and they should be able to hack in a
workaround.

https://git.savannah.gnu.org/git/guix.git/hooks/

If the fromaddr could be passed through "iconv -f LATIN1 -t UTF-8" then I
think the result would work around the current Latin1 issues. Patches
solicited.
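
As a starting point for such a patch, here is a minimal sketch of what the transcoding step might look like in Python. Everything in it is an assumption rather than a description of the real post-receive wrapper: the helper names decode_gecos and safe_fromaddr are hypothetical, the address is an example.org placeholder, and the code targets Python 3, which the deployed hook may not be running. Because the database holds a mix of UTF-8 and Latin1 content, the sketch tries UTF-8 first and falls back to Latin1.

```python
from email.utils import formataddr, parseaddr


def decode_gecos(raw: bytes) -> str:
    """Decode a name that may be UTF-8 or Latin-1 (hypothetical helper)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this cannot fail.
        return raw.decode("latin-1")


def safe_fromaddr(raw: bytes) -> str:
    """Return an ASCII-only address with an RFC 2047 encoded display name."""
    name, addr = parseaddr(decode_gecos(raw))
    # With a charset given, formataddr encodes a non-ASCII display name as
    # an RFC 2047 encoded word and leaves a plain ASCII name untouched.
    return formataddr((name, addr), charset="utf-8")


# The same author stored in either encoding (placeholder address).
latin1 = "Ludovic Courtès <ludo@example.org>".encode("latin-1")
utf8 = "Ludovic Courtès <ludo@example.org>".encode("utf-8")

print(safe_fromaddr(latin1))
print(safe_fromaddr(latin1) == safe_fromaddr(utf8))
```

The UTF-8-first check is a heuristic. Latin1 text containing accented letters is rarely valid UTF-8 by accident, but it can happen, so a real patch would want testing against actual entries from the database.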

And one more thing. We are using git-multimail from just after the 1.0.0 tag
plus 3 with two local changes on top of that from 2014. It's been working
well so there hasn't been a need to update. But if someone were offended that
we aren't using the latest version of git-multimail and was willing to test
out the new version then I'd be happy to work through the upgrade with them.

