TL;DR: See title. How can I tell Google they’re probably processing their mail wrong?

After setting up the Matrix Authentication Service (MAS) with exim-relay as the mail server, I noticed that verification mails sent by the service often end up in the spam folder.

Digging deeper, I found that the mails were failing DKIM authentication. This was odd, because DKIM is set up correctly, as confirmed by other mail providers and by online DKIM test tools such as DMARC Tester.

Searching online for “gmail fails DKIM authentication, while other providers pass”, I found recurring reports and posts, but either without a resolution or with unrelated resolutions such as DKIM alignment.

Using meld, I compared the original source of the mails as received by Gmail with the source received by other providers, and found a difference:

At other providers, the display names in the “From:” and “Reply-To:” headers are enclosed in double quotes:

From: "John Smith" <j.smith@example.com>
Reply-To: "John Smith" <j.smith@example.com>

At Gmail, where DKIM fails, there are no double quotes:

From: John Smith <j.smith@example.com>
Reply-To: John Smith <j.smith@example.com>

Since each of these should be the raw source, I ruled out presentation issues and dug deeper.

I found out that the Rust crate lettre, as used by MAS, specifically encodes names containing whitespace with double quotes. Further, after researching a bit more and reading RFC 2822 sections 3.2.4 and 3.2.5, I came to the conclusion that whitespace needs no quoting in mail headers.

I opened issues upstream and downstream, at lettre and MAS, to report the problem, in particular that their mails fail DKIM checks at Gmail:

If you’ve read this far, you’re probably wondering why I’m posting all of this. For one, to provide another data point for people scratching their heads over mail issues.

But beyond that: I’m pretty sure Google’s mail servers should not strip the quotes before doing the DKIM check. I assume they have some kind of decode -> process -> encode workflow that simply encodes the headers again, this time without the quotes. But IMHO a correctly signed message should not lead to an authentication error, even if its contents are not perfectly encoded.
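To make the suspected decode -> process -> encode asymmetry concrete, here is a minimal Python sketch. It is only a stand-in: the standard library's email.utils.parseaddr and email.utils.formataddr play the roles of a hypothetical decoder and encoder, and nothing here claims this is how Google's servers are actually implemented. formataddr adds quotes only when a display name contains special characters such as ".", "@" or ",", so a name that merely contains a space comes back without the quotes the signer saw. The helper name reencode_address is made up for this illustration.

```python
from email.utils import formataddr, parseaddr


def reencode_address(header_value: str) -> str:
    """Decode an address header value and encode it again.

    Hypothetical model of a decode -> process -> encode pipeline;
    the "process" step is a no-op here.
    """
    name, address = parseaddr(header_value)  # decode: quotes are stripped from the name
    return formataddr((name, address))       # encode: quotes only added if the name needs them


signed = '"John Smith" <j.smith@example.com>'  # what the DKIM signer hashed
delivered = reencode_address(signed)

print(signed)               # "John Smith" <j.smith@example.com>
print(delivered)            # John Smith <j.smith@example.com>
print(signed == delivered)  # False: a signed header changed, so the DKIM hash no longer matches
```

In this model, a name with a dot, such as "Joe Q. Public", keeps its quotes, because the dot forces quoting on re-encoding.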

I would be curious to get some feedback from mail experts on what is happening here. This is not my field of expertise, and I’m going by what I’ve learned over the past 48 hours.

  • hperrin@lemmy.ca
    6 hours ago

    Wait, so when you download what is supposed to be the original email from Gmail, it’s not actually the original? How is that ok?

      • hperrin@lemmy.ca
        1 hour ago

        That’s the unfortunate reality. Google can force their will on the internet because they own around 35% of all email. And Microsoft owns another ~35%, so if the two decide to change how email works, it’s everyone else who has to conform.

    • w2xel@gehirneimer.de (OP)
      5 hours ago

      Usually, the important parts of the mail, such as subject, sender and contents, are protected by DKIM authentication. Unfortunately, this is usually not visible to the end user, as in my case, where mails fail DKIM but are still shown in my inbox.

      Mail servers and relays add headers to the mail as it goes, for example their own IPs to trace the mail, or authentication results if authentication happens at various endpoints.

      In the end, the mail in the Gmail mailbox is the original mail plus all these additions from the mail relays. In an ideal world, only DKIM-authenticated mail would be presented to the end user, but the world of mail seems so broken that many sending servers simply do not apply DKIM/DMARC correctly, and so many receivers accept broken mail.
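Which headers a DKIM signature actually protects is listed in the h= tag of the DKIM-Signature header (RFC 6376). Headers that relays add later, such as Received or Authentication-Results, are normally not in that list, which is why adding them does not break the signature; RFC 6376 does require From to always be signed. The Python sketch below parses a made-up DKIM-Signature value (the domain, selector and the bh=/b= values are placeholders, not real data) and lists the signed headers. The function names are hypothetical.

```python
def parse_tag_list(value: str) -> dict[str, str]:
    """Parse a DKIM tag=value list such as 'v=1; a=rsa-sha256; h=from:to'."""
    tags: dict[str, str] = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing semicolon
        name, _, tag_value = part.partition("=")
        # Folding white space inside values (e.g. long b= strings) carries no meaning here
        tags[name.strip()] = "".join(tag_value.split())
    return tags


def signed_headers(dkim_signature_value: str) -> list[str]:
    """Return the lower-cased header names covered by the signature (the h= tag)."""
    h = parse_tag_list(dkim_signature_value).get("h", "")
    return [name.lower() for name in h.split(":") if name]


# Hypothetical DKIM-Signature value; d=, s=, bh= and b= are placeholders
example = (
    "v=1; a=rsa-sha256; c=relaxed/relaxed; d=example.com; s=mail;"
    " h=From:Reply-To:Subject:Date:To:Message-ID; bh=PLACEHOLDER; b=PLACEHOLDER"
)

covered = signed_headers(example)
print(covered)  # ['from', 'reply-to', 'subject', 'date', 'to', 'message-id']
print("from" in covered, "received" in covered)  # True False
```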

      • hperrin@lemmy.ca
        5 hours ago

        Yeah, I understand that the mail Gmail receives is not necessarily the mail the user sent (with regard to headers), and I understand Gmail can add headers (like auth results, spam scores, other sorts of records), but is Gmail really changing the “From” header or other headers included in the DKIM signature?? I would think that would be absolutely unacceptable.

        • w2xel@gehirneimer.de (OP)
          4 hours ago

          Yes, it seems it does. I assume it has some processing chain involving decode -> process -> encode of the whole mail; usually that works out and the DKIM check passes. Apparently, if you send something non-standard, decoding and encoding are not symmetrical and this happens. IMHO it shouldn’t, so I agree. Showing users mails with broken authentication seems broken to me as well.

          • hperrin@lemmy.ca
            1 hour ago

            Quotes around the display name in a header certainly conform to the standard (and in fact, when the name contains a space, the way Gmail rewrites it does not conform to the standard but uses the obsolete form), and I would expect any decent mail program to leave them alone. Then again, Gmail is not a decent mail program, and hasn’t been for a long time.

            Edit: @[email protected] points out below that Gmail’s rewrite does follow the spec, so it conforms to the standard when there is white space in the display name. Only when there is a dot (.) would it be using the obsolete form.

            • w2xel@gehirneimer.de (OP)
              3 hours ago

              I had to think about this for a while, but the standard only obsoletes folding white space, i.e. white space that wraps lines, such as:

              Subject: This
               is a folded line
              

              which is equivalent to

              Subject: This is a folded line
              

              As I understand it, white space is allowed both in the current syntax and in the obsolete one. Or am I getting this wrong?

              Edit:

              I think in the obsolete syntax the following would have been allowed for a From: field as well:

              From: John
               Smith <j.smith@example.com>
              
              • hperrin@lemmy.ca
                3 hours ago

                Folding white space is a different issue. It’s the atom / quoted-string part. That’s the standard form of a display name, meaning that if it’s more than one word (separated by white space), it should use the quoted-string form. It does list obs-phrase as an alternative, but obsolete syntax should be avoided when possible.

                • w2xel@gehirneimer.de (OP)
                  2 hours ago

                  phrase is 1*word, not word. What obs-phrase adds is that it allows dots (“.”) and folding white space, not that it allows white space.
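To illustrate the grammar being discussed: in RFC 5322, phrase = 1*word, word = atom / quoted-string, and an atom may be surrounded by white space, so John Smith parses as two atoms, while obs-phrase additionally admits a bare "." between words. The Python sketch below is a deliberately simplified classifier (no comments, no folding, no encoded words); the function name and its three labels are made up for this illustration.

```python
import re

# atext from RFC 5322 section 3.2.3: letters, digits and these printable symbols
ATEXT = r"A-Za-z0-9!#$%&'*+\-/=?^_`{|}~"

# Tokens: a quoted-string (simplified, no folding), an atom, a bare dot, or white space
TOKEN = re.compile(rf'"(?:[^"\\]|\\.)*"|[{ATEXT}]+|\.|[ \t]+')


def classify_display_name(name: str) -> str:
    """Return 'phrase', 'obs-phrase' or 'invalid' for an unencoded display name."""
    tokens = []
    pos = 0
    while pos < len(name):
        match = TOKEN.match(name, pos)
        if match is None:
            return "invalid"  # e.g. '@' or ',' outside a quoted-string
        tokens.append(match.group())
        pos = match.end()

    words = [t for t in tokens if not t.isspace()]
    if not words or words[0] == ".":
        return "invalid"  # both forms must start with a word
    if "." in words:
        return "obs-phrase"  # obs-phrase = word *(word / "." / CFWS)
    return "phrase"  # phrase = 1*word; white space between atoms is allowed


for display_name in ['John Smith', '"John Smith"', 'Joe Q. Public', 'no @ reply']:
    print(f"{display_name!r}: {classify_display_name(display_name)}")
```

Under this simplified model, the unquoted John Smith is an ordinary phrase, Joe Q. Public needs obs-phrase only because of the dot, and "no @ reply" is only valid when quoted.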

              • hperrin@lemmy.ca
                3 hours ago

                The section right below shows that it’s obsolete:

                A.6.1. Obsolete addressing
                
                   Note in the below example the lack of quotes around Joe Q. Public,
                   the route that appears in the address for Mary Smith, the two commas
                   that appear in the "To:" field, and the spaces that appear around the
                   "." in the jdoe address.
                
                ----
                From: Joe Q. Public <john.q.public@example.com>
                To: Mary Smith <@machine.tld:mary@example.net>, , jdoe@test   . example
                Date: Tue, 1 Jul 2003 10:52:37 +0200
                Message-ID: <5678.21-Nov-1997@example.com>
                
                Hi everyone.
                ----
                

                And in the spec itself, that syntax is named “obs-phrase”.

                But yes, though obsolete, it is still legal syntax. So I guess I shouldn’t say it “does not conform to the standard”, but rather “just barely conforms to the standard”.

                • w2xel@gehirneimer.de (OP)
                  2 hours ago

                  In the same example, “Mary Smith” is not the issue; the route @machine.tld is, as the description says.

                  Joe Q. Public is an issue because of the dot, not the spaces. The spaces are an issue in jdoe@test . example, as that’s actually an address, not a name.

  • voracitude@lemmy.world
    7 hours ago

    Hey, if you can figure out how to get Google to listen to you, please also let them know they should have updated the DMARC policy for gmail.com to p=reject like 18 months ago, because they’re still using p=none.

    Speaking of which, landing in spam doesn’t mean much, as it’s post-processing after the mail has been accepted. What are the details of your tests so far: which address are you sending from with your MAS, and which address are you receiving at? If you’re using an @gmail.com address to send, the DMARC policy could be at fault; nowadays, having a policy of “none” is worse than not having DMARC configured at all. Have you received any bounce messages?

    And have you tried a custom domain from somewhere like Namecheap? If your mails from your own domain (with fully aligned DNS and a DMARC policy of “reject”) still go to spam, then the issue is likely in your config for MAS or exim. If not, the issue is likely Gmail’s DMARC policy.

    • w2xel@gehirneimer.de (OP)
      7 hours ago

      Ah, interesting. I’m sending from my own domain and IP, with DMARC set to quarantine.

      Yes, the mail server accepts it because Gmail accepts the mail if either SPF or DKIM passes (OR, not AND). My observation is that Google for some reason sometimes puts the mail into spam and sometimes into the regular inbox. In both cases the headers show dkim=fail and spf=pass, and I have no idea why it’s not deterministic. I’ve also tested this with the same or similar mail contents.

      Edit: To be honest, I also don’t think mail from my domain should “sometimes” land in the regular inbox if DKIM fails and DMARC has p=quarantine.
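For reference, the DMARC rule described above fits in a few lines. This is a minimal Python sketch of the RFC 7489 logic, not Gmail's actual filtering: DMARC passes if at least one of SPF or DKIM passes and is aligned with the From: domain. It assumes, hypothetically, that the SPF pass here is aligned (envelope sender in the same domain as From:). Under that assumption DMARC passes despite dkim=fail, so the quarantine policy is never triggered, and whether the mail lands in spam is left to Gmail's own local filtering. The class and function names are made up.

```python
from dataclasses import dataclass


@dataclass
class AuthResults:
    """Hypothetical summary of what a receiver computed for one message."""
    spf_pass: bool
    spf_aligned: bool   # envelope sender domain aligns with the From: domain
    dkim_pass: bool
    dkim_aligned: bool  # DKIM d= domain aligns with the From: domain


def dmarc_pass(r: AuthResults) -> bool:
    # One aligned, passing mechanism is enough: OR, not AND
    return (r.spf_pass and r.spf_aligned) or (r.dkim_pass and r.dkim_aligned)


def requested_disposition(r: AuthResults, policy: str) -> str:
    """What the domain's DMARC policy asks for; receivers may still file mail as spam."""
    return "pass" if dmarc_pass(r) else policy  # policy is "none", "quarantine" or "reject"


# DKIM broken in transit, SPF passing (alignment assumed)
observed = AuthResults(spf_pass=True, spf_aligned=True, dkim_pass=False, dkim_aligned=True)
print(requested_disposition(observed, "quarantine"))  # pass

# Only when both mechanisms fail does the quarantine policy apply
both_fail = AuthResults(spf_pass=False, spf_aligned=True, dkim_pass=False, dkim_aligned=True)
print(requested_disposition(both_fail, "quarantine"))  # quarantine
```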

      • voracitude@lemmy.world
        4 hours ago

        Ah-ha, I see, it’s clearer now. Yeah, if it’s an @gmail.com address, then the issue could well be how they’re processing the email. Did you notice any other differences in the email received by Gmail vs others, e.g. does the DKIM signature match between them?

        My thinking is that if they’re dropping the quotes around the name, they might be mangling the DKIM signature too…

        • w2xel@gehirneimer.de (OP)
          7 hours ago

          Yes, I’ve seen one other header change: Gmail seems to (again, sometimes?) enforce Message-ID fields in the header, and may add or change one if it doesn’t match its internal requirements. Interestingly, I’ve seen both: my mail server receiving “rejected because missing message-id” messages, and messages reaching the mailbox, but with a Google-added Message-ID in the raw source.
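Since Gmail appears to insist on a Message-ID, it seems sensible for the sender to always set one itself. As a stand-in for whatever the real sender (lettre inside MAS) does, which is not covered here, this Python sketch uses the standard library's email.utils.make_msgid; the addresses, subject and body are placeholders.

```python
from email.message import EmailMessage
from email.utils import make_msgid

msg = EmailMessage()
msg["From"] = '"John Smith" <j.smith@example.com>'  # placeholder sender
msg["To"] = "someone@example.org"                  # placeholder recipient
msg["Subject"] = "Verify your email address"
# make_msgid builds a unique "<...@domain>" value; the domain should be the sender's own
msg["Message-ID"] = make_msgid(domain="example.com")
msg.set_content("Placeholder body.")

print(msg["Message-ID"])
```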

          For my specific case of DKIM failures, I’ve not noticed other differences.

          If I take the raw source from Gmail, i.e. after processing by Google, re-add the quotes, and manually check the DKIM signature, the signature passes. In other words, the quotes are literally the only relevant change in my case.
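Why re-adding the quotes makes the check pass: before hashing, DKIM canonicalizes the signed headers (RFC 6376, section 3.4). Even the forgiving "relaxed" algorithm only lowercases the header name, unfolds lines and collapses white space; it does not touch quote characters, so the quoted and unquoted From: lines produce different hash input. The Python sketch below implements relaxed canonicalization for a single header field to show this; it deliberately leaves out the rest of signing (concatenating all signed headers, the DKIM-Signature header itself, and the RSA signature). The function name is made up.

```python
import hashlib
import re


def relaxed_header(field: str) -> str:
    """Relaxed header canonicalization (RFC 6376, 3.4.2) for one header field."""
    name, value = field.split(":", 1)
    name = name.rstrip(" \t").lower()            # lowercase name, drop WSP before the colon
    value = re.sub(r"\r\n(?=[ \t])", "", value)  # unfold continuation lines
    value = re.sub(r"[ \t]+", " ", value)        # collapse WSP runs to a single space
    value = value.strip(" ")                     # drop WSP after the colon and at the end
    return f"{name}:{value}\r\n"


quoted = 'From: "John Smith" <j.smith@example.com>'  # as signed
unquoted = 'From: John Smith <j.smith@example.com>'  # as seen in Gmail's raw source

for header in (quoted, unquoted):
    canonical = relaxed_header(header)
    # Different canonical bytes mean a different hash input, hence a different hash
    digest = hashlib.sha256(canonical.encode()).hexdigest()
    print(repr(canonical), digest[:16])
```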

          • voracitude@lemmy.world
            4 hours ago

            Yeah, sounds like you cracked it, frankly. This does make sense: the signature covers the body and the signed headers, including From (which would include those quotes, in your case). If Google’s processing removes quotes that were originally there, the message has changed, so DKIM fails.

            I think Google must only leave the quotes in if there are special characters like a comma in the name. Are you able to update your exim to only include quotes when the From name includes a special character? It feels bad having to engineer around Google’s incompetence, I know, but that should solve the issue.

            Edit: Or you could change your “From” name to “no @ reply”, “daemon @ box” or something similar, which would force Google to leave the quotes in.
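If the sender wanted to quote display names only when they actually need it, Python's standard email.utils.formataddr is one example of an encoder that does this: it adds quotes only when the name contains specials such as ".", "@" or ",". This is just an illustration in Python; it does not show how exim or lettre would be configured, and that Gmail's rewriting follows the same rule is an assumption. It also shows why a name like "daemon @ box" keeps its quotes: "@" cannot appear in an unquoted display name.

```python
from email.utils import formataddr

names = [
    "Matrix Authentication Service",  # spaces only: no quotes needed
    "Joe Q. Public",                  # contains '.', gets quoted
    "daemon @ box",                   # contains '@', gets quoted
]

for name in names:
    print(formataddr((name, "noreply@example.com")))
```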

            • w2xel@gehirneimer.de (OP)
              6 hours ago

              I think it’s technically an encoding bug in lettre, which is used by the Matrix Authentication Service (https://github.com/lettre/lettre). As far as I can tell, exim relays the messages “correctly”, or at least without altering them.

              I.e. lettre should not add quotes for whitespace. But Google also shouldn’t alter messages before authenticating them. In an ideal world, both sides get fixed ^^

              • hperrin@lemmy.ca
                3 hours ago

                Lettre is following the spec, where if the display name contains a space, the modern way to encode it is to put quotes around it.

              • voracitude@lemmy.world
                6 hours ago

                Sorry, I was editing while you were replying. I’ll reply here in case you don’t see my edits; sorry if you’ve already read them. In light of the following:

                • The only difference in the emails is the quotes in the “From”
                • DKIM passes after you manually add the quotes back in
                • Every other provider leaves the quotes in place

                I’m pretty sure that your setup is fine, including your exim config, and that this is an issue specifically with Google’s processing, like you originally thought. Try setting the “From” name for the service to “daemon @ box”; that will force Google to leave the quotes in. If the email passes DKIM at Gmail like that, we have definitively proven your original theory.