
Migrate away from ImprovMX for mailing lists #485


Closed
9 tasks done
jfly opened this issue Sep 30, 2024 · 22 comments

Comments

@jfly
Contributor

jfly commented Sep 30, 2024

We currently use ImprovMX to handle mail sent to @nixos.org (see the relevant DNS entries).

  • We only use ImprovMX for mail forwarding (teams like infra@, marketing@, etc). Today, nobody sends mail from @nixos.org, and nobody has any inboxes.
  • You need a web account with ImprovMX to see and to update these mail forwards. The Nix community can't see/audit any of this.
  • There are various limits (number of forwards, perhaps the number of emails an address can forward to?). See https://improvmx.com/pricing/. I don't know if we're currently paying for ImprovMX. I think I heard that we've run into some of these limits.

The plan

A few weeks ago, @Mic92 asked me to look into self hosting this instead. He recommended Simple NixOS Mailserver (SNM). I've played with it a bit, and it does seem like a good fit here.

  1. Install SNM on umbriel.
  2. Verify that this server can successfully send mail (target: 10/10 on https://www.mail-tester.com/), either by temporarily adding a login account or by speaking directly to Postfix via the CLI.
  3. Monitor SMTP TLS (see below).
  4. Alert on SMTP TLS monitor failures.
  5. Make it possible to send email as nixos.org (start replacing mail-test.nixos.org with nixos.org).
  6. Wait until the Nix Steering Committee Election is done: https://nixos.org/blog/announcements/2024/sc-election-2024/.
  7. Talk to T-Online and Outlook to tell them we exist. #585
  8. Update mailing-lists with the latest mailing lists and SMTP sending accounts in ImprovMX. #586
  9. Switch over MX records from ImprovMX to umbriel.nixos.org. #587
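Steps 3 and 4 (monitoring and alerting on SMTP TLS) could look something like the minimal Python sketch below. It assumes the check connects to port 25 on umbriel.nixos.org, upgrades the connection with STARTTLS, and alerts when the certificate is close to expiry. The 14-day threshold and the function names are hypothetical; this is not how the infra repo actually implements its monitoring.

```python
import smtplib
import ssl
import time

# Hypothetical alert threshold: alert when the certificate expires within 14 days.
MIN_DAYS_LEFT = 14.0


def days_until_expiry(not_after: str, now: float) -> float:
    """Days between `now` (Unix time) and a certificate's notAfter field."""
    # ssl.cert_time_to_seconds parses strings like "Jan 10 00:00:00 2030 GMT".
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def should_alert(days_left: float, min_days: float = MIN_DAYS_LEFT) -> bool:
    """True if the certificate is too close to expiry."""
    return days_left < min_days


def check_smtp_tls(host: str, port: int = 25, timeout: float = 10.0) -> float:
    """Connect, upgrade with STARTTLS, and return the days left on the peer certificate.

    Connection failures and certificate verification errors raise, which a
    monitor would report as a failed check.
    """
    context = ssl.create_default_context()  # verifies the chain and the hostname
    with smtplib.SMTP(host, port, timeout=timeout) as smtp:
        smtp.ehlo()
        smtp.starttls(context=context)
        cert = smtp.sock.getpeercert()  # smtp.sock is now the TLS socket
    return days_until_expiry(cert["notAfter"], time.time())
```

A probe would then call `should_alert(check_smtp_tls("umbriel.nixos.org"))` on a schedule and treat both an exception and a `True` result as an alerting condition.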

Notes

  1. Monitoring
  2. Backups
    • Not necessary. This service is pretty much stateless (except for the mail stuck in queues, which we can live with?)

Alternatives considered

  • I don't know if there's been any serious discussion about paying someone (ImprovMX or something else) to handle this for us. Since declarative management and auditability are important to us, it would either have to be a service with a Terraform provider, or we would have to build a provider ourselves.
  • @Mic92, can you shed any light on this?
jfly added a commit to jfly/infra that referenced this issue Sep 30, 2024
I'm going to be working on <NixOS#485>.
This will give me the power to do most of the work there, except for
deploying the relevant DNS changes with Terraform.
@SuperSandro2000
Member

I just want to raise awareness that you will probably need to email T-Online and Outlook (non-365) to whitelist your IP; otherwise mail cannot be delivered.

@mweinelt
Member

mweinelt commented Oct 1, 2024

After the leak of the existing email mappings, I would be interested in discussing the privacy aspect of publishing them. What other organization publishes such mappings? The current addresses were not given to us by their recipients with the intent of making them public.

@jfly
Contributor Author

jfly commented Oct 1, 2024

> I just want to raise awareness that you will probably need to email T-Online and Outlook (non-365) to whitelist your IP; otherwise mail cannot be delivered.

I hear you on this. I've never run a mail server before, and honestly have no idea what our deliverability is going to be like. I believe the current set of email addresses is quite small, and may not include any T-Online or Outlook addresses at all. My personal opinion is that we should make sure we've solved the monitoring story first: if we get notified about email stuck in queues, then we can tackle these allowlists as necessary, or we can give up and pay someone to handle this for us.
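Getting notified about email stuck in queues could, for example, build on Postfix's `postqueue -j`, which prints one JSON object per queued message (Postfix 3.1 and later). The minimal Python sketch below assumes a one-hour threshold and only looks at the deferred queue; the threshold and function names are hypothetical, and wiring the result into an alert is left out.

```python
import json
import subprocess
import time

# Hypothetical threshold: a message deferred for over an hour counts as "stuck".
STUCK_AFTER_SECONDS = 3600


def stuck_messages(queue_json_lines, now=None, threshold=STUCK_AFTER_SECONDS):
    """Return (queue_id, recipient, delay_reason) for old deferred messages.

    `queue_json_lines` is the output of `postqueue -j`: one JSON object per
    queued message, with a `recipients` list of {address, delay_reason}.
    """
    now = time.time() if now is None else now
    stuck = []
    for line in queue_json_lines:
        if not line.strip():
            continue
        msg = json.loads(line)
        if msg.get("queue_name") != "deferred":
            continue  # only deferred mail counts as "stuck" here
        if now - msg["arrival_time"] < threshold:
            continue
        for rcpt in msg.get("recipients", []):
            stuck.append((msg["queue_id"], rcpt["address"], rcpt.get("delay_reason", "")))
    return stuck


def read_queue():
    """Run `postqueue -j` on the mail server and return its output lines."""
    out = subprocess.run(["postqueue", "-j"], capture_output=True, text=True, check=True)
    return out.stdout.splitlines()
```

On the server, a non-empty `stuck_messages(read_queue())` would be the signal to alert, and the `delay_reason` values would show whether a provider like T-Online or Outlook is refusing our mail.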

> After the leak of the email mappings, I would be interested in discussing the privacy aspect of publishing them.

Sorry about that. I asked one person about this, but should have talked to more people before posting.

Ideas:

  1. We could encrypt the email addresses. This would be hard to code review.
  2. We could seek consent from all the relevant people. I don't know how hard this would be. I don't have the list anymore, but it didn't seem like an insurmountable number.
  3. Do this behind some self-hosted (or paid) webapp with a login. That's basically what we do today with ImprovMX.

@Mic92
Member

Mic92 commented Oct 2, 2024

> I just want to raise awareness that you will probably need to email T-Online and Outlook (non-365) to whitelist your IP; otherwise mail cannot be delivered.

For T-Online, at least, this is just one email after setting up reverse DNS and everything else correctly.

Overall, I also don't expect the NixOS Foundation to have to handle a large volume of email. The vote was actually the first time we had to do this.

@Mic92
Member

Mic92 commented Oct 2, 2024

> 1. We could encrypt the email addresses. This would be hard to code review.
> 2. We could seek consent from all the relevant people. I don't know how hard this would be. I don't have the list anymore, but it didn't seem like an insurmountable number.
> 3. Do this behind some self-hosted (or paid) webapp with a login. That's basically what we do today with ImprovMX.

@zimbatm has started asking the current users of these email addresses about that.

@Mic92
Member

Mic92 commented Oct 2, 2024

> I hear you on this. I've never run a mail server before, and honestly have no idea what our deliverability is going to be like. I believe the current set of email addresses is quite small, and may not include any T-Online or Outlook addresses at all. My personal opinion is that we should make sure we've solved the monitoring story first: if we get notified about email stuck in queues, then we can tackle these allowlists as necessary, or we can give up and pay someone to handle this for us.

Set up some DMARC reporting and read the mail logs in case there are delivery problems. I didn't have any big issues with email for the NixOS wiki, and that traffic looks more like bulk messages than what I expect to be sent from nixos.org.

@zimbatm
Member

zimbatm commented Oct 2, 2024

@jfly Is it possible to move the email addresses into sops-encoded secrets, or is this part only configurable with plain Nix code?

@SuperSandro2000
Member

> For T-Online, at least, this is just one email after setting up reverse DNS and everything else correctly.

And you need a proper imprint (Impressum) on the domain of the rDNS entry, plus contact details, I think by telephone and by an email address that doesn't go through the mail server itself.

I did this recently, and it took a few back-and-forths, but it is doable.

@jfly
Contributor Author

jfly commented Oct 2, 2024

EDIT: After some discussion, we decided to give people the option of encrypting their email addresses when adding themselves to a mailing list. See #495 (comment) and the refinement to it here.

> @jfly Is it possible to move the email addresses into sops-encoded secrets, or is this part only configurable with plain Nix code?

It currently requires plain Nix code:

Adding support for encrypted email addresses seems like it might actually not be too hard:

  • We could adjust the nixpkgs service to allow for multiple virtual_alias_maps (currently it supports exactly 0 or 1), and then we could add a new entry to that array to point at a virtual alias map generated with a sops-nix template.
    • I think the nixpkgs change I have in mind will look weird. We might need a more generic solution that has a satisfying answer to this question: "why does virtual_alias_maps get this special escape hatch but not other maps like alias_maps?"
  • Adding a new entry is a little tricky because you actually need to run postmap to "compile" these mappings, but I think the existing services.postfix.mapFiles option is flexible enough to do this for us without changes.
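The file such a sops-nix template would render is just a Postfix virtual(5) lookup table: one line per alias, with the alias on the left and a comma-separated list of destinations on the right, which `postmap` then compiles into an indexed database. The minimal Python sketch below only illustrates that file format; the function name and all addresses are placeholders, and in the real setup the file would come from a sops-nix template rather than a script.

```python
def render_virtual_map(aliases: dict[str, list[str]]) -> str:
    """Render a Postfix virtual(5) table with one "alias dest1, dest2" line per alias."""
    lines = []
    for alias in sorted(aliases):  # stable ordering keeps diffs reviewable
        targets = aliases[alias]
        if not targets:
            raise ValueError(f"alias {alias!r} has no destinations")
        lines.append(f"{alias} {', '.join(targets)}")
    return "\n".join(lines) + "\n"
```

For example, `render_virtual_map({"infra@nixos.org": ["alice@example.org", "bob@example.net"]})` yields the single line `infra@nixos.org alice@example.org, bob@example.net`, which `postmap` would turn into the `.db` file Postfix actually queries.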

tl;dr:

  • It's possible, but requires changes to nixpkgs, and perhaps SNM, depending on how we want to expose this.
  • I'm willing to do this work, but would prefer to wait until we know if it's necessary first.
    • If it makes sense to do this work, I could use a brainstorm partner on the nixpkgs change.

jfly added a commit to jfly/infra that referenced this issue Feb 16, 2025
To avoid potential alerting noise: I'll wait until this is deployed and
succeeding before declaring an additional alert.

refs: NixOS#485
jfly added a commit to jfly/infra that referenced this issue Feb 19, 2025
I re-locked in order to pull in
<NixOS/nixpkgs#383081>

To avoid potential alerting noise: I'll wait until this is deployed and
succeeding before declaring an additional alert.

refs: NixOS#485
jfly added a commit to jfly/infra that referenced this issue Feb 20, 2025
I re-locked in order to pull in
<NixOS/nixpkgs#383081>

To avoid potential alerting noise: I'll wait until this is deployed and
succeeding before declaring an additional alert.

refs: NixOS#485
jfly added a commit to Erethon/nixos-infra that referenced this issue Mar 28, 2025
I'm going to be working on <NixOS#485>.
This will give me the power to do most of the work there, except for
deploying the relevant DNS changes with Terraform.
jfly added a commit to Erethon/nixos-infra that referenced this issue Mar 28, 2025
I re-locked in order to pull in
<NixOS/nixpkgs#383081>

To avoid potential alerting noise: I'll wait until this is deployed and
succeeding before declaring an additional alert.

refs: NixOS#485
@infinisil
Member

I'm now hitting https://gitlab.com/simple-nixos-mailserver/nixos-mailserver/-/issues/302, because my mail server at infinisil.com had a strict SPF policy (-all), which was not a problem with ImprovMX. I haven't received any mail since the switch, and only noticed this once somebody pointed it out to me (thanks @ryantrinkle). For now I've updated my SPF records to be less strict (~all) and explicitly added a:umbriel.nixos.org to the allow list, which should hopefully fix the issue, but I really don't think that's a great solution, because others might also be affected but not know about it.

@jfly
Contributor Author

jfly commented Apr 8, 2025

Sorry, I'm not quite following this. There are 2 ways that https://gitlab.com/simple-nixos-mailserver/nixos-mailserver/-/issues/302 could affect us:

  1. Someone uses a @nixos.org address to send mail to a mailing list. We addressed that by loosening our SPF record (as you said you've already done for your mail server at infinisil.com).
  2. Someone signs up a @nixos.org address for some other mailing list, and that mailing list forwards emails that fail SPF but pass DKIM (and therefore pass DMARC). Our mail server would (IMO incorrectly) drop those. https://gitlab.com/simple-nixos-mailserver/nixos-mailserver/-/issues/301 is a feature request to SNM to accept these instead.
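The "fail SPF but pass DKIM, therefore pass DMARC" reasoning in point 2 follows from how DMARC combines its inputs: a message passes if SPF passes with a domain aligned to the From header, or DKIM passes with an aligned signing domain. The simplified Python sketch below assumes relaxed alignment and uses a naive "last two labels" organizational-domain rule (real implementations consult the Public Suffix List); the function names are hypothetical.

```python
def org_domain(domain: str) -> str:
    """Naive organizational domain: the last two labels.

    Real DMARC implementations use the Public Suffix List instead.
    """
    return ".".join(domain.lower().rstrip(".").split(".")[-2:])


def aligned(a: str, b: str) -> bool:
    """Relaxed alignment: both domains share an organizational domain."""
    return org_domain(a) == org_domain(b)


def dmarc_pass(header_from: str, spf_pass: bool, spf_domain: str,
               dkim_pass: bool, dkim_domain: str) -> bool:
    """DMARC passes if SPF or DKIM passes *and* is aligned with the From domain."""
    spf_ok = spf_pass and aligned(spf_domain, header_from)
    dkim_ok = dkim_pass and aligned(dkim_domain, header_from)
    return spf_ok or dkim_ok
```

A forwarded list message from `example.com` that fails SPF (the list's IP is not in example.com's record) but still carries a valid `d=example.com` DKIM signature gives `dmarc_pass("example.com", False, "example.com", True, "example.com") == True`, which is why dropping it on SPF alone is arguably too strict.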

> I haven't received any mail since the switch

Which emails haven't you received, and why? I don't see how changing your personal mail server's SPF policy would have any effect on this.

jfly added a commit to jfly/infra that referenced this issue Apr 8, 2025
This reverts commit e47fbe0.

We received a report of delivery issues:
NixOS#485 (comment).

I'm not sure how long it will take to root cause this issue and fix it.
I propose that we roll back for now.
@jfly
Contributor Author

jfly commented Apr 8, 2025

That all said, out of an abundance of caution, I'd like to roll back until we understand what's going on: #621

@infinisil
Member

Thanks for the quick offer!

Here's the message Ryan Trinkle received when CCing [email protected]:

This is the mail system at host umbriel.nixos.org.

I'm sorry to have to inform you that your message could not
be delivered to one or more recipients. It's attached below.

For further assistance, please file an issue at
https://github.com/NixOS/infra/issues/new. Please anonymize any personal
email addresses in your report.

If you do so, please include this problem report. You can
delete your own text from the attached returned message.

The mail system

<[email protected]> (expanded from <[email protected]>): host
mail.infinisil.com[206.81.23.189] said: 550 5.7.23 <TOADDRESS>:
Recipient address rejected: Message rejected due to: SPF fail - not
authorized. Please see
http://www.openspf.net/Why?s=mfrom;id=FROMADDRESS;ip=37.27.20.162;r=infinisil.com
(in reply to RCPT TO command)

Where FROMADDRESS is Ryan Trinkle's personal email address.
I can also see this having happened at least once more for a [email protected]-forwarded email.

@SuperSandro2000
Member

SuperSandro2000 commented Apr 8, 2025

> which should hopefully fix the issue, but I really don't think that's a great solution, because others might also be affected but not know about it.

We had the same problem over at c3d2.de, and I'm afraid that is the only solution I have personally found.

@infinisil, are you trying to send mail via umbriel.nixos.org with @infinisil.com in the From address? Without the right configuration, SPF prevents that (which is good and correct), and IMO sending mail from other mail servers is a bit sketchy anyway.

I think SPF rewriting (i.e. sender rewriting) could be configured to fix this, and the mailing list might be lacking that.

@infinisil
Member

infinisil commented Apr 8, 2025

@jfly and I sat down together and looked into this more closely. Conclusions:

  • ImprovMX didn't have this issue because it used SRS. @jfly is looking into doing the same with our own mail server before giving it another try. We found this to be the best description of how to do that.
  • Both my mail server and @ryantrinkle's have the same restrictive -all SPF policy, but it's the sender's SPF record (that's what the S stands for!) that determines pass/fail. So if we wanted to work around this issue by making SPF records more lax, the sender (in this case @ryantrinkle) would have to do that, not the receiver (in this case me). Since the infra team has now temporarily rolled back to ImprovMX until SRS is configured, we won't need that, but it's good to know.
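To see why forwarding trips SPF, here is a deliberately simplified Python sketch of the check a receiving server performs: it takes the SPF record of the envelope sender's domain and asks whether the connecting IP is authorized. Only `ip4:` and `all` are handled, other mechanisms are skipped (unlike a real evaluator), records are passed in as strings instead of being fetched from DNS, and the addresses are documentation ranges standing in for the real servers.

```python
import ipaddress

# Qualifier -> result when a mechanism matches (RFC 7208, section 4.6.2).
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def check_spf(record: str, client_ip: str) -> str:
    """Evaluate a simplified SPF record (only ip4: and all) for client_ip."""
    ip = ipaddress.ip_address(client_ip)
    terms = record.split()
    if not terms or terms[0] != "v=spf1":
        return "none"
    for term in terms[1:]:
        result = QUALIFIERS.get(term[0])
        if result is None:
            result, mechanism = "pass", term  # no qualifier means "+"
        else:
            mechanism = term[1:]
        if mechanism == "all":
            return result
        if mechanism.startswith("ip4:") and ip in ipaddress.ip_network(mechanism[4:], strict=False):
            return result
        # Other mechanisms (a:, mx, include:, ...) are ignored in this sketch.
    return "neutral"  # nothing matched and no "all": the RFC 7208 default
```

With the sender's record `v=spf1 ip4:192.0.2.0/24 -all`, mail arriving directly from `192.0.2.10` passes, while the same message relayed by a forwarder at `203.0.113.5` fails. Loosening to `~all` only turns that into `softfail`, and only the sender's domain owner can change the record, which is why the fix has to happen on the forwarder's side.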

@jfly
Contributor Author

jfly commented Apr 9, 2025

I played around with SRS, and this does look pretty straightforward to do. My progress so far:

  1. I noticed that nixpkgs has a pretty old version of postsrsd. postsrsd 2 has some breaking changes, so I'd rather develop against that version than have to deal with those breaking changes in the future. postsrsd: 1.12 -> 2.0.10 + corresponding service changes nixpkgs#397316

  2. I've implemented this with my personal mailserver, it was quite straightforward: jfly/snow@ec179dc. This has the desired effect: I see that emails forwarded onto another domain now pass SPF.

    [Before and after screenshots]

  3. I've sent in a PR to implement SRS on nixos.org here: fix(mailserver): enable Sender Rewriting Scheme (SRS) on umbriel #622

  4. I've read through https://support.google.com/mail/answer/175365, which mentions ARC as another thing we should implement. I see in our logs that ImprovMX does implement it, but we don't have it configured on our server. Some light research led me to this Reddit comment. tl;dr: it might be a pain to implement and maintain (OpenARC doesn't seem to be maintained), and it's not clear that it would really do anything for our deliverability, as ARC seems to rely on manually configured trust. Please disregard: we do get ARC with simple-nixos-mailserver.
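SRS fixes the SPF failure by rewriting the envelope sender (MAIL FROM) of a forwarded message into an address at the forwarding domain, so receivers check the forwarder's SPF record instead of the original sender's, while encoding enough information to route bounces back. The Python sketch below shows the SRS0 shape (`SRS0=hash=timestamp=domain=local@forwarder`) with an HMAC-SHA1 hash and a two-character, day-based base32 timestamp, loosely following the original SRS design. The secret, the 4-character hash length and the function names are assumptions, timestamp expiry checks are omitted, and the output is not guaranteed to match what postsrsd produces.

```python
import base64
import hashlib
import hmac
import time

B32 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"  # alphabet for the timestamp
HASH_LEN = 4  # assumption: truncated hash length


def srs_timestamp(now: float) -> str:
    """Days since the epoch, modulo 1024, as two base32 characters."""
    days = int(now // 86400) % 1024
    return B32[days >> 5] + B32[days & 31]


def srs_hash(secret: bytes, *parts: str) -> str:
    """Truncated base64 HMAC-SHA1 over the lowercased parts."""
    digest = hmac.new(secret, "".join(parts).lower().encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()[:HASH_LEN]


def srs_forward(sender: str, forward_domain: str, secret: bytes, now=None) -> str:
    """Rewrite sender (local@domain) into an SRS0 address at forward_domain."""
    now = time.time() if now is None else now
    local, _, domain = sender.rpartition("@")
    ts = srs_timestamp(now)
    h = srs_hash(secret, ts, domain, local)
    return f"SRS0={h}={ts}={domain}={local}@{forward_domain}"


def srs_reverse(address: str, secret: bytes) -> str:
    """Recover the original sender from an SRS0 bounce address, checking the hash."""
    user, _, _ = address.rpartition("@")
    tag, h, ts, domain, local = user.split("=", 4)
    if tag.upper() != "SRS0":
        raise ValueError("not an SRS0 address")
    if not hmac.compare_digest(h, srs_hash(secret, ts, domain, local)):
        raise ValueError("bad SRS hash")
    return f"{local}@{domain}"
```

The hash is what keeps the forwarder from becoming an open bounce relay: only addresses it minted itself, with its own secret, are reversed back to the original sender.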

jfly added a commit to jfly/infra that referenced this issue Apr 9, 2025
We had deliverability issues when forwarding mail to other domains (if
the domain of the email we were forwarding had a strict SPF policy, the
receiving mailserver would drop it). See
NixOS#485 (comment) for
details.
jfly added a commit to jfly/infra that referenced this issue Apr 9, 2025
We had deliverability issues when forwarding mail to other domains (if
the domain of the email we were forwarding had a strict SPF policy, the
receiving mailserver would drop it due to SPF fail). See
NixOS#485 (comment) for
details.
@Mic92
Member

Mic92 commented Apr 9, 2025

Rspamd does implement ARC. Isn't Rspamd also used by simple-nixos-mailserver?

@jfly
Contributor Author

jfly commented Apr 10, 2025

> Rspamd does implement ARC. Isn't Rspamd also used by simple-nixos-mailserver?

Oops. You're totally right. I see ARC headers in emails forwarded by umbriel. Please disregard.

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/simple-nixos-mailserver-message-rejected-due-to-spf-fail-not-authorized/38067/16

@mweinelt
Member

mweinelt commented Apr 10, 2025

>> Rspamd does implement ARC. Isn't Rspamd also used by simple-nixos-mailserver?
>
> Oops. You're totally right. I see ARC headers in emails forwarded by umbriel. Please disregard.

I can't say I do. My Rspamd even flags your recent emails with ARC_NA.

@jfly
Contributor Author

jfly commented Apr 10, 2025

This issue is getting too large. I've filed #631 to investigate ARC.

@jfly
Contributor Author

jfly commented Apr 10, 2025

I'm closing this. The new mailserver has launched (and hopefully will stay launched).

We still have to clean up ImprovMX, which is tracked by #587.

@jfly jfly closed this as completed Apr 10, 2025