WeeklyTelcon_20210921
Geoffrey Paulsen edited this page Sep 21, 2021 · 1 revision
      
- Austen Lauria (IBM)
 - Brendan Cunningham (Cornelis Networks)
 - Brian Barrett (AWS) - Welcome Back!
 - David Bernholdt (ORNL)
 - Geoffrey Paulsen (IBM)
 - Harumi Kuno (HPE)
 - Hessam Mirsadeghi (NVIDIA)
 - Howard Pritchard (LANL)
 - Jeff Squyres (Cisco)
 - Joseph Schuchart (HLRS)
 - Josh Hursey (IBM)
 - Matthew Dosanjh (Sandia)
 - Michael Heinz (Cornelis Networks)
 - Sriraj Paul (Intel)
 - Thomas Naughton (ORNL)
 - Todd Kordenbrock (Sandia)
 
- Akshay Venkatesh (NVIDIA)
 - Artem Polyakov (NVIDIA)
 - Aurelien Bouteiller (UTK)
 - Brandon Yates (Intel)
 - Charles Shereda (LLNL)
 - Christoph Niethammer (HLRS)
 - Edgar Gabriel (UH)
 - Erik Zeiske (HPE)
 - Geoffroy Vallee (ARM)
 - George Bosilca (UTK)
 - Joshua Ladd (NVIDIA)
 - Marisa Roman (Cornelis Networks)
 - Mark Allen (IBM)
 - Matias Cabral (Intel)
 - Nathan Hjelm (Google)
 - Noah Evans (Sandia)
 - Raghu Raja
 - Ralph Castain (Intel)
 - Sam Gutierrez (LANL)
 - Scott Breyer (Sandia?)
 - Shintaro Iwasaki
 - Tomislav Janjusic (NVIDIA)
 - William Zhang (AWS)
 - Xin Zhao (NVIDIA)
 
- Do the Fortran fixes affect the API (i.e., are they needed for v5.0.0)?
- PR https://github.com/open-mpi/ompi/pull/9259 and PR https://github.com/open-mpi/ompi/pull/9367
 - Jeff reviewed 16 days ago; looks incomplete.
 - We think 9367 addresses the issue with 9259.
 
- Schedule: Probably pushed to October for 4.0.7
 - PR 9298, should we take that?
- Opinion is that --cpu-set is broken badly enough that we should remove it.
 - We could print a warning via opal_show_help so we don't break existing usage.
- What's the precedent?
 - The concern is that it's a high-risk area of code.
 - The other concern is that, yes, this changes behavior back to v3.1, but that's old.
 - Command line option
 - Could introduce a new option to do this in middle of stream
- But then we'd have to maintain both options through v5.0.
 
 
 - This area of code is very easy to mess up, and we don't want to do a quick turnaround if we fix something.
 - Should update the mpirun man page to document that it's broken.
 - Same for v4.1 as v4.0.
 - Issue a warning message if the user uses --cpu-set.
 - Geoff will PR a man page update, a warning message, and an mpirun --help update, and close the above PRs.
- Consensus is that this change is too risky (based on the area of code changed) for the gain in release branches.
 - v5.0.x works fine.
 
 
 
- Schedule:
- Shooting for v4.1.2rc1 on Friday.
 
 - Two outstanding:
- ROMIO update.
 - --cpu-set PR from Geoff (see above)
 - One more is pending on v4.1.x; Jenkins had some issues that Brian is looking at.
 
 - ROMIO 3.2.1-based PR 8371: do we want to take this?
- v4.1.x: does this need to go back to v4.0.x?
 
 
- Schedule: aiming for rc1 on Sept 23rd.
 - George verified that the BTL+OSC RDMA failures are not limited to IBM.
 - Tommy's still pushing on UCX one-sided.
 - PMIx and/or PRRTE are releasing a new minor rev that we'll pick up for v5.0.x.
- Did we update yet?
 
 - Think there are other issues than just one sided.
- 5 issues in total; only 2 are one-sided.
 - One is static linking; Austen will reverify.
 
 - Talk about gcc v4.4.7 and RHEL6
- PMIx and PRRTE just don't compile on RHEL6 (specifically gcc v4.4.7). Given this, do we even care about RHEL6?
 - RHEL7 (gcc v4.8.5) works fine.
 - Not interested in testing all of those gcc versions.
- Jeff will post a Pull Request.
 - Will officially truncate support as well.
 - No glibc issues, so no hard check.
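The compiler cutoff discussed above can be illustrated with a small version comparison. This is only a sketch: the names `MIN_GCC`, `parse_version`, and `gcc_supported` are hypothetical, and using 4.8.5 as the floor is an assumption based solely on the note that RHEL7's gcc v4.8.5 works while gcc v4.4.7 does not. The actual cutoff will be whatever Jeff's Pull Request defines.

```python
import re

# Assumption: 4.8.5 is used as the floor only because the notes say it works.
MIN_GCC = (4, 8, 5)


def parse_version(text):
    """Extract the first X.Y or X.Y.Z version number from text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def gcc_supported(version_text, minimum=MIN_GCC):
    """Return True if the reported gcc version is at or above the assumed floor."""
    return parse_version(version_text) >= minimum


if __name__ == "__main__":
    for reported in ("4.4.7", "4.8.5"):
        status = "supported" if gcc_supported(reported) else "too old"
        print(reported, status)
```

Comparing integer tuples avoids the string-comparison trap where "4.10" would sort before "4.8".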
 
 
 - Documentation
- Got a change in the Sphinx tools that is needed. Not sure if there's a release yet.
- This fixes output issues in man pages.
 
 - Process to update FAQ is to talk to Jeff or Harumi.
 - For any changes to the README or FAQ, let them know so the changes also get made in the NEW docs.
- For now, make changes in ompi-www and README as usual and let them know.
 
 
 - v5.0.x requires pandoc.  If user downloads from .tarball they do NOT need pandoc installed.
- If user runs `make dist` or `make distcheck` they WILL need pandoc.
 - This is a strange quirk, but seems fine.
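The pandoc rule above can be expressed as a small pre-flight check. This is a minimal sketch, not part of the Open MPI build system: the names `DIST_TARGETS`, `pandoc_required`, and `check_pandoc` are hypothetical, and it models only the tarball-user case described in the notes (ordinary builds from a tarball do not need pandoc, while the dist targets do).

```python
import shutil

# Per the notes: these targets need pandoc even when starting from a tarball.
DIST_TARGETS = {"dist", "distcheck"}


def pandoc_required(make_target):
    """Return True if the given make target needs pandoc installed."""
    return make_target in DIST_TARGETS


def check_pandoc(make_target, which=shutil.which):
    """Return a short status message for the given make target.

    The `which` parameter exists only so the lookup can be stubbed in tests.
    """
    if not pandoc_required(make_target):
        return "pandoc not needed"
    if which("pandoc") is None:
        return "pandoc missing: install it before running make " + make_target
    return "pandoc found"


if __name__ == "__main__":
    for target in ("all", "dist", "distcheck"):
        print(target, "->", check_pandoc(target))
```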
 
 
 - Problem with OFI and Open MPI
- No discussion
 
 - Github Project of [critical v5.0.x issues|https://github.com/open-mpi/ompi/projects/3]
- Issue 8983 - Nathan volunteered to put out a fix.
 - If we partially disable OSC/TCP BTL: not breaking MPI compliance, just breaking one-sided performance badly.
- https://github.com/open-mpi/ompi/pull/8984
 - https://github.com/open-mpi/ompi/issues/7830
 - Users could fall back to using UCX or OFI, and not the BTLs.
- But that's a different can of worms.
 
 - Brian will take a look at issue.
 
 - Described the approach of rc1 on Sept 23, disabling any functionality that is a blocker to allow for the rc.
- Worried that blockers might not be fixed in time, so will put in code to issue an error at runtime to prevent getting into those paths, and document it heavily.
 
 
 - MPI_Alltoallw needs to go in. Is a PR from Giles George
- https://github.com/open-mpi/ompi/pull/9329
 - The test has been merged, not a fix.
 - https://github.com/open-mpi/ompi/pull/9330
- George thinks it's ready to go.
 - Jeff will review.
 
 
 - Portals bugfixes incoming.
- Todd's working on this. Hasn't posted yet. Will post this week.
 - 9391
 
 - https://github.com/open-mpi/ompi/pull/9326 should get into 5.0 too
- This fixes a correctness issue, and George is concerned about performance.
 - Is Argobots now unsupported?
- No. Our integration allows users to call MPI within a blocking Argobots function, and this still works.
 - What we think happens is that a thread will block in libevent: because libevent isn't aware of Argobots, libevent will block the entire thread.
 
 - George joined about this time. I think he said this was ready or that he'd re-read.
 
 
- No discussion
 - MTT results look pretty good
- Cisco is seeing lots of segvs on master
 - Some compile fails which are strange.
 
 
- No update
 - Don't do the old system, use this new system for v5.0.0
 
- No discussion [Open MPI 4.0 API Compliance Github Project|https://github.com/open-mpi/ompi/projects/2]
 - Jeff's going to review PR 9246
 - Howard will review 7985
 - Need to decide what to do with 8057
 - Sessions branch, don't want to merge into master until possibly v5.0.1 gets out.
- It will complicate things in finalize/initialize code.
 
 
- Looking okay.
 - Cisco's results are still hidden by default.
 
- No discussion.