WeeklyTelcon_20210601
Geoffrey Paulsen edited this page Jul 5, 2021 · 2 revisions
      
- Austen Lauria (IBM)
 - Brendan Cunningham (Cornelis Networks)
 - Brian Barrett (AWS)
 - David Bernholdt (ORNL)
 - Edgar Gabriel (UH)
 - Geoffrey Paulsen (IBM)
 - Hessam Mirsadeghi (NVIDIA)
 - Howard Pritchard (LANL)
 - Jeff Squyres (Cisco)
 - Joseph Schuchart (HLRS)
 - Matthew Dosanjh (Sandia)
 - Sam Gutierrez (LANL)
 - Todd Kordenbrock (Sandia)
 - Tomislav Janjusic (NVIDIA)
 - William Zhang (AWS)
 
- Akshay Venkatesh (NVIDIA)
 - Artem Polyakov (NVIDIA)
 - Aurelien Bouteiller (UTK)
 - Brandon Yates (Intel)
 - Charles Shereda (LLNL)
 - Christoph Niethammer (HLRS)
 - Erik Zeiske (HPE)
 - Geoffroy Vallee (ARM)
 - George Bosilca (UTK)
 - Harumi Kuno (HPE)
 - Josh Hursey (IBM)
 - Joshua Ladd (NVIDIA)
 - Marisa Roman (Cornelis Networks)
 - Mark Allen (IBM)
 - Matias Cabral (Intel)
 - Michael Heinz (Cornelis Networks)
 - Nathan Hjelm (Google)
 - Naughton III, Thomas (ORNL)
 - Noah Evans (Sandia)
 - Raghu Raja (secret startup)
 - Ralph Castain (Intel)
 - Scott Breyer (Sandia?)
 - Shintaro Iwasaki
 - Xin Zhao (NVIDIA)
 
- Will roll v4.0.6 rc today
 - We'll do one more RC, and then get a final v4.0.6 out.
 - Where are we on pack/unpack with long and long double?
- only external32
 - This worked before, but not sure
 
 - 8918 - pack/unpack with external32
 - 8818 - checking if
 - Brian thinks Issue 8990 would also apply to v4.0.x
- With --with-libevent=/usr (as Debian packaging does), we add a -L/usr to the wrapper output and put all of the -L flags for finding deps before the -L for libmpi.so, and if there is an ompi in /usr/lib as well,
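
To illustrate why the ordering of -L flags matters here, the following is a simplified model only, not the wrapper compiler's actual logic. It assumes the usual linker behavior of searching -L directories in the order given and taking the first match. The function name `resolve_library` and the directory layout (a second Open MPI under /opt/ompi/lib, with /usr/lib standing in for the system path) are hypothetical.

```python
from typing import Dict, List, Optional, Set


def resolve_library(lib: str, search_dirs: List[str],
                    contents: Dict[str, Set[str]]) -> Optional[str]:
    """Return the first directory, in search order, that contains lib."""
    for directory in search_dirs:
        if lib in contents.get(directory, set()):
            return directory
    return None


# Hypothetical layout: a distro Open MPI lives in /usr/lib next to libevent,
# and the build we actually want to link against lives in /opt/ompi/lib.
contents = {
    "/usr/lib": {"libevent.so", "libmpi.so"},
    "/opt/ompi/lib": {"libmpi.so"},
}

# Dependency paths listed before the Open MPI path (the problematic order).
deps_first = ["/usr/lib", "/opt/ompi/lib"]
# Open MPI path listed before the dependency paths.
ompi_first = ["/opt/ompi/lib", "/usr/lib"]

print(resolve_library("libmpi.so", deps_first, contents))  # /usr/lib
print(resolve_library("libmpi.so", ompi_first, contents))  # /opt/ompi/lib
```

In this model libevent.so resolves to /usr/lib in both orderings; only the libmpi.so that gets picked up changes.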
 
 
- Shooting for end of August
 - No driver to rush, so now just in bugfix phase.
 
- Unscheduled RC
 - PR 9014 - new blocker.
- fix should just be a couple of lines of code... hard to decide what we want.
 - Ralph, Jeff and Brian started talking.
 
 - Need some configury changes in before we RC.
 - Issue 8850, 8990 and more
 - Brian will file 3-ish issues
- One is configure pmix
 
 - Dynamic Windows fix in for UCX.
 - Any update on debugger support?
 - Need some documentation that Open MPI v5.0 supports PMIx-based debuggers, and that if
 - MPIR Shim - pushed up fixes, and enabled CI.
- Could add it to some more CI, to ensure that PMIx doesn't break
 - IBM is working on some CI testing with MPIR (typically very brittle)
 - Need some guidance on pmix version.
 - Right now, probably not a big deal, but perhaps in 2 years, when we have 3 release branches with different pmix versions, it might make sense to do open-mpi CI testing.
- Shouldn't be too much work to do.
 
 
 - UCC coll component is being updated to be the default when UCX is selected (PR 8969).
- Intent is that this will eventually replace hcoll.
 
 
- PR 8998 - MPIPy -
- In shift to PRRTE, --oversubscribe is NOT being handled. If you have more procs than slots on a node, internal oversubscribe var is not yet being set.
 - Jeff will look at.
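
The condition described above (more procs than slots on a node) can be sketched as follows. This is only an illustration of the check, not PRRTE's implementation; the function name `oversubscribed_nodes`, the node names, and the slot counts are all hypothetical.

```python
from collections import Counter
from typing import Dict, List


def oversubscribed_nodes(slots: Dict[str, int], placement: List[str]) -> List[str]:
    """Return the nodes that host more processes than they have slots."""
    procs = Counter(placement)  # processes placed on each node
    return sorted(node for node, count in procs.items()
                  if count > slots.get(node, 0))


# Hypothetical job: two nodes with 2 slots each, 4 processes placed.
slots = {"node0": 2, "node1": 2}
placement = ["node0", "node0", "node0", "node1"]

print(oversubscribed_nodes(slots, placement))  # ['node0']
```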
 
 
- Mellanox hasn't been reporting for a while. Tommi will follow up.
 - Jeff did some work on Cisco MTT.
- There are a bunch of one-sided issues across nodes.
 - Austen and Jeff looking into.
 - Narrowed it down to strange results from MPI_Comm_split
- Local Peers value appears to be set wrong under PRRTE
 
 
 - Joseph saw hwloc being installed in the installation path, which leads to warnings if another hwloc is used.
- We changed how all of this worked a few weeks ago.
 - We shouldn't be installing one unless we can't find an external one.
 - Problem is if you link the application to a different hwloc, it now complains.
 - This has always been true; we just warn now. Don't do this.
 
 - Austen filed a couple of issues from MTT.
 
- No discussion
 
- No update
 
- No discussion.