Buildkit on Windows Server 2019 Unable to Reuse Namespaces / Endpoints #5668

Open
nerddtvg opened this issue Jan 16, 2025 · 10 comments · May be fixed by #5686

@nerddtvg

When running on Server 2019, buildkit is unable to reuse a namespace for the second RUN command in a Dockerfile (or for any subsequent run). At that point the namespace state is 1 and the endpoint state is 4 (Detached).

An identical buildkit, containerd, and CNI configuration on Server 2022 works without issue.

Server 2019 1809
Microsoft Windows NT 10.0.17763.0

buildkitd --version
buildkitd github.com/moby/buildkit v0.18.2 e4da654b1251f91e914fab18eba33743aefd7080

containerd --version
containerd github.com/containerd/containerd/v2 v2.0.1 88aa2f531d6c2922003cc7929e51daf1c14caa0a

nerdctl --version
nerdctl version 2.0.2

I have tried the Windows CNI plugin versions 0.3.0 and 0.3.1.

This is similar to the error in #4960. However, the subnet and NAT configurations are correct and match the existing nat HnsNetwork.
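For readers who want to repeat the subnet check, here is a minimal sketch. It assumes the IP address, prefix length, and gateway reported by Get-HnsEndpoint later in this report; the helper name `endpoint_matches_network` and the prefix 172.24.96.0/20 are illustrative (the prefix is derived from those values, not read from HNS).

```python
import ipaddress


def endpoint_matches_network(ip, prefix_length, gateway, network_prefix):
    """Return True when the endpoint's own network equals the given prefix
    and the gateway address lies inside that prefix."""
    network = ipaddress.ip_network(network_prefix)
    # ip_interface keeps the host bits; .network masks them off.
    endpoint_net = ipaddress.ip_interface(f"{ip}/{prefix_length}").network
    return endpoint_net == network and ipaddress.ip_address(gateway) in network


# Endpoint values copied from the Get-HnsEndpoint output in this report.
# The network prefix is an assumption derived from them.
print(endpoint_matches_network("172.24.105.2", 20, "172.24.96.1", "172.24.96.0/20"))
```

A mismatch here would point back toward the kind of configuration problem described in #4960 rather than the reuse problem reported in this issue.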

Sample Dockerfile:

FROM mcr.microsoft.com/dotnet/runtime:8.0-windowsservercore-ltsc2019

SHELL ["C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell", "-NonInteractive", "-NoProfile", "-Command", "$ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue';"]

RUN ipconfig /all
RUN nslookup google.com
RUN ping google.com

ENTRYPOINT ["C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell", "-Command", "$sec = Get-Random -Minimum 10 -Maximum 100 ; Write-Host $sec ; Start-Sleep -Seconds $sec"]

buildkitd logs:

time="2025-01-16T15:36:59Z" level=debug msg="load cache for [1/4] FROM mcr.microsoft.com/dotnet/runtime:8.0-windowsservercore-ltsc2019@sha256:feca15ab0fa7f47610aabfd72f9d87c9926b520b67c824003e9d44aaf2095424 with amcuh3zdpvou93q7u656cj3da::9y7eql5jiyjat2ld2ci8voerl"
time="2025-01-16T15:36:59Z" level=debug msg="Calling proc (1)"
time="2025-01-16T15:36:59Z" level=debug msg="Calling proc (2)"
time="2025-01-16T15:37:00Z" level=debug msg="creating new network namespace rrinwqxoq8xx5xyw2ehllg2dl" span="[2/4] RUN ipconfig /all" spanID=5b1a16f33e8cbd78 traceID=f9b970ce5f01f392bee8dca451ba00c4
time="2025-01-16T15:37:00Z" level=debug msg="hcn::HostComputeNamespace::Create id="
time="2025-01-16T15:37:00Z" level=debug msg="hcn::HostComputeNamespace::Create JSON: {\"Type\":\"Guest\",\"SchemaVersion\":{\"Major\":2,\"Minor\":0}}"

time="2025-01-16T15:37:00Z" level=debug msg="finished creating network namespace rrinwqxoq8xx5xyw2ehllg2dl" span="[2/4] RUN ipconfig /all" spanID=5b1a16f33e8cbd78 traceID=f9b970ce5f01f392bee8dca451ba00c4
time="2025-01-16T15:37:00Z" level=debug msg="finished setting up network namespace rrinwqxoq8xx5xyw2ehllg2dl" span="[2/4] RUN ipconfig /all" spanID=5b1a16f33e8cbd78 traceID=f9b970ce5f01f392bee8dca451ba00c4
time="2025-01-16T15:37:08Z" level=debug msg="Calling proc (1)"
time="2025-01-16T15:37:08Z" level=debug msg="Calling proc (2)"
time="2025-01-16T15:37:08Z" level=debug msg="returning network namespace rrinwqxoq8xx5xyw2ehllg2dl from pool" span="[3/4] RUN nslookup google.com" spanID=c7b93686715648d1 traceID=f9b970ce5f01f392bee8dca451ba00c4
time="2025-01-16T15:37:09Z" level=error msg="/moby.buildkit.v1.Control/Solve returned error: rpc error: code = Unknown desc = process \"C:\\\\Windows\\\\System32\\\\WindowsPowerShell\\\\v1.0\\\\powershell -NonInteractive -NoProfile -Command $ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue'; nslookup google.com\" did not complete successfully: failed to create shim task: hcs::CreateComputeSystem sl4e3pv92h6my5lu0k08bdg34: The requested operation for attach namespace failed.: unknown" spanID=abb02b324b4c34d9 traceID=f9b970ce5f01f392bee8dca451ba00c4
process "C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell -NonInteractive -NoProfile -Command $ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue'; nslookup google.com" did not complete successfully: failed to create shim task: hcs::CreateComputeSystem sl4e3pv92h6my5lu0k08bdg34: The requested operation for attach namespace failed.: unknown
4052 v0.18.2 C:\Program Files\buildkit\buildkitd.exe --run-service --service-name buildkitd --containerd-worker=true --containerd-cni-config-path=C:\Program Files\containerd\cni\conf\0-containerd-nat.conf --containerd-cni-binary-dir=C:\Program Files\containerd\cni\bin --debug --log-file=C:\Windows\Temp\buildkitd.log
main.unaryInterceptor
        /src/cmd/buildkitd/main.go:717
google.golang.org/grpc.NewServer.chainUnaryServerInterceptors.chainUnaryInterceptors.func1
        /src/vendor/google.golang.org/grpc/server.go:1202
github.com/moby/buildkit/api/services/control._Control_Solve_Handler
        /src/api/services/control/control_grpc.pb.go:289
google.golang.org/grpc.(*Server).processUnaryRPC
        /src/vendor/google.golang.org/grpc/server.go:1394
google.golang.org/grpc.(*Server).handleStream
        /src/vendor/google.golang.org/grpc/server.go:1805
google.golang.org/grpc.(*Server).serveStreams.func2.1
        /src/vendor/google.golang.org/grpc/server.go:1029
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1700

4052 v0.18.2 C:\Program Files\buildkit\buildkitd.exe --run-service --service-name buildkitd --containerd-worker=true --containerd-cni-config-path=C:\Program Files\containerd\cni\conf\0-containerd-nat.conf --containerd-cni-binary-dir=C:\Program Files\containerd\cni\bin --debug --log-file=C:\Windows\Temp\buildkitd.log
github.com/moby/buildkit/solver/llbsolver/ops.(*ExecOp).Exec
        /src/solver/llbsolver/ops/exec.go:493
github.com/moby/buildkit/solver.(*sharedOp).Exec.func2
        /src/solver/jobs.go:1100
github.com/moby/buildkit/util/flightcontrol.(*call[...]).run
        /src/util/flightcontrol/flightcontrol.go:122
sync.(*Once).doSlow
        /usr/local/go/src/sync/once.go:76
sync.(*Once).Do
        /usr/local/go/src/sync/once.go:67
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1700

4052 v0.18.2 C:\Program Files\buildkit\buildkitd.exe --run-service --service-name buildkitd --containerd-worker=true --containerd-cni-config-path=C:\Program Files\containerd\cni\conf\0-containerd-nat.conf --containerd-cni-binary-dir=C:\Program Files\containerd\cni\bin --debug --log-file=C:\Windows\Temp\buildkitd.log
github.com/moby/buildkit/solver.(*edge).execOp
        /src/solver/edge.go:966
github.com/moby/buildkit/solver/internal/pipe.NewWithFunction[...].func2
        /src/solver/internal/pipe/pipe.go:78
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1700

time="2025-01-16T15:37:09Z" level=debug msg="session finished: <nil>" spanID=64affebe8305aa53 traceID=f9b970ce5f01f392bee8dca451ba00c4
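The key signal in the log above is that the namespace created for one step is later handed back out "from pool" for the next step. As a rough sketch, a debug log can be scanned for that pattern. The two message strings are taken from the log above; the function name `find_reused_namespaces` is hypothetical, and other buildkit versions may word these messages differently.

```python
import re

# Message formats as they appear in the buildkitd debug log in this report.
CREATED = re.compile(r'msg="creating new network namespace (\w+)"')
REUSED = re.compile(r'msg="returning network namespace (\w+) from pool"')


def find_reused_namespaces(log_lines):
    """Return the IDs of namespaces that were created and later reused."""
    created = set()
    reused = []
    for line in log_lines:
        match = CREATED.search(line)
        if match:
            created.add(match.group(1))
            continue
        match = REUSED.search(line)
        if match and match.group(1) in created:
            reused.append(match.group(1))
    return reused


sample = [
    'time="2025-01-16T15:37:00Z" level=debug msg="creating new network namespace rrinwqxoq8xx5xyw2ehllg2dl" span="[2/4] RUN ipconfig /all"',
    'time="2025-01-16T15:37:08Z" level=debug msg="returning network namespace rrinwqxoq8xx5xyw2ehllg2dl from pool" span="[3/4] RUN nslookup google.com"',
]
print(find_reused_namespaces(sample))
```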

Windows Hyper-V-Compute event log:

[sl4e3pv92h6my5lu0k08bdg34] Create Container, type 'Silo Container', settings '{"Owner":"containerd-shim-runhcs-v1.exe","SchemaVersion":{"Major":2,"Minor":1},"Container":{"GuestOs":{"HostName":"buildkitsandbox"},"Storage":{"Layers":[{"Id":"1a3348c8-c43b-5c1a-83f4-4bbff0d32f39","Path":"C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\98"},{"Id":"9caf22d5-d18a-59e5-afd8-a5e3870a1fee","Path":"C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\90"},{"Id":"298e3ca2-da55-5141-8d6a-d1924c6353ed","Path":"C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\89"},{"Id":"fb9d2f2a-b87c-5a9c-99c0-9e720042277a","Path":"C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\88"},{"Id":"2137054d-9b8a-5e07-8472-e1f06abc13fa","Path":"C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\87"},{"Id":"1d381a65-06c4-5a2a-8ba6-842762e847b5","Path":"C:\\ProgramData\\containerd\\root\\io.containerd.snapshotter.v1.windows\\snapshots\\86"}],"Path":"\\\\?\\Volume{ca559c84-13df-4d85-897c-57fec4a804e0}\\"},"MappedDirectories":[{"HostPath":"\\\\?\\Volume{12586166-25a9-4653-bc2f-258d609e6840}\\Program Files\\buildkit\\buildkitd.exe","ContainerPath":"C:\\Windows\\System32\\get-user-info.exe","ReadOnly":true}],"MappedPipes":[{"ContainerPipeName":"otel-grpc","HostPath":"\\\\.\\pipe\\buildkit-otel-grpc"}],"Processor":{},"Networking":{"Namespace":"8d8fd7ab-75aa-4945-8cf5-8dfb906a9cae"},"RegistryChanges":{"AddValues":[{"Key":{"Hive":"System","Name":"ControlSet001\\Control"},"Name":"WaitToKillServiceTimeout","Type":"String","StringValue":"2147483647"}]}},"ShouldTerminateOnLastHandleClosed":true}'

[sl4e3pv92h6my5lu0k08bdg34] Queue system notification: 2 / 0x803B002E

[sl4e3pv92h6my5lu0k08bdg34] Create compute system, result 0x803B002E

HCN_E_NAMESPACE_ATTACH_FAILED (0x803B002E): The requested operation for attach namespace failed
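The result code in the event log is an HRESULT, so it can be split into its standard fields when comparing it against other HCS/HCN errors. This is a minimal sketch of the generic HRESULT layout (severity in bit 31, an 11-bit facility in bits 16 to 26, a 16-bit code in bits 0 to 15); the function name `decode_hresult` is illustrative, and the facility is printed as a number rather than mapped to a name.

```python
def decode_hresult(value):
    """Split a 32-bit HRESULT into its severity, facility, and code fields."""
    return {
        "failure": bool((value >> 31) & 1),  # severity bit: 1 means failure
        "facility": (value >> 16) & 0x7FF,   # 11-bit facility number
        "code": value & 0xFFFF,              # 16-bit error code
    }


# Value reported for CreateComputeSystem in the Hyper-V-Compute event log.
print(decode_hresult(0x803B002E))
```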

HNS Output:

PS > Get-HnsEndpoint

ActivityId                : 2974E63E-C9DD-4A5B-A867-65069FBFA448
AdditionalParams          :
CreateProcessingStartTime : 133815154201296258
DNSServerList             : 168.63.129.16
DNSSuffix                 : yzspoegj5jtuzenk3d2ra04d2g.bx.internal.cloudapp.net
EnableLowInterfaceMetric  : True
EncapOverhead             : 0
GatewayAddress            : 172.24.96.1
Health                    : @{LastErrorCode=0; LastUpdateTime=133815154201266283}
ID                        : CC4C7F16-26BC-4CDF-B052-E840FE09DA93
IPAddress                 : 172.24.105.2
InterfaceConstraint       : @{InterfaceGuid=00000000-0000-0000-0000-000000000000}
MacAddress                : 00-15-5D-41-7B-2F
Name                      : rrinwqxoq8xx5xyw2ehllg2dl_nat
Namespace                 : @{ID=8D8FD7AB-75AA-4945-8CF5-8DFB906A9CAE; IsDefault=False}
Policies                  : {}
PrefixLength              : 20
RemoveProcessingStartTime : 133815154255318204
Resources                 : @{AdditionalParams=; AllocationOrder=2; Allocators=System.Object[]; Health=; ID=2974E63E-C9DD-4A5B-A867-65069FBFA448; PortOperationTime=0; State=1; SwitchOperationTime=0; VfpOperationTime=0; parentId=871E0337-F1D6-45D2-9BD3-71BEB6A40E21}
SharedContainers          : {}
StartTime                 : 133815154288301459
State                     : 4
Type                      : NAT
Version                   : 38654705669
VirtualNetwork            : 10de7571-39fa-4fa5-9c94-b06cee9dc9c1
VirtualNetworkName        : nat



PS > Get-HnsNamespace -Id 8D8FD7AB-75AA-4945-8CF5-8DFB906A9CAE

ActivityId       : 381977D2-D854-45C7-B386-961006DD4892
AdditionalParams :
CompartmentGuid  : 00000000-0000-0000-0000-000000000000
CompartmentId    : 2
Containers       : {}
Health           : @{LastErrorCode=0; LastUpdateTime=133815154200336902}
ID               : 8D8FD7AB-75AA-4945-8CF5-8DFB906A9CAE
IsDefault        : False
Policies         : {}
ResourceList     : {@{Data=; Type=Endpoint}}
Resources        : @{AdditionalParams=; AllocationOrder=0; Health=; ID=381977D2-D854-45C7-B386-961006DD4892; PortOperationTime=0; State=1; SwitchOperationTime=0; VfpOperationTime=0}
State            : 1
Type             : VM
Version          : 38654705669
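The large integers in the endpoint output (CreateProcessingStartTime, RemoveProcessingStartTime, StartTime) look like Windows FILETIME values, that is, counts of 100-nanosecond intervals since 1601-01-01 UTC. That reading is an assumption, not something HNS documents in this output, but under it the three values land at about 15:37:00, 15:37:05, and 15:37:08 UTC on 2025-01-16, which lines up with the buildkitd log timestamps. A small sketch of the conversion (the helper name `filetime_to_utc` is illustrative):

```python
from datetime import datetime, timedelta, timezone

# FILETIME epoch: 1601-01-01 00:00:00 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_utc(filetime):
    """Convert a FILETIME tick count (100 ns units) to an aware UTC datetime.

    Precision below one microsecond is dropped.
    """
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


# Values copied from the Get-HnsEndpoint output above.
for name, ticks in [
    ("CreateProcessingStartTime", 133815154201296258),
    ("RemoveProcessingStartTime", 133815154255318204),
    ("StartTime", 133815154288301459),
]:
    print(name, filetime_to_utc(ticks).isoformat())
```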

If I remove the HnsEndpoint (Get-HnsEndpoint | Where-Object {$_.State -eq 4} | Remove-HnsEndpoint), the container will run one step and then fail on the next RUN when it tries to reuse the endpoint. The container can't reach the Internet, though: ping and DNS lookups fail.

#7 [4/4] RUN ping google.com
#7 4.874 Ping request could not find host google.com. Please check the name and try again.
#7 ERROR: process "C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell -NonInteractive -NoProfile -Command $ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue'; ping google.com" did not complete successfully: exit code: 1
------
 > [4/4] RUN ping google.com:
4.874 Ping request could not find host google.com. Please check the name and try again.

Since the CNI pool size cannot be less than one (setting 0 provisions a single namespace) and the cache time cannot be adjusted (it is a constant 5 minutes), I can find no workaround to prevent namespace reuse on Server 2019. If the namespace were cleaned up each time, this would work.

I'm unsure of the root cause of the bug, or whether it is a Server 2019 HCS issue.

@nerddtvg
Author

I tested buildkit v0.19.0-rc2 to check whether this was already fixed, and the same error occurs.

@profnandaa
Collaborator

ACK, will take a look and get back.

@peterfocko

I am facing the same issue on Server 2019: the first RUN instruction succeeds, and the second one always fails with: The requested operation for attach namespace failed.: unknown

@profnandaa
Collaborator

Hi @nerddtvg,
I have not been able to repro this on WS2019. I stumbled on some other, unrelated errors while trying to run buildctl (which you can ignore for now).

Could you send me the output of the following? I'd like to confirm something about the NetworkAdapter and the HnsNetwork:

Get-NetAdapter # list all of them, I'd like to see which one the HnsNetwork is connected to
$nat = Get-HnsNetwork | where {$_.Name -eq "nat" }

Also, what do you get when you run this ctr command?

ctr run --rm --cni mcr.microsoft.com/windows/nanoserver:ltsc2019 cni-test curl -I example.com

I'm working to solve this CNI setup issue once and for all. Seems like we have a couple of setup scripts around that are not all coherent...

@nerddtvg
Author

@profnandaa Below are the requested responses.

PS C:\Windows\system32> Get-NetAdapter

Name                      InterfaceDescription                    ifIndex Status       MacAddress             LinkSpeed
----                      --------------------                    ------- ------       ----------             ---------
vEthernet (nat)           Hyper-V Virtual Ethernet Adapter             18 Up           00-15-5D-4A-2A-41        10 Gbps
Ethernet0                 vmxnet3 Ethernet Adapter                      9 Up           00-50-56-90-70-07        10 Gbps


PS C:\Windows\system32> Get-HnsNetwork


ActivityId             : F6566D6A-7468-46CF-B33B-E0F824681E66
AdditionalParams       :
CurrentEndpointCount   : 0
Extensions             : {@{Id=E7C3B2F0-F3C5-48DF-AF2B-10FED6D72E7A; IsEnabled=False; Name=Microsoft Windows Filtering Platform}, @{Id=E9B59CFA-2BE1-4B21-828F-B6FBDBDDC017; IsEnabled=False; Name=Microsoft Azure VFP Switch Extension}, @{Id=EA24CD6C-D17A-4348-9190-09F0D5BE83DD; IsEnabled=True; Name=Microsoft NDIS Capture}}
Flags                  : 0
Health                 : @{AddressNotificationMissedCount=0; AddressNotificationSequenceNumber=0; InterfaceNotificationMissedCount=0; InterfaceNotificationSequenceNumber=0; LastErrorCode=0; LastUpdateTime=133819496200703357; RouteNotificationMissedCount=0; RouteNotificationSequenceNumber=0}
ID                     : F6C5EF13-C350-4695-B9C9-DE9B2A87FBF8
IPv6                   : False
InterfaceConstraint    : @{InterfaceGuid=00000000-0000-0000-0000-000000000000}
LayeredOn              : 2F75B87E-2BD0-463B-973B-14691C40C0DC
MacPools               : {@{EndMacAddress=00-15-5D-4A-2F-FF; StartMacAddress=00-15-5D-4A-20-00}}
MaxConcurrentEndpoints : 5
Name                   : nat
NatName                : ICS79A62C55-619B-4670-8714-9CBC8EDA9C54
NetworkAdapterName     : Ethernet0
Policies               : {}
Resources              : @{AdditionalParams=; AllocationOrder=2; Allocators=System.Object[]; Health=; ID=F6566D6A-7468-46CF-B33B-E0F824681E66; PortOperationTime=0; State=1; SwitchOperationTime=0; VfpOperationTime=0; parentId=72DD24AA-E676-42B9-B991-01461764C08B}
State                  : 1
Subnets                : {@{AdditionalParams=; AddressPrefix=172.24.192.0/20; GatewayAddress=172.24.192.1; Health=; ID=37668DAD-A2B6-43D1-8FE1-B953CFD1A681; Policies=System.Object[]; State=0}}
TotalEndpoints         : 260
Type                   : NAT
Version                : 38654705669

I had to change the test image because I already had this one pulled. I am able to access the network fine; it's the second RUN command in a build that fails.

PS C:\Windows\system32> ctr run --rm --cni mcr.microsoft.com/dotnet/framework/runtime:4.8-windowsservercore-ltsc2019 cni-test curl -I example.com
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
HTTP/1.1 200 OK
Content-Type: text/html
ETag: "84238dfc8092e5d9c0dac8ef93371a07:1736799080.121134"
Last-Modified: Mon, 13 Jan 2025 20:11:20 GMT
Cache-Control: max-age=1805
Date: Fri, 24 Jan 2025 14:05:07 GMT
Connection: keep-alive

@nerddtvg
Author

@profnandaa - I also sent you a message on the Docker Community Slack that might help.

@profnandaa
Collaborator

From your ctr run --cni output, it seems to me there is no issue with your CNI settings.

I fixed my CNI setup issues and can repro this now. Let me work on this on Monday. I will respond to you on Slack, and we can post the key findings back here for others.

My Dockerfile:

FROM mcr.microsoft.com/windows/nanoserver:ltsc2019

RUN curl.exe -I example.com
RUN curl.exe -I bing.com
RUN curl.exe example.com

Build log:

#5 [2/4] RUN curl.exe -I example.com
#5 1.692   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
#5 1.692                                  Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
#5 1.910 HTTP/1.1 200 OK
#5 1.910 Cache-Control: max-age=792
#5 1.910 Content-Type: text/html
#5 1.910 Date: Sat, 25 Jan 2025 16:59:48 GMT
#5 1.910 Etag: "84238dfc8092e5d9c0dac8ef93371a07:1736799080.121134"
#5 1.910 Last-Modified: Mon, 13 Jan 2025 20:11:20 GMT
#5 1.910
#5 DONE 2.4s

#6 [3/4] RUN curl.exe -I bing.com
#6 ERROR: process "cmd /S /C curl.exe -I bing.com" did not complete successfully: failed to create shim task: hcs::CreateComputeSystem z7ii51kg6icl3t9ota7se86my: The requested operation for attach namespace failed.
------
 > [3/4] RUN curl.exe -I bing.com:
------
Dockerfile:4
--------------------
   2 |
   3 |     RUN curl.exe -I example.com
   4 | >>> RUN curl.exe -I bing.com
   5 |     RUN curl.exe example.com
   6 |
--------------------
error: failed to solve: process "cmd /S /C curl.exe -I bing.com" did not complete successfully: failed to create shim task: hcs::CreateComputeSystem z7ii51kg6icl3t9ota7se86my: The requested operation for attach namespace failed.

@profnandaa
Collaborator

cross-linking with an old issue here - containerd/containerd#5729

@profnandaa
Collaborator

Another one - microsoft/hcsshim#1822

profnandaa added a commit to profnandaa/buildkit that referenced this issue Jan 28, 2025
There is an issue with `HnsEndpoints` on WS2019: after first use,
they can't be re-attached. See moby#5668. HCS CreateComputeSystem
fails with:

```
The requested operation for attach namespace failed.
```

As a work-around, have an option for PoolSize=-1 to turn off the
namespace pooling functionality and have each namespace only be used
once and discarded.

To enable this, you run buildkit with the additional flag:
`--containerd-cni-pool-size=-1`.

Fixes moby#5668

Signed-off-by: Anthony Nandaa <[email protected]>
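To illustrate the difference this option makes, here is a toy model of a namespace pool. It is not buildkit's implementation: the class `ToyNamespacePool`, its fields, and `run_steps` are hypothetical names, and the model ignores the 5-minute cache timeout and any upper bound on pooled namespaces. It only shows that a non-negative pool size hands the same namespace to consecutive steps, while a pool size of -1 gives every step a fresh one.

```python
class ToyNamespacePool:
    """Toy model of namespace pooling; NOT buildkit's actual code."""

    def __init__(self, pool_size):
        self.pool_size = pool_size
        self.idle = []      # namespaces waiting to be reused
        self.created = 0    # how many namespaces were ever created
        self.discarded = 0  # how many were thrown away after one use

    def get(self):
        # Prefer an idle namespace; otherwise create a new one.
        if self.idle:
            return self.idle.pop()
        self.created += 1
        return f"ns-{self.created}"

    def put(self, namespace):
        # With pooling disabled (-1), a namespace is used once and discarded.
        if self.pool_size < 0:
            self.discarded += 1
        else:
            self.idle.append(namespace)


def run_steps(pool, steps):
    """Simulate consecutive RUN steps, each borrowing one namespace."""
    used = []
    for _ in range(steps):
        namespace = pool.get()
        used.append(namespace)
        pool.put(namespace)
    return used


print(run_steps(ToyNamespacePool(pool_size=0), 3))   # pooled: namespace is reused
print(run_steps(ToyNamespacePool(pool_size=-1), 3))  # unpooled: new namespace per step
```

In the pooled case the second step receives the namespace that the first step already used, which is the situation that fails on WS2019 in this issue.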