============================= test session starts ==============================
platform linux -- Python 3.9.21, pytest-6.2.5, py-1.11.0, pluggy-0.13.1
rootdir: /home/jenkins/mindspore/testcases/testcases/tests/st/backend_ascend/debug, configfile: ../../../../../../../sault/virtual_test/virtualenv_002/sault/config/pytest.ini
plugins: forked-1.6.0, hydra-core-1.3.2, xdist-1.32.0, anyio-4.9.0
collected 1 item

test_interface_hccl.py
/home/jenkins/.local/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/home/jenkins/.local/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for type is zero.
  return self._float_to_str(self.smallest_subnormal)
[WARNING] ME(1320078:281473043328704,MainProcess):2025-07-15-11:25:15.485.41 [mindspore/parallel/cluster/process_entity/_api.py:240] mindx is not installed, using original mindspore recovery strategy.: No module named 'taskd'
Start worker process with rank id:0, log file:./logs/worker_0.log. Environment variable [RANK_ID=0] is exported.
Start worker process with rank id:1, log file:./logs/worker_1.log. Environment variable [RANK_ID=1] is exported.
Start worker process with rank id:2, log file:./logs/worker_2.log. Environment variable [RANK_ID=2] is exported.
Start worker process with rank id:3, log file:./logs/worker_3.log. Environment variable [RANK_ID=3] is exported.
Start worker process with rank id:4, log file:./logs/worker_4.log. Environment variable [RANK_ID=4] is exported.
Start worker process with rank id:5, log file:./logs/worker_5.log. Environment variable [RANK_ID=5] is exported.
Start worker process with rank id:6, log file:./logs/worker_6.log. Environment variable [RANK_ID=6] is exported.
Start worker process with rank id:7, log file:./logs/worker_7.log. Environment variable [RANK_ID=7] is exported.
[WARNING] ME(1320078:281473043328704,MainProcess):2025-07-15-11:25:15.725.775 [mindspore/parallel/cluster/process_entity/_api.py:267] Distributed job is spawned. Waiting all processes to exit...
[WARNING] ME(1321666:281473809641152,MainProcess):2025-07-15-11:25:20.337.39 [mindspore/context.py:1412] For 'context.set_context', the parameter 'device_target' will be deprecated and removed in a future version. Please use the api mindspore.set_device() instead.
[WARNING] ME(1321666:281473809641152,MainProcess):2025-07-15-11:25:20.345.63 [mindspore/context.py:1412] For 'context.set_context', the parameter 'ascend_config' will be deprecated and removed in a future version. Please use the api mindspore.device_context.ascend.op_precision.precision_mode(), mindspore.device_context.ascend.op_precision.op_precision_mode(), mindspore.device_context.ascend.op_precision.matmul_allow_hf32(), mindspore.device_context.ascend.op_precision.conv_allow_hf32(), mindspore.device_context.ascend.op_tuning.op_compile() instead.
[WARNING] DISTRIBUTED(1321666,ffffba6feec0,python):2025-07-15-11:25:20.036.700 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:485] Connect] Connection 19 source: 127.0.0.1:45054, destination: 127.0.0.1:10971
[WARNING] DISTRIBUTED(1321666,ffffba6feec0,python):2025-07-15-11:25:20.036.772 [mindspore/ccsrc/distributed/rpc/tcp/tcp_client.cc:76] Connect] Failed to connect to the tcp server : 127.0.0.1:10971, retry to reconnect(1/1)...
[WARNING] DISTRIBUTED(1321679,ffff24efefa0,python):2025-07-15-11:25:20.236.329 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:79] ConnectedEventHandler] Connection from 127.0.0.1:45056 to 127.0.0.1:10971 is successfully created. System errno: Success
[WARNING] DISTRIBUTED(1321679,ffff8bf5eec0,python):2025-07-15-11:25:20.236.329 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:485] Connect] Connection 19 source: 127.0.0.1:45056, destination: 127.0.0.1:10971
[WARNING] DISTRIBUTED(1321679,ffff8bf5eec0,python):2025-07-15-11:25:20.236.535 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:485] Connect] Connection 20 source: 127.0.0.1:45058, destination: 127.0.0.1:10971
[WARNING] DISTRIBUTED(1321679,ffff25f1efa0,python):2025-07-15-11:25:20.236.567 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:79] ConnectedEventHandler] Connection from 127.0.0.1:45058 to 127.0.0.1:10971 is successfully created. System errno: Success
[WARNING] DISTRIBUTED(1321679,ffff8bf5eec0,python):2025-07-15-11:25:20.236.573 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:494] Connect] Waiting for the state of the connection to 127.0.0.1:10971 to be connected...Retry number: 1
[WARNING] DISTRIBUTED(1321727,ffff9b1deec0,python):2025-07-15-11:25:20.317.255 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:485] Connect] Connection 19 source: 127.0.0.1:45068, destination: 127.0.0.1:10971
[WARNING] DISTRIBUTED(1321727,ffff2fffefa0,python):2025-07-15-11:25:20.317.264 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:79] ConnectedEventHandler] Connection from 127.0.0.1:45068 to 127.0.0.1:10971 is successfully created. System errno: Success
[WARNING] DISTRIBUTED(1321727,ffff9b1deec0,python):2025-07-15-11:25:20.317.328 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:494] Connect] Waiting for the state of the connection to 127.0.0.1:10971 to be connected...Retry number: 1
[WARNING] DISTRIBUTED(1321754,ffff45a3efa0,python):2025-07-15-11:25:20.519.906 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:79] ConnectedEventHandler] Connection from 127.0.0.1:45084 to 127.0.0.1:10971 is successfully created. System errno: Success
[WARNING] DISTRIBUTED(1321754,ffffacabeec0,python):2025-07-15-11:25:20.519.896 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:485] Connect] Connection 19 source: 127.0.0.1:45084, destination: 127.0.0.1:10971
[WARNING] DISTRIBUTED(1321754,ffffacabeec0,python):2025-07-15-11:25:20.520.150 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:485] Connect] Connection 20 source: 127.0.0.1:45092, destination: 127.0.0.1:10971
[WARNING] DISTRIBUTED(1321754,ffff46a5efa0,python):2025-07-15-11:25:20.520.185 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:79] ConnectedEventHandler] Connection from 127.0.0.1:45092 to 127.0.0.1:10971 is successfully created. System errno: Success
[WARNING] DISTRIBUTED(1321754,ffffacabeec0,python):2025-07-15-11:25:20.520.198 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:494] Connect] Waiting for the state of the connection to 127.0.0.1:10971 to be connected...Retry number: 1
[WARNING] DISTRIBUTED(1321666,ffffba6feec0,python):2025-07-15-11:25:20.536.916 [mindspore/ccsrc/distributed/cluster/topology/compute_graph_node.cc:173] Register] Failed to connect to the meta server node url: 127.0.0.1:10971
[WARNING] DISTRIBUTED(1321666,ffffba6feec0,python):2025-07-15-11:25:20.536.996 [mindspore/ccsrc/distributed/cluster/topology/compute_graph_node.cc:363] ReconnectWithTimeoutWindow] Failed to register and try to reconnect to the meta server.
[WARNING] DISTRIBUTED(1321743,ffff8c40eec0,python):2025-07-15-11:25:20.579.792 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:485] Connect] Connection 19 source: 127.0.0.1:45094, destination: 127.0.0.1:10971
[WARNING] DISTRIBUTED(1321743,ffff253befa0,python):2025-07-15-11:25:20.579.799 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:79] ConnectedEventHandler] Connection from 127.0.0.1:45094 to 127.0.0.1:10971 is successfully created. System errno: Success
[WARNING] DISTRIBUTED(1321743,ffff8c40eec0,python):2025-07-15-11:25:20.579.867 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:494] Connect] Waiting for the state of the connection to 127.0.0.1:10971 to be connected...Retry number: 1
[WARNING] DISTRIBUTED(1321679,ffff8bf5eec0,python):2025-07-15-11:25:20.737.459 [mindspore/ccsrc/distributed/cluster/cluster_context.cc:246] BuildCluster] Topology build timed out., retry(1/1200).
[WARNING] DISTRIBUTED(1321727,ffff9b1deec0,python):2025-07-15-11:25:20.817.633 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:485] Connect] Connection 20 source: 127.0.0.1:45096, destination: 127.0.0.1:10971
[WARNING] DISTRIBUTED(1321727,ffff351aefa0,python):2025-07-15-11:25:20.817.646 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:79] ConnectedEventHandler] Connection from 127.0.0.1:45096 to 127.0.0.1:10971 is successfully created. System errno: Success
[WARNING] DISTRIBUTED(1321727,ffff9b1deec0,python):2025-07-15-11:25:20.817.690 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:494] Connect] Waiting for the state of the connection to 127.0.0.1:10971 to be connected...Retry number: 2
[WARNING] DISTRIBUTED(1321791,ffff9dc8eec0,python):2025-07-15-11:25:20.911.469 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:485] Connect] Connection 19 source: 127.0.0.1:45112, destination: 127.0.0.1:10971
[WARNING] DISTRIBUTED(1321791,ffff36c3efa0,python):2025-07-15-11:25:20.911.469 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:79] ConnectedEventHandler] Connection from 127.0.0.1:45112 to 127.0.0.1:10971 is successfully created. System errno: Success
[WARNING] DISTRIBUTED(1321791,ffff9dc8eec0,python):2025-07-15-11:25:20.911.539 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:494] Connect] Waiting for the state of the connection to 127.0.0.1:10971 to be connected...Retry number: 1
[WARNING] DISTRIBUTED(1321816,ffff9cfaeec0,python):2025-07-15-11:25:20.930.996 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:485] Connect] Connection 19 source: 127.0.0.1:45128, destination: 127.0.0.1:10971
[WARNING] DISTRIBUTED(1321816,ffff35f4efa0,python):2025-07-15-11:25:20.930.996 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:79] ConnectedEventHandler] Connection from 127.0.0.1:45128 to 127.0.0.1:10971 is successfully created. System errno: Success
[WARNING] DISTRIBUTED(1321816,ffff9cfaeec0,python):2025-07-15-11:25:20.931.074 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:494] Connect] Waiting for the state of the connection to 127.0.0.1:10971 to be connected...Retry number: 1
[WARNING] DISTRIBUTED(1321754,ffffacabeec0,python):2025-07-15-11:25:21.021.076 [mindspore/ccsrc/distributed/cluster/cluster_context.cc:246] BuildCluster] Topology build timed out., retry(1/1200).
[WARNING] DISTRIBUTED(1321666,ffffba6feec0,python):2025-07-15-11:25:21.037.342 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:485] Connect] Connection 20 source: 127.0.0.1:45134, destination: 127.0.0.1:10971
[WARNING] DISTRIBUTED(1321666,ffff546aefa0,python):2025-07-15-11:25:21.037.380 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:79] ConnectedEventHandler] Connection from 127.0.0.1:45134 to 127.0.0.1:10971 is successfully created. System errno: Success
[WARNING] DISTRIBUTED(1321666,ffffba6feec0,python):2025-07-15-11:25:21.037.395 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:494] Connect] Waiting for the state of the connection to 127.0.0.1:10971 to be connected...Retry number: 1
[WARNING] DISTRIBUTED(1321853,ffff976feec0,python):2025-07-15-11:25:21.041.346 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:485] Connect] Connection 19 source: 127.0.0.1:45136, destination: 127.0.0.1:10971
[WARNING] DISTRIBUTED(1321853,ffff2bffefa0,python):2025-07-15-11:25:21.041.336 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:79] ConnectedEventHandler] Connection from 127.0.0.1:45136 to 127.0.0.1:10971 is successfully created. System errno: Success
[WARNING] DISTRIBUTED(1321853,ffff976feec0,python):2025-07-15-11:25:21.041.401 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:494] Connect] Waiting for the state of the connection to 127.0.0.1:10971 to be connected...Retry number: 1
[WARNING] DISTRIBUTED(1321743,ffff8c40eec0,python):2025-07-15-11:25:21.080.221 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:485] Connect] Connection 20 source: 127.0.0.1:45146, destination: 127.0.0.1:10971
[WARNING] DISTRIBUTED(1321743,ffff8c40eec0,python):2025-07-15-11:25:21.080.289 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:494] Connect] Waiting for the state of the connection to 127.0.0.1:10971 to be connected...Retry number: 2
[WARNING] DISTRIBUTED(1321743,ffff263defa0,python):2025-07-15-11:25:21.080.392 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:79] ConnectedEventHandler] Connection from 127.0.0.1:45146 to 127.0.0.1:10971 is successfully created. System errno: Success
[WARNING] DISTRIBUTED(1321679,ffff8bf5eec0,python):2025-07-15-11:25:21.237.587 [mindspore/ccsrc/distributed/cluster/cluster_context.cc:246] BuildCluster] Topology build timed out., retry(2/1200).
[WARNING] DISTRIBUTED(1321727,ffff9b1deec0,python):2025-07-15-11:25:21.318.335 [mindspore/ccsrc/distributed/cluster/cluster_context.cc:246] BuildCluster] Topology build timed out., retry(1/1200).
[WARNING] DISTRIBUTED(1321791,ffff9dc8eec0,python):2025-07-15-11:25:21.411.855 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:485] Connect] Connection 20 source: 127.0.0.1:45152, destination: 127.0.0.1:10971
[WARNING] DISTRIBUTED(1321791,ffff37c5efa0,python):2025-07-15-11:25:21.411.880 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:79] ConnectedEventHandler] Connection from 127.0.0.1:45152 to 127.0.0.1:10971 is successfully created. System errno: Success
[WARNING] DISTRIBUTED(1321791,ffff9dc8eec0,python):2025-07-15-11:25:21.411.906 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:494] Connect] Waiting for the state of the connection to 127.0.0.1:10971 to be connected...Retry number: 2
[WARNING] DISTRIBUTED(1321816,ffff9cfaeec0,python):2025-07-15-11:25:21.431.319 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:485] Connect] Connection 20 source: 127.0.0.1:45156, destination: 127.0.0.1:10971
[WARNING] DISTRIBUTED(1321816,ffff36f6efa0,python):2025-07-15-11:25:21.431.355 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:79] ConnectedEventHandler] Connection from 127.0.0.1:45156 to 127.0.0.1:10971 is successfully created. System errno: Success
[WARNING] DISTRIBUTED(1321816,ffff9cfaeec0,python):2025-07-15-11:25:21.431.362 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:494] Connect] Waiting for the state of the connection to 127.0.0.1:10971 to be connected...Retry number: 2
[WARNING] DISTRIBUTED(1321754,ffffacabeec0,python):2025-07-15-11:25:21.521.193 [mindspore/ccsrc/distributed/cluster/cluster_context.cc:246] BuildCluster] Topology build timed out., retry(2/1200).
[WARNING] DISTRIBUTED(1321666,ffffba6feec0,python):2025-07-15-11:25:21.537.619 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:485] Connect] Connection 21 source: 127.0.0.1:45158, destination: 127.0.0.1:10971
[WARNING] DISTRIBUTED(1321666,ffffba6feec0,python):2025-07-15-11:25:21.537.659 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:494] Connect] Waiting for the state of the connection to 127.0.0.1:10971 to be connected...Retry number: 2
[WARNING] DISTRIBUTED(1321666,ffff5368efa0,python):2025-07-15-11:25:21.537.683 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:79] ConnectedEventHandler] Connection from 127.0.0.1:45158 to 127.0.0.1:10971 is successfully created. System errno: Success
[WARNING] DISTRIBUTED(1321853,ffff976feec0,python):2025-07-15-11:25:21.541.627 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:485] Connect] Connection 20 source: 127.0.0.1:45172, destination: 127.0.0.1:10971
[WARNING] DISTRIBUTED(1321853,ffff976feec0,python):2025-07-15-11:25:21.541.669 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:494] Connect] Waiting for the state of the connection to 127.0.0.1:10971 to be connected...Retry number: 2
[WARNING] DISTRIBUTED(1321853,ffff316aefa0,python):2025-07-15-11:25:21.541.685 [mindspore/ccsrc/distributed/rpc/tcp/tcp_comm.cc:79] ConnectedEventHandler] Connection from 127.0.0.1:45172 to 127.0.0.1:10971 is successfully created. System errno: Success
[WARNING] DISTRIBUTED(1321743,ffff8c40eec0,python):2025-07-15-11:25:21.580.960 [mindspore/ccsrc/distributed/cluster/cluster_context.cc:246] BuildCluster] Topology build timed out., retry(1/1200).
[WARNING] DISTRIBUTED(1321679,ffff8bf5eec0,python):2025-07-15-11:25:21.737.711 [mindspore/ccsrc/distributed/cluster/cluster_context.cc:246] BuildCluster] Topology build timed out., retry(3/1200).
[WARNING] DISTRIBUTED(1321727,ffff9b1deec0,python):2025-07-15-11:25:21.818.467 [mindspore/ccsrc/distributed/cluster/cluster_context.cc:246] BuildCluster] Topology build timed out., retry(2/1200).
[WARNING] DISTRIBUTED(1321791,ffff9dc8eec0,python):2025-07-15-11:25:21.912.406 [mindspore/ccsrc/distributed/cluster/cluster_context.cc:246] BuildCluster] Topology build timed out., retry(1/1200).
[WARNING] DISTRIBUTED(1321816,ffff9cfaeec0,python):2025-07-15-11:25:21.931.871 [mindspore/ccsrc/distributed/cluster/cluster_context.cc:246] BuildCluster] Topology build timed out., retry(1/1200).
[WARNING] DISTRIBUTED(1321754,ffffacabeec0,python):2025-07-15-11:25:22.021.295 [mindspore/ccsrc/distributed/cluster/cluster_context.cc:246] BuildCluster] Topology build timed out., retry(3/1200).
[WARNING] DISTRIBUTED(1321666,ffffba6feec0,python):2025-07-15-11:25:22.038.096 [mindspore/ccsrc/distributed/cluster/cluster_context.cc:246] BuildCluster] Topology build timed out., retry(1/1200).
[WARNING] DISTRIBUTED(1321853,ffff976feec0,python):2025-07-15-11:25:22.042.192 [mindspore/ccsrc/distributed/cluster/cluster_context.cc:246] BuildCluster] Topology build timed out., retry(1/1200).
[WARNING] DISTRIBUTED(1321743,ffff8c40eec0,python):2025-07-15-11:25:22.081.086 [mindspore/ccsrc/distributed/cluster/cluster_context.cc:246] BuildCluster] Topology build timed out., retry(2/1200).
[WARNING] DISTRIBUTED(1321679,ffff8bf5eec0,python):2025-07-15-11:25:22.237.820 [mindspore/ccsrc/distributed/cluster/cluster_context.cc:246] BuildCluster] Topology build timed out., retry(4/1200).
[WARNING] DISTRIBUTED(1321727,ffff9b1deec0,python):2025-07-15-11:25:22.318.584 [mindspore/ccsrc/distributed/cluster/cluster_context.cc:246] BuildCluster] Topology build timed out., retry(3/1200).
[WARNING] DISTRIBUTED(1321791,ffff9dc8eec0,python):2025-07-15-11:25:22.412.525 [mindspore/ccsrc/distributed/cluster/cluster_context.cc:246] BuildCluster] Topology build timed out., retry(2/1200).
[WARNING] DISTRIBUTED(1321816,ffff9cfaeec0,python):2025-07-15-11:25:22.431.982 [mindspore/ccsrc/distributed/cluster/cluster_context.cc:246] BuildCluster] Topology build timed out., retry(2/1200).
[WARNING] DISTRIBUTED(1321754,ffffacabeec0,python):2025-07-15-11:25:22.521.396 [mindspore/ccsrc/distributed/cluster/cluster_context.cc:246] BuildCluster] Topology build timed out., retry(4/1200).
[WARNING] DISTRIBUTED(1321666,ffffba6feec0,python):2025-07-15-11:25:22.538.203 [mindspore/ccsrc/distributed/cluster/cluster_context.cc:246] BuildCluster] Topology build timed out., retry(2/1200).
[MS_DEV_RUNTIME_CONF]Runtime config: async_init_comm:False [WARNING] DISTRIBUTED(1321853,ffff976feec0,python):2025-07-15-11:25:22.542.385 [mindspore/ccsrc/distributed/cluster/cluster_context.cc:249] BuildCluster] Cluster is successfully initialized. [WARNING] DISTRIBUTED(1321853,ffff976feec0,python):2025-07-15-11:25:22.542.443 [mindspore/ccsrc/distributed/cluster/cluster_context.cc:355] PostProcess] This node 7 rank id: 7 [MS_RUNTIME_PROF]The jit_level is: O0, and enable kernelbykernel executor. [MS_DEV_RUNTIME_CONF]Runtime config: async_init_comm:False [WARNING] DISTRIBUTED(1321743,ffff8c40eec0,python):2025-07-15-11:25:22.581.306 [mindspore/ccsrc/distributed/cluster/cluster_context.cc:249] BuildCluster] Cluster is successfully initialized. [WARNING] DISTRIBUTED(1321743,ffff8c40eec0,python):2025-07-15-11:25:22.581.350 [mindspore/ccsrc/distributed/cluster/cluster_context.cc:355] PostProcess] This node 3 rank id: 3 [MS_RUNTIME_PROF]The jit_level is: O0, and enable kernelbykernel executor. [MS_DEV_RUNTIME_CONF]Runtime config: async_init_comm:False [WARNING] DISTRIBUTED(1321679,ffff8bf5eec0,python):2025-07-15-11:25:22.738.028 [mindspore/ccsrc/distributed/cluster/cluster_context.cc:249] BuildCluster] Cluster is successfully initialized. [WARNING] DISTRIBUTED(1321679,ffff8bf5eec0,python):2025-07-15-11:25:22.738.088 [mindspore/ccsrc/distributed/cluster/cluster_context.cc:355] PostProcess] This node 1 rank id: 1 [MS_RUNTIME_PROF]The jit_level is: O0, and enable kernelbykernel executor. [MS_DEV_RUNTIME_CONF]Runtime config: async_init_comm:False [WARNING] DISTRIBUTED(1321727,ffff9b1deec0,python):2025-07-15-11:25:22.818.843 [mindspore/ccsrc/distributed/cluster/cluster_context.cc:249] BuildCluster] Cluster is successfully initialized. 
[WARNING] DISTRIBUTED(1321727,ffff9b1deec0,python):2025-07-15-11:25:22.818.920 [mindspore/ccsrc/distributed/cluster/cluster_context.cc:355] PostProcess] This node 2 rank id: 2
[MS_RUNTIME_PROF]The jit_level is: O0, and enable kernelbykernel executor.
[MS_DEV_RUNTIME_CONF]Runtime config: async_init_comm:False
[WARNING] DISTRIBUTED(1321791,ffff9dc8eec0,python):2025-07-15-11:25:22.912.817 [mindspore/ccsrc/distributed/cluster/cluster_context.cc:249] BuildCluster] Cluster is successfully initialized.
[WARNING] DISTRIBUTED(1321791,ffff9dc8eec0,python):2025-07-15-11:25:22.912.873 [mindspore/ccsrc/distributed/cluster/cluster_context.cc:355] PostProcess] This node 5 rank id: 5
[MS_RUNTIME_PROF]The jit_level is: O0, and enable kernelbykernel executor.
[MS_DEV_RUNTIME_CONF]Runtime config: async_init_comm:False
[WARNING] DISTRIBUTED(1321816,ffff9cfaeec0,python):2025-07-15-11:25:22.932.210 [mindspore/ccsrc/distributed/cluster/cluster_context.cc:249] BuildCluster] Cluster is successfully initialized.
[WARNING] DISTRIBUTED(1321816,ffff9cfaeec0,python):2025-07-15-11:25:22.932.252 [mindspore/ccsrc/distributed/cluster/cluster_context.cc:355] PostProcess] This node 6 rank id: 6
[MS_RUNTIME_PROF]The jit_level is: O0, and enable kernelbykernel executor.
[MS_DEV_RUNTIME_CONF]Runtime config: async_init_comm:False
[WARNING] DISTRIBUTED(1321754,ffffacabeec0,python):2025-07-15-11:25:23.021.604 [mindspore/ccsrc/distributed/cluster/cluster_context.cc:249] BuildCluster] Cluster is successfully initialized.
[WARNING] DISTRIBUTED(1321754,ffffacabeec0,python):2025-07-15-11:25:23.021.662 [mindspore/ccsrc/distributed/cluster/cluster_context.cc:355] PostProcess] This node 4 rank id: 4
[MS_RUNTIME_PROF]The jit_level is: O0, and enable kernelbykernel executor.
[MS_DEV_RUNTIME_CONF]Runtime config: async_init_comm:False
[WARNING] DISTRIBUTED(1321666,ffffba6feec0,python):2025-07-15-11:25:23.038.446 [mindspore/ccsrc/distributed/cluster/cluster_context.cc:249] BuildCluster] Cluster is successfully initialized.
[WARNING] DISTRIBUTED(1321666,ffffba6feec0,python):2025-07-15-11:25:23.038.499 [mindspore/ccsrc/distributed/cluster/cluster_context.cc:355] PostProcess] This node 0 rank id: 0
[MS_RUNTIME_PROF]The jit_level is: O0, and enable kernelbykernel executor.
[WARNING] DISTRIBUTED(1321743,ffff8c40eec0,python):2025-07-15-11:25:24.349.903 [mindspore/ccsrc/distributed/collective/collective_manager.cc:341] CreateCommunicationGroup] Start to create communication group: hccl_world_group [const vector]{0, 1, 2, 3, 4, 5, 6, 7}, async: 0, submit_now: 1
[WARNING] DEVICE(1321743,fffecdc6efa0,python):2025-07-15-11:25:24.350.516 [mindspore/ccsrc/plugin/res_manager/ascend/collective/ascend_communication_group.cc:254] SetGlobalCommInfo] Start to SetGlobalCommInfo for hccl_world_group, master_ip:2130706433, master_port:10971, node_rank:2130706433, total_rank_size:8, local_rank_size8
[WARNING] HCCL_ADPT(1321743,fffecdc6efa0,python):2025-07-15-11:25:24.350.625 [mindspore/ccsrc/utils/dlopen_macro.h:165] DlsymAscend] Dynamically load symbol HcclSetGlobalCommInfo failed, result = /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/communication/../lib/plugin/ascend/libhccl_plugin.so: undefined symbol: HcclSetGlobalCommInfo
[WARNING] HCCL_ADPT(1321743,fffecdc6efa0,python):2025-07-15-11:25:24.350.663 [mindspore/ccsrc/plugin/res_manager/ascend/hccl_adapter/hccl_adapter.cc:635] HcclSetGlobalCommInfo] Func HcclSetGlobalCommInfo is not supported in CANN package.
[WARNING] DEVICE(1321743,fffecdc6efa0,python):2025-07-15-11:25:24.350.695 [mindspore/ccsrc/plugin/res_manager/ascend/collective/ascend_communication_group.cc:265] SetGlobalCommInfo] End to SetGlobalCommInfo for hccl_world_group
[WARNING] DEVICE(1321743,fffecdc6efa0,python):2025-07-15-11:25:24.351.263 [mindspore/ccsrc/plugin/device/cpu/hal/hardware/ms_collective_comm_lib.cc:251] QueryUniqueID] Retry to lookup the unique id for group hccl_world_group from the meta server node...Retry time: 18446744073709551614/400, sleep 1
[WARNING] DISTRIBUTED(1321853,ffff976feec0,python):2025-07-15-11:25:24.374.389 [mindspore/ccsrc/distributed/collective/collective_manager.cc:341] CreateCommunicationGroup] Start to create communication group: hccl_world_group [const vector]{0, 1, 2, 3, 4, 5, 6, 7}, async: 0, submit_now: 1
[WARNING] DEVICE(1321853,fffed906efa0,python):2025-07-15-11:25:24.374.990 [mindspore/ccsrc/plugin/res_manager/ascend/collective/ascend_communication_group.cc:254] SetGlobalCommInfo] Start to SetGlobalCommInfo for hccl_world_group, master_ip:2130706433, master_port:10971, node_rank:2130706433, total_rank_size:8, local_rank_size8
[WARNING] HCCL_ADPT(1321853,fffed906efa0,python):2025-07-15-11:25:24.375.093 [mindspore/ccsrc/utils/dlopen_macro.h:165] DlsymAscend] Dynamically load symbol HcclSetGlobalCommInfo failed, result = /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/communication/../lib/plugin/ascend/libhccl_plugin.so: undefined symbol: HcclSetGlobalCommInfo
[WARNING] HCCL_ADPT(1321853,fffed906efa0,python):2025-07-15-11:25:24.375.132 [mindspore/ccsrc/plugin/res_manager/ascend/hccl_adapter/hccl_adapter.cc:635] HcclSetGlobalCommInfo] Func HcclSetGlobalCommInfo is not supported in CANN package.
[WARNING] DEVICE(1321853,fffed906efa0,python):2025-07-15-11:25:24.375.164 [mindspore/ccsrc/plugin/res_manager/ascend/collective/ascend_communication_group.cc:265] SetGlobalCommInfo] End to SetGlobalCommInfo for hccl_world_group
[WARNING] DEVICE(1321853,fffed906efa0,python):2025-07-15-11:25:24.375.741 [mindspore/ccsrc/plugin/device/cpu/hal/hardware/ms_collective_comm_lib.cc:251] QueryUniqueID] Retry to lookup the unique id for group hccl_world_group from the meta server node...Retry time: 18446744073709551614/400, sleep 2
[WARNING] DISTRIBUTED(1321727,ffff9b1deec0,python):2025-07-15-11:25:24.572.141 [mindspore/ccsrc/distributed/collective/collective_manager.cc:341] CreateCommunicationGroup] Start to create communication group: hccl_world_group [const vector]{0, 1, 2, 3, 4, 5, 6, 7}, async: 0, submit_now: 1
[WARNING] DEVICE(1321727,fffedce0efa0,python):2025-07-15-11:25:24.572.794 [mindspore/ccsrc/plugin/res_manager/ascend/collective/ascend_communication_group.cc:254] SetGlobalCommInfo] Start to SetGlobalCommInfo for hccl_world_group, master_ip:2130706433, master_port:10971, node_rank:2130706433, total_rank_size:8, local_rank_size8
[WARNING] HCCL_ADPT(1321727,fffedce0efa0,python):2025-07-15-11:25:24.572.933 [mindspore/ccsrc/utils/dlopen_macro.h:165] DlsymAscend] Dynamically load symbol HcclSetGlobalCommInfo failed, result = /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/communication/../lib/plugin/ascend/libhccl_plugin.so: undefined symbol: HcclSetGlobalCommInfo
[WARNING] HCCL_ADPT(1321727,fffedce0efa0,python):2025-07-15-11:25:24.572.971 [mindspore/ccsrc/plugin/res_manager/ascend/hccl_adapter/hccl_adapter.cc:635] HcclSetGlobalCommInfo] Func HcclSetGlobalCommInfo is not supported in CANN package.
[WARNING] DEVICE(1321727,fffedce0efa0,python):2025-07-15-11:25:24.573.003 [mindspore/ccsrc/plugin/res_manager/ascend/collective/ascend_communication_group.cc:265] SetGlobalCommInfo] End to SetGlobalCommInfo for hccl_world_group
[WARNING] DEVICE(1321727,fffedce0efa0,python):2025-07-15-11:25:24.573.662 [mindspore/ccsrc/plugin/device/cpu/hal/hardware/ms_collective_comm_lib.cc:251] QueryUniqueID] Retry to lookup the unique id for group hccl_world_group from the meta server node...Retry time: 18446744073709551614/400, sleep 1
[WARNING] DISTRIBUTED(1321816,ffff9cfaeec0,python):2025-07-15-11:25:24.724.588 [mindspore/ccsrc/distributed/collective/collective_manager.cc:341] CreateCommunicationGroup] Start to create communication group: hccl_world_group [const vector]{0, 1, 2, 3, 4, 5, 6, 7}, async: 0, submit_now: 1
[WARNING] DEVICE(1321816,fffede95efa0,python):2025-07-15-11:25:24.725.138 [mindspore/ccsrc/plugin/res_manager/ascend/collective/ascend_communication_group.cc:254] SetGlobalCommInfo] Start to SetGlobalCommInfo for hccl_world_group, master_ip:2130706433, master_port:10971, node_rank:2130706433, total_rank_size:8, local_rank_size8
[WARNING] HCCL_ADPT(1321816,fffede95efa0,python):2025-07-15-11:25:24.725.241 [mindspore/ccsrc/utils/dlopen_macro.h:165] DlsymAscend] Dynamically load symbol HcclSetGlobalCommInfo failed, result = /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/communication/../lib/plugin/ascend/libhccl_plugin.so: undefined symbol: HcclSetGlobalCommInfo
[WARNING] HCCL_ADPT(1321816,fffede95efa0,python):2025-07-15-11:25:24.725.278 [mindspore/ccsrc/plugin/res_manager/ascend/hccl_adapter/hccl_adapter.cc:635] HcclSetGlobalCommInfo] Func HcclSetGlobalCommInfo is not supported in CANN package.
[WARNING] DEVICE(1321816,fffede95efa0,python):2025-07-15-11:25:24.725.308 [mindspore/ccsrc/plugin/res_manager/ascend/collective/ascend_communication_group.cc:265] SetGlobalCommInfo] End to SetGlobalCommInfo for hccl_world_group
[WARNING] DEVICE(1321816,fffede95efa0,python):2025-07-15-11:25:24.725.823 [mindspore/ccsrc/plugin/device/cpu/hal/hardware/ms_collective_comm_lib.cc:251] QueryUniqueID] Retry to lookup the unique id for group hccl_world_group from the meta server node...Retry time: 18446744073709551614/400, sleep 1
[WARNING] DISTRIBUTED(1321791,ffff9dc8eec0,python):2025-07-15-11:25:24.726.057 [mindspore/ccsrc/distributed/collective/collective_manager.cc:341] CreateCommunicationGroup] Start to create communication group: hccl_world_group [const vector]{0, 1, 2, 3, 4, 5, 6, 7}, async: 0, submit_now: 1
[WARNING] DEVICE(1321791,fffee927efa0,python):2025-07-15-11:25:24.726.554 [mindspore/ccsrc/plugin/res_manager/ascend/collective/ascend_communication_group.cc:254] SetGlobalCommInfo] Start to SetGlobalCommInfo for hccl_world_group, master_ip:2130706433, master_port:10971, node_rank:2130706433, total_rank_size:8, local_rank_size8
[WARNING] HCCL_ADPT(1321791,fffee927efa0,python):2025-07-15-11:25:24.726.662 [mindspore/ccsrc/utils/dlopen_macro.h:165] DlsymAscend] Dynamically load symbol HcclSetGlobalCommInfo failed, result = /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/communication/../lib/plugin/ascend/libhccl_plugin.so: undefined symbol: HcclSetGlobalCommInfo
[WARNING] HCCL_ADPT(1321791,fffee927efa0,python):2025-07-15-11:25:24.726.699 [mindspore/ccsrc/plugin/res_manager/ascend/hccl_adapter/hccl_adapter.cc:635] HcclSetGlobalCommInfo] Func HcclSetGlobalCommInfo is not supported in CANN package.
[WARNING] DEVICE(1321791,fffee927efa0,python):2025-07-15-11:25:24.726.728 [mindspore/ccsrc/plugin/res_manager/ascend/collective/ascend_communication_group.cc:265] SetGlobalCommInfo] End to SetGlobalCommInfo for hccl_world_group
[WARNING] DEVICE(1321791,fffee927efa0,python):2025-07-15-11:25:24.727.127 [mindspore/ccsrc/plugin/device/cpu/hal/hardware/ms_collective_comm_lib.cc:251] QueryUniqueID] Retry to lookup the unique id for group hccl_world_group from the meta server node...Retry time: 18446744073709551614/400, sleep 1
[WARNING] DISTRIBUTED(1321666,ffffba6feec0,python):2025-07-15-11:25:24.826.972 [mindspore/ccsrc/distributed/collective/collective_manager.cc:341] CreateCommunicationGroup] Start to create communication group: hccl_world_group [const vector]{0, 1, 2, 3, 4, 5, 6, 7}, async: 0, submit_now: 1
[WARNING] DEVICE(1321666,fffeb7ffefa0,python):2025-07-15-11:25:24.827.582 [mindspore/ccsrc/plugin/res_manager/ascend/collective/ascend_communication_group.cc:254] SetGlobalCommInfo] Start to SetGlobalCommInfo for hccl_world_group, master_ip:2130706433, master_port:10971, node_rank:2130706433, total_rank_size:8, local_rank_size8
[WARNING] HCCL_ADPT(1321666,fffeb7ffefa0,python):2025-07-15-11:25:24.827.672 [mindspore/ccsrc/utils/dlopen_macro.h:165] DlsymAscend] Dynamically load symbol HcclSetGlobalCommInfo failed, result = /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/communication/../lib/plugin/ascend/libhccl_plugin.so: undefined symbol: HcclSetGlobalCommInfo
[WARNING] HCCL_ADPT(1321666,fffeb7ffefa0,python):2025-07-15-11:25:24.827.708 [mindspore/ccsrc/plugin/res_manager/ascend/hccl_adapter/hccl_adapter.cc:635] HcclSetGlobalCommInfo] Func HcclSetGlobalCommInfo is not supported in CANN package.
[WARNING] DEVICE(1321666,fffeb7ffefa0,python):2025-07-15-11:25:24.827.737 [mindspore/ccsrc/plugin/res_manager/ascend/collective/ascend_communication_group.cc:265] SetGlobalCommInfo] End to SetGlobalCommInfo for hccl_world_group
[WARNING] DISTRIBUTED(1321754,ffffacabeec0,python):2025-07-15-11:25:24.833.659 [mindspore/ccsrc/distributed/collective/collective_manager.cc:341] CreateCommunicationGroup] Start to create communication group: hccl_world_group [const vector]{0, 1, 2, 3, 4, 5, 6, 7}, async: 0, submit_now: 1
[WARNING] DEVICE(1321754,fffeee6eefa0,python):2025-07-15-11:25:24.834.207 [mindspore/ccsrc/plugin/res_manager/ascend/collective/ascend_communication_group.cc:254] SetGlobalCommInfo] Start to SetGlobalCommInfo for hccl_world_group, master_ip:2130706433, master_port:10971, node_rank:2130706433, total_rank_size:8, local_rank_size8
[WARNING] HCCL_ADPT(1321754,fffeee6eefa0,python):2025-07-15-11:25:24.834.304 [mindspore/ccsrc/utils/dlopen_macro.h:165] DlsymAscend] Dynamically load symbol HcclSetGlobalCommInfo failed, result = /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/communication/../lib/plugin/ascend/libhccl_plugin.so: undefined symbol: HcclSetGlobalCommInfo
[WARNING] HCCL_ADPT(1321754,fffeee6eefa0,python):2025-07-15-11:25:24.834.341 [mindspore/ccsrc/plugin/res_manager/ascend/hccl_adapter/hccl_adapter.cc:635] HcclSetGlobalCommInfo] Func HcclSetGlobalCommInfo is not supported in CANN package.
[WARNING] DEVICE(1321754,fffeee6eefa0,python):2025-07-15-11:25:24.834.371 [mindspore/ccsrc/plugin/res_manager/ascend/collective/ascend_communication_group.cc:265] SetGlobalCommInfo] End to SetGlobalCommInfo for hccl_world_group
[WARNING] DISTRIBUTED(1321666,fffeb7ffefa0,python):2025-07-15-11:25:24.834.452 [mindspore/ccsrc/distributed/collective/collective_manager.cc:1021] CreateDeviceCommunicator] Begin initialize communication group on the device side: hccl_world_group
[WARNING] DISTRIBUTED(1321754,fffeee6eefa0,python):2025-07-15-11:25:24.834.775 [mindspore/ccsrc/distributed/collective/collective_manager.cc:1021] CreateDeviceCommunicator] Begin initialize communication group on the device side: hccl_world_group
[WARNING] DEVICE(1321666,fffeb5fbefa0,python):2025-07-15-11:25:24.834.781 [mindspore/ccsrc/plugin/res_manager/ascend/collective/ascend_communication_group.cc:169] InitByRootInfoConfig] Start to initialize communicator by HcclCommInitRootInfoConfig for hccl_world_group, hcclBufferSize is 200 MB, hcclDeterministic is 0
[WARNING] DEVICE(1321754,fffeededefa0,python):2025-07-15-11:25:24.835.077 [mindspore/ccsrc/plugin/res_manager/ascend/collective/ascend_communication_group.cc:169] InitByRootInfoConfig] Start to initialize communicator by HcclCommInitRootInfoConfig for hccl_world_group, hcclBufferSize is 200 MB, hcclDeterministic is 0
[WARNING] DISTRIBUTED(1321743,fffecdc6efa0,python):2025-07-15-11:25:24.851.663 [mindspore/ccsrc/distributed/collective/collective_manager.cc:1021] CreateDeviceCommunicator] Begin initialize communication group on the device side: hccl_world_group
[WARNING] DEVICE(1321743,fffecd45efa0,python):2025-07-15-11:25:24.852.045 [mindspore/ccsrc/plugin/res_manager/ascend/collective/ascend_communication_group.cc:169] InitByRootInfoConfig] Start to initialize communicator by HcclCommInitRootInfoConfig for hccl_world_group, hcclBufferSize is 200 MB, hcclDeterministic is 0
[WARNING] DISTRIBUTED(1321853,fffed906efa0,python):2025-07-15-11:25:24.876.385 [mindspore/ccsrc/distributed/collective/collective_manager.cc:1021] CreateDeviceCommunicator] Begin initialize communication group on the device side: hccl_world_group
[WARNING] DEVICE(1321853,fffed885efa0,python):2025-07-15-11:25:24.876.869 [mindspore/ccsrc/plugin/res_manager/ascend/collective/ascend_communication_group.cc:169] InitByRootInfoConfig] Start to initialize communicator by HcclCommInitRootInfoConfig for hccl_world_group, hcclBufferSize is 200 MB, hcclDeterministic is 0
[WARNING] DISTRIBUTED(1321727,fffedce0efa0,python):2025-07-15-11:25:25.074.145 [mindspore/ccsrc/distributed/collective/collective_manager.cc:1021] CreateDeviceCommunicator] Begin initialize communication group on the device side: hccl_world_group
[WARNING] DEVICE(1321727,fffe8fffefa0,python):2025-07-15-11:25:25.074.645 [mindspore/ccsrc/plugin/res_manager/ascend/collective/ascend_communication_group.cc:169] InitByRootInfoConfig] Start to initialize communicator by HcclCommInitRootInfoConfig for hccl_world_group, hcclBufferSize is 200 MB, hcclDeterministic is 0
[WARNING] DISTRIBUTED(1321816,fffede95efa0,python):2025-07-15-11:25:25.226.415 [mindspore/ccsrc/distributed/collective/collective_manager.cc:1021] CreateDeviceCommunicator] Begin initialize communication group on the device side: hccl_world_group
[WARNING] DEVICE(1321816,fffede14efa0,python):2025-07-15-11:25:25.226.907 [mindspore/ccsrc/plugin/res_manager/ascend/collective/ascend_communication_group.cc:169] InitByRootInfoConfig] Start to initialize communicator by HcclCommInitRootInfoConfig for hccl_world_group, hcclBufferSize is 200 MB, hcclDeterministic is 0
[WARNING] DISTRIBUTED(1321791,fffee927efa0,python):2025-07-15-11:25:25.227.481 [mindspore/ccsrc/distributed/collective/collective_manager.cc:1021] CreateDeviceCommunicator] Begin initialize communication group on the device side: hccl_world_group
[WARNING] DEVICE(1321791,fffede58efa0,python):2025-07-15-11:25:25.227.883 [mindspore/ccsrc/plugin/res_manager/ascend/collective/ascend_communication_group.cc:169] InitByRootInfoConfig] Start to initialize communicator by HcclCommInitRootInfoConfig for hccl_world_group, hcclBufferSize is 200 MB, hcclDeterministic is 0
Traceback (most recent call last):
  File "/home/jenkins/mindspore/testcases/testcases/tests/st/backend_ascend/debug/resuming_interface.py", line 62, in <module>
    rebuild_hccl_interface()
  File "/home/jenkins/mindspore/testcases/testcases/tests/st/backend_ascend/debug/resuming_interface.py", line 41, in rebuild_hccl_interface
    init()
  File "/home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/communication/management.py", line 203, in init
    init_hccl()
RuntimeError: Call aclrtSetDevice failed, ret[507033]. Got device count[8] and device id[1], please check if device id is valid.

----------------------------------------------------
- C++ Call Stack: (For framework developers)
----------------------------------------------------
mindspore/ccsrc/plugin/res_manager/ascend/hal_manager/ascend_hal_manager.cc:67 InitDevice

[WARNING] DEVICE(1321679,ffff8bf5eec0,python):2025-07-15-11:28:04.330.518 [mindspore/ccsrc/plugin/device/ascend/hal/hardware/ascend_device_res_manager.cc:350] SyncAllStreams] The ascend_res_manager_ is nullptr in scenarios where it is not actually executed
[ERROR] ME(1320078:281473043328704,MainProcess):2025-07-15-11:28:06.213.48 [mindspore/parallel/cluster/process_entity/_api.py:363] Worker process 1321679 exit with exception. Error code: 1.
[WARNING] ME(1320078:281473043328704,MainProcess):2025-07-15-11:28:06.224.34 [mindspore/parallel/cluster/process_entity/_api.py:369] There's worker exits with exception, kill all other workers.
End test case execution due to test case run timeout! Max: 776s
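The failure above is raised from `aclrtSetDevice` inside `mindspore.communication.init()` on the worker with RANK_ID=1, while the launcher had exported one RANK_ID per worker. A small pre-flight check along the following lines can separate "device id out of range" from other `aclrtSetDevice` failures (such as a device still held by a crashed process). This is a minimal sketch, not part of the test suite: the `resolve_and_check_device_id` helper and the precedence of `DEVICE_ID` over `RANK_ID` are assumptions for illustration, not MindSpore's exact resolution logic.

```python
def resolve_and_check_device_id(device_count, env):
    """Resolve the device id a worker would bind to and verify it is in
    [0, device_count). Hypothetical helper; DEVICE_ID falling back to
    RANK_ID mirrors the launcher's exports in the log above, but is an
    assumption, not MindSpore's documented resolution order."""
    raw = env.get("DEVICE_ID", env.get("RANK_ID", "0"))
    device_id = int(raw)
    if not 0 <= device_id < device_count:
        # An out-of-range id would be a genuine "device id is valid" problem.
        raise ValueError(
            "device id [%d] out of range for device count [%d]"
            % (device_id, device_count))
    return device_id

# In the log, device count is 8 and the failing worker has RANK_ID=1,
# so the id itself is in range; a 507033 return from aclrtSetDevice then
# points at device state rather than an invalid id.
print(resolve_and_check_device_id(8, {"RANK_ID": "1"}))
```

Since id 1 passes this check, the error code 507033 here is more plausibly a device-state issue (e.g. the NPU not yet released by a previous run) than an actual invalid id, which is consistent with this being a resumption/rebuild test.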