After upgrading to 1.7.0.post1, loading an LLM onto an H100 with sglang fails with "undefined symbol" (#3657)

### System Info

Server error: 500 - [address=0.0.0.0:42651, pid=775] /usr/local/lib/python3.10/dist-packages/sgl_kernel/common_ops.abi3.so: undefined symbol: _ZN3c108ListType3getERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4Type24SingletonOrSharedTypePtrIS9_EE

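The previous version, 1.6.1, worked without problems; the error appeared only after the upgrade. Other embedding models and rerank models still load fine. Could this be an sglang problem?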
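The missing symbol lives in the `c10` namespace, which is PyTorch's core library, so one plausible cause is that `sgl_kernel` was built against a different PyTorch than the 2.6.0 the log reports. That is a guess, not a confirmed diagnosis. A minimal sketch for listing the relevant package versions inside the container follows; the distribution names `sglang` and `sgl-kernel` are assumed to match what the image installs.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names are assumptions; adjust them to what the image actually installs.
PACKAGES = ("torch", "sglang", "sgl-kernel", "xinference")


def installed_versions(names=PACKAGES):
    """Return {distribution name: version string, or None if not installed}."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Comparing this output between the 1.6.1 and 1.7.0.post1 images might show whether the torch / sgl-kernel pairing changed.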

Detailed log:
2025-06-17 04:43:04,220 transformers.utils.import_utils 775 DEBUG    Detected torch version: 2.6.0
Detected torch version: 2.6.0
INFO 06-17 04:43:05 [__init__.py:239] Automatically detected platform cuda.
WARNING 06-17 04:43:05 [cuda.py:409] Detected different devices in the system: NVIDIA H100, NVIDIA GeForce RTX 3090, NVIDIA GeForce RTX 3090. Please make sure to set `CUDA_DEVICE_ORDER=PCI_BUS_ID` to avoid unexpected behavior.
2025-06-17 04:43:07,748 transformers.utils.import_utils 775 DEBUG    Detected flash_attn version: 2.7.4.post1
Detected flash_attn version: 2.7.4.post1
2025-06-17 04:43:07,767 transformers.utils.import_utils 775 DEBUG    Detected flash_attn version: 2.7.4.post1
Detected flash_attn version: 2.7.4.post1
2025-06-17 04:43:07,768 transformers.utils.import_utils 775 DEBUG    Detected flash_attn version: 2.7.4.post1
Detected flash_attn version: 2.7.4.post1
2025-06-17 04:43:07,769 transformers.utils.import_utils 775 DEBUG    Detected flash_attn version: 2.7.4.post1
Detected flash_attn version: 2.7.4.post1
2025-06-17 04:43:07,777 transformers.utils.import_utils 775 DEBUG    Detected torch version: 2.6.0
Detected torch version: 2.6.0
2025-06-17 04:43:07,831 xinference.core.model 775 DEBUG    Starting ModelActor at 0.0.0.0:42651, uid: b'qwen3-0'
2025-06-17 04:43:07,831 xinference.core.model 775 INFO     Start requests handler.
2025-06-17 04:43:07,857 xinference.model.llm.sglang.core 775 INFO     Loading qwen3 with following model config: {'chunked_prefill_size': -1, 'max_prefill_tokens': 128000, 'max_total_tokens': 400000, 'mem_fraction_static': 0.96, 'context_length': 131092, 'max_running_requests': 8, 'tokenizer_mode': 'auto', 'trust_remote_code': True, 'tp_size': 1, 'log_level': 'info', 'triton_attention_reduce_in_fp32': False}
2025-06-17 04:43:08,382 xinference.core.worker 110 ERROR    Failed to load model qwen3-0
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/worker.py", line 1113, in launch_builtin_model
    await model_ref.load()
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 262, in send
    return self._process_result_message(result)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 111, in _process_result_message
    raise message.as_instanceof_cause()
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 689, in send
    result = await self._run_coro(message.message_id, coro)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 389, in _run_coro
    return await coro
  File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 418, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 564, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 527, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 532, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/opt/inference/xinference/core/model.py", line 477, in load
    await asyncio.to_thread(self._model.load)
  File "/usr/lib/python3.10/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/inference/xinference/model/llm/sglang/core.py", line 239, in load
    self._engine = sgl.Runtime(
  File "/usr/local/lib/python3.10/dist-packages/sglang/api.py", line 38, in Runtime
    return Runtime(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sglang/lang/backend/runtime_endpoint.py", line 374, in __init__
    from sglang.srt.entrypoints.http_server import launch_server
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/entrypoints/http_server.py", line 49, in <module>
    from sglang.srt.entrypoints.engine import _launch_subprocesses
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/entrypoints/engine.py", line 42, in <module>
    from sglang.srt.managers.data_parallel_controller import (
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/data_parallel_controller.py", line 28, in <module>
    from sglang.srt.managers.io_struct import (
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/io_struct.py", line 33, in <module>
    from sglang.srt.managers.schedule_batch import BaseFinishReason
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/schedule_batch.py", line 49, in <module>
    from sglang.srt.configs.model_config import ModelConfig
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/configs/model_config.py", line 30, in <module>
    from sglang.srt.layers.quantization import QUANTIZATION_METHODS
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/layers/quantization/__init__.py", line 53, in <module>
    from sglang.srt.layers.quantization.awq import AWQConfig
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/layers/quantization/awq.py", line 18, in <module>
    from sgl_kernel import awq_dequantize
  File "/usr/local/lib/python3.10/dist-packages/sgl_kernel/__init__.py", line 13, in <module>
    from sgl_kernel import common_ops
ImportError: [address=0.0.0.0:42651, pid=775] /usr/local/lib/python3.10/dist-packages/sgl_kernel/common_ops.abi3.so: undefined symbol: _ZN3c108ListType3getERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4Type24SingletonOrSharedTypePtrIS9_EE
2025-06-17 04:43:08,383 xinference.core.progress_tracker 110 DEBUG    Setting progress, request id: launching-qwen3-0, progress: 1.0
2025-06-17 04:43:08,477 xinference.core.worker 110 ERROR    [request 3992df38-4b70-11f0-9ecb-9668a61d5466] Leave launch_builtin_model, error: [address=0.0.0.0:42651, pid=775] /usr/local/lib/python3.10/dist-packages/sgl_kernel/common_ops.abi3.so: undefined symbol: _ZN3c108ListType3getERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4Type24SingletonOrSharedTypePtrIS9_EE, elapsed time: 6 s
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/utils.py", line 93, in wrapped
    ret = await func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/worker.py", line 1113, in launch_builtin_model
    await model_ref.load()
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 262, in send
    return self._process_result_message(result)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 111, in _process_result_message
    raise message.as_instanceof_cause()
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 689, in send
    result = await self._run_coro(message.message_id, coro)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 389, in _run_coro
    return await coro
  File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 418, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 564, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 527, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 532, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/opt/inference/xinference/core/model.py", line 477, in load
    await asyncio.to_thread(self._model.load)
  File "/usr/lib/python3.10/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/inference/xinference/model/llm/sglang/core.py", line 239, in load
    self._engine = sgl.Runtime(
  File "/usr/local/lib/python3.10/dist-packages/sglang/api.py", line 38, in Runtime
    return Runtime(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sglang/lang/backend/runtime_endpoint.py", line 374, in __init__
    from sglang.srt.entrypoints.http_server import launch_server
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/entrypoints/http_server.py", line 49, in <module>
    from sglang.srt.entrypoints.engine import _launch_subprocesses
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/entrypoints/engine.py", line 42, in <module>
    from sglang.srt.managers.data_parallel_controller import (
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/data_parallel_controller.py", line 28, in <module>
    from sglang.srt.managers.io_struct import (
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/io_struct.py", line 33, in <module>
    from sglang.srt.managers.schedule_batch import BaseFinishReason
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/schedule_batch.py", line 49, in <module>
    from sglang.srt.configs.model_config import ModelConfig
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/configs/model_config.py", line 30, in <module>
    from sglang.srt.layers.quantization import QUANTIZATION_METHODS
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/layers/quantization/__init__.py", line 53, in <module>
    from sglang.srt.layers.quantization.awq import AWQConfig
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/layers/quantization/awq.py", line 18, in <module>
    from sgl_kernel import awq_dequantize
  File "/usr/local/lib/python3.10/dist-packages/sgl_kernel/__init__.py", line 13, in <module>
    from sgl_kernel import common_ops
ImportError: [address=0.0.0.0:42651, pid=775] /usr/local/lib/python3.10/dist-packages/sgl_kernel/common_ops.abi3.so: undefined symbol: _ZN3c108ListType3getERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4Type24SingletonOrSharedTypePtrIS9_EE
2025-06-17 04:43:08,478 xinference.core.supervisor 110 DEBUG    [request 3d442524-4b70-11f0-9ecb-9668a61d5466] Enter terminate_model, args: <xinference.core.supervisor.SupervisorActor object at 0x7a1c9518ddf0>,qwen3, kwargs: suppress_exception=True
2025-06-17 04:43:08,478 xinference.core.supervisor 110 DEBUG    [request 3d442524-4b70-11f0-9ecb-9668a61d5466] Leave terminate_model, elapsed time: 0 s
2025-06-17 04:43:08,480 xinference.api.restful_api 1 ERROR    [address=0.0.0.0:42651, pid=775] /usr/local/lib/python3.10/dist-packages/sgl_kernel/common_ops.abi3.so: undefined symbol: _ZN3c108ListType3getERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4Type24SingletonOrSharedTypePtrIS9_EE
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/xinference/api/restful_api.py", line 1077, in launch_model
    model_uid = await (await self._get_supervisor_ref()).launch_builtin_model(
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 262, in send
    return self._process_result_message(result)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 111, in _process_result_message
    raise message.as_instanceof_cause()
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 689, in send
    result = await self._run_coro(message.message_id, coro)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 389, in _run_coro
    return await coro
  File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 418, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 564, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 527, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 532, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/supervisor.py", line 1190, in launch_builtin_model
    await _launch_model()
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/supervisor.py", line 1125, in _launch_model
    subpool_address = await _launch_one_model(
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/supervisor.py", line 1079, in _launch_one_model
    subpool_address = await worker_ref.launch_builtin_model(
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 262, in send
    return self._process_result_message(result)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 111, in _process_result_message
    raise message.as_instanceof_cause()
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 689, in send
    result = await self._run_coro(message.message_id, coro)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 389, in _run_coro
    return await coro
  File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 418, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 564, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 527, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 532, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/utils.py", line 93, in wrapped
    ret = await func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/worker.py", line 1113, in launch_builtin_model
    await model_ref.load()
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 262, in send
    return self._process_result_message(result)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 111, in _process_result_message
    raise message.as_instanceof_cause()
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 689, in send
    result = await self._run_coro(message.message_id, coro)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 389, in _run_coro
    return await coro
  File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 418, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 564, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 527, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 532, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/opt/inference/xinference/core/model.py", line 477, in load
    await asyncio.to_thread(self._model.load)
  File "/usr/lib/python3.10/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/inference/xinference/model/llm/sglang/core.py", line 239, in load
    self._engine = sgl.Runtime(
  File "/usr/local/lib/python3.10/dist-packages/sglang/api.py", line 38, in Runtime
    return Runtime(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sglang/lang/backend/runtime_endpoint.py", line 374, in __init__
    from sglang.srt.entrypoints.http_server import launch_server
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/entrypoints/http_server.py", line 49, in <module>
    from sglang.srt.entrypoints.engine import _launch_subprocesses
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/entrypoints/engine.py", line 42, in <module>
    from sglang.srt.managers.data_parallel_controller import (
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/data_parallel_controller.py", line 28, in <module>
    from sglang.srt.managers.io_struct import (
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/io_struct.py", line 33, in <module>
    from sglang.srt.managers.schedule_batch import BaseFinishReason
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/schedule_batch.py", line 49, in <module>
    from sglang.srt.configs.model_config import ModelConfig
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/configs/model_config.py", line 30, in <module>
    from sglang.srt.layers.quantization import QUANTIZATION_METHODS
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/layers/quantization/__init__.py", line 53, in <module>
    from sglang.srt.layers.quantization.awq import AWQConfig
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/layers/quantization/awq.py", line 18, in <module>
    from sgl_kernel import awq_dequantize
  File "/usr/local/lib/python3.10/dist-packages/sgl_kernel/__init__.py", line 13, in <module>
    from sgl_kernel import common_ops
ImportError: [address=0.0.0.0:42651, pid=775] /usr/local/lib/python3.10/dist-packages/sgl_kernel/common_ops.abi3.so: undefined symbol: _ZN3c108ListType3getERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4Type24SingletonOrSharedTypePtrIS9_EE


### Running Xinference with Docker?

- [x] docker
- [ ] pip install
- [ ] installation from source

### Version info

1.7.0.post1

### The command used to start Xinference

docker run -d \
  --name xinference \
  --restart=always \
  --runtime=nvidia \
  --gpus=all \
  -p 9997:9997 \
  -v /home/ubuntu/xinference_data/models:/data/models \
  -v /home/ubuntu/xinference_data/cache/huggingface:/root/.cache/huggingface \
  -v /home/ubuntu/xinference_data/cache/modelscope:/root/.cache/modelscope \
  xinference:1.7.0.post1


### Reproduction

1. Select Qwen3-30B
2. sglang
3. pytorch
4. No quantization

The error occurs at the model loading stage.
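According to the traceback, the failure happens at `from sgl_kernel import common_ops`, before any model weights are touched. So the problem can probably be reproduced without the UI by importing the module directly inside the container. A small sketch, where `try_import` is a hypothetical helper written for illustration and not part of Xinference or sglang:

```python
import importlib


def try_import(module_name):
    """Import a module by name; return None on success, or the error text on failure."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # an ImportError is expected here; report anything else too
        return f"{type(exc).__name__}: {exc}"
    return None


if __name__ == "__main__":
    # Module names taken from the traceback above.
    for name in ("torch", "sgl_kernel", "sglang"):
        error = try_import(name)
        print(f"{name}: {'ok' if error is None else error}")
```

If `sgl_kernel` fails here with the same `undefined symbol` message, the issue is independent of the chosen model and launch parameters.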

### Expected behavior

How can this problem be resolved?
