
Both max_new_tokens (=512) and max_length(=518) seem to have been set. max_new_tokens will take precedence. Please refer to the documentation for more information.  #1872

@fpy10

Description


System Info / 系統信息

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:36:15_Pacific_Daylight_Time_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0

torch 2.3.1+cu121
torchaudio 2.3.1+cu121
torchvision 0.18.1
vector-quantize-pytorch 1.15.3

Running Xinference with Docker? / 是否使用 Docker 运行 Xinference?

  • docker / docker
  • pip install / 通过 pip install 安装
  • installation from source / 从源码安装

Version info / 版本信息

pip install "xinference[transformers]"

The command used to start Xinference / 用以启动 xinference 的命令

xinference-local

Reproduction / 复现过程

After the first exchange with GLM4, the model errors out and the conversation cannot continue.
--- Logging error ---
Traceback (most recent call last):
File "C:\Users\87952\miniconda3\envs\xinference\lib\logging\handlers.py", line 73, in emit
if self.shouldRollover(record):
File "C:\Users\87952\miniconda3\envs\xinference\lib\logging\handlers.py", line 196, in shouldRollover
msg = "%s\n" % self.format(record)
File "C:\Users\87952\miniconda3\envs\xinference\lib\logging\__init__.py", line 943, in format
return fmt.format(record)
File "C:\Users\87952\miniconda3\envs\xinference\lib\logging\__init__.py", line 678, in format
record.message = record.getMessage()
File "C:\Users\87952\miniconda3\envs\xinference\lib\logging\__init__.py", line 368, in getMessage
msg = msg % self.args
TypeError: not all arguments converted during string formatting
Call stack:
File "C:\Users\87952\miniconda3\envs\xinference\lib\threading.py", line 973, in _bootstrap
self._bootstrap_inner()
File "C:\Users\87952\miniconda3\envs\xinference\lib\threading.py", line 1016, in _bootstrap_inner
self.run()
File "C:\Users\87952\miniconda3\envs\xinference\lib\threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\87952\miniconda3\envs\xinference\lib\concurrent\futures\thread.py", line 83, in _worker
work_item.run()
File "C:\Users\87952\miniconda3\envs\xinference\lib\concurrent\futures\thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "C:\Users\87952\miniconda3\envs\xinference\lib\site-packages\xoscar\api.py", line 402, in _wrapper
return next(_gen)
File "C:\Users\87952\miniconda3\envs\xinference\lib\site-packages\xinference\core\model.py", line 318, in _to_json_generator
for v in gen:
File "C:\Users\87952\miniconda3\envs\xinference\lib\site-packages\xinference\model\llm\utils.py", line 558, in _to_chat_completion_chunks
for i, chunk in enumerate(chunks):
File "C:\Users\87952\miniconda3\envs\xinference\lib\site-packages\xinference\model\llm\pytorch\chatglm.py", line 259, in _stream_generator
for chunk_text, _ in self._model.stream_chat(
File "C:\Users\87952\miniconda3\envs\xinference\lib\site-packages\torch\utils\_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "C:\Users\Administrator\.cache\huggingface\modules\transformers_modules\glm4-chat-pytorch-9b\modeling_chatglm.py", line 1139, in stream_chat
for outputs in self.stream_generate(**inputs, past_key_values=past_key_values,
File "C:\Users\87952\miniconda3\envs\xinference\lib\site-packages\torch\utils\_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "C:\Users\Administrator\.cache\huggingface\modules\transformers_modules\glm4-chat-pytorch-9b\modeling_chatglm.py", line 1188, in stream_generate
logger.warn(
Message: 'Both max_new_tokens (=512) and max_length(=518) seem to have been set. max_new_tokens will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)'
Arguments: (<class 'UserWarning'>,)
C:\Users\Administrator\.cache\huggingface\modules\transformers_modules\glm4-chat-pytorch-9b\modeling_chatglm.py:271: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
context_layer = torch.nn.functional.scaled_dot_product_attention(query_layer, key_layer, value_layer,
Traceback (most recent call last):
File "C:\Users\87952\miniconda3\envs\xinference\lib\asyncio\windows_events.py", line 444, in select
self._poll(timeout)
RuntimeError: <_overlapped.Overlapped object at 0x000002357017F900> still has pending operation at deallocation, the process may crash
(The traceback above repeats several more times, differing only in the `_overlapped.Overlapped` object address.)
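Note on the "--- Logging error ---" block: it is a secondary failure, not the crash itself. `modeling_chatglm.py` calls `logger.warn(message, UserWarning)`, so the `UserWarning` class is passed to the logging module as a %-format argument for a message that contains no placeholders, and `LogRecord.getMessage()` raises `TypeError: not all arguments converted during string formatting`. A minimal sketch of that mechanism (standalone, not Xinference code):

```python
import logging

# Build a LogRecord the way logger.warn("msg", UserWarning) would: the message
# has no %-placeholders, but args is non-empty.
record = logging.LogRecord(
    name="transformers_modules",
    level=logging.WARNING,
    pathname="modeling_chatglm.py",
    lineno=1188,
    msg="Both max_new_tokens (=512) and max_length(=518) seem to have been set.",
    args=(UserWarning,),  # the stray second positional argument
    exc_info=None,
)

try:
    # getMessage() does `msg % args`, which fails when there is nothing to format.
    record.getMessage()
except TypeError as exc:
    print(exc)  # not all arguments converted during string formatting
```

This is why the log shows `Arguments: (<class 'UserWarning'>,)` under the formatted message: the warning text itself is harmless, but the extra argument breaks the logging handler.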

Expected behavior / 期待表现

The model should run normally and the conversation should continue without crashing.
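For reference, the two limits in the warning message are mutually consistent: generation wrappers commonly derive `max_length` as the prompt length plus the requested `max_new_tokens` (6 + 512 = 518 here), so the warning itself is benign and `max_new_tokens` wins, as Transformers documents. A hypothetical sketch of that arithmetic (illustration only, not Xinference's or GLM4's actual code):

```python
def effective_limits(prompt_len: int, max_new_tokens: int) -> tuple[int, int]:
    """Illustrative helper: how a chat wrapper might derive max_length.

    max_new_tokens caps the generated continuation; max_length caps the
    total sequence (prompt + continuation). Setting both triggers the
    Transformers warning, and max_new_tokens takes precedence.
    """
    max_length = prompt_len + max_new_tokens
    return max_new_tokens, max_length


# With a 6-token prompt and the default max_new_tokens=512,
# this reproduces the (=512) and (=518) pair from the warning.
print(effective_limits(6, 512))  # (512, 518)
```

The actual crash appears to be the asyncio `Overlapped ... still has pending operation at deallocation` errors on Windows, so silencing the warning alone is unlikely to fix the hang.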
