Bert2Bert concat axis dimension error at serving time #1757

Closed
@LoicDagnas

Description

Describe the bug
Converting a Bert2Bert model from the TensorFlow Model Garden (`official`), I get the following error at serving time:

Traceback (most recent call last):
  File "C:/dev/ml/QueryGenerator/query_generator/models/bert2bert/save_model.py", line 245, in <module>
    output = session.run(output_names=None, input_feed=input_feed)
  File "C:\dev\ml\QueryGenerator\venv\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 188, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Loop node. Name:'bert2_bert/while_loop' Status Message: Non-zero status code returned while running Concat node. Name:'bert2_bert/while/decoder/decoder/layer_0/self_attention/concat' Status Message: concat.cc:159 onnxruntime::ConcatBase::PrepareForCompute Non concat axis dimensions must match: Axis 0 has mismatched dimensions of 10 and 6

It is quite close to the error I had at the end of this issue, but:

  • the model is different
  • the minimal code is simpler
  • the error is different
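For context on what the runtime is complaining about: ONNX `Concat` requires all inputs to have identical sizes on every axis except the concatenation axis. Here, two tensors inside the `While` loop disagree on axis 0 (10 vs 6). A minimal NumPy sketch (shapes chosen only to mirror the 10-vs-6 mismatch; not the actual loop tensors) reproduces the analogous failure:

```python
import numpy as np

# Two tensors that agree on the concat axis (axis 2) but differ
# on a non-concat axis (axis 0): 10 vs 6, as in the error message.
a = np.zeros((10, 2, 4))
b = np.zeros((6, 2, 4))

try:
    np.concatenate([a, b], axis=2)
except ValueError as e:
    # NumPy rejects this for the same reason onnxruntime's Concat does:
    # all dimensions except the concatenation axis must match.
    print(e)
```
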

System information

  • OS Platform and Distribution: Windows 10.0.19042
  • TensorFlow version: 2.5.0
  • Python version: 3.7.6

To Reproduce
It is quite easy to reproduce using the following code:

import tensorflow as tf
import onnxruntime
import tf2onnx
from official.nlp.nhnet.configs import UNITTEST_CONFIG, BERT2BERTConfig
from official.nlp.nhnet.models import Bert2Bert, get_bert2bert_layers

MAX_SEQ_LENGTH = 10
MAX_OUTPUT_LENGTH = 4

# Create the Bert2Bert model
bert2bert_config_dict = UNITTEST_CONFIG.copy()
bert2bert_config_dict["max_position_embeddings"] = MAX_SEQ_LENGTH
bert2bert_config_dict["len_title"] = MAX_OUTPUT_LENGTH
bert2bert_config = BERT2BERTConfig.from_args(**bert2bert_config_dict)
bert_layer, decoder_layer = get_bert2bert_layers(params=bert2bert_config)

bert2bert = Bert2Bert(bert2bert_config, bert_layer, decoder_layer)

# Define the serving function
@tf.function()
def serve(inputs):
    return bert2bert(inputs=inputs, mode="predict")

# Convert the model to ONNX and save it
model_proto, _ = tf2onnx.convert.from_function(
    function=serve,
    opset=14,
    input_signature=[{
        'input_ids': tf.TensorSpec(shape=(None, MAX_SEQ_LENGTH), dtype=tf.int32, name='input_ids'),
        'input_mask': tf.TensorSpec(shape=(None, MAX_SEQ_LENGTH), dtype=tf.int32, name='input_mask'),
        'segment_ids': tf.TensorSpec(shape=(None, MAX_SEQ_LENGTH), dtype=tf.int32, name='segment_ids')
    }],
    output_path='model.onnx'
)

# Try to serve the model
input_ids = [101, 2023, 2633, 4504, 1999, 6094, 2008, 102, 0, 0]
sess_options = onnxruntime.SessionOptions()
sess_options.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_DISABLE_ALL
session = onnxruntime.InferenceSession('model.onnx',
                                       sess_options,
                                       providers=["CPUExecutionProvider"])

input_feed = {
    "input_ids": [input_ids],
    "input_mask": [[0 if i == 0 else 1 for i in input_ids]],
    "segment_ids": [[0 for _ in input_ids]]
}

output = session.run(output_names=None, input_feed=input_feed)
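As a side note on the input feed (not a fix for the crash): `InferenceSession.run` also accepts NumPy arrays, and building the feed as explicit `int32` arrays matches the declared `tf.int32` signature exactly. A sketch of the same inputs as above, assuming the same input names:

```python
import numpy as np

# Same tokenized sentence as in the repro, batch of 1, shape (1, 10).
input_ids = np.array([[101, 2023, 2633, 4504, 1999, 6094, 2008, 102, 0, 0]],
                     dtype=np.int32)

input_feed = {
    "input_ids": input_ids,
    # 1 for real tokens, 0 for padding (id 0), same logic as the
    # list comprehension in the repro code.
    "input_mask": (input_ids != 0).astype(np.int32),
    # Single-segment input: all zeros.
    "segment_ids": np.zeros_like(input_ids),
}
```
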
