Conversation

vankov (Contributor) commented Nov 13, 2021

This is an optimization of batch processing in the BertSentenceEmbeddings and BertEmbeddings models which improves performance on single-machine setups where there are few (or only one) sentences per row.

Description

The HasBatchedAnnotate trait lets models process several rows at once. The idea is to enable batching annotations across rows, which can significantly speed up processing, especially on GPU hardware. However, the current implementation of the batchAnnotate method in BertSentenceEmbeddings and BertEmbeddings doesn't actually allow this: it processes rows one by one and can therefore batch only the annotations within a single row (e.g., with one sentence per row, the effective batch size is always 1, no matter how large the batch size is set). This PR solves the problem by adding all annotations passed to batchAnnotate, regardless of which row they belong to, to a single collection, and then passing this collection to TensorflowBert to be processed with the given batch size.
The change leads to a significant performance gain on single-machine setups when there are only one or a few annotations per row. Below are the results of the BertEmbeddings benchmark test on the conll2004 dataset:

[Image: BertEmbeddings benchmark results on the conll2004 dataset]

The test was run on a single machine with a GPU. GPU performance is significantly improved when sentences are exploded (i.e., a single sentence per row).

Additional tests revealed that the changes have little to no impact on performance in a cluster setup with multiple GPUs. Apparently such a Spark setup is capable of keeping the GPUs utilized even when each call of batchAnnotate feeds the input tensors one row at a time.

I've re-implemented the batchAnnotate method of BertSentenceEmbeddings and BertEmbeddings. Instead of processing the annotations separately for each row, I put all the annotations in a single collection and make sure the allocation of annotations to rows is preserved by zipping each annotation with its row index. All annotations are then tokenized and passed to the TF graph to compute their (sentence or word) embeddings. At the end, the embeddings are distributed back across rows. A minimal sketch of this pattern is shown below.
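For illustration, here is a self-contained sketch of the flatten-embed-redistribute pattern described above. The `Annotation` case class and `computeEmbeddings` stub are simplified stand-ins for the real Spark NLP types and the TensorflowBert call, not the actual implementation:

```scala
object BatchAnnotateSketch {

  // Simplified stand-in for com.johnsnowlabs.nlp.Annotation.
  case class Annotation(result: String, embeddings: Array[Float] = Array.empty)

  // Stand-in for the TF graph call: embeds a flat list of annotations,
  // `batchSize` at a time, returning one result per input, in order.
  def computeEmbeddings(annotations: Seq[Annotation], batchSize: Int): Seq[Annotation] =
    annotations.grouped(batchSize).toSeq.flatMap { batch =>
      batch.map(a => a.copy(embeddings = Array.fill(4)(0f))) // dummy vectors
    }

  def batchAnnotate(rows: Seq[Seq[Annotation]]): Seq[Seq[Annotation]] = {
    // 1. Zip each annotation with its row index, then flatten across rows.
    val indexed = rows.zipWithIndex.flatMap { case (row, i) => row.map(a => (a, i)) }

    // 2. Embed the whole flat collection in batches of the configured size,
    //    regardless of row boundaries.
    val embedded = computeEmbeddings(indexed.map(_._1), batchSize = 8)

    // 3. Redistribute the results back to their original rows.
    val byRow = embedded.zip(indexed.map(_._2)).groupBy { case (_, i) => i }
    rows.indices.map(i => byRow.getOrElse(i, Seq.empty).map(_._1))
  }

  def main(args: Array[String]): Unit = {
    // Three rows with one sentence each: previously embedded one at a time
    // (effective batch size 1), now flattened into a single batch of three.
    val rows = Seq(Seq(Annotation("a")), Seq(Annotation("b")), Seq(Annotation("c")))
    val out = batchAnnotate(rows)
    assert(out.map(_.size) == rows.map(_.size)) // row allocation preserved
  }
}
```

Note that zipping with the row index before flattening is what guarantees the output can be grouped back so each row receives exactly the annotations it started with.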

Motivation and Context

It lets BertSentenceEmbeddings and BertEmbeddings process sentences in batches when there are only a few annotations per row. In such cases, the change significantly improves performance on single-machine setups.

How Has This Been Tested?

I've made sure the results of the improved batch processing are identical to those produced by the current implementation. This test compares runs against different JAR files, so it is not included in the current commit (a sketch of the idea follows below).
I've also run performance tests to demonstrate that the changes indeed improve performance (see table above).
Tests were run on a local Spark NLP installation.
The changes don't have any impact on functionality and shouldn't affect or break any existing code.
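For reference, a hypothetical sketch of what such an equivalence check could look like. The parquet file names and the `id` ordering column are illustrative assumptions; the actual test compared outputs produced by two different JARs and is not part of this commit:

```scala
import org.apache.spark.sql.SparkSession

object CompareEmbeddings {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("compare-embeddings")
      .getOrCreate()

    // Embeddings written out by the old and the patched implementation,
    // ordered by an assumed row key so rows line up pairwise.
    val before = spark.read.parquet("old_embeddings.parquet").orderBy("id").collect()
    val after  = spark.read.parquet("new_embeddings.parquet").orderBy("id").collect()

    assert(before.sameElements(after),
      "Old and new batch implementations produced different embeddings")
    spark.stop()
  }
}
```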

Screenshots (if appropriate):


Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • Code improvements with no or little impact
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

It is actually none of the above; we need one more option: "code optimization with no functional implications but a significant impact on performance".

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING page.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@vankov vankov requested a review from maziyarpanahi November 13, 2021 13:13
vankov (Contributor, Author) commented Nov 13, 2021

I also checked the other models which override batchAnnotate, and it seems that they all have the same problem. So the following models should also be updated if this optimization is accepted:

XxxForTokenClassification models (com.johnsnowlabs.nlp.annotators.classifier.dl)
XxxEmbeddings models (com.johnsnowlabs.nlp.annotators.embeddings)
MarianTransformer (com.johnsnowlabs.nlp.annotators.seq2seq)

NerDL seems to be the only model which currently allows batching across rows, but I need to double-check this.

@maziyarpanahi maziyarpanahi self-assigned this Nov 13, 2021
@maziyarpanahi maziyarpanahi added DON'T MERGE Do not merge this PR enhancement on-hold cannot be merged right away labels Nov 13, 2021
@maziyarpanahi maziyarpanahi changed the base branch from master to release/333-release-candidate November 17, 2021 10:15
@maziyarpanahi maziyarpanahi merged commit 7936487 into release/333-release-candidate Nov 17, 2021
@KshitizGIT KshitizGIT deleted the optimize_bert_batch_processing branch March 2, 2023 11:02