Optimized batch processing for Bert sentence and word embeddings #6462
This is an optimization of batch processing in the BertSentenceEmbeddings and BertEmbeddings models which improves performance on single-machine setups where there is a single sentence, or only a few sentences, per row.
Description
The HasBatchedAnnotate trait lets models process several rows at once. The idea is to batch annotations across rows, which can significantly speed up processing, especially on GPU hardware. However, the current implementation of the batchAnnotate method in BertSentenceEmbeddings and BertEmbeddings doesn't actually allow this: it processes rows one by one and can therefore batch only the annotations within a single row. This PR solves the problem by adding all annotations passed to batchAnnotate, regardless of which row they belong to, to a single collection, and then passing this collection to TensorflowBert to be processed with the given batch size.
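For reference, this is roughly the shape of the trait (simplified here; the real trait is generic over the model type):

```scala
import com.johnsnowlabs.nlp.Annotation

// Simplified shape of HasBatchedAnnotate: each element of the outer Seq holds the
// annotations of one row, and implementations return one Seq of results per row.
trait HasBatchedAnnotate {
  def batchAnnotate(batchedAnnotations: Seq[Array[Annotation]]): Seq[Seq[Annotation]]
}
```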
The change leads to a significant performance gain on a single-machine setup when there is one annotation, or only a few annotations, per row. Below are the results of the BertEmbeddings benchmark test on the conll2004 dataset:
The test is run on a single machine with a GPU. GPU performance is significantly improved when sentences are exploded (i.e. a single sentence per row).
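For illustration, one way to produce this exploded setup (a hypothetical snippet; `detected` is assumed to be a DataFrame with a SentenceDetector output column named "sentence"):

```scala
import org.apache.spark.sql.functions.{col, explode}

// Pull the raw sentence strings out of the "sentence" annotation column; after the
// explode, each row carries exactly one sentence, which can be fed back through the
// document assembler and embeddings pipeline as a new "text" column.
val oneSentencePerRow = detected.select(explode(col("sentence.result")).as("text"))
```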
Additional tests revealed that the changes have little to no impact on performance in a cluster setup with multiple GPUs. Apparently such a Spark setup is capable of keeping the GPUs utilized even when each call to batchAnnotate feeds the input tensors one by one.
I've re-implemented the batchAnnotate method of BertSentenceEmbeddings and BertEmbeddings. Instead of processing the annotations separately for each row, I put all the annotations into a single collection and preserve the allocation of annotations to rows by zipping each annotation with its row index. All annotations are then tokenized and passed to the TF graph to compute their (sentence or word) embeddings. At the end, the embeddings are distributed back across the rows.
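A minimal sketch of the reworked flow (not the exact PR code; computeEmbeddings is a hypothetical stand-in for the tokenization and TensorflowBert calls):

```scala
import com.johnsnowlabs.nlp.Annotation

// Stand-in for tokenizing the flattened annotations and running them through
// TensorflowBert with the configured batch size; each resulting embedding
// annotation stays tagged with the row index it came from.
def computeEmbeddings(indexed: Seq[(Int, Annotation)]): Seq[(Int, Annotation)] = ???

def batchAnnotate(batchedAnnotations: Seq[Array[Annotation]]): Seq[Seq[Annotation]] = {
  // 1. Flatten all rows into a single collection, zipping each annotation with
  //    its row index so the per-row allocation can be restored afterwards.
  val indexed: Seq[(Int, Annotation)] =
    batchedAnnotations.zipWithIndex.flatMap { case (row, rowIdx) =>
      row.map(annotation => (rowIdx, annotation))
    }

  // 2. Tokenize and embed everything together, batched by the configured batch size.
  val embedded = computeEmbeddings(indexed)

  // 3. Distribute the embeddings back to their original rows; rows that had no
  //    annotations come back empty.
  val byRow = embedded.groupBy(_._1)
  batchedAnnotations.indices.map { rowIdx =>
    byRow.getOrElse(rowIdx, Seq.empty).map(_._2)
  }
}
```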
Motivation and Context
It lets BertSentenceEmbeddings and BertEmbeddings process sentences in batches when there are only a few annotations per row. In such cases, the change significantly improves performance on single-machine setups.
How Has This Been Tested?
I've made sure the results of the improved batch processing are identical to those produced by the current implementation. This test uses different JAR files, so it is not included in the current commit.
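Since that cross-JAR comparison can't live in the repo, here is an illustrative, self-contained check of a related invariant, namely that embeddings shouldn't depend on the batch size they were computed with (the model, data, and tolerance are my choices for the sketch, not the PR's test):

```scala
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.embeddings.BertEmbeddings
import org.apache.spark.ml.Pipeline
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val data = Seq("John lives in New York.", "Peter works in Berlin.").toDF("text")

// Run the same pipeline with a given batch size and collect one vector per token.
def embed(batchSize: Int): Array[Seq[Float]] = {
  val assembler = new DocumentAssembler().setInputCol("text").setOutputCol("document")
  val tokenizer = new Tokenizer().setInputCols("document").setOutputCol("token")
  val bert = BertEmbeddings.pretrained()
    .setInputCols("document", "token")
    .setOutputCol("embeddings")
    .setBatchSize(batchSize)
  new Pipeline().setStages(Array(assembler, tokenizer, bert))
    .fit(data).transform(data)
    .selectExpr("explode(embeddings.embeddings) AS vector")
    .as[Seq[Float]].collect()
}

val oneByOne = embed(1)  // effectively one sentence per TF call
val batched  = embed(32) // several sentences per TF call
val maxDiff = oneByOne.zip(batched).map { case (a, b) =>
  a.zip(b).map { case (x, y) => math.abs(x - y) }.max
}.max
assert(maxDiff < 1e-5f, s"embeddings diverged: max abs diff = $maxDiff")
```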
I've also tested performance in order to demonstrate that the changes indeed improve it (see the table above).
Tests were run on a local Spark NLP installation.
The changes don't have any impact on functionality and shouldn't affect or break any existing code.
Screenshots (if appropriate):
Types of changes
It is actually neither of the above; we need one more option: 'code optimization with no functional implications but a significant impact on performance'.
Checklist: