TensorBoard callback without profile_batch setting cause Errors: CUPTI_ERROR_INSUFFICIENT_PRIVILEGES and CUPTI_ERROR_INVALID_PARAMETER

<em>Please make sure that this is a bug. As per our [GitHub Policy](https://github.com/tensorflow/tensorflow/blob/master/ISSUES.md), we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:bug_template</em>

**System information**
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Stateless LSTM from Keras tutorial using tf backend
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
- TensorFlow installed from (source or binary): binary
- TensorFlow version (use command below): 2.1.0
- Python version: 3.7.4
- CUDA/cuDNN version: 10.1
- GPU model and memory: MX150 10GB

**Describe the current behavior**
When using tf.keras.callbacks.TensorBoard() without the profile_batch setting, it gives out errors of CUPTI_ERROR_INSUFFICIENT_PRIVILEGES and CUPTI_ERROR_INVALID_PARAMETER from tensorflow/core/profiler/internal/gpu/cupti_tracer.cc.

**Describe the expected behavior**
With profile_batch = 0, these two errors are gone. 
But comes back when profile_batch = 1, or other non-zero values.

**Code to reproduce the issue**
```

from __future__ import print_function
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM


input_len = 1000
tsteps = 2
lahead = 1
batch_size = 1
epochs = 5

print("*" * 33)
if lahead >= tsteps:
    print("STATELESS LSTM WILL ALSO CONVERGE")
else:
    print("STATELESS LSTM WILL NOT CONVERGE")
print("*" * 33)

np.random.seed(1986)

print('Generating Data...')


def gen_uniform_amp(amp=1, xn=10000):

    data_input = np.random.uniform(-1 * amp, +1 * amp, xn)
    data_input = pd.DataFrame(data_input)
    return data_input


to_drop = max(tsteps - 1, lahead - 1)
data_input = gen_uniform_amp(amp=0.1, xn=input_len + to_drop)

expected_output = data_input.rolling(window=tsteps, center=False).mean()

if lahead > 1:
    data_input = np.repeat(data_input.values, repeats=lahead, axis=1)
    data_input = pd.DataFrame(data_input)
    for i, c in enumerate(data_input.columns):
        data_input[c] = data_input[c].shift(i)

expected_output = expected_output[to_drop:]
data_input = data_input[to_drop:]


def create_model(stateful):
    model = Sequential()
    model.add(LSTM(20,
              input_shape=(lahead, 1),
              batch_size=batch_size,
              stateful=stateful))
    model.add(Dense(1))
    model.compile(loss='mse', optimizer='adam')
    return model

print('Creating Stateful Model...')
model_stateful = create_model(stateful=True)


def split_data(x, y, ratio=0.8):
    to_train = int(input_len * ratio)
    to_train -= to_train % batch_size
    x_train = x[:to_train]
    y_train = y[:to_train]
    x_test = x[to_train:]
    y_test = y[to_train:]

    # tweak to match with batch_size
    to_drop = x.shape[0] % batch_size
    if to_drop > 0:
        x_test = x_test[:-1 * to_drop]
        y_test = y_test[:-1 * to_drop]

    # some reshaping
    reshape_3 = lambda x: x.values.reshape((x.shape[0], x.shape[1], 1))
    x_train = reshape_3(x_train)
    x_test = reshape_3(x_test)

    reshape_2 = lambda x: x.values.reshape((x.shape[0], 1))
    y_train = reshape_2(y_train)
    y_test = reshape_2(y_test)

    return (x_train, y_train), (x_test, y_test)


(x_train, y_train), (x_test, y_test) = split_data(data_input, expected_output)
print('x_train.shape: ', x_train.shape)
print('y_train.shape: ', y_train.shape)
print('x_test.shape: ', x_test.shape)
print('y_test.shape: ', y_test.shape)

print('Creating Stateless Model...')
model_stateless = create_model(stateful=False)

import os
import datetime
ROOT_DIR = os.getcwd()
log_dir = os.path.join('callback_tests')
if not os.path.exists(log_dir):
    os.makedirs(log_dir)
print(log_dir)
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir)
                                       
print('Training')
history = model_stateless.fit(x_train,
                    y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_test, y_test),
                    shuffle=False,
                    callbacks=[tensorboard_callback]
                    )


```

**Other info / logs**
Train on 800 samples, validate on 200 samples
2020-01-14 21:30:27.591905: I tensorflow/core/profiler/lib/profiler_session.cc:225] Profiler session started.
2020-01-14 21:30:27.594743: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1259] Profiler found 1 GPUs
2020-01-14 21:30:27.599172: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cupti64_101.dll
2020-01-14 21:30:27.704083: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1307] function cupti_interface_->Subscribe( &subscriber_, (CUpti_CallbackFunc)ApiCallback, this)failed with error CUPTI_ERROR_INSUFFICIENT_PRIVILEGES
2020-01-14 21:30:27.716790: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1346] function cupti_interface_->ActivityRegisterCallbacks( AllocCuptiActivityBuffer, FreeCuptiActivityBuffer)failed with error CUPTI_ERROR_INSUFFICIENT_PRIVILEGES
Epoch 1/5
2020-01-14 21:30:28.370429: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-01-14 21:30:28.651767: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-01-14 21:30:29.662864: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1329] function cupti_interface_->EnableCallback( 0 , subscriber_, CUPTI_CB_DOMAIN_DRIVER_API, cbid)failed with error CUPTI_ERROR_INVALID_PARAMETER
2020-01-14 21:30:29.670282: I tensorflow/core/profiler/internal/gpu/device_tracer.cc:88]  GpuTracer has collected 0 callback api events and 0 activity events.
800/800 [==============================] - 5s 6ms/sample - loss: 0.0011 - val_loss: 0.0011
Epoch 2/5
800/800 [==============================] - 3s 4ms/sample - loss: 8.5921e-04 - val_loss: 0.0010
Epoch 3/5
800/800 [==============================] - 3s 3ms/sample - loss: 8.5613e-04 - val_loss: 0.0010
Epoch 4/5
800/800 [==============================] - 3s 4ms/sample - loss: 8.5458e-04 - val_loss: 9.9713e-04
Epoch 5/5
800/800 [==============================] - 3s 4ms/sample - loss: 8.5345e-04 - val_loss: 9.8825e-04


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

TensorBoard callback without profile_batch setting cause Errors: CUPTI_ERROR_INSUFFICIENT_PRIVILEGES and CUPTI_ERROR_INVALID_PARAMETER #35860

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

TensorBoard callback without profile_batch setting cause Errors: CUPTI_ERROR_INSUFFICIENT_PRIVILEGES and CUPTI_ERROR_INVALID_PARAMETER #35860

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions