
Conversation

@bertsky (Collaborator) commented Jul 23, 2021

Implements #263

@bertsky (Collaborator, Author) commented Jul 23, 2021

Too bad: This currently yields CUDA_ERROR_SYSTEM_DRIVER_MISMATCH in Tensorflow. Should have checked earlier (in core-cuda)...

@bertsky (Collaborator, Author) commented Jul 23, 2021

> Too bad: This currently yields CUDA_ERROR_SYSTEM_DRIVER_MISMATCH in Tensorflow. Should have checked earlier (in core-cuda)...

It seems that choosing nvidia/cuda:11.3.1-cudnn8-runtime-ubuntu18.04 as the base image now requires at least nvidia-driver-470 on the host system. I have 440 and 465 on the systems available to me, and neither of them can run the image. But that means we are making a sacrifice here: to be able to support the newest Tensorflow/CUDA as well, we are forcing all host systems to get a newer driver. (It may well be that upgrading the driver is easier than upgrading CUDA, but it's still quite inconvenient.)
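
For reference, a quick way to check which driver version the host actually has (just a sketch, assuming the NVIDIA user-space tools are installed):

# query the driver version via nvidia-smi ...
nvidia-smi --query-gpu=driver_version --format=csv,noheader
# ... or read it straight from the loaded kernel module
cat /proc/driver/nvidia/version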

@bertsky (Collaborator, Author) commented Jul 27, 2021

If you have the Nvidia repository configured as an APT source, you can simply update to cuda-drivers-470, which will take care of all dependencies. (But a fresh installation might work, too.)
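
For illustration, on Ubuntu the upgrade roughly looks like this (a sketch, assuming the NVIDIA CUDA network repository is already configured as an APT source):

# pull the 470-series driver metapackage and its dependencies
sudo apt-get update
sudo apt-get install cuda-drivers-470
# the new kernel module only takes effect after a reboot (or module reload)
sudo reboot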

Anyway, this does work (based on a locally built ocrd/core-cuda from OCR-D/core#704).

# check GPU visibility in each of the Tensorflow venvs inside the image
for venv in /usr/local/sub-venv/headless-tf*; do
    . $venv/bin/activate && python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"
done

– yields True 3x
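
(Side note: tf.test.is_gpu_available() is deprecated in Tensorflow 2.x; an equivalent check there, assuming the same venv layout, would be:)

for venv in /usr/local/sub-venv/headless-tf*; do
    . $venv/bin/activate && python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
done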

@bertsky (Collaborator, Author) commented Jan 17, 2022

> Conflicting files: core

How is one supposed to keep PRs that involve subrepos alive, then? I guess I'll have to update OCR-D/core#704 each time core master changes, and then in turn update here.
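
For the record, that update cycle boils down to something like this (a sketch, assuming core is a git submodule here; the branch name is hypothetical):

# after rebasing the core PR branch onto current core master:
git -C core fetch origin
git -C core checkout fix-cuda   # hypothetical name of the OCR-D/core#704 branch
git add core
git commit -m "core: update submodule to rebased PR branch"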

@bertsky (Collaborator, Author) commented Feb 4, 2022

So to sum up, we have two drawbacks here:

  • the base image size for the -cuda variants becomes even larger (for ocrd:core-cuda it's already 12 GB)
  • the host system needs a recent kernel driver to run the images (even for older CUDA)

But

  • considering what we gain here,
  • and how urgent this is (with detectron2 vs CUDA dependency bertsky/ocrd_detectron2#7 now even blocking our ocrd/all:maximum-cuda build),
  • and that this can probably go away as soon as we build thin images,
  • and that non-Docker and non-CUDA-Docker usage is not even affected,
  • and that it also provides a solution for native installations (i.e. running make cuda-ubuntu or merely make cuda-ldconfig as fixup),
  • and that dragging along these PRs with other changes (esp. if you want to combine them with other branches) is a lot of effort,

I'd say let's merge!

@kba merged commit 2b41f68 into OCR-D:master on Feb 4, 2022