I'm rereading the textbook Nonlinear Time Series Analysis by Kantz & Schreiber. In section 6.5 they describe a rather straightforward method for estimating a good value of the Theiler window. It is a method by Provenzale et al.:
Provenzale, A., Smith, L. A., Vio, R. & Murante, G. (1992). Distinguishing between low-dimensional dynamics and randomness in measured time series. Physica D, 58, 31. https://www.sciencedirect.com/science/article/abs/pii/0167278992901002
From Kantz's textbook:
The idea is that in the presence of temporal correlations the probability that a given pair of points has a distance smaller than ε does not only depend on ε but also on the time that has elapsed between the two measurements. This dependence can be detected by plotting the number of pairs as a function of two variables, the time separation Δt and the spatial distance ε. Increase versus Δt is very rapid at the start, but quickly saturates as Δt increases.
I think this method is easy to implement: a threshold on the saturation of the increase versus Δt, exposed as a keyword argument, can be used to choose the Theiler window w.
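To make the proposal concrete, here is a minimal Python sketch of how this could look. It is not taken from the textbook or from any existing library. The function names `separation_quantiles` and `estimate_theiler_window`, the keyword arguments `fraction` and `threshold`, and the saturation rule itself (compare each value against the mean of the curve over the second half of the Δt range) are all assumptions made for illustration. For each Δt, the code takes all pairs of points separated by exactly Δt in time and computes the distance ε below which a given fraction of those pairs lies. These curves are the contour lines of the plot described above.

```python
import numpy as np

def separation_quantiles(X, max_dt, fractions=(0.25, 0.5, 0.75)):
    """Contour lines of a space-time separation plot (hypothetical helper).

    Entry [k, dt - 1] of the result is the distance below which a fraction
    fractions[k] of all pairs with time separation dt is found.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a scalar time series as 1-dimensional points
    out = np.empty((len(fractions), max_dt))
    for dt in range(1, max_dt + 1):
        # Euclidean distances between all pairs separated by exactly dt steps
        d = np.linalg.norm(X[dt:] - X[:-dt], axis=1)
        out[:, dt - 1] = np.quantile(d, fractions)
    return out

def estimate_theiler_window(X, max_dt, fraction=0.5, threshold=0.9):
    """Smallest dt at which the chosen contour line has nearly saturated.

    Assumption: the saturation level is estimated as the mean of the contour
    line over the second half of the dt range, so max_dt must be well beyond
    the correlation time. `threshold` is the fraction of that level which
    counts as saturated.
    """
    q = separation_quantiles(X, max_dt, (fraction,))[0]
    plateau = q[max_dt // 2:].mean()
    above = np.nonzero(q >= threshold * plateau)[0]
    # q[0] corresponds to dt = 1, hence the +1; fall back to max_dt otherwise
    return int(above[0]) + 1 if above.size else max_dt
```

A call such as `w = estimate_theiler_window(X, max_dt=200, threshold=0.9)` would then return a candidate w, where `X` is either a scalar time series or an array of delay vectors with one point per row. For oscillating signals the contour lines oscillate as well instead of settling on a flat plateau, so a simple threshold like this one may need a more careful rule.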