Inconsistencies while running PySR on local computer VS remote cluster #1017

lerouxerwan · 2025-08-28T13:30:58Z

lerouxerwan
Aug 28, 2025

Hello everyone,

I am trying to run PySR on a remote cluster (set up with Ubuntu 24.04.2, just like my local computer).

I have pulled my GitHub code on the cluster and set up a virtual environment exactly the same as on my local computer (Python 3.12.3, pip 25.2, Julia 1.11.6, PySR 1.5.8).
I run tests (that are successful on my local computer) on the cluster. These tests basically fit a PySRRegressor to different types of data.
Most tests are successful, but some fail. In particular, when I look at one of the failing cases, I notice that the 'equations_' dataframes are not even the same size: 15 rows on the remote cluster versus 13 on the local computer (even though the PySR parameters, the data, and the environment are the same on both machines). Note that the random state in PySR is fixed, so the issue does not come from that.

My question is: what have I missed? Is there something else that I should check on the remote cluster to ensure that the virtual environment is exactly the same as on my local computer? (After some printing, I am already certain that the data and parameters are exactly the same, so the issue is likely due to the environment. Or could there be another reason?)

PS: For the test that fails, the input data and PySR parameters are quite complex. This is why I preferred to ask this question without providing any code/example.

MilesCranmer · 2025-08-28T14:52:07Z

MilesCranmer
Aug 28, 2025
Maintainer

You have to use deterministic=True (and parallelism="serial") to get true deterministic behavior. Deterministic parallelism is much harder to set up, but it is on the road map.
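
For reference, a minimal sketch of the settings mentioned above. The helper name `make_regressor` and the seed value are illustrative assumptions, not part of PySR; `deterministic`, `parallelism`, and `random_state` are the PySRRegressor keyword arguments discussed in this thread. The import is deferred so the settings can be inspected without starting Julia.

```python
# Settings discussed above for reproducible PySR runs.
DETERMINISTIC_KWARGS = {
    "deterministic": True,
    "parallelism": "serial",
    "random_state": 42,  # any fixed integer; 42 is only an example
}


def make_regressor(**overrides):
    """Build a PySRRegressor with the determinism settings applied last.

    Hypothetical helper: applying DETERMINISTIC_KWARGS after `overrides`
    means a stray override cannot silently switch determinism off.
    """
    from pysr import PySRRegressor  # deferred: importing pysr starts Julia

    return PySRRegressor(**{**overrides, **DETERMINISTIC_KWARGS})
```

Something like `model = make_regressor(niterations=40)` would then be the only place where a regressor is created in the test suite.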

Also, how are you specifying the random state?

2 replies

lerouxerwan Aug 28, 2025
Author

Thank you for the answer.

Yes, I followed the documentation and set deterministic to True and parallelism to serial. I also set the random state to 42 by default (I am using this random state within the custom subclass of PySRRegressor that I created).

I have a test that checks that the fit function of my subclass is indeed deterministic (I use the repeat decorator from pytest), and this test works well both on my local computer and on the remote cluster.

That’s why I believe the issue is likely not about randomness, but more related to a difference in settings between the local computer and the remote cluster.

I checked the versions of all Python packages and of all Julia packages in the local environment, and they are the same.

Therefore, I was just wondering if I missed something else (other packages or dependencies) that I should compare/check between the local computer and the remote cluster, and that might explain the tests that fail.
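
One way to make such a comparison systematic is to dump an environment "fingerprint" on each machine and diff the two outputs. The sketch below uses only the Python standard library. The function names and the package list are illustrative assumptions, and the `julia` found on PATH is not necessarily the Julia that PySR actually uses, so that field is only a hint. Fields such as the CPU model are recorded because they are cheap to collect, not because they are known to matter here.

```python
import json
import platform
import shutil
import subprocess
import sys
from importlib import metadata


def package_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def julia_on_path_version():
    """Return `julia --version` output for the julia on PATH, if any."""
    exe = shutil.which("julia")
    if exe is None:
        return None
    try:
        out = subprocess.run(
            [exe, "--version"], capture_output=True, text=True, timeout=60
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return out.stdout.strip() or None


def environment_fingerprint(
    packages=("pysr", "juliacall", "numpy", "pandas", "scikit-learn", "sympy"),
):
    """Collect facts worth comparing between two machines."""
    return {
        "python": sys.version,
        "platform": platform.platform(),
        "machine": platform.machine(),
        "processor": platform.processor(),
        "julia_on_path": julia_on_path_version(),
        "packages": {name: package_version(name) for name in packages},
    }


# Print as sorted JSON so the two outputs can be compared with a plain diff.
print(json.dumps(environment_fingerprint(), indent=2, sort_keys=True))
```

Saving the output to a file on each machine and running a text diff on the two files shows at a glance which entries differ.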

MilesCranmer Aug 28, 2025
Maintainer

Interesting! This is very strange because the default random number generator in Julia is Xoshiro, which is designed to be consistent across different hardware and operating systems. (https://docs.julialang.org/en/v1/stdlib/Random/)

So I'm not sure what's going on. If you put it in a script, do you still see differences? I don't know if it's pytest's fault or not.

lerouxerwan · 2025-08-28T19:48:59Z

lerouxerwan
Aug 28, 2025
Author

I followed your advice and managed to isolate the issue in the script below, which only depends on numpy=2.24 and pysr=1.5.8:

from pysr import PySRRegressor

from utils_dataset import variable_names, X_units, y_units, X, y
from utils_model import params

model = PySRRegressor(**params)
model.fit(X, y, variable_names=variable_names, X_units=X_units, y_units=y_units)
# Print the number of equations found and their complexities
print(f"{len(model.equations_)} equations with complexity: {model.equations_['complexity'].to_list()}")

and this script prints:

"13 equations with complexity: [1, 3, 4, 5, 7, 9, 10, 11, 12, 14, 16, 17, 19]" on my local computer
"15 equations with complexity: [1, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 15, 16, 18, 20]" on the remote cluster
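
To see exactly where the two runs diverge, the two printed lists can be compared directly. This is a small sketch using the values quoted above; the function name `compare_complexities` is an illustrative assumption. It only compares which complexity levels appear, and says nothing about whether the equations at shared complexities are identical.

```python
# Complexity lists copied from the two outputs quoted above.
local = [1, 3, 4, 5, 7, 9, 10, 11, 12, 14, 16, 17, 19]
remote = [1, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 15, 16, 18, 20]


def compare_complexities(first, second):
    """Report complexity levels unique to each run and those they share."""
    a, b = set(first), set(second)
    return {
        "only_first": sorted(a - b),
        "only_second": sorted(b - a),
        "shared": sorted(a & b),
    }


diff = compare_complexities(local, remote)
print("only local: ", diff["only_first"])   # [14, 17, 19]
print("only remote:", diff["only_second"])  # [6, 13, 15, 18, 20]
print("shared:     ", diff["shared"])
```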

I put this script in a Python file, main.py. It only requires the following two Python files in the same folder: utils_dataset.py for the data, and utils_model.py for the PySR parameters.

The latter file may be the most interesting for you, as it contains 'params', the dictionary specifying the PySR parameters. Maybe from this file you will be able to spot which parameters generate this weird behavior, or which obvious mistake I might have made.

Thanks in advance for your feedback!

9 replies

lerouxerwan Aug 29, 2025
Author

Thanks for the answer.

Yes, to be succinct, I did not show params in my previous messages; it was in the "utils_model.py" Python file.

Here it is:

import numpy as np

params = {
  'model_selection': 'best', 
  'adaptive_parsimony_scaling': np.float64(816.4197141003457), 
  'alpha': 3.17,
  'annealing': False, 
  'autodiff_backend': None, 
  'batch_size': 50, 
  'batching': False, 
  'binary_operators': None,
  'bumper': False,
  'cluster_manager': None, 
  'complexity_mapping': None, 
  'complexity_of_constants': None,
  'complexity_of_operators': None, 
  'complexity_of_variables': None, 
  'constraints': None,
  'crossover_probability': np.float64(0.03645040661041819), 
  'denoise': False, 
  'deterministic': True,
  'dimensional_constraint_penalty': 100000000, 
  'dimensionless_constants_only': False,
  'early_stop_condition': None, 
  'elementwise_loss': None, 
  'expression_spec': None, 
  'extra_jax_mappings': None,
  'extra_sympy_mappings': None, 
  'extra_torch_mappings': None, 
  'fast_cycle': False,
  'fraction_replaced': np.float64(0.0004429864911833153),
  'fraction_replaced_hof': np.float64(0.09963726885503711), 
  'heap_size_hint_in_bytes': None,
  'hof_migration': True, 
  'input_stream': 'stdin',
  'logger_spec': None, 
  'loss_function': None, 
  'max_evals': None,
  'maxdepth': None, 
  'maxsize': 20, 
  'migration': True, 
  'ncycles_per_iteration': 472, 
  'nested_constraints': None,
  'niterations': 109, 
  'optimize_probability': np.float64(0.07970689105408878), 
  'optimizer_algorithm': 'BFGS',
  'optimizer_f_calls_limit': 8324, 
  'optimizer_iterations': 8, 
  'optimizer_nrestarts': 1,
  'output_directory': None, 
  'output_jax_format': False, 
  'output_torch_format': False, 
  'parallelism': 'serial',
  'parsimony': 0.0, 
  'perturbation_factor': np.float64(0.09045990289487015), 
  'population_size': 31,
  'populations': 59, 
  'precision': 64, 
  'print_precision': 5,
  'probability_negate_constant': np.float64(0.006406594415059639), 
  'procs': None, 'progress': True,
  'random_state': 42, 
  'run_id': None, 
  'select_k_features': None, 
  'should_optimize_constants': True,
  'should_simplify': True, 
  'skip_mutation_failures': True, 
  'tempdir': None,
  'timeout_in_seconds': None,
  'topn': 20,
  'tournament_selection_n': 18,
  'tournament_selection_p': 0.982, 
  'turbo': False,
  'unary_operators': ['square', 'sqrt'], 
  'update': False, 
  'update_verbosity': None, 
  'use_frequency': True,
  'use_frequency_in_tournament': True, 
  'verbosity': 0, 
  'warm_start': False, 
  'warmup_maxsize_by': None,
  'weight_add_node': np.float64(3.716987016612159), 
  'weight_delete_node': np.float64(0.8731863557229343),
  'weight_do_nothing': np.float64(0.3037130122259829), 
  'weight_insert_node': np.float64(0.011084426436652957),
  'weight_mutate_constant': np.float64(0.022677443556439466),
  'weight_mutate_operator': np.float64(0.39883855941752566), 
  'weight_optimize': 0.0,
  'weight_randomize': np.float64(0.00037043799727333546), 
  'weight_rotate_tree': np.float64(2.2030242310409216),
  'weight_simplify': np.float64(0.0025569918001819283), 
  'weight_swap_operands': np.float64(0.1265512527625602)
}

Maybe you will be able to spot which parameters generate this weird behavior, or which obvious mistake I might have made.
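
Since most entries in this dictionary may simply be defaults, one way to narrow the search is to list only the values that differ from the constructor's defaults. The sketch below is generic and uses `inspect`. The helper name `non_default_params` and the `Toy` class are hypothetical, and applying it to PySRRegressor assumes its hyperparameters are declared as keyword arguments with defaults in `__init__`, as is usual for scikit-learn style estimators.

```python
import inspect


def non_default_params(cls, params):
    """Return the entries of `params` that differ from cls.__init__ defaults."""
    signature = inspect.signature(cls.__init__)
    changed = {}
    for name, value in params.items():
        parameter = signature.parameters.get(name)
        if parameter is None or parameter.default is inspect.Parameter.empty:
            # Unknown or required argument: always report it.
            changed[name] = value
        elif bool(value != parameter.default):
            changed[name] = value
    return changed


class Toy:
    """Hypothetical stand-in for an estimator class, used only for the demo."""

    def __init__(self, maxsize=30, unary_operators=None, deterministic=False):
        pass


demo = non_default_params(
    Toy, {"maxsize": 30, "unary_operators": ["square"], "deterministic": True}
)
print(demo)  # {'unary_operators': ['square'], 'deterministic': True}

# With PySR installed, the same helper could be applied to the real class:
#   from pysr import PySRRegressor
#   print(non_default_params(PySRRegressor, params))
```

The resulting shorter list could then be reset to defaults a few entries at a time until the two machines agree again.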

MilesCranmer Aug 29, 2025
Maintainer

It might be helpful to try to make a MWE? https://en.wikipedia.org/wiki/Minimal_reproducible_example

Particularly this part:

The important feature of a minimal reproducible example is that it is as small and as simple as possible, such that it is just sufficient to demonstrate the problem, but without any additional complexity or dependencies that will make resolution harder

Basically I would try first with a really simple example, no custom hyperparameters (except the determinism ones), and see if the bug still happens.
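
A rough sketch of what such a minimal script could look like. The synthetic data, the seeds, and the helper names `summarize` and `run_mwe` are all illustrative assumptions; only the three determinism settings are non-default. It reads the `complexity` and `equation` columns of `model.equations_` and reduces them to one line that is easy to compare between machines.

```python
import hashlib


def summarize(complexities, equations):
    """Reduce a hall of fame to one comparable line (count, sizes, hash)."""
    text = "\n".join(str(equation) for equation in equations)
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
    return (
        f"{len(complexities)} equations, "
        f"complexities={list(complexities)}, sha256={digest}"
    )


def run_mwe():
    """Fit PySR on small synthetic data using default hyperparameters."""
    import numpy as np
    from pysr import PySRRegressor  # deferred: importing pysr starts Julia

    rng = np.random.default_rng(0)  # fixed seed for the synthetic data
    X = rng.uniform(1.0, 5.0, size=(100, 2))
    y = X[:, 0] ** 2 + np.sqrt(X[:, 1])

    model = PySRRegressor(
        deterministic=True,
        parallelism="serial",
        random_state=42,
    )
    model.fit(X, y)
    table = model.equations_
    return summarize(table["complexity"].to_list(), table["equation"].to_list())
```

Running `print(run_mwe())` on both machines and comparing the two lines would show whether the discrepancy survives with default hyperparameters.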

lerouxerwan Aug 29, 2025
Author

Thanks for the response,

Next week, I will try to find an example as simple as possible that triggers the bug.

However, I am not confident that this is possible, because in many other cases (especially with default parameters) the results were the same between my local machine and the remote cluster. So I think the bug (if it is a bug, and not some mistake that I made somewhere) highlighted here, is probably just a corner case for some specific parameters.

MilesCranmer Aug 29, 2025
Maintainer

It's hard to see at a glance what might be causing it, since I can't think of anything that would do this other than mismatched versions.

One final test before you build a MWE: can you set an environment variable when running it to force the threads? JULIA_NUM_THREADS=1 python main.py for example. I'm just curious if there is any parallelism happening somewhere that might be messing with the results.
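
The same check can be scripted so that the output is captured for comparison. This is a sketch using only the standard library; the function name `run_with_one_julia_thread` is an illustrative assumption. It launches a child Python process with `JULIA_NUM_THREADS=1` added to the environment, which is what the shell command above does.

```python
import os
import subprocess
import sys


def run_with_one_julia_thread(args):
    """Run `python <args>` in a child process with JULIA_NUM_THREADS=1."""
    env = dict(os.environ, JULIA_NUM_THREADS="1")
    return subprocess.run(
        [sys.executable, *args],
        env=env,
        capture_output=True,
        text=True,
        check=False,
    )


# Demo: confirm that the child process really sees the variable.
result = run_with_one_julia_thread(
    ["-c", "import os; print(os.environ['JULIA_NUM_THREADS'])"]
)
print(result.stdout.strip())
```

Calling `run_with_one_julia_thread(["main.py"])` and saving `result.stdout` on each machine would give two files to diff.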

MilesCranmer Aug 29, 2025
Maintainer

Oh wait. I just saw you wrote this earlier:

I am using this random state within the custom sub class of PySRRegressor

Are you sure this is not the cause of the issue?

If you are confident, can you share the full code for this? Maybe as a single file, perhaps as a GitHub gist. Ideally I can actually run this code.

And make it as minimal as possible while still getting the different behavior.

lerouxerwan · 2025-09-01T10:20:55Z

lerouxerwan
Sep 1, 2025
Author

I will create a minimal reproducible example using only PySRRegressor (and not a custom subclass), with as many default PySRRegressor parameters as possible, and open a "bug" for this minimal reproducible example.

In the meantime, I will close this discussion. Thanks again for your answers.

0 replies
