The defaults `num_warps=4` and `threads_per_warp=32` come from NVIDIA GPUs and are not optimal for Intel GPUs. We need to pick proper default values for Intel GPUs based on the GPU architecture, the double GRF mode, and related settings.
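As a rough illustration of the idea, here is a minimal sketch of selecting the defaults per device instead of hard-coding them. Everything in it is an assumption made for discussion: `DeviceInfo`, its fields, and `pick_defaults` are hypothetical names, not existing Triton APIs, and the Intel values are placeholders that would need to be replaced by numbers tuned per architecture.

```python
from dataclasses import dataclass
from typing import Tuple

# Current defaults, as quoted in this issue (they originate from NVIDIA GPUs).
NVIDIA_DEFAULT_NUM_WARPS = 4
NVIDIA_DEFAULT_THREADS_PER_WARP = 32


@dataclass(frozen=True)
class DeviceInfo:
    """Hypothetical device description; all field names are assumptions."""
    vendor: str
    sub_group_sizes: Tuple[int, ...] = ()  # sub-group sizes the device reports
    double_grf: bool = False               # whether double GRF mode is requested


def pick_defaults(dev: DeviceInfo) -> Tuple[int, int]:
    """Return (num_warps, threads_per_warp) for the given device."""
    if dev.vendor != "intel" or not dev.sub_group_sizes:
        # Fall back to the existing defaults when nothing better is known.
        return NVIDIA_DEFAULT_NUM_WARPS, NVIDIA_DEFAULT_THREADS_PER_WARP

    # Placeholder heuristic: use 16 if the device supports it,
    # otherwise the smallest supported sub-group size.
    if 16 in dev.sub_group_sizes:
        threads_per_warp = 16
    else:
        threads_per_warp = min(dev.sub_group_sizes)

    # Placeholder heuristic: use fewer warps in double GRF mode, since each
    # thread then gets a larger register file. The numbers are NOT tuned.
    num_warps = 4 if dev.double_grf else 8
    return num_warps, threads_per_warp
```

For example, `pick_defaults(DeviceInfo("intel", (16, 32), double_grf=True))` returns `(4, 16)` under these placeholder rules, while any non-Intel device keeps `(4, 32)`.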