Description
When networkTopology.mode is not hard, but subgroups exist, scheduling is not possible.
Steps to reproduce the issue
Create a job with the following YAML:
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: vcjob-with-subgroups
  namespace: default
spec:
  schedulerName: volcano
  queue: test
  # networkTopology:
  #   mode: soft
  #   highestTierAllowed: 1
  tasks:
    - name: worker
      replicas: 2
      partitionPolicy:
        totalPartitions: 2
        partitionSize: 1
        minPartitions: 2
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: worker
              image: nginx:latest
              resources:
                requests:
                  cpu: "2"
                  memory: "40Gi"
We see that the pods cannot be scheduled:
[user@host dir]# kubectl get vcjob
NAME                   STATUS    MINAVAILABLE   RUNNINGS   AGE
vcjob-with-subgroups   Pending   2                         5s
[user@host dir]# kubectl get pg
NAME                                                        STATUS    MINMEMBER   RUNNINGS   AGE
vcjob-with-subgroups-fd291270-29cb-4f7b-9fb6-62ff5dc3354b   Inqueue   2                      8s
[user@host dir]# kubectl get pod
NAME                            READY   STATUS    RESTARTS   AGE
vcjob-with-subgroups-worker-0   0/1     Pending   0          12s
vcjob-with-subgroups-worker-1   0/1     Pending   0          12s
Describe the results you received and expected
Received: the pods stay Pending. Expected: the pods are scheduled and Running.
What version of Volcano are you using?
master
Any other relevant information
The scheduler has the following logs:
E1224 06:03:19.760817 1 allocate.go:330] "Can not find default subJob or tasks for job" job="default/vcjob-with-subgroups-fd291270-29cb-4f7b-9fb6-62ff5dc3354b" subJobExist=false tasksExist=false
Based on the log, the cause is easy to identify: for a job with subgroups but no hard topology, scheduling takes the else branch (the non-hard-topology path). However, because subgroups exist, job.SubJob and actx.tasksNoHardTopology are actually indexed by each subgroup's own SubJobID rather than the default key, so sjExist and tasksExist both return false and scheduling fails.
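The key mismatch can be sketched as a standalone model (the names defaultSubJobID, buildIndexes, and the subgroup keys below are illustrative assumptions, not Volcano's real allocate-action code):

```go
package main

import "fmt"

// defaultSubJobID is an assumed placeholder for the default key the
// non-hard-topology path uses for its lookup.
const defaultSubJobID = ""

// buildIndexes mimics how tasks are indexed when subgroups exist: entries
// are keyed by each subgroup's own SubJobID, not by the default key.
func buildIndexes(hasSubGroups bool) map[string][]string {
	idx := map[string][]string{}
	if hasSubGroups {
		idx["subgroup-0"] = []string{"worker-0"}
		idx["subgroup-1"] = []string{"worker-1"}
	} else {
		idx[defaultSubJobID] = []string{"worker-0", "worker-1"}
	}
	return idx
}

func main() {
	// Job with subgroups but soft/no topology: the non-hard-topology
	// branch still looks up the default key...
	idx := buildIndexes(true)
	tasks, tasksExist := idx[defaultSubJobID]
	// ...which is absent, so the job is skipped, matching the log above.
	fmt.Printf("tasksExist=%v tasks=%v\n", tasksExist, tasks)
}
```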
Solution:
- Use allocateForJob for scheduling when a hard topology is set or a subgroup exists.
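The proposed routing condition could look like the following sketch (jobInfo and useAllocateForJob are hypothetical stand-ins for Volcano's allocate-action types, shown only to make the condition concrete):

```go
package main

import "fmt"

// jobInfo is a minimal stand-in for the job state the allocate action
// would inspect.
type jobInfo struct {
	hardTopology bool // networkTopology.mode == hard
	hasSubGroups bool // job defines subgroups (e.g. via partitionPolicy)
}

// useAllocateForJob returns true when the job should be routed through
// allocateForJob: either a hard topology is set or subgroups exist.
func useAllocateForJob(j jobInfo) bool {
	return j.hardTopology || j.hasSubGroups
}

func main() {
	// A soft-topology job with subgroups now takes the allocateForJob path.
	j := jobInfo{hardTopology: false, hasSubGroups: true}
	fmt.Println(useAllocateForJob(j))
}
```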
Performance:
Although allocateForJob enters the network-topology scheduling logic, which uses two nested loops, no hard topology is configured, so the loops never descend into the topology-specific work and there is no performance concern.