Out of memory bug in DCU calculation (Stress Memory) #3710

@pxlxingliang

Describe the bug

I used a Sugon DCU to run an SCF calculation on a 216-atom Si system. During the stress calculation, ABACUS stopped and threw the following error:

Unexpected Device Error /public/home/abacus/abacus-dcu/source/module_psi/kernels/rocm/memory_op.hip.cu:48: hipErrorOutOfMemory, out of memory

Attachment: 009_216Si.zip

Expected behavior

No response

To Reproduce

No response

Environment

No response

Additional Context

No response

Task list for Issue attackers (only for developers)

  • Verify the issue is not a duplicate.
  • Describe the bug.
  • Steps to reproduce.
  • Expected behavior.
  • Error message.
  • Environment details.
  • Additional context.
  • Assign a priority level (low, medium, high, urgent).
  • Assign the issue to a team member.
  • Label the issue with relevant tags.
  • Identify possible related issues.
  • Create a unit test or automated test to reproduce the bug (if applicable).
  • Fix the bug.
  • Test the fix.
  • Update documentation (if necessary).
  • Close the issue and inform the reporter (if applicable).

Labels

GPU & DCU & HPC (any issues related to GPU, DCU, and HPC)
