
Gitea preprod #1538

Draft

SebastianGode wants to merge 466 commits into preprod from gitea-preprod

Conversation

@SebastianGode
Contributor

No description provided.

…g storage

- Remove tempauth from Swift proxy pipeline (was causing 503 errors)
- Add formpost middleware for unauthenticated uploads
- Configure clouds.yaml with auth_type: none and direct endpoint
- Simplify authentication for the internal-only log storage cluster
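
A minimal clouds.yaml sketch matching this description; the cloud name and endpoint URL are placeholders, and the exact placement of `endpoint` for the keystoneauth "none" plugin should be verified against the deployed SDK version:

```yaml
# Sketch only: names and URL are illustrative, not the real values.
clouds:
  swift-logs:
    auth_type: none                  # no Keystone; talk to the Swift proxy directly
    auth:
      endpoint: "http://swift-proxy.example.svc:8080/v1/AUTH_logs"
```
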
…on scanning

Problem: Kolla was scanning and setting permissions on ALL files in the NFS
mount (/srv/node) including git repository files, causing 503 errors during
object uploads due to startup delays and resource exhaustion.

Solution:
- Mount NFS to /mnt/nfs-data instead of /srv/node
- Use subPath to mount only swift-storage/sda to /srv/node/sda
- Create /srv/node as emptyDir to avoid contamination from NFS root
- Init container creates swift-storage/sda directory in NFS share

This isolates Swift storage from other NFS content and prevents Kolla from
processing thousands of irrelevant files during startup.
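
A sketch of the pod spec fragment this describes; volume, claim, and image names are illustrative:

```yaml
# Illustrative names; the real chart values may differ.
volumes:
  - name: srv-node
    emptyDir: {}                     # clean /srv/node, untouched by the NFS root
  - name: nfs-storage
    persistentVolumeClaim:
      claimName: swift-nfs           # placeholder claim name
initContainers:
  - name: init-nfs                   # creates swift-storage/sda on the share
    image: busybox                   # assumption; any image with a shell works
    command: ["sh", "-c", "mkdir -p /mnt/nfs-data/swift-storage/sda"]
    volumeMounts:
      - name: nfs-storage
        mountPath: /mnt/nfs-data
containers:
  - name: swift-object
    volumeMounts:
      - name: srv-node
        mountPath: /srv/node
      - name: nfs-storage
        mountPath: /srv/node/sda
        subPath: swift-storage/sda   # only the Swift device dir, not the whole share
```
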
The volume is named 'nfs-storage' but volumeClaimTemplates was still using
'storage', causing a PVC not found error. StatefulSet volumeClaimTemplates
automatically create PVCs with the pattern <name>-<statefulset>-<ordinal>.

The volumeClaimTemplate 'storage' creates a PVC named 'storage-<sts-name>-0'.
Reference this PVC explicitly for the nfs-storage volume to avoid the PVC not
found error.

A StatefulSet volumeClaimTemplate named 'storage' automatically creates a volume
with that name. All volumeMounts must reference 'storage', not 'nfs-storage'.
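
The naming rule, sketched as a StatefulSet fragment (the StatefulSet name 'swift' and the requested size are assumptions):

```yaml
# A volumeClaimTemplate named 'storage' on a StatefulSet named 'swift'
# yields the PVC 'storage-swift-0' and a pod volume named 'storage'.
volumeClaimTemplates:
  - metadata:
      name: storage
    spec:
      accessModes: ["ReadWriteMany"]
      resources:
        requests:
          storage: 500Gi             # assumption, matching the NFS size mentioned later
# volumeMounts in the pod template must use the template name:
#   - name: storage
#     mountPath: /srv/node/sda
```
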
Problem: With 1 replica, Swift still requires quorum of 2 responses (replicas/2+1),
causing 503 errors since we only have 1 storage node.

Solution: Explicitly set quorum sizes to 1 for object, container, and account
operations to allow writes with single successful response.

Problem: Storage server returns 507 Insufficient Storage because Swift checks
disk space of /srv/node (emptyDir) instead of /srv/node/sda (NFS mount).

Solution: Disable fallocate and set fallocate_reserve to 0 to bypass disk
space checks. Safe for preprod since we have 500GB available on NFS.
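
A sketch of the relevant object-server.conf overrides, carried here as ConfigMap data (the ConfigMap name is illustrative; disable_fallocate and fallocate_reserve are standard object-server [DEFAULT] options):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: swift-object-server-conf    # illustrative name
data:
  object-server.conf: |
    [DEFAULT]
    devices = /srv/node
    mount_check = false
    # skip preallocation space checks and reserve no headroom
    disable_fallocate = true
    fallocate_reserve = 0
```
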
Swift checks available disk space on the 'devices' path (/srv/node) before
accepting writes. With our subdirectory mount architecture (/srv/node/sda
mounted from NFS under /srv/node emptyDir), this check sees the overlay
filesystem (29GB free) instead of the NFS mount (500GB free).

Setting disk_size_min to 1KB effectively disables the minimum space check,
allowing writes to proceed. This is safe since the actual NFS capacity
is monitored externally.

Change devices path from /srv/node to /srv/node/sda/.. so that Swift's
disk space check (statvfs) operates on the NFS mount (500GB) instead of
the emptyDir overlay filesystem (30GB).

The /srv/node/sda/.. path resolves to the actual NFS mount point, allowing
Swift to see the correct available space.

Instead of using subPath mounts, mount the NFS volume directly to /srv/node.
This allows Swift's statvfs() disk space check to see the actual NFS
filesystem (500GB) instead of the overlay filesystem (30GB).

Changes:
- Init container creates sda subdirectories directly in NFS mount
- All containers mount storage volume to /srv/node (no subPath)
- Removed srv-node emptyDir volume (no longer needed)
- devices=/srv/node in object-server.conf now points to real NFS mount

This resolves the 507 Insufficient Storage error by giving Swift accurate
visibility into available storage capacity.
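
Sketch of the simplified layout (no subPath, no emptyDir); names remain illustrative:

```yaml
volumes:
  - name: storage
    persistentVolumeClaim:
      claimName: swift-nfs           # placeholder claim name
initContainers:
  - name: init-devices
    image: busybox                   # assumption
    command: ["sh", "-c", "mkdir -p /srv/node/sda"]
    volumeMounts:
      - name: storage
        mountPath: /srv/node
containers:
  - name: swift-object
    volumeMounts:
      - name: storage
        mountPath: /srv/node         # statvfs() now reports the NFS capacity
```
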
With NFS mounted directly to /srv/node, Swift's disk space check now
correctly sees 500GB available. Removed workaround settings:
- devices back to /srv/node (was /srv/node/sda/..)
- Removed disable_fallocate, fallocate_reserve, disk_size_min

Swift should now properly detect available storage capacity.

Even with NFS mounted to /srv/node, Swift's fallocate check fails because
/srv/node/sda is a subdirectory, not a mount point. Adding back:
- disable_fallocate = true
- fallocate_reserve = 0

This bypasses Swift's space pre-allocation checks which don't work correctly
with subdirectory-based device organization.

Changed NFS mount path from root (/) to /swift-storage to avoid
mounting the git repository files. This gives Swift a clean directory
structure without thousands of unrelated files.

The /swift-storage directory on the NFS server contains only Swift
data structures (sda device with accounts/containers/objects/tmp).
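
The corresponding PersistentVolume sketch (PV name and server address are placeholders):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: swift-nfs                    # illustrative
spec:
  capacity:
    storage: 500Gi
  accessModes: ["ReadWriteMany"]
  nfs:
    server: nfs.internal.example     # placeholder
    path: /swift-storage             # Swift device tree only, not the repo root
```
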
Swift requires device paths to be actual mount points for proper
disk space detection. Changed from mounting to /srv/node to mounting
directly to /srv/node/sda with /srv/node as an emptyDir parent.

This makes /srv/node/sda appear as a separate filesystem mount to
Swift's ismount() checks, allowing disk space detection to work.
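
Sketch of that mount layout (volume names illustrative):

```yaml
volumes:
  - name: srv-node
    emptyDir: {}                     # parent directory only
  - name: storage
    persistentVolumeClaim:
      claimName: swift-nfs           # placeholder
containers:
  - name: swift-object
    volumeMounts:
      - name: srv-node
        mountPath: /srv/node
      - name: storage
        mountPath: /srv/node/sda     # a real mount point, so ismount() passes
```
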
With /srv/node/sda now being a real NFS mount point (not a subdirectory),
enable mount_check=true so Swift properly validates the device mount.
Removed all workaround settings - Swift should now work normally.
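
The matching object-server.conf change, sketched as ConfigMap data like the earlier fragment:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: swift-object-server-conf    # illustrative, same as the earlier sketch
data:
  object-server.conf: |
    [DEFAULT]
    devices = /srv/node
    # device dirs must now be real mount points
    mount_check = true
```
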
- Use csi-disk StorageClass for dynamic EVS provisioning
- 100GB volume with ReadWriteOnce access
- CSI driver handles disk attachment and mounting automatically
- Will need to convert filesystem from ext4 to XFS after first mount

This replaces the problematic NFS setup that caused 507 errors.
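
A PVC sketch for the dynamically provisioned EVS volume (claim name is illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: swift-storage                # illustrative
spec:
  storageClassName: csi-disk
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 100Gi
```
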
…8.150.167)

- Replace NFS with node-local XFS storage using OTC EVS disk
- Disk /dev/sdk formatted with XFS (isize=1024) mounted at /mnt/swift-storage
- Configure 100GB on node 192.168.150.167
- Add nodeAffinity to bind pod and PV to correct node
- Fixes persistent 507 errors from NFS incompatibility

The OTC EVS disk 'swift-proxy' is attached to node 192.168.150.167,
formatted with XFS, and mounted at /mnt/swift-storage.

- Format 100GB OTC EVS disk (/dev/sdk) with XFS (isize=1024)
- Mount at /mnt/swift-storage on node 192.168.150.167
- Switch from NFS to local-storage with nodeAffinity
- Fixes persistent 507 errors from NFS incompatibility

The disk is initialized and mounted with proper XFS settings
required by Swift (isize=1024 for metadata storage).
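
Sketch of the local PersistentVolume pinned to that node (PV name is illustrative; the node name is taken from the text and assumed to match kubernetes.io/hostname):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: swift-local                  # illustrative
spec:
  capacity:
    storage: 100Gi
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/swift-storage
  nodeAffinity:                      # required for local volumes
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values: ["192.168.150.167"]
```
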
@gitguardian

gitguardian bot commented Feb 12, 2026

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request
GitGuardian id | GitGuardian status | Secret              | Commit  | Filename
26230584       | Triggered          | Generic Private Key | a620f59 | services/gitea-api-adapter/certs/key.pem
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely, following secret-management best practices.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future, consider:


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

- Set volumeName to empty string to override default from values.yaml,
  allowing the csi-disk StorageClass to dynamically provision PVs
- Remove runAsNonRoot/runAsUser/runAsGroup from pod and container
  securityContext since gitea/gitea uses s6-overlay which must start
  as root and drops privileges internally
- Keep fsGroup:1000 for PVC file access and seccomp profile
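
Pod-level securityContext sketch reflecting these changes (the seccomp profile type is an assumption):

```yaml
# No runAsNonRoot/runAsUser/runAsGroup: s6-overlay starts as root and
# drops to the git user itself.
securityContext:
  fsGroup: 1000
  seccompProfile:
    type: RuntimeDefault             # assumption: the kept profile is the runtime default
```
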
s6-overlay starts as root then drops to uid 1000. The init container
also runs as root, creating app.ini owned by root. Add chown to ensure
the git user (1000) can write to /data/gitea at runtime.

Gitea also writes to /data/git/.ssh/ for authorized_keys management.
Expand chown to cover all of /data so the git user (1000) has write
access everywhere.
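
A minimal init-container sketch for the chown described above (container, image, and volume names are illustrative):

```yaml
initContainers:
  - name: init-permissions
    image: busybox                   # assumption; any image with chown works
    command: ["sh", "-c", "chown -R 1000:1000 /data"]
    volumeMounts:
      - name: data                   # the Gitea data PVC
        mountPath: /data
```
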
Gitea uses LevelDB for queue storage which requires an exclusive lock.
RollingUpdate causes two pods to run simultaneously sharing the same
PVC, leading to 'resource temporarily unavailable' on the LevelDB lock.
Recreate strategy ensures the old pod is terminated before the new one
starts.
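
Deployment spec fragment for the strategy change:

```yaml
spec:
  strategy:
    type: Recreate                   # old pod exits before the new one starts,
                                     # so the LevelDB lock is always free
```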