Skip to content

Parsing a string literal is quadratic in the length #16624

@huonw

Description

@huonw

E.g. this runs string literals "xxxx...xxxx" with various lengths through the compiler.

echo 'Len  Time (s)'
for N in 5000 10000 20000 40000 80000 160000; do
    echo -n "$N "
    python -c "print('fn main() { \"' + 'x' * $N + '\"; }')" |
        rustc - --no-trans -Z time-passes  |
        sed -n 's/time: \([0-9.]*\).*parsing/\1/p'
done
Len  Time (s)
5000 0.030
10000 0.120
20000 0.490
40000 1.936
80000 8.491
160000 40.287

Each step takes about 4× as long than the previous, even though the string is only 2× longer.

Work-around: using concat! to break the string into smaller parts.

Perf profile (for 40000):

 89.88%  rustc  librustrt-4e7c5e5c.so   [.] str::is_utf8::h28c8d0366e23047eFZq
  5.18%  rustc  libstd-4e7c5e5c.so      [.] io::Writer::write_fmt::Adaptor$LT$$x27a$C$$x20T$GT$.fmt..FormatWriter::wri
  0.25%  rustc  librbml-4e7c5e5c.so     [.] reader::maybe_get_doc::h0e11190b08318145iNa
  0.21%  rustc  librustrt-4e7c5e5c.so   [.] je_mallocx
...

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-parserArea: The lexing & parsing of Rust source code to an AST

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions