HTMLReader handles numeric and named entity references in different ways #1

@Quutti

function THtmlReader.ReadNumericEntityNode: Boolean;

The ReadNumericEntityNode function handles its input differently from ReadNamedEntityNode. A numeric entity reference is read as a TEXT_NODE, while a named entity reference is read as an ENTITY_REFERENCE_NODE. The two functions also trigger different events, so HTMLParser handles the two forms in separate ways, which may cause problems when parsing HTML. For example, &lt; and &#60; denote the same character but are handled differently.
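
To make the expected behaviour concrete, here is a minimal standalone sketch (written for Free Pascal) in which both reference forms are classified as the same node kind. It does not use the actual THtmlReader code: TNodeKind, IsNumericReference, IsNamedReference and ClassifyToken are hypothetical names made up for this illustration, and the checks only look at the shape of the token, not at whether the entity name or number is valid.

```pascal
program EntityNodeSketch;

{$IFDEF FPC}{$mode objfpc}{$H+}{$ENDIF}

type
  // Hypothetical node kinds mirroring the two node types named above.
  TNodeKind = (nkText, nkEntityReference);

// Hypothetical helper, not part of THtmlReader: True when S has the shape
// of a numeric reference such as '&#60;' or '&#x3C;'.
function IsNumericReference(const S: string): Boolean;
begin
  Result := (Length(S) >= 4) and (S[1] = '&') and (S[2] = '#')
    and (S[Length(S)] = ';');
end;

// Hypothetical helper: True when S has the shape of a named reference
// such as '&lt;'.
function IsNamedReference(const S: string): Boolean;
begin
  Result := (Length(S) >= 3) and (S[1] = '&') and (S[2] <> '#')
    and (S[Length(S)] = ';');
end;

// Unified classification: both reference forms yield the same node kind,
// anything else is treated as plain text.
function ClassifyToken(const S: string): TNodeKind;
begin
  if IsNumericReference(S) or IsNamedReference(S) then
    Result := nkEntityReference
  else
    Result := nkText;
end;

begin
  // Both spellings of '<' are checked against the entity reference kind.
  WriteLn(ClassifyToken('&lt;') = nkEntityReference);
  WriteLn(ClassifyToken('&#60;') = nkEntityReference);
  // Ordinary character data is checked against the text kind.
  WriteLn(ClassifyToken('plain text') = nkText);
end.
```

The point of the sketch is only that a single classification step covers both forms, so whatever consumes the nodes does not need two code paths for the same character.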

Does anyone know whether this is intended behaviour? Does the HTML parsing spec state that these two forms have to be parsed differently?

I can also provide a PR to fix this if needed.
