Description
I am using HtmlAgilityPack in a heavily multithreaded application to process HTML files. Occasionally I encounter badly malformed HTML files with many deeply nested child nodes, which cause a stack overflow error.
HtmlAgilityPack has an option to limit the number of nested child nodes:
// Get the title
try
{
    var htmlDoc = new HtmlDocument();
    htmlDoc.OptionMaxNestedChildNodes = 200; // this prevents stack overflow
    htmlDoc.LoadHtml(page.html);
    var t = htmlDoc.DocumentNode.Descendants("title").FirstOrDefault();
    if (t != null)
    {
        title = System.Net.WebUtility.HtmlDecode(t.InnerText);
        // Get rid of tabs and line breaks
        title = title.Replace('\r', ' ').Replace('\t', ' ').Trim().Replace("\n", "###");
    }
}
catch (Exception)
{
    // Ignore title
}
[This code is running in one of 16 threads. The entire program is running on a 16-core Ubuntu or Azure VM.]
This fixed the error on Ubuntu. However, on Windows I keep getting a stack overflow error. Are there differences in stack management between Ubuntu and Windows that could cause this?
Update: I fixed it by manually increasing maxStackSize in the Thread constructor to 16 MB. My understanding is that the default is 1 MB for 32-bit and 4 MB for 64-bit processes. Is it possible that (a) the stack is used differently by the .NET Core runtime on Linux and Windows, or (b) the default stack size values differ between Linux and Windows?
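For anyone hitting the same problem, here is a minimal sketch of the workaround, assuming each worker runs on a dedicated thread. It uses the `Thread(ThreadStart, int maxStackSize)` constructor overload; `LargeStackWorker`, `StackSizeBytes`, and `Start` are hypothetical names chosen for illustration and are not taken from my actual program.

```csharp
using System;
using System.Threading;

public static class LargeStackWorker
{
    // 16 MB, the value that worked for me; treat it as a tunable assumption.
    private const int StackSizeBytes = 16 * 1024 * 1024;

    // Runs the given work on a new thread that requests a larger stack.
    public static Thread Start(Action work)
    {
        // The second constructor argument is maxStackSize, in bytes.
        var thread = new Thread(() => work(), StackSizeBytes);
        thread.Start();
        return thread;
    }
}
```

Usage would look roughly like `var t = LargeStackWorker.Start(() => ProcessPages(batch)); t.Join();`, where `ProcessPages` and `batch` are placeholders for whatever parsing work the thread does.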