SGML Eskulap .NET is a software component that reads data conforming to a given grammar and builds an in-memory representation of its structure. It has no dependency on Internet Explorer’s MSHTML DLL, W3C’s HTML Tidy, or any ActiveX/COM object, so it can easily be used as a server-side component.
The main goal of this library is to read real-world web documents, many of which have malformed or implicit structure, navigate the trees of these documents, and extract the required information for further analysis.

Tag Soup

In web development, "tag soup" refers to markup written for a web page that looks very much like HTML but does not follow correct HTML syntax and document structure. Because web browsers have historically treated HTML syntax and structural errors leniently, there has been little pressure on web developers to follow published standards. SGML Eskulap .NET therefore needs to be able to treat what looks like HTML as "tag soup", accepting invalid syntax and structure and correcting for it. Providing a consistent in-memory, tree-like document representation under these conditions involves a lot of complexity.
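
To make the idea of lenient parsing concrete, here is a minimal sketch in Python built on the standard library's html.parser module. It is not SGML Eskulap .NET's parser or API. The names Node, SoupTreeBuilder and parse_soup are hypothetical, and the two recovery rules (ignore a stray end tag, implicitly close elements left open inside a closed ancestor) are assumptions chosen for illustration.

```python
from html.parser import HTMLParser

# A few of the elements that HTML 4.01 declares EMPTY (no content, no end tag).
VOID = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    """One element in the toy document tree."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []  # child Node objects and text strings

    def __repr__(self):
        inner = "".join(
            c if isinstance(c, str) else repr(c) for c in self.children
        )
        return f"<{self.tag}>{inner}</{self.tag}>"


class SoupTreeBuilder(HTMLParser):
    """Hypothetical lenient tree builder; not the library's parser."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID:
            self.current = node  # descend into non-empty elements

    def handle_endtag(self, tag):
        # Look for the nearest open element with this name.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is self.root:
            return  # stray end tag: ignore it
        # Implicitly close everything still open inside the matched element.
        self.current = node.parent

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.current.children.append(text)


def parse_soup(markup):
    """Parse possibly malformed markup and return the tree root."""
    builder = SoupTreeBuilder()
    builder.feed(markup)
    builder.close()  # flush any buffered trailing text
    return builder.root
```

For example, parse_soup("<p><b>bold<i>both</p></u>tail") yields a tree in which "b" and "i" are closed together with "p", and the unmatched "u" end tag is dropped. A real parser needs many more rules than these two, which is where the complexity mentioned above comes from.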

Invalid document structure

A document has invalid structure when its elements are nested in ways that the document's DTD does not allow. One example is nesting a "ul" element directly inside another "ul" element, which is invalid under any of the HTML 4.01 or XHTML DTDs. Many graphical web editors still produce invalid markup, and many professional web designers and authors pay little attention to validity, so invalid markup is common on sites throughout the World Wide Web.
The validation engine (part of SGML Eskulap .NET) can detect out-of-order document structure and repair it so that it conforms to the DTD. This transformation uses the HTML 4.01 Transitional document type definition (DTD), with some modifications, as its reference. If needed, you can choose a different DTD when creating or loading an SgmlDocument object.
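
The nested "ul" case can be sketched as follows. This is an illustration in Python under stated assumptions, not the validation engine's actual algorithm or the SgmlDocument API. The ALLOWED_CHILDREN table is a small hand-written excerpt loosely modelled on HTML 4.01 Transitional, and make, find_violations and repair_lists are hypothetical names.

```python
# Hand-written excerpt of allowed children, loosely modelled on
# HTML 4.01 Transitional; an assumption for illustration only.
ALLOWED_CHILDREN = {
    "ul": {"li"},
    "ol": {"li"},
    "li": {"ul", "ol", "p", "a", "b", "i", "#text"},
}


def make(tag, *children):
    """Build a tiny tree node as a dict."""
    return {"tag": tag, "children": list(children)}


def find_violations(node, path=""):
    """Return (path, parent tag, child tag) for each disallowed child."""
    here = f"{path}/{node['tag']}"
    found = []
    allowed = ALLOWED_CHILDREN.get(node["tag"])  # None means "not checked"
    for child in node["children"]:
        if allowed is not None and child["tag"] not in allowed:
            found.append((here, node["tag"], child["tag"]))
        found.extend(find_violations(child, here))
    return found


def repair_lists(node):
    """Move a list nested directly inside a list into an 'li' element."""
    fixed = []
    for child in node["children"]:
        repair_lists(child)  # repair deeper levels first
        if node["tag"] in ("ul", "ol") and child["tag"] in ("ul", "ol"):
            if fixed and fixed[-1]["tag"] == "li":
                fixed[-1]["children"].append(child)  # reuse preceding item
            else:
                fixed.append(make("li", child))  # wrap in a new item
        else:
            fixed.append(child)
    node["children"] = fixed
    return node
```

Given tree = make("ul", make("li"), make("ul", make("li"))), find_violations(tree) reports the inner "ul" as a disallowed child of the outer "ul". After repair_lists(tree), the inner list sits inside the preceding "li" and no violations remain. Moving the list into the preceding item is one possible recovery strategy among several.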

