This project is read-only.

Project Description

HTML Eskulap .NET makes it easier for .NET developers to read real-world web documents, many of which are malformed or rely on implicit structure, to navigate the tree of such a document, and to extract the required information for further analysis using XPath.

Using this library, you can:

  • Load SGML from streams or Internet resources.
  • Correct HTML syntax.
  • Correct SGML document structure using a DTD.
  • Query SGML trees using XPath queries.
  • Query SGML trees using LINQ queries.
  • Manipulate in-memory SGML trees.
  • Convert SGML to XML.
  • Auto-detect the character encoding of a document.

Example

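The following example loads a DTD schema from the Internet, creates an SGML document based on it, and loads a Google search results page that contains several links. It then serializes the document to a string and prints the href attribute of every anchor selected by an XPath query.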

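// Load the XHTML 1.0 Strict DTD; "html" names the document's root element.
DtdSchema schema = DtdSchema.Load(new Uri("http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"), "html");

// Create a document bound to the schema and load a web page into it.
SgmlDocument google = new SgmlDocument(schema);
google.Load(new Uri("http://www.google.com/search?q=Crawling"));

// Serialize the loaded document tree to a string.
string html = google.ToString();

// Select every <a> that is a direct child of a <p> anywhere under <body>.
IEnumerable<SElement> links = google.SelectElements("html/body//p/a");
foreach (SElement anchor in links)
{
    // Print the link target of each anchor.
    Debug.Print(anchor.Attribute("href"));
}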
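
The example above shows only the XPath side of querying. This page does not show the library's own LINQ API, so the sketch below is only an analogy: it uses the standard System.Xml.Linq and System.Xml.XPath classes on a small, already well-formed document to illustrate how the same selection might look as an XPath query and as a LINQ query. The LinkQueries class, its method names, and the sample markup are hypothetical and are not part of HTML Eskulap .NET.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical helper built on standard .NET XML APIs, not on HTML Eskulap .NET.
public static class LinkQueries
{
    // Well-formed stand-in for a page that has already been corrected and converted to XML.
    public const string SampleXml =
        "<html><body>" +
        "<p><a href=\"http://example.com/a\">A</a></p>" +
        "<div><p><a href=\"http://example.com/b\">B</a></p></div>" +
        "<a href=\"http://example.com/c\">C</a>" +
        "</body></html>";

    // XPath: anchors that are direct children of any <p> under <body>.
    public static List<string> HrefsByXPath(XDocument doc)
    {
        return doc.XPathSelectElements("/html/body//p/a")
                  .Select(a => (string)a.Attribute("href"))
                  .ToList();
    }

    // LINQ to XML: the same selection expressed with axis methods.
    public static List<string> HrefsByLinq(XDocument doc)
    {
        return doc.Root.Element("body")
                  .Descendants("p")
                  .Elements("a")
                  .Select(a => (string)a.Attribute("href"))
                  .ToList();
    }

    public static void Main()
    {
        XDocument doc = XDocument.Parse(SampleXml);
        foreach (string href in HrefsByXPath(doc)) Console.WriteLine(href);
        foreach (string href in HrefsByLinq(doc)) Console.WriteLine(href);
    }
}
```

Both methods return the href values of the anchors A and B and skip C, because C is not a child of a p element.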

Last edited May 14, 2012 at 8:49 PM by Kir_Privalov, version 11