HTML Parsers

jsoup is a Java library for working with real-world HTML. It provides a convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods. jsoup implements the WHATWG HTML5 specification and parses HTML to the same DOM as modern browsers do.
Last Release on Jan 4, 2017
An HTML parser and tag balancer.
Last Release on May 14, 2015
TagSoup is a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: poor, nasty and brutish, though quite often far from short. TagSoup is designed for people who have to process this stuff using some semblance of a rational application design. By providing a SAX interface, it allows standard XML tools to be applied to even the worst HTML. TagSoup also includes a command-line processor that reads HTML files and can generate either ...
Last Release on May 15, 2015
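The SAX point above can be sketched with nothing but the JDK: a handler that collects link targets works against any `XMLReader`. This is a minimal sketch, not TagSoup's own code. The class name `SaxLinkCollector` and the sample markup are hypothetical, and the JDK parser used here only accepts well-formed input. Swapping in TagSoup's `org.ccil.cowan.tagsoup.Parser` as the reader is an assumption that requires TagSoup on the classpath.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: gathers href values through the standard SAX callbacks.
final class SaxLinkCollector {

    /** Collects the href of every <a> element reported by the given SAX reader. */
    static List<String> collectLinks(XMLReader reader, String markup) throws Exception {
        List<String> links = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("a".equalsIgnoreCase(qName)) {
                    String href = attrs.getValue("href");
                    if (href != null) {
                        links.add(href);
                    }
                }
            }
        });
        reader.parse(new InputSource(new StringReader(markup)));
        return links;
    }

    public static void main(String[] args) throws Exception {
        // JDK SAX parser: the input must be well-formed.
        // Assumption: with TagSoup available, the reader could instead be
        // new org.ccil.cowan.tagsoup.Parser(), which also implements XMLReader.
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        String markup =
            "<html><body><a href=\"/a\">A</a><p><a href=\"/b\">B</a></p></body></html>";
        System.out.println(collectLinks(reader, markup)); // prints [/a, /b]
    }
}
```

Because the handler depends only on the `org.xml.sax` interfaces, the same code is what a lenient SAX parser would drive when the input is not well-formed.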
JTidy is a Java port of HTML Tidy, an HTML syntax checker and pretty printer. Like its non-Java cousin, JTidy can be used as a tool for cleaning up malformed and faulty HTML. In addition, JTidy provides a DOM interface to the document being processed, which effectively lets you use JTidy as a DOM parser for real-world HTML.
Last Release on May 14, 2015
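To illustrate what working through a DOM interface looks like, here is a minimal sketch that walks a standard `org.w3c.dom.Document` using only basic node navigation. It does not call JTidy: the class name `DomHeadingLister` is hypothetical, and the JDK's `DocumentBuilder` stands in for the parsing step, so the sample input must already be well-formed. Obtaining the document from JTidy instead is an assumption not shown here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: lists the text of elements found in a DOM tree.
final class DomHeadingLister {

    /** Concatenates all descendant text nodes using only basic DOM navigation. */
    static String text(Node node) {
        StringBuilder sb = new StringBuilder();
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.TEXT_NODE) {
                sb.append(c.getNodeValue());
            } else {
                sb.append(text(c));
            }
        }
        return sb.toString();
    }

    /** Returns the trimmed text of every element with the given tag name, in document order. */
    static List<String> textOf(Document doc, String tagName) {
        List<String> out = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName(tagName);
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(text(nodes.item(i)).trim());
        }
        return out;
    }

    /** Stand-in parse step: the JDK DOM parser, which needs well-formed markup. */
    static Document parseWellFormed(String markup) throws Exception {
        byte[] bytes = markup.getBytes(StandardCharsets.UTF_8);
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseWellFormed(
            "<html><body><h1> Title </h1><p>Body</p><h1>Second</h1></body></html>");
        System.out.println(textOf(doc, "h1")); // prints [Title, Second]
    }
}
```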
HtmlCleaner is an HTML parser written in Java. It transforms dirty HTML to well-formed XML following the same rules that most web browsers use.
Last Release on Feb 11, 2017
HTML Parser is the high-level syntactic analyzer.
Last Release on May 15, 2015
Jericho HTML Parser is a Java library allowing analysis and manipulation of parts of an HTML document, including server-side tags, while reproducing verbatim any unrecognised or invalid HTML.
Last Release on Oct 25, 2015
A patched version of the nu.validator v1.2.1 HTML parser.
Last Release on May 20, 2015
Jodd Lagarto is a fast and versatile all-purpose HTML parser. It includes Jerry and CSSelly.
Last Release on Apr 10, 2017
HTML Lexer is the low-level lexical analyzer.
Last Release on May 15, 2015