TagSoup is a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: poor, nasty and brutish, though quite often far from short. TagSoup is designed for people who have to process this stuff using some semblance of a rational application design. By providing a SAX interface, it allows standard XML tools to be applied to even the worst HTML. TagSoup also includes a command-line processor that reads HTML files and can generate either ...
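Because TagSoup exposes a SAX interface, code written against the standard `org.xml.sax` API does not need to know which parser produced the events. The sketch below illustrates this with a plain SAX `ContentHandler` that collects the `href` of every `a` element. It assumes TagSoup's `org.ccil.cowan.tagsoup.Parser` class is the `XMLReader` you would plug in for real-world HTML. The names `LinkCollector`, `LinkHandler` and `collectLinks` are hypothetical. So that the demo runs with the JDK alone, `main` uses the JDK's built-in SAX parser on well-formed XHTML rather than TagSoup.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: works with any SAX XMLReader, TagSoup included.
public class LinkCollector {

    /** Standard SAX handler that records the href of every <a> element. */
    static final class LinkHandler extends DefaultHandler {
        final List<String> hrefs = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Namespace-aware readers fill localName; fall back to qName otherwise.
            String name = localName.isEmpty() ? qName : localName;
            if (name.equalsIgnoreCase("a")) {
                // Look up by (namespace URI, local name) first, then by qualified name.
                String href = atts.getValue("", "href");
                if (href == null) {
                    href = atts.getValue("href");
                }
                if (href != null) {
                    hrefs.add(href);
                }
            }
        }
    }

    /** Runs the given SAX reader over the markup and returns the collected links. */
    static List<String> collectLinks(XMLReader reader, String markup) throws IOException, SAXException {
        LinkHandler handler = new LinkHandler();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(markup)));
        return handler.hrefs;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: with TagSoup on the classpath you would instead write
        //   XMLReader reader = new org.ccil.cowan.tagsoup.Parser();
        // and could feed it unclosed or badly nested HTML.
        // This JDK-only demo uses the built-in parser, which needs well-formed input.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        String xhtml = "<html><body><a href=\"a.html\">A</a><p><a href=\"b.html\">B</a></p></body></html>";
        System.out.println(collectLinks(reader, xhtml)); // prints [a.html, b.html]
    }
}
```

The handler is the part that carries over unchanged: only the line that constructs the `XMLReader` differs between strict XML parsing and TagSoup's tolerant HTML parsing.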
License | Apache 2.0 |
---|---|
Categories | HTML Parsers |
Tags | htmlparser |
HomePage | http://home.ccil.org/~cowan/XML/tagsoup/ |
Date | Jan 06, 2008 |
Files | pom (1 KB), jar (87 KB) |
Repositories | Central, Adobe Public, Geomajas, Marketcetera, Mulesoft, Sonatype |
Ranking | #2471 in MvnRepository, #3 in HTML Parsers |
Used By | 173 artifacts |
Compile Dependencies (0)
None.
Licenses
License | URL |
---|---|
Apache License 2.0 | http://www.apache.org/licenses/LICENSE-2.0.txt |