TagSoup is a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: poor, nasty and brutish, though quite often far from short. TagSoup is designed for people who have to process this stuff using some semblance of a rational application design. By providing a SAX interface, it allows standard XML tools to be applied to even the worst HTML. TagSoup also includes a command-line processor that reads HTML files and can generate either ...
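
Because the description stresses that TagSoup exposes a SAX interface, a handler written against the standard `org.xml.sax` API should not need to know which parser drives it. The sketch below illustrates that idea under stated assumptions: `ElementCounter` and `countElements` are hypothetical names invented for this example, and the JDK's own strict SAX parser stands in as the reader so that the code runs without any extra jar. With the TagSoup jar on the classpath, the expectation is that an instance of `org.ccil.cowan.tagsoup.Parser` could be passed as the `XMLReader` instead, which would let the same handler see malformed HTML.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: counts the element names reported by any SAX XMLReader. */
final class ElementCounter {

    /** Parses the markup with the given reader and returns element name -> occurrence count. */
    static Map<String, Integer> countElements(XMLReader reader, String markup)
            throws IOException, SAXException {
        Map<String, Integer> counts = new TreeMap<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Namespace-aware readers fill in localName; others only provide qName.
                String name = localName.isEmpty() ? qName : localName;
                counts.merge(name, 1, Integer::sum);
            }
        });
        reader.parse(new InputSource(new StringReader(markup)));
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in reader: the JDK's strict XML parser, which only accepts well-formed input.
        // Assumption: with TagSoup available, new org.ccil.cowan.tagsoup.Parser()
        // could be used here instead to accept real-world HTML.
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        String markup = "<html><body><p>one</p><p>two</p></body></html>";
        System.out.println(countElements(reader, markup));
    }
}
```

The handler depends only on the `XMLReader` interface, so swapping the parser is a one-line change in the caller. That is the design benefit the description points to when it says standard XML tools can be applied to even the worst HTML.
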
License | Apache 2.0 |
---|---|
Categories | HTML Parsers |
Tags | html, parser, dom |
HomePage | http://home.ccil.org/~cowan/XML/tagsoup/ |
Date | Aug 22, 2011 |
Files | pom (1 KB), jar (88 KB) |
Repositories | Central, Clojars, EGov Pub, Geomajas, Loeyae, Mulesoft, Redhat GA, SciJava Public, Terrestris, USIT, WSO2 Public |
Ranking | #2750 in MvnRepository; #4 in HTML Parsers |
Used By | 186 artifacts |
Compile Dependencies (0)
Category/License | Group / Artifact | Version | Updates |
---|---|---|---|
Licenses
License | URL |
---|---|
Apache License 2.0 | http://www.apache.org/licenses/LICENSE-2.0.txt |
Developers
Name | Dev Id | Roles | Organization |
---|---|---|---|
John Cowan | | | |