TagSoup is a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: poor, nasty and brutish, though quite often far from short. TagSoup is designed for people who have to process this stuff using some semblance of a rational application design. By providing a SAX interface, it allows standard XML tools to be applied to even the worst HTML. TagSoup also includes a command-line processor that reads HTML files and can generate either ...
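To illustrate what the SAX interface buys you, here is a minimal sketch of a handler that collects link targets from any `XMLReader`. The class name `LinkCollector` and the method `collectLinks` are hypothetical names chosen for this example. So that it runs with the JDK alone, the `main` method uses the JDK's built-in reader on well-formed markup. With TagSoup on the classpath, you would instead pass an instance of `org.ccil.cowan.tagsoup.Parser`, which implements `XMLReader`, and the same handler should then cope with malformed HTML.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class LinkCollector {

    // Hypothetical helper: feeds markup through the given SAX reader
    // and returns the href value of every <a> element, in document order.
    static List<String> collectLinks(XMLReader reader, String markup)
            throws IOException, SAXException {
        List<String> links = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // qName is populated whether or not the reader is namespace-aware.
                if ("a".equalsIgnoreCase(qName)) {
                    String href = atts.getValue("href");
                    if (href != null) {
                        links.add(href);
                    }
                }
            }
        });
        reader.parse(new InputSource(new StringReader(markup)));
        return links;
    }

    public static void main(String[] args) throws Exception {
        // JDK reader used here so the sketch runs without extra dependencies.
        // Assumption: with TagSoup available, replace this line with
        //   XMLReader reader = new org.ccil.cowan.tagsoup.Parser();
        XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        String markup =
                "<html><body><a href=\"a.html\">A</a><a href=\"b.html\">B</a></body></html>";
        System.out.println(collectLinks(reader, markup));
    }
}
```

Because the handler depends only on the standard `org.xml.sax` interfaces, it does not need to know which parser produced the events. That is the design point the description makes about applying standard XML tools to HTML.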

License: Apache 2.0
Categories: HTML Parsers
Tags: html, parser, dom
Ranking: #2695 in MvnRepository; #3 in HTML Parsers
Used By: 180 artifacts

Series | Version | Vulnerabilities | Repository | Usages | Date
1.2.x | 1.2.1 | | Central | | Aug 22, 2011
1.2.x | 1.2 | | Central | | Jan 06, 2008
1.1.x | 1.1.3 | | Central | | Jun 11, 2007
1.0.x | 1.0.1 | | Central | 0 | Jun 09, 2007
0.9.x | 0.9.7 | | Central | | Dec 20, 2005