TagSoup is a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: poor, nasty and brutish, though quite often far from short. TagSoup is designed for people who have to process this stuff using some semblance of a rational application design. By providing a SAX interface, it allows standard XML tools to be applied to even the worst HTML. TagSoup also includes a command-line processor that reads HTML files and can generate either ...
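TagSoup exposes the standard SAX `org.xml.sax.XMLReader` interface, so SAX code does not need to know which parser is underneath. The sketch below illustrates this with a hypothetical `LinkCollector` class (not part of TagSoup) that collects the `href` of every `<a>` element from any `XMLReader`. It uses the JDK's built-in parser on well-formed XHTML so it compiles and runs without extra jars. For malformed HTML, the assumed drop-in is TagSoup's parser class, `org.ccil.cowan.tagsoup.Parser`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: not part of TagSoup itself.
public class LinkCollector {

    /** SAX handler that records the href of every <a> element it sees. */
    static class HrefHandler extends DefaultHandler {
        final List<String> hrefs = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Namespace-aware readers fill localName; others may leave it empty and set only qName.
            String name = (localName == null || localName.isEmpty()) ? qName : localName;
            if ("a".equalsIgnoreCase(name)) {
                String href = atts.getValue("href");
                if (href != null) {
                    hrefs.add(href);
                }
            }
        }
    }

    /** Runs any SAX XMLReader over the given markup and returns the collected hrefs. */
    static List<String> collectLinks(XMLReader reader, String markup) throws IOException, SAXException {
        HrefHandler handler = new HrefHandler();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(markup)));
        return handler.hrefs;
    }

    public static void main(String[] args) throws Exception {
        // The JDK's default SAX parser accepts only well-formed XML, so the input here is XHTML.
        // Assumption: with TagSoup on the classpath, passing new org.ccil.cowan.tagsoup.Parser()
        // instead of this reader would let the same handler process malformed HTML.
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        String xhtml = "<html><body><a href=\"one.html\">1</a>"
                + "<p><a href=\"two.html\">2</a></p></body></html>";
        System.out.println(collectLinks(reader, xhtml));
    }
}
```

The handler checks `localName` first and falls back to `qName`. A namespace-aware reader reports element names differently from one that is not, and this lets the same handler work with either kind of reader.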

License: Apache 2.0
Categories: HTML Parsers
Tags: html, parser
Used By: 149 artifacts



Series  Version  Repository  Usages  Date
1.2.x   1.2.1    Central     100     Aug 2011
1.2.x   1.2      Central     50      Jan 2008
1.1.x   1.1.3    Central     2       Jun 2007
1.0.x   1.0.1    Central     0       Jun 2007
0.9.x   0.9.7    Central     1       Dec 2005