TagSoup is a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: poor, nasty and brutish, though quite often far from short. TagSoup is designed for people who have to process this stuff using some semblance of a rational application design. By providing a SAX interface, it allows standard XML tools to be applied to even the worst HTML. TagSoup also includes a command-line processor that reads HTML files and can generate either ...
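Because TagSoup's parser class, `org.ccil.cowan.tagsoup.Parser`, implements the standard `org.xml.sax.XMLReader` interface, a content handler written for any SAX parser can consume its events. The sketch below is a minimal illustration under stated assumptions, not part of TagSoup itself. It assumes Java 8+, and the class and method names (`TagCount`, `ElementCounter`, `newReader`, `count`) are hypothetical. It counts elements by name. To stay runnable without the TagSoup jar, it looks the parser up by class name and falls back to the JDK's strict XML parser, which accepts only well-formed markup. With TagSoup on the classpath, the same handler can also be fed malformed HTML.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class TagCount {

    /** Hypothetical handler: counts start-element events by name; works with any SAX XMLReader. */
    static class ElementCounter extends DefaultHandler {
        final Map<String, Integer> counts = new HashMap<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Namespace-aware readers (TagSoup by default) fill localName;
            // non-namespace-aware readers leave it empty and report qName instead.
            String name = localName.isEmpty() ? qName : localName;
            counts.merge(name.toLowerCase(), 1, Integer::sum);
        }
    }

    /** Uses TagSoup's Parser if the jar is on the classpath, otherwise the JDK's XML parser. */
    static XMLReader newReader() throws Exception {
        try {
            // org.ccil.cowan.tagsoup.Parser implements org.xml.sax.XMLReader.
            return (XMLReader) Class.forName("org.ccil.cowan.tagsoup.Parser")
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ClassNotFoundException e) {
            // Fallback for this sketch only: a strict parser that rejects malformed HTML.
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        }
    }

    /** Parses the markup and returns element counts keyed by lowercase name. */
    static Map<String, Integer> count(String markup) {
        try {
            XMLReader reader = newReader();
            ElementCounter counter = new ElementCounter();
            reader.setContentHandler(counter);
            reader.parse(new InputSource(new StringReader(markup)));
            return counter.counts;
        } catch (Exception e) {
            // Wrap checked SAX/IO/reflection exceptions so callers need no throws clause.
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
                count("<html><body><p>one</p><p>two</p></body></html>");
        System.out.println(counts.get("p")); // prints 2
    }
}
```

The handler never refers to TagSoup directly. That decoupling is what lets standard SAX-based XML tooling sit on top of TagSoup's HTML parsing.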

License: Apache 2.0
Categories: HTML Parsers
Date: Aug 22, 2011
Files: pom (1 KB), jar (88 KB)
Repositories: Central, Adobe, Clojars, Geomajas, Redhat GA, Sonatype
Used By: 128 artifacts

Compile Dependencies (0)

Licenses: Apache License 2.0


Developers: John Cowan