org.ccil.cowan.tagsoup » tagsoup » 1.2


TagSoup is a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: poor, nasty and brutish, though quite often far from short. TagSoup is designed for people who have to process this stuff using some semblance of a rational application design. By providing a SAX interface, it allows standard XML tools to be applied to even the worst HTML. TagSoup also includes a command-line processor that reads HTML files and can generate either clean HTML or well-formed XML that is a close approximation to XHTML.
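Because the parser is exposed through the standard SAX interfaces, a handler written against `org.xml.sax` does not need to know which reader feeds it. The sketch below illustrates that idea with a hypothetical `LinkExtractor` class (the class and method names are ours, not part of TagSoup) that collects `href` values from `a` elements. To keep the example runnable without extra dependencies, `main` uses the JDK's built-in XML reader on well-formed input. With the TagSoup JAR on the classpath, the assumption is that you would pass `new org.ccil.cowan.tagsoup.Parser()` instead, since that class implements `XMLReader`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical SAX handler that collects the href of every <a> element. */
public class LinkExtractor extends DefaultHandler {
    private final List<String> links = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Readers that are not namespace-aware report an empty local name,
        // so fall back to the qualified name in that case.
        String name = (localName == null || localName.isEmpty()) ? qName : localName;
        if ("a".equalsIgnoreCase(name)) {
            String href = atts.getValue("href");
            if (href != null) {
                links.add(href);
            }
        }
    }

    public List<String> getLinks() {
        return links;
    }

    /** Runs the given SAX reader over the markup and returns the links found. */
    public static List<String> extract(XMLReader reader, String markup) throws Exception {
        LinkExtractor handler = new LinkExtractor();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(markup)));
        return handler.getLinks();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: with TagSoup on the classpath, swap in
        //   XMLReader reader = new org.ccil.cowan.tagsoup.Parser();
        // to accept HTML that is not well-formed.
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        String markup = "<html><body><a href=\"a.html\">A</a>"
                + "<p><a href=\"b.html\">B</a></p></body></html>";
        System.out.println(extract(reader, markup));
    }
}
```

The handler itself never references a concrete parser, so the only line that changes when moving from well-formed XML to real-world HTML is the one that constructs the `XMLReader`.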

Artifact Download (JAR) (88 KB)

This artifact depends on ...

Group Artifact Version

This artifact is used by ...

Group                            Artifact           Version
com.github.riccardove.easyjasub  easyjasub-lib      0.0.17
com.github.riccardove.easyjasub  easyjasub-lib      0.1.0
com.randomnoun.common            common-public      0.1.1
org.apache.tika                  tika-parsers       0.5
org.apache.tika                  tika-parsers       0.6
org.apache.tika                  tika-parsers       0.7
org.apache.tika                  tika-parsers       0.8
org.apache.tika                  tika-parsers       0.9
net.sf.ofx4j                     ofx4j              1.0 1.0.0
org.mnode.ical4j                 ical4j             1.0.2
org.mnode.ical4j                 ical4j             1.0.3
org.mnode.ical4j                 ical4j             1.0.4
net.sf.ofx4j                     ofx4j              1.1
com.randomnoun.maven.doxia       doxia-module-html  1.1.0
net.sf.ofx4j                     ofx4j              1.2
net.sf.ofx4j                     ofx4j              1.3
net.sf.ofx4j                     ofx4j              1.4
com.googlecode.jgenhtml          jgenhtml           1.5
net.sf.ofx4j                     ofx4j              1.6
net.sf.ofx4j                     ofx4j              1.6-RC1
net.sf.ofx4j                     ofx4j              1.6-RC2
org.apache.camel                 camel-bundle       1.6.0
org.apache.camel                 camel-bundle       1.6.1
org.apache.camel                 camel-bundle       2.0-M1
org.apache.camel                 camel-bundle       2.0-M2
org.apache.camel                 camel-bundle       2.0-M3
org.w3c.css                      css-validator      20100131
org.xml-cml                      cmlxom             3.1
org.basex                        basex              7.3.1


License URL
Apache License 2.0