Artifacts using webarchive-commons version 1.1.8
1. Heritrix 3: 'commons' Subproject (utility Classes), 10 usages
org.archive.heritrix » heritrix-commons (Apache, LGPL)
The Archive Commons Code Libraries project contains general Java utility
libraries, as used by the Heritrix crawler and other projects.
Last Release on Jul 27, 2022
NLPA is a framework designed to operate in conjunction with BDP4J
(https://github.com/sing-group/bdp4j). It can extract text from
Twitter, YouTube comments, text files, raw email files (.eml), or WARC
(Web Archive) files. The extracted text can be preprocessed into a
Dataset using task (org.bdp4j.pipe.Pipe) definitions. The framework
incorporates more than 30 preprocessing tasks to transform the text.
Last Release on Jul 26, 2021
9. WARC Discovery
uk.bl.wa.discovery » warc-discovery (Apache, GPL)
WARC Discovery
Last Release on Nov 28, 2020