NLPA is a framework designed to operate in conjunction with BDP4J
(https://github.com/sing-group/bdp4j). It can extract text from
Twitter, YouTube comments, text files, raw email files (.eml), and WARC
(Web Archive) files. The extracted text can be preprocessed into a
Dataset using task (org.bdp4j.pipe.Pipe) definitions. The framework
includes more than 30 preprocessing tasks for transforming the text.
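
The idea of chaining preprocessing tasks over extracted text can be illustrated with a minimal, self-contained sketch. It does not use the actual NLPA or BDP4J API: `TextTask`, `PipelineSketch`, and the three example tasks are hypothetical names chosen only for illustration, standing in for what a sequence of `org.bdp4j.pipe.Pipe` definitions might do conceptually.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

public class PipelineSketch {
    // Hypothetical stand-in for a preprocessing task.
    // This is NOT the org.bdp4j.pipe.Pipe interface.
    interface TextTask extends UnaryOperator<String> {}

    // Three illustrative tasks: drop URLs, lowercase, normalize whitespace.
    static final TextTask STRIP_URLS = s -> s.replaceAll("https?://\\S+", "");
    static final TextTask TO_LOWER = s -> s.toLowerCase(Locale.ROOT);
    static final TextTask COLLAPSE_WS = s -> s.trim().replaceAll("\\s+", " ");

    // Apply each task in order, feeding the output of one into the next.
    static String run(List<TextTask> tasks, String text) {
        String current = text;
        for (TextTask task : tasks) {
            current = task.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        List<TextTask> tasks = List.of(STRIP_URLS, TO_LOWER, COLLAPSE_WS);
        // Prints: check this out: now
        System.out.println(run(tasks, "Check THIS out: https://example.org   now"));
    }
}
```

The order of tasks matters in a chain like this: whitespace is collapsed last so that gaps left by removed URLs are cleaned up. For the real task definitions and how they are assembled, consult the NLPA and BDP4J repositories.
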
License | GPL 3.0 |
---|---|
HomePage | https://github.com/sing-group/nlpa |
Ranking | #839060 in MvnRepository |