Artifacts using Boilerpipe Boilerplate Removal and Fulltext Extraction From HTML Pages (28)

Apache Tika Parsers
Last Release on Apr 21, 2020
Jahia Server Implementation
Last Release on May 4, 2020
Apache Solr Content Extraction Library integrates the Apache Tika content extraction framework into Solr.
Last Release on Jul 13, 2020
epic
Last Release on Jul 19, 2019
This module is intended to be used while indexing documents. It is implemented as an UpdateProcessor to be placed in an UpdateChain. Its purpose is to identify the language of each document and tag the document with a language code.
Last Release on Jul 13, 2020
Apache Tika Parsers
Last Release on Dec 7, 2019
XWiki Platform - Office - Importer
Last Release on May 26, 2020


Apache Solr DataImportHandler Extras
Last Release on Jul 13, 2020
Lucene index management and query
Last Release on Mar 11, 2016
This is my common library.
Last Release on Jan 23, 2019