Artifacts using Boilerpipe Boilerplate Removal and Fulltext Extraction From HTML Pages (30)

Tika Parsers
Last Release on May 26, 2021
Jahia Server Implementation
Last Release on Feb 2, 2021
Apache Solr Content Extraction Library integrates the Apache Tika content extraction framework into Solr.
Last Release on Jun 16, 2021
Last Release on Jul 19, 2019
This module is intended to be used while indexing documents. It is implemented as an UpdateProcessor to be placed in an UpdateChain. Its purpose is to identify the language of each document and tag the document with the corresponding language code.
Last Release on Jun 16, 2021
Apache Tika Parsers
Last Release on Mar 22, 2021
XWiki Platform - Office - Importer
Last Release on May 24, 2021

This is my common library.
Last Release on Jan 23, 2019
Apache Solr DataImportHandler Extras
Last Release on Jun 16, 2021
Lucene index management and query
Last Release on Mar 11, 2016