Artifacts using pdfbox-tools version 2.0.12

Apache Tika Parsers
Last Release on Apr 21, 2020
Alfresco Repository
Last Release on Jun 30, 2020
Apache Solr Content Extraction Library integrates the Apache Tika content extraction framework into Solr
Last Release on May 27, 2020
Tess4J: A Java JNA wrapper for the Tesseract OCR API. Tess4J is released and distributed under the Apache License, v2.0. It provides optical character recognition (OCR) support for TIFF, JPEG, GIF, PNG, and BMP image formats, multi-page TIFF images, and the PDF document format.
Last Release on Jan 3, 2020
Apache PDFBox Application
Last Release on Jun 7, 2020
This module is intended to be used while indexing documents. It is implemented as an UpdateProcessor to be placed in an UpdateChain. Its purpose is to identify the language of documents and tag each document with a language code.
Last Release on May 27, 2020
Apache Solr DataImportHandler Extras
Last Release on May 27, 2020


The Apache PDFBox library is an open source Java tool for working with PDF documents. This artefact contains examples of how the library can be used.
Last Release on Jun 7, 2020
Hcbm Boot OCR
Last Release on Apr 7, 2020
A document conversion tool module implemented on top of libreoffice
Last Release on Jul 15, 2019