Artifacts using Apache PDFBox Tools (61)

Apache Tika Parsers
Last Release on Apr 21, 2020
OpenCms is an enterprise-ready, easy to use website content management system based on Java and XML technology. Offering a complete set of features, OpenCms helps content managers worldwide to create and maintain beautiful websites fast and efficiently.
Last Release on Sep 5, 2019
Alfresco Repository
Last Release on Jun 30, 2020
GATE - general architecture for text engineering - is open source software capable of solving almost any text processing problem. This artifact enables you to embed the core GATE Embedded with its essential dependencies. You will be able to use the GATE Embedded API and load and store GATE XML documents. This artifact is the perfect dependency for CREOLE plugins or for applications that need to customize the GATE dependencies due to conflict with their own ...
Last Release on Jan 17, 2020
Apache Solr Content Extraction Library integrates Apache Tika content extraction framework into Solr
Last Release on May 27, 2020
Tess4J: A Java JNA wrapper for Tesseract OCR API. Tess4J is released and distributed under the Apache License, v2.0. Features: the library provides optical character recognition (OCR) support for TIFF, JPEG, GIF, PNG, and BMP image formats; multi-page TIFF images; and the PDF document format.
Last Release on Jan 3, 2020


OSGi bundle that contains tika-parsers. Repackaged to include the full ooxml-schemas instead of the poi-ooxml-schemas subset. This is done to provide more parsing capabilities when using Tika. https://issues.apache.org/jira/browse/TIKA-2094
Last Release on Feb 6, 2020
Apache PDFBox Application
Last Release on Jun 7, 2020
Apache Tika Parsers
Last Release on Dec 7, 2019