FileGazer - Installation

This document describes the installation and commissioning of FileGazer.

FileGazer can be installed in two ways: as a standalone application or in a Docker environment.

If a complete and simple installation is needed quickly, the docker-compose variant is recommended.

Requirements

Java

The current version (as of Jan 2026) requires Java version 21. Other versions are not supported.
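
Because other Java versions are not supported, it can help to check the installed version before starting FileGazer. The following is a minimal sketch, not part of FileGazer: the helper name `java_major` is hypothetical, and it assumes that the first line of the `java -version` output has the usual `version "21.x.y"` form.

```shell
#!/bin/sh
# Hypothetical helper: extract the major version number from the
# first line of `java -version` output, passed in as $1.
java_major() {
  printf '%s\n' "$1" | sed -n 's/.*version "\([0-9]*\).*/\1/p' | head -n 1
}

if command -v java >/dev/null 2>&1; then
  # `java -version` writes to stderr, so redirect it to stdout.
  found=$(java_major "$(java -version 2>&1 | head -n 1)")
  if [ "$found" = "21" ]; then
    echo "Java 21 found"
  else
    echo "Java major version '$found' found, FileGazer requires 21"
  fi
else
  echo "java not found on PATH"
fi
```

If the environment prints extra lines before the version line (for example from `JAVA_TOOL_OPTIONS`), the parsed value will be empty and the check reports a mismatch.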

Tesseract (OCR)

FileGazer comes with all the libraries needed to run the application. However, if OCR is required (via Tika), the Tesseract application must be installed on the respective host.

This is not necessary if FileGazer runs in a Docker environment, since the FileGazer Docker image already includes both the appropriate Java version and Tesseract.

By default, FileGazer uses German (deu) for text recognition. This setting is stored in application.properties and can be changed at any time.

When installing Tesseract, make sure that the language pack for the configured language is installed. With FileGazer's default configuration, this is German (deu).

Selecting a language in application.properties whose language pack is not installed will result in significant errors.

Further information:

Linux (Ubuntu)

    sudo apt-get update
    sudo apt-get install tesseract-ocr
    sudo apt-get install tesseract-ocr-deu

Linux (RedHat)

    sudo dnf install tesseract
    sudo dnf install tesseract-langpack-deu

macOS

    brew install tesseract tesseract-lang

Windows

An installation package for Windows is available for download at

https://github.com/UB-Mannheim/tesseract

Installation is performed via a GUI.
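
After installing Tesseract, the available language packs can be listed with `tesseract --list-langs`. The following sketch checks whether the language FileGazer uses by default (deu) is among them. The helper name `has_lang` is hypothetical. If a different language is configured in application.properties, adjust the `lang` variable accordingly.

```shell
#!/bin/sh
# Hypothetical helper: succeed if language code $1 appears as a whole
# line in $2, which is expected to hold `tesseract --list-langs` output.
has_lang() {
  printf '%s\n' "$2" | grep -qxF -- "$1"
}

lang="deu"  # FileGazer's default OCR language

if command -v tesseract >/dev/null 2>&1; then
  # Some Tesseract versions print the list to stderr, so capture both.
  if has_lang "$lang" "$(tesseract --list-langs 2>&1)"; then
    echo "Tesseract language '$lang' is installed"
  else
    echo "Tesseract language '$lang' is missing"
  fi
else
  echo "tesseract not found on PATH"
fi
```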

Standalone Installation

To install FileGazer, simply copy the current JAR file (for example, filegazer1.0.3.jar) to the target system. Use a separate, new directory.

Start FileGazer with:

    java -jar filegazer1.0.3.jar

When the application starts for the first time, it creates all necessary directories and the default configuration files:

| Directory | Description |
| --- | --- |
| ./doc | Documentation |
| ./etc | Configurations. At this level: FileGazerContentAnalyse.xml and FileGazerScripts.xml |
| ./etc/scripts | (Groovy) scripts called via scheduler or event |
| ./etc/xslt | Transformations that can adapt the resulting XML when calling the REST service |
| ./ExampleData | Example files/documents |
| ./log | Log files |
| ./processing | Batch processing directories |

Once launched, FileGazer will be available at

http://localhost:8080

and should display this screen:

[Screenshot: FileGazer start screen]

This application allows you to examine files and analyze the resulting XML. It also provides (read-only) access to FileGazerContentAnalyse.xml and FileGazerScripts.xml.
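
When FileGazer is started from a script, it can be useful to wait until the web interface at http://localhost:8080 responds before continuing. The following is a minimal sketch that polls a URL with curl. The function name `wait_for_url` and its parameters are hypothetical, and the number of attempts and the delay are arbitrary choices.

```shell
#!/bin/sh
# Hypothetical helper: poll URL $1 up to $2 times, sleeping $3 seconds
# between attempts. Returns 0 as soon as the URL answers successfully,
# 1 if it never does (or if curl is not available).
wait_for_url() {
  url=$1
  tries=$2
  delay=$3
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -fsS --max-time 5 -o /dev/null "$url" 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example call (not executed here):
#   wait_for_url http://localhost:8080 30 2 && echo "FileGazer is up"
```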

Batch Processing

The default configuration includes sample scripts that demonstrate how scripting can be used within FileGazer.

Among other things, five scripts are provided for use in batch mode.

These scripts all work on the same principle: files placed in a script's in directory are processed within 10 seconds, and the results are placed in its out directory. The original files are deleted from the in directory.

Example: to determine the MIME types of one or more files, place the files under ./processing/doc2mimetype/in.

FileGazer then examines the files, determines the MIME type of each one, and places each file under ./processing/doc2mimetype/out/[mimetype], where [mimetype] is the name of the detected MIME type.
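
The MIME type example can be sketched as a small shell helper. The function name `submit_for_mimetype` is hypothetical. It copies the file rather than moving it, so the original is kept even though FileGazer deletes the copy from the in directory.

```shell
#!/bin/sh
# Hypothetical helper: copy file $2 into the doc2mimetype in directory
# below the FileGazer base directory $1.
submit_for_mimetype() {
  in_dir="$1/processing/doc2mimetype/in"
  if [ ! -d "$in_dir" ]; then
    echo "missing directory: $in_dir" >&2
    return 1
  fi
  cp -- "$2" "$in_dir/"
}

# Example call (not executed here), run from the FileGazer directory:
#   submit_for_mimetype . /path/to/some/file
#   sleep 15
#   ls -R ./processing/doc2mimetype/out
```

The 15-second pause in the example call is only a guess based on the 10-second processing interval mentioned above.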

The following batch processing scripts are currently available:

| Script | Directory | Description |
| --- | --- | --- |
| ProcessingDefault.groovy | ./processing/default | An XML file containing all analysis information is stored in the out directory (same name plus .xml). |
| ProcessingDoc2Classifing.groovy | ./processing/doc2classifing | Subdirectories are created in the out directory according to the selected document classification, and the files are copied there. |
| ProcessingDoc2MimeType.groovy | ./processing/doc2mimetype | Subdirectories are created in the out directory according to the MIME types, and the files are copied there. |
| ProcessingDoc2PDF.groovy | ./processing/doc2pdf | The files from the in directory are converted to PDF files (if technically possible) and copied to the out directory. |
| ProcessingDocText.groovy | ./processing/doc2txt | The files from the in directory are converted to text (if technically possible, using OCR if necessary) and copied to the out directory. |

Docker Installation

FileGazer is available as a Docker image on Docker Hub (https://hub.docker.com/r/samoak/filegazer) and can be pulled using

    docker pull samoak/filegazer 

In addition to FileGazer and the correct Java version, the image also includes the latest installation of Tesseract.

The image is started using:

    docker run -d \
      -v /home/myUser/filegazer/log:/home/filegazer/log \
      -v /home/myUser/filegazer/processing:/home/filegazer/processing \
      -p 8080:8080 \
      samoak/filegazer:latest

The two directories in the Docker container (/home/filegazer/log and /home/filegazer/processing) are mapped to /home/myUser/filegazer/log and /home/myUser/filegazer/processing on the host.
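
If the host directories do not exist when the container starts, Docker typically creates them itself, usually owned by root. Creating them beforehand as the intended user avoids this. The following is a minimal sketch. The function name `prepare_host_dirs` is hypothetical, and the base path is only the example path used above.

```shell
#!/bin/sh
# Hypothetical helper: create the log and processing directories
# below the host base directory $1 so they can be bind-mounted.
prepare_host_dirs() {
  mkdir -p "$1/log" "$1/processing"
}

# Example call (not executed here):
#   prepare_host_dirs /home/myUser/filegazer
```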

docker-compose

In addition to Tesseract, FileGazer requires the Gotenberg REST service. This is only available as a Docker image.

FileGazer can also run as a standalone installation that accesses a Gotenberg Docker container.

Alternatively, the following docker-compose file can be used to run FileGazer and Gotenberg together in a Docker environment.

    version: '3.8'
    services:

      gotenberg:
        container_name: gotenberg
        image: docker.io/gotenberg/gotenberg:latest
        ports:
          - '3000:3000'
        restart: unless-stopped
        # The gotenberg chromium route is used to convert .eml files. We do not
        # want to allow external content like tracking pixels or even javascript.
        command:
          - "gotenberg"
          - "--chromium-disable-javascript=true"
          - "--chromium-allow-list=file:///tmp/.*"
        healthcheck:
          test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
          interval: 10s
          timeout: 5s
          retries: 5

      filegazer:
        depends_on:
          gotenberg:
            condition: service_healthy
        container_name: filegazer
        image: samoak/filegazer:latest
        environment:
          - OPENAI_API_KEY=xxxxxxreplace_with_your_own_xxxxxxxxx
          - GEMINI_API_KEY=xxxxxxreplace_with_your_own_xxxxxxxxx
          - CLAUDE_API_KEY=xxxxxxreplace_with_your_own_xxxxxxxxx
        ports:
          - '8080:8080'
        # Adjust the directories to your needs
        volumes:
          - ./filegazer/log:/home/filegazer/log
          - ./filegazer/processing:/home/filegazer/processing
          - ./filegazer/etc:/home/filegazer/etc
        restart: unless-stopped
        healthcheck:
          test: ["CMD", "curl", "-f", "http://localhost:8080/actuator/health"]
          interval: 10s
          timeout: 5s
          retries: 5

    networks:
      default:
        name: filegazer_net
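
The compose file contains placeholder values for the API keys. Before starting the stack, it can help to check whether any of them are still in place. The following sketch assumes the file is saved as docker-compose.yml in the current directory. The function name `has_placeholder_keys` is hypothetical. Which of the keys are actually needed depends on the features used and is not covered here.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the compose file $1 still contains
# the placeholder text used for the API keys.
has_placeholder_keys() {
  grep -q 'replace_with_your_own' "$1"
}

compose_file="docker-compose.yml"  # assumed file name

if [ -f "$compose_file" ]; then
  if has_placeholder_keys "$compose_file"; then
    echo "$compose_file still contains placeholder API keys"
  else
    echo "$compose_file contains no placeholder API keys"
  fi
fi
```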

Start with

    sudo docker-compose up -d

Stop with

    sudo docker-compose down

Post-Install configurations

This section describes possible or necessary adjustments that may be required after installation.

application.properties

Tesseract

Scripting

Deactivating "Batch processing"