FileGazer - Installation
This document describes the installation and commissioning of FileGazer.
FileGazer can be installed either as a standalone application or in a Docker environment.
If a complete and simple installation is needed quickly, the docker-compose variant is recommended.
Requirements
Java
The current version (as of Jan 2026) requires Java version 21. Other versions are not supported.
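A quick pre-flight check can catch a wrong Java version before the first start. The following is only a minimal sketch and not part of FileGazer: the function names `java_major_version` and `is_supported_java` are made up for this example, and the parsing assumes the usual `version "..."` banner printed by `java -version`.

```shell
# Extract the major version from a `java -version` banner.
# Handles the modern scheme ("21.0.2") and the legacy one ("1.8.0_392").
java_major_version() {
    # $1: banner text, e.g. 'openjdk version "21.0.2" 2024-01-16'
    ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    case "$ver" in
        1.*) printf '%s\n' "$ver" | cut -d. -f2 ;;
        *)   printf '%s\n' "$ver" | cut -d. -f1 ;;
    esac
}

# Succeed only if the banner reports Java 21 (the version required above).
is_supported_java() {
    [ "$(java_major_version "$1")" = "21" ]
}
```

With a JDK on the PATH, the check could be run as `is_supported_java "$(java -version 2>&1)" || echo "FileGazer requires Java 21" >&2`. The redirect is needed because `java -version` writes its banner to standard error.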
Tesseract (OCR)
FileGazer comes with all the libraries needed to run the application. However, if OCR is required (via Tika), the Tesseract application must be installed on the respective host.
If FileGazer runs in a Docker environment, this is not necessary, because the FileGazer Docker image already contains both the appropriate Java version and Tesseract.
By default, FileGazer uses German (deu) to recognize text. This setting is stored in application.properties and can be changed at any time.
When installing Tesseract, make sure that the language pack for the configured language is installed. With the default configuration, this is German (deu).
Selecting a language in application.properties for which no language pack is installed will result in significant errors.
Further information:
Linux (Ubuntu)
sudo apt-get update
sudo apt-get install tesseract-ocr
sudo apt-get install tesseract-ocr-deu
Linux (RedHat)
sudo dnf install tesseract
sudo dnf install tesseract-langpack-deu
macOS
brew install tesseract tesseract-lang
Windows
An installation package for Windows is available for download at
https://github.com/UB-Mannheim/tesseract
Installation is performed via a GUI.
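To confirm that the language configured in application.properties is actually installed, the output of `tesseract --list-langs` can be checked. This is a small sketch; the helper name `has_tesseract_lang` is an assumption made for illustration and not something FileGazer or Tesseract provides.

```shell
# Read `tesseract --list-langs` output on stdin and succeed if the
# language code given as $1 appears as a line of its own.
has_tesseract_lang() {
    grep -qxF -- "$1"
}
```

On a host with Tesseract installed, `tesseract --list-langs 2>&1 | has_tesseract_lang deu || echo "language pack deu is missing" >&2` would report a missing German pack. Standard error is merged in because some Tesseract versions print the list there.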
Standalone Installation
To install FileGazer, simply copy the current JAR file (for example, filegazer1.0.3.jar) to the target system. Use a separate, new directory.
Start FileGazer with:
java -jar filegazer1.0.3.jar
When the application starts for the first time, all necessary directories and the default configuration files are created:
| Directory | Description |
|---|---|
| ./doc | Documentation |
| ./etc | Configurations. At this level: FileGazerContentAnalyse.xml and FileGazerScripts.xml |
| ./etc/scripts | (Groovy) scripts called via scheduler or event |
| ./etc/xslt | Transformations that can adapt the resulting XML when calling the REST service |
| ./ExampleData | Example files/documents |
| ./log | Log files |
| ./processing | Batch processing directories |
Once launched, FileGazer will be available at
http://localhost:8080
and should display this screen:

This application allows you to examine files and analyze the resulting XML. It also provides (read-only) access to FileGazerContentAnalyse.xml and FileGazerScripts.xml.
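In scripted setups it can help to wait until the web interface answers before continuing. The sketch below polls a URL with curl. The function name `wait_for_http` and its default of 30 attempts with a 2 second pause are assumptions chosen for this example.

```shell
# Poll a URL until it answers with an HTTP success status,
# or give up after a number of attempts.
wait_for_http() {
    url=$1
    attempts=${2:-30}   # how often to try (assumed default)
    delay=${3:-2}       # seconds between tries (assumed default)
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if curl -fsS -o /dev/null "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

A call such as `wait_for_http http://localhost:8080 && echo "FileGazer is up"` would block until the start page is reachable. The Docker health check further below uses http://localhost:8080/actuator/health, which may be a more precise target if that endpoint is also enabled in the standalone installation.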
Batch Processing
The default configuration includes sample scripts that demonstrate how scripting can be used within FileGazer.
Among other things, five scripts are provided for use in batch mode.
These scripts always work according to the same principle: files placed in the corresponding in directory are processed within 10 seconds, and the results are placed in the out directory. The original files are then deleted from the in directory.
Example: The MIME types of one or more files must be determined. The files to be examined are placed under ./processing/doc2mimetype/in.
FileGazer then examines the files, determines the MIME type, and places the file under ./processing/doc2mimetype/out/[mimetype] ([mimetype] is the name of the MIME type).
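The batch workflow described above can be driven from a script. The following sketch copies a file into an in directory, waits until FileGazer has removed it again, and then lists matching files under out. The function name `submit_and_wait`, the 30 second default timeout, and the assumption that result files start with the original file name are all illustrative; for scripts that rename their output, the final search would need adjusting.

```shell
# Copy a file into <base>/in, wait until it has been picked up
# (i.e. removed from in), then print matching files under <base>/out.
submit_and_wait() {
    file=$1
    base=$2               # e.g. ./processing/doc2mimetype
    timeout=${3:-30}      # seconds to wait (assumed default)
    name=$(basename "$file")
    cp "$file" "$base/in/$name" || return 1
    waited=0
    while [ -e "$base/in/$name" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1      # still not processed
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # Results may sit in a subdirectory (for example out/[mimetype]).
    find "$base/out" -type f -name "$name*"
}
```

For the MIME type example, `submit_and_wait report.pdf ./processing/doc2mimetype` would print the path of the sorted file once processing has finished (report.pdf is a placeholder file name).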
The following batch processing scripts are currently available:
| Script | Directory | Description |
|---|---|---|
| ProcessingDefault.groovy | ./processing/default | An XML file containing all analysis information is stored in the out directory (same name as the input file plus .xml). |
| ProcessingDoc2Classifing.groovy | ./processing/doc2classifing | Subdirectories are created in the out directory according to the selected document classification, and the files are copied there. |
| ProcessingDoc2MimeType.groovy | ./processing/doc2mimetype | Subdirectories are created in the out directory according to the MIME types, and the files are copied there. |
| ProcessingDoc2PDF.groovy | ./processing/doc2pdf | The files from the in directory are converted to PDF files (if technically possible) and copied to the out directory. |
| ProcessingDocText.groovy | ./processing/doc2txt | The files from the in directory are converted to text (if technically possible, using OCR if necessary) and copied to the out directory. |
Docker Installation
FileGazer is available as a Docker image on Docker Hub (https://hub.docker.com/r/samoak/filegazer) and can be pulled using
docker pull samoak/filegazer
In addition to FileGazer and the correct Java version, the image also includes the latest installation of Tesseract.
The image is started using:
docker run -d -v /home/myUser/filegazer/log:/home/filegazer/log -v /home/myUser/filegazer/processing:/home/filegazer/processing -p 8080:8080 samoak/filegazer:latest
The two directories in the Docker container (/home/filegazer/log and /home/filegazer/processing) are mapped to the host directories /home/myUser/filegazer/log and /home/myUser/filegazer/processing.
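If a host path passed to -v does not exist yet, a default (rootful) Docker setup creates it, typically owned by root. Creating the directories beforehand as the intended user avoids that. The helper name `prepare_host_dirs` below is hypothetical, and whether the container user can write to these directories depends on the image, which this sketch does not verify.

```shell
# Create the host-side directories for the two bind mounts
# before the container is started.
prepare_host_dirs() {
    base=$1   # e.g. /home/myUser/filegazer
    mkdir -p "$base/log" "$base/processing"
}

# Possible usage (needs Docker, so it is left commented out):
#   prepare_host_dirs /home/myUser/filegazer
#   docker run -d \
#     -v /home/myUser/filegazer/log:/home/filegazer/log \
#     -v /home/myUser/filegazer/processing:/home/filegazer/processing \
#     -p 8080:8080 samoak/filegazer:latest
```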
docker-compose
In addition to Tesseract, FileGazer requires the Gotenberg REST service, which is only available as a Docker image.
FileGazer can also run as a standalone installation that accesses a Gotenberg Docker container.
Alternatively, the following docker-compose file can be used to run FileGazer and Gotenberg together in a Docker environment.
version: '3.8'
services:
  gotenberg:
    container_name: gotenberg
    image: docker.io/gotenberg/gotenberg:latest
    ports:
      - '3000:3000'
    restart: unless-stopped
    # The gotenberg chromium route is used to convert .eml files. We do not
    # want to allow external content like tracking pixels or even javascript.
    command:
      - "gotenberg"
      - "--chromium-disable-javascript=true"
      - "--chromium-allow-list=file:///tmp/.*"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 10s
      timeout: 5s
      retries: 5
  filegazer:
    depends_on:
      gotenberg:
        condition: service_healthy
    container_name: filegazer
    image: samoak/filegazer:latest
    environment:
      - OPENAI_API_KEY=xxxxxxreplace_with_your_own_xxxxxxxxx
      - GEMINI_API_KEY=xxxxxxreplace_with_your_own_xxxxxxxxx
      - CLAUDE_API_KEY=xxxxxxreplace_with_your_own_xxxxxxxxx
    ports:
      - '8080:8080'
    # Adjust directory for your need
    volumes:
      - ./filegazer/log:/home/filegazer/log
      - ./filegazer/processing:/home/filegazer/processing
      - ./filegazer/etc:/home/filegazer/etc
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/actuator/health"]
      interval: 10s
      timeout: 5s
      retries: 5
networks:
  default:
    name: filegazer_net
Start with
sudo docker-compose up -d
Stop with
sudo docker-compose down
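Because both services define a health check, their state can be queried after startup. The sketch below wraps `docker inspect` for that purpose. The function names `container_health` and `all_healthy` are invented for this example, while the container names gotenberg and filegazer come from the compose file above.

```shell
# Print the health status Docker reports for one container
# (starting, healthy or unhealthy).
container_health() {
    docker inspect --format '{{.State.Health.Status}}' "$1"
}

# Succeed only if every named container reports "healthy".
all_healthy() {
    for name in "$@"; do
        [ "$(container_health "$name")" = "healthy" ] || return 1
    done
}
```

After `docker-compose up -d`, a call like `all_healthy gotenberg filegazer && echo "stack is up"` would confirm that both containers passed their checks. Depending on the setup, the docker call may need sudo, as in the commands above.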
Post-Installation Configuration
This section describes adjustments that may be necessary after installation.