Loading Performance and Resource Consumption

Tuning Loading Performance

Note: This section will become obsolete with the revamped storage engine.

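To improve the loading performance of Tentris for large datasets, set the following Linux virtual memory parameters in a terminal. They allow the kernel to keep a much larger share of modified (dirty) pages in memory and to write them back to disk less often.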

sudo sysctl vm.dirty_writeback_centisecs=30000
sudo sysctl vm.dirty_ratio=90
sudo sysctl vm.dirty_background_ratio=80
sudo sysctl vm.dirty_expire_centisecs=300000000
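Values set with `sysctl` on the command line last only until the next reboot. The sketch below assumes a Linux system with the procps `sysctl` tool and an `/etc/sysctl.d/` drop-in directory. It prints the current values in `sysctl.conf` format so they can be restored later, and it prints a drop-in fragment with the tuned values. The function names and the file names in the usage note are hypothetical and are not shipped with Tentris.

```sh
#!/bin/sh
# Sketch: helpers for the vm.dirty_* tuning above.
# Function names are hypothetical, not part of Tentris.

# The four parameters from the tuning section, as key=value pairs.
TUNED_PARAMS="vm.dirty_writeback_centisecs=30000
vm.dirty_ratio=90
vm.dirty_background_ratio=80
vm.dirty_expire_centisecs=300000000"

# Print the current value of each parameter in sysctl.conf format
# ("key = value"), so it can be re-applied later with `sysctl -p FILE`.
backup_current_values() {
  local pair key
  for pair in $TUNED_PARAMS; do
    key="${pair%%=*}"
    printf '%s = %s\n' "$key" "$(sysctl -n "$key")"
  done
}

# Print a sysctl.d drop-in fragment containing the tuned values.
print_tuned_config() {
  local pair
  printf '# Tentris bulk-loading tuning\n'
  for pair in $TUNED_PARAMS; do
    printf '%s = %s\n' "${pair%%=*}" "${pair#*=}"
  done
}
```

As an example, `backup_current_values > vm-dirty-backup.conf` records the current state. Running `print_tuned_config | sudo tee /etc/sysctl.d/90-tentris-loading.conf` followed by `sudo sysctl --system` applies the tuned values persistently. After loading is done, deleting that file and running `sudo sysctl -p vm-dirty-backup.conf` restores the previous values.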

We are currently working on a new storage engine that should remove the need to set these parameters.

Index Storage Footprint

Note: Storage efficiency will improve substantially with the revamped storage engine.

Disk space requirements depend on the specifics of the dataset, especially on the total amount of text in literals. For reference, we provide numbers for the WatDiv dataset and for the literal-heavy DBpedia and Wikidata datasets.

Dataset	Triples	Disk space	Triples/GB
WatDiv 1B Triples	1B	58GB	16.6M
Wikidata 2025-01-22 truthy beta	7.8B	858GB	8.7M
DBpedia snapshot 2022-12	1.15B	175GB	6.3M
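The Triples/GB column can serve as a rough rule of thumb for planning disk space. The sketch below assumes that your dataset resembles one of the reference datasets in its share of literal text, so that its index size scales roughly linearly with the triple count at that density. The helper `estimate_index_gb` is hypothetical and not part of Tentris.

```sh
#!/bin/sh
# Sketch: rough index-size estimate (hypothetical helper, not part of Tentris).
# Divides a triple count by a reference density given in millions of triples
# per GB (e.g. a value from the Triples/GB column) and rounds to whole GB.
# Usage: estimate_index_gb TRIPLES DENSITY_MILLIONS_PER_GB
estimate_index_gb() {
  awk -v t="$1" -v d="$2" 'BEGIN {
    if (d <= 0) { print "density must be positive" > "/dev/stderr"; exit 1 }
    printf "%.0f\n", t / (d * 1000000)
  }'
}
```

As an example, `estimate_index_gb 500000000 6.3` gives about 79 GB for a 500M-triple dataset with DBpedia-like density. For data with many long literals, the lower densities in the table are the more conservative choice.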

Snapshots Storage Footprint

Tentris is designed to store snapshots incrementally. On modern file systems with copy-on-write (CoW) support, e.g., XFS, ZFS, or Btrfs, a snapshot requires storage only for the data added since the previous snapshot.
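You can test the file system directly to find out which case applies to a given data directory. The sketch below assumes that Tentris's incremental snapshots rely on file-level copy-on-write clones (reflinks). The documentation does not state the mechanism, so treat the result only as an indicator. The sketch also requires GNU coreutils (`cp --reflink`, `mktemp -p`). On ZFS, support for file cloning depends on the OpenZFS version and configuration, so a negative result there is not conclusive. The helper `probe_reflink` is hypothetical.

```sh
#!/bin/sh
# Sketch: check whether the file system holding DIR can create copy-on-write
# file clones (reflinks) by trying GNU `cp --reflink=always` on a tiny file.
# Returns 0 if cloning works, 1 if it does not, 2 if the probe cannot run.
# (Hypothetical helper, not part of Tentris.)
probe_reflink() {
  local dir src dst rc
  dir="${1:-.}"
  # Create the probe file inside DIR so the test hits the right file system.
  src="$(mktemp -p "$dir" .reflink-probe.XXXXXX)" || return 2
  dst="$src.clone"
  printf 'probe\n' > "$src"
  if cp --reflink=always "$src" "$dst" 2>/dev/null; then
    rc=0
  else
    rc=1
  fi
  rm -f "$src" "$dst"
  return "$rc"
}
```

As an example, if `probe_reflink /path/to/tentris-data` (a placeholder for the directory holding the index) returns 1, snapshots in that directory would likely be full copies under the assumption above. Running `stat -f -c %T /path/to/tentris-data` prints the file system type for comparison. On XFS, reflink support is fixed when the file system is created, so the type alone does not settle the question.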

Note: On file systems where Tentris cannot use CoW (e.g., ext3/ext4, NTFS, APFS on macOS, FAT32), a snapshot is a full copy of the index, so multiple snapshots quickly add up to a large storage footprint. With the revamped storage engine, Tentris will support CoW snapshots independently of the file system.
