Loading Performance and Resource Consumption

Tuning Loading Performance

Note: This section will be obsolete with the revamped storage engine.

To improve the loading performance of Tentris for large datasets, set the following virtual memory parameters in a terminal before loading. Values set with sysctl this way take effect immediately and last until the next reboot.

sudo sysctl vm.dirty_writeback_centisecs=30000
sudo sysctl vm.dirty_ratio=90
sudo sysctl vm.dirty_background_ratio=80
sudo sysctl vm.dirty_expire_centisecs=300000000

We are currently working on a new storage engine which should remove the need to set these parameters in the future.
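Because the commands above change system-wide kernel settings, it can be useful to record the current values first so they can be restored once loading has finished. The following is a minimal sketch, not part of Tentris: the function name `print_restore_commands` and the `PROC_SYS` override are hypothetical and exist only for illustration. It assumes a Linux system that exposes these settings under /proc/sys and that uses the ratio-based settings rather than `vm.dirty_bytes`.

```shell
#!/bin/sh
# Print the sysctl commands that would restore the *current* values of the
# four parameters changed above. Run this before applying the tuning.
# PROC_SYS is a hypothetical override, useful only for testing.
PROC_SYS="${PROC_SYS:-/proc/sys}"

print_restore_commands() {
    for key in vm.dirty_writeback_centisecs vm.dirty_ratio \
               vm.dirty_background_ratio vm.dirty_expire_centisecs; do
        # Map the sysctl key to its file, e.g. vm.dirty_ratio -> vm/dirty_ratio
        path="$PROC_SYS/$(printf '%s' "$key" | tr . /)"
        if [ -r "$path" ]; then
            printf 'sudo sysctl %s=%s\n' "$key" "$(cat "$path")"
        else
            printf '# %s: not readable on this system\n' "$key"
        fi
    done
}

print_restore_commands
```

Redirecting the output to a file before tuning (for example `sh save-vm.sh > restore-vm.sh`, both file names being placeholders) leaves a small script that can be run after loading to return to the previous values.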

Index Storage Footprint

Note: The storage efficiency improves substantially with the revamped storage engine.

The disk space requirement depends on the specifics of the dataset, especially on the total amount of text in literals. For reference, we provide numbers for the WatDiv dataset and for the text-literal-heavy DBpedia and Wikidata datasets.

Dataset                            Triples   Space    Triples/GB
WatDiv 1B Triples                  1B        58GB     16.6M
Wikidata 2025-01-22 truthy beta    7.8B      858GB    8.7M
DBpedia snapshot 2022-12           1.15B     175GB    6.3M
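The Triples/GB figures can serve as a rough planning aid: dividing the number of triples in a dataset by the density of the most similar reference dataset gives a ballpark index size. The sketch below illustrates only that arithmetic. The function name `estimate_gb` is hypothetical, and the result is an estimate whose accuracy depends on how closely the dataset resembles the reference, in particular in its share of text literals.

```shell
#!/bin/sh
# estimate_gb TRIPLES TRIPLES_PER_GB
# Prints the estimated index size in GB, rounded to a whole number.
estimate_gb() {
    awk -v n="$1" -v d="$2" 'BEGIN { printf "%.0f\n", n / d }'
}

# 500M triples at the WatDiv density of 16.6M triples/GB
estimate_gb 500000000 16600000    # prints 30

# 2B triples at the Wikidata density of 8.7M triples/GB
estimate_gb 2000000000 8700000    # prints 230
```

For a dataset of unknown character, estimating with the lowest density in the table (DBpedia, 6.3M triples/GB) gives the more conservative figure.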

Snapshots Storage Footprint

Tentris is designed to store snapshots incrementally. On modern file systems with copy-on-write (CoW, supported on, e.g., XFS, ZFS, Btrfs), a snapshot requires storage only for the data that was added since the last snapshot.

Note: On file systems without CoW (e.g., ext3/4, NTFS, macOS' APFS, FAT32), a snapshot is a full copy of the index, so multiple snapshots quickly add up to a large storage footprint. Tentris will support CoW snapshots independent of the file system with the revamped storage engine.
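To see which of the two cases applies to a given machine, the file system type of the storage directory can be looked up and compared against the lists above. The following is a minimal sketch under stated assumptions: `classify_fs` and `DATA_DIR` are hypothetical names, the type strings are those reported by `findmnt` on Linux (where FAT32 shows up as `vfat`), and the classification simply mirrors the examples given in this section rather than probing the file system itself.

```shell
#!/bin/sh
# classify_fs FSTYPE
# Prints "cow", "full-copy", or "unknown", following the lists in this section.
classify_fs() {
    case "$1" in
        xfs|zfs|btrfs)            echo cow ;;
        ext3|ext4|ntfs|apfs|vfat) echo full-copy ;;
        *)                        echo unknown ;;
    esac
}

# Hypothetical: directory that holds the Tentris index (defaults to ".").
DATA_DIR="${DATA_DIR:-.}"

# findmnt prints the type of the file system that contains DATA_DIR.
fstype=$(findmnt -n -o FSTYPE --target "$DATA_DIR" 2>/dev/null || echo unknown)
echo "$DATA_DIR is on '$fstype': $(classify_fs "$fstype")"
```

A result of `cow` only means the file system type is one of those named above. The type name alone does not prove that CoW copies are enabled on that particular volume, so comparing disk usage before and after a snapshot remains the more reliable check.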