Loading Performance and Resource Consumption
Tuning Loading Performance
Note: This section will be obsolete with the revamped storage engine.
To improve the loading performance of Tentris for large datasets, set the following Linux virtual memory parameters in a terminal:
sudo sysctl vm.dirty_writeback_centisecs=30000
sudo sysctl vm.dirty_ratio=90
sudo sysctl vm.dirty_background_ratio=80
sudo sysctl vm.dirty_expire_centisecs=300000000
We are currently working on a new storage engine which should remove the need to set these parameters in the future.
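Because these parameters apply system-wide, it can be useful to record the current values before changing them, so that they can be restored once loading has finished. The following is a minimal sketch and not part of Tentris: the function name `print_vm_dirty_settings` is hypothetical, and it assumes a Linux system that exposes the values under `/proc/sys/vm`.

```shell
# Hypothetical helper: print the current values of the four parameters
# in the "key = value" form used by sysctl configuration files.
# The optional first argument overrides the directory to read from
# (default: /proc/sys/vm).
print_vm_dirty_settings() {
  vm_dir="${1:-/proc/sys/vm}"
  for vm_name in dirty_writeback_centisecs dirty_ratio \
                 dirty_background_ratio dirty_expire_centisecs; do
    printf 'vm.%s = %s\n' "$vm_name" "$(cat "$vm_dir/$vm_name")"
  done
}
```

Redirecting the output to a file of your choice, for example `print_vm_dirty_settings > vm-dirty-backup.conf`, produces a file that `sudo sysctl -p vm-dirty-backup.conf` should be able to load again to restore the saved values.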
Index Storage Footprint
Note: The storage efficiency improves substantially with the revamped storage engine.
Disk space requirements depend on the specifics of the dataset, especially on the total amount of text in literals. For reference, we provide numbers for the WatDiv dataset and for the text-literal-heavy DBpedia and Wikidata datasets.
Dataset | Triples | Space | Triples/GB |
---|---|---|---|
WatDiv 1B Triples | 1B | 58GB | 16.6M |
Wikidata 2025-01-22 truthy beta | 7.8B | 858GB | 8.7M |
DBpedia snapshot 2022-12 | 1.15B | 175GB | 6.3M |
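The Triples/GB column can serve as a rough sizing aid for other datasets: dividing the expected number of triples by the Triples/GB value of the most similar reference dataset gives an order-of-magnitude estimate of the index size. The sketch below illustrates that arithmetic only; the function name `estimate_index_gb` is hypothetical, and the result assumes your data resembles the chosen reference dataset, which may not hold if the share of text literals differs.

```shell
# Hypothetical helper: estimate the index size in GB, rounded up.
#   $1 = number of triples in your dataset
#   $2 = Triples/GB value taken from the table above
estimate_index_gb() {
  # Integer ceiling division: (a + b - 1) / b
  echo $(( ($1 + $2 - 1) / $2 ))
}
```

For example, `estimate_index_gb 2000000000 8700000` prints `230`, i.e. roughly 230 GB for a hypothetical 2B-triple dataset with a Wikidata-like density.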
Snapshots Storage Footprint
Tentris is designed to store snapshots incrementally. On modern file systems with copy-on-write (CoW, supported on, e.g., XFS, ZFS, Btrfs), a snapshot requires storage only for the data that was added since the last snapshot.
Note: On file systems without CoW (e.g., ext3/4, NTFS, macOS's APFS, FAT32), a snapshot is a full copy of the index, so multiple snapshots quickly add up to a large storage footprint. With the revamped storage engine, Tentris will support CoW snapshots independent of the file system.
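To find out which case applies to a given machine, you can look up the file system type of the directory that holds the Tentris data. The following is a minimal sketch under stated assumptions: the function name `fs_supports_cow` and the example path are hypothetical, the list of types simply mirrors the CoW file systems named above, and `findmnt` from util-linux is assumed to be installed.

```shell
# Hypothetical helper: succeeds (exit status 0) if the given file system
# type is one of the CoW file systems named in this section.
fs_supports_cow() {
  case "$1" in
    xfs|zfs|btrfs) return 0 ;;
    *)             return 1 ;;
  esac
}

# Example usage (the path is a placeholder for your storage directory):
#   fstype="$(findmnt -n -o FSTYPE -T /path/to/tentris/storage)"
#   if fs_supports_cow "$fstype"; then
#     echo "snapshots should be stored incrementally"
#   else
#     echo "each snapshot will be a full copy of the index"
#   fi
```

The type name alone is only an indication. On XFS, for instance, CoW relies on reflink support, which must be enabled on the file system.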