Dataset Anonymization

The anonymize subcommand lets you share RDF datasets and graphs that contain confidential information without leaking that information. Anonymization replaces every RDF term with a random IRI, which removes all of the original data while preserving the structure of the graph.

How it works

The anonymization process uses a bijective function that maps input RDF terms to random output IRIs. The output IRIs match the regular expression http://example.org/[a-zA-Z]+. The function is implemented with an in-memory map: when an RDF term T is anonymized for the first time, a new random IRI I is generated and associated with T, and all subsequent occurrences of T are replaced by I. The random IRIs are generated with OpenSSL's cryptographically secure random number generator.
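To make the mapping concrete, here is a minimal Python sketch of a term-to-IRI map that generates an IRI on first occurrence and reuses it afterwards. It is not the rdf4cpp implementation: the class name Anonymizer, the 16-letter name length (as observed in the example output), and the use of Python's secrets module in place of OpenSSL are assumptions made for illustration. RDF terms are represented as plain strings.

```python
import secrets
import string

# Hypothetical sketch: the real implementation lives in rdf4cpp (C++) and
# draws randomness from OpenSSL; Python's `secrets` module (also a CSPRNG)
# stands in for it here.
PREFIX = "http://example.org/"  # domain as seen in the example output
NAME_LENGTH = 16                # assumption: length observed in the example output


class Anonymizer:
    """Maps RDF terms (plain strings here) to random IRIs, one-to-one."""

    def __init__(self) -> None:
        self._forward: dict[str, str] = {}  # original term -> random IRI
        self._used: set[str] = set()        # IRIs already handed out

    def _fresh_iri(self) -> str:
        # Draw random letters until we get an IRI that is not yet in use,
        # so two different terms can never share an output IRI.
        while True:
            name = "".join(secrets.choice(string.ascii_letters) for _ in range(NAME_LENGTH))
            iri = PREFIX + name
            if iri not in self._used:
                self._used.add(iri)
                return iri

    def anonymize(self, term: str) -> str:
        # First occurrence: generate and remember a new IRI.
        # Later occurrences: return the remembered IRI.
        if term not in self._forward:
            self._forward[term] = self._fresh_iri()
        return self._forward[term]


anon = Anonymizer()
bob_1 = anon.anonymize("<http://example.org/bob>")
bob_2 = anon.anonymize("<http://example.org/bob>")  # same term, same IRI
birthday = anon.anonymize('"1990-07-04"^^xsd:date')  # literals become IRIs too
```

Regenerating on collision is what keeps the map one-to-one; with 16 letters drawn from 52 symbols, a retry is practically never needed.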

The anonymization process runs entirely in memory. Neither the mapping nor the original data is persisted in the database, and the mapping is discarded as soon as the program exits. Consequently, every invocation produces different output, even for identical input data.
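The effect of discarding the mapping can be illustrated with a second hypothetical sketch: two independent runs over the same small graph yield different IRIs, yet each output is a one-to-one renaming of the input. The helpers anonymize_graph and same_structure and the ex: terms in the sample graph are made up for this example. same_structure compares triples position by position, which only works because this sketch preserves triple order; comparing real Tentris output would require a proper graph-isomorphism check.

```python
import secrets
import string


def anonymize_graph(triples):
    """Hypothetical: anonymize a list of triples with a fresh, throwaway mapping."""
    mapping = {}  # exists only for the duration of this call, never persisted
    used = set()

    def fresh_iri():
        while True:
            iri = "http://example.org/" + "".join(
                secrets.choice(string.ascii_letters) for _ in range(16))
            if iri not in used:
                used.add(iri)
                return iri

    def anon(term):
        if term not in mapping:
            mapping[term] = fresh_iri()
        return mapping[term]

    return [tuple(anon(term) for term in triple) for triple in triples]


def same_structure(g1, g2):
    """True if g2 is g1 with its terms renamed one-to-one, triple by triple."""
    if len(g1) != len(g2):
        return False
    forward, backward = {}, {}
    for t1, t2 in zip(g1, g2):
        if len(t1) != len(t2):
            return False
        for a, b in zip(t1, t2):
            # Each term must always pair with the same partner, in both directions.
            if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
                return False
    return True


graph = [
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:bob", "ex:knows", "ex:alice"),
]

run1 = anonymize_graph(graph)  # first "invocation"
run2 = anonymize_graph(graph)  # second "invocation": a brand-new mapping
print(run1 == run2)                # compare the raw IRIs
print(same_structure(run1, run2))  # compare the shape
```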

Source code

The source code of the anonymization algorithm is publicly available in rdf4cpp. The Tentris binary simply calls this algorithm; you can also use it directly via rdf4cpp::Dataset::anonymize.

Examples

Anonymizing a Turtle file

tentris anonymize < secret.ttl > anon.ttl

Example Output for the Mona Lisa Graph

Below is the result of anonymizing the Mona Lisa graph from Introduction to Knowledge Graphs.

Note: Each invocation produces a different, but structurally identical, graph.

<http://example.org/IfXHRlbIuXliRwkq> 
	<http://example.org/dUMZqmcIyPOCKNzL> <http://example.org/PhNocESfEftRsFVw> ;
	<http://example.org/mWocxoZHzGeOLCsn> <http://example.org/CuedkluuFuMditkC> .

<http://example.org/SEVWpxLqiSTDzQuv>
	<http://example.org/dUMZqmcIyPOCKNzL> <http://example.org/PhNocESfEftRsFVw> ;
	<http://example.org/faYhdyFbFefXFyNY> <http://example.org/XeBBndYqWyGSsedh> ;
	<http://example.org/mWocxoZHzGeOLCsn> <http://example.org/MMIZDsGzVCQhmJBx> ;
	<http://example.org/diyOPzaKxmfpETuU> <http://example.org/IfXHRlbIuXliRwkq> ;
	<http://example.org/nZXhJkVNOWPaAfGx> <http://example.org/PoNHEDJsjCikAxDM> .

<http://example.org/PoNHEDJsjCikAxDM>
	<http://example.org/dUMZqmcIyPOCKNzL> <http://example.org/phztEdzhmcqhcDkm> ;
	<http://example.org/dNEChiLJIqYVyqrm> <http://example.org/ZBdMsRjdUoOvsNeE> ;
	<http://example.org/HSJUmqxiGonkCmoF> <http://example.org/shBsTMuwaYplgzoW> .

<http://example.org/shBsTMuwaYplgzoW> 
	<http://example.org/dUMZqmcIyPOCKNzL> <http://example.org/PhNocESfEftRsFVw> ;
	<http://example.org/mWocxoZHzGeOLCsn> <http://example.org/CZGbPCKoYcQeNPQS> .
	
<http://example.org/suCtwVxXLahzYzHe> 
	<http://example.org/dUMZqmcIyPOCKNzL> <http://example.org/dprlGUMsicGOLAFo> ;
	<http://example.org/ZkyvVIcgmjMBZBZj> <http://example.org/PoNHEDJsjCikAxDM> .

Options

  • -I <IFMT>, --input-format <IFMT>: Specify the input RDF format.
    Allowed values: n-triples, turtle, n-quads, tri-g, detect.
    Default: detect (infers the format from the file extension of the input file).
  • -O <OFMT>, --output-format <OFMT>: Specify the output RDF format.
    Allowed values: n-triples, turtle, n-quads, tri-g, detect.
    Default: detect (infers the format from the file extension of the output file).
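The detect option can be pictured as a lookup from file extension to format name. The following Python sketch is hypothetical: the function detect_format and its extension table (the conventional extensions .nt, .ttl, .nq and .trig) are assumptions for illustration, not Tentris's actual detection logic.

```python
from pathlib import Path

# Hypothetical extension table: conventional file extensions for each format,
# assumed here for illustration only.
EXTENSION_TO_FORMAT = {
    ".nt": "n-triples",
    ".ttl": "turtle",
    ".nq": "n-quads",
    ".trig": "tri-g",
}


def detect_format(path: str) -> str:
    """Guess the RDF format from a file name's extension (case-insensitive)."""
    suffix = Path(path).suffix.lower()
    try:
        return EXTENSION_TO_FORMAT[suffix]
    except KeyError:
        raise ValueError(f"cannot detect RDF format of {path!r}") from None
```

With this table, detect_format("secret.ttl") returns "turtle"; an unknown extension raises ValueError, which is the situation where passing the format explicitly via -I or -O is needed.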