RECITALS :: Value-Added Services

To facilitate industrial applications of identity management and data sharing, the RECITALS platform orchestrates the components of the RECITALS core and extends them into pipelines that are capable of addressing entire tasks in an end-to-end way. More specifically, the following services are provided:

The self-sovereign wallet provides a secure and user-centric digital wallet system that empowers individuals to have complete control over their personal identity and data within the RECITALS ecosystem, without relying on a central authority. The resulting wallet will boast the following features:
- (i) Decentralization: to eliminate the risk of hacks or losses due to centralized points of failure, the wallet leverages decentralized identifiers and verifiable credentials, which allow users to present their identity information without relying on centralized authorities or intermediaries.
- (ii) Self-sovereignty: users manage access permissions, consent, and data sharing with other entities within the RECITALS platform. Instead of relying on third-party services to safeguard their assets, they retain complete control over the private keys required to access and manage their digital assets, and they decide who can access their data and for what purposes.
- (iii) Privacy: based on the privacy-preserving technologies of RECITALS Core, such as its advanced cryptography and anonymization methods, the wallet protects users’ transaction data and maintains their anonymity.
- (iv) Interoperability: it enables users to manage multiple digital assets through a single account without needing to create separate accounts for each network.
- (v) Intuitive Interface: the wallet offers an intuitive, wizard-like graphical user interface that is accessible even to novice users, and it can be combined with the LLM-based Interface to enable user interaction in natural language.
Privacy-preserving Record Linkage (PPRL) aims to connect duplicate records across various databases without disclosing sensitive information about the entities involved. This task is particularly challenging because it involves multiple parties, each with their own datasets, and requires secure, privacy-preserving techniques to prevent the exposure of sensitive information. First, the data is transformed into a quasi-identifiable format, in which certain identifying information (such as names, social security numbers, or addresses) is removed or encrypted. Then, linking is performed through cryptographic techniques such as secure multi-party computation, which enables the parties involved in the record linkage process to compute collaboratively and securely: records are matched on their encoded and encrypted identifiers without any party having access to the plaintext data of the others. State-of-the-art open-source tools will be incorporated for this task, such as pyJedAI, which is developed by NKUA.
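The encode-then-match idea described above can be illustrated with a minimal sketch. It assumes a keyed Bloom-filter encoding of character bigrams compared with the Dice coefficient, a widely used PPRL encoding that is chosen here for illustration only; the text does not state which encoding RECITALS or pyJedAI will use, and the secure multi-party computation layer is omitted entirely. All function names, the shared secret and the parameter values are hypothetical.

```python
import hashlib
import hmac


def bigrams(value: str) -> set:
    """Split a normalised string into padded character bigrams."""
    padded = f"_{value.strip().lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value: str, secret: bytes, size: int = 1024, num_hashes: int = 4) -> set:
    """Return the bit positions set in a keyed Bloom filter for `value`.

    Each bigram is hashed `num_hashes` times with HMAC-SHA256 under a secret
    that the data owners share but the linkage party does not know.
    """
    positions = set()
    for gram in bigrams(value):
        for k in range(num_hashes):
            digest = hmac.new(secret, f"{k}:{gram}".encode(), hashlib.sha256).digest()
            positions.add(int.from_bytes(digest[:8], "big") % size)
    return positions


def dice(a: set, b: set) -> float:
    """Dice coefficient between two sets of bit positions."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical secret agreed between the data owners (assumption for the demo).
SECRET = b"shared-secret-for-illustration-only"

enc_a = bloom_encode("Jonathan Smith", SECRET)
enc_b = bloom_encode("Jonathon Smith", SECRET)  # typo variant of the same person
enc_c = bloom_encode("Maria Garcia", SECRET)    # unrelated person

similar = dice(enc_a, enc_b)
dissimilar = dice(enc_a, enc_c)
print(f"same entity: {similar:.2f}, different entity: {dissimilar:.2f}")
```

Only the encoded bit positions are compared, so the party performing the match never sees the names themselves. Plain Bloom-filter encodings are known to be vulnerable to frequency attacks, which is one reason a real deployment would layer further protection, such as the secure multi-party computation mentioned above, on top of the encoding.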
Privacy-preserving Federated Learning enables the collaborative training of machine and deep learning models across multiple entities and data sources, without the need to centralize the data. RECITALS will implement state-of-the-art federated learning algorithms, such as:
- (i) federated averaging, where individual models are trained locally and are then aggregated into a global model, so that only model updates are shared,
- (ii) knowledge distillation, where a large teacher model is trained centrally, and its knowledge is then transferred locally to smaller student models, and
- (iii) federated reinforcement learning, where agents are locally trained and then collaborate, learning from each other’s experiences, to improve performance and generalization.
Through this component, RECITALS not only enhances the accuracy and efficiency of its machine learning models, but also fosters a decentralized and collaborative approach to data analysis.
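Federated averaging, the first algorithm listed above, can be sketched in a few lines. The example below is a minimal illustration and not the RECITALS implementation: it assumes a toy linear model y ≈ w·x + b, plain gradient descent as the local training step, and aggregation weighted by local dataset size. The function names, learning rate and client data are hypothetical.

```python
def local_update(weights, data, lr=0.05, epochs=5):
    """Local training on one client: gradient descent on mean squared error."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(client_weights, client_sizes):
    """Aggregate client models, weighting each by its number of samples."""
    total = sum(client_sizes)
    dims = len(client_weights[0])
    return tuple(
        sum(wts[d] * n for wts, n in zip(client_weights, client_sizes)) / total
        for d in range(dims)
    )


def train(clients, rounds=200):
    """Run federated averaging; raw data never leaves the `clients` lists."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        # Each client starts from the current global model and trains locally.
        updates = [local_update(global_weights, data) for data in clients]
        # Only the resulting weights are sent back and averaged.
        global_weights = federated_average(updates, [len(d) for d in clients])
    return global_weights


# Two hypothetical clients holding disjoint samples of y = 2x + 1.
clients = [
    [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)],
    [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)],
]
w, b = train(clients)
print(f"learned w={w:.3f}, b={b:.3f}")
```

The coordinating side only ever handles the `(w, b)` tuples returned by `local_update`. Model updates can still leak information about the training data, so in practice the aggregation step is often combined with techniques such as secure aggregation or differential privacy.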
The LLM-based Interface combines an intuitive, wizard-like graphical user interface with an open-source Large Language Model (LLM), such as Llama 2, which offers a valuable means to interact with, and gain insights from, data or a system in a human-readable format. RECITALS will leverage the advantages of LLMs to offer a user-friendly and intuitive interface for its end-users. The Retrieval Augmented Generation (RAG) approach will be used to ensure factual consistency and response reliability, while reducing or even eliminating hallucinations. Using a trained LLM as its basis, RAG will receive as input the documentation of all functionalities offered by the RECITALS platform. This means that no private or sensitive information will be fed to the LLM. Users will then pose natural language requests to the LLM, which will treat the RECITALS documentation as context. The LLM will then produce the necessary output in the form of instructions for performing the requested functionality among those offered by the RECITALS platform. Note that RAG adapts to situations where the indexed corpus evolves over time: although the parametric knowledge of the underlying LLM is static, RAG gives it access to the latest information through retrieval, without retraining, so that it can keep generating reliable outputs.
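The retrieval step of this RAG workflow can be sketched as follows. This is a minimal illustration under stated assumptions: a bag-of-words cosine similarity stands in for whatever retriever RECITALS adopts (a production system would more likely use dense embeddings), the documentation snippets are invented placeholders, and the call to the LLM itself is deliberately left out.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase a text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list, k: int = 2) -> list:
    """Return the k documentation snippets most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: list, k: int = 2) -> str:
    """Assemble the prompt: retrieved documentation as context, then the request."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return (
        "Answer using only the documentation below.\n\n"
        f"Documentation:\n{context}\n\n"
        f"Request: {query}\n"
    )


# Hypothetical documentation snippets (placeholders, not real RECITALS docs).
DOCS = [
    "To share a credential, open the wallet, select the credential and choose the recipient.",
    "To start a federated learning job, upload a model configuration and select participating data sources.",
    "To request an explanation for a model prediction, open the explAIner panel and select a prediction.",
]

query = "How do I share a credential from my wallet?"
prompt = build_prompt(query, DOCS, k=1)
print(prompt)  # this string would be passed to the LLM
```

Because the index is rebuilt from `DOCS` on every call, updating the documentation changes the retrieved context immediately, with no change to the model. This is the property the paragraph above refers to when it says that RAG bypasses retraining.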
The explAIner component integrates explainable AI (xAI) functionality into the RECITALS platform to enhance the transparency of its automatic processes, while ensuring that they are not only compliant with EU regulations but also understandable and user-centric. This way, trust in the RECITALS platform will be enhanced. Emphasis will be placed on post-hoc explainability, which elucidates the functionality of models trained by the other components of the RECITALS platform.
In this context, at least the following techniques will be supported:
- (i) Local Interpretable Model-agnostic Explanations (LIME), which explains individual predictions by approximating the complex model with a simpler, interpretable one in the local neighborhood of the instance being explained.
- (ii) SHapley Additive exPlanations (SHAP), which explains individual predictions by attributing a value to each feature, representing its contribution to the prediction.
- (iii) Partial Dependence Plots (PDP), which provide a global view of the model, displaying the average relationship between a feature and the prediction, while marginalizing over the other features.
- (iv) Individual Conditional Expectation (ICE), which focuses on a specific instance, displaying how the model’s prediction for that instance changes as a single feature varies, while keeping all other features constant.
By leveraging explainable and interpretable AI techniques, this component balances the extraction of valuable insights from private and sensitive data against the protection of users’ privacy and security, thus strengthening user trust in the RECITALS platform while respecting user autonomy.
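The relationship between the last two techniques can be made concrete with a short sketch: an ICE curve is computed per instance, and the partial dependence curve is the pointwise average of the ICE curves. The code below is a minimal, model-agnostic illustration; the toy model, the data rows and the function names are assumptions made for the example and are not part of the explAIner component.

```python
def ice_curves(model, rows, feature, grid):
    """One curve per row: predictions as `feature` sweeps over `grid`,
    with every other feature of that row held fixed."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = list(row)
            modified[feature] = value
            curve.append(model(modified))
        curves.append(curve)
    return curves


def partial_dependence(model, rows, feature, grid):
    """Pointwise average of the ICE curves over all rows."""
    curves = ice_curves(model, rows, feature, grid)
    return [sum(column) / len(column) for column in zip(*curves)]


# Hypothetical black-box model with an interaction between features 0 and 1.
def model(x):
    return 3 * x[0] + x[0] * x[1]


rows = [[0.0, -1.0], [0.0, 1.0]]
grid = [0.0, 1.0, 2.0]

ice = ice_curves(model, rows, feature=0, grid=grid)
pdp = partial_dependence(model, rows, feature=0, grid=grid)
print(ice)  # [[0.0, 2.0, 4.0], [0.0, 4.0, 8.0]]
print(pdp)  # [0.0, 3.0, 6.0]
```

The two ICE curves have different slopes because of the interaction term, a detail that the averaged partial dependence curve hides. This is why the two views are usually inspected together.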
The Domain-specific Compliance Manager complements the generic one in RECITALS core. Emphasis is placed on the three domains examined in this project, namely energy, telecommunications and healthcare (see Section 1.2.1.4 for more details). To this end, this component will extend the CM and DPV resources with appropriate knowledge and compliance-checking procedures that can be integrated and operationalised within the use cases and pilots. This process will demonstrate the extensibility of the CM to accommodate changes in use cases and regulations, and will reduce the burden of audits and compliance checking through automation.
Privacy-preserving Data Analytics provides methods for extracting insights from sensitive data in a highly secure and efficient manner without compromising their privacy and security. To balance the need for data utility against the need to protect individual privacy, the implemented algorithms leverage the techniques provided by the Cryptography and Anonymization Managers. The provided methods will rely on data aggregation, which combines data from multiple sources or records to minimize the risk of re-identification, and on techniques for learning quasi-identifiers, which use proxies or pseudonyms to replace direct identifiers while still allowing for data analysis. RECITALS ensures that data remains encrypted and confidential throughout the analytics process. Thus, data consumers can make data-driven decisions and perform queries, statistical analyses, and machine learning tasks to develop insights on top of sensitive data, while adhering to all relevant privacy regulations.
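The two building blocks named above, pseudonyms in place of direct identifiers and aggregation across records, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the method of the Cryptography or Anonymization Managers: it uses a keyed hash (HMAC-SHA256) for pseudonymization and suppresses aggregates computed over fewer than a minimum number of records. The field names, the key and the threshold are hypothetical, and encryption of the data at rest and in transit is out of scope here.

```python
import hashlib
import hmac
from collections import defaultdict


def pseudonymize(identifier: str, key: bytes) -> str:
    """Replace a direct identifier with a keyed, non-reversible pseudonym.

    The same identifier always maps to the same pseudonym under the same key,
    so records can still be joined and counted.
    """
    return hmac.new(key, identifier.encode(), hashlib.sha256).hexdigest()[:16]


def aggregate_mean(records, group_field, value_field, min_group_size=3):
    """Mean of `value_field` per group, suppressing groups that are too small
    to be released without a re-identification risk."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_field]].append(record[value_field])
    return {
        group: sum(values) / len(values)
        for group, values in groups.items()
        if len(values) >= min_group_size
    }


# Hypothetical energy-consumption records and key (assumptions for the demo).
KEY = b"demo-key-not-for-production"
records = [
    {"customer": "alice@example.org", "region": "north", "kwh": 310.0},
    {"customer": "bob@example.org", "region": "north", "kwh": 290.0},
    {"customer": "carol@example.org", "region": "north", "kwh": 300.0},
    {"customer": "dave@example.org", "region": "south", "kwh": 420.0},
]

pseudonymized = [{**r, "customer": pseudonymize(r["customer"], KEY)} for r in records]
means = aggregate_mean(pseudonymized, "region", "kwh", min_group_size=3)
print(means)  # the single-record "south" group is suppressed
```

Pseudonymized data generally still counts as personal data under the GDPR, so a sketch like this reduces exposure but does not by itself amount to anonymization.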