Value-Added Services
To facilitate industrial applications of identity management and data sharing, the RECITALS platform orchestrates the components of the RECITALS Core and extends them into pipelines capable of addressing entire tasks end to end. The following services are provided:
Self-Sovereign Wallet
Provides a secure and user-centric digital wallet system that empowers individuals to have complete control over their personal identity and data within the RECITALS ecosystem, without relying on a central authority. The resulting wallet will boast the following features:
- Decentralization: Reduces the risk of hacks or losses by leveraging decentralized identifiers and verifiable credentials instead of a central authority.
- Self-sovereignty: Users manage access permissions, consent, and data sharing with complete control over their private keys.
- Privacy: Based on RECITALS Core privacy-preserving technologies, protecting users' transaction data and maintaining anonymity.
- Interoperability: Enables management of multiple digital assets through a single account.
- Intuitive Interface: Wizard-like graphical user interface accessible even to novice users, combined with LLM-based natural language interaction.
Privacy-Preserving Record Linkage (PPRL)
Aims to link records that refer to the same entity across different databases without disclosing sensitive information about the entities involved. This task is particularly challenging because it involves multiple parties, each with its own dataset, and requires secure, privacy-preserving techniques.
- Data transformation: Transforms data into an encoded format in which identifying information is removed or encrypted.
- Cryptographic linking: Uses secure multi-party computation for collaborative computation among different parties.
- Privacy protection: Ensures matching of records without any party having access to plaintext data of others.
- State-of-the-art tools: Incorporates open-source tools like pyJedAI, developed by NKUA.
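The data transformation and matching steps above can be illustrated with a minimal sketch. It is a simplified stand-in, not the RECITALS implementation and not the pyJedAI API: instead of secure multi-party computation, each party replaces character q-grams with keyed hashes (HMAC) under a pre-shared secret, and the encoded sets are compared with the Dice coefficient. The function names, the shared key, and the sample names are all hypothetical.

```python
import hashlib
import hmac


def qgrams(value: str, q: int = 2) -> set[str]:
    """Split a normalized string into overlapping character q-grams."""
    padded = f"_{value.strip().lower()}_"
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def encode(value: str, secret: bytes, q: int = 2) -> set[str]:
    """Replace each q-gram with a keyed hash so the plaintext is never shared."""
    return {
        hmac.new(secret, gram.encode("utf-8"), hashlib.sha256).hexdigest()
        for gram in qgrams(value, q)
    }


def dice(a: set[str], b: set[str]) -> float:
    """Dice coefficient between two encoded q-gram sets (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical pre-shared key; in practice it would be agreed upon securely.
SHARED_KEY = b"example-shared-secret"

# Each party encodes its own record locally and exchanges only the encodings.
party_a = encode("Jonathan Smith", SHARED_KEY)
party_b = encode("Jonathon Smith", SHARED_KEY)
party_c = encode("Maria Lopez", SHARED_KEY)

print(f"A vs B: {dice(party_a, party_b):.2f}")  # near-duplicate spelling
print(f"A vs C: {dice(party_a, party_c):.2f}")  # unrelated record
```

Keyed q-gram hashing tolerates small spelling differences, but it is known to be vulnerable to frequency analysis, which is one reason a production pipeline would rely on stronger cryptographic protocols such as the secure multi-party computation mentioned above.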
Privacy-Preserving Federated Learning
Enables the collaborative training of machine and deep learning models across multiple entities and data sources, without the need to centralize the data. RECITALS implements state-of-the-art algorithms:
- Federated averaging: Individual models trained locally and aggregated into a main model, sharing only model updates.
- Knowledge distillation: Large teacher model trained centrally with knowledge transferred locally to smaller student models.
- Federated reinforcement learning: Agents locally trained and collaborating, learning from each other's experiences.
Through this component, RECITALS not only enhances the accuracy and efficiency of its machine learning models, but also fosters a decentralized and collaborative approach to data analysis.
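As an illustration of federated averaging, the sketch below trains a one-variable linear model on two clients and combines their parameters with a weighted average, so that only model parameters, never raw data, reach the aggregator. It is a toy example under stated assumptions: the model, the learning rate, the number of rounds, and the client datasets are all invented for illustration and do not reflect the RECITALS implementation.

```python
Weights = list[float]
Dataset = list[tuple[float, float]]


def local_update(weights: Weights, data: Dataset,
                 lr: float = 0.3, epochs: int = 5) -> Weights:
    """Train y = w * x + b locally with gradient descent on squared error."""
    w, b = weights
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def federated_average(client_weights: list[Weights],
                      client_sizes: list[int]) -> Weights:
    """Average client parameters, weighted by local dataset size."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


def train(clients: list[Dataset], rounds: int = 100) -> Weights:
    """Run several rounds of local training followed by aggregation."""
    global_weights = [0.0, 0.0]
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in clients]
        sizes = [len(data) for data in clients]
        global_weights = federated_average(updates, sizes)
    return global_weights


# Two hypothetical clients whose private data follow y = 2x + 1.
client_a = [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0)]
client_b = [(0.2, 1.4), (0.8, 2.6)]

w, b = train([client_a, client_b])
print(f"w = {w:.3f}, b = {b:.3f}")
```

Weighting by dataset size lets clients with more data contribute proportionally more to the aggregated model; the raw `(x, y)` pairs stay on each client throughout.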
LLM-Based Interface
Combines an intuitive, wizard-like graphical user interface with an open-source Large Language Model (LLM), such as Llama 2, offering a valuable means to interact with and gain insights from data or a system in a human-readable format.
Retrieval Augmented Generation (RAG):
- Factual consistency: Ensures factual consistency and response reliability while reducing or eliminating hallucinations
- Documentation-based: Receives RECITALS platform documentation as input (no private or sensitive information)
- Natural language interaction: Users pose natural language requests, which are answered using RECITALS documentation as context
- Actionable instructions: Produces instructions for performing requested functionality
- Adaptive learning: Adapts to an evolving indexed corpus without requiring the model to be retrained
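The retrieval step of this flow can be sketched as follows. This is a minimal illustration under several assumptions: simple word-overlap cosine similarity stands in for whatever retriever the platform uses, the documentation passages are invented placeholders, and the call to the LLM itself is omitted, so the sketch ends at the assembled prompt.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query."""
    query_vec = Counter(tokenize(query))
    ranked = sorted(
        corpus,
        key=lambda doc: cosine(query_vec, Counter(tokenize(doc))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, corpus: list[str], k: int = 2) -> str:
    """Assemble the retrieved passages and the user request into one prompt."""
    context = "\n".join(f"- {passage}" for passage in retrieve(query, corpus, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
    )


# Hypothetical documentation passages, invented for this example.
DOCS = [
    "To create a wallet, open the setup wizard and generate a new key pair.",
    "Federated learning jobs are configured from the training dashboard.",
    "Compliance reports can be exported from the audit page.",
]

print(build_prompt("How do I create a new wallet?", DOCS, k=1))
```

Because the passages are looked up at query time, appending a new passage to the corpus changes what is retrieved immediately, with no retraining of the language model.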
explAIner Component
Integrates xAI functionality into the RECITALS platform to enhance the transparency of its automatic processes, ensuring they are compliant with EU regulations while being understandable and user-centric. Emphasis is placed on post-hoc explainability.
Supported Techniques:
- Local Interpretable Model-agnostic Explanations (LIME): Explains individual predictions by approximating complex models with simpler, interpretable ones.
- SHapley Additive exPlanations (SHAP): Explains predictions by attributing a value to each feature representing its contribution.
- Partial Dependence Plots (PDP): Displays the average relationship between a feature and the prediction, marginalizing over the values of the other features.
- Individual Conditional Expectation (ICE): Shows, for each individual instance, how the model's prediction changes as a single feature varies; averaging the ICE curves yields the PDP.
By leveraging explainable and interpretable AI techniques, this component provides a balance between deriving valuable insights from private and sensitive data and protecting the privacy and security of users.
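The relationship between ICE and PDP can be shown with a short sketch: an ICE curve is computed per instance by sweeping one feature over a grid while holding the instance's other features fixed, and the partial dependence is the pointwise average of those curves. The toy model, the rows, and the grid below are assumptions chosen for illustration only.

```python
from typing import Callable

Row = list[float]
Model = Callable[[Row], float]


def ice_curves(model: Model, rows: list[Row],
               feature: int, grid: list[float]) -> list[list[float]]:
    """One curve per row: predictions as the chosen feature sweeps the grid."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = list(row)       # copy so the original row is untouched
            modified[feature] = value  # override only the feature of interest
            curve.append(model(modified))
        curves.append(curve)
    return curves


def partial_dependence(model: Model, rows: list[Row],
                       feature: int, grid: list[float]) -> list[float]:
    """Average the ICE curves pointwise to obtain the partial dependence."""
    curves = ice_curves(model, rows, feature, grid)
    return [sum(column) / len(column) for column in zip(*curves)]


def toy_model(row: Row) -> float:
    """Hypothetical model with an interaction between features 0 and 1."""
    return 3 * row[0] + row[0] * row[1]


rows = [[0.0, 1.0], [0.0, -1.0], [0.0, 3.0]]
grid = [0.0, 1.0, 2.0]

for curve in ice_curves(toy_model, rows, feature=0, grid=grid):
    print("ICE:", curve)
print("PDP:", partial_dependence(toy_model, rows, feature=0, grid=grid))
```

In this toy case the ICE curves have different slopes because of the interaction term, a detail that the averaged PDP curve alone would hide.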
Domain-Specific Compliance Manager
Complements the generic Compliance Manager (CM) in the RECITALS Core. Emphasis is placed on the three domains examined in this project: energy, telecommunications, and healthcare.
- Extended resources: Extends CM and DPV resources with appropriate knowledge and compliance checking procedures
- Integration capability: Can be integrated and operationalized within the use-cases and pilots
- Extensibility: Demonstrates extensibility of CM to incorporate changes in use-cases and regulations
- Automation: Reduces the burden of audits and compliance checking through automation
Privacy-Preserving Data Analytics
Provides methods for extracting insights from sensitive data securely and efficiently, without compromising privacy. To balance data utility and privacy protection, the implemented algorithms leverage techniques from the Cryptography and Anonymization Managers.
- Data aggregation: Combines data from multiple sources or records to minimize re-identification risk.
- Quasi-identifier learning: Uses proxies or pseudonyms to replace direct identifiers while allowing data analysis.
- End-to-end encryption: Ensures data remains encrypted and confidential throughout the analytics process.
Data consumers can make data-driven decisions and perform queries, statistical analyses, and machine learning tasks to develop insights on top of sensitive data, while adhering to all relevant privacy regulations.
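Two of the techniques listed above, replacing direct identifiers with pseudonyms and aggregating records, can be sketched as follows. This is only an illustration under stated assumptions: the keyed-hash pseudonym scheme, the minimum group size used to suppress small groups, the field names, and the sample records are all hypothetical, and end-to-end encryption is not shown.

```python
import hashlib
import hmac
from collections import defaultdict


def pseudonymize(identifier: str, secret: bytes) -> str:
    """Replace a direct identifier with a stable keyed-hash pseudonym."""
    digest = hmac.new(secret, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def aggregate_mean(records: list[dict], group_key: str, value_key: str,
                   min_group_size: int = 3) -> dict[str, float]:
    """Mean per group, suppressing groups too small to publish safely."""
    groups: dict[str, list[float]] = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record[value_key])
    return {
        group: sum(values) / len(values)
        for group, values in groups.items()
        if len(values) >= min_group_size
    }


# Hypothetical secret and records, invented for this example.
SECRET = b"example-pseudonym-key"
raw = [
    {"customer": "alice", "region": "north", "usage_kwh": 10.0},
    {"customer": "bob", "region": "north", "usage_kwh": 20.0},
    {"customer": "carol", "region": "north", "usage_kwh": 30.0},
    {"customer": "dave", "region": "south", "usage_kwh": 50.0},
]

# Direct identifiers are replaced before the data reach the analyst.
records = [
    {**r, "customer": pseudonymize(r["customer"], SECRET)} for r in raw
]

print(aggregate_mean(records, "region", "usage_kwh"))
```

The single-record "south" group is omitted from the result, since publishing its mean would reveal that individual's value.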