Project Laboratory

Administrative information

General rules and advice for the project laboratory
Presentation schedule

Current topics 2024

In the lab, you can choose project laboratory, BSc thesis, and MSc thesis topics in several active research areas. These areas are described below. If one of them interests you, contact the colleague responsible for that area and discuss possible concrete tasks within it. Keep in mind that, in the project laboratory, a task can also be worked on in a small team. Our topics relate to the following areas:
All, Embedded-Systems, Internet-of-Things, Malware, Machine-Learning, Software-Security, Security-Analysis, ICS/SCADA, Attack generation, Privacy, Security, Federated-Learning, Game-Theory, Economics

Privacy & Anonymization

Category: Privacy, Machine-Learning

The word privacy derives from the Latin "privatus", meaning set apart from what is public: personal and belonging to oneself rather than to the state. Privacy has multiple aspects, and various techniques can improve each of them to a varying extent. Students can work on the following topics:

(De-)Anonymization of Medical Data: ECG (electrocardiogram) and CTG (cardiotocography) recordings and diagnostic images (MRI, X-ray) are very sensitive datasets containing the medical records of individuals. The task is to (de-)anonymize such datasets (or some aggregates computed over such data) for data sharing with strong, preferably provable privacy guarantees that are also GDPR compliant.
(Contact: Gergely Ács)
Poisoning Differential Privacy: Differential Privacy is the de facto privacy model used to anonymize datasets (see, e.g., the US Census data). A small amount of noise is added to the data, which hides the participation of any single individual in the dataset but not the general statistics of the population as a whole. The noise is calibrated to the influence of any single record. However, if the data comes from untrusted sources, an attacker can inject fake records into the dataset to increase the added noise, which degrades the utility of the anonymized data. The task is to design and implement such an attack.
(Contact: Gergely Ács)
Differential Privacy Amplification: Nowadays, the standard privacy-preserving mechanism is Differential Privacy. It aims to hide the presence or absence of a data point in the final result by adding noise to the answer of the original query, making the two outcomes (one with and one without a single data point) statistically indistinguishable (up to the privacy parameter). For example, the average salary of last year's BME graduates is published with added noise, so even if an adversary knows all alums' salaries except the target's, it cannot deduce the target's salary with certainty. Besides increasing the size of the added noise, privacy protection can be strengthened further by so-called amplification techniques, such as sampling from the data (instead of using all of it). For instance, only half of the alums are considered for this statistic.
The student's task is to learn and experiment with these amplification techniques and to find the optimal setting (amplification mechanisms and their parameters) to obtain a desirable trade-off between the provided privacy protection and the obtained accuracy.
(Contact: Balázs Pejó)
Own idea: If you have a project idea of your own related to data privacy, and we find it interesting, you can work on it under our guidance.
(Contact: Gergely Ács or Balázs Pejó)
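
As a rough illustration of the differential privacy topics above, the following Python sketch adds Laplace noise to an average and evaluates the standard amplification-by-subsampling bound. The function names, the salary values, and the clipping range are assumptions made up for this example and do not come from any lab project. The two parts are shown side by side only for illustration: the noisy mean assumes a public dataset size, while the amplification bound is usually stated for Poisson subsampling.

```python
import math
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_mean(values, lower, upper, epsilon, rng):
    """Noisy mean of a dataset whose size n is treated as public.

    Values are clipped to [lower, upper], so replacing one record changes
    the mean by at most (upper - lower) / n. Laplace noise with scale
    sensitivity / epsilon then gives epsilon-differential privacy.
    """
    n = len(values)
    clipped = [min(max(v, lower), upper) for v in values]
    sensitivity = (upper - lower) / n
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon, rng)


def amplified_epsilon(epsilon, q):
    """Bound on the privacy parameter when an epsilon-DP mechanism runs on a
    Poisson subsample (each record kept independently with probability q)."""
    return math.log(1.0 + q * (math.exp(epsilon) - 1.0))


rng = random.Random(0)
salaries = [rng.uniform(300.0, 900.0) for _ in range(1000)]  # made-up data
print("true mean :", sum(salaries) / len(salaries))
print("noisy mean:", dp_mean(salaries, 0.0, 1500.0, epsilon=1.0, rng=rng))
print("epsilon after sampling half the records:", amplified_epsilon(1.0, 0.5))
```

With a sampling rate of 0.5, the bound for epsilon = 1 is roughly 0.62, so the privacy guarantee is stronger. The subsample is smaller, however, so the statistic itself becomes less accurate, which is the kind of trade-off the amplification topic asks the student to explore.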

Required skills: none
Preferred skills: basic programming skills (e.g., Python)

Capacity: 6 students

Contact: Gergely Ács (CrySyS Lab), Balázs Pejó (CrySyS Lab)

Machine Learning & Security & Privacy

Category: Privacy, Security, Machine-Learning

Machine Learning (Artificial Intelligence) has become undisputedly popular in recent years. The number of security critical applications of machine learning has been steadily increasing over the years (self-driving cars, user authentication, decision support, profiling, risk assessment, etc.). However, there are still many open security problems of machine learning. Students can work on the following topics:

Security of Machine Learning-Based Malware Detection: Adversarial examples are maliciously modified program code where the modification is hard to detect, yet the model's prediction on the slightly modified code differs greatly from its prediction on the unmodified code. For example, the malware developer modifies a few bytes in the malware binary, which causes the malware detector to misclassify the malware as benign. A potential task is to develop solutions to detect adversarial examples, develop robust training algorithms for malware detection, or design backdoor and membership attacks.
(Contact: Gergely Ács)
Robustness of Large Language Models: Large Language Models (LLMs) are a class of machine learning models trained on large text corpora. They can generate text that is often hard to distinguish from human-written text. The increasing reliance on LLMs across academia and industry calls for a thorough understanding of their robustness to prompts. The task is to study and test different adversarial prompts against LLMs (e.g., adversarial attacks or prompt injection).
(Contact: Gergely Ács)
Detection and Attribution of Fake Images: Over the last year, there has been a growing interest in text-to-image generation models that create images based on prompt descriptions. While these models exhibit promising performance, there is growing concern about the potential misuse of the artificially generated images they produce. The task is to develop methods for the detection and attribution of fake images generated by text-to-image models, that is, to detect which images are generated by AI and by which model exactly.
(Contact: Gergely Ács)
Meta Learning: Online media contains legitimate news as well as fake news. While the former is usually of higher quality, the latter is often associated with low-quality writing. Several machine learning models in the scientific literature focus on telling the two apart. The scientific literature itself also contains both lower-quality and higher-quality publications.
The student's task is to get familiar with these models and experiment with their applicability to differentiating non-peer-reviewed scientific papers (e.g., on arXiv) from articles that appeared in well-established venues (such as S&P, CCS, etc.).
(Contact: Balázs Pejó)
Own idea: If you have a project idea of your own related to the security/privacy of machine learning, and we find it interesting, you can work on it under our guidance.
(Contact: Gergely Ács or Balázs Pejó)
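
To make the notion of an adversarial example from the malware detection topic more concrete, here is a minimal Python sketch of the fast gradient sign method (FGSM) against a toy logistic-regression detector. Everything in it is an assumption for illustration: the weights, the three numeric features, and the perturbation budget are made up, and real malware features are typically discrete and must keep the binary functional, which this continuous toy ignores.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Probability that feature vector x is malicious (toy linear detector)."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def fgsm(weights, bias, x, label, eps):
    """One FGSM step: move each feature by eps in the direction that
    increases the cross-entropy loss for the true label."""
    p = predict(weights, bias, x)
    # Gradient of the cross-entropy loss with respect to the input x.
    grad = [(p - label) * w for w in weights]
    sign = [(g > 0) - (g < 0) for g in grad]
    return [xi + eps * s for xi, s in zip(x, sign)]


# Hypothetical detector parameters and a sample labeled malicious (label 1).
weights, bias = [2.0, -1.0, 0.5], -0.5
x = [1.0, 0.2, 0.4]
x_adv = fgsm(weights, bias, x, label=1, eps=0.5)

print("score on original sample :", predict(weights, bias, x))
print("score on perturbed sample:", predict(weights, bias, x_adv))
```

In this toy setting the perturbed sample falls below the 0.5 decision threshold even though no feature moved by more than the chosen budget. Detection methods and robust training, as mentioned in the topic, aim to make such small changes ineffective.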

Required skills: none
Preferred skills: basic programming skills (e.g., Python), machine learning (not required)

Capacity: 6 students

Contact: Gergely Ács (CrySyS Lab), Balázs Pejó (CrySyS Lab)

Federated Learning - Security & Privacy & Contribution Scores

Category: Privacy, Security, Federated-Learning, Game-Theory

Federated learning enables multiple actors to build a common, robust machine learning model without sharing data, which helps address critical issues such as data privacy, data security, data access rights, and access to heterogeneous data. Its applications span a number of industries, including defense, telecommunications, IoT, and pharmaceuticals. Students can work on the following topics:

Federated Learning Framework for Medical Data: Federated learning is going to be adopted in health care, where different organizations want to train a common model for different purposes (tumor/disease classification, prediction of survival time, finding an explainable pattern of Covid on whole slide images of livers, etc.), but the organizations individually lack sufficient training data. The task is to develop a federated learning framework for such tasks.
(Contact: Gergely Ács)
Security and Privacy of Federated Learning: Federated learning allows multiple parties to collaborate in order to train a common model, by only sharing model updates instead of their training data (e.g., mobile devices train a common model for input text prediction, or hospitals train a better model for tumor classification). Even if this architecture seems more privacy-preserving at first sight, recent works have highlighted numerous privacy and security attacks to infer private and sensitive information. The task is to develop privacy and/or security attacks against federated learning (data poisoning, backdoors, reconstruction attacks), and/or mitigate these attacks.
(Contact: Gergely Ács)
Free RIder Detection using Attacks (FRIDA): In Federated Learning, multiple individuals train a single model together in a privacy-friendly way, i.e., their underlying datasets remain hidden from the other participants. As a consequence of this distributed setup, dishonest participants might behave maliciously by free-riding (enjoying the commonly trained model while not contributing to it).
The student's interdisciplinary task is to read about Membership Inference Attacks and the free-riding problem in Federated Learning, and then to propose a framework that connects the two, i.e., uses a Membership Inference Attack to determine whether a participant used actual data or just random noise during training.
(Contact: Balázs Pejó)
Contribution Score Poisoning: It is well known that it is possible to poison the training data to decrease the model's performance in general (untargeted attack) or for a specific class (targeted attack). Moreover, it is also possible to poison the data such that the desired fairness objective is destroyed or the privacy of the data samples is compromised. Contribution measuring techniques, such as the Shapley value, assign values to each participant, reflecting their importance or usefulness for the training. The question naturally arises: by injecting malicious participants into the participant pool, is it possible to manipulate the contribution scores of other participants (i.e., to increase or decrease them arbitrarily)?
The student's task is to get familiar with Contribution Score Computation techniques as well as poisoning attacks within Federated Learning, and to test experimentally whether such control is feasible and to what extent.
(Contact: Balázs Pejó)
Fairness of Shapley Approximations: In any distributed setting with a single common product, such as in Federated Learning (where multiple participants train a Machine Learning model together in a privacy-friendly way), the contribution of the individuals is a crucial question. For instance, when several pharmaceutical companies train a model together, which leads to a huge breakthrough, how should they split the pay-off corresponding to the model? Equal distribution is as unfair as one based on the dataset sizes, as neither considers the data quality. What does fair mean in the first place? Shapley defined four fundamental fairness properties and proved that his reward allocation scheme is the only one that satisfies all of them. On the other hand, computing it exactly takes time exponential in the number of participants, so it is standard practice to approximate it in real life.
The student's task is to study existing approximation methods and verify (theoretically or empirically) to what extent these methods respect the four desired properties.
(Contact: Balázs Pejó or Gergely Biczók)
Own idea: If you have a project idea of your own related to the security/privacy of federated learning, and we find it interesting, you can work on it under our guidance.
(Contact: Gergely Ács or Balázs Pejó or Gergely Biczók)
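
For readers unfamiliar with the Shapley value mentioned in the contribution score topics above, the following Python sketch computes it exactly for three participants by averaging marginal contributions over all orderings. The participant names and the accuracy function are hypothetical placeholders; in a real federated learning study, each coalition's value would come from training and evaluating a model, which is what makes the exact computation so expensive.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, utility):
    """Exact Shapley values: average each player's marginal contribution
    over every possible order in which the players could join."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += utility(with_p) - utility(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}


def accuracy(coalition):
    """Made-up model accuracy for a coalition: A and B hold useful data and
    complement each other, while C (a free rider) adds nothing."""
    score = 0.5
    if "A" in coalition:
        score += 0.2
    if "B" in coalition:
        score += 0.1
    if "A" in coalition and "B" in coalition:
        score += 0.1
    return score


scores = shapley_values(["A", "B", "C"], accuracy)
for player, value in scores.items():
    print(player, round(value, 3))
```

In this toy example the scores sum to the accuracy gain of the full coalition over the empty one, and the participant that never changes the accuracy receives zero. These correspond to two of the fairness properties (efficiency and the null-player property). With n participants the loop visits n! orderings, which is why approximations are used in practice.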

Required skills: none
Preferred skills: basic programming skills (e.g., Python), machine learning (not required)

Capacity: 6 students

Contact: Gergely Ács (CrySyS Lab), Balázs Pejó (CrySyS Lab), Gergely Biczók (CrySyS Lab)

Economics of (cyber)security and (data)privacy

Category: Economics, Privacy, Security, Game-Theory, Machine-Learning

As evidenced in the last 10-15 years, cybersecurity is not a purely technical discipline. Decision-makers, whether sitting at security providers (IT companies), security demanders (everyone using IT), or the security industry, are mostly driven by economic incentives. Understanding these incentives is vital for designing systems that are secure in real-life scenarios. Parallel to this, data privacy has shown the same characteristics: proper economic incentives and controls are needed to design systems where sharing data is beneficial to both data subject and data controller. An extreme example of a flawed attempt at such a design is the Cambridge Analytica case.
The prospective student will identify a cybersecurity or data privacy economics problem, and use elements of game theory and other domain-specific techniques and software tools to transform the problem into a model and propose a solution. Potential topics include:

CPSFlipIt: attacker-defender dynamics in cyber-physical systems
Risk management for cyber-physical/OT systems
Incentives in secure software development: why should programmers have proper security training?
Interdependent privacy: modeling inference with probabilistic graphical models
BYOT: Bring Your Own Topic!

Required skills: model thinking, good command of English
Preferred skills: basic knowledge of game theory, basic programming skills (e.g., Python, MATLAB, NetLogo)

Capacity: 6 students

Contact: Gergely Biczók (CrySyS Lab), Balázs Pejó (CrySyS Lab)