Balazs Pejo (CV) was born in 1989 in Budapest, Hungary. He received a BSc degree in Mathematics from the Budapest University of Technology and Economics (BME, Hungary) in 2012 and two MSc degrees in Computer Science, within the Security and Privacy program of EIT Digital, from the University of Trento (UNITN, Italy) and Eotvos Lorand University (ELTE, Hungary) in 2014. He earned a PhD degree in Informatics from the University of Luxembourg (UNILU, Luxembourg) in 2019. Currently, he is a member of the Laboratory of Cryptography and Systems Security (CrySyS Lab).
List of Courses
- Contribution Scores
- Inference Attacks
- Differential Privacy
- Robust Learning
- Game Theory
Student Project Proposals
- FRIDA: Free RIder Detection using Attacks:
A membership inference attack determines whether a particular sample was used for training. Could such attacks be used in a Federated Learning setting to detect free riders, i.e., participants who do not use any samples for training?
- FRAP: Capture the Fairness/Robustness/Accuracy/Privacy Trade-Off:
There are clear connections between privacy (P) and accuracy (A), robustness (R) and A, and fairness (F) and A. It is also known that F, R, and P influence each other pairwise. Could these trade-offs be measured, and could the optimal setting be determined based on a given set of incentives?
- Accuracy vs Privacy - Optimizing the Complexity:
More complex ML models tend to perform better, mostly because they can learn more from the data. For the same reason, they could potentially leak more information than their simpler counterparts. In which situations does the accuracy gain outweigh the additional privacy leakage?
- Meta Science:
There are a handful of high-quality, well-established privacy and security conferences, such as S&P, CCS, etc. Could an NLP-based ML model differentiate between papers published at these venues and other, non-peer-reviewed papers on arXiv?
- Testing Data Inference:
For every ML model, the underlying data is split into a training set and a test set. While membership inference aims to determine whether a particular data point was part of the training set, there are currently no known techniques to determine whether a data point was part of the test set. Is this even possible?
- Improving Machine Learning by Preclassification:
Machine Learning (ML) algorithms perform better on bigger datasets, so using more data is generally a good idea. On the other hand, not all data is created equal: could the model's accuracy be improved by carefully selecting different training data for each learning phase?
- Fairness of Shapley Approximation:
The Shapley value is the only reward distribution that satisfies a standard set of fairness axioms, yet its exact computation takes exponential time. Hence, a handful of approximation mechanisms exist. The question is which approximations satisfy the desired fairness properties, and to what extent.
- Amplified DP:
Differential Privacy (DP) is the de facto standard privacy protection mechanism, with various amplification (i.e., privacy-guarantee boosting) techniques. It is natural to ask: which combination of amplification technique and corresponding privacy parameters results in the highest utility among (ε,δ)-DP mechanisms?
- Robust ML:
There are several techniques to mitigate the effect of malicious participants in Federated Learning (i.e., in the client selection phase and in the aggregation phase). This raises the question: which is the optimal combination of techniques and corresponding parameters?
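The Shapley approximation question can be made concrete with a small sketch. Assuming a hypothetical utility function `v` that maps a coalition of participants (a frozenset of IDs) to a score, the Python code below compares exact enumeration over all coalitions with permutation-sampling Monte Carlo, one common approximation; it is not necessarily one of the mechanisms the project would study, and the names `exact_shapley` and `monte_carlo_shapley` are illustrative.

```python
import itertools
import math
import random
from typing import Callable, Dict, FrozenSet, Sequence

# Hypothetical utility: maps a coalition of participant IDs to a score.
Utility = Callable[[FrozenSet[int]], float]


def exact_shapley(players: Sequence[int], v: Utility) -> Dict[int, float]:
    """Exact Shapley values via enumeration of all 2^(n-1) coalitions per player."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):  # k = size of the coalition joined by player i
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for coalition in itertools.combinations(others, k):
                s = frozenset(coalition)
                total += weight * (v(s | {i}) - v(s))
        phi[i] = total
    return phi


def monte_carlo_shapley(
    players: Sequence[int], v: Utility, num_permutations: int, seed: int = 0
) -> Dict[int, float]:
    """Approximate Shapley values by averaging marginal contributions over random orderings."""
    rng = random.Random(seed)
    phi = {i: 0.0 for i in players}
    order = list(players)
    for _ in range(num_permutations):
        rng.shuffle(order)
        coalition: FrozenSet[int] = frozenset()
        prev = v(coalition)
        for i in order:
            coalition = coalition | {i}
            cur = v(coalition)
            phi[i] += cur - prev  # marginal contribution of i in this ordering
            prev = cur
    return {i: total / num_permutations for i, total in phi.items()}
```

Because the marginal contributions along each sampled permutation telescope to v(N) − v(∅), the Monte Carlo estimate satisfies efficiency exactly and assigns zero to a null player, whereas symmetry holds only in expectation: with the toy utility v(S) = min(1, 0.4·|S|), three symmetric players each get exactly 1/3 from `exact_shapley`, but can receive unequal estimates after only a few sampled permutations.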
List of Students
- Frank Marcell (BSc, BME): Altruism in Fuzzy Message Detection
- Nikolett Kapui (BSc, BME): SQLi Detection Using Machine Learning
- Andras Totth (BSc, BME): Distributed Approximation of the Shapley Value
- Mathias Parisot (BSc, VU-AMS): Property Inference Attacks on Convolutional Neural Networks
- [2023-]: Conference on Computer and Communications Security (CCS)
- [2023-]: Artificial Intelligence and Statistics (AISTATS)
- [2021-]: Emerging Security Information, Systems and Technologies (SECURWARE)
- [2020-2023]: Privacy Enhancing Technologies Symposium (PETS)
- [2020-2022]: Workshop on Privacy in Natural Language Processing (PrivateNLP)
List of Publications
- To Appear
- Wouter Heyndrickx; Lewis Mervin; Tobias Morawietz; Noé Sturm; Lukas Friedrich; Adam Zalewski; Anastasia Pentina; Lina Humbeck; Martijn Oldenhof; Ritsuya Niwayama; Peter Schmidtke; Nikolas Fechner; Jaak Simm; Ádám Arany; Nicolas Drizard; Rama Jabal; Arina Afanasyeva; Regis Loeb; Shlok Verma; Simon Harnqvist; Matthew Holmes; Balázs Pejó; Maria Telenczuk; Nicholas Holway; Arne Dieckmann; Nicola Rieke; Friederike Zumsande; Djork-Arné Clevert; Michael Krug; Christopher Luscombe; Darren Green; Peter Ertl; Péter Antal; David Marcus; Nicolas Do Huu; Hideyoshi Fuji; Stephen Pickett; Gergely Ács; Eric Boniface; Bernd Beck; Yax Sun; Arnaud Gohier; Friedrich Rippmann; Ola Engkvist; Andreas H. Göller; Yves Moreau; Mathieu N. Galtier; Ansgar Schuffenhauer; Hugo Ceulemans: "MELLODDY: cross pharma federated learning at unprecedented scale unlocks benefits in QSAR without compromising proprietary information", Journal of Chemical Information and Modeling (JCIM)
- Bowen Liu; Balázs Pejó; Qiang Tang: "Privacy-preserving Federated Singular Value Decomposition", MDPI Journal of Applied Sciences (AppSci)
- Balázs Pejó; Nikolett Kapui: "SQLi Detection with ML: a data-source perspective", 20th International Conference on Security and Cryptography (SECRYPT)
- Balázs Pejó; Gergely Biczó: "Quality Inference in Federated Learning with Secure Aggregation", IEEE Transactions on Big Data (IEEE TBD)
- Martijn Oldenhof; Gergely Ács; Balázs Pejó; Ansgar Schuffenhauer; Nicholas Holway; Noé Sturm; Arne Dieckmann; Oliver Fortmeier; Eric Boniface; Clément Mayer; Arnaud Gohier; Peter Schmidtke; Ritsuya Niwayama; Dieter Kopecky; Lewis Mervin; Prakash Chandra Rathi; Lukas Friedrich; András Formanek; Péter Antal; Jordon Rahaman; Adam Zalewski; Wouter Heyndrickx; Ezron Oluoch; Manuel Stößel; Michal Vančo; David Endico; Fabien Gelus; Thaïs de Boisfossé; Adrien Darbier; Ashley Nicollet; Matthieu Blottière; Maria Telenczuk; Van Tien Nguyen; Thibaud Martinez; Camille Boillet; Kelvin Moutet; Alexandre Picosson; Aurélien Gasser; Inal Djafar; Antoine Simon; Ádám Arany; Jaak Simm; Yves Moreau; Ola Engkvist; Hugo Ceulemans; Camille Marini; Mathieu Galtier: "Industry-Scale Orchestrated Federated Learning for Drug Discovery", 35th Annual Conference on Innovative Applications of Artificial Intelligence (IAAI)