Facebook Fellow Spotlight: Striving for provable guarantees in the theoretical foundations of machine learning

Every year, PhD students from around the world apply for the Facebook Fellowship, a program to promote and support PhD students doing innovative and relevant research in the fields of computer science and engineering.

As a continuation of our Fellowship Spotlight series, we highlight Lydia Zakynthinou, a 2020 Facebook Fellow in applied statistics.

Lydia is a PhD student at Northeastern University’s Khoury College of Computer Sciences, where she is advised by Jonathan Ullman and Huy Lê Nguyễn. Her research focuses on the theoretical foundations of machine learning and data privacy.

During her studies at the National Technical University of Athens in Greece, Lydia developed an interest in the theoretical foundations of machine learning and algorithms. She was particularly fascinated by algorithms as they can be applied directly to solving real-world problems, especially in a world that values big data.

“Algorithms are everywhere,” says Lydia. “But it is a challenge to determine the trade-offs between the resources they consume, such as computation time, accuracy, privacy loss, and amount of data, so that we as researchers can make informed decisions about the algorithms we use.” She points to a simple example of such a trade-off: “Sometimes training an entire deep neural network is really slow, but it’s the best we have in terms of accuracy.” That encouraged Lydia to dig deeper into the theoretical foundations of machine learning.

Lydia’s research seeks to answer two main questions:

  • How can one ensure that an algorithm generalizes well and does not overfit the dataset?
  • How can one guarantee the privacy of the individuals whose data the algorithm uses?

The effectiveness of an algorithm depends on its ability to learn about the population to which it is applied. But algorithms are designed to learn from, and be accurate on, the specific dataset they are trained on, which leads to two undesirable phenomena: overfitting and privacy leakage. This is where generalization and differential privacy come into play.

When an algorithm generalizes well, its performance on the dataset is guaranteed to be close to its performance on the population as a whole. There are currently many frameworks that seek to achieve this, but they are often incompatible with each other. Lydia’s work proposes a new framework that brings together current theories aimed at understanding the properties an algorithm must have to ensure generalization.
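To make “close” concrete, here is a minimal sketch of the generalization gap in standard notation (the notation is ours, not from the interview): for an algorithm A trained on a sample S = {(x_1, y_1), …, (x_n, y_n)} drawn from a population distribution D, with loss function ℓ,

\[
\mathrm{gen}(A, S) \;=\; \Bigl|\; \mathbb{E}_{(x,y)\sim \mathcal{D}}\bigl[\ell(A(S); x, y)\bigr] \;-\; \frac{1}{n}\sum_{i=1}^{n} \ell\bigl(A(S); x_i, y_i\bigr) \;\Bigr|.
\]

An algorithm generalizes well when this gap is small with high probability over the draw of S; overfitting is precisely the situation where the second (training) term is small but the first (population) term is not.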

Differential privacy addresses the second side effect, privacy leakage. It is a mathematically rigorous technique that essentially guarantees that no attacker, regardless of their additional knowledge, can learn much more about an individual than they could have if that individual’s data had never been included in the dataset. It has become the standard criterion for ensuring privacy in machine learning models and has been adopted in several real-world applications. “By nature, differential privacy also ensures generalization,” Lydia emphasizes.
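For readers who want the formal statement behind that informal guarantee, here is a minimal sketch of the standard (ε, δ) definition (again in our notation, not the article’s): a randomized algorithm A is (ε, δ)-differentially private if, for every pair of datasets S and S′ that differ in a single person’s record and every set of possible outputs O,

\[
\Pr[A(S) \in O] \;\le\; e^{\varepsilon} \cdot \Pr[A(S') \in O] + \delta.
\]

Smaller ε and δ mean that any one person’s data has less influence on what the algorithm outputs, which is what bounds how much an attacker can infer about that person.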

Lydia’s work analyzes core statistical problems and proposes a theoretical framework that unifies current theories, making it possible to develop new algorithms that achieve different levels of privacy and generalize well to the population to which they are applied. “In general, we should strive for provable guarantees,” says Lydia, especially when it comes to privacy. “Because machine learning is applied in the real world, we need to make sure that [an algorithm] behaves the way we think it does.”

To learn more about Lydia Zakynthinou and her research, visit her website.
