Computer Science PhD Student

Bar-Ilan University

Hello! I am a second-year PhD student in the Natural Language Processing Lab at Bar-Ilan University, supervised by Prof. Yoav Goldberg. I'm working on bridging the gap between modalities and revealing the capabilities multimodal models gain when they consider more than one modality at a time. Currently, my research focuses on image-text alignment and controlled generation (what kind of changes do we need to make to generate images that are more faithful to the input prompt?). I find it particularly interesting to work on better architectures, ones that are faster (perhaps not attention-based?) or simply more scalable than current ones.

Previously, I obtained my Master's degree in Computer Science, jointly supervised by Yoav Goldberg and Reut Tsarfaty. At the time, I was driven by a fascination with underspecified language. Why do models often misinterpret ambiguous language when we don't? How can we make implicit information more explicit? My thesis work aimed to answer these questions in the context of verbal omissions in coordination structures. As a fun detour, I stumbled upon intriguing behavior of DALL-E when fed ambiguous language: it depicts multiple interpretations in a single image instead of settling on one. We detailed this behavior in a short paper.

My CV is available here.

Education

  • PhD in Computer Science (in progress)

    Bar-Ilan University

  • MSc in Computer Science

    Bar-Ilan University

Recent Publications

Linguistic Binding in Diffusion Models: Enhancing Attribute Correspondence through Attention Map Alignment

We propose SynGen, an inference-time method that first syntactically analyses the prompt to identify entities and their modifiers, and then uses a novel loss function that encourages the cross-attention maps to agree with the linguistic binding reflected in the syntax.
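To give a rough feel for the idea, here is a minimal, hypothetical sketch of such a binding loss. All names are illustrative, the distance is a simplified symmetric KL divergence over attention maps, and the actual SynGen objective differs; this only illustrates the pull-toward-entity / push-from-other-entities structure.

```python
import numpy as np

def attention_distance(p, q):
    """Symmetric KL divergence between two (unnormalized) 2D attention maps."""
    p = p.ravel() / p.sum()
    q = q.ravel() / q.sum()
    eps = 1e-12  # avoid log(0)
    return 0.5 * (np.sum(p * np.log((p + eps) / (q + eps)))
                  + np.sum(q * np.log((q + eps) / (p + eps))))

def binding_loss(attn, pairs):
    """attn: dict mapping each prompt word to its cross-attention map.
    pairs: (modifier, entity) pairs extracted from a syntactic parse.
    Pulls each modifier's map toward its entity's map and pushes it
    away from the maps of all other entities."""
    entities = {ent for _, ent in pairs}
    loss = 0.0
    for mod, ent in pairs:
        loss += attention_distance(attn[mod], attn[ent])      # attract
        for other in entities - {ent}:
            loss -= attention_distance(attn[mod], attn[other])  # repel
    return loss
```

At inference time, a loss of this shape can be differentiated with respect to the latent and used to nudge the denoising step so that, e.g., "red" attends to the same image regions as "ball" rather than leaking onto another object.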

Conjunct Resolution in The Face of Verbal Omissions

This work establishes a pragmatic framework for understanding verbal omissions in VP coordination structures, devises a scalable data collection method, and curates a large dataset with over 10,000 natural examples and crowd-sourced solutions. We show that current neural baseline models achieve only moderate success in resolving these omissions, leaving ample room for improvement.

DALLE-2 is Seeing Double: Flaws in Word-to-Concept Mapping in Text2Image Models

We point out two surprising flaws in the way text2image models map words to visual concepts: semantic leakage between different words in the prompt, and cases where a word with multiple meanings is depicted with all of its meanings at once.