Recent work on bilingual training has demonstrated the benefits of inducing aligned word embeddings that mitigate data sparsity. For example, [1] employs a multitask framework leveraging co-occurrence statistics from parallel data to generate shared representations, while [2] extends the distributional hypothesis to multilingual settings by inducing joint-space embeddings that capture compositional semantics without explicit word alignments. These approaches establish a solid foundation for our model, which harnesses bilingual training to capture both semantic representations and context-dependent language production in a grounded color reference task.
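The joint-space idea behind [1] and [2] can be illustrated with a minimal sketch: translation pairs mined from parallel data are mapped to a single shared embedding row, so words from both languages index the same vector space. All word pairs and function names here are hypothetical, and real systems learn these vectors during training rather than drawing them at random:

```python
# Minimal sketch of joint-space bilingual embeddings over a shared
# vocabulary. Translation pairs share one embedding row, so aligned
# words in the two languages receive identical representations.
import random

# Hypothetical translation pairs, as if mined from parallel data.
aligned_pairs = [("blue", "azul"), ("red", "rojo"), ("green", "verde")]
unaligned_words = ["teal", "turquesa"]  # no alignment: separate rows

def build_joint_vocab(pairs, extra):
    """Map each translation pair to one shared embedding index,
    so e.g. 'blue' and 'azul' look up the same row."""
    index = {}
    for src, tgt in pairs:
        row = len(set(index.values()))
        index[src] = row
        index[tgt] = row
    for word in extra:
        index[word] = len(set(index.values()))
    return index

vocab = build_joint_vocab(aligned_pairs, unaligned_words)
dim = 8
random.seed(0)
# One random vector per shared row (a trained model would learn these).
embeddings = [[random.gauss(0.0, 1.0) for _ in range(dim)]
              for _ in range(len(set(vocab.values())))]

def embed(word):
    return embeddings[vocab[word]]

assert embed("blue") == embed("azul")        # aligned: shared vector
assert embed("teal") != embed("turquesa")    # unaligned: distinct vectors
```

In a trained model the sharing is soft (aligned words are pulled together by a bilingual objective rather than hard-tied to one row), but the lookup structure is the same.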
Additional research has demonstrated the effectiveness of sharing representations across languages with minimal architectural changes. In [3], an artificial token indicating the target language enabled a single neural machine translation model to perform zero-shot translation, illustrating implicit cross-lingual transfer. Moreover, [4] shows that a unified multi-task model spanning diverse domains attains competitive performance by learning shared representations. These insights motivate our bilingual strategy, in which a shared vocabulary supports the generation of language-specific utterances while benefiting from cross-lingual inductive biases.
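The artificial-token mechanism of [3] is simple enough to sketch directly: the desired output language is encoded as an extra token prepended to the source sequence, so a single model and vocabulary serve every translation direction. The `<2xx>` tag format follows [3]; the toy sentence pairs are our own illustration:

```python
# Sketch of the artificial target-language token from [3]: a tag
# prepended to the source sentence tells one shared NMT model which
# language to produce, with no other architectural change.
def tag_source(tokens, target_lang):
    """Prepend an artificial token naming the desired output language."""
    return ["<2{}>".format(target_lang)] + tokens

# Mixed-direction training examples flow through a single model and
# shared vocabulary; directions never observed together in training
# can still be requested at test time via the tag (zero-shot).
training_pairs = [
    (tag_source(["the", "blue", "one"], "es"), ["el", "azul"]),
    (tag_source(["el", "azul"], "en"), ["the", "blue", "one"]),
]
print(training_pairs[0][0])  # ['<2es>', 'the', 'blue', 'one']
```

Our bilingual setup borrows the same principle: one shared vocabulary, with language identity supplied as conditioning information rather than as a separate model.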
In summary, our work draws on the complementary strengths of bilingual representation learning from [1] and [2] and cross-lingual sharing techniques from [3] and [4]. We introduce a bilingual model for color reference games that not only exhibits human-like contextual sensitivity and improves pragmatic informativeness but also faithfully captures language-specific semantic distinctions. This contribution extends the literature on bilingual and multilingual models in grounded communication tasks by integrating semantic understanding with pragmatic language production.