This page accompanies the SIGIR 2022 long paper "Conversational Question Answering on Heterogeneous Sources". We provide an extended video as an introduction to the work (a shorter video is also available here).
GitHub link to CONVINSE code
Directly download CONVINSE code


Please do not use this demo for comparison purposes! It is hosted only to demonstrate the general workflow and the construction of the structured representation (SR). To run the method on a CPU, we adjusted several parts of the pipeline, which can reduce performance.


Conversational question answering (ConvQA) tackles sequential information needs where contexts in follow-up questions are left implicit. Current ConvQA systems operate over homogeneous sources of information: either a knowledge base (KB), or a text corpus, or a collection of tables. This paper addresses the novel issue of jointly tapping into all of these together, this way boosting answer coverage and confidence. We present CONVINSE, an end-to-end pipeline for ConvQA over heterogeneous sources, operating in three stages: i) learning an explicit structured representation of an incoming question and its conversational context, ii) harnessing this frame-like representation to uniformly capture relevant evidences from KB, text, and tables, and iii) running a fusion-in-decoder model to generate the answer. We construct and release the first benchmark, ConvMix, for ConvQA over heterogeneous sources, comprising 3000 real-user conversations with 16000 questions, along with entity annotations, completed question utterances, and question paraphrases. Experiments demonstrate the viability and advantages of our method, compared to state-of-the-art baselines.


For feedback and clarifications, please contact: Philipp Christmann (pchristm AT mpi HYPHEN inf DOT mpg DOT de), Rishiraj Saha Roy (rishiraj AT mpi HYPHEN inf DOT mpg DOT de) or Gerhard Weikum (weikum AT mpi HYPHEN inf DOT mpg DOT de).

To know more about our group, please visit

Download ConvMix

Training Set (1680 Conversations)
Dev Set (560 Conversations)
Test Set (760 Conversations)

Please check out CompMix, our new dataset for heterogeneous QA, collating the completed versions of the conversational questions in ConvMix. The ConvMix and CompMix benchmarks are licensed under a Creative Commons Attribution 4.0 International License.

ConvMix Leaderboard

Model                                                      P@1    MRR    Hit@5
Christmann et al. '23                                      0.406  0.471  0.561
CONVINSE (top-k FiD)                                       0.343  0.378  0.431
Christmann et al. '22                                      0.342  0.365  0.386
Question Resolution (Voskarides et al. '20) + BM25 + FiD   0.282  0.289  0.297
Question Rewriting (Raposo et al. '22) + BM25 + FiD        0.271  0.278  0.285
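
For readers unfamiliar with the leaderboard columns, the following is a minimal sketch of how P@1, MRR and Hit@5 are commonly computed from a ranked answer list per question. It uses the standard textbook definitions and exact string matching; the official evaluation in the CONVINSE code may normalize answers or handle ties differently. The function names and the toy predictions are assumptions made for illustration and are not taken from the benchmark.

```python
from typing import Dict, List, Sequence, Set


def precision_at_1(ranked: Sequence[str], gold: Set[str]) -> float:
    """1.0 if the top-ranked answer is a gold answer, else 0.0."""
    return 1.0 if ranked and ranked[0] in gold else 0.0


def reciprocal_rank(ranked: Sequence[str], gold: Set[str]) -> float:
    """1 / rank of the first gold answer, or 0.0 if none is ranked."""
    for rank, answer in enumerate(ranked, start=1):
        if answer in gold:
            return 1.0 / rank
    return 0.0


def hit_at_k(ranked: Sequence[str], gold: Set[str], k: int = 5) -> float:
    """1.0 if any gold answer appears among the top-k answers, else 0.0."""
    return 1.0 if any(answer in gold for answer in ranked[:k]) else 0.0


def evaluate(predictions: List[Sequence[str]], golds: List[Set[str]]) -> Dict[str, float]:
    """Average the three metrics over all questions."""
    if len(predictions) != len(golds) or not golds:
        raise ValueError("need one non-empty, aligned list of predictions and golds")
    n = len(golds)
    pairs = list(zip(predictions, golds))
    return {
        "P@1": sum(precision_at_1(r, g) for r, g in pairs) / n,
        "MRR": sum(reciprocal_rank(r, g) for r, g in pairs) / n,
        "Hit@5": sum(hit_at_k(r, g, k=5) for r, g in pairs) / n,
    }


# Toy example: the ranked predictions below are made up for illustration.
predictions = [
    ["Kurt Vonnegut", "George Roy Hill"],  # gold answer at rank 1
    ["Vietnam War", "World War II"],       # gold answer at rank 2
]
golds = [{"Kurt Vonnegut"}, {"World War II"}]
scores = evaluate(predictions, golds)
print(scores)
```

Under these definitions P@1 can never exceed MRR, since a correct top answer contributes 1.0 to both and any other question contributes 0.0 to P@1.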

What do conversations in ConvMix look like?

Who wrote Slaughterhouse-Five?
Kurt Vonnegut
[KB, Text, Infobox]
Which war is discussed in the book?
World War II
[KB, Text]
What year was it’s first film adaptation released?
[KB, Text, Table, Infobox]
Who directed it?
George Roy Hill
[KB, Text, Table, Infobox]
What was the final film that he made?
Funny Farm
[KB, Text, Table]

Who played Ron in the Harry Potter movies?
Rupert Grint
[KB, Text]
Who played Dumbledore?
R. Harris, M. Gambon
[Text, Table]
What’s the run time for all the movies combined?
1179 minutes
[KB, Infobox]
Who was the production designer for the films?
Stuart Craig
[KB, Text, Table]
Which movie did he win an award for working on in 1980?
The Elephant Man

What was the last album recorded by the Beatles?
Let It Be
[KB, Text, Table]
Where was their last paying concert held?
Candlestick Park
What year did they break up?
[KB, Text, Infobox]
Who was their manager?
Brian Epstein
[KB, Text]
What was their nickname?
Fab Four
[KB, Text]

Who is the actor of Rick Grimes in The Walking Dead?
Andrew Lincoln
[KB, Text, Table]
What about Daryl Dixon?
Norman Reedus
[KB, Text, Table]
did he also play in Saturday night live?
whom did he play?
Daryl Dixon
production company of the series?
NBC Studios
[KB, Text, Infobox]

Which national team does Kylian Mbappé play soccer for?
France football team
[KB, Text, Infobox, Table]
How many goals did he score for his home country in 2018?
place of his birth?
[KB, Text, Infobox]
award he got in 2017?
Golden Boy
[KB, Table]
Who is the award conferred by?
[KB, Text, Infobox]

The sources in square brackets are those in which the respective answer can be found.

How was ConvMix created?

The ConvMix benchmark was created by real users, and we tried to ensure that the collected data is as natural as possible. Overall, it contains 3,000 conversations with 16,000 unique questions. Master crowdworkers on Amazon Mechanical Turk (AMT) selected an entity of interest in a specific domain, and then started issuing conversational questions on this entity, potentially drifting to other topics of interest over the course of the conversation. By letting users choose the entities themselves, we aimed to ensure that they were genuinely interested in the topics their conversations are based on. After writing a question, users were asked to find the answer in either Wikidata, Wikipedia text, a Wikipedia table or a Wikipedia infobox, whichever they found most natural for the specific question at hand. Since Wikidata requires some basic understanding of knowledge bases, we provided video guidelines that illustrated how Wikidata can be used for finding answers, following an example conversation. For each question, we provide not only the question and answer, but also the source in which the user found the answer, a paraphrase, a completed question, and the question entities. For further details on ConvMix, please refer to the paper.
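
As a rough illustration of the per-question annotations listed above, the following sketch models one conversation as a small Python data structure. The class and field names (`Turn`, `Conversation`, `answer_source`, and so on) are hypothetical and do not reflect the actual layout of the released files; the questions and answers come from the example conversations on this page, while the paraphrases, completed questions, chosen sources, and domain label are illustrative values.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, List

# The four source types named on this page.
SOURCES = {"KB", "Text", "Table", "Infobox"}


@dataclass
class Turn:
    """One question in a conversation (hypothetical field names)."""
    question: str
    answer: str
    answer_source: str       # source in which the annotator found the answer
    paraphrase: str
    completed_question: str  # self-contained version of the question
    entities: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.answer_source not in SOURCES:
            raise ValueError(f"unknown answer source: {self.answer_source!r}")


@dataclass
class Conversation:
    domain: str
    turns: List[Turn]


def source_counts(conversations: Iterable[Conversation]) -> Counter:
    """Count how often each source type was used to find an answer."""
    return Counter(t.answer_source for c in conversations for t in c.turns)


# Illustrative values only; not copied from the released dataset files.
example = Conversation(
    domain="books",
    turns=[
        Turn(
            question="Who wrote Slaughterhouse-Five?",
            answer="Kurt Vonnegut",
            answer_source="KB",
            paraphrase="Who is the author of Slaughterhouse-Five?",
            completed_question="Who wrote Slaughterhouse-Five?",
            entities=["Slaughterhouse-Five"],
        ),
        Turn(
            question="Which war is discussed in the book?",
            answer="World War II",
            answer_source="Text",
            paraphrase="What war does the book deal with?",
            completed_question="Which war is discussed in the book Slaughterhouse-Five?",
            entities=["Slaughterhouse-Five"],
        ),
    ],
)
print(source_counts([example]))
```

Keeping the completed question next to the original makes the implicit context explicit: the second turn only mentions "the book", whereas its completed form names the entity.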


"Conversational Question Answering on Heterogeneous Sources", Philipp Christmann, Rishiraj Saha Roy, and Gerhard Weikum. In SIGIR '22, Madrid, Spain, 11 - 15 July 2022.
[Preprint] [Code] [Poster] [Slides] [Video] [Extended video] [User study]