Key concepts
ChatGPT demonstrates proficiency in identifying topic structures but struggles with hierarchical rhetorical structures in dialogue analysis.
Summary
The study evaluates ChatGPT on discourse analysis tasks. ChatGPT showed a good grasp of linear topic structures but had difficulty with hierarchical rhetorical structures. The study also examines how In-Context Learning and the individual prompt components affect its performance.
Large language models such as ChatGPT have shown remarkable capabilities across a wide range of natural language tasks, but their ability to understand discourse structure has received less attention. The study systematically examines ChatGPT's performance on two discourse analysis tasks, topic segmentation and discourse parsing, focusing on how deeply it understands the linear and hierarchical discourse structures that underlie dialogue.
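To make the linear/hierarchical distinction concrete, the following Python sketch shows one possible way to represent the two kinds of output. The encodings (a set of boundary indices for topic segmentation, and `Link` records holding a head, a dependent, and a relation label for discourse parsing) and the names `boundaries_to_segments`, `Link`, and `is_linear_chain` are illustrative assumptions, not the study's data format.

```python
from __future__ import annotations

from dataclasses import dataclass


# Linear structure (topic segmentation): the dialogue is cut into
# contiguous, non-nested segments. A boundary at index i means a new
# topic starts at utterance i (an illustrative encoding).
def boundaries_to_segments(num_utterances: int, boundaries: set[int]) -> list[list[int]]:
    segments: list[list[int]] = []
    current: list[int] = []
    for i in range(num_utterances):
        # Close the running segment when a new topic starts here.
        if i in boundaries and current:
            segments.append(current)
            current = []
        current.append(i)
    if current:
        segments.append(current)
    return segments


# Hierarchical structure (discourse parsing): each later utterance attaches
# to an earlier one with a rhetorical relation, so links can branch and skip.
@dataclass(frozen=True)
class Link:
    head: int       # earlier utterance being responded to
    dependent: int  # later utterance
    relation: str   # relation label, e.g. "QAP" (illustrative)


def is_linear_chain(links: list[Link]) -> bool:
    """True if every utterance attaches only to its immediate predecessor,
    i.e. the hierarchy has collapsed into a flat chain."""
    return all(link.dependent == link.head + 1 for link in links)
```

For example, a parse in which utterance 2 attaches to utterance 0 rather than to utterance 1 branches, so `is_linear_chain` returns `False`; a parse that attaches every utterance to the one just before it is the flat, purely linear pattern.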
To instruct ChatGPT on these tasks, the authors crafted a prompt template with three components: a task description, an output format, and the structured input. Experiments were run on widely used datasets for both tasks (DialSeg711, TIAGE, CNTD, and ZYS for topic segmentation; STAC and Molweni for discourse parsing). The results show that ChatGPT identifies topic structures well in general-domain conversations but struggles with specific-domain conversations and with hierarchical rhetorical structures.
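A minimal sketch of such a three-part template follows. The source names the components but not their wording, so the text of `TASK_DESCRIPTION` and `OUTPUT_FORMAT`, the `U<n> [speaker]: text` rendering of the structured input, and the helper names are all hypothetical.

```python
from __future__ import annotations

# Hypothetical wording for the first two components.
TASK_DESCRIPTION = (
    "You are given a dialogue. Split it into topic segments: "
    "runs of consecutive utterances that discuss the same topic."
)
OUTPUT_FORMAT = (
    "Answer with one line per segment listing its utterance IDs, "
    "for example: Segment 1: U1, U2"
)


def format_dialogue(utterances: list[tuple[str, str]]) -> str:
    """Structured input: render (speaker, text) pairs as numbered lines."""
    return "\n".join(
        f"U{i} [{speaker}]: {text}"
        for i, (speaker, text) in enumerate(utterances, start=1)
    )


def build_prompt(
    utterances: list[tuple[str, str]],
    task: str = TASK_DESCRIPTION,
    output_format: str = OUTPUT_FORMAT,
) -> str:
    """Concatenate task description, output format, and structured input."""
    parts = [task, output_format, "Dialogue:\n" + format_dialogue(utterances)]
    return "\n\n".join(parts)
```

Numbering the utterances gives the model stable IDs to refer to, which is what makes a constrained output format possible in the first place.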
Further analysis showed that In-Context Learning (ICL) can significantly improve ChatGPT's understanding of hierarchical structures. Ablation studies on the prompt components indicated that the output format plays a crucial role in performance.
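In-context learning amounts to placing worked examples in the prompt before the test dialogue, and a component ablation amounts to rebuilding the prompt with one part left out. The sketch below assumes that reading; the component order, the `Example k:`/`Answer:` layout, and the set of droppable components (`task`, `format`, `demos`) are hypothetical choices, not the study's exact protocol.

```python
from __future__ import annotations


def assemble_prompt(
    task: str,
    output_format: str,
    demos: list[tuple[str, str]],
    query: str,
    drop: frozenset[str] = frozenset(),
) -> str:
    """Join prompt components in a fixed order, skipping any named in `drop`.

    `demos` are (dialogue, gold answer) pairs shown as in-context examples;
    an empty list gives a zero-shot prompt. The structured input (`query`)
    is always kept.
    """
    blocks: list[str] = []
    if "task" not in drop:
        blocks.append(task)
    if "format" not in drop:
        blocks.append(output_format)
    if "demos" not in drop:
        for k, (dialogue, answer) in enumerate(demos, start=1):
            blocks.append(f"Example {k}:\n{dialogue}\nAnswer:\n{answer}")
    blocks.append(f"Dialogue:\n{query}\nAnswer:")
    return "\n\n".join(blocks)


def ablation_variants(
    task: str, output_format: str, demos: list[tuple[str, str]], query: str
) -> dict[str, str]:
    """The full prompt plus one variant per removed component."""
    variants = {"full": assemble_prompt(task, output_format, demos, query)}
    for name in ("task", "format", "demos"):
        variants[f"no_{name}"] = assemble_prompt(
            task, output_format, demos, query, drop=frozenset({name})
        )
    return variants
```

Scoring each variant on the same test set and comparing it with `full` isolates the contribution of each component, which is how a finding such as "the output format matters most" can be read off.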
Despite these capabilities, ChatGPT still has problems with robustness and does not consistently follow the specified output format. Case studies show that it captures linear topic structure but fails to grasp complex hierarchical relations.
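Because the model does not always follow the requested format, its replies need to be checked before scoring. The sketch below assumes a hypothetical one-link-per-line format, `U<dependent> -> U<head>: <relation>`, and separates well-formed links from lines that are unparseable, point to a later or out-of-range utterance, or use an unknown relation label. The format, the regular expression, and `parse_links` are illustrative, not taken from the study.

```python
from __future__ import annotations

import re

# Hypothetical output format: one link per line, e.g. "U3 -> U1: QAP".
LINK_PATTERN = re.compile(r"^U(\d+)\s*->\s*U(\d+)\s*:\s*(\S.*)$")


def parse_links(
    response: str, num_utterances: int, relations: set[str]
) -> tuple[list[tuple[int, int, str]], list[str]]:
    """Return (links, errors).

    links holds (dependent, head, relation) tuples that follow the format;
    errors describes every non-empty line that violates it.
    """
    links: list[tuple[int, int, str]] = []
    errors: list[str] = []
    for raw in response.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = LINK_PATTERN.match(line)
        if m is None:
            errors.append(f"unparseable: {line!r}")
            continue
        dep, head, rel = int(m.group(1)), int(m.group(2)), m.group(3).strip()
        # A head must be an earlier utterance, and both IDs must exist.
        if not (1 <= head < dep <= num_utterances):
            errors.append(f"bad indices: {line!r}")
        elif rel not in relations:
            errors.append(f"unknown relation: {line!r}")
        else:
            links.append((dep, head, rel))
    return links, errors
```

Counting the entries in `errors` per reply gives a simple measure of how often the output drifts from the requested format.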
The study provides insights into the potential and limitations of large language models like ChatGPT for discourse analysis tasks.
Statistics
DialSeg711 consists of 711 English dialogues.
The CNTD dataset contains 1,041 Chinese chitchat conversations.
The TIAGE dataset includes 300 English chitchat dialogues.
The ZYS dataset comprises 505 Chinese banking consultation conversations.
The STAC dataset contains 1,062 dialogues from the online game The Settlers of Catan.
The Molweni dataset is built on the Ubuntu Chat Corpus and has 9,000 instances for training.
Quotes
"ChatGPT demonstrates proficiency in identifying topic structures but struggles considerably with hierarchical rhetorical structures."
"In-depth investigation indicates that ChatGPT can give more reasonable topic structures than human annotations but only linearly parses the hierarchical rhetorical structures."