Biography
I received my Master's degree in Computer Engineering from Northeastern University in Boston, where I was a member of the Northeastern NLP Group and worked with Prof. Lu Wang (now at UMich). I am also privileged to work with Prof. Lifu Huang from Virginia Tech.
I am broadly interested in computational linguistics and natural language processing. My goal is to design methods that can understand human language across various domains and generate factual, coherent outputs, with a focus on long-form discourse (e.g., argumentation, summarization). Recently, I have been working on improving the trustworthiness and creativity of (multimodal) LLMs for various tasks.
Selected Publications
- VIVA: A Benchmark for Vision-Grounded Decision-Making with Human Values. Zhe Hu, Yixiao Ren, Jing Li, Yu Yin. arXiv 2024
- Cracking the Code of Juxtaposition: Can AI Models Understand the Humorous Contradictions. Zhe Hu, Tuo Liang, Jing Li, Yiren Lu, Yunlai Zhou, Yiran Qiao, Jing Ma, Yu Yin. arXiv 2024
- Unlocking Varied Perspectives: A Persona-Based Multi-Agent Framework with Debate-Driven Text Planning for Argument Generation. Zhe Hu, Hou Pong Chan, Jing Li, Yu Yin. arXiv 2024
- AMERICANO: Argument Generation with Discourse-driven Decomposition and Agent Interaction. Zhe Hu, Hou Pong Chan, Yu Yin. INLG 2024 (oral)
- MOCHA: A Multi-Task Training Approach for Coherent Text Generation from Cognitive Perspective. Zhe Hu, Hou Pong Chan, Lifu Huang. EMNLP 2022, short paper
- PLANET: Dynamic Content Planning in Autoregressive Transformers for Long-form Text Generation. Zhe Hu, Hou Pong Chan, Jiachen Liu, Xinyan Xiao, Hua Wu, Lifu Huang. ACL 2022
- Controllable Dialogue Generation with Disentangled Multi-grained Style Specification and Attribute Consistency Reward. Zhe Hu*, Zhiwei Cao*, Hou Pong Chan, Jiachen Liu, Xinyan Xiao, Jinsong Su, Hua Wu (*equal contribution). IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP) 2022
- Context-Aware Interaction Network for Question Matching. Zhe Hu, Zuohui Fu, Yu Yin, Gerard de Melo. EMNLP 2021, short paper
- An Entity-Driven Framework for Abstractive Summarization. Eva Sharma*, Luyang Huang*, Zhe Hu*, Lu Wang (*equal contribution). EMNLP 2019
- Argument Generation with Retrieval, Planning, and Realization. Xinyu Hua, Zhe Hu, Lu Wang. ACL 2019
Education
- M.Sc. in Computer Engineering, Northeastern University
- B.Eng. in Communication Engineering, Beijing University of Posts and Telecommunications (BUPT)
Professional Service
- Program Committee Member and Reviewer:
- ACL ARR
- Knowledge-Based Systems
- 2024: COLM, NeurIPS, ICLR, AAAI, CoNLL, NLPCC
- 2023: ACL, EMNLP, AAAI, CoNLL, EMNLP NewSum workshop
- 2022: EMNLP, AAAI
- 2021: ACL, EMNLP, NAACL, EACL, AAAI
- 2020: ACL, EMNLP
- 2019: EMNLP NewSum workshop
Cracking the Code of Juxtaposition: Can AI Models Understand the Humorous Contradictions
Recent advancements in large multimodal language models have demonstrated remarkable proficiency across a wide range of tasks. Yet, these models still struggle with understanding the nuances of human humor through juxtaposition, particularly when it involves nonlinear narratives that underpin many jokes and humor cues. This paper investigates this challenge by focusing on comics with contradictory narratives, where each comic consists of two panels that create a humorous contradiction. We introduce the YesBut benchmark, which comprises tasks of varying difficulty aimed at assessing AI's capabilities in recognizing and interpreting these comics, ranging from literal content comprehension to deep narrative reasoning. Through extensive experimentation and analysis of recent commercial and open-source large (vision) language models, we assess their capability to comprehend the complex interplay of the narrative humor inherent in these comics. Our results show that even state-of-the-art models still lag behind human performance on this task. Our findings offer insights into the current limitations and potential improvements for AI in understanding human creative expressions.
AMERICANO: Argument Generation with Discourse-driven Decomposition and Agent Interaction
Argument generation is a challenging task in natural language processing, which requires rigorous reasoning and proper content organization. Inspired by recent chain-of-thought prompting that breaks down a complex task into intermediate steps, we propose AMERICANO, a novel framework with agent interaction for argument generation. Our approach decomposes the generation process into sequential actions grounded on argumentation theory, which first executes actions sequentially to generate argumentative discourse components, and then produces a final argument conditioned on the components. To further mimic the human writing process and improve the left-to-right generation paradigm of current autoregressive language models, we introduce an argument refinement module which automatically evaluates and refines argument drafts based on feedback received. We evaluate our framework on the task of counterargument generation using a subset of the Reddit/CMV dataset. The results show that our method outperforms both end-to-end and chain-of-thought prompting methods and can generate more coherent and persuasive arguments with diverse and rich content.
MOCHA: A Multi-Task Training Approach for Coherent Text Generation from Cognitive Perspective
Teaching neural models to generate narratively coherent texts is a critical problem. Recent pre-trained language models have achieved promising results, but there is still a gap between human-written texts and machine-generated outputs. In this work, we propose a novel multi-task training strategy for long text generation grounded in the cognitive theory of writing, which empowers the model to learn essential subskills needed for writing, including planning and reviewing, besides end-to-end generation. We extensively evaluate our model on three open-ended generation tasks: story generation, news article writing, and argument generation. Experiments show that our model achieves better results in both few-shot and fully-supervised settings than strong baselines, and human evaluations confirm that our model can generate more coherent outputs.
PLANET: Dynamic Content Planning in Autoregressive Transformers for Long-form Text Generation
Despite recent progress of pre-trained language models on generating fluent text, existing models still suffer from incoherence in long-form text generation tasks that require proper content control and planning to form a coherent high-level logical flow. In this work, we propose PLANET, a novel generation framework leveraging the autoregressive self-attention mechanism to conduct content planning and surface realization dynamically. To guide the generation of output sentences, our framework enriches the Transformer decoder with latent representations to maintain sentence-level semantic plans grounded by bag-of-words. Moreover, we introduce a new coherence-based contrastive learning objective to further improve the coherence of outputs. Extensive experiments are conducted on two challenging opinion generation tasks: counter-argument generation and opinion article generation. Both automatic and human evaluations show that our method significantly outperforms strong baselines and generates more coherent text with richer content.
Controllable Dialogue Generation with Disentangled Multi-grained Style Specification and Attribute Consistency Reward
Controllable text generation is an appealing but challenging task, which allows users to specify particular attributes of the generated responses. In this paper, we propose a controllable dialogue generation model to steer response generation under multi-attribute constraints. Specifically, we first define and categorize the commonly-used control attributes into global and local ones, which possess different granularities of effects on response generation. Then, we significantly extend the conventional Seq2seq framework by introducing a novel two-stage decoder, which first uses a multi-grained style specification layer to impose the stylistic constraints and determine the word-level control states of responses based on the attributes, and then employs a response generation layer to generate final responses maintaining both semantic relevancy to the contexts and fidelity to the attributes. Furthermore, we train our model with an attribute consistency reward to promote response control with explicit supervision signals. Extensive experiments and in-depth analyses on two datasets indicate that our model can significantly outperform competitive baselines in terms of response quality, content diversity, and controllability.
Context-Aware Interaction Network for Question Matching
Impressive milestones have been achieved in text matching by adopting a cross-attention mechanism to capture pertinent semantic connections between two sentences. However, these cross-attention mechanisms focus on word-level links between the two inputs, neglecting the importance of contextual information. We propose a context-aware interaction network (COIN) to properly align two sequences and infer their semantic relationship. Specifically, each interaction block includes (1) a context-aware cross-attention mechanism to effectively integrate contextual information, and (2) a gate fusion layer to flexibly interpolate aligned representations. We apply multiple stacked interaction blocks to produce alignments at different levels and gradually refine the attention results. Experiments on two question matching datasets and detailed analyses confirm the effectiveness of our model.
Errata: Equation (9) in section 2.3 should be: \(L(S_A,S_B^+, S_B^-) = \max \{0, 1 + Coh(S_A, S_B^-) - Coh(S_A, S_B^+)\} \)
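As a plain-code illustration of the corrected equation, the margin loss can be sketched as below (a minimal sketch: the function and argument names are hypothetical, and the Coh scores are assumed to be precomputed floats):

```python
def coherence_margin_loss(coh_pos: float, coh_neg: float, margin: float = 1.0) -> float:
    """Hinge loss pushing Coh(S_A, S_B+) above Coh(S_A, S_B-) by at least `margin`.

    coh_pos: coherence score Coh(S_A, S_B+) for the positive (coherent) pair.
    coh_neg: coherence score Coh(S_A, S_B-) for the negative (incoherent) pair.
    """
    return max(0.0, margin + coh_neg - coh_pos)
```

The loss is zero once the positive pair outscores the negative pair by the full margin, and otherwise grows linearly with the violation.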
An Entity-Driven Framework for Abstractive Summarization
Abstractive summarization systems aim to produce more coherent and concise summaries than their extractive counterparts. Popular neural models have achieved impressive results for single-document summarization, yet their outputs are often incoherent and unfaithful to the input. In this paper, we introduce SENECA, a novel System for ENtity-drivEn Coherent Abstractive summarization framework that leverages entity information to generate informative and coherent abstracts. Our framework takes a two-step approach: (1) an entity-aware content selection module first identifies salient sentences from the input, then (2) an abstract generation module conducts cross-sentence information compression and abstraction to generate the final summary, which is trained with rewards to promote coherence, conciseness, and clarity. The two components are further connected using reinforcement learning. Automatic evaluation shows that our model significantly outperforms previous state-of-the-art on ROUGE and our proposed coherence measures on the New York Times and CNN/Daily Mail datasets. Human judges further rate our system summaries as more informative and coherent than those by popular summarization models.
Argument Generation with Retrieval, Planning, and Realization
Automatic argument generation is an appealing but challenging task. In this paper, we study the specific problem of counter-argument generation, and present a novel framework, CANDELA. It consists of a powerful retrieval system and a novel two-step generation model, where a text planning decoder first decides on the main talking points and a proper language style for each sentence, then a content realization decoder reflects the decisions and constructs an informative paragraph-level argument. Furthermore, our generation model is empowered by a retrieval system indexed with 12 million articles collected from Wikipedia and popular English news media, which provides access to high-quality content with diversity. Automatic evaluation on a large-scale dataset collected from Reddit shows that our model yields significantly higher BLEU, ROUGE, and METEOR scores than the state-of-the-art and non-trivial comparisons. Human evaluation further indicates that our system arguments are more appropriate for refutation and richer in content.