What's New
[Feb 2022] One paper accepted at ACL 2022!
[Aug 2021] One paper accepted at EMNLP 2021!
[Nov 2020] Attended EMNLP 2020 (virtual)
[Nov 2019] Traveled to Hong Kong for EMNLP 2019
[Aug 2019] Our paper "An Entity-Driven Framework for Abstractive Summarization" is accepted at EMNLP 2019!
Biography
I am a senior research engineer at Baidu Inc.
I received my Master's degree in Computer Engineering from Northeastern University in Boston.
I was a member of the Northeastern NLP Group, where I worked with Prof. Lu Wang (now at UMich) on text generation and abstractive summarization.
Research Interests
I am broadly interested in computational linguistics and natural language processing. My goal is to design methods that understand human language from large-scale data across diverse domains, and that generate rich, coherent output with high quality and controllability.
Recently, I have been working on topics including:
- controllable text generation and long-form text generation;
- coherent, high-quality summary extraction and generation;
- natural language inference and textual semantic similarity modeling.
Experience
Publications
- PLANET: Dynamic Content Planning in Autoregressive Transformers for Long-form Text Generation
Zhe Hu, Hou Pong Chan, Jiachen Liu, Xinyan Xiao, Hua Wu, Lifu Huang
In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2022 (to appear).
PDF
Poster
Slides
Project
Abstract
Despite recent progress of pre-trained language models on generating fluent text, existing models still suffer from incoherence in long-form text generation tasks that require proper content control and planning to form a coherent high-level logical flow. In this work, we propose PLANET, a novel generation framework that leverages the autoregressive self-attention mechanism to conduct content planning and surface realization dynamically. To guide the generation of output sentences, our framework enriches the Transformer decoder with latent representations that maintain sentence-level semantic plans grounded by bag-of-words. Moreover, we introduce a new coherence-based contrastive learning objective to further improve the coherence of the output. Extensive experiments are conducted on two challenging opinion generation tasks: counter-argument generation and opinion article generation. Both automatic and human evaluations show that our method significantly outperforms strong baselines and generates more coherent text with richer content.
- Context-Aware Interaction Network for Question Matching
Zhe Hu, Zuohui Fu, Yu Yin, Gerard de Melo
In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), short paper, 2021.
PDF
Poster
Slides
Video
Abstract
Impressive milestones have been achieved in text matching by adopting a cross-attention mechanism to capture pertinent semantic connections between two sentences. However, these cross-attention mechanisms focus on word-level links between the two inputs, neglecting the importance of contextual information. We propose a context-aware interaction network (COIN) to properly align two sequences and infer their semantic relationship. Specifically, each interaction block includes (1) a context-aware cross-attention mechanism to effectively integrate contextual information, and (2) a gate fusion layer to flexibly interpolate aligned representations. We apply multiple stacked interaction blocks to produce alignments at different levels and gradually refine the attention results. Experiments on two question matching datasets and detailed analyses confirm the effectiveness of our model.
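As a rough illustration of the gate fusion step described in the abstract, a gated interpolation between the original and cross-attention-aligned representations can be sketched as follows (function and parameter names are my own, not the paper's implementation):

```python
import numpy as np

def gate_fusion(original, aligned, W_g, b_g):
    """Gated fusion sketch: interpolate between a token's original
    representation and its cross-attention-aligned counterpart.

    original, aligned: (seq_len, d) arrays
    W_g: (2*d, d) gate weights; b_g: (d,) gate bias
    """
    # Compute a gate from both views, squashed to (0, 1) with a sigmoid.
    pre_gate = np.concatenate([original, aligned], axis=-1) @ W_g + b_g
    gate = 1.0 / (1.0 + np.exp(-pre_gate))
    # Element-wise interpolation controlled by the gate.
    return gate * original + (1.0 - gate) * aligned
```

When the gate saturates near 1 the original features pass through unchanged; near 0, the aligned features dominate, so the model can flexibly weight contextual alignment per dimension.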
- Controllable Dialogue Generation with Disentangled Multi-grained Style Specification and Attribute Consistency Reward
Zhe Hu*, Zhiwei Cao*, Hou Pong Chan, Jiachen Liu, Xinyan Xiao, Jinsong Su, Hua Wu (*equal contribution)
arXiv preprint, 2021.
PDF
Abstract
Controllable text generation is an appealing but challenging task, which allows users to specify particular attributes of the generated responses. In this paper, we propose a controllable dialogue generation model to steer the response generation under multi-attribute constraints. Specifically, we first define and categorize the commonly-used control attributes into global and local ones, which possess different granularities of effects on response generation. Then, we significantly extend the conventional Seq2seq framework by introducing a novel two-stage decoder, which first uses a multi-grained style specification layer to impose the stylistic constraints and determine the word-level control states of responses based on the attributes, and then employs a response generation layer to generate final responses maintaining both semantic relevancy to the contexts and fidelity to the attributes. Furthermore, we train our model with an attribute consistency reward to promote response control with explicit supervision signals. Extensive experiments and in-depth analyses on two datasets indicate that our model can significantly outperform competitive baselines in terms of response quality, content diversity, and controllability.
- Enhanced Sentence Alignment Network for Efficient Short Text Matching
Zhe Hu, Zuohui Fu, Cheng Peng, Weiwei Wang
In Proceedings of the EMNLP 2020 Workshop on Noisy User-generated Text (W-NUT), short paper, 2020.
PDF
Slides
Abstract
Cross-sentence attention has been widely applied in text matching, in which the model learns the aligned information between two intermediate sequence representations to capture their semantic relationship. However, the intermediate representations are generated based on the preceding layers, and the models may suffer from error propagation and unstable matching, especially when multiple attention layers are used. In this paper, we propose an enhanced sentence alignment network with simple gated feature augmentation, where the model is able to flexibly integrate both original word and contextual features to improve the cross-sentence attention. Moreover, our model is less complex, with fewer parameters than many state-of-the-art structures. Experiments on three benchmark datasets validate our model's capacity for text matching.
- An Entity-Driven Framework for Abstractive Summarization
Eva Sharma*, Luyang Huang*, Zhe Hu*, Lu Wang (*equal contribution)
In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019.
Errata: Equation (9) in section 2.3 should be: \(L(S_A,S_B^+, S_B^-) = \max \{0, 1 + Coh(S_A, S_B^-) - Coh(S_A, S_B^+)\} \)
PDF
Poster
Code
Demo
Abstract
Abstractive summarization systems aim to produce more coherent and concise summaries than their extractive counterparts. Popular neural models have achieved impressive results for single-document summarization, yet their outputs are often incoherent and unfaithful to the input. In this paper, we introduce SENECA, a novel System for ENtity-drivEn Coherent Abstractive summarization that leverages entity information to generate informative and coherent abstracts. Our framework takes a two-step approach: (1) an entity-aware content selection module first identifies salient sentences from the input, then (2) an abstract generation module conducts cross-sentence information compression and abstraction to generate the final summary, which is trained with rewards to promote coherence, conciseness, and clarity. The two components are further connected using reinforcement learning. Automatic evaluation shows that our model significantly outperforms previous state-of-the-art on ROUGE and our proposed coherence measures on the New York Times and CNN/Daily Mail datasets. Human judges further rate our system summaries as more informative and coherent than those by popular summarization models.
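The corrected margin loss from the errata above translates directly into code (a minimal sketch; `coh_pos` and `coh_neg` stand for \(Coh(S_A, S_B^+)\) and \(Coh(S_A, S_B^-)\), and the function name is illustrative):

```python
def coherence_hinge_loss(coh_pos, coh_neg, margin=1.0):
    """Errata Eq. (9): L = max(0, margin + Coh(S_A, S_B^-) - Coh(S_A, S_B^+)).

    The loss is zero once the coherence score of the true next sentence
    (coh_pos) exceeds the negative sample's score (coh_neg) by at least
    `margin`; otherwise it penalizes the shortfall linearly.
    """
    return max(0.0, margin + coh_neg - coh_pos)
```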
- Argument Generation with Retrieval, Planning, and Realization
Xinyu Hua, Zhe Hu, Lu Wang
In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2019.
PDF
Code
Demo
Abstract
Automatic argument generation is an appealing but challenging task. In this paper, we study the specific problem of counter-argument generation and present a novel framework, CANDELA. It consists of a powerful retrieval system and a novel two-step generation model, where a text planning decoder first decides on the main talking points and a proper language style for each sentence, then a content realization decoder reflects the decisions and constructs an informative paragraph-level argument. Furthermore, our generation model is empowered by a retrieval system indexed with 12 million articles collected from Wikipedia and popular English news media, which provides access to high-quality content with diversity. Automatic evaluation on a large-scale dataset collected from Reddit shows that our model yields significantly higher BLEU, ROUGE, and METEOR scores than the state-of-the-art and non-trivial comparisons. Human evaluation further indicates that our system's arguments are more appropriate for refutation and richer in content.
Education
Services
- Program Committee Member:
- 2021: ACL, EMNLP, NAACL, EACL, AAAI
- 2019: EMNLP Workshop on New Frontiers in Summarization (NewSum)