Publications

The full publication list is given below. Please see my Google Scholar profile for bibliometrics.

To Appear

D. Setiawan, R. Merx, and J.H. Lau (to appear). Context Volume Drives Performance: Tackling Domain Shift in Extremely Low-Resource Translation via RAG. In Proceedings of the Ninth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2026).
K. Kurniawan, M. Mistica, T. Baldwin, and J.H. Lau (to appear). On the Interplay between Human Label Variation and Model Fairness. In Findings of the Association for Computational Linguistics: EACL 2026. [code]
Y. Otmakhova, T.H. Truong, R. Mahendra, Z. Zhai, R. Zhu, D. Beck, and J.H. Lau (to appear). FLUKE: A Linguistically-Driven and Task-Agnostic Framework for Robustness Evaluation. In Findings of the Association for Computational Linguistics: EACL 2026. [code]
R. Xing, P. Nakov, T. Baldwin, and J.H. Lau (to appear). COMMUNITYNOTES: A Dataset for Exploring the Helpfulness of Fact-Checking Explanations. In Findings of the Association for Computational Linguistics: EACL 2026. [code]

2025

K. Kurniawan, M. Mistica, T. Baldwin, and J.H. Lau (2025). Training and Evaluating with Human Label Variation: An Empirical Study. In Computational Linguistics, pages 1—27. [code]
C. Hicks, K. Davidson, J.H. Lau, and T. Nguyen (2025). Implications of Declaration of Climate Emergency on Australian Local Government Policy in the State of Victoria: Policy Analysis Utilising an LLM-based Retriever-Reader Pipeline. In Climatic Change, Vol 178, pages 191—214. [code]
S. Gao, J.H. Lau, and J. Qi (2025). Beyond Seen Data: Improving KBQA Generalization Through Schema-Guided Logical Form Generation. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP 2025), Suzhou, China, pages 8764—8783. [code]
X. Ma, Y. Jiang, S. Erfani, J. Bailey, W. Liu, K. Ehinger, and J.H. Lau (2025). Reasoning Like Experts: Leveraging Multimodal Large Language Models for Drawing-based Psychoanalysis. In Proceedings of the 33rd ACM International Conference on Multimedia (MM 2025), Dublin, Ireland, pages 5090—5099. [code]
R. Gao, X. Wu, T. Kuribayashi, M. Ye, S. Qi, C. Roever, Y. Liu, Z. Yuan, and J.H. Lau (2025). Can LLMs Simulate L2-English Dialogue? An Information-Theoretic Analysis of L1-Dependent Biases. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), Vienna, Austria, pages 4355—4379. [code]
A. Shetty, Q. Xu, and J.H. Lau (2025). WET: Overcoming Paraphrasing Vulnerabilities in Embeddings-as-a-Service with Linear Transformation Watermarks. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), Vienna, Austria, pages 23024—23043. [code]
Y. Jiang, Y. Ding, C. Lei, J. Ao, J.H. Lau, and K. Ehinger (2025). Beyond Perception: Evaluating Abstract Visual Reasoning through Multi-Stage Task. In Findings of the Association for Computational Linguistics: ACL 2025, Vienna, Austria, pages 13—45. [code]
R. Gao, M. Chen, L. Frermann, and J.H. Lau (2025). Moderation Matters: Measuring Conversational Moderation Impact in English as a Second Language Group Discussion. In Findings of the Association for Computational Linguistics: ACL 2025, Vienna, Austria, pages 2070—2095. [code]
M. Li, J.H. Lau, E. Hovy, and M. Lapata (2025). Decomposed Opinion Summarization with Verified Aspect-Aware Modules. In Findings of the Association for Computational Linguistics: ACL 2025, Vienna, Austria, pages 24805—24841. [code]
R. Xing, B. Sun, K. Zhang, P. Nakov, T. Baldwin, and J.H. Lau (2025). An Analytical Emotion Framework of Rumour Threads on Social Media. In Proceedings of the 1st Workshop on Misinformation Detection in the Era of LLMs (MisD 2025), Copenhagen, Denmark.
P. Bowers, T. Ryan, K. Graydon, J.H. Lau, and D. Tomlin (2025). Healthcare Educators' Perspectives on Artificial Intelligence Driven Virtual Patients for Teaching Communication Skills. In Interactive Technology and Smart Education, pages 97—112.
M. Chen, L. Frermann, and J.H. Lau (2025). WHoW: A Cross-domain Approach for Analysing Conversation Moderation. In Proceedings of the 2025 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2025), Albuquerque, New Mexico, pages 2091—2126. [code]
R. Gao, J. Wu, C. Roever, X. Wu, L. Lv, and J.H. Lau (2025). An Interpretable and Crosslingual Method for Evaluating Second-Language Dialogues. In Proceedings of the 2025 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2025), Albuquerque, New Mexico, pages 1979—2008. [code]
R. Xing, T. Baldwin, and J.H. Lau (2025). Evaluating Evidence Attribution in Generated Fact Checking Explanations. In Proceedings of the 2025 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2025), Albuquerque, New Mexico, pages 5475—5496. [code]
T. Aoki, J.H. Lau, H. Kamigaito, H. Takamura, T. Baldwin, and M. Okumura (2025). Discovering Unusual Word Usages with Masked Language Model via Pseudo-label Training. In Journal of Natural Language Processing, Vol 32, pages 134-175.
R. Zhu, J.H. Lau, and J. Qi (2025). Factual Dialogue Summarization via Learning from Large Language Models. In Proceedings of the 31st International Conference on Computational Linguistics (COLING 2025), Abu Dhabi, United Arab Emirates, pages 4474—4492. [code]
R. Gao, C. Roever, and J.H. Lau (2025). Interaction Matters: An Evaluation Framework for Interactive Dialogue Assessment on English Second Language Conversations. In Proceedings of the 31st International Conference on Computational Linguistics (COLING 2025), Abu Dhabi, United Arab Emirates, pages 10977—11012. [code]

2024

T. Simonds, K. Kurniawan, and J.H. Lau (2024). MoDEM: Mixture of Domain Expert Models. In Proceedings of the 22nd Annual Workshop of the Australasian Language Technology Association (ALTA 2024), Canberra, Australia, pages 75—88.
K. Kurniawan, M. Mistica, T. Baldwin, and J.H. Lau (2024). To Aggregate or Not to Aggregate. That is the Question: A Case Study on Annotation Subjectivity in Span Prediction. In Proceedings of the 14th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis (WASSA 2024), Bangkok, Thailand, pages 362—368. [code]
Y. Jiang, K. Ehinger, and J.H. Lau (2024). KALE: An Artwork Image Captioning System Augmented with Heterogeneous Graph. In Proceedings of the 33rd International Joint Conference on Artificial Intelligence (IJCAI 2024), Jeju Island, South Korea, pages 7663—7671. [code]
P. Bowers, K. Graydon, T. Ryan, J.H. Lau, and D. Tomlin (2024). Artificial Intelligence Driven Virtual Patients For Communication Skill Development In Healthcare Students: A Scoping Review. In Australasian Journal of Educational Technology, Vol 40, pages 39—57.
M. Li, J.H. Lau, and E. Hovy (2024). A Sentiment Consolidation Framework for Meta-Review Generation. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024), Bangkok, Thailand, pages 10158—10177. [code]
L. Tian, X. Zhang, and J.H. Lau (2024). CMA-R: Causal Mediation Analysis for Explaining Rumour Detection. In Findings of the Association for Computational Linguistics: EACL 2024, St. Julian's, Malta, pages 1667—1675. [code]

2023

M. Chen, J.H. Lau, and L. Frermann (2023). The Uncivil Empathy: Investigating the Relation Between Empathy and Toxicity in Online Mental Health Support Forums. In Proceedings of the 21st Annual Workshop of the Australasian Language Technology Association (ALTA 2023), Melbourne, Australia, pages 136—147.
C. Shu, Z. Zhang, Y. Chen, J. Xiao, J.H. Lau, Q. Zhang, and Z. Lu (2023). Open Domain Response Generation Guided by Retrieved Conversations. In IEEE Access, Vol 11, pages 99365—99375.
Z. Xie, M. Li, T. Cohn, and J.H. Lau (2023). DeltaScore: Story Evaluation with Perturbations. In Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore. [code]
T. Wada, T. Baldwin, and J.H. Lau (2023). Unsupervised Lexical Simplification with Context Augmentation. In Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore. [code]
M. Li, E. Hovy, and J.H. Lau (2023). Summarizing Multiple Documents with Conversational Structure for Meta-Review Generation. In Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore. [code]
R. Zhu, J. Qi, and J.H. Lau (2023). Annotating and Detecting Fine-grained Factual Inconsistencies for Dialogue Summarization. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023), Toronto, Canada, pages 6825—6845. [code]
T. Wada, Y. Matsumoto, T. Baldwin, and J.H. Lau (2023). Unsupervised Paraphrasing of Multiword Expressions. In Findings of the Association for Computational Linguistics: ACL 2023, Toronto, Canada, pages 4732—4746. [code]
M. Li, J. Qi, and J.H. Lau (2023). Compressed Heterogeneous Graph for Abstractive Multi-document Summarization. In Proceedings of the 37th AAAI Conference on Artificial Intelligence (AAAI 2023), Washington DC, USA, pages 13085—13093. [code]
Z. Xie, T. Cohn, and J.H. Lau (2023). The Next Chapter: A Study of Large Language Models in Storytelling. In Proceedings of the 16th International Natural Language Generation Conference (INLG 2023). [code]
L. Tian, X. Zhang, and J.H. Lau (2023). MetaTroll: Few-shot Detection of State-Sponsored Trolls with Transformer Adapters. In Proceedings of the Web Conference 2023 (WWW 2023), Austin, USA, pages 1743—1753. [code]
G. Winata, A. Aji, S. Cahyawijaya, R. Mahendra, F. Koto, A. Romadhony, K. Kurniawan, D. Moeljadi, R. Prasojo, P. Fung, T. Baldwin, J.H. Lau, R. Sennrich, and S. Ruder (2023). NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023), Dubrovnik, Croatia, pages 815—834. [code]
Z. Zhang, C. Shu, Y. Xiao, Y. Shen, D. Zhu, Y. Chen, J. Xiao, J.H. Lau, Q. Zhang, and Z. Lu (2023). Improving Visual-Semantic Embedding with Adaptive Pooling and Optimization Objective. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023), Dubrovnik, Croatia, pages 1209—1221. [code]
S. Šuster, T. Baldwin, J.H. Lau, A. Jimeno-Yepes, D. Martinez, Y. Otmakhova, and K. Verspoor (2023). Automating Quality Assessment of Medical Evidence in Systematic Reviews: Model Development and Validation Study. In Journal of Medical Internet Research, Vol 25, pages 1543—1562.

2022

S. Yang, X. Huang, R. Zhang, J.H. Lau, and S. Erfani (2022). Robust Task-Oriented Dialogue Generation with Contrastive Pre-training and Adversarial Filtering. In Findings of the Association for Computational Linguistics: EMNLP 2022, Abu Dhabi, United Arab Emirates, pages 1220—1234. [code]
Y. Otmakhova, K. Verspoor, T. Baldwin, A. Jimeno-Yepes, and J.H. Lau (2022). M3: Multi-level dataset for Multi-document summarisation of Medical studies. In Findings of the Association for Computational Linguistics: EMNLP 2022, Abu Dhabi, United Arab Emirates, pages 1450—1461. [code]
R. Xing, S. Bhatia, T. Baldwin, and J.H. Lau (2022). Automatic Explanation Generation For Climate Science Claims. In Proceedings of the 20th Annual Workshop of the Australasian Language Technology Association (ALTA 2022), Adelaide, Australia, pages 122—129. [code]
T.H. Truong, Y. Otmakhova, T. Baldwin, T. Cohn, J.H. Lau, and K. Verspoor (2022). Not another Negation Benchmark: The NaN-NLI Test Suite for Sub-clausal Negation. In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (AACL-IJCNLP 2022), pages 883—894. [code]
F. Koto, T. Baldwin, and J.H. Lau (2022). LipKey: A Large-Scale News Dataset for Absent Keyphrases Generation and Abstractive Summarization. In Proceedings of the 29th International Conference on Computational Linguistics (COLING 2022), Gyeongju, Republic of Korea, pages 3427—3437. [code]
T. Wada, T. Baldwin, Y. Matsumoto, and J.H. Lau (2022). Unsupervised Lexical Substitution with Decontextualised Embeddings. In Proceedings of the 29th International Conference on Computational Linguistics (COLING 2022), Gyeongju, Republic of Korea, pages 4172—4185. [code]
Y. Otmakhova, T.H. Truong, T. Baldwin, T. Cohn, K. Verspoor, and J.H. Lau (2022). LED Down the Rabbit Hole: Exploring the Potential of Global Attention for Biomedical Multi-document Summarisation. In Proceedings of the Third Workshop on Scholarly Document Processing, Gyeongju, Republic of Korea, pages 181—187. [code]
A. Shen, F. Koto, J.H. Lau, and T. Baldwin (2022). Easy-First Bottom-Up Discourse Parsing via Sequence Labelling. In Proceedings of the 3rd Workshop on Computational Approaches to Discourse (CODI 2022), Gyeongju, Republic of Korea, pages 35—41. [code]
L. Tian, X. Zhang, and J.H. Lau (2022). DUCK: Rumour Detection on Social Media by Modelling User and Comment Propagation Networks. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2022), pages 4939—4949. [code]
Y. Otmakhova, K. Verspoor, and J.H. Lau (2022). Cross-linguistic Comparison of Linguistic Feature Encoding in BERT Models for Typologically Different Languages. In Proceedings of the 4th Workshop on Research in Computational Typology and Multilingual NLP (SIGTYP 2022), pages 27—35.
F. Koto, J.H. Lau, and T. Baldwin (2022). Can Pretrained Language Models Generate Persuasive, Faithful, and Informative Ads Text for Product Descriptions? In Proceedings of the Fifth Workshop on e-Commerce and NLP (ECNLP 2022), pages 234—243.
F. Koto, T. Baldwin, and J.H. Lau (2022). Cloze Evaluation for Deeper Understanding of Commonsense Stories in Indonesian. In Proceedings of the Commonsense Representation and Reasoning Workshop 2022 (CSRR 2022), pages 8—16. [code]
A. Aji, G. Winata, F. Koto, S. Cahyawijaya, A. Romadhony, R. Mahendra, K. Kurniawan, D. Moeljadi, R. Prasojo, T. Baldwin, J.H. Lau, and S. Ruder (2022). One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022), pages 7226—7249.
Y. Otmakhova, K. Verspoor, T. Baldwin, and J.H. Lau (2022). The Patient Is More Dead Than Alive: Exploring The Current State of The Multi-document Summarisation of The Biomedical Literature. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022), pages 5098—5111.
S. Yang, R. Zhang, S. Erfani, and J.H. Lau (2022). An Interpretable Neuro-Symbolic Reasoning Framework for Task-Oriented Dialogue Generation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022), pages 4918—4935. [code]
F. Koto, T. Baldwin, and J.H. Lau (2022). FFCI: A Framework for Interpretable Automatic Evaluation of Summarization. In Journal of Artificial Intelligence Research, Vol 73, pages 1553—1607. [code]

2021

Z. Xie, J.H. Lau, and T. Cohn (2021). Exploring Story Generation with Multi-task Objectives in Variational Autoencoders. In Proceedings of the 19th Annual Workshop of the Australasian Language Technology Association (ALTA 2021), pages 97—106.
R. Zhu, J.H. Lau, and J. Qi (2021). Findings on Conversation Disentanglement. In Proceedings of the 19th Annual Workshop of the Australasian Language Technology Association (ALTA 2021), pages 1—11.
F. Koto, J.H. Lau, and T. Baldwin (2021). IndoBERTweet: A Pretrained Language Model for Indonesian Twitter with Effective Domain-Specific Vocabulary Initialization. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021), pages 10660—10668. [code]
M. Mistica, J.H. Lau, B. Merrifield, K. Fazio, and T. Baldwin (2021). Semi-automatic Triage of Requests for Free Legal Assistance. In Proceedings of the Natural Legal Language Processing Workshop 2021, pages 217—227.
T. Wada, T. Iwata, Y. Matsumoto, T. Baldwin, and J.H. Lau (2021). Learning Contextualised Cross-lingual Word Embeddings and Alignments for Extremely Low-Resource Languages Using Parallel Corpora. In Proceedings of the 1st Workshop on Multilingual Representation Learning (MRL 2021), pages 16—31. [code]
L. Tian, X. Zhang, and J.H. Lau (2021). Rumour Detection via Zero-shot Cross-lingual Transfer Learning. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2021), pages 603—618. [code]
S. Šuster, K. Verspoor, T. Baldwin, J.H. Lau, A. Jimeno-Yepes, D. Martinez, and Y. Otmakhova (2021). Impact of Detecting Clinical Trial Elements in Exploration of COVID-19 Literature. In Proceedings of the Fourth International Workshop on Health Natural Language Processing (HealthNLP 2021).
F. Koto, J.H. Lau, and T. Baldwin (2021). Evaluating the Efficacy of Summarization Evaluation across Languages. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. [code]
S. Yang, R. Zhang, S. Erfani, and J.H. Lau (2021). A Unified Framework to Incorporate Multimodal Knowledge Bases into End-to-End Task-Oriented Dialogue Systems. In Proceedings of the 30th International Joint Conference on Artificial Intelligence (IJCAI 2021), pages 3978—3984.
Y. Xu, X. Zhong, A. Jimeno-Yepes, and J.H. Lau (2021). Grey-box Adversarial Attack And Defence For Sentiment Classification. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2021), pages 4078—4087. [code]
F. Koto, J.H. Lau, and T. Baldwin (2021). Discourse Probing of Pretrained Language Models. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2021), pages 3849—3864. [code]
S. Bhatia, J.H. Lau, and T. Baldwin (2021). Automatic Classification of Neutralization Techniques in the Narrative of Climate Change Scepticism. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2021), pages 2167—2175. [code]
F. Koto, J.H. Lau, and T. Baldwin (2021). Top-down Discourse Parsing via Sequence Labelling. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume (EACL 2021), pages 715—726. [code]
K. Verspoor, S. Šuster, Y. Otmakhova, S. Mendis, Z. Zhai, B. Fang, J.H. Lau, T. Baldwin, A. Jimeno-Yepes, and D. Martinez (2021). Brief description of COVID-SEE: The Scientific Evidence Explorer for COVID-19 Related Research. In Proceedings of the 43rd European Conference on Information Retrieval (ECIR 2021), Tuscany, Italy. [code]

2020

S. Zhang, X. Zhang, J.H. Lau, J. Chan, and C. Paris (2020). Less is More: Rejecting Unreliable Reviews for Product Question Answering. In Proceedings of the 2020 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2020), Ghent, Belgium, pages 567—583. [code]
F. Koto, A. Rahimi, J.H. Lau, and T. Baldwin (2020). IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP. In Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020), Barcelona, Spain (Online), pages 757—770. [code]
F. Koto, J.H. Lau, and T. Baldwin (2020). Liputan6: A Large-scale Indonesian Dataset for Text Summarisation. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing (AACL-IJCNLP 2020), Suzhou, China, pages 598—608. [code]
Y. Xu, X. Zhong, A. Jimeno-Yepes, and J.H. Lau (2020). Forget Me Not: Reducing Catastrophic Forgetting for Domain Adaptation in Reading Comprehension. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN 2020), Glasgow, UK, pages 1—8.
L. Tian, X. Zhang, and J.H. Lau (2020). #DemocratsAreDestroyingAmerica: Rumour Analysis on Twitter During COVID-19. In Proceedings of the 5th International Workshop on Mining Actionable Insights from Social Networks: Special Edition on Dis/Misinformation Mining from Social Media (MAISoN 2020).
K. Leins, J.H. Lau, and T. Baldwin (2020). Give Me Convenience and Give Her Death: Who Should Decide What Uses of NLP are Appropriate, and on What Basis? In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), Seattle, USA, pages 2908—2913.
J.H. Lau, C. Armendariz, S. Lappin, M. Purver, and C. Shu (2020). How Furiously Can Colorless Green Ideas Sleep? Sentence Acceptability in Context. In Transactions of the Association for Computational Linguistics, Vol 8, pages 296—310. [code]

2019

S. Zhang, J.H. Lau, X. Zhang, J. Chan, and C. Paris (2019). Discovering Relevant Reviews for Answering Product-related Queries. In Proceedings of the 19th IEEE International Conference on Data Mining (ICDM 2019), Beijing, China, pages 1468—1473. [code]
Z. Xie, J.H. Lau, and T. Cohn (2019). From Shakespeare to Li-Bai: Adapting a Sonnet Model to Chinese Poetry. In Proceedings of the 17th Annual Workshop of the Australasian Language Technology Association (ALTA 2019), Sydney, Australia, pages 10—18.
F. Koto, J.H. Lau, and T. Baldwin (2019). Improved Document Modelling with a Neural Discourse Parser. In Proceedings of the 17th Annual Workshop of the Australasian Language Technology Association (ALTA 2019), Sydney, Australia, pages 67—76. [code]
K. Zhou, C. Shu, B. Li, and J.H. Lau (2019). Early Rumour Detection. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2019), Minneapolis, Minnesota, pages 1614—1623. [code]

2018

K.N. Tran, J.H. Lau, D. Contractor, U. Gupta, B. Sengupta, C. Butler, and M. Mohania (2018). Document Chunking and Learning Objective Generation for Instruction Design. In Proceedings of the 11th International Conference on Educational Data Mining, New York, USA.
J.H. Lau, T. Cohn, T. Baldwin, J. Brooke, and A. Hammond (2018). Deep-speare: A Joint Neural Model of Poetic Language, Meter and Rhyme. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018), Melbourne, Australia, pages 1948—1958. [code]
J.P. Bernardy, S. Lappin, and J.H. Lau (2018). The Influence of Context on Sentence Acceptability Judgements. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018), Melbourne, Australia, pages 456—461.
W. Zhang, Q.Z. Sheng, J.H. Lau, E. Abebe, and W. Ruan (2018). Duplicate Detection in Programming Question Answering Communities. In ACM Transactions on Internet Technology, Vol 18(3), pages 1—21.
S. Bhatia, J.H. Lau, and T. Baldwin (2018). Topic Intrusion for Automatic Topic Model Evaluation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018), Brussels, Belgium, pages 844—849. [code]
S. Xu, A. Bennett, D. Hoogeveen, J.H. Lau, and T. Baldwin (2018). Preferred Answer Selection in Stack Overflow: Better Text Representations ... and Metadata, Metadata, Metadata. In Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text, Brussels, Belgium, pages 137—147.

2017

J.H. Lau, L. Chi, K. Tran, and T. Cohn (2017). End-to-end Network for Twitter Geolocation Prediction and Hashing. In Proceedings of the 8th International Joint Conference on Natural Language Processing (IJCNLP 2017), Taipei, Taiwan, pages 744—753.
J.H. Lau, A. Clark, and S. Lappin (2017). Grammaticality, Acceptability, and Probability: A Probabilistic View of Linguistic Knowledge. In Cognitive Science, Vol 41, pages 1202—1241.
S. Bhatia, J.H. Lau, and T. Baldwin (2017). An Automatic Approach for Document-level Topic Model Evaluation. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL-2017), Vancouver, Canada, pages 206—215.
J.H. Lau, T. Baldwin, and T. Cohn (2017). Topically Driven Neural Language Model. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017), Vancouver, Canada, pages 355—365. [code]
W. Zhang, Q.Z. Sheng, J.H. Lau, and E. Abebe (2017). Detecting Duplicate Posts in Programming QA Communities via Latent Semantics and Association Rules. In Proceedings of the 26th International Conference on World Wide Web (WWW 2017), Perth, Australia, pages 1221—1229.
Y. Xu, J.H. Lau, T. Baldwin, and T. Cohn (2017). Decoupling Encoder and Decoder Networks for Abstractive Document Summarization. In Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres, Valencia, Spain, pages 7—11.
I. Sorodoc, J.H. Lau, N. Aletras, and T. Baldwin (2017). Multimodal Topic Labelling. In Proceedings of the 15th Conference of the EACL (EACL 2017), Valencia, Spain, pages 701—706.
N. Aletras, T. Baldwin, J.H. Lau, and M. Stevenson (2017). Evaluating Topic Representations for Exploring Document Collections. In Journal of the Association for Information Science and Technology, Vol 68, pages 154—167.

2016

S. Bhatia, J.H. Lau, and T. Baldwin (2016). Automatic Labelling of Topics with Neural Embeddings. In Proceedings of the 26th International Conference on Computational Linguistics (COLING 2016), Osaka, Japan, pages 953—963. [code]
J.H. Lau and T. Baldwin (2016). An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation. In Proceedings of the 1st Workshop on Representation Learning for NLP, Berlin, Germany, pages 78—86. [code]
A. Bennett, T. Baldwin, J.H. Lau, D. McCarthy, and F. Bond (2016). LexSemTm: A Semantic Dataset Based on All-words Unsupervised Sense Distribution Learning. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016), Berlin, Germany, pages 1513—1524. [code]
J.H. Lau and T. Baldwin (2016). The Sensitivity of Topic Coherence Evaluation to Topic Cardinality. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2016), San Diego, USA, pages 483—487. [code]

2015

J.H. Lau, A. Clark, and S. Lappin (2015). Unsupervised Prediction of Acceptability Judgements. In Proceedings of the Joint Conference of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (ACL-IJCNLP 2015), Beijing, China, pages 1618—1628. [code]

2014

M. Lui, J.H. Lau, and T. Baldwin (2014). Automatic Detection and Language Identification of Multilingual Documents. In Transactions of the Association for Computational Linguistics, Vol 2(Feb), pages 27—40. [code]
N. Aletras, T. Baldwin, J.H. Lau, and M. Stevenson (2014). Representing Topics Labels for Exploring Digital Libraries. In Proceedings of Digital Libraries 2014, London, UK.
P. Cook, M. Rundell, J.H. Lau, and T. Baldwin (2014). Applying a Word-sense Induction System to the Automatic Extraction of Diverse Dictionary Examples. In Proceedings of the XVI EURALEX International Congress (EURALEX 2014), Bolzano, Italy, pages 15—19.
P. Cook, J.H. Lau, D. McCarthy, and T. Baldwin (2014). Novel Word-sense Identification. In Proceedings of the 25th International Conference on Computational Linguistics (COLING 2014), Dublin, Ireland, pages 1624—1635.
J.H. Lau, A. Clark, and S. Lappin (2014). Measuring Gradience in Speakers' Grammaticality Judgements. In Proceedings of the 36th Annual Meeting of the Cognitive Science Society (CogSci 2014), Quebec City, Canada, pages 821—826.
J.H. Lau, P. Cook, D. McCarthy, S. Gella, and T. Baldwin (2014). Learning Word Sense Distributions, Detecting Unattested Senses and Identifying Novel Senses Using Topic Models. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014), Baltimore, Maryland, pages 259—270.
J.H. Lau, D. Newman, and T. Baldwin (2014). Machine Reading Tea Leaves: Automatically Evaluating Topic Coherence and Topic Model Quality. In Proceedings of the 14th Conference of the EACL (EACL 2014), Gothenburg, Sweden, pages 530—539. [code]

2013

J.H. Lau, T. Baldwin, and D. Newman (2013). On Collocations and Topic Models. In ACM Transactions on Speech and Language Processing, Vol 10, pages 1—14.
J.H. Lau, P. Cook, and T. Baldwin (2013). unimelb: Topic Modelling-based Word Sense Induction. In Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval 2013), Atlanta, USA.
M. Mistica, J.H. Lau, and T. Baldwin (2013). Unsupervised Word Class Induction for Under-resourced Languages: A Case Study on Indonesian. In Proceedings of the 6th International Joint Conference on Natural Language Processing (IJCNLP 2013), Nagoya, Japan, pages 685—691.
J.H. Lau, P. Cook, and T. Baldwin (2013). unimelb: Topic Modelling-based Word Sense Induction for Web Snippet Clustering. In Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval 2013), Atlanta, USA.
P. Cook, J.H. Lau, and T. Baldwin (2013). A Lexicographic Appraisal of an Automatic Approach for Detecting New Word Senses. In Proceedings of eLex 2013, Tallinn, Estonia, pages 49—65.

2012

J.H. Lau, P. Cook, D. McCarthy, D. Newman, and T. Baldwin (2012). Word Sense Induction for Novel Sense Detection. In Proceedings of the 13th Conference of the EACL (EACL 2012), Avignon, France, pages 591—601.
J.H. Lau, N. Collier, and T. Baldwin (2012). On-line Trend Analysis with Topic Models: #twitter trends detection topic model online. In Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), Mumbai, India, pages 1519—1534.
D. Newman, N. Koilada, J.H. Lau, and T. Baldwin (2012). Bayesian Text Segmentation for Index Term Identification and Keyphrase Extraction. In Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), Mumbai, India, pages 2077—2092.

2011

J.H. Lau, K. Grieser, D. Newman, and T. Baldwin (2011). Automatic Labelling of Topics. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL HLT 2011), Portland, USA, pages 1536—1545.

2010

J.H. Lau, D. Newman, S. Karimi, and T. Baldwin (2010). Best Topic Word Selection for Topic Labelling. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010), Posters Volume, Beijing, China, pages 605—613.
D. Newman, J.H. Lau, K. Grieser, and T. Baldwin (2010). Automatic Evaluation of Topic Coherence. In Proceedings of Human Language Technologies: The 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL HLT 2010), Los Angeles, USA, pages 100—108.

Jey Han Lau
