Publications
^ denotes equal contribution
2024
- Seed-Free Synthetic Data Generation Framework for Instruction-Tuning LLMs: A Case Study in Thai (ACL-SRW’24) Parinthapat Pengpun, Can Udomcharoenchaikit, Weerayut Buaphet, Peerat Limkonchotiwat. Github: LINK
- Space Decomposition for Sentence Embedding (ACL’24 - Findings) Wuttikorn Ponwitayarat^, Peerat Limkonchotiwat^, Ekapol Chuangsuwanich, Sarana Nutanong. Github: LINK
- Identifying and Mitigating Annotation Bias in Natural Language Understanding using Causal Mediation Analysis (ACL’24 - Findings) Can Udomcharoenchaikit, Sitiporn Sae Lim, Peerat Limkonchotiwat, Ekapol Chuangsuwanich, Sarana Nutanong. Github: LINK
- SEA-VQA: Southeast Asian Cultural Context Dataset For Visual Question Answering (ALVR’24) Norawit Urailertprasert, Peerat Limkonchotiwat, Supasorn Suwajanakorn, Sarana Nutanong. Github: LINK
- McCrolin: Multi-consistency Cross-lingual Training for Retrieval Question Answering (EMNLP’24 - Findings) Peerat Limkonchotiwat^, Wuttikorn Ponwitayarat^, Potsawee Manakul, Can Udomcharoenchaikit, Ekapol Chuangsuwanich, Sarana Nutanong. Github: LINK
- Efficient Overshadowed Entity Disambiguation by Mitigating Shortcut Learning (EMNLP’24) Panuthep Tasawong, Peerat Limkonchotiwat, Potsawee Manakul, Can Udomcharoenchaikit, Ekapol Chuangsuwanich, Sarana Nutanong. Github: LINK
- An Empirical Study of Multilingual Reasoning Distillation for Question Answering (EMNLP’24) Patomporn Payoungkhamdee, Peerat Limkonchotiwat, Jinheon Baek, Potsawee Manakul, Can Udomcharoenchaikit, Ekapol Chuangsuwanich, Sarana Nutanong. Github: LINK
- On Creating an English-Thai Code-switched Machine Translation in Medical Domain (EMNLP’24 - Findings) Parinthapat Pengpun, Krittamate Tiankanon, Amrest Chinkamol, Jiramet Kinchagawat, Pitchaya Chairuengjitjaras, Pasit Supholkhan, Pubordee Aussavavirojekul, Chiraphat Boonnag, Kanyakorn Veerakanjana, Hirunkul Phimsiri, Boonthicha Sae-jia, Nattawach Sataudom, Piyalitt Ittichaiwong, Peerat Limkonchotiwat. Github: LINK
- SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages (EMNLP’24) Holy Lovenia^, Rahmad Mahendra^, Salsabil Maulana Akbar^, Lester James Validad Miranda^, Jennifer Santoso^, Elyanah Aco^, Akhdan Fadhilah, Jonibek Mansurov^, Joseph Marvin Imperial^, Onno P. Kampman^, Joel Ruben Antony Moniz^, Muhammad Ravi Shulthan Habibi^, Frederikus Hudi^, Jann Railey Montalan^, Ryan Ignatius Hadiwijaya, Joanito Agili Lopo, William Nixon, Börje F. Karlsson, James Jaya, Ryandito Diandaru, Yuze GAO, Patrick Amadeus Irawan, Bin Wang, Jan Christian Blaise Cruz, Chenxi Whitehouse, Ivan Halim Parmonangan, Maria Khelli, Wenyu Zhang, Lucky Susanto, Reynard Adha Ryanda, Sonny Lazuardi Hermawan, Dan John Velasco, Muhammad Dehan Al Kautsar, Willy Fitra Hendria, Yasmin Moslem, Noah Flynn, Muhammad Farid Adilazuarda, Haochen Li, Johanes Lee, R. Damanhuri, Shuo Sun, Muhammad Reza Qorib, Amirbek Djanibekov, Wei Qi Leong, Quyet V. Do, Niklas Muennighoff, Tanrada Pansuwan, Ilham Firdausi Putra, Yan Xu, Tai Ngee Chia, Ayu Purwarianti, Sebastian Ruder, William Chandra Tjhi, Peerat Limkonchotiwat^, Alham Fikri Aji^, Sedrick Keh^, Genta Indra Winata^, Ruochen Zhang^, Fajri Koto^, Zheng Xin Yong^, Samuel Cahyawijaya^. Github: LINK
- Can General-Purpose Large Language Models Generalize to English-Thai Machine Translation? (GenBench@EMNLP’24) Jirat Chiaranaipanich, Naiyarat Hanmatheekuna, Jitkapat Sawatphol, Krittamate Tiankanon, Jiramet Kinchagawat, Amrest Chinkamol, Parinthapat Pengpun, Piyalitt Ittichaiwong, Peerat Limkonchotiwat. Github: LINK
- CHIE: Generative MRC Evaluation for in-context QA with Correctness, Helpfulness, Irrelevancy, and Extraneousness Aspects (GenBench@EMNLP’24) Wannaphong Phatthiyaphaibun, Surapon Nonesung, Peerat Limkonchotiwat, Can Udomcharoenchaikit, Jitkapat Sawatphol, Ekapol Chuangsuwanich, Sarana Nutanong. Github: LINK
2023
- Typo-Robust Sentence Representation Learning for Dense Retrieval (ACL’23) Panuthep Tasawong, Wuttikorn Ponwitayarat, Peerat Limkonchotiwat, Can Udomcharoenchaikit, Ekapol Chuangsuwanich, Sarana Nutanong. Github: LINK
- An Efficient Self-Supervised Cross-View Training For Sentence Embedding (TACL’23) Peerat Limkonchotiwat, Wuttikorn Ponwitayarat, Lalita Lowphansirikul, Can Udomcharoenchaikit, Ekapol Chuangsuwanich, Sarana Nutanong. Github: LINK
- mReFinED: An Efficient End-to-End Multilingual Entity Linking System (EMNLP’23 - Findings) Peerat Limkonchotiwat, Weiwei Cheng, Christos Christodoulopoulos, Amir Saffari, Jens Lehmann.
2022
- Thai Nested Named Entity Recognition Corpus (ACL’22 - Findings) Weerayut Buaphet, Can Udomcharoenchaikit, Peerat Limkonchotiwat, Attapol Rutherford, Sarana Nutanong. Github: LINK
- CL-ReLKT: Cross-lingual Language Knowledge Transfer for Multilingual Retrieval Question Answering (NAACL’22 - Findings) Peerat Limkonchotiwat, Wuttikorn Ponwitayarat, Can Udomcharoenchaikit, Ekapol Chuangsuwanich, Sarana Nutanong. Github: LINK
- ConGen: Unsupervised Control and Generalization Distillation For Sentence Representation (EMNLP’22 - Findings) Peerat Limkonchotiwat, Wuttikorn Ponwitayarat, Lalita Lowphansirikul, Can Udomcharoenchaikit, Ekapol Chuangsuwanich, Sarana Nutanong. Github: LINK
2021
- Handling Cross- and Out-of-Domain Samples in Thai Word Segmentation (ACL’21 - Findings) Peerat Limkonchotiwat, Raheem Sawar, Wannaphong Phatthiyaphaibun, Ekapol Chuangsuwanich, Sarana Nutanong. Github: LINK
- Robust Fragment-Based Framework for Cross-lingual Sentence Retrieval (EMNLP’21 - Findings) Nattapol Trijakwanich, Peerat Limkonchotiwat, Raheem Sawar, Wannaphong Phatthiyaphaibun, Ekapol Chuangsuwanich, Sarana Nutanong. Github: LINK
2020
- Domain Adaptation of Thai Word Segmentation Models using Stacked Ensemble (EMNLP’20) Peerat Limkonchotiwat, Raheem Sawar, Wannaphong Phatthiyaphaibun, Ekapol Chuangsuwanich, Sarana Nutanong. Github: LINK