OAG-BERT: Towards a Unified Backbone Language Model for Academic Knowledge Services

Type: Article

Publication Date: 2022-08-12

Citations: 17

DOI: https://doi.org/10.1145/3534678.3539210

Abstract

Academic knowledge services have substantially facilitated the development of the science enterprise by providing a plenitude of efficient research tools. However, many applications depend heavily on ad-hoc models and expensive human labeling to understand scientific content, hindering deployment in real products. To build a unified backbone language model for different knowledge-intensive academic applications, we pre-train an academic language model, OAG-BERT, that integrates both the heterogeneous entity knowledge and the scientific corpora in the Open Academic Graph (OAG) -- the largest public academic graph to date. In OAG-BERT, we develop strategies for pre-training on text and entity data along with zero-shot inference techniques. Its zero-shot capability reduces the need for expensive annotations. OAG-BERT has been deployed in real-world applications, such as the reviewer recommendation function for the National Natural Science Foundation of China (NSFC) -- one of the largest funding agencies in China -- and paper tagging in AMiner. All code and pre-trained models are available via the CogDL toolkit.

Locations

  • arXiv (Cornell University)
  • Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

Similar Works

  • OAG-BERT: Towards A Unified Backbone Language Model For Academic Knowledge Services (2021) - Xiao Liu, Da Yin, Xingjian Zhang, Kai Su, Kan Wu, Hongxia Yang, Jie Tang
  • PubGraph: A Large-Scale Scientific Knowledge Graph (2023) - Kian Ahrabian, Xinwei Du, Richard Delwin Myloth, Arun Baalaaji Sankar Ananthan, Jay Pujara
  • iText2KG: Incremental Knowledge Graphs Construction Using Large Language Models (2024) - Yassir Lairgi, Ludovic Moncla, Rémy Cazabet, Khalid Benabdeslem, Pierre Cléau
  • Find the Funding: Entity Linking with Incomplete Funding Knowledge Bases (2022) - Gizem Aydın, Seyed Amin Tabatabaei, Giorgios Tsatsaronis, Faegheh Hasibi
  • HetGCoT-Rec: Heterogeneous Graph-Enhanced Chain-of-Thought LLM Reasoning for Journal Recommendation (2025) - Runsong Jia, Meng‐Jia Wu, Ying Ding, Jie Lü, Yi Zhang
  • Triple Classification for Scholarly Knowledge Graph Completion (2021) - Mohamad Yaser Jaradeh, Kuldeep Singh, Markus Stocker, Sören Auer
  • Harvesting Textual and Structured Data from the HAL Publication Repository (2024) - Francis Kulumba, Wissam Antoun, Guillaume Vimont, Laurent Romary
  • Scientific Language Models for Biomedical Knowledge Base Completion: An Empirical Study (2021) - Rahul Nadkarni, David Wadden, Iz Beltagy, Noah A. Smith, Hannaneh Hajishirzi, Tom Hope
  • Using Large Language Models for Knowledge Engineering (LLMKE): A Case Study on Wikidata (2023) - Bohui Zhang, Ioannis Reklos, Nitisha Jain, Albert Meroño-Peñuela, Elena Simperl
  • Graph Learning in the Era of LLMs: A Survey from the Perspective of Data, Models, and Tasks (2024) - Xunkai Li, Zhengyu Wu, Jiayi Wu, Hanwen Cui, Jishuo Jia, Rong-Hua Li, Guoren Wang
  • Web-Scale Academic Name Disambiguation: the WhoIsWho Benchmark, Leaderboard, and Toolkit (2023) - Bo Chen, Jing Zhang, Fanjin Zhang, Tianyi Han, Yuqing Cheng, Xiaoyan Li, Yuxiao Dong, Jie Tang
  • Large Language Models on Graphs: A Comprehensive Survey (2023) - Bowen Jin, Gang Liu, Chi Han, Meng Jiang, Heng Ji, Jiawei Han
  • Web-Scale Academic Name Disambiguation: The WhoIsWho Benchmark, Leaderboard, and Toolkit (2023) - Bo Chen, Jing Zhang, Fanjin Zhang, Tianyi David Han, Yuqing Cheng, Xiaoyan Li, Yuxiao Dong, Jie Tang
  • The MAPLE Benchmark for Scientific Literature Tagging (2023) - Yu Zhang, Bowen Jin, Qi Zhu, Meng Yu, Jiawei Han
  • Understanding Survey Paper Taxonomy about Large Language Models via Graph Representation Learning (2024) - Jun Zhuang, Casey Kennington
  • Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs (2024) - Bowen Jin, Chulin Xie, Jiawei Zhang, Kashob Kumar Roy, Yu Zhang, Suhang Wang, Meng Yu, Jiawei Han
  • TAGLAS: An atlas of text-attributed graph datasets in the era of large graph and language models (2024) - Jiarui Feng, Hao Liu, Lecheng Kong, Yixin Chen, Muhan Zhang
  • Knowledge Graph Large Language Model (KG-LLM) for Link Prediction (2024) - Dong Shu, Tianle Chen, Mingyu Jin, Yiting Zhang, Mengnan Du, Yanwen Zhang
  • AceKG: A Large-scale Knowledge Graph for Academic Data Mining (2018) - Ruijie Wang, Yuchen Yan, Jialu Wang, Yuting Jia, Ye Zhang, Weinan Zhang, Xinbing Wang