MiChao-HuaFen 1.0: A Specialized Pre-trained Corpus Dataset for Domain-specific Large Models

Type: Preprint

Publication Date: 2023-01-01

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2309.13079

Locations

  • arXiv (Cornell University)
  • DataCite API

Similar Works

Action Title Year Authors
+ TCM-GPT: Efficient pre-training of large language models for domain adaptation in Traditional Chinese Medicine 2024 Guoxing Yang
Xiaohong Liu
Jian‐Yu Shi
Zan Wang
Guangyu Wang
+ TCM-GPT: Efficient Pre-training of Large Language Models for Domain Adaptation in Traditional Chinese Medicine 2023 Guoxing Yang
Jian‐Yu Shi
Zan Wang
Xiaohong Liu
Guangyu Wang
+ GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator 2023 Jian Yang
Shuming Ma
Li Dong
Shaohan Huang
Haoyang Huang
Yuwei Yin
Dongdong Zhang
Liqun Yang
Furu Wei
Zhoujun Li
+ CFinBench: A Comprehensive Chinese Financial Benchmark for Large Language Models 2024 Ying Nie
Binwei Yan
Tianyu Guo
Hao Liu
Haoyu Wang
Wei He
Binfan Zheng
Weihao Wang
Qiang Li
Weijian Sun
+ Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model 2024 Xinrun Du
Zhouliang Yu
Songyang Gao
Ding Pan
Yuyang Cheng
Ziyang Ma
Ruibin Yuan
Xingwei Qu
Jiaheng Liu
Tianyu Zheng
+ ProphetNet-X: Large-Scale Pre-training Models for English, Chinese, Multi-lingual, Dialog, and Code Generation 2021 Weizhen Qi
Yeyun Gong
Yu Yan
Can Xu
Bolun Yao
Bartuer Zhou
Biao Cheng
Daxin Jiang
Jiusheng Chen
Ruofei Zhang
+ Extending the Pre-Training of BLOOM for Improved Support of Traditional Chinese: Models, Methods and Results 2023 Philipp Ennen
Po‐Chun Hsu
Chan-Jan Hsu
Changle Liu
Yen-Chen Wu
Yin-Hsiang Liao
Chin‐Teng Lin
Da-shan Shiu
Wei-Yun Ma
+ A Roadmap for Big Model 2022 Sha Yuan
Hanyu Zhao
Shuai Zhao
Jiahong Leng
Yangxiao Liang
Xiaozhi Wang
Jifan Yu
Xin Lv
Zhou Shao
Jiaao He
+ Lattice-BERT: Leveraging Multi-Granularity Representations in Chinese Pre-trained Language Models 2021 Yuxuan Lai
Yijia Liu
Yansong Feng
Songfang Huang
Dongyan Zhao
+ FFN: a Fine-grained Chinese-English Financial Domain Parallel Corpus 2024 Yuxin Fu
Shijing Si
Leyi Mai
Xiang Li
+ Construction of Domain-Specified Japanese Large Language Model for Finance Through Continual Pre-Training 2024 Masanori Hirano
Kentaro Imajo
+ GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model 2023 Shicheng Tan
Weng Lam Tam
Yuanchun Wang
Wenwen Gong
Shu Zhao
Peng Zhang
Jie Tang
+ Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization 2021 Liang Chen
Simiao Zuo
Minshuo Chen
Haoming Jiang
Xiaodong Liu
Pengcheng He
Tuo Zhao
Weizhu Chen
+ BianCang: A Traditional Chinese Medicine Large Language Model 2024 Sibo Wei
Xueping Peng
Yifei Wang
Jiasheng Si
Weiyu Zhang
Wenpeng Lü
Xiaoming Wu
Yinglong Wang
+ Construction of Domain-specified Japanese Large Language Model for Finance through Continual Pre-training 2024 Masanori Hirano
Kentaro Imajo
+ G-MAP: General Memory-Augmented Pre-trained Language Model for Domain Tasks 2022 Zhongwei Wan
Yichun Yin
Wei Zhang
Jiaxin Shi
Lifeng Shang
Guangyong Chen
Xin Jiang
Qun Liu
+ Measuring Massive Multitask Chinese Understanding 2023 Hui Zeng
+ AF Adapter: Continual Pretraining for Building Chinese Biomedical Language Model 2022 Yongyu Yan
Kui Xue
Xiaoming Shi
Qi Ye
Jingping Liu
Tong Ruan
+ PanGu-π: Enhancing Language Model Architectures via Nonlinearity Compensation 2023 Yunhe Wang
Hanting Chen
Yehui Tang
Tianyu Guo
Kai Han
Nie Ying
Xutao Wang
Hailin Hu
Zheyuan Bai
Yun Wang

Works That Cite This (0)

Works Cited by This (0)