Unverified Commit 565ad496 authored by pariskang's avatar pariskang 💬 Committed by GitHub

Update README.md

A Traditional Chinese Medicine large language model, inspired by the wisdom of Zhang Zhongjing, one of ancient China's most eminent physicians. This model aims to illuminate the profound knowledge of Traditional Chinese Medicine, bridging ancient wisdom with modern technological innovation, and ultimately to provide the medical field with a trustworthy and professional tool. However, all generated results are currently for reference only; diagnoses, treatment results, and recommendations should be provided by experienced professionals.
<p align="center"> <img src="https://raw.githubusercontent.com/pariskang/CMLM-ZhongJing/main/logo.png" alt="logo" title="logo" width="50%"> </p>
<p align="center"><b>Fig 1. A logo of CMLM-Zhongjing generated by Bing’s drawing output combined with human creative prompts.</b></p>
## 1. Instruction Data Construction
Many works, such as Alpaca and Belle, are based on the self-instruct approach, which effectively harnesses the knowledge of large language models to generate diverse and creative instructions and can quickly build large instruction sets for instruction tuning in general question-answering scenarios. However, in fields where professional knowledge has a low tolerance for error, such as medicine and law, hallucinated outputs introduce noise into the instruction data and degrade model accuracy. Typical cases include improper diagnoses or prescription advice that may even endanger a patient's life, and factually incorrect citations of statutes or legal reasoning that cause a party to lose their case. Therefore, how to quickly invoke the OpenAI API without sacrificing the professionalism of instruction data has become an important research direction for instruction-data construction and annotation. Below we briefly describe our preliminary experimental exploration.
<p align="center"> <img src="https://raw.githubusercontent.com/pariskang/CMLM-ZhongJing/main/logo_image/strategy.jpeg" alt="strategy" title="strategy" width="100%"> </p>
<p align="center"><b>Fig 2. A Multi-task Therapeutic Behavior Decomposition Instruction Construction Strategy in the Loop of Human Physicians.</b></p>
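The idea above can be sketched in a few lines: constrain the generation request so the model may only restate facts from a curated table row, keeping hallucination out of the instruction data. This is a minimal illustration, not the project's released code; `build_messages` and the row field names are hypothetical.

```python
def build_messages(row: dict, task: str) -> list:
    """Build a chat-completion payload for one prescription-table row.

    The system prompt forbids the model from inventing facts beyond the
    row, which is the key constraint for low-error-tolerance domains.
    """
    system = (
        "You are a TCM instruction writer. Use ONLY the facts in the "
        "provided table row. Do not add diagnoses, doses, or herbs that "
        "are not listed. Output one instruction/response pair."
    )
    facts = "\n".join(f"{k}: {v}" for k, v in row.items())
    user = f"Task type: {task}\nTable row:\n{facts}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

# Example row (illustrative field names); the resulting messages list can
# then be passed to an OpenAI chat-completion call for generation.
row = {"formula": "Si Wu Tang",
       "herbs": "Shu Di Huang, Bai Shao, Dang Gui, Chuan Xiong"}
messages = build_messages(row, "formula function")
```

Generated pairs would still pass through physician review (the human-in-the-loop stage of Fig 2) before entering the training set.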
#### 1.1 Multi-task Therapeutic Behavior Decomposition Instruction Construction Strategy
Human memory and understanding rely on constructing varied scenarios and stories to implicitly encode knowledge, and the durability of a memory depends on the duration and richness of the learning process. Interleaved learning, spaced practice, and diversified learning all strengthen knowledge consolidation and thereby build a deep understanding of domain knowledge. Drawing on this picture of human memory, our approach uses professional tables together with the language-representation capabilities of large language models, under strictly specified prompt templates, so that the model generates 16 scenario types from tabular data on Chinese-medicine gynecology prescriptions, including patient therapeutic story, diagnostic analysis, diagnosis treatment expected result, formula function, interactive story, narrative medicine, tongue & pulse, therapeutic template making, critical thinking, follow up, prescription, herb dosage, case study, real-world problem, disease mechanism, etc., to promote the model's reasoning over prescription data and its diagnostic thinking.