# Anyone Can Train Their Own Language Model

{% embed url="<https://github.com/gaboolic/rime-build-grammar>" %}

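Thanks to 雨辰 from the Rime input method discussion group and to ksqsf, the author of 魔然, for their research.

These are my notes on the steps for building a language model, written down for future reference.

Introduction to language models: <https://fancyerii.github.io/dev287x/lm/>

Steps in brief: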

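1 Collect a corpus.

2 Segment the corpus into words and save it as a .txt file, with words separated by spaces. For a reference script, see <https://github.com/gaboolic/rime-frost/blob/master/others/program/mnbvc/yuliao_fenci_to_txt.py>

3 Generate the .arpa file. The open-source library <https://github.com/kpu/kenlm> can be used for this.

4 Convert the .arpa file into the format expected by the librime-octagram tool. 雨辰 provided a script for this: <https://github.com/gaboolic/rime-build-grammar/blob/main/arpa.py>

5 Run build\_grammar from librime-octagram.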
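To make the data shapes in steps 2 to 4 more concrete, here is a minimal Python sketch. It is not the code from the linked scripts. The function names `segment`, `to_training_line`, and `read_arpa_ngrams` are hypothetical, the greedy longest-match segmenter is only a stand-in for whatever word segmenter you actually use, and the reader only assumes the standard ARPA layout (`\N-grams:` section headers followed by `log10-probability  words  [backoff]` lines). It does not reproduce the output format that arpa.py writes for build\_grammar.

```python
def segment(sentence, vocab, max_len=4):
    """Greedy longest-match segmentation against a vocabulary (a toy stand-in)."""
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


def to_training_line(sentence, vocab):
    """Step 2: one sentence per line, words separated by single spaces."""
    return " ".join(segment(sentence, vocab))


def read_arpa_ngrams(lines):
    """Step 3 output: yield (order, log10_prob, words, backoff) from ARPA text.

    `backoff` is None when the line carries no backoff weight.
    """
    order = 0
    for raw in lines:
        line = raw.strip()
        if line == "\\end\\":
            break
        if line.startswith("\\") and line.endswith("-grams:"):
            # Section header such as "\2-grams:" switches the current order.
            order = int(line[1:line.index("-")])
            continue
        if order == 0 or not line:
            # Still inside the "\data\" header, or a blank separator line.
            continue
        fields = line.split()
        prob = float(fields[0])
        words = fields[1:1 + order]
        backoff = float(fields[1 + order]) if len(fields) > 1 + order else None
        yield order, prob, words, backoff


if __name__ == "__main__":
    toy_vocab = {"训练", "语言", "模型", "语言模型"}
    print(to_training_line("训练语言模型", toy_vocab))
```

Each line produced by `to_training_line` is what a tool such as kenlm reads as training text, and the tuples from `read_arpa_ngrams` are the entries that a conversion script would then rewrite for build\_grammar.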


