Replace jieba.NET with AOTba to support GB18030-2022, AOT compilation, and more, and fix a vulnerability #12
Open
4Darmygeometry wants to merge 16 commits into
Updated PackageReference for AOTba to version 1.0.4.
Updated AOTba package version from 1.0.9 to 1.0.10.
Updated JiebaSegmenter initialization to disable entity protection.
Updated AOTba package version from 1.0.10 to 1.0.11.
Updated AOTba package version from 1.0.11 to 1.1.0.
Updated PackageReference version from 1.1.0 to 1.1.1.
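The original jieba.NET relies on reflection, so it does not support AOT compilation, and the Newtonsoft.Json version it depends on has a vulnerability. This PR replaces jieba.NET with AOTba 1.0.9, which offers the following:
lcut and lcutforsearch return lists directly
Dates and times are extracted whole instead of being split (e.g. 下午3点半 "3:30 pm", 晚上8点30分 "8:30 in the evening", 2021-01-01 09:00:00)
Ratios are extracted (e.g. the "1:1:1" in "金龙鱼1:1:1调和油")
Domain names are extracted (e.g. https://gitee.com/JTsamsde/AOTba )
Words containing underscores or hyphens are extracted whole (e.g. TF-IDF)
Version numbers are extracted (e.g. v1.0.1, 1.0.1, 3.2-preview1, 4.1.2-rc1, 2.1-alpha1, 6.3-beta2)
Dictionaries can be loaded asynchronously
Sentences containing emoji are segmented correctly
Complex emoji with variation selectors and ZWJ are segmented correctly (including emoji up to Unicode 16)
Full support for GB18030-2022 and the "Document No. 1" (一号文) requirements (Han character handling from the basic block through Extension I)
Can be AOT-compiled
Lets OPENCCNET support Traditional/Simplified Chinese conversion across the range covered by GB18030-2022 and Document No. 1, without splitting emoji or extension-block Han characters into surrogate pairs, and while keeping domain names and similar tokens intact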
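To make the "extracted whole" behaviour in the list above concrete, here is a minimal, language-agnostic sketch in Python (the language of the original jieba) of one way a segmenter could shield such tokens before dictionary-based cutting. It is not AOTba's implementation: the `PROTECTED` patterns and the `split_protected` helper are hypothetical names invented for illustration, and the regular expressions cover only the example strings quoted in the description.

```python
import re

# Hypothetical patterns, tried in order at each position; none is taken from AOTba.
PROTECTED = re.compile(
    r"""
    (?P<url>https?://[^\s\u4e00-\u9fff]+)
  | (?P<datetime>\d{4}-\d{2}-\d{2}(?:\s\d{2}:\d{2}(?::\d{2})?)?)
  | (?P<cn_time>(?:上午|下午|晚上|早上)?\d{1,2}点(?:半|\d{1,2}分)?)
  | (?P<ratio>\d+(?::\d+)+)
  | (?P<version>v?\d+(?:\.\d+)+(?:-[A-Za-z]+\d*)?)
  | (?P<word>[A-Za-z]+(?:[-_][A-Za-z0-9]+)+)
    """,
    re.VERBOSE,
)


def split_protected(text):
    """Yield (chunk, is_protected) pairs that cover the text in order.

    Protected chunks would be emitted as single tokens; the remaining
    chunks would be handed to the ordinary dictionary-based segmenter.
    """
    pos = 0
    for match in PROTECTED.finditer(text):
        if match.start() > pos:
            # Plain text between two protected tokens.
            yield text[pos:match.start()], False
        yield match.group(0), True
        pos = match.end()
    if pos < len(text):
        # Trailing plain text after the last protected token.
        yield text[pos:], False


if __name__ == "__main__":
    for chunk, protected in split_protected("金龙鱼1:1:1调和油"):
        print(repr(chunk), "protected" if protected else "to segmenter")
```

Running the pre-pass first means the dictionary lookup never sees the inside of a ratio, timestamp, or version string, which is one plausible reason such tokens can survive segmentation unsplit.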