C/C++: Getting a character's code point #25

Open
Shellbye opened this issue Jul 30, 2018 · 0 comments

Shellbye commented Jul 30, 2018

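I have recently been working on speech synthesis, and one piece of functionality it needs is getting the pinyin of Chinese characters. In Python this can be done with pypinyin: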

>>> from pypinyin import pinyin
>>> pinyin('中心')
[['zhōng'], ['xīn']]

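Our project, however, needs a C++ version, so this functionality has to be ported to C++. From a look at the pypinyin code, it roughly converts each Chinese character to its code point and then looks up that character's pinyin in one very large dictionary. So copying that dictionary into C++ gets most of the job done; the one missing step is going from a Chinese character to its code point. In Python, that is a single line of code: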

>>> ord('白')
30333

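Yet translating this simple operation into C++ took me a long time (largely because I am not familiar with C++). In the end I found working code on the ever-reliable SO (Stack Overflow):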

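#include <codecvt>
#include <iostream>
#include <locale>
#include <string>
#include "data_pinyin.hpp"

int main(int argc, char *argv[])
{
    if (argc < 2) {
        std::cerr << "Usage: " << argv[0] << " characters" << std::endl;
        return 1;
    }
    std::string name = argv[1];
    // Decode the UTF-8 argument into UTF-32: one char32_t per code point.
    // Note: std::wstring_convert is deprecated since C++17 but valid in C++11.
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> cv;
    auto str32 = cv.from_bytes(name);
    // Handle every character of the argument, not only the last one.
    for (auto c : str32) {
        int code_point = static_cast<int>(c);
        std::cout << code_point << std::endl;
        // find() avoids inserting an empty entry for an unknown code point.
        auto it = pinyin_dict.find(code_point);
        if (it != pinyin_dict.end())
            std::cout << it->second << std::endl;
    }
    return 0;
}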
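
Since std::wstring_convert and <codecvt> are deprecated as of C++17, the same character-to-code-point step could also be done by decoding UTF-8 by hand. The following is only a minimal sketch under that assumption: the helper name utf8_to_code_points is hypothetical (it comes from neither pypinyin nor the SO answer), and it checks lead and continuation bytes but does not reject overlong encodings or surrogate values.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: decode a UTF-8 byte string into Unicode code points.
std::vector<std::uint32_t> utf8_to_code_points(const std::string &s)
{
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char lead = static_cast<unsigned char>(s[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + extra >= s.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte has the form 10xxxxxx and adds 6 bits.
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

Calling utf8_to_code_points(argv[1]) would then take the place of the cv.from_bytes(name) call, and each element of the result can be used as a key into pinyin_dict.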

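Here data_pinyin.hpp is the very large dictionary I copied from pypinyin and lightly processed (keys converted from hexadecimal to decimal) into C++ form. A few lines are shown below; the original file is part of pypinyin.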

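#include <map>
#include <string>

// Key: Unicode code point (decimal); value: comma-separated pinyin readings.
std::map<int, std::string> pinyin_dict = {
    {12295, "líng,yuán,xīng"},
    {13312, "qiū"},
    {13313, "tiàn"},
    {13316, "kuà"},
    {13317, ""},
    // middle omitted
    {183944, ""},
    {183955, "chǔ"},
};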
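
As a side note, C++ accepts hexadecimal integer literals, so the keys could presumably stay in hexadecimal and the hex-to-decimal conversion step be skipped. A minimal sketch, assuming a hypothetical table name pinyin_dict_hex and a hypothetical helper lookup_pinyin; the three entries are taken from the values shown in this post, and the real table would of course be much larger:

```cpp
#include <map>
#include <string>

// Hypothetical variant of the table with hexadecimal keys.
// 0x3007 == 12295, 0x3400 == 13312, 0x767D == 30333.
const std::map<int, std::string> pinyin_dict_hex = {
    {0x3007, "líng,yuán,xīng"},
    {0x3400, "qiū"},
    {0x767D, "bái,bó"},
};

// Hypothetical helper: return the readings for a code point, or an empty
// string if the code point is not in the table. Using find() instead of
// operator[] keeps the map const and never inserts new entries.
std::string lookup_pinyin(int code_point)
{
    auto it = pinyin_dict_hex.find(code_point);
    return it == pinyin_dict_hex.end() ? std::string() : it->second;
}
```

Because the literals are plain int values, lookup_pinyin(30333) and lookup_pinyin(0x767D) refer to the same entry.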

With that done, compile and run:

$ g++ -std=c++11 -g main.cpp -o main.out
$ ./main.out 白
30333
bái,bó
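
The output shows that a character with several readings is stored as one comma-separated string. If the individual readings are needed, the string can be split on commas. A minimal sketch, assuming a hypothetical helper named split_readings that is not part of pypinyin or the code above:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split a comma-separated readings string such as
// "bái,bó" into its individual readings. An empty input gives an empty list.
std::vector<std::string> split_readings(const std::string &readings)
{
    std::vector<std::string> out;
    if (readings.empty())
        return out;
    std::size_t start = 0;
    while (true) {
        std::size_t comma = readings.find(',', start);
        if (comma == std::string::npos) {
            // Last (or only) reading: take the rest of the string.
            out.push_back(readings.substr(start));
            break;
        }
        out.push_back(readings.substr(start, comma - start));
        start = comma + 1;
    }
    return out;
}
```

The search is done on bytes, which is safe for UTF-8 because the ASCII comma never appears inside a multi-byte sequence.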