When a Python str is passed to a C++ function whose parameter is std::string or char*, pybind11 automatically encodes the Python string as UTF-8. Every Python str can be encoded as UTF-8, so this conversion normally succeeds. The C++ language is encoding agnostic. It is the responsibility of the programmer to track encodings. It's often easiest to simply use UTF-8 everywhere.
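#include <pybind11/pybind11.h>
#include <iostream>
#include <string>

PYBIND11_MODULE(py_string_to_cpp, m) {
    // The Python str arrives as UTF-8 bytes stored in a std::string.
    m.def("utf8_test", [](const std::string &s) {
        std::cout << "utf-8 is icing on cake!!";
        std::cout << s << std::endl;
    });

    // Same conversion, exposed as a NUL-terminated C string.
    m.def("utf8_charptr", [](const char *s) {
        std::cout << "my favorite food is " << s << std::endl;
    });
}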
from py_string_to_cpp import utf8_test, utf8_charptr

s = "cake noodles"
utf8_test(s)
utf8_charptr(s)
The test result is the same whether the C++ parameter is passed by value or by reference, and whether or not it is declared const.
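To make the conversion concrete, here is a minimal pure-Python sketch of what the C++ side ends up holding. It does not call pybind11; the helper name `as_cpp_string_bytes` is hypothetical and only models the UTF-8 encoding step described above.

```python
def as_cpp_string_bytes(text: str) -> bytes:
    """Hypothetical model: the bytes a std::string parameter would contain."""
    return text.encode("utf-8")


ascii_text = "cake noodles"
mixed_text = "蛋糕 noodles"

ascii_bytes = as_cpp_string_bytes(ascii_text)
mixed_bytes = as_cpp_string_bytes(mixed_text)

# ASCII characters take one byte each, so both lengths match.
print(len(ascii_text), len(ascii_bytes))

# Each of the two CJK characters takes three bytes in UTF-8,
# so the byte length exceeds the character count.
print(len(mixed_text), len(mixed_bytes))
```

The practical consequence is that `s.size()` on the C++ side counts bytes, not Python characters, whenever the text is not pure ASCII.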
Passing bytes to C++
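A Python bytes object can be passed to a C++ function whose parameter is std::string or char* without any conversion. To make a function accept only bytes (and not str) in Python 3, declare the parameter as py::bytes in C++.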
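The difference between the two parameter types can be modeled in pure Python. This is only a sketch under stated assumptions: `to_std_string` and `to_py_bytes` are hypothetical names that imitate the argument handling described above, not pybind11 functions.

```python
def to_std_string(arg) -> bytes:
    """Hypothetical model of a std::string parameter."""
    if isinstance(arg, bytes):
        return arg                    # bytes pass through unchanged
    if isinstance(arg, str):
        return arg.encode("utf-8")    # str is encoded as UTF-8
    raise TypeError("incompatible function arguments")


def to_py_bytes(arg) -> bytes:
    """Hypothetical model of a py::bytes parameter: bytes only."""
    if not isinstance(arg, bytes):
        raise TypeError("incompatible function arguments")
    return arg


raw = b"\xff\xfe raw data"            # not valid UTF-8, still acceptable as bytes

print(to_std_string(raw) == raw)      # unchanged
print(to_std_string("cake"))          # encoded from str
print(to_py_bytes(raw) == raw)        # accepted

try:
    to_py_bytes("cake")               # a str is rejected
    rejected = False
except TypeError:
    rejected = True
print(rejected)
```

Because bytes are not re-encoded, this is the route to use for binary data that is not text at all.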
2. Returning C++ strings to Python
When a C++ function returns a std::string or char* to Python, pybind11 assumes the string is valid UTF-8 and decodes it into a Python str (using the same API as Python uses to perform bytes.decode('utf-8')). If the decoding fails, pybind11 raises a UnicodeDecodeError.
m.def("std_string_return", []() {
    return std::string("this std::string needs to be UTF-8 encoded!");
});

m.def("char_ptr_return", []() {
    // String literals are const, so the pointer must be const char*.
    const char *s = "this string needs to be UTF-8 encoded!";
    return s;
});

from py_string_to_cpp import std_string_return, char_ptr_return

print(std_string_return())
print(char_ptr_return())
print(isinstance(std_string_return(), str))
print(isinstance(char_ptr_return(), str))

this std::string needs to be UTF-8 encoded!
this string needs to be UTF-8 encoded!
True
True
Because UTF-8 is inclusive of pure ASCII, there is never any issue with returning a pure ASCII string to Python. If there is any possibility that the string is not pure ASCII, it is necessary to ensure the encoding is valid UTF-8.
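The failure mode is easy to reproduce with plain Python, since the decoding step is equivalent to bytes.decode('utf-8'). The sketch below assumes, purely for illustration, that the C++ side produced Latin-1 bytes; the variable names are made up for this example.

```python
ascii_payload = b"plain ASCII"
latin1_payload = "café".encode("latin-1")   # the final byte 0xE9 is not valid UTF-8 here

# Pure ASCII is always valid UTF-8.
print(ascii_payload.decode("utf-8"))

# Non-UTF-8 bytes raise UnicodeDecodeError, as a returned std::string would.
try:
    latin1_payload.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError as exc:
    decode_failed = True
    print("decode failed:", exc.reason)

# One possible fix: transcode to UTF-8 before the string crosses the boundary.
repaired = latin1_payload.decode("latin-1").encode("utf-8")
print(repaired.decode("utf-8"))
```

In a real binding the transcoding would have to happen in C++ before returning, or the function could return bytes and leave decoding to the Python caller.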
Wide character strings
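When a Python str is passed to a C++ function whose parameter is std::wstring, wchar_t*, std::u16string or std::u32string, the str is encoded as UTF-16 or UTF-32, depending on how the C++ compiler implements each type. When strings of these types are returned from C++ to Python, they are assumed to be valid UTF-16 or UTF-32 and are decoded into a Python str.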
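A short pure-Python sketch shows how the same text differs between the two wide encodings. It uses only Python's built-in codecs and does not involve pybind11; choosing the platform's native byte order is an assumption made to mirror what a compiled extension would see.

```python
import sys

text = "a😀"  # one BMP character plus one character outside the BMP

# Pick the codec matching the platform's native byte order (no BOM is written).
suffix = "le" if sys.byteorder == "little" else "be"
utf16 = text.encode("utf-16-" + suffix)
utf32 = text.encode("utf-32-" + suffix)

# UTF-16 code units are 2 bytes wide; the emoji needs a surrogate pair.
utf16_units = len(utf16) // 2
# UTF-32 code units are 4 bytes wide; every character is exactly one unit.
utf32_units = len(utf32) // 4

print(utf16_units, utf32_units)

# Both encodings round-trip back to the original str.
print(utf16.decode("utf-16-" + suffix) == text)
print(utf32.decode("utf-32-" + suffix) == text)
```

This matters for wchar_t in particular: it is typically 16 bits wide on Windows and 32 bits wide on Linux and macOS, so the length of a std::wstring for the same Python str can differ between platforms.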