Introduction
When developing in C++ for Windows, one often has to convert a stream from UTF-8 (e.g. an XML file) to UTF-16 (the standard encoding for almost all OS APIs).
Usually I do this by creating a std::wifstream and imbuing it with a conversion facet like this:
```cpp
std::wifstream file( "somefile.xml" );
file.imbue( std::locale( file.getloc(), new std::codecvt_utf8_utf16< wchar_t > ) );
```
Pretty standard, but what if you are just given a std::istream (e.g. as a parameter of a function that you can’t or don’t want to change to take a std::wistream)?
I found two solutions that are almost as straightforward to use as imbue(). The first one requires only standard C++11, the second one utilizes Boost.IoStreams.
Standard C++11 Solution
I just provide a code example with explanations in the comments. You may look up std::wbuffer_convert for further details.
```cpp
#include <codecvt>
#include <istream>
#include <locale>
#include <string>

// A stream buffer that converts from UTF-8 to UTF-16.
using wbuffer_convert_utf8_utf16 = std::wbuffer_convert<
    std::codecvt_utf8_utf16< wchar_t, 0x10ffff, std::consume_header > >;

// An example function with an std::istream parameter.
void dosomething( std::istream& input_utf8 )
{
    // Wrap a wide stream around the narrow stream.
    wbuffer_convert_utf8_utf16 conv( input_utf8.rdbuf() );
    std::wistream input_utf16( &conv );

    // Now use input_utf16 normally ...
    std::wstring line;
    while( std::getline( input_utf16, line ) )
    { /* ... */ }
}
```
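To see the conversion at work without a file, the same wrapper can be fed from an in-memory stream. The following is only a minimal sketch: the helper names read_first_line_utf16 and sample_utf8 are made up for illustration, and the sample bytes (a UTF-8 byte order mark followed by "ä", "€" and "x") are an assumption chosen to stay within the Basic Multilingual Plane.

```cpp
#include <codecvt>
#include <istream>
#include <locale>
#include <sstream>
#include <string>

// Same stream buffer alias as in the example above.
using wbuffer_convert_utf8_utf16 = std::wbuffer_convert<
    std::codecvt_utf8_utf16< wchar_t, 0x10ffff, std::consume_header > >;

// Hypothetical helper: reads the first line of a UTF-8 stream as UTF-16.
std::wstring read_first_line_utf16( std::istream& input_utf8 )
{
    // Wrap a wide stream around the narrow stream.
    wbuffer_convert_utf8_utf16 conv( input_utf8.rdbuf() );
    std::wistream input_utf16( &conv );

    std::wstring line;
    std::getline( input_utf16, line );
    return line;
}

// Hypothetical sample data: UTF-8 BOM, U+00E4, U+20AC, 'x', then a second line.
std::string sample_utf8()
{
    return "\xEF\xBB\xBF" "\xC3\xA4" "\xE2\x82\xAC" "x\n" "second line\n";
}
```

Passing a std::istringstream constructed from sample_utf8() to read_first_line_utf16() should yield a wide string of three code units (U+00E4, U+20AC and 'x'). The byte order mark is skipped because of std::consume_header.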
Boost.IoStreams Solution
What is Boost.IoStreams? It is basically a library to easily create custom streams that are compatible with standard C++ streams.
Using Boost.IoStreams the code goes like this:
```cpp
#include <codecvt>
#include <boost/iostreams/stream.hpp>
#include <boost/iostreams/code_converter.hpp>
#include <boost/ref.hpp>

namespace io = boost::iostreams;

// Create a standard-compatible stream that has a converter from UTF-8 to UTF-16 built in.
using idevice_utf8_utf16 = io::code_converter<
    std::istream,
    std::codecvt_utf8_utf16< wchar_t, 0x10ffff, std::consume_header > >;
using wistream_utf8_utf16 = io::stream< idevice_utf8_utf16 >;

// An example function with an std::istream parameter.
void dosomething( std::istream& input_utf8 )
{
    // Wrap a wide stream around the narrow stream.
    wistream_utf8_utf16 input_utf16( boost::ref( input_utf8 ) );

    // Now use input_utf16 like a std::wistream ...
    std::wstring line;
    while( std::getline( input_utf16, line ) )
    { /* ... */ }
}
```
Some explanations:
- The idevice_utf8_utf16 alias defines a “device”, a concept that exists solely in the boost::iostreams library. This type is only used by the next alias, wistream_utf8_utf16, to create the actual standard-compatible stream type.
- The declaration of input_utf16 creates an instance of that stream, which simply wraps the std::istream parameter. The boost::ref is needed because Boost.IoStreams requires devices to be copyable (in Boost.IoStreams terms, a standard stream is just another model of a device). Standard streams are non-copyable by definition, so we work around that with boost::ref, which returns a copyable reference wrapper (see the Design Rationale of Boost.IoStreams).
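The copyability point can be checked in isolation. The sketch below uses std::reference_wrapper and std::ref, the standard library counterparts of boost::ref, purely as a stand-in so that it compiles without Boost. The helper read_word is hypothetical and only mimics a library that takes its argument by value.

```cpp
#include <functional>
#include <istream>
#include <sstream>
#include <string>
#include <type_traits>

// Standard streams cannot be copied ...
static_assert( !std::is_copy_constructible< std::istream >::value,
               "std::istream is non-copyable" );

// ... but a reference wrapper around a stream can.
static_assert( std::is_copy_constructible< std::reference_wrapper< std::istream > >::value,
               "a reference wrapper is copyable" );

// Hypothetical helper that takes its argument by value, the way a library
// that requires copyable devices might. The copy still refers to the
// original stream, so reading advances that stream.
std::string read_word( std::reference_wrapper< std::istream > input )
{
    std::string word;
    input.get() >> word;
    return word;
}
```

Calling read_word( std::ref( stream ) ) twice on the same std::istream reads two consecutive words, because each copy of the wrapper refers to the one underlying stream.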
Conclusion
Both the standard C++11 and the Boost.IoStreams solution are pretty straightforward to use. Wrapping a wide stream around the narrow stream takes just one or two additional lines of code (not counting the includes and the using aliases)!
I prefer the standard C++11 solution because it has fewer dependencies. The Boost.IoStreams solution might be useful in a context where one already works with Boost.IoStreams, though.