Parsing

This page documents the objects and functions that in some way deal with parsing or otherwise manipulating text. Everything here follows the same conventions as the rest of the library.

Objects: cmd_line_parser, config_reader, cpp_pretty_printer, cpp_tokenizer, tokenizer, xml_parser, base64, unichar, ustring, basic_utf8_ifstream

Global Functions: string_cast, string_assign, cast_to_string, cast_to_wstring, wrap_string, narrow, trim, ltrim, rtrim, pad, lpad, rpad, left_substr, right_substr, tolower, toupper, convert_utf8_to_utf32, is_combining_char, strings_equal_ignore_case

toupper
dlib/string.h dlib/string/string_abstract.h
This is a function to convert a string to all uppercase.

tolower
dlib/string.h dlib/string/string_abstract.h
This is a function to convert a string to all lowercase.

right_substr
dlib/string.h dlib/string/string_abstract.h
This is a function to return the part of a string to the right of a user supplied delimiter.

left_substr
dlib/string.h dlib/string/string_abstract.h
This is a function to return the part of a string to the left of a user supplied delimiter.

rpad
dlib/string.h dlib/string/string_abstract.h
This is a function to pad whitespace (or user specified characters) onto the right most end of a string.

lpad
dlib/string.h dlib/string/string_abstract.h
This is a function to pad whitespace (or user specified characters) onto the left most end of a string.

pad
dlib/string.h dlib/string/string_abstract.h
This is a function to pad whitespace (or user specified characters) onto the ends of a string.

rtrim
dlib/string.h dlib/string/string_abstract.h
This is a function to remove the whitespace (or user specified characters) from the right most end of a string.

ltrim
dlib/string.h dlib/string/string_abstract.h
This is a function to remove the whitespace (or user specified characters) from the left most end of a string.

trim
dlib/string.h dlib/string/string_abstract.h
This is a function to remove the whitespace (or user specified characters) from the ends of a string.

narrow
dlib/string.h dlib/string/string_abstract.h
This is a function for converting a string of type std::string or std::wstring to a plain std::string.

wrap_string
dlib/string.h dlib/string/string_abstract.h
wrap_string is a function that takes a string and breaks it into a number of lines of a given length. You can use this to make a string fit nicely into a command prompt window for example.

strings_equal_ignore_case
dlib/string.h dlib/string/string_abstract.h
This is a pair of functions to do a case insensitive comparison between strings.

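The trimming, padding, and substring behaviour described above can be illustrated with a small standard-library sketch. This is not dlib's implementation: every function name ending in _sketch is hypothetical, padding is simplified to a single character, and returning an empty string from right_substr_sketch when no delimiter is found is an assumption. See dlib/string/string_abstract.h for the real contracts.

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers that imitate the behaviour described above.
// They are NOT dlib's implementations.

// Remove any of trim_chars from the right most end of str.
std::string rtrim_sketch(const std::string& str,
                         const std::string& trim_chars = " \t\r\n")
{
    const std::string::size_type last = str.find_last_not_of(trim_chars);
    return last == std::string::npos ? std::string() : str.substr(0, last + 1);
}

// Remove any of trim_chars from the left most end of str.
std::string ltrim_sketch(const std::string& str,
                         const std::string& trim_chars = " \t\r\n")
{
    const std::string::size_type first = str.find_first_not_of(trim_chars);
    return first == std::string::npos ? std::string() : str.substr(first);
}

// Remove any of trim_chars from both ends of str.
std::string trim_sketch(const std::string& str,
                        const std::string& trim_chars = " \t\r\n")
{
    return ltrim_sketch(rtrim_sketch(str, trim_chars), trim_chars);
}

// Pad str on the right with pad_char until it is total_length long.
// Strings that are already long enough are returned unchanged.
std::string rpad_sketch(const std::string& str,
                        std::string::size_type total_length,
                        char pad_char = ' ')
{
    if (str.size() >= total_length)
        return str;
    return str + std::string(total_length - str.size(), pad_char);
}

// Part of str to the left of the first delimiter character.
// Returns the whole string if no delimiter is present.
std::string left_substr_sketch(const std::string& str, const std::string& delim)
{
    return str.substr(0, str.find_first_of(delim));
}

// Part of str to the right of the last delimiter character.
// Returning an empty string when no delimiter is present is an assumption.
std::string right_substr_sketch(const std::string& str, const std::string& delim)
{
    const std::string::size_type pos = str.find_last_of(delim);
    return pos == std::string::npos ? std::string() : str.substr(pos + 1);
}
```

With these helpers, splitting a "key=value" line on "=" gives "key" on the left and "value" on the right, and trimming "  hello \n" leaves "hello".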
cast_to_wstring
dlib/string.h dlib/string/string_abstract.h
cast_to_wstring is a templated function which makes it easy to convert arbitrary objects to std::wstring strings. The types supported are any types that can be written to std::wostream via operator<<.

cast_to_string
dlib/string.h dlib/string/string_abstract.h
cast_to_string is a templated function which makes it easy to convert arbitrary objects to std::string strings. The types supported are any types that can be written to std::ostream via operator<<.

string_cast
dlib/string.h dlib/string/string_abstract.h
string_cast is a templated function which makes it easy to convert strings to other types. The types supported are any types that can be read by the basic_istream operator>>. It also supports casting between wstring, string, and ustring objects.

string_assign
dlib/string.h dlib/string/string_abstract.h
string_assign is an object which makes it easy to convert strings to other types. The types supported are any types that can be read by the basic_istream operator>>. It also supports casting between wstring, string, and ustring objects. Since string_assign is a simple stateless object, there is a global instance of it called dlib::sa.
Example: config_reader_ex.cpp.html

unichar
dlib/unicode.h dlib/unicode/unicode_abstract.h
This is a typedef for an unsigned 32-bit integer which we use to store Unicode values.

basic_utf8_ifstream
dlib/unicode.h dlib/unicode/unicode_abstract.h
This object represents an input file stream much like the normal std::ifstream except that it knows how to read UTF-8 data. So when you read characters out of this stream it will automatically convert them from the UTF-8 multibyte encoding into a fixed width wide character encoding.

There are also two typedefs of this object. The first is utf8_wifstream, which uses wchar_t as the wide character type to read into. The second is utf8_uifstream, which uses unichar instead of wchar_t.
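The conversion from the UTF-8 multibyte encoding into fixed width characters can be sketched with the standard library alone. The code below is an illustration of the decoding step, not dlib's implementation: unichar_sketch and decode_utf8_sketch are hypothetical names, the result is a std::vector rather than a ustring, and the sketch does not reject overlong encodings or surrogate code points.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a 32-bit Unicode character type.
typedef std::uint32_t unichar_sketch;

// Decode a UTF-8 byte string into a sequence of 32-bit code points.
// Throws std::invalid_argument on a bad lead byte, a bad continuation
// byte, or a sequence that is cut off at the end of the input.
std::vector<unichar_sketch> decode_utf8_sketch(const std::string& bytes)
{
    std::vector<unichar_sketch> out;
    std::size_t i = 0;
    while (i < bytes.size())
    {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        unichar_sketch cp = 0;
        std::size_t extra = 0;  // number of continuation bytes that follow

        if (lead < 0x80)              { cp = lead;        extra = 0; }
        else if ((lead >> 5) == 0x06) { cp = lead & 0x1F; extra = 1; }
        else if ((lead >> 4) == 0x0E) { cp = lead & 0x0F; extra = 2; }
        else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + extra >= bytes.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k)
        {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            // Each continuation byte contributes its low six bits.
            cp = (cp << 6) | (c & 0x3F);
        }

        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

For example, the three bytes E2 82 AC (the euro sign) decode to the single code point 0x20AC.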

ustring
dlib/unicode.h dlib/unicode/unicode_abstract.h
This is a typedef for a std::basic_string<unichar>. That is, it is a typedef for a string object that stores unichar Unicode characters.

is_combining_char
dlib/unicode.h dlib/unicode/unicode_abstract.h
This is a global function that can tell you if a character is a Unicode combining character or not.

convert_utf8_to_utf32
dlib/unicode.h dlib/unicode/unicode_abstract.h
This is a global function that can convert UTF-8 strings into strings of 32-bit unichar characters.

base64
dlib/base64.h dlib/base64/base64_kernel_abstract.h
This object allows you to encode and decode data to and from the Base64 Content-Transfer-Encoding defined in section 6.8 of RFC 2045.
Example: file_to_code_ex.cpp.html

base64_kernel_1
dlib/base64/base64_kernel_1.h
This implementation is done using a lookup table in the obvious way.
kernel_1a is a typedef for base64_kernel_1.

cmd_line_parser
dlib/cmd_line_parser.h dlib/cmd_line_parser/cmd_line_parser_kernel_abstract.h
This object allows you to easily parse a command line. Note that the documentation for the cmd_line_parser_option (the object returned by the parser's .option() function) is in a separate file.
Example: compress_stream_ex.cpp.html

cmd_line_parser_kernel_1
dlib/cmd_line_parser/cmd_line_parser_kernel_1.h
This implementation uses the map and sequence containers to keep track of the command line options and arguments. For further details see the above link.
kernel_1a is a typedef for cmd_line_parser_kernel_1 that uses map_kernel_1a and sequence_kernel_2a.

cmd_line_parser_check
dlib/cmd_line_parser/cmd_line_parser_check_abstract.h
This gives a cmd_line_parser object the ability to easily perform various kinds of validation on the command line input.

cmd_line_parser_check_1
dlib/cmd_line_parser/cmd_line_parser_check_1.h
This implementation is done in the obvious way. See the source for details.
check_1a is a typedef for cmd_line_parser_print_1 extended by cmd_line_parser_check_1.

cmd_line_parser_print
dlib/cmd_line_parser/cmd_line_parser_print_abstract.h
This extension gives a cmd_line_parser object the ability to print its command line options in a nice format.

cmd_line_parser_print_1
dlib/cmd_line_parser/cmd_line_parser_print_1.h
This implementation is done by enumerating the options of the parser and printing them.
print_1a is a typedef for cmd_line_parser_kernel_1 extended by cmd_line_parser_print_1.

config_reader
dlib/config_reader.h dlib/config_reader/config_reader_kernel_abstract.h
This object represents something which is intended to be used to read text configuration files.
Example: config_reader_ex.cpp.html

config_reader_kernel_1
dlib/config_reader/config_reader_kernel_1.h
This implementation is done using the map object in the obvious way.
kernel_1a is a typedef for config_reader_kernel_1 that uses map_kernel_1b.

config_reader_thread_safe
dlib/config_reader/config_reader_thread_safe_abstract.h
This object extends a normal config_reader by simply wrapping all its member functions inside mutex locks to make it safe to use in a threaded program.

config_reader_thread_safe_1
dlib/config_reader/config_reader_thread_safe_1.h
This implementation is done in the obvious way. See the source for details.
thread_safe_1a is a typedef for config_reader_kernel_1 extended by config_reader_thread_safe_1.

cpp_pretty_printer
dlib/cpp_pretty_printer.h dlib/cpp_pretty_printer/cpp_pretty_printer_kernel_abstract.h
This object represents an HTML pretty printer for C++ source code.

cpp_pretty_printer_kernel_1
dlib/cpp_pretty_printer/cpp_pretty_printer_kernel_1.h
This is implemented by using the cpp_tokenizer object. This is the pretty printer I use on all the source in this library. It applies a color scheme, turns include directives such as #include "file.h" into links to file.h.html, and puts HTML anchor points on function and class declarations. It also looks for comments starting with /*!A and puts an anchor before the comment using the word following the A as the name of the anchor.
kernel_1a is a typedef for cpp_pretty_printer_kernel_1.

cpp_pretty_printer_kernel_2
dlib/cpp_pretty_printer/cpp_pretty_printer_kernel_2.h
This is implemented by using the cpp_tokenizer object. It applies a black and white color scheme suitable for printing on a black and white printer. It also places the document title prominently at the top of the pretty printed source file.
kernel_2a is a typedef for cpp_pretty_printer_kernel_2.

cpp_tokenizer
dlib/cpp_tokenizer.h dlib/cpp_tokenizer/cpp_tokenizer_kernel_abstract.h
This object represents a simple tokenizer for C++ source code.

cpp_tokenizer_kernel_1
dlib/cpp_tokenizer/cpp_tokenizer_kernel_1.h
This is implemented by using the tokenizer object in the obvious way.
kernel_1a is a typedef for cpp_tokenizer_kernel_1.

tokenizer
dlib/tokenizer.h dlib/tokenizer/tokenizer_kernel_abstract.h
This object represents a simple tokenizer for textual data.

tokenizer_kernel_1
dlib/tokenizer/tokenizer_kernel_1.h
This is implemented in the obvious way.
kernel_1a is a typedef for tokenizer_kernel_1.

xml_parser
dlib/xml_parser.h dlib/xml_parser/xml_parser_kernel_abstract.h
This object represents a simple SAX style event driven XML parser. It takes its input from an input stream object and sends events to all registered document_handler and error_handler objects.

The xml_parser object also uses the interface classes document_handler and error_handler. Instances of subclasses of these classes are passed to the xml_parser, which generates events while it is parsing and sends them to the appropriate handler.
Example: xml_parser_ex.cpp.html

xml_parser_kernel_1
dlib/xml_parser/xml_parser_kernel_1.h
This implementation is done using a stack (as opposed to recursive descent) to parse XML documents. It also uses a map to implement the attribute_list interface and internally uses the sequence object to keep track of all registered document and error handlers.
kernel_1a is a typedef for xml_parser_kernel_1 that uses map_kernel_1a, stack_kernel_1a, and sequence_kernel_2a.
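
The event driven, stack based approach described for xml_parser can be illustrated with a toy standard-library sketch. It is not dlib's parser or its interface: handler_sketch, recording_handler, and parse_sketch are hypothetical names, document and error events are merged into one handler class for brevity, and only plain <name> and </name> tags with text between them are understood (no attributes, comments, or entities).

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stack>
#include <string>
#include <vector>

// Hypothetical handler interface; the parser calls these as it reads input.
struct handler_sketch
{
    virtual ~handler_sketch() {}
    virtual void start_element(const std::string& name) = 0;
    virtual void end_element(const std::string& name) = 0;
    virtual void characters(const std::string& data) = 0;
    virtual void error(const std::string& message) = 0;
};

// Example handler that simply records every event it receives.
struct recording_handler : handler_sketch
{
    std::vector<std::string> events;
    void start_element(const std::string& name) { events.push_back("start " + name); }
    void end_element(const std::string& name)   { events.push_back("end " + name); }
    void characters(const std::string& data)    { events.push_back("chars " + data); }
    void error(const std::string& message)      { events.push_back("error " + message); }
};

// Send an error event to every registered handler.
void report_error(std::vector<handler_sketch*>& handlers, const std::string& message)
{
    for (std::size_t i = 0; i < handlers.size(); ++i)
        handlers[i]->error(message);
}

// Read the stream, keep the currently open elements on an explicit stack,
// and send events to all registered handlers. Returns true if every tag
// was closed in the right order. Text outside any element is ignored.
bool parse_sketch(std::istream& in, std::vector<handler_sketch*>& handlers)
{
    std::stack<std::string> open_tags;
    std::string text;
    char ch;
    while (in.get(ch))
    {
        if (ch != '<') { text += ch; continue; }

        // A tag begins: flush any text collected inside the current element.
        if (!text.empty() && !open_tags.empty())
            for (std::size_t i = 0; i < handlers.size(); ++i)
                handlers[i]->characters(text);
        text.clear();

        std::string tag;
        std::getline(in, tag, '>');
        if (in.eof() || tag.empty())
        {
            report_error(handlers, "malformed tag");
            return false;
        }

        if (tag[0] == '/')
        {
            const std::string name = tag.substr(1);
            if (open_tags.empty() || open_tags.top() != name)
            {
                report_error(handlers, "mismatched closing tag: " + name);
                return false;
            }
            open_tags.pop();
            for (std::size_t i = 0; i < handlers.size(); ++i)
                handlers[i]->end_element(name);
        }
        else
        {
            open_tags.push(tag);
            for (std::size_t i = 0; i < handlers.size(); ++i)
                handlers[i]->start_element(tag);
        }
    }
    if (!open_tags.empty())
    {
        report_error(handlers, "unclosed element: " + open_tags.top());
        return false;
    }
    return true;
}
```

Feeding the text <a><b>hi</b></a> through parse_sketch with one recording_handler registered records five events in order: start a, start b, chars hi, end b, end a. The explicit stack is what lets the sketch match closing tags without recursion.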