Character set encoding matters because many characters are not part of the standard ASCII character set. Functions such as strlen will not behave the way you expect when you use them with extended characters such as €, ©, ®, à, ë, and many others. Languages such as Chinese and Russian require a different approach when working with strings.
In this episode, let’s talk about why the encoding matters and how the character set is really represented in the computer. Hint: Everything in the computer breaks down into binary, i.e. ones and zeros.
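To make the hint concrete, here is a minimal sketch that prints the ones and zeros behind a character. It assumes PHP 7 or later and UTF-8 encoding; the helper name bytes_as_binary is hypothetical and only there for illustration.

```php
<?php
// Hypothetical helper: show each byte of a string as eight binary digits.
function bytes_as_binary(string $text): string
{
    $bits = [];
    // unpack('C*') returns every byte as an unsigned integer.
    foreach (unpack('C*', $text) as $byte) {
        $bits[] = str_pad(decbin($byte), 8, '0', STR_PAD_LEFT);
    }
    return implode(' ', $bits);
}

// "A" is plain ASCII, so it fits in a single byte.
echo bytes_as_binary('A'), PHP_EOL;          // 01000001

// "\u{20AC}" is the euro sign; PHP emits it as UTF-8, which needs three bytes.
echo bytes_as_binary("\u{20AC}"), PHP_EOL;   // 11100010 10000010 10101100
```

The ASCII letter occupies one byte, while the euro sign occupies three. That difference is why the encoding matters.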
Simple string functions like strlen count bytes, not characters. Single-byte characters are the digits 0-9, the letters a-z and A-Z, punctuation (e.g. exclamation mark, period, opening and closing parentheses, asterisk, comma, hyphen), operators (such as plus, minus, divide, multiply, greater than, less than, and equals), and control characters. Once you move beyond that set, a character encoded in UTF-8 takes two or more bytes. Functions such as strlen will therefore report a length larger than the number of characters you see.
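As a rough illustration of the mismatch, the sketch below compares the byte count from strlen with the character count from mb_strlen. It assumes PHP 7 or later with the mbstring extension enabled and UTF-8 strings; the function name byte_and_char_count is hypothetical.

```php
<?php
// Hypothetical helper: return the byte count and the character count of a string.
function byte_and_char_count(string $text): array
{
    return [
        'bytes' => strlen($text),              // counts raw bytes
        'chars' => mb_strlen($text, 'UTF-8'),  // counts UTF-8 characters
    ];
}

$samples = [
    'cafe',               // ASCII only
    "caf\u{E9}",          // "café": é takes 2 bytes in UTF-8
    "\u{20AC}100",        // "€100": € takes 3 bytes
    "\u{4F60}\u{597D}",   // two Chinese characters, 3 bytes each
];

foreach ($samples as $sample) {
    $count = byte_and_char_count($sample);
    printf("%s: %d bytes, %d characters\n", $sample, $count['bytes'], $count['chars']);
}
```

For the ASCII-only word both numbers agree (4 and 4). For the other samples strlen reports 5, 6, and 6 bytes, while mb_strlen reports 4, 4, and 2 characters.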
If you listen carefully, you can hear the 1s and 0s flowing in your computer.
Total Lab Runtime: 02:50:33
- 1 Lab Introduction (free) 09:39
- 2 Embedding Variables in a String (pro) 15:16
- 3 Embedding Complex Variables (pro) 13:37
- 4 Concatenating Strings with a Dot (pro) 08:47
- 5 Concatenating and Assigning Shorthand (pro) 05:58
- 6 Formatting a String using Placeholders (pro) 15:16
- 7 Specifying Which Argument in a Formatted String (pro) 04:18
- 8 Has Substring (pro) 21:47
- 9 Replacing Substrings (pro) 13:03
- 10 Get the String's Length (pro) 14:11
- 11 Character Set Encoding - It Matters! (pro) 11:42
- 12 Has Substring - for UTF-8 (pro) 16:45
- 13 Replacing a UTF-8 Substring (pro) 05:56
- 14 Stripping out Characters or Entities (pro) 10:51
- 15 Wrap it Up (free) 03:27