Using `casefold()`


Here are two strings:

    >>> size
    '5µm'
    >>> another_size
    '5μm'

And here's their comparison:

    >>> size == another_size
    False

Why?

The answer may or may not be obvious, depending on the font you're using. For example, here's the same code snippet in a different font. Look closely.

Here's another font, to make it even clearer:

The characters used for µ are not the same. One is the micro sign (`"\u00b5"` or `chr(181)`). The other is the lowercase Greek letter mu (`"\u03bc"` or `chr(956)`). If you're comparing texts from different sources, which may use either character, a simple comparison may return `False`.
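If the font doesn't help, you can ask Python which character you have. Here's a minimal sketch using the standard library's `unicodedata` module; the variable names are only illustrative.

```python
import unicodedata

micro_sign = "\u00b5"  # the micro sign, chr(181)
greek_mu = "\u03bc"    # the lowercase Greek letter mu, chr(956)

for char in (micro_sign, greek_mu):
    # ord() gives the code point; unicodedata.name() gives the Unicode name
    print(f"U+{ord(char):04X} {ord(char)} {unicodedata.name(char)}")
```

This prints `U+00B5 181 MICRO SIGN` followed by `U+03BC 956 GREEK SMALL LETTER MU`, which confirms that the two look-alikes are different code points.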

Solution: use `casefold()` when comparing strings in this scenario. `casefold()` is a string method similar to `upper()` and `lower()`, but it takes care of edge cases such as µ.

    >>> size
    '5µm'
    >>> another_size
    '5μm'
    >>> size == another_size
    False
    >>> size.casefold() == another_size.casefold()
    True

For these characters, `casefold()` always returns the lowercase Greek mu, which is the preferred character:

    >>> chr(181)
    'µ'
    >>> chr(956)
    'μ'
    >>> ord(chr(181).casefold())
    956
    >>> ord(chr(956).casefold())
    956

More generally, `casefold()` is meant for caseless matching, that is, comparing strings regardless of whether they're uppercase or lowercase. It's a better solution than converting strings with `upper()` or `lower()` because of edge cases like µ or ß.
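As a rough sketch, you could wrap the comparison in a small helper. The name `equal_ignoring_case` is made up for this example and isn't part of Python.

```python
def equal_ignoring_case(first: str, second: str) -> bool:
    """Compare two strings caselessly (hypothetical helper for illustration)."""
    return first.casefold() == second.casefold()


micro_size = "5\u00b5m"  # written with the micro sign
greek_size = "5\u03bcm"  # written with the lowercase Greek mu

# lower() leaves both characters unchanged, so the strings still differ
print(micro_size.lower() == greek_size.lower())

# casefold() maps both characters to the Greek mu, so the strings match
print(equal_ignoring_case(micro_size, greek_size))
print(equal_ignoring_case("Gro\u00df", "GROSS"))
```

The first comparison prints `False` and the other two print `True`.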

The German letter ß is another common example for which `casefold()` is needed, since it's equivalent to ss:

    >>> "ß".casefold()
    'ss'
    >>> "groß" == "gross"
    False
    >>> "groß".casefold() == "gross".casefold()
    True

Another use case you may come across is text with ligatures such as 'ﬂ', where the ligature hasn't been converted to the separate letters f and l. `casefold()` converts it to 'fl'.
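Here's a small sketch of the ligature case. The word "floor" is just an example chosen for this snippet.

```python
ligature = "\ufb02"  # 'ﬂ', a single character (LATIN SMALL LIGATURE FL)
word_with_ligature = ligature + "oor"

# One character becomes two after case folding
print(len(ligature), len(ligature.casefold()))

# A direct comparison fails, but the casefolded versions match
print(word_with_ligature == "floor")
print(word_with_ligature.casefold() == "floor".casefold())
```

This prints `1 2`, then `False`, then `True`. Note that `casefold()` can change the length of a string, so don't rely on character positions lining up afterwards.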

You can read more about this topic in @mathsppblog's great article, which also has code that lets you see all the characters affected by this, like µ and ß (and this lovely-looking character, ﬗ): mathspp.com/blog/how-to-work-with-case-insensitive-strings
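If you'd like to explore this yourself, here's one possible sketch. It isn't the code from the article. It assumes "affected" means any character whose `casefold()` result differs from its `lower()` result, and the function name is made up for this example.

```python
import sys


def chars_changed_by_casefold() -> list[str]:
    """Return every character whose casefold() differs from its lower().

    Hypothetical helper: the definition of "affected" is an assumption.
    """
    return [
        char
        for char in map(chr, range(sys.maxunicode + 1))
        if char.casefold() != char.lower()
    ]


affected = chars_changed_by_casefold()

# The exact count depends on the Unicode version bundled with your Python
print(len(affected))

# The micro sign, ß and the ligature ﬗ should all be in the list
print("\u00b5" in affected, "\u00df" in affected, "\ufb17" in affected)
```

The scan walks every code point, so it may take a moment to run.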

What other examples have you encountered where you need `casefold()` because `lower()` or `upper()` just won't work?