
Using `casefold()`

3 years ago


Here are two strings:

>>> size
'5µm'
>>> another_size
'5μm'

And here's their comparison:

>>> size == another_size
False

Why?
The answer may or may not be obvious, depending on the font you're using. For example, here's the same code in a different font. Look closely:
Here's another font, to make it even clearer:
The characters used for µ are not the same. One is the micro sign ("\u00b5" or chr(181)). The other is the lowercase Greek letter mu ("\u03bc" or chr(956)). If you're comparing text from different sources that may use either character, a simple `==` comparison may return `False`.
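If your font makes the two characters look identical, one way to tell them apart is to ask Python for their Unicode names. This is a minimal sketch using the standard library's `unicodedata` module; the variable names `micro_sign`, `greek_mu` and `names` are only illustrative.

```python
import unicodedata

micro_sign = "\u00b5"  # same character as chr(181)
greek_mu = "\u03bc"    # same character as chr(956)

# Map each character to its official Unicode name
names = {char: unicodedata.name(char) for char in (micro_sign, greek_mu)}

for char, name in names.items():
    # Show the code point, the character itself, and its name
    print(f"U+{ord(char):04X} {char} {name}")
```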
Solution: use `casefold()` when comparing strings in this scenario. `casefold()` is a string method similar to `upper()` and `lower()`, but it also takes care of edge cases such as µ.
>>> size
'5µm'
>>> another_size
'5μm'
>>> size == another_size
False
>>> size.casefold() == another_size.casefold()
True
For these characters, `casefold()` always returns the lowercase Greek mu, which is the preferred character:

>>> chr(181)
'µ'
>>> chr(956)
'μ'
>>> ord(chr(181).casefold())
956
>>> ord(chr(956).casefold())
956
More generally, `casefold()` is used for case-insensitive matching of strings. It's a better solution than converting strings with `upper()` or `lower()` because of edge cases such as µ or ß.
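As a rough sketch, you could wrap the comparison in a small helper. The function name `same_ignoring_case` is hypothetical and not part of Python or of the original thread. The example also shows that `lower()` alone doesn't make the two size strings match.

```python
def same_ignoring_case(first: str, second: str) -> bool:
    """Hypothetical helper: compare two strings after case folding both."""
    return first.casefold() == second.casefold()


size = "5\u00b5m"          # uses the micro sign
another_size = "5\u03bcm"  # uses the Greek small letter mu

# lower() leaves both characters unchanged, so the strings still differ
lower_matches = size.lower() == another_size.lower()

# casefold() maps both characters to the Greek mu, so the strings match
casefold_matches = same_ignoring_case(size, another_size)

print(lower_matches, casefold_matches)
```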
The German letter ß is another common example for which `casefold()` is needed, since it's equivalent to ss:

>>> "ß".casefold()
'ss'
>>> "groß" == "gross"
False
>>> "groß".casefold() == "gross".casefold()
True
Another use case you may come across is text containing ligatures such as 'ﬂ' (U+FB02) that haven't been converted to the separate letters f and l.
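Here's a small sketch of the ligature case. It assumes the ligature in question is the single character U+FB02 (LATIN SMALL LIGATURE FL); the word "flower" is just an illustrative choice.

```python
ligature_word = "\ufb02ower"  # starts with the single-character ligature U+FB02
plain_word = "flower"         # starts with the two separate letters f and l

# The strings look alike but differ in both content and length
print(ligature_word == plain_word)
print(len(ligature_word), len(plain_word))

# casefold() expands the ligature into the separate letters f and l
folded = ligature_word.casefold()
print(folded == plain_word.casefold())
```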
You can read more about this topic in @mathsppblog's great article, which also has code that lets you see all the characters affected by this, like µ and ß (and this lovely-looking character ﬗ): mathspp.com/blog/how-to-work-with-case-insensitive-strings
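If you'd like a quick look before reading the article, here's one possible way to list these characters. This is my own rough sketch, not the code from the article: it assumes "affected" means any character for which `casefold()` and `lower()` give different results, and the function name `chars_where_casefold_differs` is made up for this example.

```python
import sys


def chars_where_casefold_differs():
    """Return every character whose casefold() result differs from lower()."""
    return [
        char
        for char in map(chr, range(sys.maxunicode + 1))
        if char.casefold() != char.lower()
    ]


affected = chars_where_casefold_differs()
print(len(affected), "characters fold differently from lower()")

# Check a few of the characters mentioned in this thread
for char in ("\u00b5", "\u00df", "\ufb17"):
    print(repr(char), char in affected, repr(char.casefold()))
```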
What other examples have you encountered where you need to use `casefold()` because `lower()` or `upper()` just won't work?
Stephen Gruppetta

@s_gruppetta_ct

Constantly looking for innovative ways to talk and write about Python • Mentoring learners • Writing about Python • Writing about technical writing