characters 1.1.0-nullsafety.5
String replacement with operations that are Unicode/grapheme cluster aware.
[`Characters`][Characters] are strings viewed as
sequences of user-perceived characters,
also known as [Unicode (extended) grapheme clusters][Grapheme Clusters].

The [`Characters`][Characters] class provides access to
the individual characters of a string,
and a way to navigate back and forth between them
using a [`CharacterRange`][CharacterRange].
Unicode characters and representations
There is no such thing as plain text.
Computers only know numbers, so any "text" on a computer is represented by numbers, which are again stored as bytes in memory.
The meaning of those bytes is provided by layers of interpretation, building up to the glyphs that the computer displays on the screen.
| Abstraction | Dart Type | Usage | Example |
|---|---|---|---|
| Bytes | [`ByteBuffer`][ByteBuffer], [`Uint8List`][Uint8List] | Physical layout: Memory or network communication. | `file.readAsBytesSync()` |
| [Code units][] | [`Uint8List`][Uint8List] (UTF‑8), [`Uint16List`][Uint16List], [`String`][String] (UTF‑16) | Standard formats for encoding code points in memory. Each code unit is stored using one byte (UTF‑8) or two bytes (UTF‑16). One or more code units encode a code point. | `string.codeUnits`, `string.codeUnitAt(index)`, `utf8.encode(string)` |
| [Code points][] | [`Runes`][Runes] | The Unicode unit of meaning. | `string.runes` |
| [Grapheme Clusters][] | [`Characters`][Characters] | Human perceived character. One or more code points. | `string.characters` |
| [Glyphs][] | | Visual rendering of grapheme clusters. | `print(string)` |
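
The layers in the table can be compared by counting the same text at each level. The following is a minimal sketch, not part of the package: `unitCounts` and its map keys are hypothetical names chosen for illustration, and it assumes the `characters` package is available as a dependency.

```dart
import 'dart:convert';

import 'package:characters/characters.dart';

/// Hypothetical helper: counts the units of [text] at each abstraction
/// level listed in the table above.
Map<String, int> unitCounts(String text) => {
      // UTF-8 code units, one byte each.
      'utf8Bytes': utf8.encode(text).length,
      // UTF-16 code units, which is what String.length counts.
      'utf16CodeUnits': text.codeUnits.length,
      // Unicode code points.
      'codePoints': text.runes.length,
      // Grapheme clusters (user-perceived characters).
      'graphemeClusters': text.characters.length,
    };
```

For a single flag emoji such as "🇬🇧", which is made of two regional indicator code points, the four counts differ: eight UTF-8 bytes, four UTF-16 code units, two code points and one grapheme cluster.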
A Dart String
is a sequence of UTF-16 code units,
just like strings in JavaScript and Java.
The runtime system decides on the underlying physical representation.
That makes plain strings inadequate for manipulating text that a user is viewing or entering, because string operations do not work at the grapheme cluster level.
For example, to abbreviate a text to, say, the first 15 characters or glyphs,
a string like "A 🇬🇧 text in English"
should abbreviate to "A 🇬🇧 text in Eng…" when counting characters,
but will become "A 🇬🇧 text in …"
if counting code units using [`String`][String] operations.
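
The abbreviation example can be sketched in code as follows. The two function names are hypothetical and only serve to contrast the two ways of counting. The sketch uses `Characters.take` and `String.substring`.

```dart
import 'package:characters/characters.dart';

/// Hypothetical helper: keeps at most [max] grapheme clusters of [text]
/// and appends an ellipsis if anything was cut off.
String abbreviateByCharacters(String text, int max) {
  final chars = text.characters;
  if (chars.length <= max) return text;
  return '${chars.take(max).string}…';
}

/// Hypothetical helper: the same operation, counting UTF-16 code units.
/// This can cut the text short, or even split a grapheme cluster.
String abbreviateByCodeUnits(String text, int max) {
  if (text.length <= max) return text;
  return '${text.substring(0, max)}…';
}
```

With a limit of 15, the first function returns "A 🇬🇧 text in Eng…" for the text above, while the second returns "A 🇬🇧 text in …" because the flag alone occupies four code units.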
Whenever you need to manipulate strings at the character level,
you should be using the [`Characters`][Characters] type,
not the methods of the [`String`][String] class.
The Characters class
The [`Characters`][Characters] class exposes a string
as a sequence of grapheme clusters.
All operations on [`Characters`][Characters] operate
on entire grapheme clusters,
which removes the risk of splitting combined characters or emojis
that is inherent in the code-unit based [`String`][String] operations.
You can get a [`Characters`][Characters] object for a string using either
the constructor [`Characters(string)`][Characters constructor]
or the extension getter `string.characters`.
At its core, the class is an [`Iterable<String>`][Iterable]
where the element strings are single grapheme clusters.
This allows sequential access to the individual grapheme clusters
of the original string.
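
As a small illustration of the two ways to obtain a `Characters` object and of iterating over it, consider the sketch below. The helper names `graphemeClusters` and `countCluster` are hypothetical and not part of the package.

```dart
import 'package:characters/characters.dart';

/// Hypothetical helper: collects the grapheme clusters of [text] into a
/// list, one string per user-perceived character.
List<String> graphemeClusters(String text) {
  // The extension getter gives a Characters view of the string.
  final Characters chars = text.characters;
  // Characters is an Iterable<String>, so a collection-for works directly.
  return [for (final cluster in chars) cluster];
}

/// Hypothetical helper: counts the elements equal to [cluster], which is
/// expected to be a single grapheme cluster, using plain Iterable members.
int countCluster(String text, String cluster) =>
    // The constructor gives the same view as the extension getter.
    Characters(text).where((c) => c == cluster).length;
```

Each element produced by the iteration is a complete grapheme cluster, so a flag emoji or a letter followed by a combining accent arrives as one string.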
On top of that, there are operations mirroring the operations
of [`String`][String] that are not index, code-unit or code-point based,
like [`startsWith`][Characters.startsWith]
or [`replaceAll`][Characters.replaceAll].
There are some differences between these and the [`String`][String] operations.
For example, the replace methods only accept characters as a pattern.
Regular expressions are not grapheme cluster aware,
so they cannot be used safely on a sequence of characters.
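
A short sketch of these mirrored operations follows. The wrapper names are hypothetical; the calls they wrap are `Characters.replaceAll` and `Characters.startsWith`, both of which take `Characters` arguments rather than a `Pattern`.

```dart
import 'package:characters/characters.dart';

/// Hypothetical helper: replaces every occurrence of [from] with [to],
/// treating both as sequences of grapheme clusters, not as patterns.
String replaceCharacters(String text, String from, String to) => text.characters
    .replaceAll(from.characters, to.characters)
    .string;

/// Hypothetical helper: whether [text] begins with the grapheme clusters
/// of [prefix].
bool startsWithCharacters(String text, String prefix) =>
    text.characters.startsWith(prefix.characters);
```

Because both arguments are `Characters`, a `RegExp` cannot be passed to these methods by accident.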
Grapheme clusters have varying length in the underlying representation,
so operations on a [`Characters`][Characters] sequence cannot be index based.
Instead, the [`CharacterRange`][CharacterRange] iterator
provided by [`Characters.iterator`][Characters.iterator]
has been greatly enhanced compared to a plain iterator.
It can move both forwards and backwards,
and it can span a range of grapheme clusters.
Most operations that can be performed on a full [`Characters`][Characters]
can also be performed on the grapheme clusters
in the range of a [`CharacterRange`][CharacterRange].
The range can be contracted, expanded or moved in various ways,
and is not restricted to using [`moveNext`][CharacterRange.moveNext]
to move to the next grapheme cluster.
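Example:

    import 'package:characters/characters.dart';

    // Using String indices.
    // Returns the text between the first "<" and the following ">",
    // or null if there is no such tag.
    String? firstTagString(String source) {
      var start = source.indexOf("<") + 1;
      if (start > 0) {
        var end = source.indexOf(">", start);
        if (end >= 0) {
          return source.substring(start, end);
        }
      }
      return null;
    }

    // Using CharacterRange operations.
    // findFirst selects the first "<"; moveUntil then moves the range to
    // cover everything after it, up to the next ">".
    Characters? firstTagCharacters(Characters source) {
      var range = source.findFirst("<".characters);
      if (range != null && range.moveUntil(">".characters)) {
        return range.currentCharacters;
      }
      return null;
    }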
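
Moving a range forwards and backwards, and widening it, can be sketched in a similar way. The helpers `window` and `lastCharacter` are hypothetical names. The sketch relies on `CharacterRange.moveNext`, `moveBack`, `collapseToEnd` and `expandNext`, and on `Characters.iteratorAtEnd`, as this editor understands the package API.

```dart
import 'package:characters/characters.dart';

/// Hypothetical helper: steps over the first [skip] grapheme clusters of
/// [text], then widens the range to cover up to [take] clusters.
String window(String text, int skip, int take) {
  // The iterator starts as an empty range before the first character.
  final CharacterRange range = text.characters.iterator;
  for (var i = 0; i < skip; i++) {
    if (!range.moveNext()) return ''; // Fewer than [skip] characters.
  }
  // Make the range empty again, positioned after the skipped characters.
  range.collapseToEnd();
  for (var i = 0; i < take; i++) {
    if (!range.expandNext()) break; // Reached the end of the text.
  }
  return range.current;
}

/// Hypothetical helper: the last grapheme cluster of [text], found by
/// starting at the end and moving backwards, or '' if [text] is empty.
String lastCharacter(String text) {
  final range = text.characters.iteratorAtEnd;
  return range.moveBack() ? range.current : '';
}
```

For the text "A 🇬🇧 text", skipping two characters and taking one selects the flag as a single unit, and `lastCharacter` returns a whole cluster even when the text ends in an emoji.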

[ByteBuffer]: https://api.dart.dev/stable/2.0.0/dart-typed_data/ByteBuffer-class.html "ByteBuffer class"
[CharacterRange.moveNext]: https://pub.dev/documentation/characters/latest/characters/CharacterRange/moveNext.html "CharacterRange.moveNext"
[CharacterRange]: https://pub.dev/documentation/characters/latest/characters/CharacterRange-class.html "CharacterRange class"
[Characters constructor]: https://pub.dev/documentation/characters/latest/characters/Characters/Characters.html "Characters constructor"
[Characters.iterator]: https://pub.dev/documentation/characters/latest/characters/Characters/iterator.html "CharactersRange get iterator"
[Characters.replaceAll]: https://pub.dev/documentation/characters/latest/characters/Characters/replaceAll.html "Characters.replaceAll"
[Characters.startsWith]: https://pub.dev/documentation/characters/latest/characters/Characters/startsWith.html "Characters.startsWith"
[Characters]: https://pub.dev/documentation/characters/latest/characters/Characters-class.html "Characters class"
[Code Points]: https://unicode.org/glossary/#code_point "Unicode Code Point"
[Code Units]: https://unicode.org/glossary/#code_unit "Unicode Code Units"
[Glyphs]: http://unicode.org/glossary/#glyph "Unicode Glyphs"
[Grapheme Clusters]: https://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries "Unicode (Extended) Grapheme Cluster"
[Iterable]: https://api.dart.dev/stable/2.0.0/dart-core/Iterable-class.html "Iterable class"
[Runes]: https://api.dart.dev/stable/2.0.0/dart-core/Runes-class.html "Runes class"
[String]: https://api.dart.dev/stable/2.0.0/dart-core/String-class.html "String class"
[Uint16List]: https://api.dart.dev/stable/2.0.0/dart-typed_data/Uint16List-class.html "Uint16List class"
[Uint8List]: https://api.dart.dev/stable/2.0.0/dart-typed_data/Uint8List-class.html "Uint8List class"