Strings
This file builds on the UTF-8 verification in Init.Data.String.Decode and the preliminary
material in Init.Data.String.Defs to get the theory of strings off the ground. In particular,
in this file we construct the decoding function String.data : String → List Char and show that
it is a two-sided inverse to List.asString : List Char → String. This in turn enables us to
understand the validity predicate on positions in terms of lists of characters, which forms the
basis for all further verification for strings.
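The inverse relationship can be spot-checked on concrete values. This is a minimal sketch, not part of the file's development. It assumes that String.toList (which the equations on this page show to have the same definition as String.data) and List.asString are both available in your toolchain; depending on the version, either name may produce a deprecation warning.

```lean
-- Decoding a string and re-encoding the characters gives back the string.
#eval "L∃∀N".toList.asString == "L∃∀N"   -- true

-- Encoding a list of characters and decoding it gives back the list.
#eval ['L', '∃', '∀', 'N'].asString.toList == ['L', '∃', '∀', 'N']   -- true
```

These checks only test two inputs; the theorems in this file establish the property for all strings and all lists of characters.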
Decodes a sequence of characters from their UTF-8 representation. Returns none if the bytes are
not a sequence of Unicode scalar values.
Equations
- b.utf8Decode? = ByteArray.utf8Decode?.go b 0 #[] ⋯
Instances For
Equations
- b.validateUTF8 = ByteArray.validateUTF8.go b 0 ⋯
Instances For
Equations
Decodes an array of bytes that encode a string as UTF-8 into
the corresponding string, or returns none if the array is not a valid UTF-8 encoding of a string.
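As an illustration, the following sketch decodes one valid and one invalid byte array. It assumes String.toUTF8 to produce the encoded bytes and builds the invalid input by hand with ByteArray.mk; the inputs are made up for the example.

```lean
-- Encoding a string and decoding the result succeeds.
#eval String.fromUTF8? "L∃∀N".toUTF8           -- some "L∃∀N"

-- The byte 0xFF never occurs in well-formed UTF-8, so decoding fails.
#eval String.fromUTF8? (ByteArray.mk #[0xFF])  -- none
```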
Equations
- String.fromUTF8? a = if h : a.IsValidUTF8 then some (String.fromUTF8 a h) else none
Instances For
Decodes an array of bytes that encode a string as UTF-8 into the corresponding string, or panics if the array is not a valid UTF-8 encoding of a string.
Equations
- String.fromUTF8! a = if h : a.IsValidUTF8 then String.fromUTF8 a h else panicWithPosWithDecl "Init.Data.String.Basic" "String.fromUTF8!" 185 46 "invalid UTF-8 string"
Instances For
Equations
Instances For
Converts a string to a list of characters.
Since strings are represented as dynamic arrays of bytes containing the string encoded using UTF-8, this operation takes time and space linear in the length of the string.
Examples:
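A few concrete evaluations, given as a minimal sketch that should run in any Lean 4 file. Note that the length of the resulting list counts characters, not UTF-8 bytes.

```lean
#eval "abc".toList           -- ['a', 'b', 'c']
#eval "".toList              -- []
#eval "L∃∀N".toList.length   -- 4 characters
#eval "L∃∀N".utf8ByteSize    -- 8 bytes
```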
Equations
- s.toList = (String.Internal.toArray s).toList
Instances For
Converts a string to a list of characters.
Since strings are represented as dynamic arrays of bytes containing the string encoded using UTF-8, this operation takes time and space linear in the length of the string.
Examples:
Equations
- b.data = (String.Internal.toArray b).toList
Instances For
Equations
- s₁.decidableLT s₂ = s₁.toList.decidableLT s₂.toList
Returns true if p is a valid UTF-8 position in the string s.
This means that p ≤ s.rawEndPos and p lies on a UTF-8 character boundary. At runtime, this
operation takes constant time.
Examples:
- String.Pos.isValid "abc" ⟨0⟩ = true
- String.Pos.isValid "abc" ⟨1⟩ = true
- String.Pos.isValid "abc" ⟨3⟩ = true
- String.Pos.isValid "abc" ⟨4⟩ = false
- String.Pos.isValid "𝒫(A)" ⟨0⟩ = true
- String.Pos.isValid "𝒫(A)" ⟨1⟩ = false
- String.Pos.isValid "𝒫(A)" ⟨2⟩ = false
- String.Pos.isValid "𝒫(A)" ⟨3⟩ = false
- String.Pos.isValid "𝒫(A)" ⟨4⟩ = true
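The pattern in the "𝒫(A)" examples follows from the encoded width of each character. The sketch below uses Char.utf8Size and String.utf8ByteSize to show the widths; it is only meant to explain the examples and is not how validity is checked at runtime.

```lean
-- '𝒫' occupies bytes 0 to 3, so offsets 1, 2 and 3 fall inside it.
#eval "𝒫(A)".toList.map Char.utf8Size   -- [4, 1, 1, 1]

-- The end position is byte 7, the last valid position.
#eval "𝒫(A)".utf8ByteSize               -- 7

-- Every character of "abc" is one byte wide, so offsets 0 to 3 are all valid.
#eval "abc".utf8ByteSize                 -- 3
```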
Equations
- String.Pos.Raw.isValid s p = if h : p < s.rawEndPos then decide (s.getUTF8Byte p h).IsUTF8FirstByte else decide (p = s.rawEndPos)
Instances For
Equations
Copies a region of a string to a new string.
The region of s from b (inclusive) to e (exclusive) is copied to a newly-allocated String.
If b's offset is greater than or equal to that of e, then the resulting string is "".
If possible, prefer String.slice, which avoids the allocation.
Equations
- String.extract b e = { toByteArray := s.toByteArray.extract b.offset.byteIdx e.offset.byteIdx, isValidUTF8 := ⋯ }
Instances For
Equations
- b.extract e = String.extract b e
Instances For
Efficiently checks whether a position is at a UTF-8 character boundary of the slice s.
Equations
- String.Pos.Raw.isValidForSlice s p = if h : p < s.rawEndPos then decide (s.getUTF8Byte p h).IsUTF8FirstByte else decide (p = s.rawEndPos)
Instances For
Equations
Given a valid position on s.str which is within the bounds of the slice s, obtains the
corresponding valid position on s.
Equations
- String.Slice.Pos.ofStr pos h₁ h₂ = { offset := pos.offset.unoffsetBy s.startInclusive.offset, isValidForSlice := ⋯ }
Instances For
Given a slice and a valid position within the slice, obtain a new slice on the same underlying string by replacing the start of the slice with the given position.
Equations
Instances For
Equations
- s.replaceStart pos = s.sliceFrom pos
Instances For
Given a slice and a valid position within the slice, obtain a new slice on the same underlying string by replacing the end of the slice with the given position.
Equations
Instances For
Equations
- s.replaceEnd pos = s.sliceTo pos
Instances For
Given a slice and two valid positions within the slice, obtain a new slice on the same underlying string formed by the new bounds.
Equations
Instances For
Equations
- s.replaceStartEnd newStart newEnd h = s.slice newStart newEnd h
Instances For
Given a slice and two valid positions within the slice, obtain a new slice on the same underlying
string formed by the new bounds, or none if the given end is strictly less than the given start.
Equations
Instances For
Given a slice and two valid positions within the slice, obtain a new slice on the same underlying string formed by the new bounds, or panic if the given end is strictly less than the given start.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Equations
- s.replaceStartEnd! newStart newEnd = s.slice! newStart newEnd
Instances For
Equations
- s.decodeChar byteIdx h = s.toByteArray.utf8DecodeChar byteIdx h
Instances For
Returns the byte at the given position in the string, or panics if the position is the end position.
Equations
Instances For
Returns the character at the position pos of a string, taking a proof that pos is not the
past-the-end position.
This function is overridden with an efficient implementation in runtime code.
Examples:
- ("abc".pos ⟨1⟩ (by decide)).get (by decide) = 'b'
- ("L∃∀N".pos ⟨1⟩ (by decide)).get (by decide) = '∃'
Instances For
Returns the character at the position pos of a string, or none if the position is the
past-the-end position.
This function is overridden with an efficient implementation in runtime code.
Instances For
Returns the character at the position pos of a string, or panics if the position is the
past-the-end position.
This function is overridden with an efficient implementation in runtime code.
Instances For
Given a position in s.sliceFrom p₀, obtain the corresponding position in s.
Equations
Instances For
Equations
Instances For
Given a position in s that is at least p₀, obtain the corresponding position in
s.sliceFrom p₀.
Instances For
Equations
- p₀.toReplaceStart pos h = p₀.sliceFrom pos h
Instances For
Equations
Instances For
Equations
- p₀.toReplaceEnd pos h = p₀.sliceTo pos h
Instances For
Advances a valid position on a slice to the next valid position, given a proof that the position is not the past-the-end position, which guarantees that such a position exists.
Equations
- pos.next h = { offset := pos.offset.increaseBy ((pos.byte h).utf8ByteSize ⋯), isValidForSlice := ⋯ }
Instances For
Advances a valid position on a slice to the next valid position, or panics if the given position is the past-the-end position.
Equations
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
Returns the previous valid position before the given position, given a proof that the position is not the start position, which guarantees that such a position exists.
Instances For
Returns the previous valid position before the given position, or panics if the position is the start position.
Equations
Instances For
Constructs a valid position on s from a position and a proof that it is valid.
Instances For
Constructs a valid position on s from a position, panicking if the position is not valid.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Advances a valid position on a string to the next valid position, given a proof that the position is not the past-the-end position, which guarantees that such a position exists.
Equations
- pos.next h = String.Pos.ofToSlice (pos.toSlice.next ⋯)
Instances For
Advances a valid position on a string to the next valid position, or returns none if the
given position is the past-the-end position.
Equations
- pos.next? = Option.map String.Pos.ofToSlice pos.toSlice.next?
Instances For
Advances a valid position on a string to the next valid position, or panics if the given position is the past-the-end position.
Equations
- pos.next! = String.Pos.ofToSlice pos.toSlice.next!
Instances For
Returns the previous valid position before the given position, given a proof that the position is not the start position, which guarantees that such a position exists.
Equations
- pos.prev h = String.Pos.ofToSlice (pos.toSlice.prev ⋯)
Instances For
Returns the previous valid position before the given position, or none if the position is
the start position.
Equations
- pos.prev? = Option.map String.Pos.ofToSlice pos.toSlice.prev?
Instances For
Returns the previous valid position before the given position, or panics if the position is the start position.
Equations
- pos.prev! = String.Pos.ofToSlice pos.toSlice.prev!
Instances For
Constructs a valid position on s from a position and a proof that it is valid.
Equations
- s.pos off h = String.Pos.ofToSlice (s.toSlice.pos off ⋯)
Instances For
Constructs a valid position on s from a position, returning none if the position is not valid.
Equations
- s.pos? off = Option.map String.Pos.ofToSlice (s.toSlice.pos? off)
Instances For
Constructs a valid position on s from a position, panicking if the position is not valid.
Equations
- s.pos! off = String.Pos.ofToSlice (s.toSlice.pos! off)
Instances For
Given a byte position within a string slice, obtains the smallest valid position that is strictly greater than the given byte position.
Equations
- String.Slice.findNextPos offset s _h = String.Slice.findNextPos.go✝ s offset.inc
Instances For
Equations
Instances For
Returns the character at position p of a string. If p is not a valid position, returns the
fallback value (default : Char), which is 'A', but does not panic.
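One consequence of the fallback is that the result alone cannot distinguish a genuine 'A' from a failed lookup. The sketch below illustrates this, assuming that the legacy String.get is still available in your toolchain (it may produce a deprecation warning); the input strings are made up for the example.

```lean
#eval (default : Char)   -- 'A'

-- A genuine 'A' at a valid position.
#eval "A".get ⟨0⟩        -- 'A'

-- An out-of-range position yields the same value.
#eval "A".get ⟨5⟩        -- 'A'

-- So does a position in the middle of the three-byte character '∃'.
#eval "L∃∀N".get ⟨2⟩     -- 'A'
```

This ambiguity is one reason to prefer the Option-returning or proof-carrying variants when the position is not known to be valid.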
This function is overridden with an efficient implementation in runtime code. See
String.Pos.Raw.utf8GetAux for the reference implementation.
This is a legacy function. The recommended alternative is String.Pos.get, combined with
String.pos or another means of obtaining a String.Pos.
Examples:
- "abc".get ⟨1⟩ = 'b'
- "abc".get ⟨3⟩ = (default : Char) because byte 3 is at the end of the string.
- "L∃∀N".get ⟨2⟩ = (default : Char) because byte 2 is in the middle of '∃'.
Equations
- String.Pos.Raw.get s p = String.Pos.Raw.utf8GetAux s.toList 0 p
Instances For
Equations
- s.get p = String.Pos.Raw.utf8GetAux s.toList 0 p
Instances For
Instances For
Returns the character at position p of a string. If p is not a valid position, returns none.
This function is overridden with an efficient implementation in runtime code. See
String.Pos.Raw.utf8GetAux? for the reference implementation.
This is a legacy function. The recommended alternative is String.Pos.get, combined with
String.pos? or another means of obtaining a String.Pos.
Examples:
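The following evaluations are a minimal sketch of the behavior described above. They assume that the legacy String.get? is still available in your toolchain (it may produce a deprecation warning). The string "L∃∀N" has characters starting at bytes 0, 1, 4 and 7.

```lean
#eval "abc".get? ⟨1⟩    -- some 'b'
#eval "abc".get? ⟨3⟩    -- none: byte 3 is the end of the string
#eval "L∃∀N".get? ⟨1⟩   -- some '∃'
#eval "L∃∀N".get? ⟨2⟩   -- none: byte 2 is in the middle of '∃'
```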
Equations
- String.Pos.Raw.get? x✝¹ x✝ = String.Pos.Raw.utf8GetAux? x✝¹.toList 0 x✝
Instances For
Equations
- x✝¹.get? x✝ = String.Pos.Raw.utf8GetAux? x✝¹.toList 0 x✝
Instances For
Returns the character at position p of a string. Panics if p is not a valid position.
See String.pos? and String.Pos.get for a safer alternative.
This function is overridden with an efficient implementation in runtime code. See
String.Pos.Raw.utf8GetAux for the reference implementation.
This is a legacy function. The recommended alternative is String.Pos.get, combined with
String.pos! or another means of obtaining a String.Pos.
Examples:
"abc".get! ⟨1⟩ = 'b'
Equations
- String.Pos.Raw.get! s p = String.Pos.Raw.utf8GetAux s.toList 0 p
Instances For
Equations
- s.get! p = String.Pos.Raw.utf8GetAux s.toList 0 p
Instances For
Equations
Instances For
Equations
- s.replaceEnd p = s.sliceTo p
Instances For
Equations
- s.replaceStart p = s.sliceFrom p
Instances For
Given a string and two valid positions within the string, obtain a slice on the string formed by the two positions.
This happens to be equivalent to the constructor of String.Slice.
Equations
Instances For
Given a string and two valid positions within the string, obtain a slice on the string formed
by the new bounds, or return none if the given end is strictly less than the given start.
Equations
Instances For
Given a string and two valid positions within the string, obtain a slice on the string formed by the new bounds, or panic if the given end is strictly less than the given start.
Instances For
Equations
- s.replaceStartEnd! p₁ p₂ = s.slice! p₁ p₂
Instances For
Equations
Instances For
Given a position in s that is at least p₀, obtain the corresponding position in
s.sliceFrom p₀.
Instances For
Equations
- p₀.toReplaceStart pos h = p₀.sliceFrom pos h
Instances For
Equations
Instances For
Equations
- p₀.toReplaceEnd pos h = p₀.sliceTo pos h
Instances For
Given a position in s, obtain the corresponding position in s.slice p₀ p₁ h, or panic if pos
is not between p₀ and p₁.
Equations
- pos.sliceOrPanic p₀ p₁ = pos.toSlice.sliceOrPanic p₀.toSlice p₁.toSlice
Instances For
Copies a region of a slice to a new string.
The region of s from b (inclusive) to e (exclusive) is copied to a newly-allocated String.
If b's offset is greater than or equal to that of e, then the resulting string is "".
If possible, prefer Slice.slice, which avoids the allocation.
Equations
- s.extract p₀ p₁ = String.extract p₀.str p₁.str
Instances For
Returns the next position in a string after position p. If p is not a valid position or
p = s.endPos, returns the position one byte after p.
A run-time bounds check is performed to determine whether p is at the end of the string. If a
bounds check has already been performed, use String.next' to avoid a repeated check.
This is a legacy function. The recommended alternative is String.Pos.next or one of its
variants like String.Pos.next?, combined with String.pos or another means of obtaining
a String.Pos.
Some examples of edge cases:
- "abc".next ⟨3⟩ = ⟨4⟩, since 3 = "abc".endPos
- "L∃∀N".next ⟨2⟩ = ⟨3⟩, since 2 points into the middle of a multi-byte UTF-8 character
Examples:
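A minimal sketch of stepping through a string with the legacy functions. It assumes that String.next and String.get are still available in your toolchain (they may produce deprecation warnings) and that the literal 0 can be used as the start position, as in the other examples on this page.

```lean
-- One step from the start of "abc" lands on 'b'.
#eval "abc".get ("abc".next 0)                       -- 'b'

-- Two steps in "L∃∀N" skip 'L' (1 byte) and '∃' (3 bytes), landing on '∀'.
#eval "L∃∀N".get (0 |> "L∃∀N".next |> "L∃∀N".next)   -- '∀'
```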
Equations
- String.Pos.Raw.next s p = p + String.Pos.Raw.get s p
Instances For
Equations
- s.next p = p + String.Pos.Raw.get s p
Instances For
Instances For
Returns the position in a string before a specified position, p. If p = ⟨0⟩, returns 0. If p
is greater than rawEndPos, returns the position one byte before p. Otherwise, if p occurs in the
middle of a multi-byte character, returns the beginning position of that character.
For example, "L∃∀N".prev ⟨3⟩ is ⟨1⟩, since byte 3 occurs in the middle of the multi-byte
character '∃' that starts at byte 1.
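The byte layout behind this example can be computed directly. The helper charStarts below is hypothetical and written only for this illustration; it is not part of the library. It lists the byte offset at which each character of a string begins.

```lean
/-- Hypothetical helper: byte offsets at which the characters of `s` start. -/
def charStarts (s : String) : List Nat :=
  -- The accumulator holds the offsets found so far and the current byte offset.
  (s.toList.foldl
    (fun (acc : List Nat × Nat) (c : Char) => (acc.1 ++ [acc.2], acc.2 + c.utf8Size))
    ([], 0)).1

-- '∃' starts at byte 1 and occupies bytes 1 to 3, so byte 3 lies inside it.
#eval charStarts "L∃∀N"   -- [0, 1, 4, 7]
```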
This is a legacy function. The recommended alternative is String.Pos.prev or one of its
variants like String.Pos.prev?, combined with String.pos or another means of obtaining
a String.Pos.
Examples:
- "abc".get ("abc".rawEndPos |> "abc".prev) = 'c'
- "L∃∀N".get ("L∃∀N".rawEndPos |> "L∃∀N".prev |> "L∃∀N".prev |> "L∃∀N".prev) = '∃'
Equations
- String.Pos.Raw.prev x✝¹ x✝ = String.Pos.Raw.utf8PrevAux x✝¹.toList 0 x✝
Instances For
Equations
- x✝¹.prev x✝ = String.Pos.Raw.utf8PrevAux x✝¹.toList 0 x✝
Instances For
Returns true if a specified byte position is greater than or equal to the position which points to
the end of a string. Otherwise, returns false.
Examples:
- (0 |> "abc".next |> "abc".next |> "abc".atEnd) = false
- (0 |> "abc".next |> "abc".next |> "abc".next |> "abc".next |> "abc".atEnd) = true
- (0 |> "L∃∀N".next |> "L∃∀N".next |> "L∃∀N".next |> "L∃∀N".atEnd) = false
- (0 |> "L∃∀N".next |> "L∃∀N".next |> "L∃∀N".next |> "L∃∀N".next |> "L∃∀N".atEnd) = true
- "abc".atEnd ⟨4⟩ = true
- "L∃∀N".atEnd ⟨7⟩ = false
- "L∃∀N".atEnd ⟨8⟩ = true
Equations
- String.Pos.Raw.atEnd x✝¹ x✝ = decide (x✝.byteIdx ≥ x✝¹.utf8ByteSize)
Instances For
Returns the character at position p of a string. Returns (default : Char), which is 'A', if
p is not a valid position.
Requires evidence, h, that p is within bounds instead of performing a run-time bounds check as
in String.get.
A typical pattern combines get' with a dependent if-expression to avoid the overhead of an
additional bounds check. For example:
def getInBounds? (s : String) (p : String.Pos.Raw) : Option Char :=
  if h : s.atEnd p then none else some (s.get' p h)
Even with evidence of ¬ s.atEnd p, p may be invalid if a byte index points into the middle of a
multi-byte UTF-8 character. For example, "L∃∀N".get' ⟨2⟩ (by decide) = (default : Char).
This is a legacy function. The recommended alternative is String.Pos.get, combined with
String.pos or another means of obtaining a String.Pos.
Examples:
- "abc".get' 0 (by decide) = 'a'
- let lean := "L∃∀N"; lean.get' (0 |> lean.next |> lean.next) (by decide) = '∀'
Equations
- String.Pos.Raw.get' s p h = String.Pos.Raw.utf8GetAux s.toList 0 p
Instances For
Equations
- s.get' p h = String.Pos.Raw.utf8GetAux s.toList 0 p
Instances For
Returns the next position in a string after position p. The result is unspecified if p is not a
valid position.
Requires evidence, h, that p is within bounds. No run-time bounds check is performed, as in
String.next.
A typical pattern combines String.next' with a dependent if-expression to avoid the overhead of
an additional bounds check. For example:
def next? (s : String) (p : String.Pos.Raw) : Option Char :=
  if h : s.atEnd p then none else some (s.get (s.next' p h))
This is a legacy function. The recommended alternative is String.Pos.next, combined with
String.pos or another means of obtaining a String.Pos.
Example:
Equations
- String.Pos.Raw.next' s p h = p + String.Pos.Raw.get s p
Instances For
Equations
- s.next' p h = p + String.Pos.Raw.get s p
Instances For
Returns the first position where the two strings differ.
If one string is a prefix of the other, then the returned position is the end position of the shorter string. If the strings are identical, then their end position is returned.
Examples:
- "tea".firstDiffPos "ten" = ⟨2⟩
- "tea".firstDiffPos "tea" = ⟨3⟩
- "tea".firstDiffPos "teas" = ⟨3⟩
- "teas".firstDiffPos "tea" = ⟨3⟩
Equations
- a.firstDiffPos b = String.firstDiffPos.loop a b (a.rawEndPos.min b.rawEndPos) 0
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
Creates a new string that consists of the region of the input string delimited by the two positions.
The result is "" if the start position is greater than or equal to the end position or if the
start position is at the end of the string. If either position is invalid (that is, if either points
at the middle of a multi-byte UTF-8 character) then the result is unspecified.
This is a legacy function. The recommended alternative is String.Pos.extract, but usually
it is even better to operate on String.Slice instead and call String.Slice.copy (only) if
required.
Examples:
- "red green blue".extract ⟨0⟩ ⟨3⟩ = "red"
- "red green blue".extract ⟨3⟩ ⟨0⟩ = ""
- "red green blue".extract ⟨0⟩ ⟨100⟩ = "red green blue"
- "red green blue".extract ⟨4⟩ ⟨100⟩ = "green blue"
- "L∃∀N".extract ⟨1⟩ ⟨2⟩ = "∃∀N"
- "L∃∀N".extract ⟨2⟩ ⟨100⟩ = ""
Equations
- String.Pos.Raw.extract x✝² x✝¹ x✝ = if x✝¹.byteIdx ≥ x✝.byteIdx then "" else String.ofList (String.Pos.Raw.extract.go₁ x✝².toList 0 x✝¹ x✝)
Instances For
Equations
- String.Pos.Raw.extract.go₁ [] x✝² x✝¹ x✝ = []
- String.Pos.Raw.extract.go₁ (c :: cs) x✝² x✝¹ x✝ = if x✝² = x✝¹ then String.Pos.Raw.extract.go₂ (c :: cs) x✝² x✝ else String.Pos.Raw.extract.go₁ cs (x✝² + c) x✝¹ x✝
Instances For
Returns the character index that corresponds to the provided position (i.e. UTF-8 byte index) in a string.
If the position is at the end of the string, then the string's length in characters is returned. If the position is invalid due to pointing at the middle of a UTF-8 byte sequence, then the character index of the next character after the position is returned.
Examples:
- "L∃∀N".offsetOfPos ⟨0⟩ = 0
- "L∃∀N".offsetOfPos ⟨1⟩ = 1
- "L∃∀N".offsetOfPos ⟨2⟩ = 2
- "L∃∀N".offsetOfPos ⟨4⟩ = 2
- "L∃∀N".offsetOfPos ⟨5⟩ = 3
- "L∃∀N".offsetOfPos ⟨50⟩ = 4
Equations
- String.Pos.Raw.offsetOfPos s pos = String.Pos.Raw.offsetOfPosAux s pos 0 0
Instances For
Equations
- s.offsetOfPos pos = String.Pos.Raw.offsetOfPos s pos
Instances For
Equations
Instances For
Checks whether substrings of two strings are equal. Substrings are indicated by their starting
positions and a size in UTF-8 bytes. Returns false if the indicated substring does not exist in
either string.
This is a legacy function. The recommended alternative is to construct slices representing the
strings to be compared and use the BEq instance of String.Slice.
Equations
- One or more equations did not get rendered due to their size.