Convert string data between Unicode encodings. src is either a String or a Vector{UIntXX} of UTF-XX code units, where XX is 8, 16, or 32. T indicates the encoding of the return value: String to return a (UTF-8 encoded) String or UIntXX to return a Vector{UIntXX} of UTF-XX data. (The alias Cwchar_t can also be used as the integer type, for converting wchar_t* strings used by external C libraries.)

The transcode function succeeds as long as the input data can be reasonably represented in the target encoding; it always succeeds for conversions between UTF-XX encodings, even for invalid Unicode data.

Only conversion to/from UTF-8 is currently supported.

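As a minimal sketch of the conversions described above (the variable names are illustrative, not part of the API), a String can be round-tripped through UTF-16 code units:

```julia
# Round-trip a String through UTF-16 code units and back.
s = "αβγ"
utf16 = transcode(UInt16, s)      # Vector{UInt16} of UTF-16 code units
utf8  = transcode(UInt8, utf16)   # Vector{UInt8} of UTF-8 code units
back  = transcode(String, utf16)  # a UTF-8 encoded String again
```

Each of the three Greek letters occupies one UTF-16 code unit but two UTF-8 code units, so `utf16` has 3 elements and `utf8` has 6.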

`Base.unsafe_string` Function

unsafe_string(p::Ptr{UInt8}, [length::Integer])

Copy a string from the address of a C-style (NUL-terminated) string encoded as UTF-8. (The pointer can be safely freed afterwards.) If length is specified (the length of the data in bytes), the string does not have to be NUL-terminated.

This function is labelled "unsafe" because it will crash if p is not a valid memory address to data of the requested length.

`Base.codeunit` Method

codeunit(s::AbstractString, i::Integer)

Get the ith code unit of an encoded string. For example, returns the ith byte of the representation of a UTF-8 string.

`Base.ascii` Function

ascii(s::AbstractString)

Convert a string to String type and check that it contains only ASCII data, otherwise throwing an ArgumentError indicating the position of the first non-ASCII byte.

julia> ascii("abcdeγfgh")
ERROR: ArgumentError: invalid ASCII at index 6 in "abcdeγfgh"
Stacktrace:
 [1] ascii(::String) at ./strings/util.jl:479

julia> ascii("abcdefgh")
"abcdefgh"

`Base.@r_str` Macro

@r_str -> Regex

Construct a regex, such as r"^[a-z]*$". The regex also accepts one or more flags, listed after the ending quote, to change its behaviour:

i enables case-insensitive matching
m treats the ^ and $ tokens as matching the start and end of individual lines, as opposed to the whole string.
s allows the . modifier to match newlines.
x enables "comment mode": whitespace is ignored except when escaped with \, and # is treated as starting a comment.

For example, this regex has the first three flags enabled:

julia> match(r"a+.*b+.*?d$"ism, "Goodbye,\nOh, angry,\nBad world\n")
RegexMatch("angry,\nBad world")

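The effect of each flag can also be seen in isolation. The following is a small sketch with made-up patterns and inputs; the variable names are only for illustration:

```julia
# i: case-insensitive matching
m_i = match(r"hello"i, "Say HELLO")

# m: ^ and $ match at the start and end of individual lines
m_m = match(r"^b$"m, "a\nb\nc")

# s: the dot also matches a newline ...
m_s = match(r"a.b"s, "a\nb")
# ... which it does not do without the flag
m_plain = match(r"a.b", "a\nb")

# x: unescaped whitespace and the trailing comment are not part of the pattern
m_x = match(r"a  b  # trailing comment"x, "ab")
```

Here `m_plain` is `nothing`, while the other four calls return a `RegexMatch`.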

`Base.Docs.@html_str` Macro

@html_str -> Docs.HTML

Create an HTML object from a literal string.

`Base.Docs.@text_str` Macro

@text_str -> Docs.Text

Create a Text object from a literal string.

`Base.UTF8proc.normalize_string` Function

normalize_string(s::AbstractString, normalform::Symbol)

Normalize the string s according to one of the four "normal forms" of the Unicode standard: normalform can be :NFC, :NFD, :NFKC, or :NFKD. Normal forms C (canonical composition) and D (canonical decomposition) convert different visually identical representations of the same abstract string into a single canonical form, with form C being more compact. Normal forms KC and KD additionally canonicalize "compatibility equivalents": they convert characters that are abstractly similar but visually distinct into a single canonical choice (e.g. they expand ligatures into the individual characters), with form KC being more compact.

Alternatively, finer control and additional transformations may be obtained by calling normalize_string(s; keywords...), where any number of the following boolean keyword options (which all default to false except for compose) are specified:

compose=false: do not perform canonical composition
decompose=true: do canonical decomposition instead of canonical composition (compose=true is ignored if present)
compat=true: compatibility equivalents are canonicalized
casefold=true: perform Unicode case folding, e.g. for case-insensitive string comparison
newline2lf=true, newline2ls=true, or newline2ps=true: convert various newline sequences (LF, CRLF, CR, NEL) into a linefeed (LF), line-separation (LS), or paragraph-separation (PS) character, respectively
stripmark=true: strip diacritical marks (e.g. accents)
stripignore=true: strip Unicode's "default ignorable" characters (e.g. the soft hyphen or the left-to-right marker)
stripcc=true: strip control characters; horizontal tabs and form feeds are converted to spaces; newlines are also converted to spaces unless a newline-conversion flag was specified
rejectna=true: throw an error if unassigned code points are found
stable=true: enforce Unicode Versioning Stability

For example, NFKC corresponds to the options compose=true, compat=true, stable=true.

`Base.UTF8proc.graphemes` Function

graphemes(s::AbstractString) -> GraphemeIterator

Returns an iterator over substrings of s that correspond to the extended graphemes in the string, as defined by Unicode UAX #29. (Roughly, these are what users would perceive as single characters, even though they may contain more than one codepoint; for example a letter combined with an accent mark is a single grapheme.)

`Base.isvalid` Method

isvalid(value) -> Bool

Returns true if the given value is valid for its type, which currently can be either Char or String.

`Base.isvalid` Method

isvalid(T, value) -> Bool

Returns true if the given value is valid for that type. Types currently can be either Char or String. Values for Char can be of type Char or UInt32. Values for String can be of that type, or Vector{UInt8}.

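A short sketch of the two-argument form, using hand-picked values (the variable names are illustrative):

```julia
# UInt32 values checked as Unicode code points
ok_char  = isvalid(Char, 0x00000041)   # U+0041, the letter 'A'
bad_char = isvalid(Char, 0x0000d800)   # a UTF-16 surrogate, not a valid Char

# Byte vectors checked as UTF-8 data
ok_str  = isvalid(String, UInt8[0x68, 0x69])   # the bytes of "hi"
bad_str = isvalid(String, UInt8[0xff])         # 0xff never occurs in valid UTF-8
```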

`Base.isvalid` Method

isvalid(str::AbstractString, i::Integer)

Tells whether index i is valid for the given string.

Examples

julia> str = "αβγdef";

julia> isvalid(str, 1)
true

julia> str[1]
'α': Unicode U+03b1 (category Ll: Letter, lowercase)

julia> isvalid(str, 2)
false

julia> str[2]
ERROR: UnicodeError: invalid character index
[...]

`Base.UTF8proc.is_assigned_char` Function

is_assigned_char(c) -> Bool

Returns true if the given char or integer is an assigned Unicode code point.

`Base.ismatch` Function

ismatch(r::Regex, s::AbstractString) -> Bool

Test whether a string contains a match of the given regular expression.

`Base.match` Function

match(r::Regex, s::AbstractString[, idx::Integer[, addopts]])

Search for the first match of the regular expression r in s and return a RegexMatch object containing the match, or nothing if the match failed. The matching substring can be retrieved by accessing m.match and the captured sequences can be retrieved by accessing m.captures. The optional idx argument specifies an index at which to start the search.

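As an illustration of the fields and the idx argument described above, here is a minimal sketch with an invented pattern and input:

```julia
m = match(r"(\d+)-(\d+)", "range 10-20")
whole = m.match       # the matching substring
parts = m.captures    # one entry per capture group

# Start searching at index 3, which skips the "1" at index 2
later = match(r"\d+", "a1b22", 3)

# A failed match returns nothing rather than throwing
no_match = match(r"\d+", "no digits")
```

Because a failed match yields `nothing`, callers normally test the result before accessing `m.match`.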

`Base.eachmatch` Function

eachmatch(r::Regex, s::AbstractString[, overlap::Bool=false])

Search for all matches of the regular expression r in s and return an iterator over the matches. If overlap is true, the matching sequences are allowed to overlap indices in the original string; otherwise they must be from distinct character ranges.

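A brief sketch of iterating over non-overlapping matches; the input string is made up for the example:

```julia
# Collect the matched substrings and where each one starts
found   = [m.match  for m in eachmatch(r"\d+", "a1b22c333")]
offsets = [m.offset for m in eachmatch(r"\d+", "a1b22c333")]
```

Each element produced by the iterator is a `RegexMatch`, so the same fields are available as on the result of `match`.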

`Base.matchall` Function

matchall(r::Regex, s::AbstractString[, overlap::Bool=false]) -> Vector{AbstractString}

Return a vector of the matching substrings from eachmatch.

`Base.lpad` Function

lpad(s, n::Integer, p::AbstractString=" ")

Make a string at least n columns wide when printed by padding s on the left with copies of p.

julia> lpad("March",10)
"     March"

`Base.rpad` Function

rpad(s, n::Integer, p::AbstractString=" ")

Make a string at least n columns wide when printed by padding s on the right with copies of p.

julia> rpad("March",20)
"March               "

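lpad and rpad are often combined to line up columns of text. The following sketch assumes some invented row data and column widths:

```julia
# Hypothetical (name, quantity) rows for a two-column listing
rows = [("apple", 3), ("kiwi", 12)]

# Left-align names in 8 columns, right-align quantities in 4 columns
lines = [rpad(name, 8) * lpad(string(qty), 4) for (name, qty) in rows]

# A non-default padding string, here zero-padding a number
zero_padded = lpad("7", 3, "0")
```

Every element of `lines` is 12 columns wide, so the quantities end in the same column when printed one per line.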

`Base.search` Function

search(string::AbstractString, chars::Chars, [start::Integer])

Search for the first occurrence of the given characters within the given string. The second argument may be a single character, a vector or a set of characters, a string, or a regular expression (though regular expressions are only allowed on contiguous strings, such as ASCII or UTF-8 strings). The third argument optionally specifies a starting index. The return value is a range of indexes where the matching sequence is found, such that s[search(s,x)] == x:

search(string, "substring") = start:end such that string[start:end] == "substring", or 0:-1 if unmatched.

search(string, 'c') = index such that string[index] == 'c', or 0 if unmatched.

julia> search("Hello to the world", "z")
0:-1

julia> search("JuliaLang","Julia")
1:5

`Base.rsearch` Function

rsearch(s::AbstractString, chars::Chars, [start::Integer])

Similar to search, but returning the last occurrence of the given characters within the given string, searching in reverse from start.

julia> rsearch("aaabbb","b")
6:6

`Base.searchindex` Function

searchindex(s::AbstractString, substring, [start::Integer])

Similar to search, but return only the start index at which the substring is found, or 0 if it is not.

julia> searchindex("Hello to the world", "z")
0

julia> searchindex("JuliaLang","Julia")
1

julia> searchindex("JuliaLang","Lang")
6

`Base.rsearchindex` Function

rsearchindex(s::AbstractString, substring, [start::Integer])

Similar to rsearch, but return only the start index at which the substring is found, or 0 if it is not.

julia> rsearchindex("aaabbb","b")
6

julia> rsearchindex("aaabbb","a")
3

`Base.contains` Method

contains(haystack::AbstractString, needle::AbstractString)

Determine whether the second argument is a substring of the first.

julia> contains("JuliaLang is pretty cool!", "Julia")
true

`Base.reverse` Method

reverse(s::AbstractString) -> AbstractString

Reverses a string.

julia> reverse("JuliaLang")
"gnaLailuJ"

`Base.replace` Function

replace(string::AbstractString, pat, r[, n::Integer=0])

Search for the given pattern pat, and replace each occurrence with r. If n is provided, replace at most n occurrences. As with search, the second argument may be a single character, a vector or a set of characters, a string, or a regular expression. If r is a function, each occurrence is replaced with r(s) where s is the matched substring. If pat is a regular expression and r is a SubstitutionString, then capture group references in r are replaced with the corresponding matched text.

`Base.split` Function

split(s::AbstractString, [chars]; limit::Integer=0, keep::Bool=true)

Return an array of substrings by splitting the given string on occurrences of the given character delimiters, which may be specified in any of the formats allowed by search's second argument (i.e. a single character, collection of characters, string, or regular expression). If chars is omitted, it defaults to the set of all space characters, and keep is taken to be false. The two keyword arguments are optional: they are a maximum size for the result and a flag determining whether empty fields should be kept in the result.

julia> a = "Ma.rch"
"Ma.rch"

julia> split(a,".")
2-element Array{SubString{String},1}:
 "Ma"
 "rch"

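A few further cases, sketched with an invented input, show how empty fields, limit, and the default whitespace delimiter behave:

```julia
# An explicit delimiter keeps the empty field between the two commas
fields = split("a,b,,c", ",")

# limit caps the number of pieces; the remainder is left unsplit
first_two = split("a,b,,c", ","; limit=2)

# With no delimiter, split on whitespace and drop empty fields
words = split("  two   words ")

# rsplit applies the same limit starting from the end of the string
tail = rsplit("a,b,,c", ","; limit=2)
```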

`Base.rsplit` Function

rsplit(s::AbstractString, [chars]; limit::Integer=0, keep::Bool=true)

Similar to split, but starting from the end of the string.

julia> a = "M.a.r.c.h"
"M.a.r.c.h"

julia> rsplit(a,".")
5-element Array{SubString{String},1}:
 "M"
 "a"
 "r"
 "c"
 "h"

julia> rsplit(a,".";limit=1)
1-element Array{SubString{String},1}:
 "M.a.r.c.h"

julia> rsplit(a,".";limit=2)
2-element Array{SubString{String},1}:
 "M.a.r.c"
 "h"

`Base.strip` Function

strip(s::AbstractString, [chars::Chars])

Return s with any leading and trailing whitespace removed. If chars (a character, or vector or set of characters) is provided, instead remove characters contained in it.

julia> strip("{3, 5}\n", ['{', '}', '\n'])
"3, 5"

`Base.lstrip` Function

lstrip(s::AbstractString[, chars::Chars])

Return s with any leading whitespace and delimiters removed. The default delimiters to remove are ' ', \t, \n, \v, \f, and \r. If chars (a character, or vector or set of characters) is provided, instead remove characters contained in it.

julia> a = lpad("March", 20)
"               March"

julia> lstrip(a)
"March"

`Base.rstrip` Function

rstrip(s::AbstractString[, chars::Chars])

Return s with any trailing whitespace and delimiters removed. The default delimiters to remove are ' ', \t, \n, \v, \f, and \r. If chars (a character, or vector or set of characters) is provided, instead remove characters contained in it.

julia> a = rpad("March", 20)
"March               "

julia> rstrip(a)
"March"

`Base.startswith` Function

startswith(s::AbstractString, prefix::AbstractString)

Returns true if s starts with prefix. If prefix is a vector or set of characters, tests whether the first character of s belongs to that set.

`Base.endswith` Function

endswith(s::AbstractString, suffix::AbstractString)

Returns true if s ends with suffix. If suffix is a vector or set of characters, tests whether the last character of s belongs to that set.

`Base.uppercase` Function

uppercase(s::AbstractString)

Returns s with all characters converted to uppercase.

Example

julia> uppercase("Julia")
"JULIA"

`Base.lowercase` Function

lowercase(s::AbstractString)

Returns s with all characters converted to lowercase.

Example

julia> lowercase("STRINGS AND THINGS")
"strings and things"

`Base.titlecase` Function

titlecase(s::AbstractString)

Capitalizes the first character of each word in s.

Example

julia> titlecase("the julia programming language")
"The Julia Programming Language"

`Base.ucfirst` Function

ucfirst(s::AbstractString)

Returns string with the first character converted to uppercase.

Example

julia> ucfirst("python")
"Python"

`Base.lcfirst` Function

lcfirst(s::AbstractString)

Returns string with the first character converted to lowercase.

Example

julia> lcfirst("Julia")
"julia"

`Base.join` Function

join(io::IO, strings, delim, [last])

Join an array of strings into a single string, inserting the given delimiter between adjacent strings. If last is given, it will be used instead of delim between the last two strings. For example,

julia> join(["apples", "bananas", "pineapples"], ", ", " and ")
"apples, bananas and pineapples"

strings can be any iterable over elements x that are convertible to strings via print(io::IOBuffer, x). When io is given, the joined result is printed to io instead of being returned as a string.

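The forms with and without an io argument can be sketched as follows; the inputs and variable names are illustrative only:

```julia
# Elements need not be strings; they are printed as by print
csv = join([1, 2, 3], ", ")

# With an IO argument the joined text is written to the stream
buf = IOBuffer()
join(buf, ["a", "b", "c"], "-")
written = String(take!(buf))

# A separate delimiter before the final element
listing = join(["x", "y", "z"], ", ", " or ")
```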

`Base.chop` Function

chop(s::AbstractString)

Remove the last character from s.

julia> a = "March"
"March"

julia> chop(a)
"Marc"

`Base.chomp` Function

chomp(s::AbstractString)

Remove a single trailing newline from a string.

julia> chomp("Hello\n")
"Hello"

`Base.ind2chr` Function

ind2chr(s::AbstractString, i::Integer)

Convert a byte index i to a character index with respect to string s.

`Base.chr2ind` Function

chr2ind(s::AbstractString, i::Integer)

Convert a character index i to a byte index.

`Base.nextind` Function

nextind(str::AbstractString, i::Integer)

Get the next valid string index after i. Returns a value greater than endof(str) at or after the end of the string.

Examples

julia> str = "αβγdef";

julia> nextind(str, 1)
3

julia> endof(str)
9

julia> nextind(str, 9)
10

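Repeated calls to nextind visit every valid index of a string. The helper below is a hypothetical sketch, not part of Base, and it assumes a UTF-8 String so that sizeof(s) gives the last byte position:

```julia
# Collect every valid index of a String by stepping with nextind.
# `valid_indices` is an illustrative helper name, not a Base function.
function valid_indices(s::String)
    idxs = Int[]
    i = 1
    while i <= sizeof(s)      # sizeof(s) is the number of bytes in s
        push!(idxs, i)
        i = nextind(s, i)     # jump to the start of the next character
    end
    return idxs
end

idxs = valid_indices("αβγdef")
```

For "αβγdef" the three Greek letters take two bytes each, so the indices advance by 2 three times and then by 1.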

`Base.prevind` Function

prevind(str::AbstractString, i::Integer)

Get the previous valid string index before i. Returns a value less than 1 at the beginning of the string.

Examples

julia> prevind("αβγdef", 3)
1

julia> prevind("αβγdef", 1)
0

`Base.Random.randstring` Function

randstring([rng,] len=8)

Create a random ASCII string of length len, consisting of upper- and lower-case letters and the digits 0-9. The optional rng argument specifies a random number generator, see Random Numbers.

Example

julia> rng = MersenneTwister(1234);

julia> randstring(rng, 4)
"mbDd"

`Base.UTF8proc.charwidth` Function

charwidth(c)

Gives the number of columns needed to print a character.

`Base.strwidth` Function

strwidth(s::AbstractString)

Gives the number of columns needed to print a string.

Example

julia> strwidth("March")
5

`Base.UTF8proc.isalnum` Function

isalnum(c::Char) -> Bool

Tests whether a character is alphanumeric. A character is classified as alphanumeric if it belongs to the Unicode general category Letter or Number, i.e. a character whose category code begins with 'L' or 'N'.

`Base.UTF8proc.isalpha` Function

isalpha(c::Char) -> Bool

Tests whether a character is alphabetic. A character is classified as alphabetic if it belongs to the Unicode general category Letter, i.e. a character whose category code begins with 'L'.

`Base.isascii` Function

isascii(c::Union{Char,AbstractString}) -> Bool

Tests whether a character belongs to the ASCII character set, or whether this is true for all elements of a string.

`Base.UTF8proc.iscntrl` Function

iscntrl(c::Char) -> Bool

Tests whether a character is a control character. Control characters are the non-printing characters of the Latin-1 subset of Unicode.

`Base.UTF8proc.isdigit` Function

isdigit(c::Char) -> Bool

Tests whether a character is a numeric digit (0-9).

`Base.UTF8proc.isgraph` Function

isgraph(c::Char) -> Bool

Tests whether a character is printable, and not a space. Any character that would cause a printer to use ink should be classified with isgraph(c)==true.

`Base.UTF8proc.islower` Function

islower(c::Char) -> Bool

Tests whether a character is a lowercase letter. A character is classified as lowercase if it belongs to Unicode category Ll, Letter: Lowercase.

`Base.UTF8proc.isnumber` Function

isnumber(c::Char) -> Bool

Tests whether a character is numeric. A character is classified as numeric if it belongs to the Unicode general category Number, i.e. a character whose category code begins with 'N'.

`Base.UTF8proc.isprint` Function

isprint(c::Char) -> Bool

Tests whether a character is printable, including spaces, but not a control character.

`Base.UTF8proc.ispunct` Function

ispunct(c::Char) -> Bool

Tests whether a character belongs to the Unicode general category Punctuation, i.e. a character whose category code begins with 'P'.

`Base.UTF8proc.isspace` Function

isspace(c::Char) -> Bool

Tests whether a character is any whitespace character. Includes ASCII characters '\t', '\n', '\v', '\f', '\r', and ' ', Latin-1 character U+0085, and characters in Unicode category Zs.

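The three groups of whitespace listed above can be checked directly. This is a small sketch; U+00A0 (no-break space) is chosen here as one example of a category Zs character:

```julia
# The six ASCII whitespace characters
ascii_ws = all(isspace, ['\t', '\n', '\v', '\f', '\r', ' '])

# Latin-1 next-line character U+0085
nel = isspace('\u85')

# U+00A0 NO-BREAK SPACE, a member of Unicode category Zs
nbsp = isspace('\ua0')

# An ordinary letter is not whitespace
letter = isspace('a')
```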

`Base.UTF8proc.isupper` Function

isupper(c::Char) -> Bool

Tests whether a character is an uppercase letter. A character is classified as uppercase if it belongs to Unicode category Lu, Letter: Uppercase, or Lt, Letter: Titlecase.

`Base.isxdigit` Function

isxdigit(c::Char) -> Bool

Tests whether a character is a valid hexadecimal digit. Note that this does not include x (as in the standard 0x prefix).

Example

julia> isxdigit('a')
true

julia> isxdigit('x')
false

`Core.Symbol` Type

Symbol(x...) -> Symbol

Create a Symbol by concatenating the string representations of the arguments together.

`Base.escape_string` Function

escape_string([io,] str::AbstractString[, esc::AbstractString]) -> AbstractString

General escaping of traditional C and Unicode escape sequences. Any characters in esc are also escaped (with a backslash). See also unescape_string.

`Base.unescape_string` Function

unescape_string([io,] s::AbstractString) -> AbstractString

General unescaping of traditional C and Unicode escape sequences. Reverse of escape_string.

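A round trip through both functions can be sketched as follows. The input is invented, and the double quote is passed explicitly as esc so that it is escaped along with the control characters:

```julia
raw = "tab\there \"quoted\"\n"

# Escape control characters, plus the double quote named in esc
escaped = escape_string(raw, "\"")

# unescape_string reverses the transformation
restored = unescape_string(escaped)
```

After escaping, the tab, newline, and quotes appear as the two-character sequences `\t`, `\n`, and `\"`, and `restored` is equal to `raw` again.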

© 2009–2016 Jeff Bezanson, Stefan Karpinski, Viral B. Shah, and other contributors
Licensed under the MIT License.
https://docs.julialang.org/en/release-0.6/stdlib/strings/
