unicode-grapheme
Safe HaskellNone
LanguageHaskell2010

Unicode.Grapheme

Description

Text grapheme utilities.

Since: 0.1

Documentation

Unicode functions are defined in terms of the abstract UnicodeFunction type, which wraps functionality that is implemented for multiple Unicode versions behind a single value.

A UnicodeFunction can be composed with other UnicodeFunctions and then run either against a specific Unicode version or against a default one.

For example, the following function breaks text into grapheme clusters, using base's Unicode version if it is supported and falling back to the latest supported version otherwise.

>>> :{
  break :: Text -> [Text]
  break = runUnicodeFunction breakGraphemeClusters
:}

data UnicodeFunction a b Source #

UnicodeFunction represents some function that works across all UnicodeVersions. It can be extended via its Category and Arrow instances.

>>> :{
  textWidth :: UnicodeFunction Text Int
  textWidth = arr F.sum . map fmap clusterWidth . breakGraphemeClusters
:}
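To make the Category and Arrow composition concrete, here is a minimal, self-contained sketch. It models a version-aware function as a plain function from a version to a function. This representation is an assumption made for illustration, not the library's actual definition, and the names Version, VFun, runAt, splitPieces and countPieces are all hypothetical.

```haskell
import Control.Arrow (Arrow (..), (>>>))
import qualified Control.Category as C

-- Hypothetical stand-in for UnicodeVersion.
data Version = V15 | V16
  deriving (Bounded, Enum, Eq, Ord, Show)

-- Assumed model: a function whose behaviour may depend on the version.
newtype VFun a b = VFun (Version -> a -> b)

-- Composition threads the same version through both stages.
instance C.Category VFun where
  id = VFun (\_ x -> x)
  VFun g . VFun f = VFun (\v x -> g v (f v x))

-- Pure functions are lifted by ignoring the version.
instance Arrow VFun where
  arr f = VFun (\_ x -> f x)
  first (VFun f) = VFun (\v (x, y) -> (f v x, y))

-- Run the modelled function at a chosen version.
runAt :: Version -> VFun a b -> a -> b
runAt v (VFun f) = f v

-- Toy version-dependent step: V16 treats ',' as a separator, V15 does not.
splitPieces :: VFun String [String]
splitPieces = VFun (\v -> words . map (\c -> if isSep v c then ' ' else c))
  where
    isSep V15 c = c == ' '
    isSep V16 c = c == ' ' || c == ','

-- Compose the version-dependent step with a lifted pure function.
countPieces :: VFun String Int
countPieces = splitPieces >>> arr length
```

In this model, runAt V15 countPieces "a,b c" and runAt V16 countPieces "a,b c" differ only because the first stage depends on the version. The composed pipeline itself is written once.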

Since: 0.1

Instances

Arrow UnicodeFunction Source #

Since: 0.1

Instance details

Defined in Unicode.Grapheme

Methods

arr :: (b -> c) -> UnicodeFunction b c #

first :: UnicodeFunction b c -> UnicodeFunction (b, d) (c, d) #

second :: UnicodeFunction b c -> UnicodeFunction (d, b) (d, c) #

(***) :: UnicodeFunction b c -> UnicodeFunction b' c' -> UnicodeFunction (b, b') (c, c') #

(&&&) :: UnicodeFunction b c -> UnicodeFunction b c' -> UnicodeFunction b (c, c') #

ArrowApply UnicodeFunction Source #

Since: 0.1

Instance details

Defined in Unicode.Grapheme

Methods

app :: UnicodeFunction (UnicodeFunction b c, b) c #

ArrowChoice UnicodeFunction Source #

Since: 0.1

Instance details

Defined in Unicode.Grapheme

Category UnicodeFunction Source #

Since: 0.1

Instance details

Defined in Unicode.Grapheme

Applicative (UnicodeFunction a) Source #

Since: 0.1

Instance details

Defined in Unicode.Grapheme

Methods

pure :: a0 -> UnicodeFunction a a0 #

(<*>) :: UnicodeFunction a (a0 -> b) -> UnicodeFunction a a0 -> UnicodeFunction a b #

liftA2 :: (a0 -> b -> c) -> UnicodeFunction a a0 -> UnicodeFunction a b -> UnicodeFunction a c #

(*>) :: UnicodeFunction a a0 -> UnicodeFunction a b -> UnicodeFunction a b #

(<*) :: UnicodeFunction a a0 -> UnicodeFunction a b -> UnicodeFunction a a0 #

Functor (UnicodeFunction a) Source #

Since: 0.1

Instance details

Defined in Unicode.Grapheme

Methods

fmap :: (a0 -> b) -> UnicodeFunction a a0 -> UnicodeFunction a b #

(<$) :: a0 -> UnicodeFunction a b -> UnicodeFunction a a0 #

Monad (UnicodeFunction a) Source #

Since: 0.1

Instance details

Defined in Unicode.Grapheme

Methods

(>>=) :: UnicodeFunction a a0 -> (a0 -> UnicodeFunction a b) -> UnicodeFunction a b #

(>>) :: UnicodeFunction a a0 -> UnicodeFunction a b -> UnicodeFunction a b #

return :: a0 -> UnicodeFunction a a0 #

Monoid b => Monoid (UnicodeFunction a b) Source #

Since: 0.1

Instance details

Defined in Unicode.Grapheme

Semigroup b => Semigroup (UnicodeFunction a b) Source #

Since: 0.1

Instance details

Defined in Unicode.Grapheme

Construction

breakGraphemeClusters :: UnicodeFunction Text [Text] Source #

Breaks Text into grapheme clusters.

Examples

>>> runUnicodeFunction breakGraphemeClusters "abc"
["a","b","c"]
>>> -- U+004F U+0308
>>> runUnicodeFunction breakGraphemeClusters "Ö"
["O\776"]
>>> -- 🧑‍🌾
>>> runUnicodeFunction breakGraphemeClusters "\x1F9D1\x200D\x1F33E"
["\129489\8205\127806"]

Since: 0.1

textWidth :: UnicodeFunction Text Int Source #

Splits the text into grapheme clusters and sums the width of each cluster.

Examples

>>> runUnicodeFunction textWidth "abc"
3
>>> -- U+004F U+0308
>>> runUnicodeFunction textWidth "Ö"
1
>>> -- 🧑‍🌾
>>> runUnicodeFunction textWidth "\x1F9D1\x200D\x1F33E"
2

Since: 0.1

clusterWidth :: UnicodeFunction Text Int Source #

Given a single grapheme cluster, possibly made up of multiple codepoints, returns its width, which is either 1 or 2. The result is heuristic. The width is 2 if the cluster contains at least one codepoint with any of the following properties:

  • East_Asian_Width = Fullwidth or Wide
  • Emoji_Presentation
  • U+FE0F (emoji-style)

Otherwise the width is 1.
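The heuristic above can be sketched in a few lines. This is not the library's implementation: isWideLike and sketchClusterWidth are hypothetical names, and the codepoint ranges are a small illustrative subset, whereas real code would consult the full Unicode property tables for the chosen version.

```haskell
import Data.Char (ord)

-- Illustrative subset only: a handful of ranges standing in for the
-- East_Asian_Width and Emoji_Presentation property tables.
isWideLike :: Char -> Bool
isWideLike c =
  inRange 0x4E00 0x9FFF        -- CJK Unified Ideographs (Wide)
    || inRange 0xFF01 0xFF60   -- Fullwidth ASCII variants (Fullwidth)
    || inRange 0x1F600 0x1F64F -- Emoticons block (Emoji_Presentation)
    || n == 0xFE0F             -- Variation Selector-16 (emoji-style)
  where
    n = ord c
    inRange lo hi = lo <= n && n <= hi

-- One matching codepoint is enough to make the whole cluster width 2.
-- The input is assumed to be a single grapheme cluster.
sketchClusterWidth :: String -> Int
sketchClusterWidth cluster
  | any isWideLike cluster = 2
  | otherwise = 1
```

Because the check is "any codepoint matches", a narrow base character followed by U+FE0F counts as width 2, and a string of several narrow clusters still reports 1.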

Examples
>>> runUnicodeFunction clusterWidth "a"
1
>>> runUnicodeFunction clusterWidth "🇯🇵"
2
>>> -- Passing more than one cluster can lead to unexpected results!
>>> runUnicodeFunction clusterWidth "abc"
1

Since: 0.1

Operations

dimap Source #

Arguments

:: (c -> a)

Contravariantly map input.

-> (b -> d)

Covariantly map output.

-> UnicodeFunction a b 
-> UnicodeFunction c d 

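Maps over both ends of a UnicodeFunction: the first function is applied to the input before it runs, and the second to the output after it runs.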

Since: 0.1

map Source #

Arguments

:: ((a -> b) -> c -> d)

Function mapper.

-> UnicodeFunction a b

Unicode function.

-> UnicodeFunction c d 

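Transforms a UnicodeFunction by applying the given mapper to its underlying function. For example, map fmap lifts a UnicodeFunction a b to a UnicodeFunction (f a) (f b) for any Functor f.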

Since: 0.1
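The signatures of dimap and map can be illustrated on a toy model. The sketch below assumes a version-aware function is represented as a function from a version to a plain function. That representation, along with the names Version, VFun, runAt, dimapV, mapV, size, digitsLabel and sizes, is hypothetical and not taken from the library.

```haskell
-- Hypothetical stand-in for UnicodeVersion.
data Version = V15 | V16
  deriving (Bounded, Enum, Eq, Show)

-- Assumed model of a version-aware function (not the library's definition).
newtype VFun a b = VFun (Version -> a -> b)

runAt :: Version -> VFun a b -> a -> b
runAt v (VFun f) = f v

-- Analogue of dimap: pre-process the input, post-process the output.
dimapV :: (c -> a) -> (b -> d) -> VFun a b -> VFun c d
dimapV pre post (VFun f) = VFun (\v -> post . f v . pre)

-- Analogue of map: apply a mapper to the underlying function at every version.
mapV :: ((a -> b) -> c -> d) -> VFun a b -> VFun c d
mapV mapper (VFun f) = VFun (\v -> mapper (f v))

-- Toy function that happens to ignore the version.
size :: VFun String Int
size = VFun (const length)

-- Accept an Int, render it, measure it, then label the result.
digitsLabel :: VFun Int String
digitsLabel = dimapV show (\n -> show n ++ " digits") size

-- Lifting with fmap turns a per-element function into a per-list one,
-- which is the shape of `map fmap clusterWidth` in the textWidth example.
sizes :: VFun [String] [Int]
sizes = mapV fmap size
```

In this model dimapV is a special case of mapV, since dimapV pre post behaves like mapV (\h -> post . h . pre).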

Elimination

runUnicodeFunction :: UnicodeFunction a b -> a -> b Source #

Runs the UnicodeFunction with base's Unicode version, if it is supported. Otherwise uses the latest supported version.

Since: 0.1

runUnicodeFunctionVersion :: UnicodeVersion -> UnicodeFunction a b -> a -> b Source #

Runs the UnicodeFunction with the given Unicode version.

Since: 0.1
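The relationship between the two eliminators can be sketched as follows: running with the default is running with an explicit version, where that version is the detected one if it is supported and the latest one otherwise. Everything here is a hypothetical model (Version, VFun, runAt, runDefault, describe). In particular, how the library detects base's Unicode version is not shown in this documentation, so the sketch takes the detection result as a Maybe argument.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical stand-in for UnicodeVersion; maxBound is the latest version.
data Version = V14 | V15 | V16
  deriving (Bounded, Enum, Eq, Ord, Show)

-- Assumed model of a version-aware function (not the library's definition).
newtype VFun a b = VFun (Version -> a -> b)

-- Analogue of runUnicodeFunctionVersion: the caller picks the version.
runAt :: Version -> VFun a b -> a -> b
runAt v (VFun f) = f v

-- Analogue of runUnicodeFunction. The first argument models the outcome of
-- detecting the runtime's Unicode version: Nothing means "not supported".
runDefault :: Maybe Version -> VFun a b -> a -> b
runDefault detected = runAt (fromMaybe maxBound detected)

-- Toy function that reports which version it was run with.
describe :: VFun String String
describe = VFun (\v s -> s ++ " @ " ++ show v)
```

With this model, runDefault (Just V14) describe "x" runs at V14, while runDefault Nothing describe "x" falls back to V16.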

Unicode versions

data UnicodeVersion #

Instances

Bounded UnicodeVersion 
Instance details

Defined in Unicode.Grapheme.Internal.Version

Enum UnicodeVersion 
Instance details

Defined in Unicode.Grapheme.Internal.Version

Show UnicodeVersion 
Instance details

Defined in Unicode.Grapheme.Internal.Version

Eq UnicodeVersion 
Instance details

Defined in Unicode.Grapheme.Internal.Version

Ord UnicodeVersion 
Instance details

Defined in Unicode.Grapheme.Internal.Version

Functions

Display

Errors