Unicode Text Segmentation for Go

This Go package implements Unicode Text Segmentation according to Unicode Standard Annex #29 (Unicode version 12.0.0).

At this point, only the determination of grapheme cluster boundaries is implemented.

Background

In Go, strings are read-only slices of bytes. They can be turned into Unicode code points using a for ... range loop or by conversion: []rune(str). However, multiple code points may be combined into one user-perceived character, which the Unicode specification calls a "grapheme cluster". Here are some examples:

| String | Bytes (UTF-8) | Code points (runes) | Grapheme clusters |
|---|---|---|---|
| Käse | 6 bytes: 4b 61 cc 88 73 65 | 5 code points: 4b 61 308 73 65 | 4 clusters: [4b],[61 308],[73],[65] |
| 🏳️‍🌈 | 14 bytes: f0 9f 8f b3 ef b8 8f e2 80 8d f0 9f 8c 88 | 4 code points: 1f3f3 fe0f 200d 1f308 | 1 cluster: [1f3f3 fe0f 200d 1f308] |
| 🇩🇪 | 8 bytes: f0 9f 87 a9 f0 9f 87 aa | 2 code points: 1f1e9 1f1ea | 1 cluster: [1f1e9 1f1ea] |
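
The byte and code point columns above can be checked with the standard library alone. The following is a minimal sketch; the helper name counts is made up for illustration, and the strings are written with escape sequences so that the exact code points are unambiguous.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts is a hypothetical helper (not part of any package): it returns the
// number of UTF-8 bytes and the number of code points (runes) in s.
func counts(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	samples := []string{
		"Ka\u0308se",                       // K, a, combining diaeresis, s, e: 6 bytes, 5 code points
		"\U0001F3F3\uFE0F\u200D\U0001F308", // rainbow flag: 14 bytes, 4 code points
		"\U0001F1E9\U0001F1EA",             // regional indicators D and E: 8 bytes, 2 code points
	}
	for _, s := range samples {
		b, r := counts(s)
		fmt.Printf("%+q: %d bytes, %d code points\n", s, b, r)
	}
}
```

Neither len nor utf8.RuneCountInString knows about grapheme clusters, which is the gap this package fills.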

This package provides a tool to iterate over these grapheme clusters. This may be used to determine the number of user-perceived characters, to split strings in their intended places, or to extract individual characters which form a unit.

Installation

go get github.com/rivo/uniseg

Basic Example

package main

import (
	"fmt"

	"github.com/rivo/uniseg"
)

func main() {
	gr := uniseg.NewGraphemes("👍🏼!")
	for gr.Next() {
		fmt.Printf("%x ", gr.Runes())
	}
	// Output: [1f44d 1f3fc] [21]
}

Documentation

Refer to https://godoc.org/github.com/rivo/uniseg for the package's documentation.

Dependencies

This package does not depend on any packages outside the standard library.

Your Feedback

Report issues on the project's GitHub page. Feel free to get in touch if you have any questions.

Version

Version tags will be introduced once Go modules are official. Consider this version 0.1.