Command line tool to validate and normalize UTF-8 data
Go to file
Leonora Tindall c44f33a056
First commit - working version from RBR
2019-10-19 16:05:52 -04:00
src First commit - working version from RBR 2019-10-19 16:05:52 -04:00
.gitignore First commit - working version from RBR 2019-10-19 16:05:52 -04:00
Cargo.lock First commit - working version from RBR 2019-10-19 16:05:52 -04:00
Cargo.toml First commit - working version from RBR 2019-10-19 16:05:52 -04:00
README First commit - working version from RBR 2019-10-19 16:05:52 -04:00

README

utf8-norm, validate and normalize UTF-8 Unicode data

Version 1.0.0 licensed GPLv3. (C) 2019 Leonora Tindall <nora@nora.codes>
Fast command line Unicode normalization, supporting stream safety transformations as well
as NFC, NFD, NFKD, and NFKC. Exits with failure if the incoming stream is not valid UTF-8.

Usage: utf8-norm [--nfc | --nfd | --nfkc | --nfkd] [--stream-safe] [--crlf] <infile> <outfile>

    <infile> (default stdin) - file from which to read bytes.
    <outfile> (default stdout) - file to which to write normalized Unicode.
    -w, --crlf  - write CRLF (Windows) instead of LF only (Unix) at the end of lines.
    -d, --nfd   - write NFD (canonical decomposition).
    -D, --nfkd  - write NFKD (compatibility decomposition).
    -c, --nfc   - write NFC (canonical composition computed from NFD). This is the default.
    -C, --nfkc  - write NFKC (canonical composition computed from NFC).
    -s, --stream-safe   - write stream-safe bytes (Conjoining Grapheme Joiners, UAX15-D4).
    -V, --version - output version information and exit

utf8-norm was created at Rust Belt Rust 2019 in Dayton, OH. Thanks to @j41manning for her
excellent talk regarding Unicode handling in Rust.

Natively install as `cargo install utf8-norm` or from your distribution's package manager.