subst - Greple module for text search and substitution
Version 2.3301
greple -Msubst --dict dictionary [ options ]
Dictionary:
  --dict dictionary file
  --dictdata dictionary data
Check:
  --check=[ng,ok,any,outstand,all,none]
  --select=N
  --linefold
  --stat
  --with-stat
  --stat-style=[default,dict]
  --stat-item={match,expect,number,ok,ng,dict}=[0,1]
  --subst
  --[no-]warn-overlap
  --[no-]warn-include
File Update:
  --diff
  --diffcmd command
  --create
  --replace
  --overwrite
This greple module supports check and substitution of text files based on dictionary data.
The dictionary file is given by the --dict option, and each line contains a pair of a matching pattern and the expected string.
greple -Msubst --dict DICT
If the dictionary file contains the following data:
colou?r      color
cent(er|re)  center
the above command finds strings that match the first pattern but differ from the second string, that is, "colour" and "centre" in this case.
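The check described above can be illustrated with a minimal Python sketch. It is not the module's implementation; the names DICTIONARY and find_unexpected are hypothetical, and Python's re module stands in for Perl regular expressions.

```python
import re

# Hypothetical in-memory dictionary: (pattern, expected string) pairs,
# mirroring the two example entries above.
DICTIONARY = [
    (r"colou?r", "color"),
    (r"cent(er|re)", "center"),
]

def find_unexpected(text, dictionary=DICTIONARY):
    """Return (matched, expected) pairs where the match differs from the expected string."""
    found = []
    for pattern, expected in dictionary:
        for m in re.finditer(pattern, text):
            if m.group(0) != expected:
                found.append((m.group(0), expected))
    return found

print(find_unexpected("The colour of the centre differs from the color of the center."))
```

Matches that already equal the expected string ("color", "center") are skipped, so only "colour" and "centre" are reported.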
Field // in dictionary data is ignored, so this file can be written like this:
colou?r      //  color
cent(er|re)  //  center
You can use the same file with greple's -f option; in that case, the string after // is ignored as a comment.
greple -f DICT ...
Option --dictdata can be used to provide dictionary data on the command line.
greple -Msubst --dictdata $'colou?r color\ncent(er|re) center\n'
A dictionary entry starting with a sharp sign (#) is a comment and is ignored.
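The dictionary format described above (whitespace-separated fields, an ignored // field, and # comments) could be parsed roughly as follows. This is a sketch under stated assumptions, not the module's parser: parse_dictionary is a hypothetical name, and it assumes the expected string contains no whitespace.

```python
def parse_dictionary(data):
    """Parse dictionary text into (pattern, expected) pairs.

    Assumptions of this sketch: fields are separated by whitespace, a field
    that is exactly "//" is dropped, lines starting with "#" are comments,
    and a line with only a pattern yields expected=None.
    """
    entries = []
    for line in data.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = [f for f in line.split() if f != "//"]
        if not fields:
            continue  # the line held nothing but "//"
        pattern = fields[0]
        expected = fields[1] if len(fields) > 1 else None
        entries.append((pattern, expected))
    return entries

print(parse_dictionary("# spelling rules\ncolou?r // color\ncent(er|re) center\n"))
```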
When a matched string is the same as or shorter than a string previously matched by another pattern, it is simply ignored (--no-warn-include by default). So, if you have to declare conflicting patterns, place the longer pattern earlier.
If a matched string overlaps with a previously matched string, a warning is issued (--warn-overlap by default) and the match is ignored.
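The distinction between an included match and an overlapping match can be sketched with character spans. The function below is a hypothetical illustration of the rule as stated in the text, using half-open (start, end) offsets; how the module tracks matches internally is not described here.

```python
def classify(new, accepted):
    """Classify a new match span against spans accepted from earlier patterns.

    Spans are half-open (start, end) offsets. Returns:
      "include" if the new span lies inside an earlier span (silently ignored
                by default),
      "overlap" if it partially overlaps an earlier span (warned and ignored
                by default),
      "ok"      if it touches no earlier span.
    """
    s, e = new
    for a, b in accepted:
        if a <= s and e <= b:
            return "include"
        if s < b and a < e:
            return "overlap"
    return "ok"

accepted = [(0, 5)]
print(classify((1, 3), accepted))  # inside the earlier match
print(classify((3, 8), accepted))  # partially overlapping
print(classify((5, 8), accepted))  # adjacent, no conflict
```

Under this reading, a longer pattern placed later would be classified as an overlap with the shorter earlier match, which is why the text recommends placing the longer pattern first.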
This version uses the Getopt::EX::termcolor module. It sets the --light-screen or --dark-screen option depending on the terminal on which the command runs, or on the TERM_BGCOLOR environment variable.
Some terminals (e.g., "Apple_Terminal" or "iTerm") are detected automatically and no action is required. Otherwise, set the TERM_BGCOLOR environment variable to a value between #000000 (black) and #FFFFFF (white), depending on the terminal background color.
--dict file
Specify the dictionary file.
--dictdata data
Specify dictionary data as text.
--check=outstand|ng|ok|any|all|none
Option --check takes one of the following arguments: ng, ok, any, outstand, all, or none.
With the default value outstand, the command shows information about both expected and unexpected words, but only when an unexpected word is found in the same file.
With value ng, the command shows information about unexpected words. With value ok, it shows information about expected words. With value any, it shows both.
Values all and none make sense only when used with the --stat option; they display information about patterns that never matched.
--select=N
Select the Nth entry from the dictionary. The argument is interpreted by the Getopt::EX::Numbers module. A range can be specified like --select=1:3,7:9. You can get the entry numbers with the --stat option.
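As an illustration of what a range specification such as 1:3,7:9 selects, here is a small Python sketch. It assumes start:end ranges are inclusive and handles only that form and single numbers; Getopt::EX::Numbers may accept more than this, and expand_select is a hypothetical name.

```python
def expand_select(spec):
    """Expand a spec like "1:3,7:9" into a list of entry numbers.

    Assumption of this sketch: "start:end" is an inclusive range.
    """
    numbers = []
    for part in spec.split(","):
        if ":" in part:
            start, end = (int(x) for x in part.split(":"))
            numbers.extend(range(start, end + 1))
        else:
            numbers.append(int(part))
    return numbers

print(expand_select("1:3,7:9"))
```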
--linefold
If the target data is folded in the middle of text, use the --linefold option. It creates regex patterns that match strings spread across lines. The substituted text does not include the newline, though. Because it somewhat confuses regex behavior, avoid using it if possible.
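One way to picture a pattern that tolerates a line fold is to allow an optional newline between every pair of characters. The sketch below does this for a literal string only; it is an assumption made for illustration and does not show how the module rewrites arbitrary regex patterns. The name fold_tolerant and the sample text are hypothetical.

```python
import re

def fold_tolerant(literal):
    """Build a regex matching `literal` even when a newline splits it."""
    return re.compile(r"\n?".join(re.escape(ch) for ch in literal))

pattern = fold_tolerant("イーハトーヴ")
text = "ここはイーハ\nトーヴです"  # the word is folded across two lines
m = pattern.search(text)

print(repr(m.group(0)))                     # matched string, newline included
print(repr(m.group(0).replace("\n", "")))   # the same string with the newline dropped
```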
--stat
--with-stat
Print statistical information. Works with the --check option.
Option --with-stat prints statistics after the normal output, while --stat prints only statistics.
--stat-style=default|dict
Using the --stat-style=dict option with --stat and --check=any, you can get dictionary-style output for your working document.
--stat-item {match,expect,number,ok,ng,dict}=[0,1]
Specify which items are shown in the stat information. Default values are:
match=1 expect=1 number=1 ng=1 ok=1 dict=0
If you don't need to see the pattern field, specify it like this:
--stat-item match=0
Multiple parameters can be set at once:
--stat-item match=number=0,ng=1,ok=1
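The chained form match=number=0 sets both items to the same value. A minimal sketch of how such a parameter string could be interpreted, starting from the default values listed above, follows; parse_stat_item is a hypothetical helper, not the module's code.

```python
# Default values as listed in the text above.
DEFAULTS = {"match": 1, "expect": 1, "number": 1, "ng": 1, "ok": 1, "dict": 0}

def parse_stat_item(spec, defaults=DEFAULTS):
    """Apply a spec like "match=number=0,ng=1,ok=1" on top of the defaults."""
    items = dict(defaults)
    for part in spec.split(","):
        # Every name before the last "=" receives the final value.
        *names, value = part.split("=")
        for name in names:
            if name not in items:
                raise ValueError(f"unknown stat item: {name}")
            items[name] = int(value)
    return items

print(parse_stat_item("match=number=0,ng=1,ok=1"))
```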
--subst
Substitute unexpected matched strings with the expected string. Newline characters in the matched string are ignored. A pattern without a replacement string is left unchanged.
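The substitution step can be sketched in Python as below. This is an illustration only: substitute is a hypothetical function, entries are applied one after another (so a later pattern could in principle match text produced by an earlier one), and an entry whose expected string is None stands for a pattern without a replacement.

```python
import re

def substitute(text, dictionary):
    """Replace each match of a pattern with its expected string."""
    for pattern, expected in dictionary:
        if expected is None:
            continue  # pattern without a replacement string: leave text as is
        # A function is used as the replacement so that backslashes in
        # `expected` are inserted literally.
        text = re.sub(pattern, lambda m, e=expected: e, text)
    return text

dictionary = [(r"colou?r", "color"), (r"cent(er|re)", "center"), (r"grey", None)]
print(substitute("colour centre color grey", dictionary))
```

Because the whole match is replaced by the expected string, any newline inside the match disappears from the result.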
--[no-]warn-overlap
Warn about overlapped patterns. On by default.
--[no-]warn-include
Warn about included patterns. Off by default.
--diff
Produce diff output of the original and converted text.
--diffcmd command
Specify the diff command used by the --diff option. The default is "diff -u".
--create
Create a new file and write the result to it. The suffix ".new" is appended to the original filename.
--replace
Replace the target file with the converted result. The original file is renamed to a backup name with the ".bak" suffix.
--overwrite
Overwrite the target file with the converted result, with no backup.
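The three file update modes differ only in where the converted text is written and whether the original survives. The following Python sketch mirrors the behavior described above; write_result and its mode argument are hypothetical names, and details such as what happens when a backup already exists are not covered.

```python
import os

def write_result(path, converted, mode):
    """Write converted text according to a file update mode.

    mode "create":    write to path + ".new", leaving the original untouched
    mode "replace":   rename the original to path + ".bak", then write path
    mode "overwrite": write path directly, keeping no backup
    """
    if mode == "create":
        target = path + ".new"
    elif mode == "replace":
        os.replace(path, path + ".bak")
        target = path
    elif mode == "overwrite":
        target = path
    else:
        raise ValueError(f"unknown mode: {mode}")
    with open(target, "w", encoding="utf-8") as f:
        f.write(converted)
    return target
```

Of the three, the "create" mode is the safest to try first, since it never touches the original file.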
This module includes example dictionaries. They are installed in the share directory and accessed with the --exdict option.
greple -Msubst --exdict jtca-katakana-guide-3.dict
--exdict dictionary
Use a dictionary file in the distribution as the dictionary file.
Show dictionary directory.
Created from the following guideline document.
外来語(カタカナ)表記ガイドライン 第3版
Established: August 2015; issued: September 2015
一般財団法人テクニカルコミュニケーター協会 (Japan Technical Communicators Association)
https://www.jtca.org/standardization/katakana_guide_3_20171222.pdf
A customized version of --jtca-katakana-guide. The original dictionary is automatically generated from published data; this one is customized for practical use.
JTF日本語標準スタイルガイド(翻訳用) 第3.0版
August 20, 2019
一般社団法人 日本翻訳連盟(JTF) 翻訳品質委員会
https://www.jtf.jp/jp/style_guide/pdf/jtf_style_guide.pdf
A customized version of --jtf-style-guide. The original dictionary is automatically generated from published data; this one is customized for practical use.
Dictionary used for "C/C++ セキュアコーディング 第2版" published in 2014.
https://www.jpcert.or.jp/securecoding_book_2nd.html
Dictionary generated from Microsoft localization style guide.
https://www.microsoft.com/ja-jp/language/styleguides
Data is generated from this article:
https://www.atmarkit.co.jp/news/200807/25/microsoft.html
A customized version of --ms-style-guide. The original dictionary is automatically generated from published data; this one is customized for practical use.
An amendment dictionary can be found here. Please raise an issue or send a pull request if you have an update request.
This module was originally made to support Japanese text editing.
Japanese KATAKANA words have many variant spellings for the same word, so unification is important, but it is quite tiresome work. In the next example,
イ[エー]ハトー?([ヴブボ]ォ?) // イーハトーヴォ
the pattern on the left matches all of the following words.
イエハトブ イーハトヴ イーハトーヴ イーハトーヴォ イーハトーボ イーハトーブ
This module helps to detect and correct them.
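The claim about the katakana pattern can be checked with a short Python snippet. Python's re module is used here in place of Perl regular expressions, which behave the same way for this simple pattern; the variable names are hypothetical.

```python
import re

# Pattern and expected string from the dictionary entry above.
PATTERN = re.compile(r"イ[エー]ハトー?([ヴブボ]ォ?)")
EXPECTED = "イーハトーヴォ"

# The variants listed above.
VARIANTS = ["イエハトブ", "イーハトヴ", "イーハトーヴ", "イーハトーヴォ", "イーハトーボ", "イーハトーブ"]

# Every variant should match the pattern in full.
all_match = all(PATTERN.fullmatch(word) for word in VARIANTS)

# Variants that differ from the expected string are the ones to be corrected.
unexpected = [word for word in VARIANTS if word != EXPECTED]

print(all_match)
print(unexpected)
```

All six variants match the pattern, and the five that differ from the expected spelling are the ones a check would report.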
$ cpanm App::Greple::subst
https://github.com/kaz-utashiro/greple
https://github.com/kaz-utashiro/greple-subst
https://github.com/kaz-utashiro/greple-update
https://www.jtca.org/standardization/katakana_guide_3_20171222.pdf
https://www.jtf.jp/jp/style_guide/styleguide_top.html, https://www.jtf.jp/jp/style_guide/pdf/jtf_style_guide.pdf
https://www.microsoft.com/ja-jp/language/styleguides, https://www.atmarkit.co.jp/news/200807/25/microsoft.html
文化庁 国語施策・日本語教育 国語施策情報 内閣告示・内閣訓令 外来語の表記
https://qiita.com/kaz-utashiro/items/85add653a71a7e01c415
イーハトーブ
Kazumasa Utashiro
Copyright 2017-2023 Kazumasa Utashiro.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.