make: /home/master/98/r98922053/srilm-1.5.10/sbin/machine-type: Command not found
make: /sbin/machine-type: Command not found
matherr.c:18:16: warning: 'struct exception' declared inside parameter list will not be visible outside of this definition or declaration
 matherr(struct exception *x)
                ^~~~~~~~~
matherr.c: In function 'matherr':
matherr.c:21:10: error: dereferencing pointer to incomplete type 'struct exception'
     if (x->type == SING && strcmp(x->name, "log10") == 0) {
          ^~
matherr.c:21:20: error: 'SING' undeclared (first use in this function)
     if (x->type == SING && strcmp(x->name, "log10") == 0) {
                    ^~~~
matherr.c:21:20: note: each undeclared identifier is reported only once for each function it appears in
matherr.c:29:1: warning: control reaches end of non-void function [-Wreturn-type]
 }
 ^
make[2]: *** [/home/master/98/r98922053/srilm-1.5.10/common/Makefile.common.targets:85: ../obj/i686-m64/matherr.o] Error 1
make[2]: Leaving directory '/home/master/98/r98922053/srilm-1.5.10/lm/src'
make[1]: *** [Makefile:77: release-libraries] Error 1
make[1]: Leaving directory '/home/master/98/r98922053/srilm-1.5.10'
make: *** [Makefile:51: World] Error 2
11 warnings and 1 error generated.
make[2]: *** [../obj/macosx/LatticeIndex.o] Error 1
make[1]: *** [release-libraries] Error 1
make: *** [World] Error 2
/usr/include/features.h:367:25 fatal error: sys/cdefs.h no such file or directory
-v
option. The full usage is:
docker run -it -v /home/r12345678/dsp_hw3:/root/dsp_hw3 ntudsp2020autumn/srilm
This mounts the host folder /home/r12345678/dsp_hw3 into the docker container at /root/dsp_hw3.
The docker ps command shows output like the following (add -a to list containers that have already exited, as in this example):
CONTAINER ID   IMAGE                    COMMAND       CREATED         STATUS                     PORTS   NAMES
6b0238b241f4   ntudsp2020autumn/srilm   "/bin/bash"   8 seconds ago   Exited (0) 4 seconds ago           practical_fermat
At this point, either of the following forms performs COMMAND (for example start or attach) on this container:
# Option 1:
docker COMMAND practical_fermat
# Option 2:
docker COMMAND 6b0238b241f4
The --name option, for example
docker run -it -v /home/r12345678/dsp_hw3:/root/dsp_hw3 --name my_new_name \
    ntudsp2020autumn/srilm
names the container my_new_name.
# Copy /path/to/file_to_be_copied inside the container out to /path/to/host_file on the host
docker cp CONTAINER_ID:/path/to/file_to_be_copied /path/to/host_file
# Copy /path/to/out_file on the host into the container at /path/to/docker_file
docker cp /path/to/out_file CONTAINER_ID:/path/to/docker_file
ㄅ 八 匕 卜 不 卞 巴 比 丙 包 ...
八 八
匕 匕
不 不
...
...
ㄆ 仆 匹 片 丕 叵 平 扒 扑 疋 ...
仆 仆
匹 匹
片 片
...
...
ㄦ 二 而 耳 兒 洱 貳 爾 餌 邇 ...
二 二
而 而
兒 兒
...
...
Note: in practice, the rows need not be in dictionary order (they do not have to follow ㄅㄆㄇ...; an order such as ㄧㄅㄍㄒ... is fine).
'\t'
), and the mapped characters must be separated from one another by a single space.
'big5-hkscs'
as the encoding.
python
also defaults to Python 3.
iconv -f big5 $filename
to view the file. To check the encoding, run file ZhuYin-Big5.map in a terminal; if the reported type starts with ISO, it should be fine.
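As a rough illustration of the ZhuYin-Big5.map layout described above, the following Python 3 sketch builds the map lines from a small in-memory table. The names build_zhuyin_map and write_map and the input dictionary (character to list of Zhuyin readings) are hypothetical, and parsing the real input file is left out. It also assumes that a tab separates the key from its characters, that single spaces separate the characters, and that the file is written as big5-hkscs.

```python
def build_zhuyin_map(readings):
    """Return ZhuYin-Big5 map lines from {character: [zhuyin reading, ...]}.

    Each character is listed under the first Zhuyin symbol of every one of
    its readings, and additionally maps to itself.
    """
    by_initial = {}
    for char, zhuyin_list in readings.items():
        for zhuyin in zhuyin_list:
            chars = by_initial.setdefault(zhuyin[0], [])
            if char not in chars:  # two readings may share the same initial
                chars.append(char)

    # Assumption: key and characters are separated by a tab,
    # characters by a single space.
    lines = [initial + "\t" + " ".join(chars)
             for initial, chars in by_initial.items()]
    lines.extend(char + "\t" + char for char in readings)
    return lines


def write_map(lines, path):
    # Assumption: the map file is stored as big5-hkscs, like the other data.
    with open(path, "w", encoding="big5-hkscs") as f:
        f.write("\n".join(lines) + "\n")
```

The rows come out grouped by initial first and self-mappings last, which is acceptable because the row order is not required to follow any particular sorting.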
SRIPATH ?= /root/srilm-1.5.10
MACHINE_TYPE ?= i686-m64

CXX = g++
CXXFLAGS = -O3 -I$(SRIPATH)/include -w --std=c++11
vpath lib%.a $(SRIPATH)/lib/$(MACHINE_TYPE)

TARGET = ngram_test
SRC = ngram_test.cpp
OBJ = $(SRC:.cpp=.o)

.PHONY: all clean

all: $(TARGET)

$(TARGET): $(OBJ) -loolm -ldstruct -lmisc
	$(CXX) $(LDFLAGS) -o $@ $^

%.o: %.cpp
	$(CXX) $(CXXFLAGS) -c $<

clean:
	$(RM) $(OBJ) $(TARGET)

The source code of ngram_test.cpp is:
#include <stdio.h>
#include "Ngram.h"

int main(int argc, char *argv[])
{
    int ngram_order = 3;
    Vocab voc;
    Ngram lm( voc, ngram_order );

    {
        // load the language model from file
        const char lm_filename[] = "./corpus.lm";
        File lmFile( lm_filename, "r" );
        lm.read(lmFile);
        lmFile.close();
    }

    // words missing from the vocabulary map to Vocab_None
    VocabIndex wid = voc.getIndex("囧");
    if(wid == Vocab_None) {
        printf("No word with wid = %d\n", wid);
        printf("where Vocab_None is %d\n", Vocab_None);
    }

    wid = voc.getIndex("患者");
    // the context lists the history most recent word first and ends with Vocab_None
    VocabIndex context[] = {voc.getIndex("癮") , voc.getIndex("毒"), Vocab_None};
    printf("log Prob(患者|毒-癮) = %f\n", lm.wordProb(wid, context));
}

With this, lm.wordProb can be used to obtain probabilities from the language model.
VocabIndex wid = voc.getIndex("囧");
if(wid == Vocab_None) {
    // replace OOV with <unk>
    wid = voc.getIndex(Vocab_Unknown);
}
// do something ...
If ngram-count prints
warning: discount coeff 1 is out of range: -0
or disambig prints
corpus_LM.txt: line 10: warning: non-zero probability for <unk> in closed-vocabulary LM
(the latter appears because we use <unk>, and we do want <unk> to be output for OOV words), both warnings can simply be ignored.
Before running disambig, did you remember to split test_data into individual characters with separator_big5.pl, just as was done for the training corpus?
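To make the "split into characters" step concrete, here is a minimal Python 3 sketch of the idea. It is not the provided separator_big5.pl and may differ from it in details such as how ASCII text is handled. The function names are hypothetical, and the files are assumed to be encoded as big5-hkscs.

```python
def separate_characters(line):
    """Put a single space between every non-whitespace character of a line."""
    return " ".join(ch for ch in line if not ch.isspace())


def separate_file(src_path, dst_path, encoding="big5-hkscs"):
    # Decode before splitting so that a multi-byte Big5 character
    # is never cut in half.
    with open(src_path, encoding=encoding) as src, \
         open(dst_path, "w", encoding=encoding) as dst:
        for line in src:
            dst.write(separate_characters(line) + "\n")
```

Whatever tool is used, the point is that the test data and the training corpus go through the same segmentation, so that the language model and the decoder see the same units.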
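Why does example_ans.txt differ from the output format promised in the slides?
<s> 讓 他 十 分 害 怕 </s>
<s> 只 希 望 自 己 明 年 度 別 再 這 麼 苦 命 了 </s>
<s> 演 藝 娛 樂 產 業 加 入 積 極 轉 型 提 升 競 爭 力 </s>
<s> 明 天 就 是 年 </s>
<s> 台 灣 將 正 式 加 入 世 界 貿 易 組 織 </s>
<s> 因 應 全 球 化 市 場 國 際 競 爭 </s>
...
This is the correct format. For easier comparison, a correctly formatted sample is provided here. Note, however, that the result of running disambig on example.txt will not necessarily match example_sample.txt exactly, because differences in the LM training corpus and other factors lead to different models.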
$1 segmented file to be decoded
$2 ZhuYin-Big5 mapping
$3 language model
$4 output file
./mydisambig_trigram $1 $2 $3 $4
that is, the same argument format as the bigram version.
./mydisambig $1 $2 $3 $4 --tri
as the decoding command.
setup.sh does not work?
If running it fails with the error
[: -v: unary operator expected
the likely cause is a different bash version. You can change setup.sh as follows.
Originally:
if [ ! -v SRILM_BIN_PATH ]; then
    export SRILM_BIN_PATH="$SRILM_REP_PATH/bin/$MACHINE_TYPE"
    export PATH="$SRILM_BIN_PATH:$PATH"
    export PS1="(srilm) $PS1"
fi
Change it to:
set_var() {
export SRILM_BIN_PATH="$SRILM_REP_PATH/bin/$MACHINE_TYPE"
export PATH="$SRILM_BIN_PATH:$PATH"
export PS1="(srilm) $PS1"
}
not_set_var () { :;}
[[ $SRILM_BIN_PATH && ${SRILM_BIN_PATH-x} ]] && not_set_var || set_var
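This is because the -v test, which was added in bash 4.2, is not accepted by older versions of bash. Students using docker should not run into this problem.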
Some students run into this kind of error.
Possible causes of this problem include
Yes. The standard library, including <map>, <vector>, and <algorithm>, may be used; basically anything that compiles without errors is fine.
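If you print with cout, cerr, and the like in C++ but the output is unreadable mojibake, you can run
./mydisambig ... | iconv -f big5
Appending | iconv -f big5 converts the output correctly; alternatively, save the output to a file and convert it as described in the earlier FAQ entries. Note that if your source code contains Chinese characters, it must be saved as Big5-HKSCS.
This has little to do with the implementation itself, but it is worth a reminder in case an editor changes the encoding on save and corrupts the results. Big5-HKSCS is the reference encoding.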
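For those who prefer to inspect the output from Python rather than with iconv, the sketch below does roughly the same conversion. The function names are hypothetical. It assumes the bytes are Big5-HKSCS, which Python exposes as the big5-hkscs codec, and it replaces undecodable bytes instead of raising an error.

```python
import sys


def decode_big5(raw, errors="replace"):
    """Decode Big5-HKSCS bytes into a Python string."""
    return raw.decode("big5-hkscs", errors=errors)


def main():
    # Read raw bytes from stdin so that no implicit decoding happens first.
    sys.stdout.write(decode_big5(sys.stdin.buffer.read()))
```

A script that calls main() could then take the place of the iconv stage in the pipeline above, provided the terminal itself displays UTF-8.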
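testError.cc
contains some handy hints on using the library. If that still does not help, look up the documentation online.
What is ?= ?
=> SRILM = /root/srilm-1.5.10
See Q3 for more details.
The SRILM paths below all use this path as the example.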
NO_TCL = X
TCL_INCLUDE =
TCL_LIBRARY =
After editing, run make MACHINE_TYPE=i686-m64 World
Once installed, the SRILM executables can be found in bin/i686-m64
line 13: change to CC = /usr/bin/gcc $(GCC_FLAGS) -Wimplicit-int
line 14: change to CXX = /usr/bin/g++ $(GCC_FLAGS) -DINSTANTIATE_TEMPLATES
lines 49, 50: comment out
MACHINE_TYPE = macosx
CXX = /usr/bin/g++