Commit eb370577 authored by Matthias Winkelmann, committed by Chris Waterson

Fixed links to evaluation data in makefile (#5402)

parent aec1fec6
@@ -6,7 +6,9 @@ Mtruk.csv
 SimLex-999.zip
 analogy
 fastprep
-myz_naacl13_test_set.tgz
+*.dSYM
 questions-words.txt
+word_relationship.*
+tensorflow/
 rw.zip
 ws353simrel.tar.gz
...@@ -155,10 +155,10 @@ You can do some simple exploration using `nearest.py`: ...@@ -155,10 +155,10 @@ You can do some simple exploration using `nearest.py`:
... ...
To evaluate the embeddings using common word similarity and analogy datasets, To evaluate the embeddings using common word similarity and analogy datasets,
use `eval.mk` to retrieve the data sets and build the tools: use `eval.mk` to retrieve the data sets and build the tools. Note that wordsim is currently not compatible with Python 3.x.
make -f eval.mk make -f eval.mk
./wordsim.py -v vocab.txt -e vecs.bin *.ws.tab ./wordsim.py --vocab vocab.txt --embeddings vecs.bin *.ws.tab
./analogy --vocab vocab.txt --embeddings vecs.bin *.an.tab ./analogy --vocab vocab.txt --embeddings vecs.bin *.an.tab
The word similarity evaluation compares the embeddings' estimate of "similarity" The word similarity evaluation compares the embeddings' estimate of "similarity"
......
@@ -59,9 +59,9 @@ simlex999.ws.tab: SimLex-999.zip
 mikolov.an.tab: questions-words.txt
 egrep -v -E '^:' $^ | tr '[A-Z] ' '[a-z]\t' > $@
-msr.an.tab: myz_naacl13_test_set.tgz
-tar Oxfz $^ test_set/word_relationship.questions | tr ' ' '\t' > /tmp/q
-tar Oxfz $^ test_set/word_relationship.answers | cut -f2 -d ' ' > /tmp/a
+msr.an.tab: word_relationship.questions word_relationship.answers
+cat word_relationship.questions | tr ' ' '\t' > /tmp/q
+cat word_relationship.answers | cut -f2 -d ' ' > /tmp/a
 paste /tmp/q /tmp/a > $@
 rm -f /tmp/q /tmp/a
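The `msr.an.tab` rule turns the two downloaded MSR files into one tab-separated analogy table: spaces in each question line become tabs, and only the second space-separated field of the matching answer line is appended. The `mikolov.an.tab` rule drops `:` section headers, lowercases ASCII capitals and turns spaces into tabs. The following is a minimal Python sketch of both transformations, assuming in-memory lines rather than files. The function names `mikolov_to_tab` and `msr_to_tab` are hypothetical, and the sample lines are invented for illustration, not taken from the real data sets.

```python
import string
from typing import Iterable, List

# tr '[A-Z] ' '[a-z]\t' maps ASCII capitals to lowercase and spaces to tabs.
_TR_TABLE = str.maketrans(string.ascii_uppercase + " ", string.ascii_lowercase + "\t")


def mikolov_to_tab(lines: Iterable[str]) -> List[str]:
    """Hypothetical mirror of: egrep -v -E '^:' | tr '[A-Z] ' '[a-z]\\t'."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(":"):
            # Drop section headers such as ": capital-common-countries".
            continue
        rows.append(line.translate(_TR_TABLE))
    return rows


def msr_to_tab(questions: Iterable[str], answers: Iterable[str]) -> List[str]:
    """Hypothetical mirror of the msr.an.tab recipe: tr on questions, cut -f2 on answers, paste."""
    rows = []
    # zip() stops at the shorter input, while paste would pad with empty fields,
    # so this assumes both files have the same number of lines.
    for question, answer in zip(questions, answers):
        q_cols = question.rstrip("\n").replace(" ", "\t")
        fields = answer.rstrip("\n").split(" ")
        # cut -f2 -d ' ' prints a line that contains no space unchanged.
        target = fields[1] if len(fields) > 1 else fields[0]
        rows.append(q_cols + "\t" + target)
    return rows


if __name__ == "__main__":
    # Invented sample lines, not taken from the real data sets.
    print(mikolov_to_tab([": capital-common-countries\n", "Athens Greece Baghdad Iraq\n"]))
    print(msr_to_tab(["good better bad\n"], ["JJ_JJR worse\n"]))
```

Using `str.translate` with an explicit ASCII table, rather than `str.lower()`, keeps the sketch closer to `tr`, which only maps the characters listed in its two sets.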
@@ -75,7 +75,7 @@ MEN.tar.gz:
 wget http://clic.cimec.unitn.it/~elia.bruni/resources/MEN.tar.gz
 Mtruk.csv:
-wget http://tx.technion.ac.il/~kirar/files/Mtruk.csv
+wget http://www.kiraradinsky.com/files/Mtruk.csv
 rw.zip:
 wget http://www-nlp.stanford.edu/~lmthang/morphoNLM/rw.zip
@@ -84,10 +84,13 @@ SimLex-999.zip:
 wget http://www.cl.cam.ac.uk/~fh295/SimLex-999.zip
 questions-words.txt:
-wget http://word2vec.googlecode.com/svn/trunk/questions-words.txt
+wget http://download.tensorflow.org/data/questions-words.txt
-myz_naacl13_test_set.tgz:
-wget http://research.microsoft.com/en-us/um/people/gzweig/Pubs/myz_naacl13_test_set.tgz
+word_relationship.questions:
+wget https://github.com/darshanhegde/SNLPProject/raw/master/word2vec/eval/word_relationship.questions
+word_relationship.answers:
+wget https://github.com/darshanhegde/SNLPProject/raw/master/word2vec/eval/word_relationship.answers
 analogy: analogy.cc
@@ -95,4 +98,4 @@ clean:
 rm -f *.ws.tab *.an.tab analogy *.pyc
 distclean: clean
-rm -f *.tgz *.tar.gz *.zip Mtruk.csv questions-words.txt
+rm -f *.tgz *.tar.gz *.zip Mtruk.csv questions-words.txt word_relationship.{questions,answers}
File mode changed from 100644 to 100755
@@ -38,7 +38,7 @@ class Vecs(object):
 'unexpected file size for binary vector file %s' % rows_filename)
 # Memory map the rows.
-dim = size / (4 * n)
+dim = round(size / (4 * n))
 rows_mm = mmap.mmap(rows_fh.fileno(), 0, prot=mmap.PROT_READ)
 rows = np.matrix(
 np.frombuffer(rows_mm, dtype=np.float32).reshape(n, dim))
...
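The `dim` change matters under Python 3, where `/` always returns a float and NumPy's `reshape` rejects a float dimension. Below is a minimal sketch that computes the column count of a raw, row-major float32 matrix from its byte size, as the `Vecs` code appears to do with `size / (4 * n)`. The helper names `row_width` and `load_rows` are hypothetical. Using floor division with an explicit divisibility check, and returning a plain `ndarray` instead of `np.matrix`, are our own choices, not the project's code.

```python
import numpy as np

# Size in bytes of one float32 entry (4).
FLOAT32_BYTES = np.dtype(np.float32).itemsize


def row_width(size_bytes: int, n_rows: int) -> int:
    """Hypothetical helper: columns in a raw n_rows x dim float32 matrix of size_bytes bytes."""
    bytes_per_column = FLOAT32_BYTES * n_rows
    if size_bytes % bytes_per_column != 0:
        raise ValueError("unexpected file size for binary vector file")
    # Floor division of two ints is always an int, so reshape() accepts it.
    return size_bytes // bytes_per_column


def load_rows(buf, n_rows: int) -> np.ndarray:
    """View a bytes-like buffer (bytes, mmap.mmap, ...) as an n_rows x dim float32 array."""
    dim = row_width(len(buf), n_rows)
    return np.frombuffer(buf, dtype=np.float32).reshape(n_rows, dim)


if __name__ == "__main__":
    data = np.arange(6, dtype=np.float32).tobytes()  # 24 bytes holding 2 rows
    print(load_rows(data, 2).shape)
    # Under Python 3, 24 / (4 * 2) is the float 3.0, while 24 // (4 * 2) is the int 3.
    print(24 / (4 * 2), 24 // (4 * 2))
```

Because `mmap.mmap` objects support the buffer protocol and `len()`, the same `load_rows` call works on a memory-mapped file as well as on an in-memory `bytes` object.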