Last week, I spent some time learning how to install and use SyntaxNet and Parsey McParseface. The hardest part was following the installation steps in SyntaxNet's GitHub repository. Getting the right versions of Java, Bazel, Python, protobuf, asciitree, numpy and swig was exhausting. On top of that, on my modest laptop with 6 GB of RAM, the installation ran for well over six hours. If you don't want to spend so much time just to experiment with Parsey McParseface, using a pre-built Docker image is the way to go. In this tutorial, I'll show you how.
Prerequisites:
- A 64-bit computer with at least 2 GB of RAM
- The latest version of Docker
- Ubuntu
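Before pulling the image, you might want to confirm the first two prerequisites and check that Docker is installed. Here is a minimal Bash sketch for a Linux host. It assumes `getconf` and `/proc/meminfo` are available, and `has_min_ram` and `check_prereqs` are hypothetical helper names, not part of Docker or SyntaxNet.

```shell
# Hypothetical helpers for checking the prerequisites on a Linux host.

# has_min_ram FILE MIN_KB
# Succeeds if the MemTotal line in FILE (normally /proc/meminfo)
# reports at least MIN_KB kilobytes.
has_min_ram() {
  awk -v min="$2" '
    /^MemTotal:/ { found = 1; ok = ($2 + 0 >= min + 0) }
    END { exit !(found && ok) }
  ' "$1"
}

# check_prereqs prints a warning for each prerequisite that looks unmet.
check_prereqs() {
  # getconf LONG_BIT prints 64 on a 64-bit system.
  if [ "$(getconf LONG_BIT)" != "64" ]; then
    echo "warning: this does not look like a 64-bit system" >&2
  fi
  # 2 GB expressed in kB. MemTotal is slightly below the installed RAM,
  # so a machine with exactly 2 GB may still trigger this warning.
  if ! has_min_ram /proc/meminfo $((2 * 1024 * 1024)); then
    echo "warning: less than 2 GB of RAM reported" >&2
  fi
  # Docker has to be installed and on the PATH.
  if ! command -v docker >/dev/null 2>&1; then
    echo "warning: docker was not found on the PATH" >&2
  fi
}
```

After defining the functions, run `check_prereqs`. It only prints warnings, so it never stops you from continuing.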
1. Pulling the SyntaxNet Image
There may be other SyntaxNet images available, but in this tutorial we'll pull the image created by brianlow:
docker pull brianlow/syntaxnet-docker
Depending on the speed of your network connection, you might have to wait a while, because the image is about 1 GB.
2. Using SyntaxNet
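Now that you have the SyntaxNet image, you need to create a new container from it and run a Bash shell inside that container. The following command does both; --rm deletes the container when you exit the shell, and -i -t give you an interactive terminal: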
docker run --name mcparseface --rm -i -t brianlow/syntaxnet-docker bash
Using Parsey McParseface directly is a little complicated. Thankfully, it comes with a very handy shell script called demo.sh. All you need to do is pass an English sentence to it.
echo "Bob, a resident of Yorkshire, loves his wife and children" \
    | syntaxnet/demo.sh
The output of demo.sh is a tree:
Input: Bob , a resident of Yorkshire , loves his wife and children
Parse:
loves VBZ ROOT
 +-- Bob NNP nsubj
 |   +-- , , punct
 |   +-- resident NN appos
 |       +-- a DT det
 |       +-- of IN prep
 |           +-- Yorkshire NNP pobj
 +-- , , punct
 +-- wife NN dobj
     +-- his PRP$ poss
     +-- and CC cc
     +-- children NNS conj
As you can see, each word in the tree is associated with a part-of-speech tag and a dependency label. For example, the word “loves” has the tag “VBZ”, which marks a third-person singular present-tense verb, and Parsey McParseface correctly identifies it as the root of the sentence. You can probably tell that “NN” means singular noun and “NNS” means plural noun. Then there are less intuitive labels such as “appos”, which is short for appositional modifier, and “amod”, which is short for adjectival modifier. The part-of-speech tags come from the Penn Treebank tag set, and you can look up many of the dependency labels, such as “appos” and “amod”, on Universal Dependencies.
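If you want the tags without the drawing characters, a small awk filter can flatten the tree into one word, tag and label per line. This is a minimal sketch that assumes demo.sh prints an `Input:` line, a `Parse:` line on its own, and then one node per line, indented with spaces and `|` characters and prefixed by `+--`. The function name `tree_tags` is hypothetical.

```shell
# tree_tags: hypothetical filter that flattens demo.sh tree output.
# Reads the tree on stdin and prints "word<TAB>tag<TAB>label" per node.
# Assumes an "Input:" line, a "Parse:" line, then one node per line.
tree_tags() {
  awk '
    /^Input:/ || /^Parse:/ { next }   # skip the two header lines
    NF == 0 { next }                  # skip blank lines
    {
      # Strip the indentation, the "|" guides and the "+-- " branch marker.
      sub(/^[ |]*([+]-- )?/, "")
      # What remains is: word, part-of-speech tag, dependency label.
      if (NF >= 3) print $1 "\t" $2 "\t" $3
    }
  '
}
```

Inside the container, you could then run `echo "Bob, a resident of Yorkshire, loves his wife and children" | syntaxnet/demo.sh | tree_tags`.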
I experimented with a few more complicated sentences, and Parsey McParseface had no trouble parsing them. Here’s one example:
Input: Bob 's wife , a grumpy old woman , asked him to sleep in the barn
Parse:
asked VBD ROOT
 +-- wife NN nsubj
 |   +-- Bob NNP poss
 |   |   +-- 's POS possessive
 |   +-- , , punct
 |   +-- woman NN appos
 |       +-- a DT det
 |       +-- grumpy JJ amod
 |       +-- old JJ amod
 +-- him PRP dobj
 +-- sleep VB xcomp
     +-- to TO aux
     +-- in IN prep
         +-- barn NN pobj
             +-- the DT det
Here’s another example, which is slightly ambiguous:
Input: Say hello to my little friend , Bob
Parse:
Say VB ROOT
 +-- hello UH discourse
 +-- to IN prep
     +-- friend NN pobj
         +-- my PRP$ poss
         +-- little JJ amod
         +-- , , punct
         +-- Bob NNP appos
Here, “Bob” could be the person being addressed, or another name for “my little friend”; Parsey McParseface picked the second reading and attached “Bob” to “friend” as an appositive.
The actual output of the SyntaxNet parser is a CoNLL table. demo.sh passes that table to a Python script called conll2tree to generate the tree. If you are interested in looking at the CoNLL table, all you need to do is comment out the call to conll2tree in demo.sh. Here’s a sample CoNLL table:
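The CoNLL format is less intuitive to read, but it is easier to work with in a program. Each row describes one word, and the columns are separated by tabs; in the CoNLL-X layout, column 2 holds the word itself, column 4 its coarse part-of-speech tag, column 7 the ID of its head word, and column 8 the dependency label. For example, I can quickly list all the nouns in a sentence using a simple awk program: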
awk -F'\t' '$4 == "NOUN" {print $2}' output.conll
Here, output.conll is a file holding the parser’s CoNLL output. With slightly more complex programs, you can determine details such as who the subject is, which adjectives are associated with it, who or what the dative object is, and so on. I hope you are now beginning to see the value of this powerful parser.
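As a starting point for such programs, here is a minimal awk sketch that prints the subject of a sentence followed by the adjectives attached to it. It assumes a file containing one parsed sentence in the CoNLL-X layout (word in column 2, head ID in column 7, dependency label in column 8) and the `nsubj`, `appos` and `amod` labels used in the trees above. It also counts adjectives on an appositive of the subject, since in a phrase like "Bob's wife, a grumpy old woman" they describe the subject too. The name `subject_adjectives` is hypothetical.

```shell
# subject_adjectives: hypothetical helper. Reads one sentence in CoNLL-X
# format on stdin and prints "subject: adjective adjective ...".
subject_adjectives() {
  awk -F'\t' '
    NF >= 8 {
      id = $1 + 0
      form[id] = $2        # the word itself
      head[id] = $7 + 0    # ID of the head word (0 for the root)
      rel[id] = $8         # dependency label
      if (id > n) n = id
    }
    END {
      # Find the nominal subject; if there are several, the last one wins.
      for (i = 1; i <= n; i++) if (rel[i] == "nsubj") s = i
      if (!s) exit 1
      # The subject and any appositive attached to it can carry adjectives.
      desc[s] = 1
      for (i = 1; i <= n; i++) if (rel[i] == "appos" && head[i] == s) desc[i] = 1
      # Collect the adjectival modifiers (amod) of those words, in order.
      line = form[s] ":"
      for (i = 1; i <= n; i++)
        if (rel[i] == "amod" && (head[i] in desc)) line = line " " form[i]
      print line
    }
  '
}
```

Usage: `subject_adjectives < output.conll`. Going by its tree, the "grumpy old woman" sentence above should produce `wife: grumpy old`. If a sentence has no `nsubj` row, the function prints nothing and returns a non-zero status.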
That’s all for now. Thanks for reading. If you found this tutorial useful, please do share it.
Great article
"The actual output of the SyntaxNet parser is a CoNLL table."
Could you point me in the right direction on how to get this CoNLL table into a MySQL database?
I want to port the SyntaxNet model to native Android, but there is no documentation for this model. Google's TensorFlow examples include one that identifies objects using the TensorFlow library, and I want to port SyntaxNet to Android in a similar way. Any help in this regard would be much appreciated.
As I understand the basic concept, the TensorFlow model first needs to be initialized with the respective model, the input size defined, and so on; then you run the TensorFlow session and expect to get back the list of POS tags.
SyntaxNet has lots of Python scripts, and I am new to Python. Please guide me.