Usage

Examples

  1. S = Tag covers Single Word.

  2. B = Tag Begins with the Word.

  3. I = Word is internal to tag which has begun.

  4. E = Tag Ends with the Word.

  5. O = Other tags.

 Example:

 ('Republican', 'B-NP'), ('candidate', 'I-NP'), ('George', 'I-NP'),
 ('Bush', 'E-NP'), ('was', 'S-VP'), ('great', 'S-ADJP'), ('.', 'O')

 means:
 [Republican candidate George Bush]NP [was]VP [great]ADJP
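
The tag scheme above can be decoded mechanically. The following is a minimal sketch, not part of pntl: the helper name `group_chunks` is hypothetical, and it assumes the input is a list of `(word, tag)` pairs in the S/B/I/E/O scheme described here.

```python
def group_chunks(tagged):
    """Group (word, tag) pairs in the S/B/I/E/O scheme into (label, phrase) chunks."""
    chunks = []
    current = []  # words of the chunk that is currently open
    for word, tag in tagged:
        if tag == "O":
            current = []  # outside any chunk
            continue
        # Split only on the first "-", so labels such as "AM-TMP" stay intact.
        prefix, _, label = tag.partition("-")
        if prefix == "S":
            chunks.append((label, word))
            current = []
        elif prefix == "B":
            current = [word]
        elif prefix == "I":
            current.append(word)
        elif prefix == "E":
            current.append(word)
            chunks.append((label, " ".join(current)))
            current = []
    return chunks


example = [
    ("Republican", "B-NP"), ("candidate", "I-NP"), ("George", "I-NP"),
    ("Bush", "E-NP"), ("was", "S-VP"), ("great", "S-ADJP"), (".", "O"),
]
print(group_chunks(example))
```

The same helper should also work on the 'chunk' and 'ner' lists, since they use the same prefixes.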

Annotator is the only class you need. Create an annotator object.

pntl
| -- tools
     | --class-- Annotator
     | --jar-- stanford-parser
| -- utils
     | --function-- skipgrams

Note

The in.parse file holds the syntax tree (for now), which is used as input for the dependency parser. Note that only the output of the most recently processed sentence is stored.

Annotator[class]

>>> from pntl.tools import Annotator
>>> annotator = Annotator(senna_dir = "/home/user/senna", stp_dir = "/home/user/stanford_parser_folder")
>>> # changing the SENNA path at run time is also possible
>>> annotator.senna_dir = "/home/user/senna"
>>> annotator.senna_dir  # returns the path name
  "/home/user/senna"
>>> # stanford-parser.jar must be present inside this folder
>>> annotator.stp_dir = "/home/user/stanford_parser_folder"
>>> annotator.java_cli
  java -cp stanford-parser.jar edu.stanford.nlp.trees.EnglishGrammaticalStructure -treeFile in.parse -collapsed
>>> # setting the CLI
>>> annotator.java_cli = "java -cp stanford-parser.jar edu.stanford.nlp.trees.EnglishGrammaticalStructure -treeFile in.parse"

Self-testing

Warning

This function is deprecated.

To test the installation yourself, use the function test().

>>> from pntl.tools import test
>>> # pass the location of SENNA; if SENNA is present, the following output is printed
>>> test(senna_path="/home/user/senna",
...      stp_dir="/home/user/stanford_parser_folder/stanford_parser.jar")
  conll:
               He        PRP                -       S-A0        S-A0        S-A0
          created        VBD          created        S-V           O           O
              the         DT                -       B-A1           O           O
            robot         NN                -       E-A1           O           O
              and         CC                -          O           O           O
            broke        VBD            broke          O         S-V           O
               it        PRP                -          O        S-A1           O
            after         IN                -          O    B-AM-TMP           O
           making        VBG           making          O    I-AM-TMP         S-V
              it.        PRP                -          O    E-AM-TMP        S-A1
  dep_parse:
   nsubj(created-2, He-1)
  root(ROOT-0, created-2)
  det(robot-4, the-3)
  dobj(created-2, robot-4)
  conj_and(created-2, broke-6)
  dobj(broke-6, it-7)
  prepc_after(broke-6, making-9)
  dobj(making-9, it.-10)
  chunk:
   [('He', 'S-NP'), ('created', 'S-VP'), ('the', 'B-NP'), ('robot', 'E-NP'), ('and', 'O'), ('broke', 'S-VP'), ('it', 'S-NP'), ('after', 'S-PP'), ('making', 'S-VP'), ('it.', 'S-NP')]
  pos:
   [('He', 'PRP'), ('created', 'VBD'), ('the', 'DT'), ('robot', 'NN'), ('and', 'CC'), ('broke', 'VBD'), ('it', 'PRP'), ('after', 'IN'), ('making', 'VBG'), ('it.', 'PRP')]
  ner:
   [('He', 'O'), ('created', 'O'), ('the', 'O'), ('robot', 'O'), ('and', 'O'), ('broke', 'O'), ('it', 'O'), ('after', 'O'), ('making', 'O'), ('it.', 'O')]
  srl:
   [{'A1': 'the robot', 'V': 'created', 'A0': 'He'}, {'A1': 'it', 'AM-TMP': 'after making it.', 'V': 'broke', 'A0': 'He'}, {'A1': 'it.', 'V': 'making', 'A0': 'He'}]
  syntax tree:
   (S1(S(NP(PRP He))(VP(VP(VBD created)(NP(DT the)(NN robot)))(CC and)(VP(VBD broke)(NP(PRP it))(PP(IN after)(S(VP(VBG making)(NP(PRP it.)))))))))
  words:
   ['He', 'created', 'the', 'robot', 'and', 'broke', 'it', 'after', 'making', 'it.']
  skip gram:
   [('He', 'created', 'the'), ('He', 'created', 'robot'), ('He', 'created', 'and'),
   ('He', 'the', 'robot'), ('He', 'the', 'and'), ('He', 'robot', 'and'), ('created', 'the', 'robot'),
   ('created', 'the', 'and'), ('created', 'the', 'broke'), ('created', 'robot', 'and'),
    ('created', 'robot', 'broke'), ('created', 'and', 'broke'), ('the', 'robot', 'and'),
    ('the', 'robot', 'broke'), ('the', 'robot', 'it'), ('the', 'and', 'broke'),
    ('the', 'and', 'it'), ('the', 'broke', 'it'), ('robot', 'and', 'broke'),
    ('robot', 'and', 'it'), ('robot', 'and', 'after'), ('robot', 'broke', 'it'),
    ('robot', 'broke', 'after'), ('robot', 'it', 'after'),
    ('and', 'broke', 'it'), ('and', 'broke', 'after'), ('and', 'broke', 'making'),
    ('and', 'it', 'after'), ('and', 'it', 'making'), ('and', 'after', 'making'),
    ('broke', 'it', 'after'), ('broke', 'it', 'making'), ('broke', 'it', 'it.'),
    ('broke', 'after', 'making'), ('broke', 'after', 'it.'), ('broke', 'making', 'it.'),
    ('it', 'after', 'making'), ('it', 'after', 'it.'), ('it', 'making', 'it.'), ('after', 'making', 'it.')]

Note

Run depParser.sh to apply the English PCFG parser to one or more files, printing trees only.

Warning

If you encounter an error such as (Unable to resolve "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz" as either class path, filename or URL), you need to have Stanford CoreNLP installed.

The function get_annoations(sentence) returns a dictionary of annotations.

>>> annotator.get_annoations("There are people dying make this world a better place for you and for me.")
   {'dep_parse': '',
    'chunk': [('There', 'S-NP'), ('are', 'S-VP'), ('people', 'S-NP'), ('dying', 'B-VP'), ('make', 'E-VP'), ('this', 'B-NP'), ('world', 'E-NP'), ('a', 'B-NP'), ('better', 'I-NP'), ('place', 'E-NP'), ('for', 'S-PP'), ('you', 'S-NP'), ('and', 'O'), ('for', 'S-PP'), ('me.', 'S-NP')],
    'pos': [('There', 'EX'), ('are', 'VBP'), ('people', 'NNS'), ('dying', 'VBG'), ('make', 'VB'), ('this', 'DT'), ('world', 'NN'), ('a', 'DT'), ('better', 'JJR'), ('place', 'NN'), ('for', 'IN'), ('you', 'PRP'), ('and', 'CC'), ('for', 'IN'), ('me.', '.')],
    'srl': [{'A1': 'people', 'V': 'dying'},
            {'A1': 'people  this world', 'A2': 'a better place for you and for me.', 'V': 'make'}],
    'syntax_tree': '(S1(S(NP(EX There))(VP(VBP are)(NP(NP(NNS people))(SBAR(S(VBG dying)(VP(VB make)(S(NP(DT this)(NN world))(NP(DT a)(JJR better)(NN place)))(PP(PP(IN for)(NP(PRP you)))(CC and)(PP(IN for)(NP(. me.)))))))))))',
    'verbs': ['dying', 'make'],
    'words': ['There', 'are', 'people', 'dying', 'make', 'this', 'world', 'a', 'better', 'place', 'for', 'you', 'and', 'for', 'me.'],
    'ner': [('There', 'O'), ('are', 'O'), ('people', 'O'), ('dying', 'O'), ('make', 'O'), ('this', 'O'), ('world', 'O'), ('a', 'O'), ('better', 'O'), ('place', 'O'), ('for', 'O'), ('you', 'O'), ('and', 'O'), ('for', 'O'), ('me.', 'O')]}

Calling get_annoations(sentence, dep_parse=True) returns a dictionary of annotations that includes the dependency parse; it is switched off by default.

>>> annotator.get_annoations("There are people dying make this world a better place for you and for me.", dep_parse=True)
   {'dep_parse': 'expl(are-2, There-1)\nroot(ROOT-0, are-2)\nnsubj(are-2, people-3)\ndep(make-5, dying-4)\nrcmod(people-3, make-5)\ndet(world-7, this-6)\nnsubj(place-10, world-7)\ndet(place-10, a-8)\namod(place-10, better-9)\nxcomp(make-5, place-10)\nprep_for(make-5, you-12)\nconj_and(you-12, me.-15)',
    'chunk': [('There', 'S-NP'), ('are', 'S-VP'), ('people', 'S-NP'), ('dying', 'B-VP'), ('make', 'E-VP'), ('this', 'B-NP'), ('world', 'E-NP'), ('a', 'B-NP'), ('better', 'I-NP'), ('place', 'E-NP'), ('for', 'S-PP'), ('you', 'S-NP'), ('and', 'O'), ('for', 'S-PP'), ('me.', 'S-NP')],
    'pos': [('There', 'EX'), ('are', 'VBP'), ('people', 'NNS'), ('dying', 'VBG'), ('make', 'VB'), ('this', 'DT'), ('world', 'NN'), ('a', 'DT'), ('better', 'JJR'), ('place', 'NN'), ('for', 'IN'), ('you', 'PRP'), ('and', 'CC'), ('for', 'IN'), ('me.', '.')],
    'srl': [{'A1': 'people', 'V': 'dying'},
            {'A1': 'people  this world', 'A2': 'a better place for you and for me.', 'V': 'make'}],
    'syntax_tree': '(S1(S(NP(EX There))(VP(VBP are)(NP(NP(NNS people))(SBAR(S(VBG dying)(VP(VB make)(S(NP(DT this)(NN world))(NP(DT a)(JJR better)(NN place)))(PP(PP(IN for)(NP(PRP you)))(CC and)(PP(IN for)(NP(. me.)))))))))))',
    'verbs': ['dying', 'make'],
    'words': ['There', 'are', 'people', 'dying', 'make', 'this', 'world', 'a', 'better', 'place', 'for', 'you', 'and', 'for', 'me.'],
    'ner': [('There', 'O'), ('are', 'O'), ('people', 'O'), ('dying', 'O'), ('make', 'O'), ('this', 'O'), ('world', 'O'), ('a', 'O'), ('better', 'O'), ('place', 'O'), ('for', 'O'), ('you', 'O'), ('and', 'O'), ('for', 'O'), ('me.', 'O')]}

You can access individual components as:

>>> annotator.get_annoations("Jawahar is a good boy.")['pos']
  [('Jawahar', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('good', 'JJ'), ('boy', 'NN'), ('.', '.')]
>>> annotator.get_annoations("Jawahar is a good boy.")['ner']
  [('Jawahar', 'S-PER'), ('is', 'O'), ('a', 'O'), ('good', 'O'), ('boy', 'O'), ('.', 'O')]
>>> annotator.get_annoations("Jawahar is a good boy.")['chunk']
  [('Jawahar', 'S-NP'), ('is', 'S-VP'), ('a', 'B-NP'), ('good', 'I-NP'), ('boy', 'E-NP'), ('.', 'O')]

To list the verbs for which semantic roles are found:

>>> annotator.get_annoations("He created the robot and broke it after making it.")['verbs']
   ['created', 'broke', 'making']

'srl' returns a list of dictionaries identifying the semantic roles for the various verbs in the sentence.

>>> annotator.get_annoations("He created the robot and broke it after making it.")['srl']
    [{'A1': 'the robot', 'A0': 'He', 'V': 'created'}, {'A1': 'it', 'A0': 'He', 'AM-TMP': 'after making it.', 'V': 'broke'}, {'A1': 'it.', 'A0': 'He', 'V': 'making'}]
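
Because each dictionary carries its verb under the 'V' key, the list can be re-indexed by verb for easier lookup. This is a minimal sketch on the output shown above; `roles_by_verb` is a hypothetical helper, not part of pntl, and it assumes each verb occurs only once in the sentence (a repeated verb would overwrite the earlier entry).

```python
def roles_by_verb(frames):
    """Map each verb to its semantic roles, dropping the 'V' entry itself."""
    return {
        frame["V"]: {role: text for role, text in frame.items() if role != "V"}
        for frame in frames
    }


# The 'srl' output shown above, copied in as literal data.
srl = [
    {"A1": "the robot", "A0": "He", "V": "created"},
    {"A1": "it", "A0": "He", "AM-TMP": "after making it.", "V": "broke"},
    {"A1": "it.", "A0": "He", "V": "making"},
]

roles = roles_by_verb(srl)
print(roles["broke"]["AM-TMP"])
```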

'syntax_tree' returns the syntax tree in Penn Treebank format.

>>> annotator.get_annoations("He created the robot and broke it after making it.")['syntax_tree']
    '(S1(S(NP(PRP He))(VP(VP(VBD created)(NP(DT the)(NN robot)))(CC and)(VP(VBD broke)(NP(PRP it))(PP(IN after)(S(VP(VBG making)(NP(PRP it.)))))))))'
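
The bracketed string can be turned into a nested structure with a few lines of code. The sketch below is an illustration only: `parse_tree` and `leaves` are hypothetical helpers, not part of pntl, and the input is a verbatim subtree of the output above.

```python
import re


def parse_tree(text):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1              # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())       # nested constituent
            else:
                children.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1              # consume ")"
        return (label, children)

    return node()


def leaves(tree):
    """Collect the words of a parsed tree from left to right."""
    words = []
    for child in tree[1]:
        if isinstance(child, tuple):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words


subtree = parse_tree("(VP(VBD created)(NP(DT the)(NN robot)))")
print(subtree)
print(leaves(subtree))
```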

Note

'dep_parse' returns the dependency relations as a string, with one relation per line. You may need to post-process it.

Note

dep_parse may not work properly if the Stanford dependency parser is not present in the practnlptools folder. To change the output format, edit `lexparser.sh` (self-testing only) if you know what you are doing.

To learn about the output format, see the Stanford Parser FAQ link and manual link.

>>> annotator.get_annoations("He created the robot and broke it after making it.",dep_parse=True)['dep_parse']
    nsubj(created-2, He-1)
    root(ROOT-0, created-2)
    det(robot-4, the-3)
    dobj(created-2, robot-4)
    conj_and(created-2, broke-6)
    dobj(broke-6, it-7)
    prepc_after(broke-6, making-9)
    dobj(making-9, it.-10)
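
One possible form of the post-processing mentioned above is to split each line into its relation name, governor and dependent. This is a minimal sketch under the assumption that every line has the `relation(governor-index, dependent-index)` shape shown in the output; `parse_dependencies` is a hypothetical helper, not part of pntl.

```python
import re

# One relation per line, e.g. "nsubj(created-2, He-1)".
RELATION = re.compile(r"^(\S+?)\((.+)-(\d+), (.+)-(\d+)\)$")


def parse_dependencies(dep_parse):
    """Turn the dep_parse string into (relation, (governor, i), (dependent, j)) tuples."""
    relations = []
    for line in dep_parse.splitlines():
        match = RELATION.match(line.strip())
        if match:  # silently skip lines that do not look like a relation
            rel, gov, gov_idx, dep, dep_idx = match.groups()
            relations.append((rel, (gov, int(gov_idx)), (dep, int(dep_idx))))
    return relations


# The dependency output shown above, copied in as literal data.
dep_parse = (
    "nsubj(created-2, He-1)\n"
    "root(ROOT-0, created-2)\n"
    "det(robot-4, the-3)\n"
    "dobj(created-2, robot-4)\n"
    "conj_and(created-2, broke-6)\n"
    "dobj(broke-6, it-7)\n"
    "prepc_after(broke-6, making-9)\n"
    "dobj(making-9, it.-10)"
)

relations = parse_dependencies(dep_parse)
for relation in relations:
    print(relation)
```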

Note: For illustration purposes we have used:

>>> annotator.get_annoations("He created the robot and broke it after making it.",dep_parse=True)['dep_parse']

A better method is to call the function once and reuse the result:

>>> annotation = annotator.get_annoations("He created the robot and broke it after making it.", dep_parse=True)
>>> ner = annotation['ner']
>>> srl = annotation['srl']

get_conll_format(sentence, options='-srl -pos -ner -chk -psg')

This function returns the CoNLL-format output that the SENNA tool produces during processing. The options argument must be a string; it is converted to a list and passed on to the underlying shell call.

>>> annotator.get_conll_format("He created the robot and broke it after making it.", options='-srl -pos')
             He        PRP                -       S-A0        S-A0        S-A0
        created        VBD          created        S-V           O           O
            the         DT                -       B-A1           O           O
          robot         NN                -       E-A1           O           O
            and         CC                -          O           O           O
          broke        VBD            broke          O         S-V           O
             it        PRP                -          O        S-A1           O
          after         IN                -          O    B-AM-TMP           O
         making        VBG           making          O    I-AM-TMP         S-V
            it.        PRP                -          O    E-AM-TMP        S-A1
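
The column layout above can be split into rows with plain string handling. The sketch below assumes the output has been captured as a single string with whitespace-separated columns, as in the example; `parse_conll` is a hypothetical helper, not part of pntl, and the sample holds only the first four rows for brevity.

```python
def parse_conll(text):
    """Split CoNLL-style text into one list of fields per non-empty line."""
    return [line.split() for line in text.splitlines() if line.strip()]


# First rows of the output shown above (options='-srl -pos').
conll = """
             He        PRP                -       S-A0        S-A0        S-A0
        created        VBD          created        S-V           O           O
            the         DT                -       B-A1           O           O
          robot         NN                -       E-A1           O           O
"""

rows = parse_conll(conll)
# In this layout: word, POS tag, verb marker, then one role column per verb.
words = [row[0] for row in rows]
verbs = [row[2] for row in rows if row[2] != "-"]
print(words, verbs)
```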

To get help for this function, use the class method help_conll_format():

>>> Annotator.help_conll_format()

pntl.utils.skipgrams(sentence, n=2, k=1)

n = the value for n-grams

k = the skip value

skipgrams() returns its output as a generator for better memory management.

>>> from pntl.utils import skipgrams
>>> sent = "He created the robot and broke it after making it."
>>> # returns a generator
>>> list(skipgrams(sent.split(), n=3, k=2))
[('He', 'created', 'the'), ('He', 'created', 'robot'), ('He', 'created', 'and'),
 ('He', 'the', 'robot'), ('He', 'the', 'and'),
 ('He', 'robot', 'and'),
  ('created', 'the', 'robot'), ('created', 'the', 'and'),
   ('created', 'the', 'broke'), ('created', 'robot', 'and'), ('created', 'robot', 'broke'), ('created', 'and', 'broke'),
 ('the', 'robot', 'and'), ('the', 'robot', 'broke'), ('the', 'robot', 'it'), ('the', 'and', 'broke'),
 ('the', 'and', 'it'), ('the', 'broke', 'it'), ('robot', 'and', 'broke'), ('robot', 'and', 'it'),
  ('robot', 'and', 'after'), ('robot', 'broke', 'it'), ('robot', 'broke', 'after'),
  ('robot', 'it', 'after'), ('and', 'broke', 'it'), ('and', 'broke', 'after'),
   ('and', 'broke', 'making'), ('and', 'it', 'after'), ('and', 'it', 'making'),
   ('and', 'after', 'making'),
  ('broke', 'it', 'after'), ('broke', 'it', 'making'),
  ('broke', 'it', 'it.'),
   ('broke', 'after', 'making'), ('broke', 'after', 'it.'), ('broke', 'making', 'it.'),
   ('it', 'after', 'making'),
   ('it', 'after', 'it.'), ('it', 'making', 'it.'), ('after', 'making', 'it.')]
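
To make the meaning of n and k concrete, here is a pure-Python sketch of k-skip-n-gram generation. It is not the pntl implementation: `skipgrams_sketch` is a hypothetical name, and the windowing rule (each head token combined with any n-1 later tokens drawn, in order, from the next n+k-1 positions) is an assumption chosen because it is consistent with the output listed above.

```python
from itertools import combinations


def skipgrams_sketch(tokens, n=2, k=1):
    """Lazily yield n-token tuples that skip at most k tokens in total."""
    tokens = list(tokens)
    for i, head in enumerate(tokens):
        # Candidate followers: the next n + k - 1 tokens after the head.
        window = tokens[i + 1 : i + n + k]
        # combinations() keeps the original word order within each tuple.
        for tail in combinations(window, n - 1):
            yield (head,) + tail


sent = "He created the robot and broke it after making it."
grams = list(skipgrams_sketch(sent.split(), n=3, k=2))
print(len(grams))
print(grams[:3])
```

Like the library function, the sketch is a generator, so tuples are produced on demand and only materialised by the `list()` call.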