Welcome to practNLPTools-lite’s documentation!

Contents:

practNLPTools-lite

This project is a fork of biplab-iitb/practNLPTools.

Warning

The CLI is intended for demonstration purposes only; do not use it for long-running jobs.

The very old code is available on the dev branch; the prior stable release is tagged oldVersion.


Build Status note: the build badge might take you to practNLPTools, which is the testing ground for this repository, so don't worry.

Practical Natural Language Processing Tools for Humans. practNLPTools is a Pythonic library over SENNA and the Stanford Dependency Extractor.

Project status badges (shown in the repository README): PyPI, Travis CI, Documentation, dependency updates (Pyup bot), Python 3, and FOSSA.

Note

From version 0.3.0 onward, pntl should be able to store results in a database for later use, if needed, by installing the dependency below.

pip install git+https://github.com/jawahar273/snowbase.git
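
Annotation results can then be kept for later lookup. A minimal sketch; the save_all flag and save() method are described in the Annotator API section below, while the end_point object comes from snowbase, so treat the exact wiring as an assumption:

>>> from pntl.tools import Annotator
>>> annotator = Annotator(senna_dir="/home/user/senna", save_all=True)  # save_all turns on result storage
>>> annotator.get_annoations("He created the robot.")  # annotate as usual
>>> # annotator.save(end_point) would persist the stored results via snowbase's EntryPoint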

QuickStart

Downloading the Stanford Parser JAR

To download the stanford-parser JAR from GitHub automatically and place it inside the installation directory:

pntl -I true
# downloads the required files from GitHub.

Running Predefined Example Sentences

To run the predefined examples in batch mode (a list with more than one example sentence):

pntl -SE home/user/senna -B true

Example

Batch mode means a list of sentences.


# Example structure for the predefined
# sentences in the code.

sentences = [
    "This is line 1",
    "This is line 2",

]

To run the predefined examples in non-batch mode:

pntl -SE home/user/senna

Running a User-Given Sentence

To run a user-given sentence, use -S:

pntl -SE home/user/senna -S 'I am gonna make him an offer he can not refuse.'
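
The same annotation is available from Python through the Annotator class (documented under Usage below); a minimal sketch:

>>> from pntl.tools import Annotator
>>> annotator = Annotator(senna_dir="/home/user/senna")
>>> annotator.get_annoations("I am gonna make him an offer he can not refuse.")["pos"]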

Functionality

  • Semantic Role Labeling.

  • Syntactic Parsing.

  • Part of Speech Tagging (POS Tagging).

  • Named Entity Recognition (NER).

  • Dependency Parsing.

  • Shallow Chunking.

  • Skip-gram generation (when needed).

  • Finds the SENNA path automatically if it is installed on the system.

  • Downloads the Stanford parser and depParser files into the installation directory.

Future work

  • tag2file (new)

  • Creating depParser scripts for the corresponding OS environments.

  • A custom input format for the Stanford parser instead of the tree format.

Features

  1. Fast: SENNA is written in C, so it is fast.

  2. We use only the dependency-extractor component of the Stanford Parser, which takes the syntactic parse from SENNA and applies dependency extraction. So there is no need to load the parsing models for the Stanford Parser, which takes time.

  3. Easy to use.

  4. Platforms supported: Windows, Linux and Mac.

  5. Automatically finds the Stanford parser JAR if it is present in the install path [pntl].

Note

The SENNA pipeline has a fixed maximum size for the sentences it can read, 1024 tokens/sentence by default. If you have larger sentences, changing the MAX_SENTENCE_SIZE value in SENNA_main.c should be considered, and your system-specific binary should be rebuilt. Otherwise this could introduce misalignment errors.
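
If rebuilding the binary is not an option, a simple guard on the Python side avoids feeding SENNA over-long sentences. This is a hypothetical helper, not part of pntl; the 1024-token limit comes from the note above:

>>> def safe_annotate(annotator, sentence, max_tokens=1024):
...     # refuse sentences beyond SENNA's default MAX_SENTENCE_SIZE
...     if len(sentence.split()) >= max_tokens:
...         raise ValueError("sentence exceeds SENNA's MAX_SENTENCE_SIZE")
...     return annotator.get_annoations(sentence)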

Installation

Requires:

A computer with 500 MB of memory, a Java Runtime Environment (1.7 preferably; it works with 1.6 too, but this is untested) and Python.

Linux:

run:

sudo python setup.py install

Windows:

run this command as an administrator:

python setup.py install

Benchmark Comparison

Benchmarked using the time command on Ubuntu, running testsrl.py (on this link) alongside tools.py from pntl:

             pntl          NLTK-senna

First run

  real       0m1.674s      0m2.484s
  user       0m1.564s      0m1.868s
  sys        0m0.228s      0m0.524s

Second run

  real       0m1.245s      0m3.359s
  user       0m1.560s      0m2.016s
  sys        0m0.152s      0m1.168s

Note

This benchmark may differ from system to system. The results here were produced on Ubuntu with 4 GB RAM and an i3 processor.

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

Installation

Stable release

To install practNLPTools-lite, run this command in your terminal:

$ pip install pntl

This is the preferred method to install practNLPTools-lite, as it will always install the most recent stable release.

If you don’t have pip installed, this Python installation guide can guide you through the process.

From sources

The sources for practNLPTools-lite can be downloaded from the Github repo.

You can either clone the public repository:

$ git clone git://github.com/jawahar273/practNLPTools-lite

Or download the tarball:

$ curl  -OL https://github.com/jawahar273/practNLPTools-lite/tarball/master

Once you have a copy of the source, you can install it with:

$ python setup.py install

Usage

Examples

  1. S = Tag covers a single word.

  2. B = Tag begins with the word.

  3. I = Word is internal to a tag which has begun.

  4. E = Tag ends with the word.

  5. O = Other tags.

 Example:

 ('Republican', 'B-NP'), ('candidate', 'I-NP'), ('George', 'I-NP'),
  ('Bush', 'E-NP'), ('was', 'S-VP'), ('great', 'S-ADJP'), ('.', 'O')

 means:
[Republican candidate George Bush]NP [was]VP [great]ADJP

Annotator is the only class you need. Create an annotator object.

pntl
| -- tools
     | --class-- Annotator
     | --jar-- stanford-parser
| -- utils
     | --function-- skipgrams
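
The layout above maps directly to the imports used throughout this documentation:

>>> from pntl.tools import Annotator  # annotation pipeline (SENNA + Stanford parser)
>>> from pntl.utils import skipgrams  # skip-gram generator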

Note

The in.parse file contains the syntax tree (for now), which is used as input for the dependency parser. Also note that only the output of the last-run sentence is stored.

Annotator[class]

>>> from pntl.tools import Annotator
>>> annotator = Annotator(senna_dir = "/home/user/senna", stp_dir = "/home/user/stanford_parser_folder")
>>> # changing senna path at run time is also possible
>>>
>>> annotator.senna_dir = "/home/user/senna"
>>> annotator.senna_dir  # returns the path name
  "/home/user/senna"
>>> annotator.stp_dir = "/home/user/stanford_parser_folder"  # stanford-parser.jar must be present inside it.
>>> annotator.java_cli
  java -cp stanford-parser.jar edu.stanford.nlp.trees.EnglishGrammaticalStructure -treeFile in.parse -collapsed
>>>
>>> annotator.java_cli = "java -cp stanford-parser.jar edu.stanford.nlp.trees.EnglishGrammaticalStructure -treeFile in.parse"
>>> # setting the cli

Self-testing

Warning

This function is deprecated.

To test the setup yourself, please use the test() function:

>>> from pntl.tools import test
>>> test(senna_path="/home/user/senna",
...      stp_dir="/home/user/stanford_parser_folder/stanford_parser.jar")
>>> # pass the location of senna; if senna is present, the following output is printed
  conll:
   He        PRP                -       S-A0        S-A0        S-A0
          created        VBD          created        S-V           O           O
              the         DT                -       B-A1           O           O
            robot         NN                -       E-A1           O           O
              and         CC                -          O           O           O
            broke        VBD            broke          O         S-V           O
               it        PRP                -          O        S-A1           O
            after         IN                -          O    B-AM-TMP           O
           making        VBG           making          O    I-AM-TMP         S-V
              it.        PRP                -          O    E-AM-TMP        S-A1
  dep_parse:
   nsubj(created-2, He-1)
  root(ROOT-0, created-2)
  det(robot-4, the-3)
  dobj(created-2, robot-4)
  conj_and(created-2, broke-6)
  dobj(broke-6, it-7)
  prepc_after(broke-6, making-9)
  dobj(making-9, it.-10)
  chunk:
   [('He', 'S-NP'), ('created', 'S-VP'), ('the', 'B-NP'), ('robot', 'E-NP'), ('and', 'O'), ('broke', 'S-VP'), ('it', 'S-NP'), ('after', 'S-PP'), ('making', 'S-VP'), ('it.', 'S-NP')]
  pos:
   [('He', 'PRP'), ('created', 'VBD'), ('the', 'DT'), ('robot', 'NN'), ('and', 'CC'), ('broke', 'VBD'), ('it', 'PRP'), ('after', 'IN'), ('making', 'VBG'), ('it.', 'PRP')]
  ner:
   [('He', 'O'), ('created', 'O'), ('the', 'O'), ('robot', 'O'), ('and', 'O'), ('broke', 'O'), ('it', 'O'), ('after', 'O'), ('making', 'O'), ('it.', 'O')]
  srl:
   [{'A1': 'the robot', 'V': 'created', 'A0': 'He'}, {'A1': 'it', 'AM-TMP': 'after making it.', 'V': 'broke', 'A0': 'He'}, {'A1': 'it.', 'V': 'making', 'A0': 'He'}]
  syntax tree:
   (S1(S(NP(PRP He))(VP(VP(VBD created)(NP(DT the)(NN robot)))(CC and)(VP(VBD broke)(NP(PRP it))(PP(IN after)(S(VP(VBG making)(NP(PRP it.)))))))))
  words:
   ['He', 'created', 'the', 'robot', 'and', 'broke', 'it', 'after', 'making', 'it.']
  skip gram
   [('He', 'created', 'the'), ('He', 'created', 'robot'), ('He', 'created', 'and'),
   ('He', 'the', 'robot'), ('He', 'the', 'and'), ('He', 'robot', 'and'), ('created', 'the', 'robot'),
   ('created', 'the', 'and'), ('created', 'the', 'broke'), ('created', 'robot', 'and'),
    ('created', 'robot', 'broke'), ('created', 'and', 'broke'), ('the', 'robot', 'and'),
    ('the', 'robot', 'broke'), ('the', 'robot', 'it'), ('the', 'and', 'broke'),
    ('the', 'and', 'it'), ('the', 'broke', 'it'), ('robot', 'and', 'broke'),
    ('robot', 'and', 'it'), ('robot', 'and', 'after'), ('robot', 'broke', 'it'), ('robot', 'broke', 'after'), ('robot', 'it', 'after'),
    ('and', 'broke', 'it'), ('and', 'broke', 'after'), ('and', 'broke', 'making'),
    ('and', 'it', 'after'), ('and', 'it', 'making'), ('and', 'after', 'making'),
    ('broke', 'it', 'after'), ('broke', 'it', 'making'), ('broke', 'it', 'it.'),
    ('broke', 'after', 'making'), ('broke', 'after', 'it.'), ('broke', 'making', 'it.'),
    ('it', 'after', 'making'), ('it', 'after', 'it.'), ('it', 'making', 'it.'), ('after', 'making', 'it.')]

Note

Run depParser.sh to run the English PCFG parser on one or more files, printing trees only.

Warning

If you encounter an error of this type (Unable to resolve "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz" as either class path, filename or URL), then you need Stanford CoreNLP.

The function get_annoations(sentence) returns a dictionary of annotations:

>>> annotator.get_annoations("There are people dying make this world a better place for you and for me.")
   {'dep_parse': '',
   'chunk': [('There', 'S-NP'), ('are', 'S-VP'), ('people', 'S-NP'), ('dying', 'B-VP'), ('make', 'E-VP'), ('this', 'B-NP'), ('world', 'E-NP'), ('a', 'B-NP'), ('better', 'I-NP'), ('place', 'E-NP'), ('for', 'S-PP'), ('you', 'S-NP'), ('and', 'O'), ('for', 'S-PP'), ('me.', 'S-NP')],
   'pos': [('There', 'EX'), ('are', 'VBP'), ('people', 'NNS'), ('dying', 'VBG'), ('make', 'VB'), ('this', 'DT'), ('world', 'NN'), ('a', 'DT'), ('better', 'JJR'), ('place', 'NN'), ('for', 'IN'), ('you', 'PRP'), ('and', 'CC'), ('for', 'IN'), ('me.', '.')],
   'srl': [{'A1': 'people', 'V': 'dying'},
   {'A1': 'people  this world', 'A2': 'a better place for you and for me.', 'V': 'make'}],
    'syntax_tree': '(S1(S(NP(EX There))(VP(VBP are)(NP(NP(NNS people))(SBAR(S(VBG dying)(VP(VB make)(S(NP(DT this)(NN world))(NP(DT a)(JJR better)(NN place)))(PP(PP(IN for)(NP(PRP you)))(CC and)(PP(IN for)(NP(. me.)))))))))))',
    'verbs': ['dying', 'make'],
   'words': ['There', 'are', 'people', 'dying', 'make', 'this', 'world', 'a', 'better', 'place', 'for', 'you', 'and', 'for', 'me.'],
   'ner': [('There', 'O'), ('are', 'O'),
   ('people', 'O'), ('dying', 'O'), ('make', 'O'), ('this', 'O'), ('world', 'O'), ('a', 'O'), ('better', 'O'), ('place', 'O'), ('for', 'O'), ('you', 'O'), ('and', 'O'), ('for', 'O'), ('me.', 'O')]}

The function get_annoations(sentence, dep_parse=True) returns a dictionary of annotations together with the dependency parse; by default this is switched off.

>>> annotator.get_annoations("There are people dying make this world a better place for you and for me.",dep_parse=True)
    {'dep_parse': 'expl(are-2, There-1)\nroot(ROOT-0, are-2)\nnsubj(are-2, people-3)\ndep(make-5, dying-4)\nrcmod(people-3, make-5)\ndet(world-7, this-6)\nnsubj(place-10, world-7)\ndet(place-10, a-8)\namod(place-10, better-9)\nxcomp(make-5, place-10)\nprep_for(make-5, you-12)\nconj_and(you-12, me.-15)',
    'chunk': [('There', 'S-NP'), ('are', 'S-VP'), ('people', 'S-NP'),
     ('dying', 'B-VP'), ('make', 'E-VP'), ('this', 'B-NP'), ('world', 'E-NP'), ('a', 'B-NP'), ('better', 'I-NP'), ('place', 'E-NP'), ('for', 'S-PP'), ('you', 'S-NP'), ('and', 'O'), ('for', 'S-PP'), ('me.', 'S-NP')],
      'pos': [('There', 'EX'), ('are', 'VBP'),
      ('people', 'NNS'), ('dying', 'VBG'), ('make', 'VB'), ('this', 'DT'), ('world', 'NN'), ('a', 'DT'), ('better', 'JJR'), ('place', 'NN'), ('for', 'IN'), ('you', 'PRP'), ('and', 'CC'), ('for', 'IN'), ('me.', '.')], 'srl': [{'A1': 'people', 'V': 'dying'},
      {'A1': 'people  this world', 'A2': 'a better place for you and for me.', 'V': 'make'}],
       'syntax_tree': '(S1(S(NP(EX There))(VP(VBP are)(NP(NP(NNS people))(SBAR(S(VBG dying)(VP(VB make)(S(NP(DT this)(NN world))(NP(DT a)(JJR better)(NN place)))(PP(PP(IN for)(NP(PRP you)))(CC and)(PP(IN for)(NP(. me.)))))))))))',
       'verbs': ['dying', 'make'],
       'words': ['There', 'are', 'people', 'dying', 'make', 'this', 'world', 'a', 'better', 'place', 'for', 'you', 'and', 'for', 'me.'], 'ner': [('There', 'O'), ('are', 'O'), ('people', 'O'), ('dying', 'O'), ('make', 'O'), ('this', 'O'), ('world', 'O'), ('a', 'O'), ('better', 'O'), ('place', 'O'), ('for', 'O'), ('you', 'O'), ('and', 'O'), ('for', 'O'), ('me.', 'O')]}

You can access individual components as:

>>> annotator.get_annoations("Jawahar is a good boy.")['pos']
  [('Jawahar', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('good', 'JJ'), ('boy', 'NN'), ('.', '.')]
>>> annotator.get_annoations("Jawahar is a good boy.")['ner']
  [('Jawahar', 'S-PER'), ('is', 'O'), ('a', 'O'), ('good', 'O'), ('boy', 'O'), ('.', 'O')]
>>> annotator.get_annoations("Jawahar is a good boy.")['chunk']
  [('Jawahar', 'S-NP'), ('is', 'S-VP'), ('a', 'B-NP'), ('good', 'I-NP'), ('boy', 'E-NP'), ('.', 'O')]

To list the verbs for which semantic roles are found:

>>> annotator.get_annoations("He created the robot and broke it after making it.")['verbs']
   ['created', 'broke', 'making']

'srl' returns a list of dictionaries, identifying semantic roles for the various verbs in the sentence.

>>> annotator.get_annoations("He created the robot and broke it after making it.")['srl']
    [{'A1': 'the robot', 'A0': 'He', 'V': 'created'}, {'A1': 'it', 'A0': 'He', 'AM-TMP': 'after making it.', 'V': 'broke'}, {'A1': 'it.', 'A0': 'He', 'V': 'making'}]
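
Because 'srl' is a plain list of dictionaries, each mapping role labels to text spans, it can be post-processed like any other Python data. A minimal sketch over the output above:

>>> for frame in annotator.get_annoations("He created the robot and broke it after making it.")['srl']:
...     # print each verb with its arguments (A0, A1, AM-TMP, ...)
...     print(frame['V'], {role: span for role, span in frame.items() if role != 'V'})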

'syntax_tree' returns the syntax tree in Penn Treebank format.

>>> annotator.get_annoations("He created the robot and broke it after making it.")['syntax_tree']
    '(S1(S(NP(PRP He))(VP(VP(VBD created)(NP(DT the)(NN robot)))(CC and)(VP(VBD broke)(NP(PRP it))(PP(IN after)(S(VP(VBG making)(NP(PRP it.)))))))))'

Note

'dep_parse' returns the dependency relations as a string, one relation per line. You may require some post-processing on this.

Note

dep_parse may not work properly if the Stanford dependency parser is not present in the practnlptools folder. To change the output format, edit lexparser.sh (self-testing only) if you know what you are doing.

For details on the output format, see the Stanford Parser FAQ link and the manual link.

>>> annotator.get_annoations("He created the robot and broke it after making it.",dep_parse=True)['dep_parse']
    nsubj(created-2, He-1)
    root(ROOT-0, created-2)
    det(robot-4, the-3)
    dobj(created-2, robot-4)
    conj_and(created-2, broke-6)
    dobj(broke-6, it-7)
    prepc_after(broke-6, making-9)
    dobj(making-9, it.-10)
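
As noted above, the dep_parse string may need post-processing. A minimal sketch that splits each 'relation(head-i, dependent-j)' line into a triple; the regular expression is an assumption based on the output shape shown above:

>>> import re
>>> dep = annotator.get_annoations("He created the robot and broke it after making it.", dep_parse=True)['dep_parse']
>>> triples = [re.match(r"(\S+?)\((.+?), (.+?)\)", line).groups() for line in dep.splitlines()]
>>> triples[0]
('nsubj', 'created-2', 'He-1')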

Note: For illustration purposes we have used:

>>> annotator.get_annoations("He created the robot and broke it after making it.",dep_parse=True)['dep_parse']

A better method is:

>>> annotation=annotator.get_annoations("He created the robot and broke it after making it.",dep_parse=True)
>>> ner = annotation['ner']
>>> srl = annotation['srl']

get_conll_format(sentence, options='-srl -pos -ner -chk -psg')

This function returns the CoNLL format that the SENNA tool produces during its run. The options= argument should be a string, which is converted to a list() and passed down to the lower-level shell communication.

>>> annotator.get_conll_format("He created the robot and broke it after making it.", options='-srl -pos')
He         PRP                -       S-A0        S-A0        S-A0
        created        VBD          created        S-V           O           O
            the         DT                -       B-A1           O           O
          robot         NN                -       E-A1           O           O
            and         CC                -          O           O           O
          broke        VBD            broke          O         S-V           O
             it        PRP                -          O        S-A1           O
          after         IN                -          O    B-AM-TMP           O
         making        VBG           making          O    I-AM-TMP         S-V
            it.        PRP                -          O    E-AM-TMP        S-A1
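
Since the CoNLL output is whitespace-separated text with one token per line, it can be split into columns for further processing; a minimal sketch over the output above:

>>> conll = annotator.get_conll_format("He created the robot and broke it after making it.", options='-srl -pos')
>>> rows = [line.split() for line in conll.splitlines() if line.strip()]
>>> rows[0][:2]  # token and POS tag of the first word
['He', 'PRP']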

To get help for this function, use the class method help_conll_format():

>>> Annotator.help_conll_format()

pntl.utils.skipgrams(sentence, n=2, k=1)

Here n is the value for n-grams and k is the skip value. skipgrams() returns its output in generator form for better memory management.

>>> from pntl.utils import skipgrams
>>> sent = "He created the robot and broke it after making it."
>>> #return generators
>>> list(skipgrams(sent.split(), n=3, k=2))
[('He', 'created', 'the'), ('He', 'created', 'robot'), ('He', 'created', 'and'),
 ('He', 'the', 'robot'), ('He', 'the', 'and'),
 ('He', 'robot', 'and'),
  ('created', 'the', 'robot'), ('created', 'the', 'and'),
   ('created', 'the', 'broke'), ('created', 'robot', 'and'), ('created', 'robot', 'broke'), ('created', 'and', 'broke'),
 ('the', 'robot', 'and'), ('the', 'robot', 'broke'), ('the', 'robot', 'it'), ('the', 'and', 'broke'),
 ('the', 'and', 'it'), ('the', 'broke', 'it'), ('robot', 'and', 'broke'), ('robot', 'and', 'it'),
  ('robot', 'and', 'after'), ('robot', 'broke', 'it'), ('robot', 'broke', 'after'),
  ('robot', 'it', 'after'), ('and', 'broke', 'it'), ('and', 'broke', 'after'),
   ('and', 'broke', 'making'), ('and', 'it', 'after'), ('and', 'it', 'making'),
   ('and', 'after', 'making'),
  ('broke', 'it', 'after'), ('broke', 'it', 'making'),
  ('broke', 'it', 'it.'),
   ('broke', 'after', 'making'), ('broke', 'after', 'it.'), ('broke', 'making', 'it.'),
   ('it', 'after', 'making'),
   ('it', 'after', 'it.'), ('it', 'making', 'it.'), ('after', 'making', 'it.')]
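
Because skipgrams() returns a generator, it can also be consumed lazily instead of materialising the full list, which keeps memory use low on long inputs:

>>> from itertools import islice
>>> gen = skipgrams(sent.split(), n=3, k=2)  # nothing is computed yet
>>> list(islice(gen, 2))  # pull only the first two skip-grams
[('He', 'created', 'the'), ('He', 'created', 'robot')]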

CLI

To learn more about the CLI entry point, use `pntl --help` for the detailed help command:

pntl --help
Usage: pntl [OPTIONS]
...
...

API

A general interface to the SENNA and Stanford Dependency Extractor pipeline that supports any of the operations specified in SUPPORTED_OPERATIONS. SUPPORTED_OPERATIONS: it provides Part of Speech Tags, Semantic Role Labels, Shallow Parsing (Chunking), Named Entity Recognition (NER), Dependency Parse and Syntactic Constituency Parse. Applying multiple operations at once has a speed advantage: for example, SENNA v3.0 will calculate the POS tags anyway when you are extracting named entities, so applying both operations costs only the time of extracting the named entities. The same is true for dependency parsing. The SENNA pipeline has a fixed maximum size for the sentences it can read, 1024 tokens/sentence by default. If you have larger sentences, changing the MAX_SENTENCE_SIZE value in SENNA_main.c should be considered and your system-specific binary rebuilt; otherwise this could introduce misalignment errors. For the Dependency Parser, the requirement is a Java Runtime Environment :)

Annotator API

class pntl.tools.Annotator(senna_dir='', stp_dir='', dep_model='edu.stanford.nlp.trees.EnglishGrammaticalStructure', raise_e=False, save_all=False, env=False, env_path='')[source]

The class pntl.tools.Annotator holds the necessary functions.

check_stp_jar(path, raise_e=False, _rec=True)[source]

Checks whether the Stanford parser is present in the given directory; nested searching will be added in future work.

Parameters
  • path (str) – the path where the Stanford parser is present

  • raise_e (bool) – whether to raise an exception; the default (False) does not raise

Returns

the given path if it is a valid one, otherwise boolean False; raises FileNotFoundError when raise_e=True

Return type

bool
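
A usage sketch (the folder path is a placeholder):

>>> annotator.check_stp_jar("/home/user/stanford_parser_folder", raise_e=False)
>>> # returns the path if stanford-parser.jar is found there, otherwise False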

get_annoations(sentence='', senna_tags=None, dep_parse=True)[source]

Passes the string to SENNA, performs the NLP processes listed above, and returns the results as a dict().

Parameters
  • sentence (str or list) – a sentence or list of sentences for NLP processing.

  • senna_tags (str or list) – values already processed by SENNA

  • batch (bool) – changes the mode into batch processing

  • dep_parse (bool) – tells the code whether it needs to communicate with the Stanford parser

Returns

a dict() of every output of the process, such as ner, dep_parse, srl, verbs etc.

Return type

dict

get_batch_annotations(sentences, dep_parse=True)[source]
Parameters

sentences (list) – list of sentences

Return type

list
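
A usage sketch; each element of the returned list is an annotation dict of the same shape as get_annoations() returns:

>>> annotations = annotator.get_batch_annotations(["He created the robot.",
...                                                "He broke it after making it."], dep_parse=False)
>>> [a['verbs'] for a in annotations]  # e.g. pull the verbs of every sentence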

get_conll_format(sentence, options='-srl -pos -ner -chk -psg')[source]

Communicates with SENNA through lower-level communication (a subprocess) and converts the console output (the default is file writing) into CoNLL format, with arguments passed via options.

Parameters
  • sentence (str or list) – a sentence or list of sentences for batch processing

  • options (str) – string of arguments

option        description

-verbose      Display model information (on the standard error output, so it does not mess up the tag outputs).

-notokentags  Do not output tokens (first output column).

-offsettags   Output start/end character offsets (in the sentence), for each token.

-iobtags      Output IOB tags instead of IOBES.

-brackettags  Output 'bracket' tags instead of IOBES.

-path         Specify the path to the SENNA data and hash directories, if you do not run SENNA in its original directory. The path must end with "/".

-usrtokens    Use the user's tokens (space separated) instead of the SENNA tokenizer.

-posvbs       Use verbs output by the POS tagger instead of SRL-style verbs for the SRL task. You might want to use this, as the SRL training task ignores some verbs (many "be" and "have") which might not be what you want.

-usrvbs       Use the user's verbs (given in ) instead of SENNA verbs for the SRL task. The file must contain one line per token, with an empty line between each sentence. A line which is not a "-" corresponds to a verb.

-pos, -chk, -ner, -srl, -psg
              Instead of outputting tags for all tasks, SENNA will output tags only for the specified (one or more) tasks.

Returns

senna tagged output

Return type

str

get_dependency(parse)[source]

Changes to the Stanford parser directory and processes the work.

Parameters

parse (str) – the input (in tree format), which is written to a file

Returns

the Stanford universal dependency format

Return type

str

get_senna_bin(os_name)[source]

Gets the executable binary file for the current OS.

Parameters

os_name (str) – os name like Linux, Darwin, Windows

Returns

the corresponding executable file of SENNA

Return type

str

get_senna_tag(input_data)[source]

Communicates with SENNA through lower-level communication (a subprocess) and converts the console output (the default is file writing).

Parameters

input_data (str or list) – list of sentences for batch processing

Returns

senna tagged output

Return type

str
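
A usage sketch; the return value is the raw console output of SENNA as one string:

>>> raw = annotator.get_senna_tag("He created the robot.")
>>> print(raw.splitlines()[0])  # first tagged row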

get_senna_tag_batch(sentences)[source]

Communicates with SENNA through lower-level communication (a subprocess) and converts the console output (the default is file writing). In batch processing, a newline is appended at the end of each sentence.

Parameters

sentences (list) – list of sentences for batch processes

Return type

str

classmethod help_conll_format()[source]

Displays the details of the SENNA arguments.

property jar_cli

Returns the CLI for stanford-parser.jar (this is a Python @property).

Return type

string

print_values()[source]

Displays the current set of values, such as the SENNA location, the Stanford parser JAR and the JAR command interface.

save(end_point)[source]

save is a wrapper function built on top of snowbase.end_point.EntryPoint.

property senna_dir

Returns the path of the SENNA location; the path can also be set at run time (this is a Python @property).

Return type

string

property stp_dir

Returns the path of the Stanford parser JAR location; the path for the dependency parse can also be set at run time (this is a Python @property).

Endpoint API

The Endpoint API has moved to the new repository snowbase.

Please install snowbase directly from GitHub:

# must contain `setup.py`
pip install ./snowbase

Environment

This page describes the settings (environment) used in this project through environment variables.

By mapping an environment variable to a value, the settings can be changed.

General

The following table lists the General settings.

Name          Value     Description

DEBUG         true      Set the app in debug mode.

DEFAULT_LEN   400       Global value that helps set the database VARCHAR field length; the fields below are set to the same value: POS_LEN, NER_LEN, DEP_LEN, SRL_LEN, CHUNK_LEN, VERB_LEN.

DB_CLASS      Package   Sets which entity class is used for storage and lookup.
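
Variables can be exported in the shell before launching, or set from Python before the modules that read them are imported (a minimal sketch; the names come from the table above):

>>> import os
>>> os.environ["DEBUG"] = "true"       # run the app in debug mode
>>> os.environ["DEFAULT_LEN"] = "400"  # VARCHAR length, inherited by POS_LEN, NER_LEN, ...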

Database Environment

The database environment variables activate specific properties in the database.

Name            Value          Description

TABLENAME       package_items  The table will be created under the given name, and all values are stored under this table.

DATABASE_ECHO   DEBUG          Shows what SQLAlchemy is doing internally.

DATABASE_URL    db_url         Sets the URL, which must be supported by the DB driver.

Hash Environment

Environment settings for the hash property: hash generation and length limits.

Name             Value        Description

HASH_VALUE_LEN   10           Sets the VARCHAR limit in the database.

HASH_CLASS       hashlib.md5  The standard Python hash class used as the default hash generator. With this approach the user can plug in any third-party library they like (note: it should be compatible with Python's standard hash library, e.g. xxhash).

Issues

  1. You cannot give a sentence containing '(' or ')' (left or right bracket); it will end up returning no result. So please clean sentences before sending them to the annotator.

  2. Another issue might be with the SENNA executables built for the various platforms. I have not experienced it, but it is highly probable. If you get this issue:

Go to the SENNA folder location:

cd senna
gcc -O3 -o senna-linux64 *.c  (For linux 64 bit)
gcc -O3 -o senna-linux32 *.c  (For linux 32 bit)
gcc -O3 -o senna-osx *.c (For Mac)
*windows: I never compiled C files in Windows.*
python setup.py install
  3. For anything else, you can mail Jawahar273@gmail.com

  4. Issues with "pip install pntl":

You might receive the following error while running:

 Traceback (most recent call last):
 File "test.py", line 3, in <module>
    print a.getAnnotations("This is a test.")
  File "/usr/local/lib/python3.5/dist-packages/pntl/tools.py", line 206, in getAnnotations
    senna_tags=self.getSennaTag(sentence)
  File "/usr/local/lib/python3.5/dist-packages/pntl/tools.py", line 88, in getSennaTag
    p = subprocess.Popen(senna_executable,stdout=subprocess.PIPE, stdin=subprocess.PIPE)
  File "/usr/lib/python3.5/subprocess.py", line 679, in __init__
    errread, errwrite)
  File "/usr/lib/python3.5/subprocess.py", line 1249, in _execute_child
    raise child_exception
OSError: [Errno 13] Permission denied
To fix this, you can do:

chmod -R +x /usr/local/lib/python3.5/dist-packages/pntl/

Stanford Parser

Introduction

If the Stanford parser cannot be copied into the install location, there are some possible quick fixes to get the dependency parser working:

  1. Assign the location of stanford-parser.jar:

    annotator = Annotator()
    annotator.stp_dir = "location_pls"  # with respect to your OS environment, or follow the steps below
    
  2. Place the stanford-parser.jar explicitly inside the install location. On Linux with Python 3.x, the usual location is /usr/local/lib/python3.5/dist-packages/pntl.

  3. If you are using the Anaconda Python distribution, the possible location is /home/<user_name>/anaconda3/lib/python3.5/site-packages/pntl/ (without a virtual environment).

  4. For Windows, if you have installed Python in C:, go to the path
    c:\users\<user_name_here>\local\programs\python3<just_version_number_here>\lib\site-packages\pntl
    or, for the Anaconda Python distribution, C:\anaconda3\Lib\site-packages\pntl

Eg:
c:\users\jawahar\local\programs\python35\lib\site-packages\pntl
This is an example path for Python 3.5 installed on a Windows system with jawahar as the user name.

Note: the Anaconda distribution has its own version numbers, so please change the path according to the Python version present on your system. On Windows there is no .(dot) between the digits of the Python version number.

Note

Don't forget to run sudo python setup.py install, or python setup.py install in an administrator terminal.

CHANGELOG

0.3.3

  • Fixed a bug in saving results to the Elasticsearch server.

0.3.2

  • Saving results in a database and searching with/without the Elasticsearch engine.

0.3.0

  • Storing processed sentences in the database.

0.2.1

  • Adding CHANGELOG in package.

  • Correction in README

0.2.0 4-alpha

  • Marking standard tools for pntl.

0.1.1 (2017-09-17)

  • Planning to release on PyPI.

Credits

Development Lead

Contributors

None yet. Why not be the first?

Contributing

Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.

You can contribute in many ways:

Types of Contributions

Report Bugs

Report bugs at https://github.com/jawahar273/practNLPTools-lite/issues.

If you are reporting a bug, please include:

  • Your operating system name and version.

  • Any details about your local setup that might be helpful in troubleshooting.

  • Detailed steps to reproduce the bug.

Fix Bugs

Look through the GitHub issues for bugs. Anything tagged with “bug” and “help wanted” is open to whoever wants to implement it.

Implement Features

Look through the GitHub issues for features. Anything tagged with “enhancement” and “help wanted” is open to whoever wants to implement it.

Write Documentation

practNLPTools-lite could always use more documentation, whether as part of the official practNLPTools-lite docs, in docstrings, or even on the web in blog posts, articles, and such.

Submit Feedback

The best way to send feedback is to file an issue at https://github.com/jawahar273/practNLPTools-lite/issues.

If you are proposing a feature:

  • Explain in detail how it would work.

  • Keep the scope as narrow as possible, to make it easier to implement.

  • Remember that this is a volunteer-driven project, and that contributions are welcome :)

Get Started!

Ready to contribute? Here’s how to set up practNLPTools-lite for local development.

  1. Fork the practNLPTools-lite repo on GitHub.

  2. Clone your fork locally:

    $ git clone git@github.com:your_name_here/practNLPTools-lite.git
    
  3. Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development:

    $ mkvirtualenv practNLPTools-lite
    $ cd practNLPTools-lite/
    $ python setup.py develop
    
  4. Create a branch for local development:

    $ git checkout -b name-of-your-bugfix-or-feature
    

    Now you can make your changes locally.

  5. When you’re done making changes, check that your changes pass flake8 and the tests, including testing other Python versions with tox:

    $ flake8 practNLPTools-lite tests
    $ python setup.py test or py.test
    $ tox
    

    To get flake8 and tox, just pip install them into your virtualenv.

  6. Commit your changes and push your branch to GitHub:

    $ git add .
    $ git commit -m "Your detailed description of your changes."
    $ git push origin name-of-your-bugfix-or-feature
    
  7. Submit a pull request through the GitHub website.

Pull Request Guidelines

Before you submit a pull request, check that it meets these guidelines:

  1. The pull request should include tests.

  2. If the pull request adds functionality, the docs should be updated. Put your new functionality into a function with a docstring, and add the feature to the list in README.rst.

  3. The pull request should work for Python 2.6, 2.7, 3.3, 3.4 and 3.5, and for PyPy. Check https://travis-ci.org/jawahar273/practNLPTools-lite/pull_requests and make sure that the tests pass for all supported Python versions.
