Welcome to practNLPTools-lite’s documentation!¶
Contents:
practNLPTools-lite¶
This project is a fork of biplab-iitb's practNLPTools.
Warning
The CLI is for example purposes only; don't use it for long-running jobs.
The very old code is available in the dev branch, and the prior stable version is tagged oldVersion. This build might take you to practNLPTools, which is the testing ground for this repository, so don't worry.
Practical Natural Language Processing Tools for Humans. practNLPTools is a pythonic library over SENNA and Stanford Dependency Extractor.
(Badges: PyPI status, Travis CI build, Documentation, dependency status, Pyup blocker, FOSSA.)
Note
From version 0.3.0 onward, pntl can store results into a database for later use; if needed, install the dependency below.
pip install git+https://github.com/jawahar273/snowbase.git
QuickStart¶
Downloading the Stanford Parser JAR¶
To download the stanford-parser from GitHub automatically and place it inside the install directory:
pntl -I true
# downloads the required file from GitHub.
Running Predefined Example Sentences¶
To run the predefined examples in batch mode (a list with more than one sentence):
pntl -SE home/user/senna -B true
Example¶
Batch mode means a list of sentences.
# Example structure for the predefined
# sentences in the code.
sentences = [
"This is line 1",
"This is line 2",
]
To run the predefined examples in non-batch mode:
pntl -SE home/user/senna
Running a User-Given Sentence¶
To run a user-given sentence, use -S:
pntl -SE home/user/senna -S 'I am gonna make him an offer he can not refuse.'
Functionality¶
Semantic Role Labeling.
Syntactic Parsing.
Part of Speech Tagging (POS Tagging).
Named Entity Recognition (NER).
Dependency Parsing.
Shallow Chunking.
Skip-grams (when needed).
Finds the SENNA path if it is installed on the system.
Copies the Stanford parser and depParser files into the install directory.
Future work¶
tag2file (new)
Creating depParser for the corresponding OS environment
Custom input format for the Stanford parser instead of tree format
Features¶
Fast: SENNA is written in C, so it is fast.
We use only the dependency extractor component of the Stanford Parser, which takes the syntactic parse from SENNA and applies dependency extraction. So there is no need to load parsing models for the Stanford Parser, which takes time.
Easy to use.
Platforms supported: Windows, Linux and Mac.
Automatically finds the Stanford parser JAR if it is present in the install path [pntl].
Note
The SENNA pipeline has a fixed maximum size for the sentences it can read, by default 1024 tokens/sentence. If you have larger sentences, consider changing the MAX_SENTENCE_SIZE value in SENNA_main.c and rebuilding your system-specific binary. Otherwise this could introduce misalignment errors.
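If rebuilding the binary is not an option, a guard on the Python side can keep inputs under the limit. This is a minimal illustrative sketch, not part of pntl; the 1024-token figure comes from the note above, and the splitting strategy is our own:
# Illustrative guard, not part of pntl: keep sentences under SENNA's
# default 1024-token limit to avoid misalignment errors.
MAX_SENTENCE_SIZE = 1024

def split_long_sentence(sentence, limit=MAX_SENTENCE_SIZE):
    """Split a whitespace-tokenized sentence into chunks below the limit."""
    tokens = sentence.split()
    return [" ".join(tokens[i:i + limit])
            for i in range(0, len(tokens), limit)]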
Installation¶
Requires:
A computer with 500 MB of memory, a Java Runtime Environment (1.7 preferably; it works with 1.6 too, but this is untested) and Python.
Linux:
run:
sudo python setup.py install
Windows:
run this command as administrator:
python setup.py install
Benchmark Comparison¶
Using the time command on Ubuntu, running testsrl.py (see this link) against tools.py from pntl:
 | pntl | NLTK-senna
---|---|---
at first run | real 0m1.674s, user 0m1.564s, sys 0m0.228s | real 0m2.484s, user 0m1.868s, sys 0m0.524s
at second run | real 0m1.245s, user 0m1.560s, sys 0m0.152s | real 0m3.359s, user 0m2.016s, sys 0m1.168s
Note
This benchmark may differ from system to system. The results shown here are from Ubuntu with 4 GB RAM and an i3 processor.
Credits¶
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
Installation¶
Stable release¶
To install practNLPTools-lite, run this command in your terminal:
$ pip install pntl
This is the preferred method to install practNLPTools-lite, as it will always install the most recent stable release.
If you don’t have pip installed, this Python installation guide can guide you through the process.
From sources¶
The sources for practNLPTools-lite can be downloaded from the Github repo.
You can either clone the public repository:
$ git clone git://github.com/jawahar273/practNLPTools-lite
Or download the tarball:
$ curl -OL https://github.com/jawahar273/practNLPTools-lite/tarball/master
Once you have a copy of the source, you can install it with:
$ python setup.py install
Usage¶
Examples¶
S = Tag covers Single Word.
B = Tag Begins with the Word.
I = Word is internal to tag which has begun.
E = Tag Ends with the Word.
O = Other tags.
Example:
('Republican', 'B-NP'), ('candidate', 'I-NP'), ('George', 'I-NP'),
('Bush', 'E-NP'), ('was', 'S-VP'), ('great', 'S-ADJP'), ('.', 'O')
means:
[Republican candidate George Bush]NP [was]VP [great]ADJP
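As an illustration (this helper is not part of the pntl API), the IOBES tags above can be collapsed into chunk spans with a few lines of Python:
def iobes_to_chunks(tagged):
    """Group (word, IOBES-tag) pairs into (phrase, chunk-type) spans."""
    chunks, current = [], []
    for word, tag in tagged:
        if tag == "O":          # other tags carry no chunk
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "S":       # tag covers a single word
            chunks.append((word, label))
        elif prefix == "B":     # tag begins with the word
            current = [word]
        elif prefix == "I":     # word is internal to the chunk
            current.append(word)
        elif prefix == "E":     # tag ends with the word
            current.append(word)
            chunks.append((" ".join(current), label))
            current = []
    return chunks

>>> iobes_to_chunks([('Republican', 'B-NP'), ('candidate', 'I-NP'),
...                  ('George', 'I-NP'), ('Bush', 'E-NP'),
...                  ('was', 'S-VP'), ('great', 'S-ADJP'), ('.', 'O')])
[('Republican candidate George Bush', 'NP'), ('was', 'VP'), ('great', 'ADJP')]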
Annotator is the only class you need. Create an annotator object.
pntl
| -- tools
| --class-- Annotator
| --jar-- stanford-parser
| -- utils
| --function-- skipgrams
Note
The in.parse file contains the syntax tree (for now), which is used as input for the dependency parser. Note that only the output of the last-run sentence is stored.
Annotator[class]¶
>>> from pntl.tools import Annotator
>>> annotator = Annotator(senna_dir = "/home/user/senna", stp_dir = "/home/user/stanford_parser_folder")
>>> # changing senna path at run time is also possible
>>>
>>> annotator.senna_dir = "/home/user/senna"
>>> annotator.senna_dir  # returns the path name
"/home/user/senna"
>>> annotator.stp_dir = "/home/user/stanford_parser_folder"  # stanford-parser.jar must be present inside it.
>>> annotator.java_cli
java -cp stanford-parser.jar edu.stanford.nlp.trees.EnglishGrammaticalStructure -treeFile in.parse -collapsed
>>>
>>> annotator.java_cli = "java -cp stanford-parser.jar edu.stanford.nlp.trees.EnglishGrammaticalStructure -treeFile in.parse"
>>> # setting the cli
Self-testing¶
Warning
This function is deprecated.
To test things yourself, use the test() function:
>>> from pntl.tools import test
>>> test(senna_path="/home/user/senna",
stp_dir="/home/user/stanford_parser_folder/stanford_parser.jar")
# Input the location of the SENNA files; if SENNA is present, the following output is printed.
conll:
He PRP - S-A0 S-A0 S-A0
created VBD created S-V O O
the DT - B-A1 O O
robot NN - E-A1 O O
and CC - O O O
broke VBD broke O S-V O
it PRP - O S-A1 O
after IN - O B-AM-TMP O
making VBG making O I-AM-TMP S-V
it. PRP - O E-AM-TMP S-A1
dep_parse:
nsubj(created-2, He-1)
root(ROOT-0, created-2)
det(robot-4, the-3)
dobj(created-2, robot-4)
conj_and(created-2, broke-6)
dobj(broke-6, it-7)
prepc_after(broke-6, making-9)
dobj(making-9, it.-10)
chunk:
[('He', 'S-NP'), ('created', 'S-VP'), ('the', 'B-NP'), ('robot', 'E-NP'), ('and', 'O'), ('broke', 'S-VP'), ('it', 'S-NP'), ('after', 'S-PP'), ('making', 'S-VP'), ('it.', 'S-NP')]
pos:
[('He', 'PRP'), ('created', 'VBD'), ('the', 'DT'), ('robot', 'NN'), ('and', 'CC'), ('broke', 'VBD'), ('it', 'PRP'), ('after', 'IN'), ('making', 'VBG'), ('it.', 'PRP')]
ner:
[('He', 'O'), ('created', 'O'), ('the', 'O'), ('robot', 'O'), ('and', 'O'), ('broke', 'O'), ('it', 'O'), ('after', 'O'), ('making', 'O'), ('it.', 'O')]
srl:
[{'A1': 'the robot', 'V': 'created', 'A0': 'He'}, {'A1': 'it', 'AM-TMP': 'after making it.', 'V': 'broke', 'A0': 'He'}, {'A1': 'it.', 'V': 'making', 'A0': 'He'}]
syntax tree:
(S1(S(NP(PRP He))(VP(VP(VBD created)(NP(DT the)(NN robot)))(CC and)(VP(VBD broke)(NP(PRP it))(PP(IN after)(S(VP(VBG making)(NP(PRP it.)))))))))
words:
['He', 'created', 'the', 'robot', 'and', 'broke', 'it', 'after', 'making', 'it.']
skip gram:
[('He', 'created', 'the'), ('He', 'created', 'robot'), ('He', 'created', 'and'),
('He', 'the', 'robot'), ('He', 'the', 'and'), ('He', 'robot', 'and'), ('created', 'the', 'robot'),
('created', 'the', 'and'), ('created', 'the', 'broke'), ('created', 'robot', 'and'),
('created', 'robot', 'broke'), ('created', 'and', 'broke'), ('the', 'robot', 'and'),
('the', 'robot', 'broke'), ('the', 'robot', 'it'), ('the', 'and', 'broke'),
('the', 'and', 'it'), ('the', 'broke', 'it'), ('robot', 'and', 'broke'),
('robot', 'and', 'it'), ('robot', 'and', 'after'), ('robot', 'broke', 'it'), ('robot', 'broke', 'after'), ('robot', 'it', 'after'),
('and', 'broke', 'it'), ('and', 'broke', 'after'), ('and', 'broke', 'making'),
('and', 'it', 'after'), ('and', 'it', 'making'), ('and', 'after', 'making'),
('broke', 'it', 'after'), ('broke', 'it', 'making'), ('broke', 'it', 'it.'),
('broke', 'after', 'making'), ('broke', 'after', 'it.'), ('broke', 'making', 'it.'),
('it', 'after', 'making'), ('it', 'after', 'it.'), ('it', 'making', 'it.'), ('after', 'making', 'it.')]
Note
Run depParser.sh for the English PCFG parser on one or more files, printing trees only.
Warning
If you encounter an error like (Unable to resolve "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz" as either class path, filename or URL), then you need CoreNLP (Stanford).
The function get_annoations(sentence) returns a dictionary of annotations:
>>> annotator.get_annoations("There are people dying make this world a better place for you and for me.")
{'dep_parse': '',
'chunk': [('There', 'S-NP'), ('are', 'S-VP'), ('people', 'S-NP'), ('dying', 'B-VP'), ('make', 'E-VP'), ('this', 'B-NP'), ('world', 'E-NP'), ('a', 'B-NP'), ('better', 'I-NP'), ('place', 'E-NP'), ('for', 'S-PP'), ('you', 'S-NP'), ('and', 'O'), ('for', 'S-PP'), ('me.', 'S-NP')],
'pos': [('There', 'EX'), ('are', 'VBP'), ('people', 'NNS'), ('dying', 'VBG'), ('make', 'VB'), ('this', 'DT'), ('world', 'NN'), ('a', 'DT'), ('better', 'JJR'), ('place', 'NN'), ('for', 'IN'), ('you', 'PRP'), ('and', 'CC'), ('for', 'IN'), ('me.', '.')],
'srl': [{'A1': 'people', 'V': 'dying'},
{'A1': 'people this world', 'A2': 'a better place for you and for me.', 'V': 'make'}],
'syntax_tree': '(S1(S(NP(EX There))(VP(VBP are)(NP(NP(NNS people))(SBAR(S(VBG dying)(VP(VB make)(S(NP(DT this)(NN world))(NP(DT a)(JJR better)(NN place)))(PP(PP(IN for)(NP(PRP you)))(CC and)(PP(IN for)(NP(. me.)))))))))))',
'verbs': ['dying', 'make'],
'words': ['There', 'are', 'people', 'dying', 'make', 'this', 'world', 'a', 'better', 'place', 'for', 'you', 'and', 'for', 'me.'],
'ner': [('There', 'O'), ('are', 'O'),
('people', 'O'), ('dying', 'O'), ('make', 'O'), ('this', 'O'), ('world', 'O'), ('a', 'O'), ('better', 'O'), ('place', 'O'), ('for', 'O'), ('you', 'O'), ('and', 'O'), ('for', 'O'), ('me.', 'O')]}
Using get_annoations(sentence, dep_parse=True) returns a dictionary of annotations with the dependency parse; by default it is switched off.
>>> annotator.get_annoations("There are people dying make this world a better place for you and for me.",dep_parse=True)
{'dep_parse': 'expl(are-2, There-1)\nroot(ROOT-0, are-2)\nnsubj(are-2, people-3)\ndep(make-5, dying-4)\nrcmod(people-3, make-5)\ndet(world-7, this-6)\nnsubj(place-10, world-7)\ndet(place-10, a-8)\namod(place-10, better-9)\nxcomp(make-5, place-10)\nprep_for(make-5, you-12)\nconj_and(you-12, me.-15)',
'chunk': [('There', 'S-NP'), ('are', 'S-VP'), ('people', 'S-NP'),
('dying', 'B-VP'), ('make', 'E-VP'), ('this', 'B-NP'), ('world', 'E-NP'), ('a', 'B-NP'), ('better', 'I-NP'), ('place', 'E-NP'), ('for', 'S-PP'), ('you', 'S-NP'), ('and', 'O'), ('for', 'S-PP'), ('me.', 'S-NP')],
'pos': [('There', 'EX'), ('are', 'VBP'),
('people', 'NNS'), ('dying', 'VBG'), ('make', 'VB'), ('this', 'DT'), ('world', 'NN'), ('a', 'DT'), ('better', 'JJR'), ('place', 'NN'), ('for', 'IN'), ('you', 'PRP'), ('and', 'CC'), ('for', 'IN'), ('me.', '.')], 'srl': [{'A1': 'people', 'V': 'dying'},
{'A1': 'people this world', 'A2': 'a better place for you and for me.', 'V': 'make'}],
'syntax_tree': '(S1(S(NP(EX There))(VP(VBP are)(NP(NP(NNS people))(SBAR(S(VBG dying)(VP(VB make)(S(NP(DT this)(NN world))(NP(DT a)(JJR better)(NN place)))(PP(PP(IN for)(NP(PRP you)))(CC and)(PP(IN for)(NP(. me.)))))))))))',
'verbs': ['dying', 'make'],
'words': ['There', 'are', 'people', 'dying', 'make', 'this', 'world', 'a', 'better', 'place', 'for', 'you', 'and', 'for', 'me.'], 'ner': [('There', 'O'), ('are', 'O'), ('people', 'O'), ('dying', 'O'), ('make', 'O'), ('this', 'O'), ('world', 'O'), ('a', 'O'), ('better', 'O'), ('place', 'O'), ('for', 'O'), ('you', 'O'), ('and', 'O'), ('for', 'O'), ('me.', 'O')]}
You can access individual components as:
>>> annotator.get_annoations("Jawahar is a good boy.")['pos']
[('Jawahar', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('good', 'JJ'), ('boy', 'NN'), ('.', '.')]
>>> annotator.get_annoations("Jawahar is a good boy.")['ner']
[('Jawahar', 'S-PER'), ('is', 'O'), ('a', 'O'), ('good', 'O'), ('boy', 'O'), ('.', 'O')]
>>> annotator.get_annoations("Jawahar is a good boy.")['chunk']
[('Jawahar', 'S-NP'), ('is', 'S-VP'), ('a', 'B-NP'), ('good', 'I-NP'), ('boy', 'E-NP'), ('.', 'O')]
To list the verbs for which semantic roles are found:
>>> annotator.get_annoations("He created the robot and broke it after making it.")['verbs']
['created', 'broke', 'making']
'srl' returns a list of dictionaries, identifying semantic roles for the various verbs in the sentence.
>>> annotator.get_annoations("He created the robot and broke it after making it.")['srl']
[{'A1': 'the robot', 'A0': 'He', 'V': 'created'}, {'A1': 'it', 'A0': 'He', 'AM-TMP': 'after making it.', 'V': 'broke'}, {'A1': 'it.', 'A0': 'He', 'V': 'making'}]
'syntax_tree' returns the syntax tree in Penn Treebank format.
>>> annotator.get_annoations("He created the robot and broke it after making it.")['syntax_tree']
'(S1(S(NP(PRP He))(VP(VP(VBD created)(NP(DT the)(NN robot)))(CC and)(VP(VBD broke)(NP(PRP it))(PP(IN after)(S(VP(VBG making)(NP(PRP it.)))))))))'
Note
'dep_parse' returns the dependency relations as a string, with each relation on a new line. You may require some post-processing on this.
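For example, a hedged post-processing sketch (our own helper, not a pntl utility) that splits each relation line into a (relation, head, dependent) triple:
import re

# Matches lines such as nsubj(created-2, He-1).
DEP_RE = re.compile(r"(\w+)\((\S+)-(\d+), (\S+)-(\d+)\)")

def parse_dep_line(line):
    """Turn 'nsubj(created-2, He-1)' into ('nsubj', 'created', 'He')."""
    match = DEP_RE.match(line.strip())
    if match is None:
        return None
    rel, head, _, dep, _ = match.groups()
    return rel, head, dep

>>> parse_dep_line("nsubj(created-2, He-1)")
('nsubj', 'created', 'He')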
Note
dep_parse may not work properly if the Stanford dependency parser is not present in the practnlptools folder. To change the output format, edit lexparser.sh (self-testing only) if you know what you are doing.
To learn more about the output format, see the Stanford Parser FAQ link and the manual link.
>>> annotator.get_annoations("He created the robot and broke it after making it.",dep_parse=True)['dep_parse']
nsubj(created-2, He-1)
root(ROOT-0, created-2)
det(robot-4, the-3)
dobj(created-2, robot-4)
conj_and(created-2, broke-6)
dobj(broke-6, it-7)
prepc_after(broke-6, making-9)
dobj(making-9, it.-10)
Note: For illustration purposes we have used:
>>> annotator.get_annoations("He created the robot and broke it after making it.",dep_parse=True)['dep_parse']
Better method is:
>>> annotation=annotator.get_annoations("He created the robot and broke it after making it.",dep_parse=True)
>>> ner = annotation['ner']
>>> srl = annotation['srl']
get_conll_format(sentence, options='-srl -pos -ner -chk -psg')¶
This function returns the CoNLL format produced by the SENNA tool. The options argument should be a string; it is converted to a list() and passed down to the lower-level shell communication.
>>> annotator.get_conll_format("He created the robot and broke it after making it.", options='-srl -pos')
He PRP - S-A0 S-A0 S-A0
created VBD created S-V O O
the DT - B-A1 O O
robot NN - E-A1 O O
and CC - O O O
broke VBD broke O S-V O
it PRP - O S-A1 O
after IN - O B-AM-TMP O
making VBG making O I-AM-TMP S-V
it. PRP - O E-AM-TMP S-A1
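Because the CoNLL output is whitespace-separated columns, a small amount of post-processing turns it into rows. A minimal sketch (the column layout follows the -srl -pos example above; the helper is our own, not part of pntl):
def conll_rows(conll_text):
    """Split SENNA's CoNLL output into one list of columns per token."""
    return [line.split() for line in conll_text.splitlines() if line.strip()]

>>> conll = annotator.get_conll_format(
...     "He created the robot and broke it after making it.",
...     options='-srl -pos')
>>> conll_rows(conll)[0][:2]  # token and its POS tag
['He', 'PRP']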
To get help for this function, use the class method help_conll_format():
>>> Annotator.help_conll_format()
pntl.utils.skipgrams(sentence, n=2, k=1): n is the n-gram size and k is the skip value. skipgrams() returns a generator, for better memory management.
>>> from pntl.utils import skipgrams
>>> sent = "He created the robot and broke it after making it."
>>> #return generators
>>> list(skipgrams(sent.split(), n=3, k=2))
[('He', 'created', 'the'), ('He', 'created', 'robot'), ('He', 'created', 'and'),
('He', 'the', 'robot'), ('He', 'the', 'and'),
('He', 'robot', 'and'),
('created', 'the', 'robot'), ('created', 'the', 'and'),
('created', 'the', 'broke'), ('created', 'robot', 'and'), ('created', 'robot', 'broke'), ('created', 'and', 'broke'),
('the', 'robot', 'and'), ('the', 'robot', 'broke'), ('the', 'robot', 'it'), ('the', 'and', 'broke'),
('the', 'and', 'it'), ('the', 'broke', 'it'), ('robot', 'and', 'broke'), ('robot', 'and', 'it'),
('robot', 'and', 'after'), ('robot', 'broke', 'it'), ('robot', 'broke', 'after'),
('robot', 'it', 'after'), ('and', 'broke', 'it'), ('and', 'broke', 'after'),
('and', 'broke', 'making'), ('and', 'it', 'after'), ('and', 'it', 'making'),
('and', 'after', 'making'),
('broke', 'it', 'after'), ('broke', 'it', 'making'),
('broke', 'it', 'it.'),
('broke', 'after', 'making'), ('broke', 'after', 'it.'), ('broke', 'making', 'it.'),
('it', 'after', 'making'),
('it', 'after', 'it.'), ('it', 'making', 'it.'), ('after', 'making', 'it.')]
CLI¶
To learn more about the CLI entry point, use pntl --help:
pntl --help
Usage: pntl [OPTIONS]
...
...
API¶
A general interface to the SENNA and Stanford Dependency Extractor pipeline that supports any of the operations specified in SUPPORTED_OPERATIONS. SUPPORTED_OPERATIONS: it provides Part of Speech Tags, Semantic Role Labels, Shallow Parsing (Chunking), Named Entity Recognition (NER), Dependency Parse and Syntactic Constituency Parse. Applying multiple operations at once has a speed advantage. For example, SENNA v3.0 will calculate the POS tags anyway when you are extracting named entities, so applying both operations costs only the time of extracting the named entities. The same is true for dependency parsing. The SENNA pipeline has a fixed maximum sentence size, by default 1024 tokens/sentence. If you have larger sentences, consider changing the MAX_SENTENCE_SIZE value in SENNA_main.c and rebuilding your system-specific binary; otherwise this could introduce misalignment errors. The Dependency Parser additionally requires a Java Runtime Environment :)
Annotator API¶
class pntl.tools.Annotator(senna_dir='', stp_dir='', dep_model='edu.stanford.nlp.trees.EnglishGrammaticalStructure', raise_e=False, save_all=False, env=False, env_path='')[source]¶
Annotator is the class which holds the necessary functions.
check_stp_jar(path, raise_e=False, _rec=True)[source]¶
Checks whether the Stanford parser is present in the given directory; nested searching will be added in future work.
- Parameters
path (str) – the path where the Stanford parser is present
raise_e (bool) – whether to raise an exception; defaults to False (no exception is raised)
- Returns
the given path if it is valid, otherwise False; raises FileNotFoundError when raise_e=True
- Return type
bool
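A hedged usage sketch, assuming the directory layout used earlier on this page:
>>> # Returns the given path when the parser JAR is found inside it,
>>> # per the description above; raises FileNotFoundError otherwise.
>>> annotator.check_stp_jar("/home/user/stanford_parser_folder", raise_e=True)
'/home/user/stanford_parser_folder'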
get_annoations(sentence='', senna_tags=None, dep_parse=True)[source]¶
Passes the string to SENNA, performs the NLP processes described above, and returns the results as a dict().
- Parameters
sentence (str or list) – a sentence or list of sentences to process
senna_tags (str or list) – values already processed by SENNA
batch (bool) – switches the mode to batch processing
dep_parse (bool) – whether to communicate with the Stanford parser
- Returns
a dict() of every output of the process, such as ner, dep_parse, srl, verbs, etc.
- Return type
dict
get_batch_annotations(sentences, dep_parse=True)[source]¶
- Parameters
sentences (list) – list of sentences
- Return type
list
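A hedged usage sketch based on the signature above, assuming each list element mirrors the dict returned by get_annoations():
>>> annotations = annotator.get_batch_annotations(
...     ["This is line 1.", "This is line 2."], dep_parse=False)
>>> # one annotation result per input sentence, e.g. annotations[0]['pos']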
get_conll_format(sentence, options='-srl -pos -ner -chk -psg')[source]¶
Communicates with SENNA through lower-level communication (a subprocess) and converts the console output (the default is file writing) into CoNLL format; arguments are passed in through options.
- Parameters
sentence (str or list) – a sentence or list of sentences for batch processing
options (str) – string of arguments
options | desc
---|---
-verbose | Display model information (on the standard error output, so it does not mess up the tag outputs).
-notokentags | Do not output tokens (first output column).
-offsettags | Output start/end character offsets (in the sentence) for each token.
-iobtags | Output IOB tags instead of IOBES.
-brackettags | Output 'bracket' tags instead of IOBES.
-path | Specify the path to the SENNA data and hash directories, if you do not run SENNA in its original directory. The path must end with "/".
-usrtokens | Use the user's tokens (space separated) instead of the SENNA tokenizer.
-posvbs | Use verbs output by the POS tagger instead of SRL-style verbs for the SRL task. You might want to use this, as the SRL training task ignores some verbs (many "be" and "have") which might not be what you want.
-usrvbs | Use the user's verbs (given in ) instead of SENNA verbs for the SRL task. The file must contain one line per token, with an empty line between each sentence. A line which is not a "-" corresponds to a verb.
-pos, -chk, -ner, -srl, -psg | Instead of outputting tags for all tasks, SENNA will output tags only for the specified (one or more) tasks.
- Returns
SENNA-tagged output
- Return type
str
get_dependency(parse)[source]¶
Changes to the Stanford parser directory and processes the work.
- Parameters
parse (str) – the input (tree format), which is written to a file
- Returns
Stanford dependency universal format
- Return type
str
get_senna_bin(os_name)[source]¶
Gets the executable binary file for the current OS.
- Parameters
os_name (str) – OS name such as Linux, Darwin, Windows
- Returns
the corresponding executable file of SENNA
- Return type
str
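The os_name values match what Python's platform.system() reports, so a hedged call might look like:
>>> import platform
>>> senna_bin = annotator.get_senna_bin(platform.system())
>>> # e.g. the senna-linux64 executable on a 64-bit Linux system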
get_senna_tag(input_data)[source]¶
Communicates with SENNA through lower-level communication (a subprocess) and converts the console output (the default is file writing).
- Parameters
input_data (str or list) – list of sentences for batch processing
- Returns
SENNA-tagged output
- Return type
str
get_senna_tag_batch(sentences)[source]¶
Communicates with SENNA through lower-level communication (a subprocess) and converts the console output (the default is file writing). In batch processing, each sentence end is marked with a newline.
- Parameters
sentences (list) – list of sentences for batch processes
- Return type
str
classmethod help_conll_format()[source]¶
Displays the details of the SENNA arguments.
property jar_cli¶
Returns the CLI for stanford-parser.jar (this is a Python @property).
- Return type
string
print_values()[source]¶
Displays the current set of values, such as the SENNA location, the Stanford parser JAR, and the JAR command interface.
save(end_point)[source]¶
save is a wrapper function built on top of snowbase.end_point.EntryPoint.
property senna_dir¶
Returns the path of the SENNA location; it can also be set at run time (this is a Python @property).
- Return type
string
property stp_dir¶
Returns the path of the Stanford parser JAR location; it can also be set at run time for the dependency parse (this is a Python @property).
Environment¶
This page describes the settings (environment) used in this project through environment variables.
Settings can be changed by mapping an environment variable to a value.
General¶
The following table maps the General settings.
Name | Value | Description
---|---|---
DEBUG | true | Set the app in debug mode.
DEFAULT_LEN | 400 | Global value used to set the database VARCHAR field length; the fields below are initialized to the same value.
Other Fields | DEFAULT_LEN | POS_LEN, NER_LEN, DEP_LEN, SRL_LEN, CHUNK_LEN, VERB_LEN.
DB_CLASS | Package | Sets which entity is used for storage and lookup.
DataBase Environment¶
The database environment variables activate specific properties in the database.
Name | Value | Description
---|---|---
TABLENAME | package_items | The table is created under the given name and all values are stored in it.
DATABASE_ECHO | DEBUG |
DATABASE_URL | db_url | Set a URL that is supported by the DB driver.
Hash Environment¶
Environment settings for the hash property: generation and length limit.
Name | Value | Description
---|---|---
HASH_VALUE_LEN | 10 | Sets the VARCHAR limit in the database.
HASH_CLASS | hashlib.md5 | The standard Python hash class used as the default hash generator. With this approach the user can plug in any 3rd-party library they like (note: it should be compatible with Python's standard hash library, e.g. xxhash).
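Since these settings are read from environment variables, they can be set before the library loads them. A minimal sketch using the names from the tables above; all values are illustrative, and the database URL is a hypothetical example:
import os

# Illustrative values only; see the tables above for each variable's meaning.
os.environ["DEBUG"] = "true"
os.environ["DEFAULT_LEN"] = "400"
os.environ["TABLENAME"] = "package_items"
os.environ["DATABASE_URL"] = "sqlite:///pntl.db"  # hypothetical DB-driver URL
os.environ["HASH_VALUE_LEN"] = "10"
os.environ["HASH_CLASS"] = "hashlib.md5"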
Issues¶
You cannot give a sentence containing '(' or ')' (left or right bracket); it will end up returning no result. So please clean sentences before sending them to the annotator.
Another issue might be the SENNA executable builds for various platforms. I have not experienced it, but it is highly probable. If you get this issue:
Go to the senna folder:
cd senna
gcc -O3 -o senna-linux64 *.c (for Linux 64-bit)
gcc -O3 -o senna-linux32 *.c (for Linux 32-bit)
gcc -O3 -o senna-osx *.c (for Mac)
*windows: I never compiled C files in Windows.*
python setup.py install
For anything else, you can mail Jawahar273@gmail.com.
Issues with "pip install pntl"
You might receive the following error while running:
Traceback (most recent call last):
File "test.py", line 3, in <module>
print a.getAnnotations("This is a test.")
File "/usr/local/lib/python3.5/dist-packages/pntl/tools.py", line 206, in getAnnotations
senna_tags=self.getSennaTag(sentence)
File "/usr/local/lib/python3.5/dist-packages/pntl/tools.py", line 88, in getSennaTag
p = subprocess.Popen(senna_executable,stdout=subprocess.PIPE, stdin=subprocess.PIPE)
File "/usr/lib/python3.5/subprocess.py", line 679, in __init__
errread, errwrite)
File "/usr/lib/python3.5/subprocess.py", line 1249, in _execute_child
raise child_exception
OSError: [Errno 13] Permission denied
To fix this, make the installed files executable:
chmod -R +x /usr/local/lib/python3.5/dist-packages/pntl/
Stanford Parser¶
Introduction¶
If the Stanford parser could not be copied into the install location, there are a few ways to get the dependency parser working quickly:
Assign the location of stanford-parser.jar at run time:
annotator = Annotator()
annotator.stp_dir = "location_pls"  # with respect to your OS environment
or follow the steps below.
Place stanford-parser.jar explicitly inside the install location. On Linux with Python 3.x the usual location is /usr/local/lib/python3.5/dist-packages/pntl. If you are using the Anaconda Python distribution, the likely location is /home/<user_name>/anaconda3/lib/python3.5/site-packages/pntl/ (without a virtual environment).
For Windows, if you have installed Python in C:, go to the path c:\users\<user_name_here>\local\programs\python3<just_version_number_here>\lib\site-packages\pntl, or for the Anaconda Python distribution C:\anaconda3\Lib\site-packages\pntl, for example c:\users\jawahar\local\programs\python35\lib\site-packages\pntl.
Note: the Anaconda distribution has its own version number, so change it if needed, and adjust the Python version to match the one present on your system. On Windows there is no "." (dot) between the version digits of Python.
Note
Don't forget to run sudo python setup.py install (or python setup.py install in an admin terminal).
CHANGELOG¶
0.3.3¶
Bug fix in saving results to the Elastic server.
0.3.0¶
Storing processed sentences into the database.
0.2.1¶
Adding CHANGELOG in package.
Correction in README
0.2.0 4-alpha¶
Marking standard tools for pntl.
0.1.1 (2017-09-17)¶
Planning to release on PyPI.
Credits¶
Development Lead¶
Jawahar S <jawahar273@gmail.com>
Contributors¶
None yet. Why not be the first?
Contributing¶
Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.
You can contribute in many ways:
Types of Contributions¶
Report Bugs¶
Report bugs at https://github.com/jawahar273/practNLPTools-lite/issues.
If you are reporting a bug, please include:
Your operating system name and version.
Any details about your local setup that might be helpful in troubleshooting.
Detailed steps to reproduce the bug.
Fix Bugs¶
Look through the GitHub issues for bugs. Anything tagged with “bug” and “help wanted” is open to whoever wants to implement it.
Implement Features¶
Look through the GitHub issues for features. Anything tagged with “enhancement” and “help wanted” is open to whoever wants to implement it.
Write Documentation¶
practNLPTools-lite could always use more documentation, whether as part of the official practNLPTools-lite docs, in docstrings, or even on the web in blog posts, articles, and such.
Submit Feedback¶
The best way to send feedback is to file an issue at https://github.com/jawahar273/practNLPTools-lite/issues.
If you are proposing a feature:
Explain in detail how it would work.
Keep the scope as narrow as possible, to make it easier to implement.
Remember that this is a volunteer-driven project, and that contributions are welcome :)
Get Started!¶
Ready to contribute? Here’s how to set up practNLPTools-lite for local development.
Fork the practNLPTools-lite repo on GitHub.
Clone your fork locally:
$ git clone git@github.com:your_name_here/practNLPTools-lite.git
Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development:
$ mkvirtualenv practNLPTools-lite
$ cd practNLPTools-lite/
$ python setup.py develop
Create a branch for local development:
$ git checkout -b name-of-your-bugfix-or-feature
Now you can make your changes locally.
When you’re done making changes, check that your changes pass flake8 and the tests, including testing other Python versions with tox:
$ flake8 practNLPTools-lite tests
$ python setup.py test or py.test
$ tox
To get flake8 and tox, just pip install them into your virtualenv.
Commit your changes and push your branch to GitHub:
$ git add .
$ git commit -m "Your detailed description of your changes."
$ git push origin name-of-your-bugfix-or-feature
Submit a pull request through the GitHub website.
Pull Request Guidelines¶
Before you submit a pull request, check that it meets these guidelines:
The pull request should include tests.
If the pull request adds functionality, the docs should be updated. Put your new functionality into a function with a docstring, and add the feature to the list in README.rst.
The pull request should work for Python 2.6, 2.7, 3.3, 3.4 and 3.5, and for PyPy. Check https://travis-ci.org/jawahar273/practNLPTools-lite/pull_requests and make sure that the tests pass for all supported Python versions.