Annotator API
The system-specific SENNA binary should be rebuilt for your platform; otherwise misalignment errors may occur. The dependency parser additionally requires a Java Runtime Environment.
-
class
pntl.tools.
Annotator
(senna_dir='', stp_dir='', dep_model='edu.stanford.nlp.trees.EnglishGrammaticalStructure', raise_e=False, save_all=False, env=False, env_path='') pntl.Annotator is the class that holds the necessary functions.
-
check_stp_jar
(path, raise_e=False, _rec=True) Check whether the Stanford parser is present in the given directory. Nested searching will be added in future work.
- Parameters
path (str) – path where the Stanford parser is located
raise_e (bool) – whether to raise an exception on failure; the default, False, does not raise
- Returns
the given path if it is valid; otherwise False, or raises FileNotFoundError when raise_e=True
- Return type
str or bool
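The contract described above can be sketched without pntl installed. The helper `find_parser_jar` below is a hypothetical stand-in, not the library's implementation, and the jar file name it looks for is an assumption. It only mirrors the documented behaviour: return the path when it is valid, otherwise return False, or raise FileNotFoundError when raise_e is true.

```python
import os
import tempfile


def find_parser_jar(path, raise_e=False, jar_name="stanford-parser.jar"):
    """Hypothetical stand-in for the documented check (not pntl's code).

    Returns `path` if it is a directory that contains `jar_name`,
    otherwise False, or raises FileNotFoundError when raise_e is True.
    """
    if os.path.isdir(path) and os.path.isfile(os.path.join(path, jar_name)):
        return path
    if raise_e:
        raise FileNotFoundError("no %s found in %r" % (jar_name, path))
    return False


# Demonstrate with a throwaway directory holding an empty placeholder jar.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "stanford-parser.jar"), "w").close()
    found = find_parser_jar(tmp)
    missing = find_parser_jar(os.path.join(tmp, "nope"))
```

Because the result is either a path or False, callers can test it with a plain `if found:` before using it.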
-
get_annoations
(sentence='', senna_tags=None, dep_parse=True) Pass the string to SENNA, perform the NLP processes given above, and return the results as a dict().
- Parameters
sentence (str or list) – a sentence or list of sentences for NLP processing
senna_tags (str or list) – string(s) already processed by SENNA
batch (bool) – switch to batch-processing mode
dep_parse (bool) – whether to communicate with the Stanford parser for dependency parsing
- Returns
a dict() of every output of the process, such as ner, dep_parse, srl, verbs, etc.
- Return type
dict
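A usage sketch follows. The keys read here (ner, srl, verbs, dep_parse) are the ones named in the return description; `summarize` and `StubAnnotator` are hypothetical names, and the stub returns invented values (including a made-up `words` key) so the snippet runs without SENNA or Java installed. With the real library you would pass an instance created as `Annotator(senna_dir=..., stp_dir=...)` instead of the stub.

```python
def summarize(annotator, sentence, dep_parse=True):
    """Call get_annoations and keep only the keys named in the docs."""
    result = annotator.get_annoations(sentence, dep_parse=dep_parse)
    wanted = ("ner", "srl", "verbs", "dep_parse")
    return {key: result[key] for key in wanted if key in result}


class StubAnnotator:
    """Hypothetical stand-in; its return values are invented for the demo."""

    def get_annoations(self, sentence="", senna_tags=None, dep_parse=True):
        out = {"ner": [], "srl": [], "verbs": [], "words": sentence.split()}
        if dep_parse:
            out["dep_parse"] = ""
        return out


summary = summarize(StubAnnotator(), "He created the robot", dep_parse=False)
```

Setting dep_parse=False is useful when no Java Runtime Environment is available, since only that step needs the Stanford parser.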
-
get_batch_annotations
(sentences, dep_parse=True)
- Parameters
sentences (list) – list of sentences
- Return type
list
-
get_conll_format
(sentence, options='-srl -pos -ner -chk -psg') Communicates with SENNA at a lower level (through a subprocess) and converts the console output (the default is file writing) to CoNLL format. The arguments for SENNA are passed in options.
- Parameters
sentence (str or list) – a sentence, or a list of sentences for batch processing
options (list) – list of arguments
option – description
-verbose – Display model information (on the standard error output, so it does not mess up the tag outputs).
-notokentags – Do not output tokens (first output column).
-offsettags – Output start/end character offset (in the sentence), for each token.
-iobtags – Output IOB tags instead of IOBES.
-brackettags – Output ‘bracket’ tags instead of IOBES.
-path – Specify the path to the SENNA data and hash directories, if you do not run SENNA in its original directory. The path must end with “/”.
-usrtokens – Use user’s tokens (space separated) instead of the SENNA tokenizer.
-posvbs – Use verbs output by the POS tagger instead of SRL-style verbs for the SRL task. You might want to use this, as the SRL training task ignores some verbs (many “be” and “have”), which might not be what you want.
-usrvbs – Use user’s verbs (given in a file) instead of SENNA verbs for the SRL task. The file must contain one line per token, with an empty line between each sentence. A line which is not a “-” corresponds to a verb.
-pos, -chk, -ner, -srl, -psg – Instead of outputting tags for all tasks, SENNA will output tags for the specified (one or more) tasks.
- Returns
senna tagged output
- Return type
str
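Since options is a single string of flags, it can help to assemble it from a list and reject anything that is not one of the task flags listed above. `build_options` is a hypothetical helper, not part of pntl; the tuple of task flags is copied from the option list.

```python
TASK_FLAGS = ("-pos", "-chk", "-ner", "-srl", "-psg")


def build_options(tasks):
    """Join task names such as 'pos' or '-ner' into a SENNA options string."""
    flags = []
    for task in tasks:
        flag = task if task.startswith("-") else "-" + task
        if flag not in TASK_FLAGS:
            raise ValueError("unknown SENNA task flag: %s" % flag)
        if flag not in flags:  # drop duplicates, keep first-seen order
            flags.append(flag)
    return " ".join(flags)


options = build_options(["srl", "pos", "-ner"])
```

The resulting string has the same shape as the default value of the options argument and could be passed in its place.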
-
get_dependency
(parse) Change to the Stanford parser directory and process the parse.
- Parameters
parse (str) – the input parse (tree format), which is written to a file
- Returns
Stanford dependencies in universal format
- Return type
str
-
get_senna_bin
(os_name) Get the executable binary file for the current OS.
- Parameters
os_name (str) – OS name, such as Linux, Darwin, or Windows
- Returns
the corresponding SENNA executable file
- Return type
str
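The OS names in the parameter description match what Python's `platform.system()` reports, so the selection can be pictured as a simple lookup. This is a sketch, not the library's code: `pick_senna_binary` is a hypothetical name, and the executable file names are assumptions that should be checked against the contents of your SENNA directory.

```python
import platform

# Assumed file names; verify against your SENNA directory.
SENNA_BINARIES = {
    "Linux": "senna-linux64",
    "Darwin": "senna-osx",
    "Windows": "senna-win32.exe",
}


def pick_senna_binary(os_name):
    """Hypothetical stand-in: map an OS name to a SENNA executable name."""
    try:
        return SENNA_BINARIES[os_name]
    except KeyError:
        raise ValueError("unsupported OS name: %r" % os_name)


# platform.system() yields names such as 'Linux', 'Darwin' or 'Windows'.
current_os = platform.system()
```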
-
get_senna_tag
(input_data) Communicates with SENNA at a lower level (through a subprocess) and converts the console output (the default is file writing).
- Parameters
input_data (str or list) – list of sentences for batch processing
- Returns
SENNA tagged output
- Return type
str
-
get_senna_tag_batch
(sentences) Communicates with SENNA at a lower level (through a subprocess) and converts the console output (the default is file writing). In batch processing, a newline is added to the end of each sentence.
- Parameters
sentences (list) – list of sentences for batch processes
- Return type
str
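The newline handling can be pictured as follows. This only illustrates the one-sentence-per-line input described above, assuming each sentence should end with exactly one newline; `to_batch_input` is a hypothetical name and not part of pntl.

```python
def to_batch_input(sentences):
    """Put each sentence on its own line, ending every line with a newline."""
    return "".join(s.rstrip("\n") + "\n" for s in sentences)


batch_text = to_batch_input(["He created the robot", "It broke down\n"])
```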
-
classmethod
help_conll_format
() Displays details of the SENNA arguments.
-
property
jar_cli
The command line for stanford-parser.jar (this is a Python @property).
- Return type
string
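For orientation, a command line for the Stanford parser's dependency converter typically has the shape built below. This is a sketch, not the value the property returns: `build_jar_cli` is a hypothetical helper, and the jar file name, the input file name in.parse, and the use of the -treeFile flag are assumptions. The class name is the dep_model default from the constructor signature.

```python
import os

DEFAULT_DEP_MODEL = "edu.stanford.nlp.trees.EnglishGrammaticalStructure"


def build_jar_cli(stp_dir, dep_model=DEFAULT_DEP_MODEL, tree_file="in.parse"):
    """Hypothetical helper: build an argument list for the java command."""
    jar = os.path.join(stp_dir, "stanford-parser.jar")  # assumed jar name
    return ["java", "-cp", jar, dep_model, "-treeFile", tree_file]


cli = build_jar_cli("parser")
```

Keeping the command as a list rather than one string avoids shell quoting problems if it is later handed to `subprocess.run`.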
-
print_values
() Displays the current values, such as the SENNA location, the Stanford parser jar, and the jar command-line interface.
-
save
(end_point) Save is a wrapper function built on top of snowbase.end_point.EntryPoint.
-
property
senna_dir
Returns the path of the SENNA location and sets the SENNA path at run time (this is a Python @property).
- Return type
string
-
property
stp_dir
Returns the path of the Stanford parser jar location and sets the path for the dependency parser at run time (this is a Python @property).
-