Annotator API¶

The system-specific binary should be rebuilt; otherwise misalignment errors may occur. The Dependency Parser additionally requires a Java Runtime Environment.

class pntl.tools.Annotator(senna_dir='', stp_dir='', dep_model='edu.stanford.nlp.trees.EnglishGrammaticalStructure', raise_e=False, save_all=False, env=False, env_path='')[source]¶

pntl.Annotator is the class that holds the necessary functions.

check_stp_jar(path, raise_e=False, _rec=True)[source]¶

Checks whether the Stanford parser is present in the given directory. Nested searching will be added in future work.

Parameters

path (str) – path where the Stanford parser is located
raise_e (bool) – whether to raise an exception when the parser is not found; defaults to False, which does not raise

Returns

the given path if it is valid, otherwise boolean False; raises FileNotFoundError instead when raise_e=True

Return type

bool
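
The check described above can be pictured with a small standalone sketch. It is not pntl's implementation: the helper name `find_parser_jar` and the jar file name `stanford-parser.jar` are assumptions made for illustration, and only the return-or-raise behaviour follows the description.

```python
import os


def find_parser_jar(path, raise_e=False, jar_name="stanford-parser.jar"):
    """Return `path` if it contains the jar, otherwise False (or raise)."""
    # Hypothetical helper, not part of pntl; jar_name is an assumed file name.
    if os.path.isfile(os.path.join(path, jar_name)):
        return path
    if raise_e:
        raise FileNotFoundError(f"{jar_name} not found in {path!r}")
    return False
```

Because the result is either a path or False, a caller can test it directly with `if find_parser_jar(some_dir):` before starting the parser.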

get_annoations(sentence='', senna_tags=None, dep_parse=True)[source]¶

Passes the string to SENNA, performs the NLP processes described above, and returns the results in the form of a dict().

Parameters

sentence (str or list) – a sentence or list of sentences for NLP processing
senna_tags (str or list) – string values already processed by SENNA
batch (bool) – switches the mode to batch processing
dep_parse (bool) – whether the code needs to communicate with the Stanford parser

Returns

a dict() containing every output of the process, such as ner, dep_parse, srl, verbs, etc.

Return type

dict
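
A minimal usage sketch, based only on the signatures shown on this page. The wrapper name `annotate` is hypothetical, and any directory you pass in is a placeholder: a working call needs a local SENNA installation (and the Stanford parser plus Java when `dep_parse=True`).

```python
def annotate(sentence, senna_dir, stp_dir="", dep_parse=False):
    """Run SENNA (and optionally the Stanford parser) on `sentence`."""
    # Imported lazily so this snippet loads even where pntl is not installed.
    from pntl.tools import Annotator

    annotator = Annotator(senna_dir=senna_dir, stp_dir=stp_dir)
    # The method name is spelled `get_annoations` in the signature above.
    return annotator.get_annoations(sentence, dep_parse=dep_parse)


# Example call (not executed here; the path is a placeholder):
#   result = annotate("He created the robot.", "/path/to/senna")
```

Per the description above, the returned dict is expected to hold entries such as ner, srl and verbs, with dep_parse included when dependency parsing is requested.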

get_batch_annotations(sentences, dep_parse=True)[source]¶

Parameters: sentences (list) – list of sentences
Return type: list

get_conll_format(sentence, options='-srl -pos -ner -chk -psg')[source]¶

Communicates with SENNA through lower-level communication (a subprocess) and converts the console output (the default is file writing) to CoNLL format. The arguments to use are passed in options.

Parameters

sentence (str or list) – list of sentences for batch processes
options (str) – the arguments to pass, as a space-separated string

option	description
-verbose	Display model information (on the standard error output, so it does not mess up the tag outputs).
-notokentags	Do not output tokens (first output column).
-offsettags	Output start/end character offset (in the sentence) for each token.
-iobtags	Output IOB tags instead of IOBES.
-brackettags	Output ‘bracket’ tags instead of IOBES.
-path	Specify the path to the SENNA data and hash directories, if you do not run SENNA in its original directory. The path must end with “/”.
-usrtokens	Use user’s tokens (space separated) instead of the SENNA tokenizer.
-posvbs	Use verbs output by the POS tagger instead of SRL-style verbs for the SRL task. You might want to use this, as the SRL training task ignores some verbs (many “be” and “have”), which might not be what you want.
-usrvbs	Use user’s verbs (given in a file) instead of SENNA verbs for the SRL task. The file must contain one line per token, with an empty line between each sentence. A line which is not a “-” corresponds to a verb.
-pos	Instead of outputting tags for all tasks, SENNA will output tags for the specified (one or more) tasks.
-chk	Instead of outputting tags for all tasks, SENNA will output tags for the specified (one or more) tasks.
-ner	Instead of outputting tags for all tasks, SENNA will output tags for the specified (one or more) tasks.
-srl	Instead of outputting tags for all tasks, SENNA will output tags for the specified (one or more) tasks.
-psg	Instead of outputting tags for all tasks, SENNA will output tags for the specified (one or more) tasks.

Returns: senna tagged output
Return type: str
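
Since the default options value is a space-separated string of task flags, such a string can be assembled programmatically. The helper below is a hypothetical convenience, not part of pntl; it only knows the five task flags listed in the table above.

```python
# Task flags taken from the options table above.
TASK_FLAGS = ("-pos", "-chk", "-ner", "-srl", "-psg")


def build_options(*tasks):
    """Join task names such as 'pos' or '-ner' into one options string."""
    flags = []
    for task in tasks:
        flag = "-" + task.lstrip("-")  # accept both 'pos' and '-pos'
        if flag not in TASK_FLAGS:
            raise ValueError(f"unknown SENNA task flag: {flag}")
        flags.append(flag)
    return " ".join(flags)
```

The resulting string could then be passed as the options argument, for example `get_conll_format(sentence, options=build_options("pos", "ner"))`.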

get_dependency(parse)[source]¶

Changes to the Stanford parser directory and runs the dependency parsing there.

Parameters: parse (str) – the input in tree format, which is written to a file
Returns: the result in Stanford universal dependency format
Return type: str

get_senna_bin(os_name)[source]¶

Gets the executable binary file for the current OS.

Parameters: os_name (str) – OS name, such as Linux, Darwin, or Windows
Returns: the corresponding executable object file of SENNA
Return type: str
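
One way to picture the OS-to-binary mapping is the sketch below. The function name is hypothetical, and the file names are assumptions based on the binaries commonly shipped with the SENNA distribution; pntl's own lookup may differ, so check the contents of your SENNA directory.

```python
import platform


def senna_binary_name(os_name=None, bits=None):
    """Pick a SENNA executable name for the given (or current) platform."""
    os_name = os_name or platform.system()         # 'Linux', 'Darwin', 'Windows'
    bits = bits or platform.architecture()[0]      # '64bit' or '32bit'
    if os_name == "Linux":
        return "senna-linux64" if bits == "64bit" else "senna-linux32"
    if os_name == "Darwin":
        return "senna-osx"
    if os_name == "Windows":
        return "senna-win32.exe"
    # Assumed fallback: the name of a locally rebuilt binary.
    return "senna"
```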

get_senna_tag(input_data)[source]¶

Communicates with SENNA through lower-level communication (a subprocess) and converts the console output (the default is file writing).

Parameters: input_data (str or list) – list of sentences for batch processes
Returns: senna tagged output
Return type: str

get_senna_tag_batch(sentences)[source]¶

Communicates with SENNA through lower-level communication (a subprocess) and converts the console output (the default is file writing). In batch processing, a newline is appended to the end of each sentence.

Parameters: sentences (list) – list of sentences for batch processes
Return type: str
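
The subprocess pattern described above (one sentence per line, written to the child's standard input, with the tagged text read back from standard output) can be sketched as follows. This is an illustration only: `run_batch` is a hypothetical name, and a small Python echo process stands in for the SENNA binary so the snippet runs without SENNA installed.

```python
import subprocess
import sys


def run_batch(command, sentences):
    """Send one sentence per line to `command` and return its stdout."""
    # Each sentence is terminated with a newline, as in batch processing.
    payload = "".join(sentence + "\n" for sentence in sentences)
    completed = subprocess.run(
        command, input=payload, capture_output=True, text=True, check=True
    )
    return completed.stdout


# Stand-in for the SENNA binary: a child process that echoes stdin unchanged.
ECHO_COMMAND = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
```

With a real installation, `command` would instead be the SENNA executable followed by its options.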

classmethod help_conll_format()[source]¶: Displays the details of the SENNA arguments.

property jar_cli¶

Returns the CLI command for stanford-parser.jar (this is a Python @property).

Return type: string
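
For orientation, a command line of this kind might be assembled as in the sketch below. It is an assumption, not the value pntl returns: the function name and the tree file name `in.parse` are hypothetical, and `-treeFile` and `-collapsed` are options of the Stanford `EnglishGrammaticalStructure` command-line entry point as commonly documented, so verify them against your parser version.

```python
def build_jar_cli(
    dep_model="edu.stanford.nlp.trees.EnglishGrammaticalStructure",
    jar="stanford-parser.jar",
    tree_file="in.parse",  # hypothetical file holding the parse tree
):
    """Build an argument list that runs the dependency converter via Java."""
    return ["java", "-cp", jar, dep_model, "-treeFile", tree_file, "-collapsed"]
```

A list is convenient for `subprocess`; joining it with spaces gives the string form.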

print_values()[source]¶: Displays the current set of values, such as the SENNA location, the Stanford parser jar, and the jar command interface.

save(end_point)[source]¶: A wrapper function built on top of snowbase.end_point.EntryPoint.

property senna_dir¶

Returns the path of the SENNA location, and sets the path for SENNA at run time (this is a Python @property).

Return type: string

property stp_dir¶: Returns the path of the Stanford parser jar location, and sets the path for the Dependency Parser at run time (this is a Python @property).
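
The get-and-set behaviour of these properties follows the usual Python pattern of a `@property` paired with a setter. The class below is a hypothetical stand-in showing that pattern; the path normalisation in the setter is an assumption, and pntl may validate or store paths differently.

```python
import os


class PathHolder:
    """Hypothetical stand-in with a readable and settable path property."""

    def __init__(self, senna_dir=""):
        self._senna_dir = senna_dir

    @property
    def senna_dir(self):
        # Reading the attribute returns the stored path.
        return self._senna_dir

    @senna_dir.setter
    def senna_dir(self, value):
        # Assigning to the attribute updates the path at run time.
        self._senna_dir = os.path.normpath(os.path.expanduser(value))
```

Assigning `holder.senna_dir = "/opt/senna/"` then changes the location used by later reads without rebuilding the object.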