Annotator API

The system-specific binary should be rebuilt; otherwise it could introduce misalignment errors. The Dependency Parser additionally requires a Java Runtime Environment.

class pntl.tools.Annotator(senna_dir='', stp_dir='', dep_model='edu.stanford.nlp.trees.EnglishGrammaticalStructure', raise_e=False, save_all=False, env=False, env_path='')[source]

:Class:~pntl.Annotator is a class which holds the necessary functions.

check_stp_jar(path, raise_e=False, _rec=True)[source]

Checks whether the Stanford parser is present in the given directory. Nested searching will be added in future work.

Parameters
  • path (str) – path where the Stanford parser is located

  • raise_e (bool) – whether to raise an exception when the parser is not found; the default, False, does not raise

Returns

the given path if it is valid, otherwise boolean False; raises FileNotFoundError instead when raise_e=True

Return type

bool
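The lookup described for check_stp_jar could be sketched roughly as follows. This is an illustrative stand-in and not pntl's implementation: the function name find_stanford_jar and the default jar file name stanford-parser.jar are assumptions made for the example.

```python
from pathlib import Path


def find_stanford_jar(path, raise_e=False, jar_name="stanford-parser.jar"):
    """Return `path` if it contains the jar, otherwise False (or raise).

    Hypothetical helper: `jar_name` is an assumed default, adjust it to the
    file you actually have on disk.
    """
    if (Path(path) / jar_name).is_file():
        return path
    if raise_e:
        raise FileNotFoundError(f"{jar_name} not found in {path!r}")
    return False
```

Because the return value is either the path or False, a caller can test it directly with `if find_stanford_jar(some_dir):` before enabling dependency parsing.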

get_annoations(sentence='', senna_tags=None, dep_parse=True)[source]

Passes the string to SENNA, performs the NLP processes given above, and returns the results in the form of a dict().

Parameters
  • sentence (str or list) – a sentence or list of sentences to process

  • senna_tags (str or list) – values from a string already processed by SENNA

  • batch (bool) – switches to batch processing mode

  • dep_parse (bool) – whether the code needs to communicate with the Stanford parser

Returns

a dict() of every output of the process, such as ner, dep_parse, srl, verbs, etc.

Return type

dict
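A minimal usage sketch, assuming pntl is installed and a SENNA build is available at the directory you pass in. The wrapper names annotate and pick_fields are hypothetical helpers written for this example; only Annotator, its keyword arguments, and get_annoations come from the signatures above. Unlike the library default, the wrapper sets dep_parse=False so that it does not need Java unless asked.

```python
def annotate(sentence, senna_dir, stp_dir="", dep_parse=False):
    """Run pntl on one sentence (hypothetical convenience wrapper).

    Needs pntl and a SENNA build; dep_parse=True also needs Java and the
    Stanford parser jar in `stp_dir`.
    """
    from pntl.tools import Annotator  # imported lazily: optional dependency

    annotator = Annotator(senna_dir=senna_dir, stp_dir=stp_dir)
    return annotator.get_annoations(sentence, dep_parse=dep_parse)


def pick_fields(annotations, fields=("ner", "srl", "verbs", "dep_parse")):
    """Keep only the requested keys that are present in the result dict."""
    return {name: annotations[name] for name in fields if name in annotations}
```

A call would then look like `pick_fields(annotate("He created the robot.", senna_dir="/path/to/senna"))`, where the path is a placeholder for your own SENNA location.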

get_batch_annotations(sentences, dep_parse=True)[source]

Parameters

sentences (list) – list of sentences

Return type

list

get_conll_format(sentence, options='-srl -pos -ner -chk -psg')[source]

Communicates with SENNA through lower-level communication (a subprocess) and converts the console output (the default is file writing) to CoNLL format. The arguments are passed in options.

Parameters
  • sentence (str or list) – list of sentences for batch processes

  • options (str or list) – list of arguments; the default is the string '-srl -pos -ner -chk -psg'

Options
  • -verbose – Display model information (on the standard error output, so it does not mess up the tag outputs).

  • -notokentags – Do not output tokens (first output column).

  • -offsettags – Output start/end character offset (in the sentence), for each token.

  • -iobtags – Output IOB tags instead of IOBES.

  • -brackettags – Output ‘bracket’ tags instead of IOBES.

  • -path – Specify the path to the SENNA data and hash directories, if you do not run SENNA in its original directory. The path must end with “/”.

  • -usrtokens – Use the user’s tokens (space separated) instead of the SENNA tokenizer.

  • -posvbs – Use verbs output by the POS tagger instead of SRL-style verbs for the SRL task. You might want to use this, as the SRL training task ignores some verbs (many “be” and “have”), which might not be what you want.

  • -usrvbs – Use the user’s verbs (given in a file) instead of SENNA verbs for the SRL task. The file must contain one line per token, with an empty line between each sentence. A line which is not a “-” corresponds to a verb.

  • -pos, -chk, -ner, -srl, -psg – Instead of outputting tags for all tasks, SENNA will output tags for the specified (one or more) tasks.

Returns

senna tagged output

Return type

str
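Since get_conll_format takes its flags in options and returns the tagged output as a string, two small helpers can be convenient around it. The sketch below is not part of pntl: build_options and parse_conll are hypothetical names, and the parser assumes the output uses whitespace-separated columns with a blank line between sentences.

```python
def build_options(tasks=("srl", "pos", "ner", "chk", "psg"),
                  iob=False, offsets=False):
    """Build an options string from task names and two of the flags above."""
    flags = [f"-{task}" for task in tasks]
    if iob:
        flags.append("-iobtags")      # IOB tags instead of IOBES
    if offsets:
        flags.append("-offsettags")   # start/end character offsets per token
    return " ".join(flags)


def parse_conll(text):
    """Split column-formatted output into sentences of token rows.

    Assumption: columns are separated by whitespace and sentences are
    separated by a blank line.
    """
    sentences, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line.split())
        elif current:
            sentences.append(current)
            current = []
    if current:  # flush the last sentence if there is no trailing blank line
        sentences.append(current)
    return sentences
```

With no arguments, build_options() reproduces the default options string from the signature above; the result of get_conll_format could then be handed to parse_conll to get one list of rows per sentence.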

get_dependency(parse)[source]

Changes to the Stanford parser directory and runs the dependency parsing there.

Parameters

parse (str) – the input parse (in tree format); it is written to a file

Returns

the dependencies in Stanford universal dependency format

Return type

str
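For orientation, the Stanford parser jar can convert a tree file to dependencies from the command line by running the class named in dep_model. The sketch below only builds such a command; whether pntl issues exactly this command is an assumption, and dependency_command is a hypothetical helper name.

```python
def dependency_command(
    jar_path,
    tree_file,
    dep_model="edu.stanford.nlp.trees.EnglishGrammaticalStructure",
):
    """Build a java command that converts a tree file to dependencies.

    Assumption: the jar exposes `dep_model` as a main class accepting
    -treeFile and -collapsed, as the Stanford parser distribution does.
    """
    return ["java", "-cp", jar_path, dep_model,
            "-treeFile", tree_file, "-collapsed"]
```

The resulting list could be passed to `subprocess.run(cmd, capture_output=True, text=True)` on a machine with a Java Runtime Environment installed.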

get_senna_bin(os_name)[source]

Gets the executable binary file for the current OS.

Parameters

os_name (str) – os name like Linux, Darwin, Windows

Returns

the corresponding executable object file of SENNA

Return type

str
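The OS names listed above match what Python's platform.system() returns, so the selection can be pictured as a simple lookup. This is a sketch rather than pntl's code: senna_binary_name is a hypothetical function, and the binary file names are assumptions based on the SENNA distribution layout, so check them against your own SENNA directory.

```python
import platform

# Assumed file names; verify against the contents of your SENNA directory.
_SENNA_BINARIES = {
    "Linux": "senna-linux64",
    "Darwin": "senna-osx",
    "Windows": "senna-win32.exe",
}


def senna_binary_name(os_name=None):
    """Return the assumed SENNA executable name for an OS name."""
    os_name = os_name or platform.system()  # e.g. 'Linux', 'Darwin', 'Windows'
    try:
        return _SENNA_BINARIES[os_name]
    except KeyError:
        raise ValueError(f"no SENNA binary known for {os_name!r}") from None
```

Failing loudly on an unknown OS name is a design choice of this sketch; a rebuilt, system-specific binary could be added to the mapping under whatever name you gave it.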

get_senna_tag(input_data)[source]

Communicates with SENNA through lower-level communication (a subprocess) and converts the console output (the default is file writing).

Parameters

input_data (str or list) – list of sentences for batch processes

Returns

senna tagged output

Return type

str

get_senna_tag_batch(sentences)[source]

Communicates with SENNA through lower-level communication (a subprocess) and converts the console output (the default is file writing). In batch processing, a newline is added to the end of each sentence.

Parameters

sentences (list) – list of sentences for batch processes

Return type

str
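The subprocess communication described for get_senna_tag and get_senna_tag_batch can be illustrated with the standard library. This is a generic sketch under stated assumptions, not pntl's implementation: run_tagger is a hypothetical helper, and ECHO_UPPER is a stand-in child process used in place of the SENNA executable so that the example runs anywhere.

```python
import subprocess
import sys


def run_tagger(command, sentences, cwd=None):
    """Send sentences to a command on stdin, one per line; return its stdout."""
    payload = "\n".join(sentences) + "\n"  # newline after every sentence
    completed = subprocess.run(
        command,
        input=payload,
        capture_output=True,
        text=True,
        cwd=cwd,      # e.g. the directory that holds the executable
        check=True,   # raise CalledProcessError on a non-zero exit status
    )
    return completed.stdout


# Stand-in for a real tagger: a tiny Python child process that upper-cases
# whatever it reads from standard input.
ECHO_UPPER = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]
```

Replacing ECHO_UPPER with the path to a SENNA binary plus its options, and setting cwd to the SENNA directory, would give the same shape of call against the real tagger.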

classmethod help_conll_format()[source]

Displays the details of the SENNA arguments.

property jar_cli

Returns the command line for stanford-parser.jar (this is a Python @property).

Return type

string

print_values()[source]

Displays the current set of values, such as the SENNA location, the Stanford parser jar, and the jar command-line interface.

save(end_point)[source]

save is a wrapper function built on top of :Class:~snowbase.end_point.EntryPoint.

property senna_dir

Returns the path of the SENNA location and sets the path for SENNA at run time (this is a Python @property).

Return type

string

property stp_dir

Returns the path of the Stanford parser jar location and sets the path for the Dependency Parser at run time (this is a Python @property).