scripts

There are 4 entry points.

  • macsyfinder: the main script

  • macsydata: manages the models

  • macsyconfig: an interactive conversational utility to generate a macsyfinder configuration file

  • macsyprofile: a utility dedicated to modelers which gathers information about hmmer output

API reference

macsyfinder

Main entrypoint to macsyfinder

macsypy.scripts.macsyfinder._loner_warning(systems)[source]
Parameters

systems – sequence of systems

Returns

warnings for loners which have fewer occurrences than the number of system occurrences in which the loner is used, except if the loner is also multi-system

Return type

list of string

macsypy.scripts.macsyfinder._outfile_header()[source]
Returns

The first 2 lines of each results file

Return type

str

macsypy.scripts.macsyfinder._search_in_ordered_replicon(hits_by_replicon, models_to_detect, config, logger)[source]
Parameters
  • hits_by_replicon

  • models_to_detect

  • config

  • logger

Returns

macsypy.scripts.macsyfinder._search_in_unordered_replicon(hits_by_replicon, models_to_detect, logger)[source]
Parameters
  • hits_by_replicon

  • models_to_detect

  • logger

Returns

macsypy.scripts.macsyfinder.get_version_message()[source]
Returns

the long description of the macsyfinder version

Return type

str

macsypy.scripts.macsyfinder.likely_systems_to_tsv(likely_systems, hit_system_tracker, sys_file)[source]

print likely systems occurrences (from unordered replicon) to a file in tab-separated values (tsv) format

Parameters
Returns

None

macsypy.scripts.macsyfinder.likely_systems_to_txt(likely_systems, hit_system_tracker, sys_file)[source]

print likely systems occurrences (from unordered replicon) to a file in human-readable text format

Parameters
  • likely_systems (list of macsypy.system.LikelySystem objects) – list of systems found

  • hit_system_tracker (macsypy.system.HitSystemTracker object) – a filled HitSystemTracker.

  • sys_file – file object

Returns

None

macsypy.scripts.macsyfinder.list_models(args)[source]
Parameters

args (argparse.Namespace object) – The command line argument once parsed

Returns

a string representation of all models and submodels installed.

Return type

str

macsypy.scripts.macsyfinder.loners_to_tsv(systems, sys_file)[source]

get loners from valid systems and save them to a file

Parameters
  • systems (list of macsypy.system.System object) – the systems from which the loners are extracted

  • sys_file (file object open in write mode) – the file where loners are saved

macsypy.scripts.macsyfinder.main(args=None, loglevel=None)[source]

Main entry point to MacSyFinder. It performs some checks before launching main_search_systems(), which is the function that actually performs the search.

Parameters
  • args (List of string) – the arguments passed on the command line without the program name

  • loglevel (a positive int or a string among 'DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL') – the output verbosity

macsypy.scripts.macsyfinder.multisystems_to_tsv(systems, sys_file)[source]

get multisystems from valid systems and save them to a file

Parameters
  • systems (list of macsypy.system.System object) – the systems from which the multisystems are extracted

  • sys_file (file object open in write mode) – the file where multisystems are saved

macsypy.scripts.macsyfinder.parse_args(args)[source]
Parameters

args (List of strings [without the program name]) – The arguments provided on the command line

Returns

The arguments parsed

Return type

argparse.Namespace object.
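The calling convention described above (a list of strings without the program name, returned as an argparse.Namespace) can be sketched as follows. The parser below is not the macsyfinder parser: the function name and the two options are assumptions chosen only for illustration.

```python
import argparse


def parse_args_sketch(args):
    """Parse a list of command-line strings (program name excluded)."""
    parser = argparse.ArgumentParser(prog="example")
    # Illustrative options only; the real parser is not reproduced here.
    parser.add_argument("--sequence-db", help="path to a sequence file")
    parser.add_argument("-v", "--verbosity", action="count", default=0)
    return parser.parse_args(args)


parsed = parse_args_sketch(["--sequence-db", "genome.fasta", "-vv"])
print(parsed.sequence_db, parsed.verbosity)
```

Passing the list explicitly, rather than letting argparse read sys.argv, is what makes such a function easy to call from tests.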

macsypy.scripts.macsyfinder.rejected_clst_to_txt(rejected_clusters, clst_file)[source]

print rejected clusters in a file

Parameters
  • rejected_clusters (list of macsypy.cluster.RejectedClusters objects) – list of clusters which do not constitute a system

  • clst_file (file object) – The file where to write down the rejected clusters

Returns

None

macsypy.scripts.macsyfinder.search_systems(config, model_registry, models_def_to_detect, logger)[source]

Do the job: this function is the orchestrator of all the macsyfinder mechanics. At the end, several files are produced containing the results:

  • macsyfinder.conf: The set of variables used to run this job

  • macsyfinder.systems: The list of the potential systems

  • macsyfinder.rejected_cluster: The list of all clusters and cluster combinations which have been rejected, and the reason

  • macsyfinder.log: the copy of the standard output

Parameters
Returns

the systems and rejected clusters found

Return type

([macsypy.system.System, …], [macsypy.cluster.RejectedCluster, …])

macsypy.scripts.macsyfinder.solutions_to_tsv(solutions, hit_system_tracker, sys_file)[source]

Print solutions to a file in tabulated format. A solution is a set of systems which represents an optimal combination of systems to maximize the score.

Parameters
Returns

None

macsypy.scripts.macsyfinder.summary_best_solution(best_solution_path, sys_file, models_fqn, replicon_names)[source]

Summarize the best solution found in best_solution_path and write it to out_path. The summary gives the number of system occurrences for each model and each replicon:

    replicon    model_fqn_1  model_fqn_2  ….
    rep_name_1  1            2
    rep_name_2  2            0

columns are separated by character

Parameters
  • best_solution_path (str) – the path to the best_solution file in tsv format

  • sys_file – the file where to save the summary

  • models_fqn (list of string) – the fully qualified names of the models

  • replicon_names (list of string) – the name of the replicons used
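The counting behind such a summary can be sketched as below. This is not the library's implementation: the row keys (replicon, model_fqn, sys_id), the tab separator and the function name are assumptions, and the rows are made-up data arranged to reproduce the table shown above.

```python
from collections import defaultdict


def summarize(rows, models_fqn, replicon_names):
    """Count distinct system occurrences per (replicon, model)."""
    seen = defaultdict(set)
    for row in rows:
        # A system may appear on several rows (one per hit): count it once.
        seen[(row["replicon"], row["model_fqn"])].add(row["sys_id"])
    lines = ["\t".join(["replicon"] + list(models_fqn))]
    for rep in replicon_names:
        counts = [str(len(seen[(rep, fqn)])) for fqn in models_fqn]
        lines.append("\t".join([rep] + counts))
    return "\n".join(lines)


rows = [
    {"replicon": "rep_name_1", "model_fqn": "model_fqn_1", "sys_id": "s1"},
    {"replicon": "rep_name_1", "model_fqn": "model_fqn_1", "sys_id": "s1"},
    {"replicon": "rep_name_1", "model_fqn": "model_fqn_2", "sys_id": "s2"},
    {"replicon": "rep_name_1", "model_fqn": "model_fqn_2", "sys_id": "s3"},
    {"replicon": "rep_name_2", "model_fqn": "model_fqn_1", "sys_id": "s4"},
    {"replicon": "rep_name_2", "model_fqn": "model_fqn_1", "sys_id": "s5"},
]
summary = summarize(
    rows, ["model_fqn_1", "model_fqn_2"], ["rep_name_1", "rep_name_2"]
)
print(summary)
```

Passing models_fqn and replicon_names explicitly lets the summary report a zero for combinations that never occur in the input.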

macsypy.scripts.macsyfinder.systems_to_tsv(systems, hit_system_tracker, sys_file)[source]

print systems occurrences in a file in tabulated format

Parameters
Returns

None

macsypy.scripts.macsyfinder.systems_to_txt(systems, hit_system_tracker, sys_file)[source]

print systems occurrences in a file in human readable format

Parameters
Returns

None

macsypy.scripts.macsyfinder.unlikely_systems_to_txt(unlikely_systems, sys_file)[source]

Print hits (from an unordered replicon) which probably do not make a system occurrence to a file in human-readable format

Parameters
  • unlikely_systems – list of macsypy.system.UnLikelySystem objects

  • sys_file (file object) – The file where to write down the systems occurrences

Returns

None

macsydata

This is the entrypoint to the macsydata command. macsydata allows the user to manage the MacSyFinder models

macsypy.scripts.macsydata._find_all_installed_packages(models_dir=None) macsypy.registries.ModelRegistry[source]
Returns

all models installed

macsypy.scripts.macsydata._find_installed_package(pack_name, models_dir=None) Optional[macsypy.registries.ModelLocation][source]

Search if a package named pack_name is already installed

Parameters

pack_name – the name of the family model to search

Returns

The model location corresponding to the pack_name

Return type

macsypy.registries.ModelLocation object

macsypy.scripts.macsydata._search_in_desc(pattern: str, remote: macsypy.package.RemoteModelIndex, packages: List[str], match_case: bool = False)[source]
Parameters
  • pattern – the substring to search packages descriptions

  • remote – the uri of the macsy-models index

  • packages – list of packages to search in

  • match_case – True if the search is case sensitive, False otherwise

Returns

macsypy.scripts.macsydata._search_in_pack_name(pattern: str, remote: macsypy.package.RemoteModelIndex, packages: List[str], match_case: bool = False) List[Tuple[str, str, Dict]][source]
Parameters
  • pattern – the substring to search packages names

  • remote – the uri of the macsy-models index

  • packages – list of packages to search in

  • match_case – True if the search is case sensitive, False otherwise

Returns

macsypy.scripts.macsydata.build_arg_parser() argparse.ArgumentParser[source]

Build argument parser.

Return type

argparse.ArgumentParser object

macsypy.scripts.macsydata.cmd_name(args: argparse.Namespace) str[source]

Return the name of the command being executed (scriptname + operation).

Example

macsydata uninstall

Parameters

args (argparse.Namespace object) – the arguments passed on the command line

Return type

str
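One way to derive "scriptname + operation" from the parsed arguments is sketched below. It assumes that each sub-command stores its handler under a func attribute of the namespace and that handlers are named do_<operation>; both are assumptions of this sketch, not statements about the actual implementation.

```python
import argparse


def do_uninstall(args):
    """Placeholder sub-command handler used only for this sketch."""


def cmd_name_sketch(args):
    """Build 'scriptname operation' from the handler bound to the namespace."""
    operation = args.func.__name__.replace("do_", "", 1)
    return f"macsydata {operation}"


parser = argparse.ArgumentParser(prog="macsydata")
subparsers = parser.add_subparsers()
uninstall = subparsers.add_parser("uninstall")
# Assumption: the handler is attached to the namespace as 'func'.
uninstall.set_defaults(func=do_uninstall)

namespace = parser.parse_args(["uninstall"])
print(cmd_name_sketch(namespace))
```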

macsypy.scripts.macsydata.do_available(args: argparse.Namespace) None[source]

List Models available on macsy-models

Parameters

args – the arguments passed on the command line

Returns

None

macsypy.scripts.macsydata.do_check(args: argparse.Namespace) None[source]
Parameters

args (argparse.Namespace object) – the arguments passed on the command line

Return type

None

macsypy.scripts.macsydata.do_cite(args: argparse.Namespace) None[source]

How to cite an installed model.

Parameters

args (argparse.Namespace object) – the arguments passed on the command line

Return type

None

macsypy.scripts.macsydata.do_download(args: argparse.Namespace) str[source]

Download tarball from remote models repository.

Parameters

args (argparse.Namespace object) – the arguments passed on the command line

Return type

None

macsypy.scripts.macsydata.do_freeze(args: argparse.Namespace) None[source]

Display all models installed with their respective version, in requirement format.
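As a rough illustration, and assuming that "requirement format" means pip-style name==version lines, the output could be produced along these lines. The function name, package names and versions are hypothetical.

```python
def freeze_sketch(installed):
    """Format a {name: version} mapping as pip-style requirement lines."""
    return [f"{name}=={version}" for name, version in sorted(installed.items())]


# Hypothetical installed packages.
requirements = freeze_sketch({"pack_b": "0.3", "pack_a": "1.0"})
print("\n".join(requirements))
```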

macsypy.scripts.macsydata.do_help(args: argparse.Namespace) None[source]

Display on stdout the content of the readme file. If the readme file does not exist, display a message to the user. See macsypy.package.help()

Parameters

args (argparse.Namespace object) – the arguments passed on the command line (the package name)

Returns

None

Raises

ValueError – if the package name is not known.

macsypy.scripts.macsydata.do_info(args: argparse.Namespace) None[source]

Show information about installed model.

Parameters

args (argparse.Namespace object) – the arguments passed on the command line

Return type

None

macsypy.scripts.macsydata.do_install(args: argparse.Namespace) None[source]

Install new models in macsyfinder local models repository.

Parameters

args (argparse.Namespace object) – the arguments passed on the command line

Return type

None

macsypy.scripts.macsydata.do_list(args: argparse.Namespace) None[source]

List installed models.

Parameters

args (argparse.Namespace object) – the arguments passed on the command line

Return type

None

Search macsy-models for a model in a remote index. By default, search in package names; if option -S is set, also search in descriptions. By default, the search is case insensitive, unless option --match-case is set.

Parameters

args (argparse.Namespace object) – the arguments passed on the command line

Return type

None
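The search rules described above (names by default, descriptions on request, case insensitive unless asked otherwise) can be sketched on an in-memory mapping. The data model (a dict of package name to description), the function name and the package names are assumptions; the real functions query a remote index.

```python
def search_sketch(pattern, packages, match_case=False, in_desc=False):
    """Return the names of packages whose name (or description) contains pattern."""
    needle = pattern if match_case else pattern.lower()
    results = []
    for name, desc in packages.items():
        haystacks = [name] + ([desc] if in_desc else [])
        if not match_case:
            haystacks = [h.lower() for h in haystacks]
        if any(needle in h for h in haystacks):
            results.append(name)
    return results


# Hypothetical packages and descriptions.
packages = {
    "SecretionPack": "models for secretion systems",
    "CasPack": "models for Cas systems",
}
print(search_sketch("secretion", packages))
print(search_sketch("secretion", packages, match_case=True))
print(search_sketch("cas systems", packages, in_desc=True))
```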

macsypy.scripts.macsydata.do_show_definition(args: argparse.Namespace) None[source]

Display on stdout the model definitions. If only a package or sub-package is specified, display all model definitions in the corresponding package or sub-package.

For instance

TXSS+/bacterial T6SSii T6SSiii

displays models TXSS+/bacterial/T6SSii and TXSS+/bacterial/T6SSiii

TXSS+/bacterial all or TXSS+/bacterial

displays all models contained in the TXSS+/bacterial subpackage

Parameters

args (argparse.Namespace object) – the arguments passed on the command line

Return type

None
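The argument convention in the examples above can be sketched as a small resolver. The function name and the use of None to mean "every model of the package" are assumptions made for this illustration.

```python
def resolve_models(package, names=()):
    """Return fully qualified model names, or None to mean 'all models'."""
    if not names or "all" in names:
        return None
    return [f"{package}/{name}" for name in names]


print(resolve_models("TXSS+/bacterial", ["T6SSii", "T6SSiii"]))
print(resolve_models("TXSS+/bacterial", ["all"]))
print(resolve_models("TXSS+/bacterial"))
```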

macsypy.scripts.macsydata.do_uninstall(args: argparse.Namespace) None[source]

Remove models from macsyfinder local models repository.

Parameters

args (argparse.Namespace object) – the arguments passed on the command line

Return type

None

macsypy.scripts.macsydata.get_version_message()[source]
Returns

the long description of the macsyfinder version

Return type

str

macsypy.scripts.macsydata.init_logger(level='INFO', out=True)[source]
Parameters
  • level – The logger threshold could be a positive int or string among: ‘CRITICAL’, ‘ERROR’, ‘WARNING’, ‘INFO’, ‘DEBUG’

  • out – if the log message must be displayed

Returns

logger

Return type

logging.Logger instance

macsypy.scripts.macsydata.main(args=None) None[source]

Main entry point.

Parameters

args (list) – the arguments passed on the command line (before parsing)

Return type

int

macsypy.scripts.macsydata.verbosity_to_log_level(verbosity: int) int[source]

Transform the number of -v options into a log level.

Parameters

verbosity (int) – number of -v options on the command line

Returns

an int corresponding to a logging level
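A minimal sketch of such a mapping is shown below. It assumes that the baseline level is INFO, that each -v lowers the threshold by 10, and that the result is floored at 1 so that it never reaches NOTSET (0). These choices are assumptions of the sketch and may differ from the actual function.

```python
import logging


def verbosity_to_log_level_sketch(verbosity):
    """Map a count of -v flags to a numeric logging level."""
    level = logging.INFO - 10 * verbosity
    # Never go below 1: level 0 means NOTSET in the logging module.
    return max(level, 1)


for count in range(3):
    print(count, verbosity_to_log_level_sketch(count))
```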

macsyconfig

Entrypoint for the macsyconfig command, which generates a MacSyFinder config file

class macsypy.scripts.macsyconfig.ConfigParserWithComments(defaults=None, dict_type=<class 'dict'>, allow_no_value=False, *, delimiters=('=', ':'), comment_prefixes=('#', ';'), inline_comment_prefixes=None, strict=True, empty_lines_in_values=True, default_section='DEFAULT', interpolation=<object object>, converters=<object object>)[source]

Extend ConfigParser to allow comments in serialization

add_comment(section, option, comment, comment_nb=count(1), add_space_before=False, add_space_after=True)[source]

Write a comment in .ini-format (start line with #)

Parameters
  • section – the name of the section

  • option (str) – the name of the option

  • comment (str) – the comment linked to this option

  • comment_nb (int) – the identifier of the comment; by default an integer

  • add_space_before (bool) –

  • add_space_after (bool) –

write(file)[source]

Write an .ini-format representation of the configuration state.

Parameters

file (file) – the file object where to write the configuration
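The standard ConfigParser drops comments when it serializes. One simple way to emit them, sketched below, is to store each comment as a value-less key. This is a different and much smaller technique than the class documented here (no comment numbering, no spacing control), and the class name, section name and option name are illustrative assumptions.

```python
import configparser
import io


class CommentedConfigSketch(configparser.ConfigParser):
    """ConfigParser that can emit '#' comment lines when serialized."""

    def __init__(self):
        # allow_no_value lets a key be written alone on its line.
        super().__init__(allow_no_value=True)

    def optionxform(self, optionstr):
        # Keep the original case of keys (and therefore of comments).
        return optionstr

    def add_comment(self, section, comment):
        # Store the comment as a value-less key; it is written verbatim.
        # The comment text must not contain '=' or ':' (key delimiters).
        self.set(section, f"# {comment}", None)


config = CommentedConfigSketch()
config.add_section("hmmer")
config.add_comment("hmmer", "Maximal e-value allowed")
config.set("hmmer", "e_value_search", "0.1")

buffer = io.StringIO()
config.write(buffer)
print(buffer.getvalue())
```

Because sections keep insertion order, adding the comment just before the option places it on the line above that option in the written file.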

class macsypy.scripts.macsyconfig.Theme(ERROR: str = '\x1b[1m\x1b[31m', WARN: str = '\x1b[33m', SECTION: str = '\x1b[35m', RESET: str = '\x1b[0m', RETRY: str = '\x1b[33m', QUESTION: str = '\x1b[32m', EMPHASIZE: str = '\x1b[1m', EXPLANATION: str = '\x1b[0m', DEFAULT: str = '\x1b[1m\x1b[32m')[source]

Handle color combinations to highlight interactive questions

__delattr__(name)

Implement delattr(self, name).

__eq__(other)

Return self==value.

__hash__()

Return hash(self).

__init__(ERROR: str = '\x1b[1m\x1b[31m', WARN: str = '\x1b[33m', SECTION: str = '\x1b[35m', RESET: str = '\x1b[0m', RETRY: str = '\x1b[33m', QUESTION: str = '\x1b[32m', EMPHASIZE: str = '\x1b[1m', EXPLANATION: str = '\x1b[0m', DEFAULT: str = '\x1b[1m\x1b[32m') None
__repr__()

Return repr(self).

__setattr__(name, value)

Implement setattr(self, name, value).

__weakref__

list of weak references to the object (if defined)
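The default values in the signature are ANSI escape sequences. The sketch below shows how such a theme can wrap a piece of text; it models the theme as a frozen dataclass, which is an assumption suggested by the generated __setattr__ and __delattr__ methods, and it keeps only three of the fields. ThemeSketch and colorize are hypothetical names.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class ThemeSketch:
    """Reduced theme: values copied from the signature above."""
    ERROR: str = "\x1b[1m\x1b[31m"
    QUESTION: str = "\x1b[32m"
    RESET: str = "\x1b[0m"


def colorize(text, color, theme=ThemeSketch()):
    """Surround text with a color code and the reset code."""
    return f"{color}{text}{theme.RESET}"


theme = ThemeSketch()
print(colorize("Continue?", theme.QUESTION))

try:
    theme.QUESTION = ""  # rejected: the dataclass is frozen
except FrozenInstanceError:
    print("theme is read-only")
```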

macsypy.scripts.macsyconfig.ask(question, validator, default=None, expected=None, explanation='', sequence=False, question_color=None, retry=2)[source]

Ask a question on the terminal and return the user response. Check that the user response is allowed (right type, among allowed values, …)

Parameters
  • question (str) – The question to prompt to the user on the terminal

  • validator (a function defined in this module whose name starts with check_) – the validator used to check the user response

  • default – the default value

  • expected – the values allowed (can be a list of values)

  • explanation (str) – some explanation about the option

  • sequence (bool) – True if the parameter accepts a sequence of values (comma separated values)

  • question_color (an attribute of macsypy.scripts.macsyconfig.Theme) – the color of the question displayed to the user

  • retry (int) – The number of times to repeat the question if the response is rejected

Returns

the value cast to the right type
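The prompt, validate and retry loop can be sketched as below. Several details are assumptions of this sketch: the simplified two-argument validator, the use of ValueError to signal a rejected answer, the reading of retry as the number of extra attempts, and the read parameter, which replaces input() so that the example runs without a terminal.

```python
def ask_sketch(question, validator, default=None, retry=2, read=input):
    """Prompt until the validator accepts the answer or retries run out."""
    for _ in range(retry + 1):
        raw = read(f"{question} [{default}]: ").strip()
        try:
            return validator(raw, default)
        except ValueError as err:
            print(f"Invalid answer: {err}")
    raise RuntimeError("too many invalid answers")


def to_int(raw, default):
    """Hypothetical validator: an empty answer means 'use the default'."""
    return default if raw == "" else int(raw)


# Simulated user: a rejected answer first, then a valid one.
answers = iter(["abc", "4"])
value = ask_sketch(
    "Number of workers", to_int, default=1, read=lambda _prompt: next(answers)
)
print(value)
```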

macsypy.scripts.macsyconfig.check_bool(raw, default, expected, sequence=False)[source]

Check if value can be cast to bool

Parameters
  • raw (str) – the value returned by the user

  • default (str) – the default value for the option

  • expected – not used here; present so that all check_xxx functions have the same signature

Returns

value

Raises

MacsypyError – if the value cannot be cast to the right type

macsypy.scripts.macsyconfig.check_choice(raw, default, expected, sequence=False)[source]

Check if value is in list of expected values

Parameters
  • raw (str) – the value returned by the user

  • default (str) – the default value for the option

  • expected – the allowed values for this option

Returns

value

Raises

MacsypyError – if the value cannot be cast to the right type

macsypy.scripts.macsyconfig.check_dir(raw, default, expected, sequence=False)[source]

Check if value points to a directory

Parameters
  • raw (str) – the value returned by the user

  • default (str) – the default value for the option

  • expected – not used here; present so that all check_xxx functions have the same signature

Returns

value

Raises

MacsypyError – if the value cannot be cast to the right type

macsypy.scripts.macsyconfig.check_exe(raw, default, expected, sequence=False)[source]

Check if value points to an executable

Parameters
  • raw (str) – the value returned by the user

  • default (str) – the default value for the option

  • expected – not used here; present so that all check_xxx functions have the same signature

Returns

value

Raises

MacsypyError – if the value cannot be cast to the right type

macsypy.scripts.macsyconfig.check_file(raw, default, expected, sequence=False)[source]

Check if value points to a file

Parameters
  • raw (str) – the value returned by the user

  • default (str) – the default value for the option

  • expected – not used here; present so that all check_xxx functions have the same signature

Returns

value

Raises

MacsypyError – if the value cannot be cast to the right type

macsypy.scripts.macsyconfig.check_float(raw, default, expected, sequence=False)[source]

Check if value can be cast to float

Parameters
  • raw (str) – the value returned by the user

  • default (float) – the default value for the option

  • expected – not used here; present so that all check_xxx functions have the same signature

Returns

value

Raises

MacsypyError – if the value cannot be cast to the right type

macsypy.scripts.macsyconfig.check_positive_int(raw, default, expected, sequence=False)[source]

Check if value can be cast to an integer >= 0

Parameters
  • raw (str) – the value returned by the user

  • default (int) – the default value for the option

  • expected – not used here; present so that all check_xxx functions have the same signature

Returns

value

Raises

MacsypyError – if the value cannot be cast to the right type
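The common shape of the check_xxx validators (same four parameters, a cast, an error on failure) can be sketched for the positive integer case. This sketch raises the built-in ValueError instead of MacsypyError so that it stays self-contained, and it assumes that an empty answer selects the default and that sequences are comma separated; the function name is hypothetical.

```python
def check_positive_int_sketch(raw, default, expected=None, sequence=False):
    """Cast raw to an int >= 0 (or a list of them), else raise ValueError."""

    def cast(token):
        value = int(token)  # raises ValueError if token is not an integer
        if value < 0:
            raise ValueError(f"'{token}' is not >= 0")
        return value

    if raw == "":
        # Assumption: an empty answer means "keep the default".
        return default
    if sequence:
        return [cast(tok.strip()) for tok in raw.split(",")]
    return cast(raw)


print(check_positive_int_sketch("3", 1))
print(check_positive_int_sketch("", 1))
print(check_positive_int_sketch("1, 2", None, sequence=True))
```

The expected parameter is accepted but unused, mirroring the uniform signature described above.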

macsypy.scripts.macsyconfig.check_str(raw, default, expected, sequence=False)[source]

Check if value can be cast to str

Parameters
  • raw (str) – the value returned by the user

  • default (str) – the default value for the option

  • expected – not used here; present so that all check_xxx functions have the same signature

Returns

value

Raises

MacsypyError – if the value cannot be cast to the right type

macsypy.scripts.macsyconfig.epilog(path)[source]

Return the text displayed to the user before starting the configuration

macsypy.scripts.macsyconfig.main(args=None) None[source]

The main entrypoint of the script

Parameters

args

macsypy.scripts.macsyconfig.parse_args(args)[source]

parse command line

Parameters

args (list of string) – the command line arguments

Returns

Return type

argparse.Namespace object

macsypy.scripts.macsyconfig.prolog()[source]

return the text displayed to the user when the configuration file is generated

macsypy.scripts.macsyconfig.serialize(config, path)[source]

Save the configuration to a file

Parameters
macsypy.scripts.macsyconfig.set_base_options(config, defaults, use_defaults=False)[source]

Options for base section

Parameters
macsypy.scripts.macsyconfig.set_general_options(config, defaults, use_defaults=False)[source]

Options for general section

Parameters
macsypy.scripts.macsyconfig.set_hmmer_options(config, defaults, use_defaults=False)[source]

Options for hmmer section

Parameters
macsypy.scripts.macsyconfig.set_path_options(config, defaults, use_defaults=False)[source]

Options for directories section

Parameters
macsypy.scripts.macsyconfig.set_score_options(config, defaults, use_defaults=False)[source]

Options for scoring section

Parameters
macsypy.scripts.macsyconfig.set_section(sec_name, options, config, defaults, use_defaults=False)[source]

Iterate over the options of a section, ask a question for each option and set this option in the config

Parameters
  • sec_name (str) – the name of the section

  • options (dict) – a dictionary with the options to set up for this section

  • config (ConfigParserWithComments object) – The config to fill in.

  • defaults (macsypy.config.MacsyDefaults object) – the macsyfinder default values

  • use_defaults (bool) – True if the user skips this section, so the defaults are used to set the config object

Returns

macsyprofile

class macsypy.scripts.macsyprofile.HmmProfile(gene_name, gene_profile_lg, hmmer_output, cfg)[source]

Handle the HMM output files

__init__(gene_name, gene_profile_lg, hmmer_output, cfg)[source]
Parameters
  • gene (macsypy.gene.CoreGene object) – the gene corresponding to the profile search reported here

  • hmmer_output (string) – The path to the raw Hmmer output file

  • cfg (macsypy.config.Config object) – the configuration object

__weakref__

list of weak references to the object (if defined)

_build_my_db(hmm_output: str) Dict[source]

Build the keys of a dictionary object to store sequence identifiers of hits.

Parameters

hmm_output (string) – the path to the hmmsearch output to parse.

Returns

a dictionary containing a key for each sequence id of the hits

Return type

dict

_fill_my_db(macsyfinder_idx: str, db: Dict) None[source]

Fill the dictionary with information on the matched sequences

Parameters
  • macsyfinder_idx (string) – the path to the macsyfinder index corresponding to the dataset

  • db (dict) – the database containing all sequence id of the hits.

_hit_start(line: str) bool[source]
Parameters

line (string) – the line to parse

Returns

True if it’s the beginning of a new hit in Hmmer raw output files. False otherwise

Return type

boolean.
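In hmmsearch text output, the per-sequence section of each hit begins with a line starting with ">>". A test of that kind could look like the sketch below; the function name and sample lines are illustrative, and the actual method may be written differently.

```python
def hit_start_sketch(line):
    """Return True if the line opens a new hit in hmmsearch text output."""
    return line.startswith(">>")


print(hit_start_sketch(">> seq_1  hypothetical protein"))
print(hit_start_sketch("   1 !   50.2   0.1"))
```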

_parse_hmm_body(hit_id, gene_profile_lg, seq_lg, coverage_threshold, replicon_name, position_hit, i_evalue_sel, b_grp)[source]

Parse the raw Hmmer output to extract the hits, and filter them with threshold criteria selected (“coverage_profile” and “i_evalue_select” command-line parameters)

Parameters
  • hit_id (str) – the sequence identifier

  • gene_profile_lg (int) – the length of the profile matched

  • coverage_threshold (float) – the minimal coverage of the profile to be reached in the Hmmer alignment for hit selection.

  • replicon_name (str) – the identifier of the replicon

  • position_hit (int) – the rank of the sequence matched in the input dataset file

  • i_evalue_sel (float) – the maximal i-evalue (independent evalue) for hit selection

  • b_grp (list of list of strings) – the Hmmer output lines to deal with (grouped by hit)

  • seq_lg (int) – the length of the sequence

Returns

a sequence of hits

Return type

list of macsypy.report.CoreHit objects
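The two selection criteria named above can be sketched as a single predicate. The coverage formula (aligned profile positions divided by profile length), the function name and the hmm_from and hmm_to parameters are assumptions of this sketch; the real method also parses the raw lines, which is omitted here.

```python
def select_hit(hmm_from, hmm_to, gene_profile_lg, i_evalue,
               coverage_threshold, i_evalue_sel):
    """Return (keep, profile_coverage) for one domain alignment."""
    # Fraction of the profile covered by the alignment (assumed formula).
    profile_coverage = (hmm_to - hmm_from + 1) / gene_profile_lg
    keep = profile_coverage >= coverage_threshold and i_evalue <= i_evalue_sel
    return keep, profile_coverage


print(select_hit(1, 80, 100, 1e-10, 0.5, 0.001))
print(select_hit(1, 30, 100, 1e-10, 0.5, 0.001))
```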

_parse_hmm_header(h_grp) str[source]
Parameters

h_grp (sequence of string (itertools._grouper object)) – the sequence of strings returned by the groupby function, representing the header of a hit

Returns

the sequence identifier from a set of lines that corresponds to a single hit

Return type

string
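The grouping mentioned above can be illustrated with itertools.groupby: consecutive lines are split into header groups and body groups, and the identifier is read from the header line. The sample lines, the helper names and the position of the identifier on the header line are assumptions of this sketch.

```python
from itertools import groupby

# Made-up excerpt shaped like hmmsearch per-sequence output.
lines = [
    ">> seq_1  some description",
    "   1 !  domain line",
    ">> seq_2",
    "   1 !  domain line",
]


def is_header(line):
    """Header lines open a new hit."""
    return line.startswith(">>")


def header_id(h_grp):
    """Return the sequence identifier found on the first header line."""
    for line in h_grp:
        return line.split()[1]


# Each group is consumed immediately, while groupby still exposes it.
ids = [header_id(grp) for starts, grp in groupby(lines, key=is_header) if starts]
print(ids)
```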

parse() List[macsypy.scripts.macsyprofile.LightHit][source]

Parse an hmm output file, extract all hits and do some basic computation (coverage profile)

Returns

The list of extracted hits

class macsypy.scripts.macsyprofile.LightHit(gene_name: str, id: str, seq_length: int, replicon_name: str, position: int, i_eval: float, score: float, profile_coverage: float, sequence_coverage: float, begin_match: int, end_match: int)[source]

Handle hmm hits

__eq__(other)

Return self==value.

__hash__ = None
__init__(gene_name: str, id: str, seq_length: int, replicon_name: str, position: int, i_eval: float, score: float, profile_coverage: float, sequence_coverage: float, begin_match: int, end_match: int) None
__repr__()

Return repr(self).

__str__() str[source]

Return str(self).

__weakref__

list of weak references to the object (if defined)
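The members listed above (__eq__, __repr__ and __hash__ = None) are what a plain dataclass generates, so a class of this shape can be sketched as follows. Treating LightHit as a dataclass is an inference, and the tab-separated __str__ layout is an assumption of this sketch.

```python
from dataclasses import astuple, dataclass


@dataclass
class LightHitSketch:
    gene_name: str
    id: str
    seq_length: int
    replicon_name: str
    position: int
    i_eval: float
    score: float
    profile_coverage: float
    sequence_coverage: float
    begin_match: int
    end_match: int

    def __str__(self):
        # Assumed layout: one tab-separated line per hit.
        return "\t".join(str(field) for field in astuple(self))


hit_a = LightHitSketch("geneA", "seq_1", 300, "rep_1", 12, 1e-10, 50.2, 0.8, 0.4, 10, 130)
hit_b = LightHitSketch("geneA", "seq_1", 300, "rep_1", 12, 1e-10, 50.2, 0.8, 0.4, 10, 130)
print(hit_a == hit_b)           # field-by-field comparison
print(LightHitSketch.__hash__)  # None: instances are not hashable
```

With eq enabled and frozen disabled, dataclasses set __hash__ to None, so such objects cannot be used as dictionary keys or set members.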

macsypy.scripts.macsyprofile.get_gene_name(path: str, suffix: str) str[source]
Parameters
  • path (str) – The path to the hmm output to analyse

  • suffix (str) – the suffix of the hmm output file

Returns

the name of the analysed gene

Return type

str
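A plausible way to obtain the gene name is to strip the directory and the suffix from the path, as sketched below. The function name, the path and the suffix value are illustrative assumptions.

```python
import os


def get_gene_name_sketch(path, suffix):
    """Return the file name of path without its trailing suffix."""
    file_name = os.path.basename(path)
    if suffix and file_name.endswith(suffix):
        return file_name[: -len(suffix)]
    return file_name


print(get_gene_name_sketch("results/hmmer/geneA.search_hmm.out", ".search_hmm.out"))
```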

macsypy.scripts.macsyprofile.get_profile_len(path: str) int[source]

Parse the HMM profile to extract the length and the presence of GA bit threshold

Parameters

path (str) – The path to the hmm profile used to produce the hmm search output to analyse

Returns

the length, presence of ga bit threshold

Return type

tuple(int length, bool ga_threshold)
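In the HMMER3 profile format, the header holds a LENG line with the model length and, when gathering thresholds are defined, a GA line. Reading both can be sketched as below. Unlike the documented function, this sketch takes an iterable of lines instead of a path so that it runs without a file, and the function name and sample header are made up.

```python
def get_profile_len_sketch(lines):
    """Return (length, ga_threshold) read from HMMER3 profile header lines."""
    length = None
    ga_threshold = False
    for line in lines:
        if line.startswith("LENG"):
            length = int(line.split()[1])
        elif line.startswith("GA "):
            ga_threshold = True
        elif line.startswith("HMM "):
            break  # the header is over, the model body starts here
    return length, ga_threshold


header = [
    "HMMER3/f [3.1b2 | February 2015]",
    "NAME  geneA",
    "LENG  86",
    "ALPH  amino",
    "GA    25.00 25.00;",
    "HMM          A        C",
]
print(get_profile_len_sketch(header))
```

Stopping at the "HMM " line avoids scanning the whole model body, which is by far the largest part of a profile file.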

macsypy.scripts.macsyprofile.get_version_message() str[source]
Returns

the long description of the macsyfinder version

Return type

str

macsypy.scripts.macsyprofile.header(cmd: List[str]) str[source]
Parameters

cmd – the command used to launch this analysis

Returns

The header of the result file

macsypy.scripts.macsyprofile.init_logger(level='INFO', out=True)[source]
Parameters
  • level – The logger threshold could be a positive int or string among: ‘CRITICAL’, ‘ERROR’, ‘WARNING’, ‘INFO’, ‘DEBUG’

  • out – if the log message must be displayed

Returns

logger

Return type

logging.Logger instance

macsypy.scripts.macsyprofile.main(args=None, log_level=None) None[source]

main entry point to macsyprofile

Parameters
  • args (List of string) – the arguments passed on the command line without the program name

  • log_level (a positive int or a string among 'DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL') – the output verbosity

macsypy.scripts.macsyprofile.parse_args(args: List[str]) argparse.Namespace[source]
Parameters

args (List of strings [without the program name]) – The arguments provided on the command line

Returns

The arguments parsed

Return type

argparse.Namespace object.

macsypy.scripts.macsyprofile.verbosity_to_log_level(verbosity: int) int[source]

Transform the number of -v options into a log level.

Parameters

verbosity (int) – number of -v options on the command line

Returns

an int corresponding to a logging level