parser API reference

class taxadb.parser.Accession2TaxidParser(acc_file=None, chunk=500, fast=False, **kwargs)

Main parser class for nucl_xxx_accession2taxid files

This class is used to parse accession2taxid files.

Parameters:
  • acc_file (str) – File to parse
  • chunk (int) – Chunk insert size. Default 500
  • fast (bool) – Directly load accession into database, do not check existence.
__init__(acc_file=None, chunk=500, fast=False, **kwargs)

Base class

accession2taxid(acc2taxid=None, chunk=None)

Parses the accession2taxid files

This method parses the accession2taxid file, builds a dictionary for each
entry, collects the dictionaries in a list, and yields the list for insertion into the database. Each dictionary has the form:
{
    'accession': accession_id_from_file,
    'taxid': associated_taxonomic_id
}
Parameters:
  • acc2taxid (str) – Path to acc2taxid input file (gzipped)
  • chunk (int) – Chunk size of entries to gather before yielding. Default 500 (set at object construction)
Yields:

list – A list of up to chunk parsed entries
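
The chunked-yield pattern described above can be sketched without taxadb itself. This is a minimal illustration, not the library's implementation: the function name iter_accession_chunks is hypothetical, and the column layout (accession, accession.version, taxid, gi, preceded by one header line) is an assumption based on the NCBI accession2taxid format. The existence check performed when fast=False is not reproduced here.

```python
import gzip
from typing import Dict, Iterator, List


def iter_accession_chunks(path: str, chunk: int = 500) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of at most `chunk` accession/taxid dictionaries (hypothetical helper)."""
    entries: List[Dict[str, str]] = []
    with gzip.open(path, "rt") as handle:
        next(handle, None)  # skip the header line (assumed to be present)
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue  # ignore blank or malformed lines
            # Assumed layout: accession, accession.version, taxid, gi
            entries.append({"accession": fields[0], "taxid": fields[2]})
            if len(entries) == chunk:
                yield entries
                entries = []
    if entries:
        yield entries  # final chunk, possibly shorter than `chunk`
```

Yielding fixed-size lists lets the caller issue one bulk insert per chunk instead of one insert per line, which is presumably why the chunk parameter exists.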

set_accession_file(acc_file)

Set the accession file to use

Parameters: acc_file (str) – File to be set
Returns: True
Raises: SystemExit – If acc_file is None or not a file (check_file)

class taxadb.parser.TaxaDumpParser(nodes_file=None, names_file=None, **kwargs)

Main parser class for ncbi taxdump files

This class is used to parse the NCBI taxonomy files (nodes.dmp and names.dmp) found in the taxdump archive.

Parameters:
  • nodes_file (str) – Path to nodes.dmp file
  • names_file (str) – Path to names.dmp file
__init__(nodes_file=None, names_file=None, **kwargs)

set_names_file(names_file)

Set names_file

Set the names file to use

Parameters: names_file (str) – Names file to be set
Returns: True
Raises: SystemExit – If names_file is None or not a file (check_file)

set_nodes_file(nodes_file)

Set nodes_file

Set the nodes file to use

Parameters: nodes_file (str) – Nodes file to be set
Returns: True
Raises: SystemExit – If nodes_file is None or not a file (check_file)

taxdump(nodes_file=None, names_file=None)

Parse .dmp files

Parse the nodes.dmp and names.dmp files (from the taxdump archive) and
prepare the taxa for insertion into the Taxa table.
Parameters:
  • nodes_file (str) – Path to nodes.dmp file
  • names_file (str) – Path to names.dmp file
Returns:

Zipped data from both files

Return type:

list
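
As a rough illustration of combining the two files, the sketch below reads names.dmp into a taxid-to-name lookup and then walks nodes.dmp to build one record per taxon. It is not taxadb's code: the function names and the dictionary keys (ncbi_taxid, parent_taxid, tax_name, lineage_level) are assumptions chosen for the example. The only format facts relied on are that .dmp fields are separated by a tab, a pipe and a tab, that each line ends with a tab and a pipe, and that names.dmp marks the primary name with the class "scientific name".

```python
from typing import Dict, List


def split_dmp_line(line: str) -> List[str]:
    # Drop the newline and the trailing tab-pipe terminator, then split on tab-pipe-tab.
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]
    return line.split("\t|\t")


def parse_taxdump_sketch(nodes_path: str, names_path: str) -> List[Dict[str, str]]:
    """Join nodes.dmp and names.dmp on the taxid (hypothetical helper)."""
    names: Dict[str, str] = {}
    with open(names_path) as handle:
        for line in handle:
            fields = split_dmp_line(line)
            # names.dmp columns: tax_id, name_txt, unique name, name class
            if len(fields) >= 4 and fields[3] == "scientific name":
                names[fields[0]] = fields[1]

    taxa: List[Dict[str, str]] = []
    with open(nodes_path) as handle:
        for line in handle:
            fields = split_dmp_line(line)
            if len(fields) < 3:
                continue  # ignore blank or malformed lines
            # nodes.dmp columns start with: tax_id, parent tax_id, rank
            taxa.append({
                "ncbi_taxid": fields[0],
                "parent_taxid": fields[1],
                "tax_name": names.get(fields[0], ""),
                "lineage_level": fields[2],
            })
    return taxa
```

Reading names.dmp first keeps the join to a single pass over nodes.dmp, at the cost of holding every scientific name in memory.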

class taxadb.parser.TaxaParser(verbose=False)

Base parser class for taxonomic files

__init__(verbose=False)

Base class

static cache_taxids()

Load data from taxa table into a dictionary

Returns: Data from taxa table mapped as dictionary
Return type: data (dict)
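
One plausible use of such a cache is to make the existence check (the one skipped when fast=True) a dictionary lookup rather than a query per accession. The sketch below shows the idea with the standard sqlite3 module. It is an assumption-laden illustration: the real method is static, takes no arguments, and its storage layer is not documented here; the table name taxa, the column name ncbi_taxid, and the function name are all hypothetical.

```python
import sqlite3
from typing import Dict


def cache_taxids_sketch(conn: sqlite3.Connection) -> Dict[str, bool]:
    """Map every taxid in a hypothetical `taxa` table to True."""
    rows = conn.execute("SELECT ncbi_taxid FROM taxa")
    # Keys are stored as strings so they compare equal to ids read from text files.
    return {str(taxid): True for (taxid,) in rows}
```

With the dictionary in memory, testing whether a taxid from an accession2taxid line is already known costs a single hash lookup.
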

static check_file(element)

Perform checks on a file

This method checks that element exists and is a regular file.

Parameters:

element (type) – File to check

Returns:

True

Raises:
  • SystemExit – if element file does not exist
  • SystemExit – if element is not a file
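
The documented contract (return True, or raise SystemExit when the path is missing or is not a file) can be approximated as follows. This is a sketch rather than the library's code: the name check_file_sketch and the error messages are invented, and the None check is included only because the setter methods document that case as also raising SystemExit.

```python
import os
import sys


def check_file_sketch(element):
    """Return True if `element` is an existing regular file, else exit (hypothetical helper)."""
    if element is None:
        sys.exit("Please provide an input file to check")
    if not os.path.exists(element):
        sys.exit("File %s does not exist" % element)
    if not os.path.isfile(element):
        sys.exit("%s is not a file" % element)
    return True
```

Because sys.exit raises SystemExit, a caller that embeds the parser in a larger program can still catch the exception instead of letting the interpreter terminate.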