psims

Controlled Vocabulary Objects

psims uses controlled vocabularies to refer to externally defined and curated terms that describe the entities written to the file formats it produces. These domain-specific vocabularies can be updated independently of the file schemas, which allows faster update and maintenance cycles.

The ControlledVocabulary type represents a parsed and interpreted controlled vocabulary, a collection of Entity objects.

class psims.controlled_vocabulary.controlled_vocabulary.ControlledVocabulary(terms, id=None, metadata=None, version=None, name=None, import_resolver: Optional[Callable[[str], psims.controlled_vocabulary.controlled_vocabulary.ControlledVocabulary]] = None)[source]

A Controlled Vocabulary is a collection of terms or entities with controlled meanings and semantics.

This object makes entities resolvable by name, accession number, or synonym.

This object implements the Mapping protocol.

id

Unique identifier for this collection

Type

str

metadata

A mapping of metadata describing this controlled vocabulary

Type

dict

version

A string describing the version of this controlled vocabulary. Not all vocabularies are versioned the same way, so this value is not interpreted further automatically.

Type

str

id

An identifier for this controlled vocabulary that is unique within a particular context

Type

str

name

A human-friendly name for this controlled vocabulary

Type

str

terms

The primary mapping from term ID to term

Type

dict

classmethod from_obo(handle, **kwargs)[source]

Construct a new instance from an OBO format stream.

Parameters

handle (file-like) – A file-like object over an OBO-format document.

Returns

Return type

ControlledVocabulary

Raises

ValueError – When the controlled vocabulary produced contains no terms

items() → a set-like object providing a view on D's items[source]
keys() → a set-like object providing a view on D's keys[source]
names()[source]

A key-view over all the names in this controlled vocabulary, distinct from accessions.

Returns

Return type

collections.KeysView

query(key)[source]

Search for a term whose id, name, or synonym matches key.

The search is case-insensitive, but an exact-case match is preferred when one exists.

Parameters

key (str) – The key to look up.

Returns

term – The found entity, if any.

Return type

Entity

Raises

KeyError – If there is no match to any term in this vocabulary

See also

search, __getitem__

search(query)[source]

Search for any term containing the query in its id, name, or synonyms.

This algorithm uses substring containment, so it may return multiple hits and can be ambiguous when given a common or short substring. For exact string matches, use query().

Parameters

query (str) – The search query

Returns

matched – The matched terms.

Return type

list

See also

query
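As a minimal sketch of how the two lookups can be combined, the helper below tries the exact `query()` first and falls back to the substring-based `search()` only when `query()` raises `KeyError`. `find_terms` is a hypothetical name, not part of psims. It relies only on the behavior documented above and accepts any object exposing those two methods.

```python
def find_terms(vocabulary, key):
    """Return a list of candidate terms for ``key``.

    ``find_terms`` is a hypothetical helper, not part of psims. It assumes only
    that ``vocabulary.query`` raises ``KeyError`` when nothing matches and that
    ``vocabulary.search`` returns a list, as documented above.
    """
    try:
        # Exact match on id, name, or synonym: at most one term.
        return [vocabulary.query(key)]
    except KeyError:
        # No exact match: fall back to substring containment, which may
        # return zero, one, or many terms.
        return list(vocabulary.search(key))
```

Always returning a list keeps the caller's handling uniform; an empty list signals that neither lookup found anything.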

Caching

psims retrieves controlled vocabularies from the internet to obtain the most up-to-date version of each vocabulary. If an internet connection is unavailable, it falls back to a vendored copy of a specific version of each controlled vocabulary, bundled with psims at build time.

Additionally, an application might choose to save a copy of each required controlled vocabulary file on the file system in a specific location. This can be accomplished with the psims.controlled_vocabulary.controlled_vocabulary.obo_cache object, an instance of the OBOCache type. Setting cache_path specifies the directory in which to cache files, and setting enabled toggles whether or not the cache is used. If the cache is enabled and a copy of the controlled vocabulary is not in the cache, a new copy will be downloaded, or loaded from the vendored copy if the download is unavailable, and written to the cache directory for future re-use.

If a library wants to create its own separate cache directory, it can create a new instance of OBOCache and configure it separately. This custom cache instance can be passed to all XML file writing classes as the vocabulary_resolver parameter.
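The configuration described above amounts to assigning two attributes. The sketch below wraps that in a hypothetical helper, `configure_cache`, which is not part of psims. It works on any object with `cache_path` and `enabled` attributes, such as the module-level `obo_cache` or a separately created `OBOCache`. Normalizing the directory to an absolute path is a choice made for this example, not something psims is known to require.

```python
import os


def configure_cache(cache, directory, enabled=True):
    """Point an OBOCache-like object at ``directory`` and toggle its use.

    Hypothetical helper: it only assigns the ``cache_path`` and ``enabled``
    attributes described above and returns the same object.
    """
    # Assumption of this sketch: store an absolute path so the cache location
    # does not depend on the current working directory later on.
    cache.cache_path = os.path.abspath(directory)
    cache.enabled = enabled
    return cache
```

A library that wants its own cache would apply this to a new `OBOCache` instance and pass that instance as the `vocabulary_resolver` parameter of a writer class.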

Note

OBOCache has two behavioral switches that interact:
  • OBOCache.enabled - When this is True, files from the cache directory will be used and new files will be added to the cache directory. Otherwise, a new copy of each CV file will be requested when accessing a vocabulary.

  • OBOCache.use_remote - When this is True, new copies of CV files will be requested over the network, falling back to the packaged copy in psims only when the network request fails. Otherwise, the packaged copy will be used automatically.
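The interaction of the two switches can be modeled as a small decision function. This is an illustrative reading of the note above, not psims code. `choose_source`, `in_cache`, and `network_ok` are hypothetical names, and the precedence of a cached copy over a network request is an assumption drawn from the description of the cache.

```python
def choose_source(enabled, use_remote, in_cache, network_ok):
    """Model which copy of a CV file would be read.

    ``in_cache`` stands for "a copy already exists in the cache directory" and
    ``network_ok`` for "the network request succeeds". Both are hypothetical
    inputs used only to illustrate the documented behavior.
    """
    if enabled and in_cache:
        # The cache is on and already holds the file.
        return "cache"
    if use_remote and network_ok:
        # A fresh copy is requested over the network.
        return "remote"
    # Remote access is off, or the request failed: use the bundled copy.
    return "packaged"
```

For example, with `enabled=True` and `use_remote=False`, an empty cache leads to the packaged copy, which would then be written to the cache directory for later runs.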

class psims.controlled_vocabulary.controlled_vocabulary.OBOCache(cache_path='.obo_cache', enabled=True, resolvers=None, use_remote=True, user_agent_emulation=True)[source]

A cache for retrieved ontology sources stored on the file system, and an abstraction layer to make registered controlled vocabularies constructable from a URI even if they are not in the same format.

cache_exists

Whether the cache directory exists

Type

bool

cache_path

The path to the cache directory

Type

str

enabled

Whether the cache will be used or not

Type

bool

resolvers

A mapping from ontology URL to a function which will be called instead of opening the URL to retrieve the ControlledVocabulary object. A resolver is any callable that takes only an OBOCache instance as a single argument.

Type

dict

use_remote

Whether or not to try to access remote repositories over the network to retrieve controlled vocabularies. If False, the cached copy is used when available, and the packaged fallback otherwise.

Type

bool

user_agent_emulation

Whether or not to try to emulate a web browser’s user agent when trying to download a controlled vocabulary.

Type

bool

fallback(uri)[source]

Obtain a stream for the vocabulary specified by uri from the packaged bundle distributed with psims.

Parameters

uri (str) – The URI to retrieve a fallback stream for.

Returns

result – Returns a backup stream, or None if no fallback exists.

Return type

file-like or None

has_custom_resolver(uri)[source]

Test if uri has a resolver function.

Parameters

uri (str) – The URI to test

Returns

Return type

bool

path_for(name, setext=False)[source]

Construct a path for a given controlled vocabulary file in the cache on the file system.

Note

If the cache directory does not exist, this will create it.

Parameters
  • name (str) – The name of the controlled vocabulary file

  • setext (bool) – Whether or not to enforce the .obo extension

Returns

path – The path in the file system cache to use for this name.

Return type

str

resolve(uri)[source]

Get a readable file-like object for the controlled vocabulary referred to by uri.

If uri has a custom resolver, as determined by has_custom_resolver(), the custom resolver function will be called instead.

Parameters

uri (str) – The URI for the controlled vocabulary to access

Returns

fp – If uri has a custom resolver, any type may be returned, otherwise a readable file-like object in binary mode over the requested controlled vocabulary.

Return type

object

set_resolver(uri, resolver)[source]

Register a resolver callable for uri

Parameters
  • uri (str) – The URI to register the custom resolver for

  • resolver (Callable) – A resolver is any callable that takes only an OBOCache instance as a single argument.
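A resolver only has to accept the cache instance and return whatever should stand in for the downloaded vocabulary. The sketch below builds such a callable around a local file. `make_file_resolver` and its `loader` parameter are hypothetical names, not part of psims, and `loader` is assumed to be something like `ControlledVocabulary.from_obo`.

```python
def make_file_resolver(path, loader):
    """Build a resolver that loads a vocabulary from a local file.

    Hypothetical helper, not part of psims. ``loader`` is any callable that
    turns a binary file-like object into a vocabulary.
    """
    def resolver(cache):
        # The cache instance is the only argument a resolver receives. This
        # sketch does not need it, but must still accept it.
        with open(path, "rb") as handle:
            return loader(handle)

    return resolver
```

The returned callable could then be registered with `set_resolver(uri, resolver)` for the URI it should replace.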

Semantic Data

Terms in a controlled vocabulary define entities, categories, properties and relationships between them. The Entity type is how these are represented in memory.

class psims.controlled_vocabulary.entity.Entity(vocabulary=None, **attributes)[source]

Represent a term in a controlled vocabulary.

While this type implements the Mapping protocol, it also supports attribute access notation for keys.

children

Additional entities derived from this one

Type

list of Entity

data

An arbitrary attribute store representing key-value pairs

Type

dict

vocabulary

The source vocabulary. May be used for upward references

Type

ControlledVocabulary

id

The CURIE-style identifier of this entity, the accession of the term.

Type

str

definition

The “def” field of a term.

Type

str

get(k[, d]) → D[k] if k in D, else d.  d defaults to None.[source]
is_of_type(tp: Union[str, psims.controlled_vocabulary.entity.Entity]) → bool[source]

Test if tp is an ancestor of this Entity

Parameters

tp (str or Entity) – The identifier of the entity to test, or the entity itself

Returns

Return type

bool

items() → a set-like object providing a view on D's items[source]
keys() → a set-like object providing a view on D's keys[source]
parent() → Union[None, psims.controlled_vocabulary.entity.Entity, List[psims.controlled_vocabulary.entity.Entity]][source]

Fetch the parent or parents of this Entity in the bound controlled vocabulary.

Returns

Return type

Entity or list of Entity

values() → an object providing a view on D's values[source]
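Because parent() may return None, a single Entity, or a list of Entity objects, code that walks up the hierarchy has to normalize the result. The sketch below does so breadth-first. `iter_ancestors` is a hypothetical helper, not part of psims, and it assumes only the return types listed above and the `id` attribute.

```python
from collections import deque


def iter_ancestors(entity):
    """Yield every ancestor of ``entity``, nearest first, without repeats.

    Hypothetical helper, not part of psims. It assumes ``parent()`` returns
    ``None``, a single entity, or a list of entities, and that each entity
    has an ``id`` attribute.
    """
    seen = set()
    queue = deque([entity])
    while queue:
        current = queue.popleft()
        parents = current.parent()
        if parents is None:
            continue
        if not isinstance(parents, list):
            # Normalize the single-parent case to a list.
            parents = [parents]
        for parent in parents:
            # Skip ancestors already reached through another path.
            if parent.id not in seen:
                seen.add(parent.id)
                queue.append(parent)
                yield parent
```

Tracking visited identifiers keeps a shared ancestor from being reported more than once when a term has several parents.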