Configuration

Donfig is meant to be used by other packages or scripts that need to configure their own environment and make it easy for their users to modify that configuration too. A Donfig configuration object must be named; typically this name matches the name of the package being configured. All of the examples below use the package name mypkg, but you should substitute your own package name.

Donfig configuration options can be used to control any aspect of a package that may not be suited for a keyword argument or would otherwise be difficult for a user to configure. This might be to control logging verbosity, specify cluster configuration, provide credentials for security, or any of several other options that arise in production.

Configuration is specified in one of the following ways:

  1. YAML files in specified paths (see below)
  2. Environment variables like MYPKG_DISTRIBUTED__SCHEDULER__WORK_STEALING=True
  3. Default settings within sub-libraries

This combination makes it easy to specify configuration in a variety of settings, ranging from personal workstations to IT-mandated configuration to Docker images.

Configuration Object

donfig.Config.__init__(name[, defaults, …]) Initialize a configuration object

The main way to use a Donfig configuration object is to make your own in your package's __init__.py module and access it throughout the rest of your package by importing it. To initialize the config object:

from donfig import Config
config = Config('mypkg')

This will initialize the configuration object with information found in YAML files in the standard search paths (described below) and in environment variables. Both the search paths and the default options can be customized:

config = Config('mypkg', defaults={'key1': 'default_val'}, paths=['/usr/local/etc/'])

Access Configuration

donfig.Config.get(key[, default]) Get elements from global config

Once the configuration object is created, settings can be accessed using the get method. To get a sense of the current configuration on your system, use the pprint method to print the configuration's current state.

>>> from mypkg import config
>>> config.pprint()
{
  'logging': {
    'distributed': 'info',
    'bokeh': 'critical',
    'tornado': 'critical',
  },
  'admin': {
    'log-format': '%(name)s - %(levelname)s - %(message)s'
  }
}

>>> config.get('logging')
{'distributed': 'info',
 'bokeh': 'critical',
 'tornado': 'critical'}

>>> config.get('logging.bokeh')  # use `.` for nested access
'critical'

Note that the get function treats underscores and hyphens identically. For example, mypkg.config.get('num_workers') is equivalent to mypkg.config.get('num-workers').
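This underscore/hyphen equivalence can be pictured as a simple key-canonicalization step. The sketch below is illustrative only (canonicalize is a hypothetical helper, not donfig's actual implementation):

```python
def canonicalize(key):
    """Treat underscores and hyphens in a config key as interchangeable
    by mapping both spellings to a single canonical form."""
    return key.replace("_", "-")

# Both spellings resolve to the same stored key.
settings = {"num-workers": 4}
assert settings[canonicalize("num_workers")] == 4
assert settings[canonicalize("num-workers")] == 4
```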

Specify Configuration

YAML files

You can specify configuration values in YAML files like the following:

logging:
  distributed: info
  bokeh: critical
  tornado: critical

scheduler:
  work-stealing: True
  allowed-failures: 5

admin:
  log-format: '%(name)s - %(levelname)s - %(message)s'

These files can live in any of the following locations:

  1. The ~/.config/mypkg directory in the user’s home directory
  2. The {sys.prefix}/etc/mypkg directory local to Python
  3. The root directory (specified by the MYPKG_ROOT_CONFIG environment variable or /etc/mypkg/ by default)

Donfig searches for all YAML files within each of these directories and merges them together, preferring configuration files closer to the user over system configuration files (preference follows the order in the list above). Additionally, users can specify a path with the MYPKG_CONFIG environment variable; that path takes precedence over all of the locations above.

The contents of these YAML files are merged together, allowing different subprojects to manage configuration files separately but have them merge into the same global configuration (e.g. dask, dask-kubernetes, and dask-ml).

Note

For historical reasons we also look in the ~/.mypkg directory for config files. This is deprecated and will soon be removed.

Environment Variables

You can also specify configuration values with environment variables like the following:

export MYPKG_DISTRIBUTED__SCHEDULER__WORK_STEALING=True
export MYPKG_DISTRIBUTED__SCHEDULER__ALLOWED_FAILURES=5

resulting in configuration values like the following:

{'distributed':
  {'scheduler':
    {'work-stealing': True,
     'allowed-failures': 5}
  }
}

Donfig searches for all environment variables that start with MYPKG_, then transforms keys by converting to lower case and changing double-underscores to nested structures.

Donfig tries to parse all values with ast.literal_eval, letting users pass numeric and boolean values (such as True in the example above) as well as lists, dictionaries, and so on with normal Python syntax.
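The environment-variable collection described above can be sketched as follows. This is a simplified illustration, not donfig's actual implementation (collect_env is a hypothetical helper, and real lookups additionally treat underscores and hyphens as equivalent):

```python
import ast
import os

def collect_env(prefix="MYPKG_"):
    """Sketch: turn MYPKG_A__B=val environment variables into a nested
    dict, parsing values with ast.literal_eval where possible."""
    config = {}
    for name, raw in os.environ.items():
        if not name.startswith(prefix):
            continue
        # Lower-case the key and split double underscores into nesting.
        parts = name[len(prefix):].lower().split("__")
        try:
            value = ast.literal_eval(raw)  # True, 5, [1, 2], {'a': 1}, ...
        except (ValueError, SyntaxError):
            value = raw  # fall back to the raw string
        node = config
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return config

os.environ["MYPKG_SCHEDULER__ALLOWED_FAILURES"] = "5"
print(collect_env())  # {'scheduler': {'allowed_failures': 5}}
```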

Environment variables take precedence over configuration values found in YAML files.

Defaults

Additionally, individual subprojects may add their own default values when they are imported. These are always added with lower priority than the YAML files or environment variables mentioned above.

>>> import mypkg.config
>>> import mypkg.distributed
>>> mypkg.config.pprint()  # New values have been added
{'scheduler': ...,
 'worker': ...,
 'tls': ...}
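The priority order described above (defaults lowest, then YAML files, then environment variables) can be pictured as successive dictionary updates. This is a simplified sketch with made-up keys; a real implementation also recurses into nested dictionaries:

```python
# Each layer overrides the ones before it, so apply layers from
# lowest to highest priority.
defaults = {"scheduler": "threads", "log-level": "info"}
from_yaml = {"scheduler": "processes"}   # e.g. from ~/.config/mypkg/*.yaml
from_env = {"log-level": "debug"}        # e.g. MYPKG_LOG_LEVEL=debug

config = {}
for layer in (defaults, from_yaml, from_env):  # lowest -> highest priority
    config.update(layer)

print(config)  # {'scheduler': 'processes', 'log-level': 'debug'}
```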

Directly within Python

donfig.Config.set([arg]) Set configuration values within a context manager.

Additionally, you can temporarily set a configuration value using the mypkg.config.set function. This function accepts a dictionary as input and interprets "." as nested access:

>>> mypkg.config.set({'scheduler.work-stealing': True})

This function can also be used as a context manager for consistent cleanup.

with mypkg.config.set({'scheduler.work-stealing': True}):
    ...

Note that the set function treats underscores and hyphens identically. For example, mypkg.config.set({'scheduler.work-stealing': True}) is equivalent to mypkg.config.set({'scheduler.work_stealing': True}).
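The dotted-key behavior of set can be pictured as walking into nested dictionaries. The sketch below is illustrative only (set_nested is a hypothetical helper, not donfig's implementation); it also normalizes underscores to hyphens to mirror the equivalence just described:

```python
def set_nested(config, dotted_key, value):
    """Sketch: interpret '.' in a key as nested dictionary access,
    treating underscores and hyphens as interchangeable."""
    *parents, leaf = dotted_key.replace("_", "-").split(".")
    node = config
    for part in parents:
        node = node.setdefault(part, {})  # create intermediate dicts
    node[leaf] = value

config = {}
set_nested(config, "scheduler.work_stealing", True)
print(config)  # {'scheduler': {'work-stealing': True}}
```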

Updating Configuration

Manipulating configuration dictionaries

donfig.Config.merge(*dicts) Merge this configuration with multiple dictionaries.
donfig.Config.update(new[, priority]) Update the internal configuration dictionary with new.
donfig.Config.expand_environment_variables() Expand any environment variables in this configuration in-place.

As described above, configuration can come from many places, including several YAML files, environment variables, and project defaults. Each of these provides a configuration that is possibly nested like the following:

x = {'a': 0, 'c': {'d': 4}}
y = {'a': 1, 'b': 2, 'c': {'e': 5}}

Donfig will merge these configurations, respecting nested data structures and respecting order (later dictionaries take precedence).

>>> mypkg.config.pprint()
{}
>>> mypkg.config.merge(x, y)
>>> mypkg.config.pprint()
{'a': 1, 'b': 2, 'c': {'d': 4, 'e': 5}}
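The nested merge behavior can be sketched as a small recursive function. This is an illustration under the stated semantics (later values win, nested dicts merge rather than replace), not donfig's actual code:

```python
def merge_nested(a, b):
    """Sketch: merge dict b into dict a, recursing into nested dicts
    and letting later (b) values win on conflicts."""
    out = dict(a)
    for key, value in b.items():
        if isinstance(out.get(key), dict) and isinstance(value, dict):
            out[key] = merge_nested(out[key], value)  # merge subtrees
        else:
            out[key] = value  # later value wins
    return out

x = {'a': 0, 'c': {'d': 4}}
y = {'a': 1, 'b': 2, 'c': {'e': 5}}
print(merge_nested(x, y))  # keys: a=1, b=2, c={'d': 4, 'e': 5}
```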

You can also use the update method to update the existing configuration in place with a new configuration. This can be done with priority being given to either config.

mypkg.config.update(new, priority='new')  # Give priority to new values
mypkg.config.update(new, priority='old')  # Give priority to old values

Sometimes it is useful to expand environment variables stored within a configuration. This can be done with the expand_environment_variables method:

mypkg.config.expand_environment_variables()
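What expanding environment variables means can be sketched as a recursive walk that applies os.path.expandvars to every string value. This is a simplified illustration (expand_env is a hypothetical helper), not donfig's implementation:

```python
import os

def expand_env(obj):
    """Sketch: recursively expand $VAR / ${VAR} references in all
    string values of a nested configuration structure."""
    if isinstance(obj, dict):
        return {k: expand_env(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(expand_env(v) for v in obj)
    if isinstance(obj, str):
        return os.path.expandvars(obj)
    return obj  # numbers, booleans, None pass through unchanged

os.environ["DATA_DIR"] = "/tmp/data"
print(expand_env({"path": "$DATA_DIR/cache"}))  # {'path': '/tmp/data/cache'}
```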

Refreshing Configuration

donfig.Config.collect([paths, env]) Collect configuration from paths and environment variables
donfig.Config.refresh(**kwargs) Update configuration by re-reading yaml files and env variables.

If you change your environment variables or YAML files the configuration object will not immediately see the changes. Instead, you can call refresh to go through the configuration collection process and update the default configuration.

>>> mypkg.config.pprint()
{}

>>> # make some changes to yaml files

>>> mypkg.config.refresh()
>>> mypkg.config.pprint()
{...}

This function uses donfig.Config.collect, which returns the configuration without modifying the global configuration. You might use this to determine the configuration of particular paths not yet on the config path.

>>> mypkg.config.collect(paths=[...])
{...}

Downstream Libraries

donfig.Config.ensure_file(source[, …]) Copy file to default location if it does not already exist
donfig.Config.update(new[, priority]) Update the internal configuration dictionary with new.
donfig.Config.update_defaults(new) Add a new set of defaults to the configuration

One way to structure the configuration of a series of downstream packages around one central package is to follow the model used by Dask, whose downstream libraries follow a standard convention for using the central Dask configuration. This section provides recommendations for integrating new downstream libraries with the central mypkg package, using a fictional project, mypkg-foo, as an example.

Downstream projects can follow the following convention:

  1. Maintain default configuration in a YAML file within their source directory:

    setup.py
    mypkg_foo/__init__.py
    mypkg_foo/config.py
    mypkg_foo/core.py
    mypkg_foo/foo.yaml  # <---
    
  2. Place configuration in that file within a namespace for the project

    # mypkg_foo/foo.yaml
    
    foo:
      color: red
      admin:
        a: 1
        b: 2
    
  3. With the configuration for mypkg_foo in mypkg_foo/__init__.py (or anywhere) load the default mypkg config and update it into the global configuration:

    # mypkg_foo/config.py
    import os
    import yaml
    
    import mypkg.config
    
    fn = os.path.join(os.path.dirname(__file__), 'foo.yaml')
    
    with open(fn) as f:
        defaults = yaml.safe_load(f)  # safe_load avoids executing arbitrary YAML tags
    
    mypkg.config.update_defaults(defaults)
    
  4. Within that same module, copy the 'foo.yaml' file to the user’s configuration directory if it doesn’t already exist.

    We also comment the file to make it easier for us to change defaults in the future.

    # ... continued from above
    
    mypkg.config.ensure_file(source=fn, comment=True)
    

    The user can investigate ~/.config/mypkg/*.yaml to see all of the commented out configuration files to which they have access.
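What "commenting" the copied file means can be sketched roughly as prefixing each line with '# '. This is a simplification (comment_out is a hypothetical helper; donfig's ensure_file also handles destination paths and skips files that already exist):

```python
def comment_out(yaml_text):
    """Sketch: prefix every non-empty line with '# ' so the copied
    defaults are visible but inactive until the user uncomments them."""
    lines = []
    for line in yaml_text.splitlines():
        lines.append("# " + line if line.strip() else line)
    return "\n".join(lines)

print(comment_out("foo:\n  color: red"))
# prints:
# # foo:
# #   color: red
```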

  5. Ensure that this file is run on import by including it in mypkg_foo/__init__.py if not already there.

  6. Within mypkg_foo code, use the mypkg.config.get function to access configuration values:

    # mypkg_foo/core.py
    
    def process(fn, color=None):
        if color is None:
            color = mypkg.config.get('foo.color')
        ...
    

Note

The config object is accessed at runtime rather than at import time (i.e. not as a default value in the function signature) in case users customize the value later.

  7. You may also want to ensure that your yaml configuration files are included in your package. This can be accomplished by including the following line in your MANIFEST.in:

    recursive-include <PACKAGE_NAME> *.yaml
    

    and the following in your setup.py setup() call:

    from setuptools import setup
    
    setup(...,
          include_package_data=True,
          ...)
    

This process keeps configuration in a central place but also keeps it safe within namespaces. It places config files in an easy-to-access location (~/.config/mypkg/*.yaml by default) so that users can easily discover what they can change, while maintaining the actual defaults within the source code so that they more closely track changes in the library.

However, downstream libraries may choose alternative solutions, such as isolating their configuration within their library, rather than using the global mypkg.config system.