datalad_next.datasets

Representations of DataLad datasets built on git/git-annex repositories

Two sets of repository abstractions are available: LeanGitRepo and LeanAnnexRepo on the one hand, and LegacyGitRepo and LegacyAnnexRepo on the other.

LeanGitRepo and LeanAnnexRepo provide a more modern, smaller interface and represent the present standard API for low-level repository operations. They are geared towards interacting with Git and git-annex more directly, and are better suited for generator-like implementations that promote low response latencies and a leaner processing footprint.

The Legacy*Repo classes provide the now-legacy low-level API to repository operations. This functionality stems from the earliest days of DataLad and implements paradigms and behaviors that are no longer common to the rest of the DataLad API. LegacyGitRepo and LegacyAnnexRepo should no longer be used in new developments, and are not documented here.

class datalad_next.datasets.LeanAnnexRepo(*args, **kwargs)

Bases: AnnexRepo

git-annex repository representation with a minimized API

This is a companion of LeanGitRepo. In the same spirit, it restricts its API to a limited set of methods that extend LeanGitRepo.

class datalad_next.datasets.LeanGitRepo(*args, **kwargs)

Bases: RepoInterface

Representation of a Git repository

add_fake_dates_to_env(env=None)

Add fake dates to env.

Parameters:

env (dict, optional) -- Environment variables.

Returns:

A dict (copied from env) with date-related environment variables for git and git-annex set.
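As described above, the method returns a copy of env with date-related variables set. The following is a minimal sketch of the git side only, using git's standard GIT_AUTHOR_DATE and GIT_COMMITTER_DATE variables. The function name fake_dates_env and the rule "one second after the latest commit" are assumptions made for illustration, not the library's implementation, and the git-annex variables the method also sets are not reproduced here.

```python
import os
from typing import Dict, Mapping, Optional


def fake_dates_env(last_commit_seconds: int,
                   env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: return a copy of env with fixed git dates."""
    # copy, so that neither the caller's mapping nor os.environ is modified
    new_env = dict(os.environ if env is None else env)
    # assumption: the fake date is one second after the latest commit,
    # given as a Unix timestamp with a UTC offset
    date = f"@{last_commit_seconds + 1} +0000"
    new_env["GIT_AUTHOR_DATE"] = date
    new_env["GIT_COMMITTER_DATE"] = date
    return new_env
```

Passing such an environment to a git commit call would give the new commit a deterministic, monotonically increasing date instead of the wall-clock time.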

call_git(args, files=None, expect_stderr=False, expect_fail=False, env=None, pathspec_from_file: Optional[bool] = False, read_only=False)

Call git and return standard output.

Parameters:
  • args (list of str) -- Arguments to pass to git.

  • files (list of str, optional) -- File arguments to pass to git. The advantage of passing these here rather than as part of args is that the call will be split into multiple calls to avoid exceeding the maximum command line length.

  • expect_stderr (bool, optional) -- Standard error is expected and should not be elevated above the DEBUG level.

  • expect_fail (bool, optional) -- A non-zero exit is expected and should not be elevated above the DEBUG level.

  • pathspec_from_file (bool, optional) -- Could be set to True for a git command which supports --pathspec-from-file and --pathspec-file-nul options. Then pathspecs would be passed through a temporary file.

  • read_only (bool, optional) -- By setting this to True, the caller indicates that the command does not write to the repository, which lets this function skip some operations that are necessary only for commands that modify the repository. Beware that even commands that are conceptually read-only, such as git-status and git-diff, may refresh and write the index.

Return type:

standard output (str)

Raises:

CommandError -- if the call exits with a non-zero status.
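The files parameter exists so that long file lists can be split across several git invocations. The sketch below shows one way to batch arguments under a character budget. The helper chunk_file_args and the max_chars limit are hypothetical; how call_git actually computes its limit is not documented here.

```python
from typing import Iterator, List, Sequence


def chunk_file_args(files: Sequence[str], max_chars: int) -> Iterator[List[str]]:
    """Hypothetical helper: yield batches of file arguments whose combined
    length (counting one separating space per argument) fits max_chars."""
    batch: List[str] = []
    size = 0
    for name in files:
        cost = len(name) + 1  # +1 for the separating space
        # start a new batch if this file would overflow the current one;
        # a single over-long name still gets a batch of its own
        if batch and size + cost > max_chars:
            yield batch
            batch, size = [], 0
        batch.append(name)
        size += cost
    if batch:
        yield batch


# each batch would become one git call, e.g. ["git", "add", "--", *batch]
commands = [["git", "add", "--", *b] for b in chunk_file_args(["a.txt", "b.txt"], 100)]
```

Order is preserved across batches, so running the resulting commands one after another is equivalent to a single call with all files.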

call_git_items_(args, files=None, expect_stderr=False, expect_fail=False, env=None, pathspec_from_file: Optional[bool] = False, read_only=False, sep=None, keep_ends=False)

Call git and yield output lines as they become available. Output lines are split at line ends, or at sep if sep is not None.

Parameters:
  • sep (str, optional) -- Use sep as line separator. Does not create an empty last line if the input ends on sep.

  • keep_ends (bool, optional) -- If True, each yielded item keeps its line ending (or separator).

  • All other parameters match those described for call_git.

Returns:

Generator that yields stdout items, i.e. lines with the line ending or separator removed (unless 'keep_ends' is True).

Please note, this method is meant to be used to process output that is meant for 'interactive' interpretation. It is not intended to return stdout from a command like "git cat-file". The reason is that it strips off the line endings (or separator) from the result lines, unless 'keep_ends' is True. If 'keep_ends' is False, you will not know which line ending was stripped (if 'sep' is None) or whether a line ending (or separator) was stripped at all, because the last line may not have a line ending (or separator).

If you want to reliably recreate the output, set 'keep_ends' to True and "".join() the result, or use 'call_git()' instead.

Raises:

CommandError -- if the call exits with a non-zero status.
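The splitting rules and the keep_ends caveat above can be illustrated with plain Python strings. This sketch works on a complete output string rather than a live stream, and the function split_items is hypothetical; it only mirrors the described semantics (no empty last item when the input ends on sep, lossless round trip only with keep_ends=True).

```python
from typing import Iterator, Optional


def split_items(text: str, sep: Optional[str] = None,
                keep_ends: bool = False) -> Iterator[str]:
    """Hypothetical helper: split finished output the way call_git_items_
    is described to, without modelling incremental streaming."""
    if sep is None:
        # str.splitlines() handles \n, \r\n, \r and friends, and yields
        # no empty trailing item when the text ends with a line ending
        yield from text.splitlines(keepends=keep_ends)
        return
    parts = text.split(sep)
    # drop the empty item split() produces when the text ends on sep
    if parts and parts[-1] == "":
        parts.pop()
    for i, part in enumerate(parts):
        is_last = i == len(parts) - 1
        # re-attach sep only where it was actually present in the input
        if keep_ends and (not is_last or text.endswith(sep)):
            yield part + sep
        else:
            yield part
```

Without keep_ends, "a\nb\n" and "a\nb" produce the same items, which is exactly the ambiguity the note warns about; with keep_ends=True, "".join() restores the original text.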

call_git_oneline(args, files=None, expect_stderr=False, pathspec_from_file: Optional[bool] = False, read_only=False)

Call git for a single line of output.

All other parameters match those described for call_git.

Raises:
  • CommandError -- if the call exits with a non-zero status.

  • AssertionError -- if there is more than one line of output.
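The single-line contract can be expressed as a small check over the output lines. In this sketch, single_line is a hypothetical helper; the AssertionError on multiple lines follows the description above, while returning an empty string for no output at all is an assumption, because that case is not specified.

```python
from typing import Iterable


def single_line(items: Iterable[str]) -> str:
    """Hypothetical helper: return the only output line, or fail."""
    lines = list(items)
    if len(lines) > 1:
        # raised explicitly, so it also fires under "python -O"
        raise AssertionError(f"expected one line of output, got {len(lines)}: {lines!r}")
    # assumption: empty output maps to "" in this sketch
    return lines[0] if lines else ""
```

A typical input would be the output of a command that prints exactly one value, such as a commit hash.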

call_git_success(args, files=None, expect_stderr=False, pathspec_from_file: Optional[bool] = False, read_only=False)

Call git and return True if the call exits with a status of 0.

All parameters match those described for call_git.

Return type:

bool
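The pattern behind call_git_success, mapping an exit status to a boolean instead of raising, can be sketched with the standard subprocess module. To keep the example runnable without a repository, the Python interpreter stands in for git; call_success is a hypothetical name, and discarding the output is a choice of this sketch rather than documented behavior.

```python
import subprocess
import sys
from typing import Sequence


def call_success(cmd: Sequence[str]) -> bool:
    """Hypothetical helper: run cmd and report whether it exited with 0."""
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


ok = call_success([sys.executable, "-c", "pass"])
failed = call_success([sys.executable, "-c", "import sys; sys.exit(3)"])
```

Against a real repository, a command such as ["git", "rev-parse", "--verify", "HEAD"] would answer "does HEAD resolve?" this way. Note that a missing executable still raises FileNotFoundError here rather than returning False.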

property cfg

Get a ConfigManager instance for this repository

Return type:

ConfigManager

for_each_ref_(fields=('objectname', 'objecttype', 'refname'), pattern=None, points_at=None, sort=None, count=None, contains=None)

Wrapper for git for-each-ref

Please see manual page git-for-each-ref(1) for a complete overview of its functionality. Only a subset of it is supported by this wrapper.

Parameters:
  • fields (iterable or str) -- Used to compose a NULL-delimited specification for for-each-ref's --format option. The default field list reflects the standard behavior of for-each-ref when the --format option is not given.

  • pattern (list or str, optional) -- If provided, report only refs that match at least one of the given patterns.

  • points_at (str, optional) -- Only list refs which point at the given object.

  • sort (list or str, optional) -- Field name(s) to sort by. If multiple fields are given, the last one becomes the primary key. Prefix any field name with '-' to sort in descending order.

  • count (int, optional) -- Stop iteration after the given number of matches.

  • contains (str, optional) -- Only list refs which contain the specified commit.

Yields:

dict with items matching the given fields

Raises:
  • ValueError -- if no fields are given

  • RuntimeError -- if git for-each-ref returns a record where the number of properties does not match the number of fields
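The parameters above map onto git for-each-ref options (--format, --points-at, --contains, --sort, --count, plus trailing patterns), and each output record is split at NUL bytes into a dict. The sketch below builds the argument list and parses records without running git. The helper names for_each_ref_args and parse_records are hypothetical, and this is an illustration of the documented behavior, not the wrapper's actual code.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Sequence, Union

StrOrSeq = Union[str, Sequence[str]]


def _as_list(value: Optional[StrOrSeq]) -> List[str]:
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def for_each_ref_args(fields: StrOrSeq = ("objectname", "objecttype", "refname"),
                      pattern=None, points_at=None, sort=None,
                      count=None, contains=None) -> List[str]:
    fields = _as_list(fields)
    if not fields:
        raise ValueError("no fields given")
    # %00 makes git emit a NUL byte between the requested fields
    args = ["for-each-ref", "--format=" + "%00".join(f"%({f})" for f in fields)]
    if points_at:
        args.append(f"--points-at={points_at}")
    if contains:
        args.append(f"--contains={contains}")
    # git uses the last --sort option as the primary sort key
    args.extend(f"--sort={key}" for key in _as_list(sort))
    if count:
        args.append(f"--count={count}")
    args.extend(_as_list(pattern))
    return args


def parse_records(lines: Iterable[str], fields: StrOrSeq) -> Iterator[Dict[str, str]]:
    fields = _as_list(fields)
    for line in lines:
        values = line.split("\0")
        if len(values) != len(fields):
            raise RuntimeError(f"expected {len(fields)} fields, got {values!r}")
        yield dict(zip(fields, values))
```

With a repository at hand, the argument list could be passed to git and the output lines fed to parse_records, one dict per ref.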

init(sanity_checks=True, init_options=None)

Initializes the Git repository.

Parameters:
  • sanity_checks (bool, optional) -- Whether to perform sanity checks during initialization if the target path already exists, such as checking that the new repository is not created in a directory where git already tracks some files.

  • init_options (list, optional) -- Additional options to be appended to the git-init call.

is_valid()

Returns whether the underlying repository appears to be still valid

This method can be used as an instance method or a class method.
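A method that works both on an instance and on the class is commonly built in Python with a small descriptor. The sketch shows that technique with a hypothetical hybridmethod descriptor and a toy class; it is not how the library implements is_valid. The path argument for class-level calls and the validity test (a .git directory exists) are assumptions for illustration only.

```python
import functools
import os
from typing import Optional


class hybridmethod:
    """Descriptor that binds the instance when accessed on an instance,
    and the class when accessed on the class."""

    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        target = obj if obj is not None else objtype
        return functools.partial(self.func, target)


class ToyRepo:
    """Hypothetical stand-in for a repository class."""

    def __init__(self, path: str):
        self.path = path

    @hybridmethod
    def is_valid(self_or_cls, path: Optional[str] = None) -> bool:
        if path is None:
            if isinstance(self_or_cls, type):
                raise TypeError("a path is required when called on the class")
            path = self_or_cls.path
        # assumption: "valid" simply means a .git directory is present
        return os.path.isdir(os.path.join(path, ".git"))
```

ToyRepo(some_path).is_valid() checks the instance's own path, while ToyRepo.is_valid(some_path) performs the same check without constructing an instance.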