Skip to content

The embedded library

From version 0.7.0, the pak source repo and the pak source package includes all of the dependent packages in src/library. They are installed during R CMD INSTALL. in the compilation step. It is started from Makevars (generaged from Makevars.in), and it is implemented via the src/install-embedded.R script.

install-embedded.R istalls all embedded dependencies in the correct order, into /library in the package we are building.

Advantages

  • Much simpler installation. No need to download anything. Hopefully it works fine on CRAN.
  • It is now possible to write real tests for pak, that can also run on CRAN, if we want.
  • A pak version always has the same dependencies. In the past the versions of the embedded packages could be different.
  • Better development. No need to manually install dependencies before calling load_all().
  • We can install dev pak from GitHub or other sources now seemlessly, using pak itself (in another library), or remotes, or from a local git clone, etc.

Disadvantages

  • Starting another installation during the installation is slightly tricky. We need to unset a number of environment variables (see them in Makevars.in). Hopefully this won’t be too fragile.
  • The source package is bigger. Still not very big.
  • pak always needs a compiler now when it is being installed from source.

Updating embedded packages

The embed.R file has some tools for * listing the versions of the embedded packages, * listing the versions available on CRAN, * checking if we need to update a dependency, * updating a dependency, etc.

pkgload::load_all()
embed$check_update()
embed$update()
...

pkgload::load_all()

We do not embed dependencies in this case, but we create a separate library in the user cache directory. This happens from .onLoad(). .onLoad() regcognizes load_all() and calls install-embedded.R again, with a special argument, to install dependencies into the user cache library. We use hashing to decide if a dependency should be updated in the private library, so updates work for dependencies on GitHub.

Embedding dev packages from GitHub

Is now supported.

Cross compilation

pak supports cross-compilation now. You need to pass --host=<platform> to ./configure via the --configure-args argument of R CMD INSTALL to do cross-compilation. You also need to set up the compilers, e.g. via ~/.R/Makevars.

Currently you need to install all of pak’s dependencies (ideally the vendored ones) before cross-compiling. This is because the dependencies have to be able to load their dependencies during cross compilation. This is not great, and I am planning to improve it in the near future. In particular, PPM cannot build pak binaries now for macOS because of this, because the PPM build system only installs the declared dependencies before cross compilation, but of course it does not know about the embedded dependencies.

install-embedded.R handles cross-compoilation specially. (It has to.) It installs each dependency into its own library, and then moves them into the common library at the end. This is so that they can load their dependencies from the native library during their installation.

install-embedded.R updates the Built fields in the metadata of the embedded packages, and also for the pak package itself, after compilation.

Cross-compilation is currently used

  • to build the macOS arm64 binaries on x86_64 macOS (see tools/build/macos),
  • to build static aarch64 Linux binaries on x86_64 Linux (see tools/build/Linux/Dockerfile-aarch64 and other files there),
  • on PPM, to cross compile (x86_64 and arm64) macOS binaries on x86_64 Linux. (This does not work currently.)

Our repository of pak builds

R versions

We usually support at least one more R version than the tidyverse, so the last six release and R-devel. This is to give people time to update their GitHub actions.

Platforms

Currently:

  • Windows x86_64.
  • macOS x86_64.
  • macOS arm64 (cross compiled on x86_64 macOS).
  • Linux x86_64 static.
  • Linux aarch64 static (cross compiled on x86_64 Linux).

Hopefully soon:

  • Windows arm64. (The challenge is that we either need to cross-compile or have a self-hosted runner to build packages.)

Streams

Stable

Latest CRAN release, with a version number with three components.

RC

Release condidate, typically shortly before a release, with a version number with four or more components, the fourth one is 9999.

At other times it is the same as the stable stream.

Devel

Build daily from the main branch of https://github.com/r-lib/pak. It has a version number with four or more components, the fourth one is 9000 or larger, but it is not 9999 (as that would be an RC version).

Update procedure

  1. Package binaries are built on GitHub Actions, in the nightly.yaml workflow.
  2. They are pushed to GitHub Packages, to the pak package, in the r-lib/pak repo: https://github.com/r-lib/pak/pkgs/container/pak.
  3. For each successful push, the manifest file on the packages branch is updated: https://github.com/r-lib/pak/tree/packages.
  4. They are deployed to GitHub Packages, to this repo: https://github.com/r-lib/r-lib.github.io.

The nightly.yaml workflow

Can be also run manually, on a subset of all platforms, if needed. Othewise it runs daily: https://github.com/r-lib/pak/actions/workflows/nightly.yaml.

GitHub Packages

All builds (via Makefile) upload the built packages to GitHub Packages: https://github.com/r-lib/pak/pkgs/container/pak. They also update manifest.json at https://github.com/r-lib/pak/tree/packages. manifest.json is the ground truth for the repository.

GitHub Pages

The challenge is that the site is very close to the 1GB limit, sometimes it is slightly over, which still works, but we won’t be able to add new platforms to it.

An alternative would be to keep the package in a repository, and use https://raw.githubusercontent.com. We would need to create a repo branch, and have https://raw.githubusercontent.com/r-lib/pak/repo as the repo URL. So the Windows and macOS binaries would go into

https://raw.githubusercontent.com/r-lib/pak/repo/bin/windows/...
https://raw.githubusercontent.com/r-lib/pak/repo/bin/macosx/...

etc.

Challenges

Concurrency

Dealing with concurrency on GHA is a challenge. GitHub Packages requires that we push all architectures (=pak builds) at the same time. (The package files do not need to be present of the client if they stay the same.) So we treat manifest.json as the ground truth, and we update it for the pak build(s) that we are about to push.

After pushing to GitHub Packages, we update manifest.json and push that to GitHub. If the push is not clean, then we need to do the whole update process again. With many concurrent builds finishing around the same time, this happens a lot, so in the end we do a lot of pushes to GitHub Packages.

An alternative would be to store each pak build in its own package at GitHub Packages, instead of one multi-arch package. But it is also nice to have a single package, instead of about a hundred one. (One for each platform for each R version, for each stream.)

Cleanup

Old packages are not cleaned up on GitHub Packages, so ideally we would need to clean them up regularly, to avoid using too much space. E.g. we could clean up packages that are older than (say) a week, and that are not referenced from the manifest.json file. But we don’t do that currently.