Pack Installer Security Reference¶
The wikigr.packs.installer module provides the PackInstaller class for installing,
uninstalling, and updating Knowledge Packs. This document describes its security model,
configuration constants, and full Python API.
Security Model Overview¶
Pack archives can arrive from untrusted sources (public registries, third-party authors, CI artefact stores). The installer applies multiple independent defences so that a compromised or malicious archive cannot harm the host machine:
| Layer | Where | Threat mitigated |
|---|---|---|
Pack-name validation (PACK_NAME_RE) |
uninstall(), update() entry points |
Path traversal via caller-supplied name |
| HTTPS-only + private-IP rejection | _url_validation.validate_download_url() |
SSRF, plain-text interception |
| DNS-bind download | install_from_url() |
DNS rebinding (TOCTOU between validation and download) |
| Size limit + timeout | install_from_url() |
Disk exhaustion, bandwidth DoS, hang |
| SHA-256 verification | install_from_url() optional |
Tampered archive substitution |
| Archive member scan | distribution.unpackage_pack() |
Zip-slip, symlink escapes |
PEP 706 filter='data' |
distribution.unpackage_pack() |
setuid bits, device nodes, path normalisation |
| Path containment check | distribution.unpackage_pack() |
Crafted manifest.name redirecting install path |
| World-writable directory warning | PackInstaller.__init__() |
Privilege escalation via install_dir permissions |
| Source-URL HTTPS check | manifest.validate_manifest() |
Non-HTTPS source provenance recorded in manifests |
Class: PackInstaller¶
from wikigr.packs.installer import PackInstaller
from pathlib import Path
installer = PackInstaller(install_dir=Path.home() / ".wikigr" / "packs")
Class-level constants¶
| Constant | Default | Description |
|---|---|---|
PackInstaller.MAX_DOWNLOAD_BYTES |
2_147_483_648 (2 GiB) |
Maximum bytes accepted in a streaming download. Override on the class before constructing an instance to change the limit globally. |
Example — lowering the limit in resource-constrained environments:
PackInstaller.MAX_DOWNLOAD_BYTES = 512 * 1024 * 1024 # 512 MiB
installer = PackInstaller()
__init__¶
PackInstaller(install_dir: Path | None = None)
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
install_dir |
Path \| None |
~/.wikigr/packs |
Base directory under which each pack gets its own subdirectory. |
Side effects:
After resolving install_dir, the constructor checks whether the directory is
world-writable (permission bit o+w, i.e. stat & 0o002). If it is, a
warnings.warn is emitted:
UserWarning: install_dir '/path/to/packs' is world-writable. Any local user
can replace pack files. Consider restricting permissions with:
chmod o-w /path/to/packs
This is a warning, not an error, so existing automation is not broken. Operators running shared systems should fix the directory permissions.
install_from_file¶
pack_info = installer.install_from_file(archive_path: Path) -> PackInfo
Install a pack from a local .tar.gz archive.
Parameters:
| Parameter | Type | Description |
|---|---|---|
archive_path |
Path |
Local path to the .tar.gz pack archive |
Returns: PackInfo for the newly installed pack.
Raises:
| Exception | When |
|---|---|
FileNotFoundError |
archive_path does not exist |
tarfile.TarError |
Archive is corrupt or not a valid tar file |
ValueError |
Archive contains illegal paths, symlinks, or manifest validation fails |
install_from_url¶
pack_info = installer.install_from_url(
url: str,
*,
expected_sha256: str | None = None,
max_bytes: int | None = None,
timeout: int = 30,
) -> PackInfo
Download and install a pack from an HTTPS URL.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
url |
str |
— | HTTPS URL of the .tar.gz pack archive |
expected_sha256 |
str \| None |
None |
Hex digest of the expected SHA-256 hash of the archive. When supplied, the downloaded bytes are hashed and compared before installation proceeds. |
max_bytes |
int \| None |
None |
Per-call byte limit. When None, PackInstaller.MAX_DOWNLOAD_BYTES applies. |
timeout |
int |
30 |
Connection and read timeout in seconds. |
Security behaviour — DNS binding:
install_from_url calls validate_download_url() to resolve the hostname and validate the
resulting IP address once. It then opens a direct TCP connection to that resolved IP
address (not to the hostname string) with the original hostname supplied in the Host:
HTTP header. This eliminates the DNS rebinding window that exists when validation and
download use separate DNS lookups.
For IPv6 addresses, the connection target is wrapped in square brackets as required by
RFC 7230 ([::1]).
Proxy support
Because connections are made directly to the pre-resolved IP, system HTTP proxies
configured via HTTPS_PROXY or HTTP_PROXY environment variables are bypassed.
If your network requires a proxy for outbound HTTPS, use install_from_file with a
separately downloaded archive instead.
SHA-256 verification:
When expected_sha256 is supplied, the installer:
- Streams the archive to a temporary file in 64 KiB chunks (up to
max_bytes). - Computes
hashlib.sha256incrementally over each chunk during the stream. - Compares the final hex digest to
expected_sha256(case-insensitive). - Raises
ValueErrorif the digests do not match — the temporary file is deleted and no pack files are written. - Calls
install_from_fileon the verified temporary file.
Memory usage
The archive is streamed to a temporary file in 64 KiB chunks; the SHA-256 digest is computed incrementally over each chunk. Memory consumption is effectively constant regardless of archive size — only the hasher state (≈ 256 bytes) and one chunk buffer (64 KiB) are held in RAM at any time.
Raises:
| Exception | When |
|---|---|
ValueError |
URL is not HTTPS, hostname resolves to a private/reserved/loopback IP, DNS resolution fails (hostname not found), download exceeds max_bytes, or SHA-256 digest does not match expected_sha256 |
http.client.HTTPException |
HTTP-level error during download |
socket.timeout |
Connection or read exceeded timeout seconds |
tarfile.TarError |
Downloaded bytes are not a valid tar archive |
Example — basic:
installer = PackInstaller()
pack_info = installer.install_from_url(
"https://registry.wikigr.com/packs/go-expert-1.2.0.tar.gz"
)
Example — with SHA-256 verification (recommended):
pack_info = installer.install_from_url(
"https://registry.wikigr.com/packs/go-expert-1.2.0.tar.gz",
expected_sha256="e3b0c44298fc1c149afb…", # from registry checksums page
)
Example — tighter limits for constrained environments:
pack_info = installer.install_from_url(
"https://cdn.example.com/packs/small-pack-1.0.0.tar.gz",
max_bytes=128 * 1024 * 1024, # 128 MiB
timeout=60,
)
uninstall¶
removed = installer.uninstall(pack_name: str) -> bool
Remove an installed pack.
Parameters:
| Parameter | Type | Description |
|---|---|---|
pack_name |
str |
Name of the pack to remove. Must match PACK_NAME_RE. |
Returns: True if the pack was found and removed; False if the pack was not installed.
Raises:
| Exception | When |
|---|---|
ValueError |
pack_name does not match ^[a-zA-Z0-9][a-zA-Z0-9_-]{0,63}$ |
The ValueError is raised before any filesystem operations, preventing path-traversal
attacks from reaching the directory tree.
update¶
pack_info = installer.update(pack_name: str, archive_path: Path) -> PackInfo
Replace an existing pack installation with a new archive, preserving the pack's accumulated evaluation results.
Parameters:
| Parameter | Type | Description |
|---|---|---|
pack_name |
str |
Name of the currently-installed pack to update. Must match PACK_NAME_RE. |
archive_path |
Path |
Local path to the new .tar.gz archive |
Returns: PackInfo for the updated pack.
Raises:
| Exception | When |
|---|---|
ValueError |
pack_name does not match PACK_NAME_RE |
FileNotFoundError |
Pack is not currently installed, or archive_path does not exist |
tarfile.TarError |
archive_path is corrupt or not a valid tar file |
ValueError |
New archive fails manifest validation |
Eval-result preservation — crash-resilient design:
If the installed pack contains an eval/results/ directory, update() backs it up to
a sibling path on the same filesystem before installing the new version:
~/.wikigr/packs/.eval-backup-<pack_name>/results/
Because the backup lives on the same filesystem as the install directory, the restore
from backup to the new eval/results/ uses shutil.move. When the destination does not
yet exist, the underlying os.rename call is atomic and survives a mid-update process kill.
If a stale backup directory already exists (from a previously interrupted update),
shutil.move falls back to a copy-then-delete sequence, which is not atomic — clean up
stale .eval-backup-<pack_name> directories before retrying if this matters for your
deployment. A temporary directory in /tmp (the previous approach) would be lost
entirely if the process was killed after the old pack was removed but before the backup
was restored.
The backup directory is cleaned up only on success. If the update fails, the backup remains and can be inspected manually.
Stale backups
If an update is interrupted and restarted, the existing backup directory is overwritten. Do not rely on the backup directory for long-term archival.
Module: wikigr.packs._url_validation¶
validate_download_url¶
from wikigr.packs._url_validation import validate_download_url
import ipaddress
resolved_ip: ipaddress.IPv4Address | ipaddress.IPv6Address | None = \
validate_download_url(url: str)
Validate a URL for safe download and return the resolved IP address.
Parameters:
| Parameter | Type | Description |
|---|---|---|
url |
str |
URL to validate |
Returns:
- An
ipaddress.IPv4Addressoripaddress.IPv6Addressobject representing the IP that the hostname resolved to, or Noneif DNS resolution fails (socket.gaierror) — the caller is responsible for treating aNonereturn as a fatal error and raising an appropriate exception.
Raises:
| Exception | When |
|---|---|
ValueError |
Scheme is not https |
ValueError |
URL has no hostname |
ValueError |
Hostname resolves to a private, reserved, or loopback IP address |
install_from_url converts a None return into a ValueError("DNS resolution failed for …") before attempting the download, so callers of install_from_url never see a silent pass-through.
Why the return value matters:
install_from_url binds its download connection directly to this resolved IP to close
the DNS rebinding window. Callers that only want validation (and do not need the IP)
can ignore the return value.
# Validation only
validate_download_url("https://registry.wikigr.com/packs/foo-1.0.tar.gz")
# Validation + IP binding
ip = validate_download_url("https://registry.wikigr.com/packs/foo-1.0.tar.gz")
if ip:
# connect directly to ip, supplying Host: header separately
...
Module: wikigr.packs.manifest¶
PACK_NAME_RE¶
from wikigr.packs.manifest import PACK_NAME_RE
import re
PACK_NAME_RE: re.Pattern = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9_-]{0,63}$")
Compiled regular expression used to validate pack names throughout the installer and manifest modules. A valid pack name:
- Starts with an alphanumeric character (
[a-zA-Z0-9]) - Continues with zero to 63 more characters from
[a-zA-Z0-9_-] - Total maximum length: 64 characters
validate_manifest¶
The validate_manifest function (see Pack Manifest Reference)
now enforces an additional rule on source_urls:
Every entry in
source_urlsmust use thehttps://scheme.
A manifest with HTTP or other non-HTTPS source URLs fails validation:
errors = validate_manifest(manifest)
# errors → ["source_urls entry 0 must use https://, got: 'http://example.com'"]
This rule applies at manifest validation time, independently of the download-time
HTTPS check in validate_download_url. It ensures that provenance metadata recorded
in a published pack is itself HTTPS, even if the distribution archive was downloaded
from a different location.
Security FAQ¶
Q: What happens if a pack archive contains a symlink?
Symlinks are rejected by unpackage_pack before any extraction begins. The check
iterates all archive members and raises ValueError on the first symlink or hard link
found. See Pack Distribution Reference for details.
Q: Can a crafted manifest.name (e.g. ../../etc) escape the install directory?
No. After extraction to a temporary directory, unpackage_pack resolves
install_dir / manifest.name and asserts it is contained within install_dir.resolve().
A name that resolves outside raises ValueError before any shutil.move occurs.
Q: Why is DNS rebinding a concern for pack downloads?
Without IP binding, two DNS lookups occur: one in validate_download_url (validation)
and one inside urllib.request.urlretrieve (download). If an attacker controls the DNS
record and returns a public IP for the first lookup and a private IP for the second, the
validation passes but the download reaches an internal service. install_from_url
eliminates this window by resolving the hostname once, validating the IP, and connecting
directly to that IP for the download.
Q: Does the SHA-256 check protect against a registry being compromised?
Yes, provided the expected_sha256 digest comes from a source independent of the
registry (e.g. a separate checksums file signed by the pack author, or pinned in your
CI configuration). If both the archive and the digest come from the same untrusted
registry, the check provides no additional protection.
Q: Does install_from_url work through a corporate HTTP proxy?
No. Because the download connects directly to the pre-resolved IP, HTTPS_PROXY/
HTTP_PROXY environment variables are ignored. Use install_from_file with an archive
downloaded via your proxy tooling instead.