Take a second look at the comments for realpath() for sample functions that resolve relative URLs. It's not that nasty. RFC 3986 § 5 even gives an algorithm. Here are two more examples:
PHP Code:
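function resolveURL($url, $base) {
    // $baseParts[1] is the scheme & host; $baseParts[2] is a '/' terminated absolute path
    if (!preg_match('%^([^:/]+://[^/?#]+)([^?#]*/)?%', $base, $baseParts)) {
        return $url; // the base isn't an absolute URL, so there is nothing to resolve against
    }
    // a base such as http://example.com has no path; treat it as '/'
    $basePath = isset($baseParts[2]) ? $baseParts[2] : '/';
    // $urlParts[1] is an optional scheme & host; [2] is a leading '/', if any; [3] is the rest
    preg_match('%^((?:https?://[^/]+)?)(/?)(.*)%', $url, $urlParts);
    if (empty($urlParts[1])) {
        $urlParts[1] = $baseParts[1];
        if (empty($urlParts[2])) {
            // relative path: prefix the base directory
            $urlParts[2] = $basePath;
        }
    }
    array_shift($urlParts); // drop the full match, keeping only the three parts
    return implode('', $urlParts);
}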
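// or, based on parse_url()
function resolveURL($url, $base) {
    $url = parse_url($url);
    if (! is_array($base)) {
        $base = parse_url($base);
    }
    if (isset($url['host'])) {
        // the URL names its own host, so only the scheme may come from the base
        $base = array('scheme' => $base['scheme'], 'path' => '/');
    } elseif (isset($url['path']) && substr($url['path'], 0, 1) !== '/') {
        // relative path: prefix the directory part of the base path
        $baseDir = isset($base['path']) ? preg_replace('%[^/]*$%', '', $base['path']) : '/';
        $url['path'] = $baseDir . $url['path'];
    }
    // fill in whatever is still missing from the base
    foreach ($base as $name => $part) {
        if (!isset($url[$name])) {
            $url[$name] = $part;
        }
    }
    return "$url[scheme]://$url[host]$url[path]";
}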
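Note that the first ignores query strings in the base URL and treats query strings in the URL to resolve as part of the path. The second copies over a query string from the base URL when the URL to resolve is relative and has none of its own, but the string it returns is built from the scheme, host and path only, so append the query yourself if you need it.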
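If you want to remove dot segments, replace every occurrence matching %/(\.|[^/]+/+\.\.)(/|$)% in the path with a single '/' after you've added the missing URL components, and repeat until nothing matches (one pass won't collapse runs like "c/../.."). Be aware that the pattern can also mistake a ".." for an ordinary segment when two of them are adjacent.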
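If the regex gets awkward, dot-segment removal can also be sketched as a segment-by-segment walk, loosely following the idea in RFC 3986 § 5.2.4. This is only a minimal sketch: removeDotSegments() is a made-up name, not something from the functions above, and unlike the RFC it also collapses empty segments ("//").
PHP Code:

```php
<?php
// Hypothetical helper (not part of the original examples): collapse "." and ".."
// segments in a URL path. Assumption: empty segments ("//") may be dropped too.
function removeDotSegments(string $path): string {
    $absolute = ($path !== '' && $path[0] === '/');
    $segments = explode('/', $path);
    $last = end($segments);
    $output = [];
    foreach ($segments as $segment) {
        if ($segment === '' || $segment === '.') {
            continue;               // nothing to keep
        }
        if ($segment === '..') {
            array_pop($output);     // step up one directory, never above the root
        } else {
            $output[] = $segment;
        }
    }
    $result = ($absolute ? '/' : '') . implode('/', $output);
    // a path that ended in '/', '.' or '..' still refers to a directory
    if ($output && in_array($last, ['', '.', '..'], true)) {
        $result .= '/';
    }
    return $result;
}
```

Apply it to the path component only, for instance to $url['path'] in the parse_url() version before the result string is assembled.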
If the HTTP extension (pecl_http; the function belongs to its 1.x API) is installed, it turns out all you need is http_build_url():
PHP Code:
function resolveURL($url, $base) {
    return http_build_url($base, $url, HTTP_URL_JOIN_PATH);
}
Make sure you check for the <base> tag when setting the URL base.
PHP Code:
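// $html is assumed to hold the fetched page source and $url the address it was fetched from
if (preg_match('%<base\s[^>]*?href=[\'"]?([^\'">]*)%i', $html, $matches)) {
    // in case the base tag doesn't have an absolute URL, resolve it
    $base = resolveURL($matches[1], $url);
} else {
    $base = $url;
}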
// remove trailing non-directory component, if any. Not strictly necessary
$base = preg_replace('%[^/]*$%', '', $base);
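A regex will miss some <base> tags (odd attribute spacing, for one). If the DOM extension is available, the lookup can be sketched with DOMDocument instead. findBaseHref() is a hypothetical name, and the code assumes the page source is already in a string.
PHP Code:

```php
<?php
// Hypothetical helper: return the href of the first <base> tag that has one,
// or null if the page has none. Assumes the DOM extension is loaded.
function findBaseHref(string $html): ?string {
    if ($html === '') {
        return null;                // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // real-world markup is rarely valid; keep libxml's complaints quiet
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    foreach ($doc->getElementsByTagName('base') as $element) {
        if ($element->hasAttribute('href')) {
            return $element->getAttribute('href');
        }
    }
    return null;
}
```

The returned value may still be relative, so it should go through resolveURL() against the page's own URL, just as the regex match does above.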