CRUX

Welcome to CRUX bug tracking.
Tasklist

FS#923 - pkgmk able to change the name of downloaded tarballs

Attached to Project: CRUX
Opened by Lukc (Lukc) - Wednesday, 15 May 2013, 17:15 GMT
Last edited by Tim Biermann (tb) - Saturday, 12 June 2021, 12:17 GMT
Task Type Improvement
Category tools → pkgutils
Status New
Assigned To CRUX Developers (crux)
Operating System CRUX
Severity Low
Priority Normal
Reported Version 3.0
Due in Version Undecided
Due Date Undecided
Percent Complete 0%
Votes 4
Private No

Details

Some tarballs are available only under very generic names, like “0.1.tar.gz”. This can be a problem if two or more recipes use tarballs with the same name.

A solution to that would be to set the local name of the tarballs. Here’s an example:

source=(http://foo/bar/$version.tar.gz as $name-$version.tar.gz)

In this example, the $version.tar.gz tarball is downloaded as usual, but locally stored as $name-$version.tar.gz. The content of the tarball is left unmodified, but collision with other packages is hopefully avoided.

A proposed patch is attached. It allows the “as” keyword to be used in the source array. Downloads, extractions and md5sums have been tested with it on renamed and un-renamed sources.

(that being said, if someone finds a simpler way to add support of that keyword to pkgmk, it’d be great \o/)
   as.diff (4.9 KiB)
This task depends upon

Comment by Jose V Beneyto (sepen) - Thursday, 16 May 2013, 18:48 GMT
Why not use a coma separated way?
I wrote a little patch that worked with sources like:

source=(http://foo/bar/$version.tar.gz,$name-$version.tar.gz)

or

source=(http://foo/bar/$version.tar.gz,$name-$version.tar.gz http://foo/bar-zzz/$version.tar.gz,$name-zzz-$version.tar.gz)



Comment by Lukc (Lukc) - Thursday, 16 May 2013, 21:06 GMT
Someone also suggested using "http://foo/bar/$version.tar.gz as $name-$version.tar.gz" as a single entry in the source array.

I think the coma separated way could break URLs, though I’m not sure, while spaces are forbidden from URLs. Still, you’re probably right that having each source as a single element is best. :-/
Comment by Danny Rawlins (Romster) - Wednesday, 05 February 2014, 01:43 GMT
I would prefer doing it at the start of the url like so so.

source=($name-$version.tar.gz|http://foo/bar/$version.tar.gz
bar-1.2.3.tar.gz|http://foo/bar/1.2.3.tar.gz
foo.patch)

But that breaks existing scripts that depend on source. We could do a new array like:

names=($name-$version.tar.gz bar-1.2.3.tar.gz SKIP)

Which should have minimal impact to existing scripts. SKIP is there to notify pkgmk that foo.patch does not need a rename. Alternatively we could just do:

names=($name-$version.tar.gz bar-1.2.3.tar.gz foo.patch)

I had thought about using the sever http header for returning the correct file name.

curl -sI https://github.com/williamh/dotconf/archive/v1.3.zip

But there is not always a file name header.

Another way would be to allow a directory in

PKGMK_SOURCE_DIR="/var/ports/distfiles/$name/"

Or even:

PKGMK_SOURCE_DIR="/var/ports/distfiles/$md5sum/" (requires patching pkgmk to read the sum out of .md5sum to the variable $md5sum)

I started doing a hashed file directory structure because of changed files of the same name.

http://romster.dyndns.org/distfiles/hash/
Comment by Jose V Beneyto (sepen) - Friday, 07 February 2014, 11:25 GMT
I think we are moving away from the real problem. For now there are only a few cases (most github) that could be fixed (better or worst) by using wget manually from the build function.

The fact is that the problem ATM is github release model, and IMHO we can't depend on github because tomorrow another site will appear with the same problem, etc. and pkgmk need to have a solid code.

Anyway if we should find a solution then I'm declined by the least impact on pkgmk cause. I don't know which scripts are that depends on pkgmk and can have problems (as Danny mentioned), but in any case should be adjusted following changes in pkgmk, and not vice versa.

Here is a good discussion we had about URI separator: http://irclogs.shortcircuit.net.au/%23crux-devel/2013-12-27.log.html
As commented, if we change ',' for '\|' as a separator then my patch will work fine, and that change to pkgmk is a really small and trivial compared to other solutions proposed here.
Comment by Lukc (Lukc) - Monday, 24 February 2014, 11:58 GMT
For the record, Arch is using the filename::url syntax, and gentoo the "url -> filename" syntax. Would it not be better to use the same syntax as someone else?

Also, the use of the pipe seems wrong to me (it’s a shell syntax element, after all…), but any character would do if the filename is to be specified before the URL.
Comment by Alan Mizrahi (alancio) - Friday, 05 December 2014, 02:53 GMT
How about downloading to separate directories for each port, for example: /usr/ports/dist/$name/file.tar.gz
We can implement this without the complexity of the other proposed solutions.

I have a patch[1] for pkgutils that allows this by setting this in pkgmk.conf:
PKGMK_CREATE_DIRS="yes"
PKGMK_SOURCE_DIR="/usr/ports/dist/${PWD##*/}"

PKGMK_CREATE_DIRS will create missing dist, pkg and work dirs. This has an additional benefit when using /dev/shm, etc: you don't have to create the directories every time after booting.

[root@koji ~]# cd /usr/ports/opt/dotconf
[root@koji dotconf]# sudo -H -u pkgmk fakeroot pkgmk -d
=======> WARNING: Creating directory: /usr/ports/dist/dotconf
=======> Downloading 'https://github.com/williamh/dotconf/archive/v1.3.zip'.
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 119 0 119 0 0 148 0 --:--:-- --:--:-- --:--:-- 148
100 67763 100 67763 0 0 29680 0 0:00:02 0:00:02 --:--:-- 85884
=======> Building '/usr/ports/pkgs/dotconf#1.3-1.pkg.tar.gz'.
bsdtar -p -o -C /dev/shm/work/dotconf/src -xf /usr/ports/dist/dotconf/v1.3.zip
+ build
+ cd dotconf-1.3
+ autoreconf -i
...

[1] Patch for pkgutils: httpup sync http://www.mizrahi.com.ve/crux/pkgs/#pkgutils pkgutils
Comment by li (phi) - Tuesday, 20 September 2016, 12:53 GMT
I think the way Alan proposed is good (has least impact on Pkgfiles). It's like
creating an unique namespace for each port. But it has one drawback, that ports
can't share distfiles. For example, port A and B both require source X,
If in /etc/pkgmk.conf, we have:
PKGMK_SOURCE_DIR="/usr/ports/distfiles/$name"
We would end up with two copies of source X (/usr/ports/distfiles/A/X and
/usr/ports/distfiles/B/X)
A simple way to avoid this issue is to set:
PKGMK_SOURCE_DIR="/usr/ports/distfiles/$srcdir"
Here srcdir is an unused variable, and we need to set it explicitly in Pkgfiles
for ports with odd source names. For example,
in A/Pkgfile:
source=(v1.0 ...)
srcdir="uniqname"
in B/Pkgfile:
source=(v1.0 ...)
srcdir="uniqname"
Thus the source v1.0 is shared between port A and port B.
It has one more benifit, that for normal ports, the location of source is
unchanged (/usr/ports/distfiles/).
Comment by Fredrik Rinnestam (frinnst) - Tuesday, 20 September 2016, 19:23 GMT
Can we get some clean patches maybe? We need to make sure it works with both wget and curl obviously.
Comment by li (phi) - Wednesday, 21 September 2016, 04:21 GMT
--- /usr/bin/pkgmk 2016-09-21 12:15:03.000000000 +0800
+++ pkgmk 2016-09-21 12:14:50.226072703 +0800
@@ -650,6 +650,15 @@ main() {
check_directory "$PKGMK_PACKAGE_DIR"
check_directory "`dirname $PKGMK_WORK_DIR`"

+ if [ -n "$srcdir" -a ! -d "$PKGMK_SOURCE_DIR/$srcdir" ]; then
+ mkdir "$PKGMK_SOURCE_DIR/$srcdir"
+ if [ $? -ne 0 ]; then
+ error "Can't create intermediate source direcotry."
+ exit 1
+ fi
+ fi
+ PKGMK_SOURCE_DIR="$PKGMK_SOURCE_DIR/$srcdir"
+
check_pkgfile

case $PKGMK_COMPRESSION_MODE in

====
(No need to modify pkgmk.conf)
Take dotconf as an example, just add 'srcdir=dotconf' in dotconf/Pkgfile.

Comment by li (phi) - Wednesday, 21 September 2016, 04:23 GMT
the patch.
Comment by li (phi) - Wednesday, 21 September 2016, 15:18 GMT
Besides source name collisions, another issue we might need to consider is that
for source names that have no version numbers, we need to remove the old
source files before updating them. To address this issue, I think source
renaming is still the way to go.
Comment by Fun (fun) - Tuesday, 28 February 2017, 15:59 GMT
With all these patches, the ideal solution appears to be:
a) simple - (alancio)
b) without touching ports - (alancio)
c) ports should share sources (e.g. qemu, qemu-all) - (phi)

The phi's solution is simple, but is defensive and touches port files (cover: a,c).
Saving distfiles based on MD5 (or SHA1 :D) will make the patch bigger (cover: b,c) or not (cover: a,b,c).

I say devs should choose Alan's solution and make it default. If the solution will not be set as default, I don't think it matters which one is chosen.

PS. It will still need patches to share the distfiles through a web server.
Comment by John Vogel (j_v) - Tuesday, 14 March 2017, 12:42 GMT
I think that source renaming should be enabled. Saving to separate directory sounds like it would work, but I think that change could be more invasive than it might sound. Source renaming can be done in such a way that ports that don't need it won't be affected. From what I've read here, I don't think everyone will agree on any one solution. But like it or not, many upstreams are moving to github and source tarball names are getting weird; I have come across at least a few upstream bug reports that complain about the release tarball naming conventions and the attitude of upstream is trending toward that packaging utilities that can't handle renaming are broken; my initial reaction to that was ire, but in reality, source renaming is not that difficult nor invasive. I have attached the patch that I am currently testing for source renaming.
   pkgmk-source-rename.patch (1.1 KiB)
Comment by Fun (fun) - Tuesday, 14 March 2017, 18:01 GMT
John, you are right. A separate directory is more invasive than I was thinking. I use PKGMK_SOURCE_DIR=.../$name for two weeks (see https://crux.nu/bugs/index.php?do=details&task_id=1384) and I've seen some tools, including one of mine, lol, that have problems with this change.

You are right that changing only the problematic ports will avoid some problems, but changing source=() will also break the tools that use 'source'.

Without breaking any tool (that uses PKGMK_SOURCE_DIR or 'source') I think we have another method:

source=(http://.../1.tar.gz)
local_source_prefix=xxx-

The patch for pkgmk might be bigger, but this change won't break any tool.
Comment by Fun (fun) - Tuesday, 14 March 2017, 18:26 GMT
If we would like different prefix for different remote sources (in the same port), we could use associative arrays:

remap=( ["http://.../1.tar.gz"] = "xxx-1.tar.gz" )

Comment by John Vogel (j_v) - Wednesday, 15 March 2017, 01:01 GMT
Just Fun, I see your point and think your ideas are interesting and valid. I think that changing other tools to work with the new source syntax would be more direct, simpler, and less likely to break in the future. Existing tools would still have to be trained to deal with the local_source_prefix variable or the remap associative array in order to find the saved sources and in order to download them correctly when missing.

The syntax in the patch is easy to parse, so it should be straight forward to update existing utilities to use it. I will start tracking down utilities (starting with prt-utils, I would think) and patching them to work with the source syntax in my patch. Some utilities are in need of attention anyways, I think; last time I tried prtwash or prtsweep (I can not remember which one), it raged through my ports tree removing all manner of files that should not have been removed; easily fixed and easily avoided, but fairly disturbing all the same.
Comment by Fun (fun) - Friday, 14 April 2017, 14:15 GMT
This is the patch for local_source_prefix:
   0001-pkgmk-use-local_source_p... (0.7 KiB)
Comment by Fun (fun) - Thursday, 27 April 2017, 08:44 GMT
This is a patch that helps ports to pick whatever method they like, by overriding the function:

map_remote_source() { echo "$PKGMK_SOURCE_DIR/$1"; }

This patch is the best IF devs are undecided and want every port to handle the problem (but it will pollute Pkgfile)

The alan+phi patch is the best IF devs don't want to touch the ports (but it won't work with some ports, as j_v pointed out on irc):

PKGMK_SOURCE_DIR=/var/src/${srcdir:-$name}
PKGMK_CREATE_DIRS=yes

Changing the source syntax is the best IF devs want to make things clear (but nobody wants to change, unless believing "this is the right way" :))

It might look that I'm obsessed with this bug, but I keep stumbling on it :).
   0001-pkgmk-small-refactoring-... (1 KiB)
Comment by Fun (fun) - Thursday, 18 May 2017, 11:01 GMT Comment by Alan Mizrahi (alancio) - Tuesday, 19 March 2019, 13:02 GMT
This patch allows using an associative array for the source variable in this way:

name=dotconf
version=1.3
release=1
declare -A \
source=([$name-$version.zip]=https://github.com/williamh/$name/archive/v$version.zip)

and will still work with regular arrays (eg: backward compatible with current Pkgfile's)
Comment by John McQuah (farkuhar) - Friday, 11 June 2021, 21:25 GMT
This patch is an implementation of Romster's idea to declare a second array containing either 'SKIP' or the desired new filename, for each element of the sources array. A working minimal example of a Pkgfile is quoted below (Note: savecrop.sh is a plugin that could be bundled with the port).

# Description: fast, lightweight image viewer for wayland
# URL: https://github.com/eXeC64/imv
# Maintainer:
# Depends on: meson ninja libpthread-stubs xkbcommon cairo pango icu wayland-egl
# Optional: libjpeg-turbo libpng libtiff librsvg giflib xorg-libxcb

name=imv
version=4.2.0
release=1
source=(https://github.com/eXeC64/imv/archive/refs/tags/v${version}.tar.gz savecrop.sh)
renames=($name-$version.tar.gz SKIP)

build() {
cd $name-$version
meson --buildtype=release --prefix=/usr build
ninja -C build
DESTDIR=$PKG ninja -C build install
prt-get isinst slurp && install -m 0755 $SRC/savecrop.sh $PKG/usr/bin
}
Comment by Tim Biermann (tb) - Saturday, 12 June 2021, 15:50 GMT
seems to work perfectly fine for me, i like it
Comment by John McQuah (farkuhar) - Monday, 14 June 2021, 22:39 GMT
Of the many scripts in prt-utils, I think only prtwash and prtsweep would be affected by the new renaming capability. I've attached new versions of these scripts for review. The usual precautions apply: use at your own risk, inspect the code before running it, and make backups before testing. I've tested the scripts as an unprivileged user in dry-run mode, and they seem capable of handling most of the renaming situations we might want, including the trick of circumventing automatic tarball decompression: https://dpaste.com/5QAJ6EFFC

Also new in these versions:
- simpleminded attempt at process isolation. Pkgfiles are sourced inside a nested subshell, and the needed parameters are extracted with command substitution.
- a test for a common download directory (shared among all ports) has been inserted early in the decision tree. This allows for a quick exit in the event that the system is configured to leave the ports directory relatively uncluttered.
- some trivial refactoring of the nested loops.

Ideas for future versions:
- signature verification before sourcing any Pkgfile.
- actually do something if there's a common download directory, rather than advise the admin to write a custom script.
Comment by John McQuah (farkuhar) - Wednesday, 16 June 2021, 20:59 GMT
In revising prtwash and prtsweep to handle this additional feature, I tried to stick as closely as possible to the behavior documented in the original manpages. Nevertheless, the changes are substantial enough to warrant a revised manpage for each tool as well. Attached are drafts of the two manpages, as well as the prtwash and prtsweep scripts themselves (after more extensive testing to address the bugs in my previous attachments).
Comment by Juergen Daubert (jue) - Tuesday, 21 September 2021, 09:53 GMT
Another important tool from prt-utils that needs to be adjusted is oldfiles, it's like prtwash but for centralized source/package locations.
Volunteers to do the job?
Comment by Tim Biermann (tb) - Wednesday, 22 September 2021, 10:54 GMT
@Juergen:
I'll quote Johns findings he reported on IRC yesterday:

farkuhar: "it looks like oldfiles is comparing the contents of $PKGMK_SOURCE_DIR and $PKGMK_PACKAGE_DIR with the files listed in .signature for each installed package. Any port that renames its sources will have the new name in its .signature file already, so /usr/bin/oldfiles should work without any modifications."
Comment by John McQuah (farkuhar) - Wednesday, 22 September 2021, 11:30 GMT
@Tim and Juergen:

The attachments to my last post in this Flyspray thread continued to receive incremental improvements over the subsequent weeks. I didn't follow up here once my repo was added to the list at crux.nu/portdb. If the core team approves my patch and the needed rewrites of prtsweep, prtwash (latest versions available at http://sdf.org/~jquah/cruxports/{prt-utils,pkgutils}), I'd be happy to drop those directories from my repo (so that maintenance of these core tools remains centralized).
Comment by Tim Biermann (tb) - Saturday, 02 October 2021, 08:49 GMT
@Juergen: can we merge or is anything missing?
Comment by Juergen Daubert (jue) - Saturday, 02 October 2021, 14:06 GMT
Looks nice and complete so far for me, will try to test it a bit more the next days.

Loading...