PAUSE finds package statements in data

For each uploaded distribution, PAUSE tries to identify the namespaces that its modules declare so it can add them to its index files. Since anyone can upload anything, it does this without running any code, and sometimes that fails in interesting ways.

This week, Karen Etheridge (ether) uploaded Package-Variant-1.001004 and was surprised to get mail from PAUSE saying that it could not index the package string. She hadn’t created any such package. What was going on?

  Distribution file: Package-Variant-1.001004.tar.gz
  Number of files: 12
  *.pm files: 1
  README: Package-Variant-1.001004/README
  META-File: Package-Variant-1.001004/META.json
  META-Parser: Parse::CPAN::Meta 1.4404
  META-driven index: no
  Timestamp of file: Sat May  4 16:43:34 2013 UTC
  Time of this run: Sun May  5 05:06:11 2013 UTC

Status of this distro: Permission missing
=========================================

The following packages (grouped by status) have been found in the distro:

Status: Permission missing
==========================

     module: string
          version: 1.001004
          in file: Package-Variant-1.001004/lib/Package/Variant.pm
          status: Not indexed because permission missing. Current registered
             primary maintainer is String. Hint: you can always find the
             legitimate maintainer(s) on PAUSE under "View Permissions".

Status: Successfully indexed
============================

     module: Package::Variant
          version: 1.001004
          in file: Package-Variant-1.001004/lib/Package/Variant.pm
          status: indexed

__END__

I know that PAUSE, through PAUSE::pmfile, uses a regular expression in packages_per_pmfile to find what it thinks is a package statement on a single line:

        if (
            $pline =~ m{
                      (.*)
                      \bpackage\s+
                      ([\w\:\']+)
                      \s*
                      (?: $ | [\}\;] | ($version::STRICT) )
                    }x) {

In English, that looks within a single line for package, followed by whitespace (not newlines, since we only have one line), followed by something that looks like a legal package name, followed by optional whitespace, followed by the end of the line, one of } or ;, or a version number.

Karen’s problem is that PAUSE can’t tell the difference between Perl code and literal strings. She has this error message in lib/Package/Variant.pm:

croak qq{Value $arg_count in 'importing' is not a package string},

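The text package string} in that message satisfies the regular expression: the word package, a space, something that looks like a package name, and a closing brace. Why would the regex allow a brace there? Because it’s syntactically valid (although not particularly useful) to put a package statement at the end of a block: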

BLOCK: {
   ...
   package Foo
   }
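To see the false match for yourself, here is a minimal sketch that copies the pattern quoted above from PAUSE::pmfile into a small sub and runs it over a few sample lines, including Karen’s croak line and the block-ending form. The sub name package_on_line and the sample package names are hypothetical, chosen for illustration; this is not PAUSE’s own code, just its regex in isolation.

```perl
use strict;
use warnings;
use version;   # supplies the $version::STRICT regex used in the pattern

# Hypothetical helper: apply the single-line pattern quoted above and
# return the package name it captures, or nothing if it doesn't match.
sub package_on_line {
    my ($pline) = @_;
    if (
        $pline =~ m{
                  (.*)
                  \bpackage\s+
                  ([\w\:\']+)
                  \s*
                  (?: $ | [\}\;] | ($version::STRICT) )
                }x) {
        return $2;    # the name PAUSE would try to index
    }
    return;           # no package statement found on this line
}

my @lines = (
    q{package Local::Foo;},    # an ordinary statement
    q{package Foo 1.23;},      # a statement with a version
    q{   package Foo},         # last statement in a block, no semicolon
    q{croak qq{Value $arg_count in 'importing' is not a package string},},
);

for my $line (@lines) {
    my $name = package_on_line($line);
    print "$line\n    => ", ( defined $name ? $name : '(no package)' ), "\n";
}
```

The last line reports string as a package name, for the same reason the block-ending form reports Foo: the pattern accepts a closing brace right after the name.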

There’s no good way to solve this problem as long as PAUSE does not parse code. For instance, I’ve written various code generators that keep Perl code in strings and later write those strings to files:

my $module_string = <<"HERE";
package Local::Foo;

... interpolate stuff ...
HERE

open my $fh, '>', $module_file or die ...;
print $fh $module_string;

PAUSE will still find that package statement, even though it’s only data. If it were code instead of data, we’d expect to be able to hide it from PAUSE by spreading the package statement over two lines:

package 
    Local::Foo;

I could do that in the data string too, but people aren’t going to be thinking about hiding data from PAUSE.
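Because the pattern only ever sees one line, splitting the statement is enough to slip past it. Here is a rough sketch of a line-at-a-time scan, assuming nothing more than the regex quoted earlier; package_on_line and scan_lines are hypothetical helpers, and the scan ignores everything else packages_per_pmfile does. It compares the generator string from above with the same string after the package statement is split over two lines.

```perl
use strict;
use warnings;
use version;   # supplies the $version::STRICT regex used in the pattern

# Hypothetical helper: the single-line pattern quoted from PAUSE::pmfile.
sub package_on_line {
    my ($pline) = @_;
    if (
        $pline =~ m{
                  (.*)
                  \bpackage\s+
                  ([\w\:\']+)
                  \s*
                  (?: $ | [\}\;] | ($version::STRICT) )
                }x) {
        return $2;
    }
    return;
}

# Hypothetical helper: check each line on its own. A bare return gives
# an empty list in list context, so non-matching lines add nothing.
sub scan_lines {
    my ($text) = @_;
    return map { package_on_line($_) } split /\n/, $text;
}

# A code generator that keeps the package statement on one line...
my $generator = <<'CODE';
my $module_string = <<"HERE";
package Local::Foo;
HERE
CODE

# ...and the same generator with the statement split over two lines.
my $hidden = <<'CODE';
my $module_string = <<"HERE";
package
    Local::Foo;
HERE
CODE

print "one line:  ", join( ', ', scan_lines($generator) ), "\n";
print "two lines: ", join( ', ', scan_lines($hidden) ),    "\n";
```

The one-line version reports Local::Foo even though it only lives in a string; the split version reports nothing, because no single line contains both package and a name.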

What’s the solution? Karen can just ignore it. Nothing was indexed that shouldn’t have been, and nothing that should have been indexed was ignored. It’s annoying to get the mail, but it’s a rare edge case: this is the first time I’ve seen this error in all my years of PAUSE support. Sure, the regex is broken, but if we get a false positive once every twelve years and nothing wrong happens, so what?

PAUSE could also try using PPI to parse the Perl to the extent that it can, which I think is fine for this case. I already do that for my BackPAN archeology work with Module::Extract::Namespaces.
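As a rough idea of what the PPI approach looks like, here is a minimal sketch, assuming PPI is installed from CPAN. It is not how PAUSE or Module::Extract::Namespaces actually does it, and packages_in_source is a hypothetical helper. It asks PPI for package statements; since PPI tokenizes quoted strings and heredocs as quote tokens, neither Karen’s message nor the generator’s heredoc should look like code, while the two-line package statement should. The names Local::Foo and Local::Bar are made up.

```perl
use strict;
use warnings;
use PPI;   # from CPAN, not a core module

# Hypothetical helper: return the namespaces declared by real package
# statements in $source, according to PPI's parse of the document.
sub packages_in_source {
    my ($source) = @_;
    my $doc = PPI::Document->new( \$source )
        or die "PPI could not parse the source\n";
    # find() returns an array ref, false if nothing matched, or undef on error
    my $statements = $doc->find('PPI::Statement::Package') || [];
    return map { $_->namespace } @$statements;
}

my $source = <<'PERL';
package Package::Variant;

sub import {
    croak qq{Value $arg_count in 'importing' is not a package string},
        $arg;
}

my $module_string = <<"HERE";
package Local::Foo;
HERE

package
    Local::Bar;
PERL

print "$_\n" for packages_in_source($source);
```

With PPI installed, this should report only Package::Variant and Local::Bar. PPI is still only an approximation of Perl’s grammar, but it knows the difference between a string and a statement, which is exactly what the single-line regex lacks.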
