Author Archives: brian d foy

About brian d foy

brian is a Perl guru

What you can do on CPAN Day

CPAN Day, the anniversary of the very first upload to the archive, is a good day to spend improving CPAN a little. You don’t have to be a module author to participate.

CPAN Authors

CPAN Users

“Cuckoo” CPAN packages

“Cuckoo” packages are those that exist in a file that isn’t named after that package. Neil Bowers coined the term after the cuckoo birds, which lay their eggs in the nests of other birds (see, for instance, “Nest stealing cuckoo birds are locked in evolutionary war with their would-be victims”). He noticed that these “cuckoo” packages aren’t in Module::CoreList, the module which tracks which versions of modules came with which perls.

The Cowbird's Nest

For example, in my App::Cpan module, I include Local::Null::Logger, a fallback class that fakes the interface of Log::Log4perl. App::Cpan is in core, but I don’t intend Local::Null::Logger to ever be user visible.

This relates to my one big hate of Perl: namespaces and filenames are linked because of the way use works. I made that the subject of my 2011 Frozen Perl keynote address.

I was curious how many cuckoos I would find, so I wrote a little program to extract the package names and compare them to the file name:

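#!perl
use v5.14;    # v5.14 for the /r substitution flag
use strict;
use warnings;

use Module::Extract::Namespaces;

# Read .pm file paths, one per line, and report the packages in each
# file that aren't named after that file
LINE: while( <> ) {
	chomp;
	my @namespaces = Module::Extract::Namespaces->from_file($_);

	# The common case: a single package that matches its file name.
	# The /r flag returns the modified copy instead of the match count.
	if( @namespaces == 1 ) {
		my $file = $namespaces[0] =~ s/::/\//rg;
		$file .= '.pm';
		next LINE if /\Q$file\E\Z/;
		}

	my $said = 0;
	my %Seen;
	foreach my $n ( @namespaces ) {
		my $file = $n =~ s/::/\//rg;
		$file .= '.pm';
		next if /\Q$file\E\Z/;  # this package is in its own file
		next if $Seen{$n}++;    # report each cuckoo only once
		say unless $said++;     # print the file name before its first cuckoo
		print "\t$n\n";
		}
	print "\n" if $said;
	}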

I got a much longer list than I expected when I ran this against my v5.18.1 installation:

% find /usr/local/perls/perl-5.18.1/. -name '*.pm' | perl5.18.1 cuckoo.pl
/usr/local/perls/perl-5.18.1/./lib/5.18.1/App/Cpan.pm
	Local::Null::Logger

/usr/local/perls/perl-5.18.1/./lib/5.18.1/B/Lint/Debug.pm
	B::SPECIAL
	B::OP
	B::SVOP
	DB

/usr/local/perls/perl-5.18.1/./lib/5.18.1/Carp.pm
	DB

/usr/local/perls/perl-5.18.1/./lib/5.18.1/CGI.pm
	Fh
	MultipartBuffer
	CGITempFile

/usr/local/perls/perl-5.18.1/./lib/5.18.1/Class/Struct.pm
	Class::Struct::Tie_ISA

/usr/local/perls/perl-5.18.1/./lib/5.18.1/Compress/Zlib.pm
	Zlib::OldDeflate
	Zlib::OldInflate

/usr/local/perls/perl-5.18.1/./lib/5.18.1/CPAN/Distroprefs.pm
	CPAN::Distroprefs::Result
	CPAN::Distroprefs::Result::Error
	CPAN::Distroprefs::Result::Warning
	CPAN::Distroprefs::Result::Fatal
	CPAN::Distroprefs::Result::Success
	CPAN::Distroprefs::Iterator
	CPAN::Eval
	CPAN::Distroprefs::Pref

/usr/local/perls/perl-5.18.1/./lib/5.18.1/CPAN/HandleConfig.pm
	CPAN::Config

/usr/local/perls/perl-5.18.1/./lib/5.18.1/CPAN/Meta/Requirements.pm
	CPAN::Meta::Requirements::_Range::Exact
	CPAN::Meta::Requirements::_Range::Range

/usr/local/perls/perl-5.18.1/./lib/5.18.1/CPAN/Mirrors.pm
	CPAN::Mirrored::By

/usr/local/perls/perl-5.18.1/./lib/5.18.1/CPAN/Queue.pm
	CPAN::Queue::Item

/usr/local/perls/perl-5.18.1/./lib/5.18.1/CPAN.pm
	CPAN::Eval

/usr/local/perls/perl-5.18.1/./lib/5.18.1/CPANPLUS/Error.pm
	Log::Message::Handlers

/usr/local/perls/perl-5.18.1/./lib/5.18.1/CPANPLUS/Selfupdate.pm
	CPANPLUS::Selfupdate::Module

/usr/local/perls/perl-5.18.1/./lib/5.18.1/CPANPLUS/Shell.pm
	CPANPLUS::Shell::_Base::ReadLine

/usr/local/perls/perl-5.18.1/./lib/5.18.1/darwin-2level/B.pm
	B::OBJECT
	B::Section

/usr/local/perls/perl-5.18.1/./lib/5.18.1/darwin-2level/DB_File.pm
	DB_File::HASHINFO
	DB_File::RECNOINFO
	DB_File::BTREEINFO

/usr/local/perls/perl-5.18.1/./lib/5.18.1/darwin-2level/Encode.pm
	Encode::UTF_EBCDIC
	Encode::Internal
	Encode::utf8

/usr/local/perls/perl-5.18.1/./lib/5.18.1/darwin-2level/IO/Pipe.pm
	IO::Pipe::End

/usr/local/perls/perl-5.18.1/./lib/5.18.1/darwin-2level/IPC/Msg.pm
	IPC::Msg::stat

/usr/local/perls/perl-5.18.1/./lib/5.18.1/darwin-2level/IPC/Semaphore.pm
	IPC::Semaphore::stat

/usr/local/perls/perl-5.18.1/./lib/5.18.1/darwin-2level/IPC/SharedMem.pm
	IPC::SharedMem::stat

/usr/local/perls/perl-5.18.1/./lib/5.18.1/darwin-2level/mro.pm
	next
	maybe::next

/usr/local/perls/perl-5.18.1/./lib/5.18.1/darwin-2level/POSIX.pm
	POSIX::SigAction
	POSIX::SigSet
	POSIX::SigRt

/usr/local/perls/perl-5.18.1/./lib/5.18.1/DBM_Filter.pm
	Tie::Hash

/usr/local/perls/perl-5.18.1/./lib/5.18.1/Env.pm
	Env::Array
	Env::Array::VMS

/usr/local/perls/perl-5.18.1/./lib/5.18.1/ExtUtils/Install.pm
	ExtUtils::Install::Warn

/usr/local/perls/perl-5.18.1/./lib/5.18.1/ExtUtils/MakeMaker.pm
	main
	MY

/usr/local/perls/perl-5.18.1/./lib/5.18.1/ExtUtils/Mkbootstrap.pm
	DynaLoader

/usr/local/perls/perl-5.18.1/./lib/5.18.1/Fatal.pm
	autodie::Scope::Guard

/usr/local/perls/perl-5.18.1/./lib/5.18.1/File/Temp.pm
	File::Temp::Dir

/usr/local/perls/perl-5.18.1/./lib/5.18.1/Getopt/Long.pm
	Getopt::Long::Parser
	Getopt::Long::CallBack

/usr/local/perls/perl-5.18.1/./lib/5.18.1/HTTP/Tiny.pm
	HTTP::Tiny::Handle

/usr/local/perls/perl-5.18.1/./lib/5.18.1/IO/Compress/Base/Common.pm
	U64

/usr/local/perls/perl-5.18.1/./lib/5.18.1/JSON/PP.pm
	JSON::PP::Boolean
	JSON::PP::IncrParser

/usr/local/perls/perl-5.18.1/./lib/5.18.1/Log/Message/Simple.pm
	Log::Message::Handlers

/usr/local/perls/perl-5.18.1/./lib/5.18.1/Math/BigInt/CalcEmu.pm
	Math::BigInt

/usr/local/perls/perl-5.18.1/./lib/5.18.1/Module/Build/Base.pm
	Module::Build::PodTester

/usr/local/perls/perl-5.18.1/./lib/5.18.1/NEXT.pm
	NEXT::UNSEEN
	NEXT::DISTINCT
	NEXT::ACTUAL
	NEXT::ACTUAL::UNSEEN
	NEXT::ACTUAL::DISTINCT
	NEXT::UNSEEN::ACTUAL
	NEXT::DISTINCT::ACTUAL
	EVERY
	EVERY::LAST

/usr/local/perls/perl-5.18.1/./lib/5.18.1/Object/Accessor.pm
	Object::Accessor::Lvalue
	Object::Accessor::TIE

/usr/local/perls/perl-5.18.1/./lib/5.18.1/Pod/Html.pm
	Pod::Simple::XHTML::LocalPodLinks

/usr/local/perls/perl-5.18.1/./lib/5.18.1/Pod/InputObjects.pm
	Pod::InputSource
	Pod::Paragraph
	Pod::InteriorSequence
	Pod::ParseTree

/usr/local/perls/perl-5.18.1/./lib/5.18.1/Pod/ParseUtils.pm
	Pod::List
	Pod::Hyperlink
	Pod::Cache
	Pod::Cache::Item

/usr/local/perls/perl-5.18.1/./lib/5.18.1/sigtrap.pm
	DB

/usr/local/perls/perl-5.18.1/./lib/5.18.1/Term/ReadLine.pm
	Term::ReadLine::Stub
	Term::ReadLine::TermCap
	Term::ReadLine::Tk

/usr/local/perls/perl-5.18.1/./lib/5.18.1/Term/UI/History.pm
	Log::Message::Handlers

/usr/local/perls/perl-5.18.1/./lib/5.18.1/Test/Builder/Tester.pm
	Test::Builder::Tester::Tie

/usr/local/perls/perl-5.18.1/./lib/5.18.1/Text/Balanced.pm
	Text::Balanced::Extractor
	Text::Balanced::ErrorMsg

/usr/local/perls/perl-5.18.1/./lib/5.18.1/Tie/Array.pm
	Tie::StdArray

/usr/local/perls/perl-5.18.1/./lib/5.18.1/Tie/File.pm
	Tie::File::Cache
	Tie::File::Heap

/usr/local/perls/perl-5.18.1/./lib/5.18.1/Tie/Hash.pm
	Tie::StdHash
	Tie::ExtraHash

/usr/local/perls/perl-5.18.1/./lib/5.18.1/Tie/RefHash.pm
	Tie::RefHash::Nestable

/usr/local/perls/perl-5.18.1/./lib/5.18.1/Tie/Scalar.pm
	Tie::StdScalar

/usr/local/perls/perl-5.18.1/./lib/5.18.1/warnings.pm
	DB

/usr/local/perls/perl-5.18.1/./lib/5.18.1/XSLoader.pm
	DynaLoader

/usr/local/perls/perl-5.18.1/./lib/site_perl/5.18.1/Hook/LexWrap.pm
	Hook::LexWrap::Cleanup

/usr/local/perls/perl-5.18.1/./lib/site_perl/5.18.1/Module/Extract/Namespaces.pm
	PPI::Lexer

/usr/local/perls/perl-5.18.1/./lib/site_perl/5.18.1/PPI/XSAccessor.pm
	PPI::Document
	PPI::Document::File
	PPI::Document::Fragment
	PPI::Document::Normalized
	PPI::Element
	PPI::Exception
	PPI::Node
	PPI::Normal
	PPI::Statement
	PPI::Statement::Compound
	PPI::Statement::Data
	PPI::Statement::End
	PPI::Statement::Given
	PPI::Token

PAUSE finds package statements in data

For each uploaded distribution, PAUSE tries to identify the namespaces that the modules use so it can add them to its index files. It does this without running any code, since anyone can upload anything, and sometimes that fails in interesting ways.

This week, Karen Etheridge (ether) uploaded Package-Variant-1.001004 and was surprised to get mail from PAUSE saying that it could not index a package named string. She hadn’t created any such package. What was going on?

  Distribution file: Package-Variant-1.001004.tar.gz
  Number of files: 12
  *.pm files: 1
  README: Package-Variant-1.001004/README
  META-File: Package-Variant-1.001004/META.json
  META-Parser: Parse::CPAN::Meta 1.4404
  META-driven index: no
  Timestamp of file: Sat May  4 16:43:34 2013 UTC
  Time of this run: Sun May  5 05:06:11 2013 UTC

Status of this distro: Permission missing
=========================================

The following packages (grouped by status) have been found in the distro:

Status: Permission missing
          ==========================

     module: string
          version: 1.001004
          in file: Package-Variant-1.001004/lib/Package/Variant.pm
          status: Not indexed because permission missing. Current registered
             primary maintainer is String. Hint: you can always find the
             legitimate maintainer(s) on PAUSE under "View Permissions".

Status: Successfully indexed
          ============================

     module: Package::Variant
          version: 1.001004
          in file: Package-Variant-1.001004/lib/Package/Variant.pm
          status: indexed

__END__

I know that PAUSE, through PAUSE::pmfile, uses a regular expression in packages_per_pmfile to find what it thinks is a package statement on a single line:

        if (
            $pline =~ m{
                      (.*)
                      \bpackage\s+
                      ([\w\:\']+)
                      \s*
                      (?: $ | [\}\;] | ($version::STRICT) )
                    }x) {

In English, that looks in the single line for package followed by whitespace (not newlines, since we have a single line), followed by something that looks like a legal package name, followed by optional whitespace, followed by the end of the line, one of } or ;, or a version number.

Karen’s problem is that PAUSE can’t tell the difference between Perl code and literal strings. She has an error message in lib/Package/Variant.pm:

croak qq{Value $arg_count in 'importing' is not a package string},

That package string} satisfies the regular expression. Why would the regex allow a } after the name at all? Because it’s syntactically valid (although not particularly useful) to put a package statement, without a semicolon, at the end of a block:

BLOCK: {
   ...
   package Foo
   }
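To see the false match for yourself, here’s a small sketch that wraps the PAUSE::pmfile pattern quoted above in a hypothetical helper, looks_like_package (not part of PAUSE), and tries it on a few lines. It assumes a perl whose version.pm provides $version::STRICT, which the pattern interpolates.

```perl
#!perl
use v5.14;
use warnings;
use version;   # supplies the $version::STRICT regex

# The single-line pattern from PAUSE::pmfile, wrapped in a
# hypothetical helper that returns the "package" name it thinks it saw
sub looks_like_package {
	my( $pline ) = @_;
	return unless $pline =~ m{
		(.*)
		\bpackage\s+
		([\w\:\']+)
		\s*
		(?: $ | [\}\;] | ($version::STRICT) )
		}x;
	return $2;
	}

my @lines = (
	q(package Package::Variant;),
	q(croak qq{Value $arg_count in 'importing' is not a package string},),
	q(my $note = 'no package statement here';),
	);

foreach my $line ( @lines ) {
	my $found = looks_like_package( $line );
	say defined $found ? "'$found' <- $line" : "(none) <- $line";
	}
```

The real package statement and the croak message both report a package (Package::Variant and string), while the last line, where package is followed by more words, comes back empty.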

There’s no good way to solve this problem as long as PAUSE does not parse code. I’ve created various code generators, for instance, that have Perl code in strings that will make it into files:

my $module_string =<<"HERE";
package Local::Foo;

... interpolate stuff ...
HERE

open my $fh, '>', $module_file or die ...;
print $fh $module_string;

PAUSE will still catch that. If that were code instead of data, we’d expect to be able to hide it from PAUSE by spreading the package statement over two lines:

package 
    Local::Foo;

I could do that in the data string too, but people aren’t going to be thinking about hiding data from PAUSE.

What’s the solution? Karen can just ignore it. Nothing that shouldn’t have been indexed was, and nothing that should have been indexed was ignored. It’s annoying to get the mail, but it’s a rare edge case: this is the first time I’ve seen this error in all my years of PAUSE support. Sure, the regex is broken, but if we get a false positive once every twelve years and nothing wrong happens, so what?

PAUSE could also try using PPI to parse the Perl to the extent that it can, which I think is fine for this case. I already do that for my BackPAN archeology work with Module::Extract::Namespaces.
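Here’s roughly what the PPI approach might look like. This is a sketch, not PAUSE code: it assumes PPI is installed, and ppi_packages is a hypothetical helper. Since PPI tokenizes the whole document, the qq{} string is a quote token rather than a package statement, so only the real package comes back.

```perl
#!perl
use v5.14;
use warnings;
use PPI;

# Return the namespaces of real package statements, as PPI parses them
sub ppi_packages {
	my( $source ) = @_;
	my $doc = PPI::Document->new( \$source )
		or die "PPI could not parse the source\n";
	# find returns an array ref, false if nothing matched, undef on error
	my $statements = $doc->find( 'PPI::Statement::Package' ) || [];
	return map { $_->namespace } @$statements;
	}

my $source = <<'CODE';
package Package::Variant;

sub check {
	croak qq{Value $arg_count in 'importing' is not a package string};
	}
CODE

say for ppi_packages( $source );
```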

Clean up your CPAN directory programmatically

There’s a semi-automated way to clean up your PAUSE author directory using the WWW::PAUSE::CleanUpHomeDir module by Zoffix Znet. Steve Haryanto modified the example slightly to take the PAUSE user name and password from a configuration file. His cleanup-pause-homedir is in his GitHub account.

All of your uploads to PAUSE stay in your author directory until you delete them, and all of these files show up on the CPAN mirrors. If we never cleaned out these directories, a mirror would need about 15 GB to store it all. CPAN started in the mid-1990s, so that’s a lot of history to collect.

PAUSE has an interface to delete old distributions that you don’t need anymore. You can get rid of experimental versions and older distributions. Some authors prefer to keep the previous two older distributions as well as the most current one. The PAUSE interface actually schedules the files for deletion and gives you about three days to change your mind.

You’re only deleting the files from PAUSE, though. They are still collected in BackPAN, a CPAN mirror that only adds files and almost never deletes them.

All about MANIFEST.SKIP

The MANIFEST.SKIP file lets you specify what shouldn’t be in a distribution instead of what should be (the MANIFEST). Inside MANIFEST.SKIP, you list one Perl pattern per line to specify what to exclude, and then use that file to generate MANIFEST.

For example, you can exclude backup files and the Git directory:

\.bak$
\.git\b

You update MANIFEST with the manifest target, which runs the mkmanifest subroutine from ExtUtils::Manifest:

% perl Makefile.PL

% make manifest
/usr/bin/perl "-MExtUtils::Manifest=mkmanifest" -e mkmanifest
Added to MANIFEST: Changes
Added to MANIFEST: examples/README
Added to MANIFEST: lib/Module.pm
Added to MANIFEST: LICENSE
Added to MANIFEST: Makefile.PL
Added to MANIFEST: MANIFEST
Added to MANIFEST: MANIFEST.SKIP
Added to MANIFEST: README
Added to MANIFEST: t/load.t
Added to MANIFEST: t/pod.t
Added to MANIFEST: t/pod_coverage.t

This goes through all the files in the distribution (the current directory and all subdirectories, recursively), filters out the ones matched by the patterns in MANIFEST.SKIP, and adds the rest to MANIFEST.
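The filtering step works roughly like this simplified sketch. It imitates the idea rather than reproducing ExtUtils::Manifest’s code: make_skipper is a hypothetical helper that treats every non-comment line as a regex and skips a file if any of them match. The real module does more, such as processing #!include directives.

```perl
#!perl
use v5.14;
use warnings;

# Build a closure that says whether a path matches any skip pattern
sub make_skipper {
	my( $skip_text ) = @_;
	my @patterns =
		grep { length($_) && $_ !~ /^#/ }   # drop blank lines and comments
		map  { s/^\s+|\s+$//gr }            # trim surrounding whitespace
		split /\n/, $skip_text;
	my $regex = join '|', map { "(?:$_)" } @patterns;
	return sub { @patterns && $_[0] =~ /$regex/ };
	}

my $skip = make_skipper( <<'SKIP' );
# backup files and the Git directory
\.bak$
\.git\b
SKIP

my @files = qw( lib/Module.pm lib/Module.pm.bak .git/HEAD t/load.t );
say for grep { ! $skip->( $_ ) } @files;   # what would go into MANIFEST
```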

If it finds lines in MANIFEST that match a pattern in MANIFEST.SKIP (perhaps because you edited the file by hand), the manifest target removes them:

% make manifest
/usr/bin/perl "-MExtUtils::Manifest=mkmanifest" -e mkmanifest
Removed from MANIFEST: .git

One slight problem: it doesn’t remove entries for files that have disappeared, such as renamed tests or data files. Those stay in MANIFEST until you delete them yourself.
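A small helper can point out those stale entries. This sketch assumes you run it from the distribution’s top directory; stale_entries is a hypothetical name, while maniread is ExtUtils::Manifest’s own function for reading MANIFEST into a hash reference. ExtUtils::Manifest’s manicheck reports missing files too, although it also prints them to STDERR.

```perl
#!perl
use v5.14;
use warnings;
use ExtUtils::Manifest qw(maniread);

# List MANIFEST entries whose files no longer exist on disk
sub stale_entries {
	my( $manifest_file ) = @_;
	my $entries = maniread( $manifest_file );   # hash ref: file => comment
	my @stale = grep { ! -e $_ } keys %$entries;
	return sort @stale;
	}

if( -e 'MANIFEST' ) {
	say "Stale: $_" for stale_entries( 'MANIFEST' );
	}
```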

The trick, however, is to get the right patterns in MANIFEST.SKIP. You could do that through trial and error, but ExtUtils::Manifest provides a way to include a default set with a special directive:

#!include_default

The default MANIFEST.SKIP sits right next to ExtUtils/Manifest.pm. Find where you have that module installed, then look in that directory:

% perldoc -l ExtUtils::Manifest
/System/Library/Perl/5.16/ExtUtils/Manifest.pm

% more /System/Library/Perl/5.16/ExtUtils/MANIFEST.SKIP
# Avoid version control files.
\bRCS\b
\bCVS\b
\bSCCS\b
,v$
\B\.svn\b
\B\.git\b
\B\.gitignore\b
\b_darcs\b
\B\.cvsignore$
...
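If you’d rather find that file from Perl than from the shell, %INC records where a loaded module came from. This sketch assumes, as in the listing above, that the default file sits beside Manifest.pm; a vendor-packaged perl might lay things out differently.

```perl
#!perl
use v5.14;
use warnings;
use File::Basename qw(dirname);
use File::Spec;
use ExtUtils::Manifest ();

# %INC maps 'ExtUtils/Manifest.pm' to the path perl loaded it from
my $default_skip = File::Spec->catfile(
	dirname( $INC{'ExtUtils/Manifest.pm'} ), 'MANIFEST.SKIP' );

say $default_skip;
say -e $default_skip ? 'found it' : 'not there in this installation';
```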

Those are just the defaults, which don’t know anything about your local setup or the special files you might want to exclude. You can also include another file, perhaps to exclude the same things across all of your projects:

#!include /path/to/some/file

When you run the manifest target again, ExtUtils::Manifest replaces these directives with the contents of their files. The start and stop of the imported patterns are marked:

#!start included /System/Library/Perl/5.16/ExtUtils/MANIFEST.SKIP
# Avoid version control files.
\bRCS\b
\bCVS\b
...
# Avoid MYMETA files
^MYMETA\.
#!end included /System/Library/Perl/5.16/ExtUtils/MANIFEST.SKIP

Once included, however, these sections aren’t updated if those files change.

When you run the dist target, all of the files listed in MANIFEST are archived in the distribution file. The output shows the Perl one-liner that does the work to copy those:

% make dist
...
perl "-MExtUtils::Manifest=manicopy,maniread" \
    -e "manicopy(maniread(),'Some-Module-1.23', 'best');"
...

Before you release that file though, you should also run the disttest target. That creates a new directory with just the files from MANIFEST, changes to that directory, and runs the tests. This ferrets out missing files that you might not miss as you work in your repository.

Unauthorized releases

On CPAN Search or MetaCPAN, you might see a module marked as UNAUTHORIZED. These services show everything in an author’s CPAN directory, but they can also tell that the distribution is not indexed. PAUSE doesn’t do extra work to unauthorize a module. It just ignores it when it indexes.

These indexing failures can happen for a variety of reasons:

  • A co-maintainer uploaded a new release, but because of an oversight wasn’t granted permission on one of the modules. This often happens with distributions that have a different release manager each cycle.
  • Someone without co-maintainer permissions forked the distribution and uploaded it.
  • An author makes a new release with a new namespace without realizing that namespace is taken by another author.

To fix this, the primary maintainer can add the appropriate permissions to the author and reindex the distribution.

Try an experimental release

Every time you upload a distribution to PAUSE and it is mirrored to CPAN, the CPAN Testers download the distribution to test it. You can take advantage of that with an experimental version: PAUSE won’t index it, so CPAN clients won’t know it is there, but the testers still test it. You can try something, such as a patch for an operating system you don’t have, see if it works, and when you are satisfied, make it a real release.

To do this, simply put an underscore in your module version, which the usual build tools also carry into the distribution’s file name:

our $VERSION = '1.23_01';

The distribution won’t show up in the index files the clients see, but it will still be on CPAN.
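One wrinkle with that version string: '1.23_01' isn’t a valid number, so comparing it numerically gives 1.23 along with a warning. A long-standing (though now debated) idiom evals the string so Perl reads it as a numeric literal, where the underscore is just a digit separator. Some::Module is a hypothetical name for this sketch.

```perl
#!perl
use v5.14;
use warnings;

package Some::Module;   # hypothetical module name

# The string keeps the underscore, so the tools see a developer release
our $VERSION = '1.23_01';

# Eval the string as a Perl number literal: the underscore is dropped,
# so numeric comparisons see 1.2301 without "isn't numeric" warnings
$VERSION = eval $VERSION;

package main;

say "numeric version: $Some::Module::VERSION";
```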

Hide a package name from PAUSE

When you upload a distribution, PAUSE examines it so it knows what to put into the index files that the various CPAN clients use to translate a namespace into a distribution name. PAUSE doesn’t want to run any code since anyone can upload anything, so it does a very simple-minded analysis in PAUSE::pmfile. It goes through the file one line at a time. The most complicated thing it does is remember whether it is in POD. For non-POD lines, it checks this regex:

        if (
            $pline =~ m{
                      (.*)
                      \bpackage\s+
                      ([\w\:\']+)
                      \s*
                      (?: $ | [\}\;] | ($version::STRICT) )
                    }x) {

The value in $pline is a single line, so PAUSE will only find a package statement that is completely on the same line. If your package statement is not on one line, PAUSE won’t see it:

package # hide from PAUSE
   Some::Package;

There are a couple of reasons you might want to do this. For one, you might insert or replace code in a package that you don’t own, perhaps to fix it. I talk about this sort of fix in Mastering Perl, although I don’t show this PAUSE trick.
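Here’s a simplified, hypothetical imitation of that line-at-a-time scan (not PAUSE’s actual code): it walks the source one line at a time, skips POD, and applies the same single-line regex. It assumes version.pm provides $version::STRICT. The package split across two lines never shows up, and neither does the one inside the POD.

```perl
#!perl
use v5.14;
use warnings;
use version;   # supplies the $version::STRICT regex

# Scan one line at a time, skipping POD, like the PAUSE description above
sub packages_seen_by_line_scan {
	my( $source ) = @_;
	my( @found, $in_pod );
	foreach my $pline ( split /\n/, $source ) {
		if( $pline =~ /^=(\w+)/ ) {      # a POD command line
			$in_pod = $1 ne 'cut';
			next;
			}
		next if $in_pod;
		push @found, $2 if $pline =~ m{
			(.*)
			\bpackage\s+
			([\w\:\']+)
			\s*
			(?: $ | [\}\;] | ($version::STRICT) )
			}x;
		}
	return @found;
	}

my $source = <<'CODE';
package Visible::Package;

=head1 SYNOPSIS

	package In::The::Pod;

=cut

package # hide from PAUSE
	Some::Package;

1;
CODE

say for packages_seen_by_line_scan( $source );
```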

Use git to easily make third party module patches

[This post originally showed up on use.Perl. Yanick subsequently released Git::CPAN::Patch]

In the olden days, to make a patch to a module, you had to have the original, untouched file and a copy that you modified. You’d then use diff to compare the two files.

At the Pittsburgh Perl Workshop, Ricardo was asking how to do some odd thing in git. Instead of anyone answering his question, everyone asked what he was doing. It turns out he was patching someone’s module and making it a git repo while he worked. The process is really handy:

  • Download the module distribution and unpack it
  • Make it a git repository with git init
  • Add the initial content to the index with git add .
  • Commit the initial content with git commit -m "* Version 1.23 from CPAN"
  • work, work, work
  • Generate your patch with git format-patch --stdout -1
  • And Bob’s your uncle

There are other ways that you can do this, and you can change around the process in git. I like that git is lightweight enough to make it actually useful for everyday work.