|
Date Published: 2001-02-16
At some point in their growth, most Perl hackers start managing
complex data structures such as hashes of hashes and lists of lists. They
usually get the hang of it with help from
perllol,
perldsc, some good
books, usenet, #perl and whatever other resources they can find. But one
subtle Perl feature seems to trip many of them up, and that is the
subject of this tutorial.
Let's say you create a data structure like this:
[Code]
$HoH = {
'foo' => {
'x' => 23,
},
'bar' => {
'y' => 18,
},
} ;
|
We can print this using Data::Dumper.
[Code]
use Data::Dumper ;
print Dumper $HoH ; |
and we see:
[Output]
$VAR1 = {
'foo' => {
'x' => 23
},
'bar' => {
'y' => 18
}
};
|
which is what we expect.
But now we try to see if there is an entry for $HoH->{'baz'}{'z'} which
we know doesn't exist. And we are smart enough to test it with exists:
[Code]
print "baz->z doesn't exist\n"
unless exists $HoH->{'baz'}{'z'} ;
print Dumper $HoH ;
|
But when we look at the data structure again we see:
[Output]
$VAR1 = {
'foo' => {
'x' => 23
},
'baz' => {},
'bar' => {
'y' => 18
}
};
|
Where did that 'baz' entry come from? We never created it. Or did we?
What happened is that Perl saw that $HoH->{'baz'} was being used as a
hash reference (referring to a hash with 'z' as the key) and that
$HoH->{'baz'} was not defined (actually it doesn't exist either), so Perl
created it for us. That is called autovivification, which means bringing
to life automagically!
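When the depth of the lookup is known in advance, one way to sidestep this is to test each level in turn and let && short-circuit before the deeper dereference is ever evaluated. The following is only a minimal sketch of that idea, reusing the $HoH structure from above; the $found variable is made up for illustration.

```perl
use strict ;
use warnings ;

my $HoH = {
	'foo' => { 'x' => 23 },
	'bar' => { 'y' => 18 },
} ;

# test the top level first; when it fails, && stops and
# $HoH->{'baz'} is never dereferenced, so nothing is autovivified
my $found = exists( $HoH->{'baz'} ) && exists( $HoH->{'baz'}{'z'} ) ;

print "baz->z doesn't exist\n" unless $found ;
print "top level keys: @{[ sort keys %{$HoH} ]}\n" ;
```

This gets unwieldy for deeper structures, since every extra level needs its own test.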
Here is the same concept but with anonymous arrays instead of hashes:
[Code]
$LoL = [
[ 2, 4, 6 ],
[ 3, 5, 7 ],
] ;
print Dumper $LoL ;
|
[Output]
$VAR1 = [
[
2,
4,
6
],
[
3,
5,
7
]
];
|
[Code]
print "[2][1] isn't defined\n"
unless defined $LoL->[2][1] ;
|
[Output]
[2][1] isn't defined
|
[Code]
print Dumper $LoL ;
|
[Output]
$VAR1 = [
[
2,
4,
6
],
[
3,
5,
7
],
[]
];
|
Notice the anonymous array created in $LoL->[2]! It just got
autovivified because the code assumed it had to exist and Perl created
it for you.
Here is another example which is a common idiom and confuses some
newbies:
[Code]
$list_ref = undef ;
push @{$list_ref}, 1 .. 4 ;
print Dumper $list_ref ;
|
[Output]
$VAR1 = [
1,
2,
3,
4
];
|
Note that undef is only assigned to $list_ref for this example. In
normal code it would probably be a my'ed variable and start out
undefined. Without autovivification you would have to assign an empty
anonymous array to $list_ref first.
[Code]
$list_ref = [] ;
push @{$list_ref}, 1 .. 4 ;
|
A variant on that would be:
[Code]
push @{$list_ref ||= []}, 1 .. 4 ;
|
That initializes $list_ref to []
if it is false (most likely it was undefined as in the above cases).
It is still cleaner and definitely faster to let Perl do the defined
test and initialization with [] for you.
Autovivification even works on references to scalars:
[Code]
my $scalar_ref = undef ;
${$scalar_ref} = 'i am referred to' ;
print "ref $scalar_ref value [${$scalar_ref}]\n" ;
|
Now is the time for some explanation of what is happening under the
hood. Autovivification of references only occurs when you dereference an
undefined value. If there is a defined value (and not a reference of the
proper type), it will be used as a symbolic reference and not be what
you want. Remember, symbolic references are black magic and should only
be used in very few cases and never by newbies. You should be using
strict which disables symbolic references and would thereby detect the
error of dereferencing a variable which has a value other than undef or
a proper reference.
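The difference between the two cases can be seen in a small sketch. The variable names and the string 'some_name' are invented for the example; the eval is only there so the script can report the error instead of dying.

```perl
use strict ;
use warnings ;

# an undefined scalar is autovivified into an array reference
my $undefined ;
push @{$undefined}, 1 ;
print "autovivified a ", ref( $undefined ), " reference\n" ;

# a defined value that is not a reference would be used as a
# symbolic reference, which strict refuses at run time
my $not_a_ref = 'some_name' ;
my $ok = eval { push @{$not_a_ref}, 1 ; 1 } ;
my $error = $@ ;
print "strict caught it: $error" unless $ok ;
```

Without strict, the second push would quietly fill a package array named @some_name, which is almost never what was intended.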
So Perl first evaluates a dereference expression and sees that the
current reference value is undefined. It notes the type of dereference
(scalar, array or hash) and allocates an anonymous reference of that
type. Perl then stores that new reference value where the undefined
value was stored. Then the dereference operation in progress is
continued. If you do a nested dereference expression, then each level
from top to bottom can cause its own autovivification. Look at this:
[Code]
$deep_ref = undef ;
$deep_ref->{'foo'}{'bar'}[1]{'baz'} = 1 ;
print Dumper $deep_ref ;
|
[Output]
$VAR1 = {
'foo' => {
'bar' => [
undef,
{
'baz' => 1
}
]
}
};
|
Four anonymous references were created there by autovivification working
from the top level with $deep_ref all the way down to the hash that has
'baz' for its only key.
This last example illustrates the power and primary use of
autovivification. If you wanted to assign to the lowest level hash before
the higher levels existed, without autovivification, you would have to
do the loop yourself, testing each level and creating it if needed as you
went down. The call would have to take a list of pairs, each giving a
reference type and an index or key. You could simplify it by restricting
it to one type:
[Code]
sub deep_hash_assign {
my( $ref_ref, $val, @keys ) = @_ ;
unless ( @keys ) {
warn "deep_hash_assign: no keys" ;
return ;
}
foreach my $key ( @keys ) {
my $ref = ${$ref_ref} ;
# this is the autoviv step: create the missing level by hand
unless ( defined( $ref ) ) {
$ref = {} ;
${$ref_ref} = $ref ;
}
# this checks we have a valid hash ref as a current value
# (a key that is missing from an existing hash is fine; taking
# a reference to its slot below creates it)
unless ( ref $ref eq 'HASH' ) {
warn "deep_hash_assign: not a hash ref at $key in @keys" ;
return ;
}
# this points to the next level down the hash tree
$ref_ref = \$ref->{ $key } ;
}
${$ref_ref} = $val ;
}
$deep_ref2 = undef ;
deep_hash_assign( \$deep_ref2, 17, qw( foo bar baz ) ) ;
print Dumper $deep_ref2 ;
$deep_ref2 = undef ;
deep_hash_assign( \$deep_ref2, 17 ) ;
|
As you can see, that sub is not very robust, clumsy to use and probably
a lot slower than having Perl do it for you. Also it can't handle a mix
of hashes and arrays. To do that you would have to also specify hash or
array along with each key or index.
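To give an idea of what that would look like, here is a rough sketch of such a sub. It is not from auto.pl; the name deep_assign and the [ TYPE => key ] calling convention are assumptions made up for this illustration.

```perl
use strict ;
use warnings ;

# hypothetical helper: each step is a pair [ 'HASH' or 'ARRAY', key or index ]
sub deep_assign {

	my( $ref_ref, $val, @steps ) = @_ ;

	return unless @steps ;

	foreach my $step ( @steps ) {

		my( $type, $key ) = @{$step} ;
		return unless $type eq 'HASH' or $type eq 'ARRAY' ;

		# create the missing level by hand, according to its type
		unless ( defined ${$ref_ref} ) {
			${$ref_ref} = $type eq 'HASH' ? {} : [] ;
		}

		# an existing level must match the type the caller asked for
		my $ref = ${$ref_ref} ;
		return unless ref( $ref ) eq $type ;

		# point at the slot for the next level down
		$ref_ref = $type eq 'HASH' ? \$ref->{$key} : \$ref->[$key] ;
	}

	${$ref_ref} = $val ;
	return 1 ;
}

my $tree ;
deep_assign( \$tree, 1,
	[ HASH => 'foo' ], [ HASH => 'bar' ], [ ARRAY => 1 ], [ HASH => 'baz' ] ) ;

print "assigned: $tree->{'foo'}{'bar'}[1]{'baz'}\n" ;
```

Compare that call with the single autovivifying assignment to $deep_ref->{'foo'}{'bar'}[1]{'baz'} shown earlier.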
So autovivification saves code and trouble when assigning deep into a
data structure, but why does it also happen when using exists and
defined? Many people think that exists and defined should fail at the
first level they can. Let's look at exists and
defined again with this code:
[Code]
%hash = (
'foo' => 3,
) ;
print Dumper \%hash ;
if ( exists( $hash{'bar'}{'baz'} ) ) {
print "{'bar'}{'baz'} exists\n" ;
}
print Dumper \%hash ;
|
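Where did the 'bar' => {} entry in %hash come from? A defined test on an
array element, such as defined( $hash{'array'}[0] ), would leave an
'array' => [] entry behind in the same way.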
Well, the way Perl works, exists and
defined do not provide any special
context to their expressions. So if the expression would autovivify,
that happens before the exists or defined test occurs. This issue has
been argued heavily in various fora including p5p, but it won't be
changed as too much code works with the current behavior. It is the way
Perl treats it and you can't directly get around it. The Perl 6
designers have been discussing this and may add a way to suppress it,
possibly controlled by a pragma. But there are still gray areas. If you
take a reference deep into a tree where autovivification would be
triggered, does passing that to an exists call stop it from happening?
Similarly, if you pass a potentially autovivifying expression to a sub
which may only call defined on it, should that work as it does now?
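The same rule applies to any expression that dereferences through a missing level, not just to exists and defined. In this small sketch (the hash %h and its keys are made up for illustration), a plain read creates the intermediate hash but not the final key:

```perl
use strict ;
use warnings ;

my %h = ( 'foo' => 3 ) ;

# a plain read through a missing level; no exists or defined involved
my $value = $h{'bar'}{'baz'} ;

print "value is undefined\n" unless defined $value ;
print "'bar' now exists\n" if exists $h{'bar'} ;
print "'baz' was not created\n" unless exists $h{'bar'}{'baz'} ;
```

Only the levels that had to be dereferenced are created. The last key in the chain is merely looked up, so it is left alone.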
Here is a sub you can use to test for the existence of a key at any level
without triggering autovivification:
[Code]
sub deep_exists {
my( $hash_ref, @keys ) = @_ ;
unless ( @keys ) {
warn "deep_exists: no keys" ;
return ;
}
foreach my $key ( @keys ) {
unless( ref $hash_ref eq 'HASH' ) {
warn "$hash_ref not a HASH ref" ;
return ;
}
return 0 unless exists( $hash_ref->{$key} ) ;
$hash_ref = $hash_ref->{$key} ;
}
return 1 ;
}
%exist_hash = (
'foo' => {
'bar' => 3
}
) ;
print "\$exist_hash{foo}{bar} exists\n"
if deep_exists( \%exist_hash, qw( foo bar ) ) ;
print "\$exist_hash{foo}{bar}{baz} doesn't exist\n"
unless deep_exists( \%exist_hash, qw( foo bar baz ) ) ;
print Dumper \%exist_hash ;
|
[Output]
$VAR1 = {
'foo' => {
'bar' => 3
}
};
|
Notice that the data structure did not get modified as we didn't trigger
autovivification and we exited as soon as an exists call failed. Also it
returns 0 on normal failure and undef
on detecting an error.
That sub only works on hashes of hashes and it tests with exists. Here
it is, modified to work with hashes or arrays and it uses defined for the test:
[Code]
sub deep_defined {
my( $ref, @keys ) = @_ ;
unless ( @keys ) {
warn "deep_defined: no keys" ;
return ;
}
foreach my $key ( @keys ) {
if( ref $ref eq 'HASH' ) {
# fail when the value for this key is missing or not defined
return unless defined( $ref->{$key} ) ;
$ref = $ref->{$key} ;
next ;
}
if( ref $ref eq 'ARRAY' ) {
# fail when the index is out of range or is not defined
return unless 0 <= $key && $key < @{$ref} ;
return unless defined( $ref->[$key] ) ;
$ref = $ref->[$key] ;
next ;
}
# fail when the current level is not a hash or array ref
return ;
}
return 1 ;
}
my $defined_tree = {
'foo' => [
{
'bar' => 3,
'baz' => 'four',
},
{
'bar' => 5,
'baz' => 'six',
}
],
'oof' => [
{
'bar' => 7,
'baz' => 'eight',
},
{
'bar' => 9,
}
],
} ;
print "\$defined_tree->{foo}[0]{bar} is defined\n"
if deep_defined( $defined_tree, 'foo', 0, 'bar' ) ;
print "\$defined_tree->{oof}[1]{baz} isn't defined\n"
unless deep_defined( $defined_tree, 'oof', 1, 'baz' ) ;
print "\$defined_tree->{goof}[1]{baz} isn't defined\n"
unless deep_defined( $defined_tree, 'goof', 1, 'baz' ) ;
print DumperX $defined_tree ;
|
[Output]
$defined_tree->{foo}[0]{bar} is defined
$defined_tree->{oof}[1]{baz} isn't defined
$defined_tree->{goof}[1]{baz} isn't defined
$VAR1 = {
'oof' => [
{
'baz' => 'eight',
'bar' => 7
},
{
'bar' => 9
}
],
'foo' => [
{
'baz' => 'four',
'bar' => 3
},
{
'baz' => 'six',
'bar' => 5
}
]
};
|
As you can see it works and doesn't autovivify higher levels as it
returns when it doesn't find a reference. It is a cleaner subroutine
than deep_hash_assign since it can see what there is at each level and
do the right thing.
So to review the concept, autovivification happens when Perl
automatically creates a reference of the appropriate type when an
undefined scalar value is dereferenced. It is a useful concept and is
used in many programs. If Perl didn't do it, you would have to resort to
clumsier code and special subroutines to create the new levels of your
data structures. Some complain it shouldn't happen with exists or
defined, but the sub to work around that is not tricky to create or
use. There is interest in having those two operations not autovivify in
Perl 6, but that is not certain.
Note: all the above code is also in the file auto.pl. It uses
Data::Dumper and the DumperX sub as regular
Dumper seems to have problems with this code.
About the Author:
Uri Guttman co-authored the award
winning paper, "A Fresh Look at Efficient Perl Sorting", presented at the 3rd Perl Conference in August,
1999; was a technical reviewer of Object Oriented Perl by Damian Conway and was a past Technical editor of The
Perl Journal. He is also an active participant in the comp.lang.perl.misc newsgroup and
boston.pm, the local chapter of
Perl Mongers.
|