=pod

=head1 NAME

Data::Walk::Extracted - An extracted dataref walker

=head1 SYNOPSIS

This is a contrived example!  For a more functional (complex/useful) example see the
roles in this package.

    package Data::Walk::MyRole;
    use Moose::Role;
    requires '_process_the_data';
    use MooseX::Types::Moose qw( Str ArrayRef HashRef );
    my $mangle_keys = {
        Hello_ref => 'primary_ref',
        World_ref => 'secondary_ref',
    };

    #########1 Public Method 3#########4#########5#########6#########7#########8

    sub mangle_data{
        my ( $self, $passed_ref ) = @_;
        @$passed_ref{ 'before_method', 'after_method' } =
            ( '_mangle_data_before_method', '_mangle_data_after_method' );
        ### Start recursive parsing
        $passed_ref = $self->_process_the_data( $passed_ref, $mangle_keys );
        ### End recursive parsing with: $passed_ref
        return $passed_ref->{Hello_ref};
    }

    #########1 Private Methods 3#########4#########5#########6#########7#########8

    ### If you are at the string level merge the two references
    sub _mangle_data_before_method{
        my ( $self, $passed_ref ) = @_;
        if(
            is_Str( $passed_ref->{primary_ref} ) and
            is_Str( $passed_ref->{secondary_ref} )
        ){
            $passed_ref->{primary_ref} .= " " . $passed_ref->{secondary_ref};
        }
        return $passed_ref;
    }

    ### Strip the reference layers on the way out
    sub _mangle_data_after_method{
        my ( $self, $passed_ref ) = @_;
        if( is_ArrayRef( $passed_ref->{primary_ref} ) ){
            $passed_ref->{primary_ref} = $passed_ref->{primary_ref}->[0];
        }elsif( is_HashRef( $passed_ref->{primary_ref} ) ){
            $passed_ref->{primary_ref} = $passed_ref->{primary_ref}->{level};
        }
        return $passed_ref;
    }

    package main;
    use MooseX::ShortCut::BuildInstance qw( build_instance );
    my $AT_ST = build_instance(
        package      => 'Greeting',
        superclasses => [ 'Data::Walk::Extracted' ],
        roles        => [ 'Data::Walk::MyRole' ],
    );
    print $AT_ST->mangle_data( {
        Hello_ref =>{ level =>[ { level =>[ 'Hello' ] } ] },
        World_ref =>{ level =>[ { level =>[ 'World' ] } ] },
    } ) . "\n";

    #################################################################################
    #     Output of SYNOPSIS
    # 01:Hello World
    #################################################################################

=head1 DESCRIPTION

This module takes a data reference (or two) and recursively travels through it (them).
Where the two references diverge the walker follows the primary data reference.
At the L<beginning|/Assess and implement the before_method> and
L<end|/Assess and implement the after_method> of each branch or L<node|/node>
in the data the code will attempt to call a
L<method|/Extending Data::Walk::Extracted> on the remaining unparsed data.

=head2 Acknowledgement of MJD

This is an implementation of the concept of extracted data walking from
I<Higher-Order Perl> Chapter 1 by Mark Jason Dominus (MJD).  With that said I diverged
from MJD purity in two ways.  First, this is object oriented code, not functional code.
Second, when taking action the code will search for class methods provided by (your)
role rather than acting on passed closures.  There is clearly some overhead associated
with both of these differences.  I made those choices consciously and if that upsets
you L!

=head2 What is the unique value of this module?

With the recursive part of data walking extracted the various functionalities desired
when walking the data can be modularized without copying this code.
The Moose framework also allows diverse and targeted data parsing without dragging
along a L API for every use of this class.

=head2 Extending Data::Walk::Extracted

B.  It usually also makes sense to build an initial action method as well.  The initial
action method can do any data-preprocessing that is useful as well as providing the
necessary set up for the generic walker.  All of these elements can be combined with
this class using a L, by L, or it can be joined to the class at run time.  See L. or L
for more class building information.  See the L</Recursive Parsing Flow> to understand
the details of how the methods are used.  See L</Methods used to write roles> for the
available methods to implement the roles.

Then, L

=head1 Recursive Parsing Flow

=head2 Initial data input and scrubbing

The primary input method added to this class for external use is referred to as the
'action' method (ex. 'mangle_data').  This action method needs to receive data and
organize it for sending to
L<_process_the_data|/_process_the_data( $passed_ref, $conversion_ref )>, the start
method for the generic data walker.

=head2 Assess and implement the before_method

The class next checks for an available 'before_method' using the test;

    exists $passed_ref->{before_method};

If the test passes then the next sequence is run.

    $method     = $passed_ref->{before_method};
    $passed_ref = $self->$method( $passed_ref );

If the $passed_ref is modified by the 'before_method' then the recursive parser will
parse the new ref and not the old one.  The before_method can set;

    $passed_ref->{skip} = 'YES'

Then the flow checks for the need to investigate deeper.

=head2 Test for deeper investigation

The code now checks whether deeper investigation is required by testing whether the
'skip' key equals 'YES' in the $passed_ref or whether the node is a
L<base node type|/base node type>.  If either case is true the process jumps to the
L<after_method|/Assess and implement the after_method>; otherwise it begins to
investigate the next level.

=head2 Identify node elements

If the next level in is not skipped then a list is generated for all L in the node.
For example a 'HASH' node would generate a list of hash keys for that node.
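
The list generation for a node can be sketched in plain Perl.  This is only an
illustrative sketch and not the code used by this class: the subroutine names
node_type and node_elements are hypothetical, and using index positions as the
element list for ARRAY nodes is an assumption.

```perl
use strict;
use warnings;
use Scalar::Util qw( reftype );

# Hypothetical type tester (an assumption, not the class's own tester):
# undef is 'UNDEF', plain strings and numbers are 'SCALAR'.
sub node_type {
    my ($node) = @_;
    return 'UNDEF'  if !defined $node;
    return 'SCALAR' if !ref $node;
    return reftype $node;    # 'HASH', 'ARRAY', ...
}

# Hypothetical list builder: one list entry per element of the node.
sub node_elements {
    my ($node) = @_;
    my $type = node_type($node);
    return keys %$node      if $type eq 'HASH';      # hash keys
    return ( 0 .. $#$node ) if $type eq 'ARRAY';     # positions (assumed)
    return ($node)          if $type eq 'SCALAR';    # the scalar contents
    return ();    # UNDEF (and anything this sketch does not handle)
}

my @hash_list  = node_elements( { b => 1, a => 2 } );
my @array_list = node_elements( [ 'x', 'y', 'z' ] );
print join( ', ', sort @hash_list ), "\n";
print join( ', ', @array_list ), "\n";
```

Hash keys come back in no particular order, which is why the usage lines sort them
before printing.
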
SCALAR nodes will generate a list with only one element containing the scalar
contents.  UNDEF nodes will generate an empty list.

=head2 Sort the node as required

If the list L then the list is sorted.

=head2 Process each element

For each identified element of the node a new $data_ref is generated containing data
that represents just that sub element.  The secondary_ref is only constructed if it
has a matching type and element to the primary ref.  Matching for hashrefs is done by
key matching only.  Matching for arrayrefs is done by testing only that the position
exists.  Scalars are matched on content.  The list of items generated for this element
is as follows;

=over

B<< before_method => >> --E<gt>name of before method for this role hereE<lt>--

B<< after_method => >> --E<gt>name of after method for this role hereE<lt>--

B<< primary_ref => >> the piece of the primary data ref below this element

B<< primary_type => >> the lower primary (walker)
L<type|/_extracted_ref_type( $test_ref )>

B<< match => >> YES|NO (This indicates if the secondary ref meets matching criteria)

B<< skip => >> YES|NO Checks L against the lower primary_ref node.  This can also be
set in the 'before_method' upon arrival at that node.

B<< secondary_ref => >> if match eq 'YES' then built like the primary ref

B<< secondary_type => >> if match eq 'YES' then calculated like the primary type

B<< branch_ref => >> L</A position trace is generated>

=back

=head2 A position trace is generated

The current node list position is then documented and pushed onto the array at
$passed_ref->{branch_ref}.  The array reference stored in branch_ref can be thought of
as the stack trace that documents the node elements directly between the current
position and the initial (or zeroth) level of the parsed primary data_ref.  Past
completed branches and future pending branches are not maintained.  Each element of
the branch_ref contains four positions used to describe the node and selections used
to traverse that node level.
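
A position trace of this kind can be sketched with a small stand-alone walker.  This
is a sketch under stated assumptions and not the implementation in this class: the
subroutine walk_with_trace is hypothetical, each trace entry is assumed to be a four
element array ref of node type, list item, position and level, and the path strings
it records are only there to make the trace visible.

```perl
use strict;
use warnings;

# Hypothetical walker: pushes one trace entry before going down a level
# and pops it again on the way back up, so the trace only ever describes
# the branch between the zeroth level and the current position.
sub walk_with_trace {
    my ( $node, $branch_ref, $level, $report ) = @_;
    my $type = ref $node;
    if ( $type eq 'HASH' ) {
        my $position = 0;
        for my $key ( sort keys %$node ) {
            push @$branch_ref, [ 'HASH', $key, $position++, $level ];
            walk_with_trace( $node->{$key}, $branch_ref, $level + 1, $report );
            pop @$branch_ref;
        }
    }
    elsif ( $type eq 'ARRAY' ) {
        for my $position ( 0 .. $#$node ) {
            push @$branch_ref, [ 'ARRAY', '', $position, $level ];
            walk_with_trace( $node->[$position], $branch_ref, $level + 1, $report );
            pop @$branch_ref;
        }
    }
    else {
        # Base node reached: turn the current trace into a readable path.
        my @steps;
        for my $entry (@$branch_ref) {
            my ( $ref_type, $item, $position ) = @$entry;
            push @steps,
                $ref_type eq 'HASH'
                ? '{' . $item . '}'
                : '[' . $position . ']';
        }
        push @$report, join '', @steps;
    }
    return $report;
}

my $report = walk_with_trace( { Hello_ref => { level => ['Hello'] } }, [], 0, [] );
print "$_\n" for @$report;    # prints {Hello_ref}{level}[0]
```

Because every push is paired with a pop, completed branches leave nothing behind in
the trace.
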
The values in each sub position are;

    [
        ref_type, #The node reference type
        the list item value or '' for ARRAYs,
            #key name for hashes, scalar value for scalars
        element sequence position (from 0),
            #For hashes this is only relevant if sort_HASH is called
        level of the node (from 0),
            #The zeroth level is the initial data ref
    ]

=head2 Going deeper in the data

The down level ref is then passed as a new data set to be parsed and it starts at the
L<before_method|/Assess and implement the before_method> again.

=head2 Actions on return from recursion

When the values are returned from the recursion call the last branch_ref element is
popped off and the returned data ref is used to L the sub elements of the primary_ref
and secondary_ref associated with that list element in the current level of the
$passed_ref.  If there are still pending items in the node element list then the
program L

=head2 Assess and implement the after_method

After the node elements have all been processed the class checks for an available
'after_method' using the test;

    exists $passed_ref->{after_method};

If the test passes then the following sequence is run.

    $method     = $passed_ref->{after_method};
    $passed_ref = $self->$method( $passed_ref );

If the $passed_ref is modified by the 'after_method' then the recursive parser will
parse the new ref and not the old one.

=head2 Go up

The updated $passed_ref is passed back up to the L.

=head1 Attributes

Data passed to -E<gt>new when creating an instance.  For modification of these
attributes see L</Public Methods>.  The -E<gt>new function will either accept fat comma
lists or a complete hash ref that has the possible attributes as the top keys.
Additionally some attributes that have the following prefixed methods; get_$name,
set_$name, clear_$name, and has_$name can be passed to
L<_process_the_data|/_process_the_data( $passed_ref, $conversion_ref )> and will be
adjusted for just the run of that method call.  These are called
L<one shot|/Supported one shot attributes> attributes.
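
The save, override and restore behaviour of a temporary attribute can be sketched as
follows.  This is a minimal sketch under stated assumptions, not the code in this
class: the package Hypothetical::Walker, its process method and the plain hash based
object are all invented for illustration, and only two attribute names are handled.

```perl
use strict;
use warnings;

package Hypothetical::Walker;

# Attribute names this sketch treats as temporary overrides (assumed subset).
my @one_shot_names = qw( skip_level fixed_primary );

sub new {
    my ( $class, %args ) = @_;
    return bless {%args}, $class;
}

sub get_skip_level { return $_[0]{skip_level} }

# Hypothetical stand-in for the gate keeper method: apply any temporary
# attribute values found in the passed ref, do the work, then restore.
sub process {
    my ( $self, $passed_ref ) = @_;
    my %saved;
    for my $name (@one_shot_names) {
        next if !exists $passed_ref->{$name};
        $saved{$name} = [ ( exists $self->{$name} ? 1 : 0 ), $self->{$name} ];
        $self->{$name} = $passed_ref->{$name};
    }

    my $in_force = $self->get_skip_level;    # the "work" done in this sketch

    for my $name ( keys %saved ) {
        my ( $existed, $value ) = @{ $saved{$name} };
        if   ($existed) { $self->{$name} = $value }
        else            { delete $self->{$name} }
    }
    return $in_force;
}

package main;

my $walker = Hypothetical::Walker->new( skip_level => 3 );
my $during = $walker->process( { primary_ref => [], skip_level => 1 } );
my $after  = $walker->get_skip_level;
print "during: $during, after: $after\n";    # prints during: 1, after: 3
```

Recording whether the attribute existed before the call lets the sketch restore a
cleared attribute to its cleared state rather than to an undef value.
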
Nested calls to _process_the_data will be tracked and the attribute will remain in
force until the parser returns to the calling 'one shot' level.  Previous attribute
values are restored after the 'one shot' attribute value expires.

=head2 sorted_nodes

=over

B<Definition:> If the primary_type of the L<$element_ref|/Process each element> is a
key in this attribute hash ref then the node L<list|/Identify node elements> is
sorted.  If the value of that key is a CODEREF then the sort function will be called
as follows.

    @node_list = sort $coderef @node_list

B<Default> {} #Nothing is sorted

B<Range> This accepts a HashRef.

B<Example:>

    sorted_nodes =>{
        ARRAY => 1,#Will sort the primary_ref only
        HASH  => sub{ $b cmp $a }, #reverse sort the keys
    }

=back

=head2 skipped_nodes

=over

B<Definition:> If the primary_type of the L<$element_ref|/Process each element> is a
key in this attribute hash ref then the 'before_method' and 'after_method' are run at
that node but no L is done.

B<Default> {} #Nothing is skipped

B<Range> This accepts a HashRef.

B<Example:>

    skipped_nodes =>{
        OBJECT => 1,#skips all object nodes
    }

=back

=head2 skip_level

=over

B<Definition:> This attribute is set to skip (or not) node parsing at the set level.
Because the process doesn't start checking until after it enters the data ref it
effectively ignores a skip_level set to 0 (The base node level).  I array ref + 1>.

B<Default> undef = Nothing is skipped

B<Range> This accepts an integer

=back

=head2 skip_node_tests

=over

B<Definition:> This attribute contains a list of test conditions used to skip certain
targeted nodes.  The test can target an array position, match a hash key, or even be
restricted to only one level.  The test is run against the latest
L<branch_ref|/A position trace is generated> element so it skips the node below the
matching conditions, not the node at the matching conditions.  Matching is done with
'=~' and so will accept a regex or a string.  The attribute contains an ArrayRef of
ArrayRefs.
Each sub_ref contains the following;

=over

B<$type> - This is any of the reference node types

B<$key> - This is either a scalar or regex to use for matching a hash key

B<$position> - This is used to match an array position.  It can be an integer or 'ANY'

B<$level> - This restricts the skipping test usage to a specific level only or 'ANY'

=back

B<Example>

    [
        [ 'HASH', 'KeyWord', 'ANY', 'ANY'],
        # Skip the node below the value of any hash key eq 'KeyWord'
        [ 'ARRAY', 'ANY', '3', '4'],
        # Skip the node stored in arrays at position three on level four
    ]

B<Range> An infinite number of skip tests added to an array

B<Default> [] = no nodes are skipped

=back

=head2 change_array_size

=over

B<Definition:> This attribute will not be used by this class directly.  However the L
role may share it with other roles in the future so it is placed here so there will be
no conflicts.  This is usually used to define whether an array size shrinks when an
element is removed.

B<Default> 1 (This probably means that the array will shrink when a position is
removed)

B<Range> Boolean values.

=back

=head2 fixed_primary

=over

B<Definition:> This means that no changes made at lower levels will be passed upwards
into the final ref.

B<Default> 0 = The primary ref is not fixed (and can be changed) I<0 -E<gt>
effectively deep clones the portions of the primary ref that are traversed.>

B<Range> Boolean values.

=back

=head1 Methods

=head2 Methods used to write roles

These are methods that are not meant to be exposed to the final user of a composed
role and class but are used by the role to exercise the class.

=head3 _process_the_data( $passed_ref, $conversion_ref )

=over

B<Definition:> This method is the gate keeper to the recursive parsing of
Data::Walk::Extracted.  This method ensures that the minimum requirements for the
recursive data parser are met.  If needed it will use a conversion ref (also provided
by the caller) to change input hash keys to the generic hash keys used by this class.
This function then calls the actual recursive function.  For an overview of the
recursive steps see the L</Recursive Parsing Flow>.

B<Accepts:> ( $passed_ref, $conversion_ref )

=over

B<$passed_ref> this ref contains key value pairs as follows;

=over

B<primary_ref> - a dataref that the walker will walk - required

=over

review the $conversion_ref functionality in this function for renaming of this key.

=back

B<secondary_ref> - a dataref that is used for comparison while walking. - optional

=over

review the $conversion_ref functionality in this function for renaming of this key.

=back

B<before_method> - a method name that will perform some action at the beginning of
each node - optional

B<after_method> - a method name that will perform some action at the end of each
node - optional

B<[attribute name]> - L<One shot|/Supported one shot attributes> attribute names are
accepted with temporary attribute settings here.  These settings are temporarily set
for a single "_process_the_data" call and then the original attribute values are
restored.

=back

B<$conversion_ref> This allows a public method to accept different key names for the
various keys listed above and then convert them later to the generic terms used by
this class. - optional

B<Example>

    $passed_ref ={
        print_ref =>{
            First_key => [ 'first_value', 'second_value' ],
        },
        match_ref =>{
            First_key => 'second_value',
        },
        before_method => '_print_before_method',
        after_method  => '_print_after_method',
        sorted_nodes  =>{ ARRAY => 1 },#One shot attribute setter
    }

    $conversion_ref ={
        primary_ref   => 'print_ref',# generic_name => role_name,
        secondary_ref => 'match_ref',
    }

=back

B<Returns:> the $passed_ref (only) with the key names restored to the ones passed to
this method using the $conversion_ref.

=back

=head3 _build_branch( $seed_ref, @arg_list )

=over

B<Definition:> There are times when a role will wish to reconstruct the data branch
that led from the 'zeroth' node to where the data walker is currently at.  This
private method takes a seed reference and uses data found in the
L<branch ref|/A position trace is generated> to recursively append to the front of the
seed until a complete branch to the zeroth node is generated.

B<Accepts:> a list of arguments starting with the $seed_ref to build from.
The remaining arguments are just the array elements of the 'branch ref'.

B<Example:>

    $ref = $self->_build_branch(
        $seed_ref,
        @{ $passed_ref->{branch_ref}},
    );

B<Returns:> a data reference with the current path back to the start pre-pended to the
$seed_ref

=back

=head3 _extracted_ref_type( $test_ref )

=over

B<Definition:> In order to manage data types necessary for this class a data walker
compliant 'Type' tester is provided.  This is necessary to support a few non
perl-standard types not generated in standard perl typing systems.  First, 'undef' is
the UNDEF type.  Second, strings and numbers both return as 'SCALAR' (not '' or undef).

B<Accepts:> It receives a $test_ref that can be undef.

B<Returns:> a data walker type or it confesses.

=back

=head3 _get_had_secondary

=over

B<Definition:> During the initial processing of data in
L<_process_the_data|/_process_the_data( $passed_ref, $conversion_ref )> the existence
of a passed secondary ref is tested and stored in the attribute '_had_secondary'.  On
occasion a role might need to know if a secondary ref existed at any level even if it
is not represented at the current level.

B<Accepts:> nothing

B<Returns:> True|1 if the secondary ref ever existed

=back

=head3 _get_current_level

=over

B<Definition:> On occasion you may need one of the methods to know what level is
currently being parsed.  This will provide that information in integer format.

B<Accepts:> nothing

B<Returns:> the integer value for the level

=back

=head2 Public Methods

=head3 add_sorted_nodes( NODETYPE => 1, )

=over

B<Definition:> This method is used to add nodes to be sorted to the walker by
adjusting the attribute L</sorted_nodes>.

B<Accepts:> Node key => value pairs where the key is the Node name and the value is 1.
This method can accept multiple key => value pairs.

B<Returns:> nothing

=back

=head3 has_sorted_nodes

=over

B<Definition:> This method checks if any sorting is turned on in the attribute
L</sorted_nodes>.

B<Accepts:> Nothing

B<Returns:> the count of sorted node types listed

=back

=head3 check_sorted_nodes( NODETYPE )

=over

B<Definition:> This method is used to see if a node type is sorted by testing the
attribute L</sorted_nodes>.

B<Accepts:> the name of one node type

B<Returns:> true if that node is sorted as determined by L</sorted_nodes>

=back

=head3 clear_sorted_nodes

=over

B<Definition:> This method will clear all values in the attribute L</sorted_nodes>.

B<Accepts:> nothing

B<Returns:> nothing

=back

=head3 remove_sorted_node( NODETYPE1, NODETYPE2, )

=over

B<Definition:> This method will clear the key / value pairs in L</sorted_nodes> for the
listed items.

B<Accepts:> a list of NODETYPES to delete

B<Returns:> In list context it returns a list of values in the hash for the deleted
keys.  In scalar context it returns the value for the last key specified

=back

=head3 set_sorted_nodes( $hashref )

=over

B<Definition:> This method will completely reset the attribute L</sorted_nodes> to
$hashref.

B<Accepts:> a hashref of NODETYPE keys with the value of 1.

B<Returns:> nothing

=back

=head3 get_sorted_nodes

=over

B<Definition:> This method will return a hashref of the attribute L</sorted_nodes>

B<Accepts:> nothing

B<Returns:> a hashref

=back

=head3 add_skipped_nodes( NODETYPE1 => 1, NODETYPE2 => 1 )

=over

B<Definition:> This method adds additional skip definition(s) to the
L</skipped_nodes> attribute.

B<Accepts:> a list of key value pairs as used in 'skipped_nodes'

B<Returns:> nothing

=back

=head3 has_skipped_nodes

=over

B<Definition:> This method checks if any nodes are set to be skipped in the attribute
L</skipped_nodes>.

B<Accepts:> Nothing

B<Returns:> the count of skipped node types listed

=back

=head3 check_skipped_node( $string )

=over

B<Definition:> This method checks if a specific node type is set to be skipped in the
L</skipped_nodes> attribute.

B<Accepts:> a string

B<Returns:> Boolean value indicating if the specific $string is set

=back

=head3 remove_skipped_nodes( NODETYPE1, NODETYPE2 )

=over

B<Definition:> This method deletes specifically identified node skips from the
L</skipped_nodes> attribute.

B<Accepts:> a list of NODETYPES to delete

B<Returns:> In list context it returns a list of values in the hash for the deleted
keys.  In scalar context it returns the value for the last key specified

=back

=head3 clear_skipped_nodes

=over

B<Definition:> This method clears all data in the L</skipped_nodes> attribute.

B<Accepts:> nothing

B<Returns:> nothing

=back

=head3 set_skipped_nodes( $hashref )

=over

B<Definition:> This method will completely reset the attribute L</skipped_nodes> to
$hashref.

B<Accepts:> a hashref of NODETYPE keys with the value of 1.

B<Returns:> nothing

=back

=head3 get_skipped_nodes

=over

B<Definition:> This method will return a hashref of the attribute L</skipped_nodes>

B<Accepts:> nothing

B<Returns:> a hashref

=back

=head3 set_skip_level( $int )

=over

B<Definition:> This method is used to reset the L</skip_level> attribute after the
instance is created.

B<Accepts:> an integer (negative numbers and 0 will be ignored)

B<Returns:> nothing

=back

=head3 get_skip_level()

=over

B<Definition:> This method returns the current L</skip_level> attribute.

B<Accepts:> nothing

B<Returns:> an integer

=back

=head3 has_skip_level()

=over

B<Definition:> This method is used to test if the L</skip_level> attribute is set.

B<Accepts:> nothing

B<Returns:> $Bool value indicating if the 'skip_level' attribute has been set

=back

=head3 clear_skip_level()

=over

B<Definition:> This method clears the L</skip_level> attribute.

B<Accepts:> nothing

B<Returns:> nothing (always successful)

=back

=head3 set_skip_node_tests( ArrayRef[ArrayRef] )

=over

B<Definition:> This method is used to change (completely) the 'skip_node_tests'
attribute after the instance is created.  See L</skip_node_tests> for an example.

B<Accepts:> an array ref of array refs

B<Returns:> nothing

=back

=head3 get_skip_node_tests()

=over

B<Definition:> This method returns the current master list from the
L</skip_node_tests> attribute.

B<Accepts:> nothing

B<Returns:> an array ref of array refs

=back

=head3 has_skip_node_tests()

=over

B<Definition:> This method is used to test if the L</skip_node_tests> attribute is set.

B<Accepts:> nothing

B<Returns:> The number of sub array refs there are in the list

=back

=head3 clear_skip_node_tests()

=over

B<Definition:> This method clears the L</skip_node_tests> attribute.

B<Accepts:> nothing

B<Returns:> nothing (always successful)

=back

=head3 add_skip_node_tests( ArrayRef1, ArrayRef2 )

=over

B<Definition:> This method adds additional skip_node_test definition(s) to the
L</skip_node_tests> attribute list.

B<Accepts:> a list of array refs as used in 'skip_node_tests'.  These are pushed onto
the existing list.

B<Returns:> nothing

=back

=head3 set_change_array_size( $bool )

=over

B<Definition:> This method is used to (re)set the L</change_array_size> attribute after
the instance is created.

B<Accepts:> a Boolean value

B<Returns:> nothing

=back

=head3 get_change_array_size()

=over

B<Definition:> This method returns the current state of the L</change_array_size>
attribute.

B<Accepts:> nothing

B<Returns:> $Bool value representing the state of the 'change_array_size' attribute

=back

=head3 has_change_array_size()

=over

B<Definition:> This method is used to test if the L</change_array_size> attribute is
set.

B<Accepts:> nothing

B<Returns:> $Bool value indicating if the 'change_array_size' attribute has been set

=back

=head3 clear_change_array_size()

=over

B<Definition:> This method clears the L</change_array_size> attribute.

B<Accepts:> nothing

B<Returns:> nothing

=back

=head3 set_fixed_primary( $bool )

=over

B<Definition:> This method is used to change the L</fixed_primary> attribute after the
instance is created.

B<Accepts:> a Boolean value

B<Returns:> nothing

=back

=head3 get_fixed_primary()

=over

B<Definition:> This method returns the current state of the L</fixed_primary>
attribute.

B<Accepts:> nothing

B<Returns:> $Bool value representing the state of the 'fixed_primary' attribute

=back

=head3 has_fixed_primary()

=over

B<Definition:> This method is used to test if the L</fixed_primary> attribute is set.

B<Accepts:> nothing

B<Returns:> $Bool value indicating if the 'fixed_primary' attribute has been set

=back

=head3 clear_fixed_primary()

=over

B<Definition:> This method clears the L</fixed_primary> attribute.

B<Accepts:> nothing

B<Returns:> nothing

=back

=head1 Definitions

=head2 node

Each branch point of a data reference is considered a node.  The possible paths deeper
into the data structure from the node are followed 'vertically first' in recursive
parsing.  The original top level reference is considered the 'zeroth' node.

=head2 base node type

Recursion 'base' node L<types|/Supported node walking types> are considered to not
have any possible deeper branches.  Currently that list is SCALAR and UNDEF.

=head2 Supported node walking types

=over

=item ARRAY

=item HASH

=item SCALAR

=item UNDEF

Support for Objects is partially implemented and as a consequence
'_process_the_data' won't immediately die when asked to parse an object.  It will
still die but on a dispatch table call that indicates where there is missing object
support, not at the top of the node.  This allows for some of the L to use 'OBJECT' in
their definitions.

=back

=head2 Supported one shot attributes

=over

=item sorted_nodes

=item skipped_nodes

=item skip_level

=item skip_node_tests

=item change_array_size

=item fixed_primary

=back

=head2 Dispatch Tables

This class uses the role L to implement dispatch tables.  When there is a decision
point, that role is used to make the class extensible.

=head1 Caveat utilitor

This is not an extension of L

The core class has no external effect.  All output comes from L.

This module uses the 'defined or' operator ( //= ) and so requires perl 5.010 or
higher.

This is a L<Moose> based data handling class.  Many coders will tell you Moose and data
manipulation don't belong together.  They are most certainly right in speed intensive
circumstances.

Recursive parsing is not a good fit for all data since very deep data structures will
fill up a fair amount of memory!  As the module recursively parses through the levels
it leaves behind snapshots of the previous level that allow it to keep track of its
location.

The passed data references are effectively deep cloned during this process.  To leave
the primary_ref pointer intact see L</fixed_primary>

=head1 Build/Install from Source

B<1.> Download a compressed file with the code

B<2.> Extract the code from the compressed file.  If you are using tar this should
work:

    tar -zxvf Data-Walk-Extracted-v0.xx.xx.tar.gz

B<3.> Change (cd) into the extracted directory

B<4.> Run the following commands

=over

(For Windows find what version of make was used to compile your perl)

    perl -V:make

(then for Windows substitute the correct make function (ex. s/make/dmake/g))

=back

    >perl Makefile.PL

    >make

    >make test

    >make install # As sudo/root

    >make clean

=head1 SUPPORT

=over

L

=back

=head1 TODO

=over

B<1.> provide full recursion through Objects

B<2.> Support recursion through CodeRefs (Closures)

B<3.> Add a Data::Walk::Diff Role to the package

B<4.> Add a Data::Walk::Top Role to the package

B<5.> Add a Data::Walk::Thin Role to the package

B<6.> Convert test suite to Test2 direct usage

=back

=head1 AUTHOR

=over

=item Jed Lund

=item jandrew@cpan.org

=back

=head1 COPYRIGHT

This program is free software; you can redistribute it and/or modify it under the same
terms as Perl itself.

The full text of the license can be found in the LICENSE file included with this
module.

This software is copyrighted (c) 2012, 2016 by Jed Lund.

=head1 Dependencies

=over

L L<5.010|http://perldoc.perl.org/perl5100delta.html> (for use of the 'defined or'
operator //) L L L L - confess L - 2.1803 L L L L L - reftype L L L

=back

=head1 SEE ALSO

=over

L - Can use to unhide '###InternalExtracteD' tags

L - to manage the output of exposed '###InternalExtracteD' lines

L

L

L - Dumper

L - Dump

L - available Data::Walk::Extracted Role

L - available Data::Walk::Extracted Role

L - available Data::Walk::Extracted Role

L - available Data::Walk::Extracted Role

=back

=cut