- implement ->count() for storage iterators, since we have to find all
the docs before iterating over them anyway, so there's no performance
overhead... WARNING: be sure to die if given a where_clause!
ski@wwe:~$ perl -le 'print time." starting..."; use XML::Comma; my $it =
XML::Comma::Def->AllAfrica_NewsStory->get_store("post")->iterator(); my
$i=0; while(++$it) { ++$i; print time.": ".$i unless($i % 1000) } print
$i'
1205511459 starting...
1205511477: 1000
1205511482: 2000
1205511487: 3000
18 sec to reach the first 1000 docs (startup included), then 5 sec per 1000 docs
- '$doc->bar = 123;' syntax is possible - see:
perldoc perl56delta / perlsub:
Lvalue subroutines
WARNING: Lvalue subroutines are still experimental and the
implementation may change in future versions of Perl.
Compress-Raw-Zlib uses it...
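a minimal sketch of the lvalue idea, using a hypothetical My::Doc class (not XML::Comma::Doc) with a plain hash slot behind the accessor. it only shows that the syntax works; an lvalue sub like this gives no place to run a set_hook or validation on the assigned value, which may matter for us:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a doc class; not XML::Comma::Doc.
package My::Doc;

sub new { return bless { bar => undef }, shift }

# The :lvalue attribute makes the sub's final expression an
# assignable location, so "$doc->bar = 123" stores into the hash slot.
sub bar :lvalue {
    my $self = shift;
    $self->{bar};
}

package main;

my $doc = My::Doc->new;
$doc->bar = 123;          # assignment through the method call
print $doc->bar, "\n";
```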
- if first time store and #stores in def == 1, don't complain if store
name isn't specified.
- fixup "get_index->rebuild()" requires stores argument bug
- fix comma-rebuild-index.pl and any other scripts to deal with loading
defs from modules
- make install hack:
in misc/install-extras.pl, exec perl -e 'use XML::Comma' to
force compilation of Inline::C stuff for Parser?
- document select_count in guide.html, etc.
- no doc_id on new (unstored) docs makes creating docs that point at one
another hard
- perhaps implement iterator->to_array($n) where you are forced to
specify a max size will get thru the committee? other args:
whole_enchilada => 1, reset_it_count (equivalent to it_refresh
immediately after)
doc_is_new is confusing now.
since a new doc can be set an arbitrary number of times w/o need of
get_lock or keep_open, in some sense a new doc is always new until you
load it from disk. currently doc_is_new ceases to be new as soon as you
associate the doc with some storage, which is what brian wants, but you
can write code like this (from multi_store_doctype.t)
my $doc = XML::Comma::Doc->new( type => $def_one_name );
$doc->element( "foo" )->set( "foo_$_" );
$doc->bar( "bar_$_" );
$doc->store( store => "one" ); # updates indices 'one' and 'all'
$doc->store( store => "two" ); # updates indices 'two' and 'all'
resolve this ambiguity.
fix Index->erase() to clear cached data as necessary
then implement rebuild(drop_first => 1);
alter table statements fail with postgres when we change a def's
sql_type or so:
DBD::Pg::st execute failed: ERROR: syntax error at or near "MODIFY"
LINE 3: MODIFY type VARCHAR(255)
^
[for Statement "
ALTER TABLE TravelPo_0001
MODIFY type VARCHAR(255)
"] at /usr/local/share/perl/5.8.8/XML/Comma/SQL/Base.pm line 293.
would it be worth having some SQL-backend-neutral names for sql_types?
e.g. pg:
varchar text name bytea bool date time - etc.
vs mysql:
VARCHAR CHAR INT TINYINT - etc.
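the MODIFY failure above is a syntax difference: mysql accepts MODIFY, while postgres spells the same change as ALTER COLUMN ... TYPE. a rough sketch of picking the statement per backend follows. alter_column_type_sql and the 'pg'/'mysql' backend names are made up for illustration and are not part of XML::Comma::SQL::Base; postgres may also need a USING clause when the old and new types don't convert implicitly:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of XML::Comma::SQL::Base.
# Returns an ALTER TABLE statement that changes a column's type,
# using the syntax each backend accepts.
sub alter_column_type_sql {
    my ($backend, $table, $column, $type) = @_;
    if ($backend eq 'mysql') {
        return "ALTER TABLE $table MODIFY $column $type";
    }
    if ($backend eq 'pg') {
        return "ALTER TABLE $table ALTER COLUMN $column TYPE $type";
    }
    die "unsupported backend: $backend";
}

print alter_column_type_sql('pg',    'TravelPo_0001', 'type', 'VARCHAR(255)'), "\n";
print alter_column_type_sql('mysql', 'TravelPo_0001', 'type', 'VARCHAR(255)'), "\n";
```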
more here:
trying to rebuild an index_only store shouldn't work, but it should give
a better error message than this:
tperl -e 'XML::Comma::Def->Shorten->get_index("main")->rebuild(verbose => 1)'
beginning re-index from store main...
error while rebuilding: UNKNOWN_ACTION -- no method or element
'extension' found in 'DocumentDefinition:store' at -e line 1
if i try to do <field><name>foo</name></field>, and foo is a plural
element, i don't get indexed - nor do i get any error output!
when is the right time to abort?
also, is it possible that we should auto-detect by plurality
if we want a field or collection, and have both mean the
same thing? (perhaps not, the programmer needs to know the
difference to create sql to actually query the thing)
- we have to rejigger the install process to have a separate
dbconfig step sooner rather than later, as when we return from cpan currently,
we make and make test, and die slowly w/o that info... do a basic SQL
connect check first, perhaps, and if it fails, skip the rest.
testing cpan with a local tarball:
TARBALL=/media/edgy/home/ski/mysrc/XML-Comma-1.993.tgz
mkdir -p /root/.cpan/build && cd /root/.cpan/build
tar xfz $TARBALL
cd XML-Comma-*
cpan .
convert all uses of bareword file handles to proper my $foo style
(that is, if this is compat with 5.6)
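on the 5.6 question: lexical (autovivified) filehandles and three-argument open both arrived in perl 5.6.0, so the conversion should be safe there. a small sketch of the target style, using a scratch file from File::Temp only so that it runs on its own:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file so the example is runnable; the path is not from XML::Comma.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print {$tmp_fh} "line one\nline two\n";
close($tmp_fh) or die "close failed: $!";

# Old style:  open(FH, $path) || die ...; while (<FH>) { ... } close(FH);
# New style: lexical handle plus three-argument open.
open(my $fh, '<', $path) or die "cannot open $path: $!";
my @lines = <$fh>;
close($fh) or die "close failed: $!";

print scalar(@lines), " lines read\n";
```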
calling ->erase() when you don't have perms to delete the store doesn't
seem to do the right thing... (no error - try with a TravelPictureLocal)
(just a reflection of the index errors not propagated problem we had?)
a terser, and hopefully quicker to parse data format not based on XML?
- or perhaps use XML with attributes, as
<element name="foo" />
is a hell of a lot terser and easier to read than
<element><name>foo</name></element>
the selection of storage format would come in a layer right next
to storage/location... or perhaps look at how the gzip compression
is implemented and do something like that?
fix sync_deferred_textsearches/Pg the Right Way
- access to convenience methods like AllAfrica::Travel::Post::get_uri
and blurb_text... for now we have to call
AllAfrica::Travel::Post::get_uri($doc), instead of $doc->get_uri;
#it'd also be nice to say things like the below, instead of
#having a method with the same name as:
# <textsearch>
# <name>full_text</name>
# <code><![CDATA[ sub {
# return $_[0]->title."\n".
# join("\n", $_[0]->tags())."\n".
# $_[0]->body_text;
# } ]]></code>
# </textsearch>
shortcut methods skip set_hook (and probably read_hook) !
(at least with a blob_element - try $doc->original("foo")
on a travel...Picture::Local doc)
<location> for BlobElements
this deals with elements on a different FS in a "smart" way,
and lets you move blobs out from the same hierarchy
as the xml docs, so you can grep -r there easily
another way to think of this is as the inverse of
the <extension> field (ie, <prefix>, akin to
PREFIX=/some/path make install)
integrate some variation on Versionable into core
parameterized where_clauses and such.
default mode of destination dir should be $owner:{www-data|web}
and mode 664 on comma.log
introspection cleanup:
add to docs that we deprecate from top-level of DocDef:
<required>, <plural>, <is_i*_from_hash>, these should be in
a properties block now:
<properties>
<required>qw ( foo bar )</required>
<plural>qw ( foo )</plural>
<doc_key>qw ( bar )</doc_key>
</properties>
add support for this syntax:
<element><name>foo</name>
<properties>plural,required</properties></element>
<element><name>bar</name>
<properties>doc_key,required</properties></element>
allow support for persistent setting of this in a UI system
or be somewhere else in a UI-settable system
where can we store this adjunct data?
perhaps this should wait until a file can be both a def and doc?
also clean up the "4 different ways" problem, standardize on the
last (while retaining compatibility):
$def->is_{required|plural|is_ignore_for_hash}
$el->def->is_{nested|blob}
$macros{enum|boolean|range|timestamp|timestamp_created|timestamp_last_modified}
$def->has_property($required_plural_blob_nested_enum_or_etc, $el_name)
also rewrite legacy functions like is_required as calls to has_property
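one possible shape for the has_property consolidation, sketched on a hypothetical My::Def class (not the real XML::Comma::Def; the constructor arguments are invented for the example). the legacy is_* accessors stay, but only as thin wrappers:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a def object; not the real XML::Comma::Def.
package My::Def;

sub new {
    my ($class, %props) = @_;   # property name => array ref of element names
    return bless { props => \%props }, $class;
}

# Single general query: does element $el_name carry property $prop?
sub has_property {
    my ($self, $prop, $el_name) = @_;
    return scalar grep { $_ eq $el_name } @{ $self->{props}{$prop} || [] };
}

# Legacy accessors kept for compatibility, as thin wrappers.
sub is_required { return $_[0]->has_property('required', $_[1]) }
sub is_plural   { return $_[0]->has_property('plural',   $_[1]) }

package main;

my $def = My::Def->new(required => [qw(foo bar)], plural => [qw(foo)]);
print "foo plural\n"     if     $def->is_plural('foo');
print "bar not plural\n" unless $def->is_plural('bar');
```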
insights from introspection:
1 - a bug: you can load a <TravelPost>...</TravelPost> with doc on disk
that has <tpt>...</tpt> - add a sanity check there.
2 - a possible bug: the below dies if we don't have a listing_fields (?)
<nested_element>
<name>listing_fields</name>
<element><name>foo</name><macro>something_that_installs_validate_hook</macro></element>
<required>qw ( foo )</required>
</nested_element>
comma/postgres bug:
need to use a superuser for now.
later do what it suggests and change COPY to \copy ?
#complete test:
dropdb -U postgres comma ; createdb -U postgres -O comma comma; \
make distclean ; perl Makefile.PL && make && make test && sudo make install
TODO: make sure all new API stuff is in doc/guide.html and Bootstrap.pm
add to comma-2.html/Changelog:
some sort of smart dump of svn commits to chronicle
2.1:
- apply & fix the stuff in weak/ for ->parent, ->top, etc.
- get more info on copyright bounding start dates from comma 1.21
- there is no way to get the size of the store from the index... perhaps
this is a feature? ... but it would be nice to have an
$index->total_count or so
- make an Iteratable.pm or so which Storage::Iterator and
Index::Iterator inherit from ( we probably can't use it for much
but it will look nicer that way at least ... )
$sit->length() vs. $idx->count() ... resolve this inconsistency if poss.
consider iterator_refresh for storage iterators?
$sit->length(max => $n) would be nice for gigantic stores...
dave ideas:
2 - (not sure if this is something we should do via comma, but it
would help convert people who are SQL-addicts):
this common construct is reaaaaaly slow:
for ( @{ $doc->issue_ptr } ) {
push @issues, $index->single( where_clause => "doc_id = $_" )
}
foreach my $issue (@issues) {
...
}
you hit the DB N times. you can hit it just once by building a gigantic
where clause, e.g.
my $wc = join (" OR ", map { "doc_id = $_" } @{ $doc->issue_ptr });
my $it = $index->iterator(where_clause => $wc);
while(++$it) {
...
}
or even better, using an IN statement:
my $wc = "doc_id IN (".join (",", @{ $doc->issue_ptr }).")";
my $it = $index->iterator(where_clause => $wc);
while(++$it) {
...
}
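a slightly more defensive sketch of building that IN clause. in_clause is a made-up helper, and quoting ids as SQL string literals is an assumption about the doc_id column type; with a live handle, DBI's $dbh->quote or placeholders would be the better route. it also avoids emitting "IN ()", which is a syntax error, when the pointer list is empty:

```perl
use strict;
use warnings;

# Hypothetical helper: build "doc_id IN ('a','b',...)" from a list of ids.
sub in_clause {
    my ($column, @ids) = @_;
    return "1 = 0" unless @ids;    # empty list: match nothing, avoid "IN ()"
    # Quote each id as a string literal, doubling embedded single quotes.
    my @quoted = map { (my $v = $_) =~ s/'/''/g; "'$v'" } @ids;
    return "$column IN (" . join(",", @quoted) . ")";
}

print in_clause("doc_id", qw(0001 0002 0003)), "\n";
```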
Reported by tim, 2/15/2007
This doesn't work... Probably fix this after we get the weakref
stuff in and right.
$ perl -MXML::Comma -e 'use strict ;
my $def = XML::Comma::Def->read( name=>"AllAfrica_Publisher" );
foreach my $el ( $def->def_sub_elements() ) {
my $tag = $el->name();
print $tag . "\n" if $el->element_is_plural($tag);
}'
This does:
$ perl -MXML::Comma -e 'use strict ;
my $def = XML::Comma::Def->read( name=>"AllAfrica_Publisher" );
foreach my $el ( $def->def_sub_elements() ) {
my $tag = $el->name();
print $tag . "\n" if $def->is_plural($tag);
}'
This is an API bug. The other is_* methods have this problem as well.
- standardize/clean up the order of "use" statements, etc. in t/*.t
with postgres, we get lots of spurious warnings starting "NOTICE"
these can be ignored, but are ugly. Pg code in general needs a cleanup
pg for dummies:
To get postgres up and running on my machine, for use with comma, I
just did an ubuntu install, then:
apt-get install postgresql-8.1 postgresql-server-dev-8.1
sudo cpan -i DBD::Pg
sudo bash
$EDITOR /etc/postgresql/8.1/main/pg_hba.conf
## change:
local all postgres ident sameuser
local all all ident sameuser
## to:
local all postgres ident sameuser
local all all password
#local all all ident sameuser
/etc/init.d/postgresql-8.1 restart
su postgres
createuser --pwprompt comma #for now make a superuser
#there is a bug we need to fix with COPY commands with Pg
#relevant tests are in t/indexing.t
createdb -O comma comma
exit #shell as postgres user
psql -U comma comma #make sure you can connect
psql -U postgres #make sure you can connect
#edit Configuration.pm accordingly
- element-specific locking for when whole body lock is unnec. and too
heavyweight?
- maybe use ExtUtils::MakeMaker::prompt for automated testers
- it'd be nice to have "iterators" on nested plural elements
- more test cases for $blob_el->{append|set}() combinations
- make $blob_el->set() more efficient by nuking the fcopy and
instead keep the appendage and the original data separate
use _set_was_called for this
- t/indexing.t sometimes fails because of an incomplete previous run, or
cruft in the database. make it more robust.
### step 5. - STORE_ERROR/POST_STORE_ERROR/DOC_INDEX_ERROR (0001)
think: table to drop, delete entry in index_tables, delete any comma_locks
(dangerous)
- it'd be nice if we could do some SQL-ish things with storage iterators...
but best to read up on the progress (or is that lack of progress) with
SQLite first...