PyTables enhancements for selection #1996

Closed
@jreback

now

changes to pandas.io.pytables to support more natural selection from tables:

  1. rename column -> major and index -> minor (to be more consistent with panel nomenclature)
  2. provide a parsable string selection syntax - pretty easy to do, and can be backwards compatible

store.select('mypanel', where = [ 'major>=20120103', 'major<=20120401', dict(minor = ['A','B','C']) ])

rather than the existing

store.select('mypanel', where = [ 
dict(field = 'column', op = '>=', value = datetime.datetime(2012,1,3)), 
dict(field = 'column', op = '<=', value = datetime.datetime(2012,4,1)), 
dict(field = 'index', value = ['A','B','C'])  ])
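
a minimal sketch of how the string form could be translated into the existing dict form, which is what would keep it backwards compatible. `parse_term`, `_TERM` and `_RENAMES` are hypothetical names for illustration, not part of pandas.io.pytables, and treating an 8-digit value as a `YYYYMMDD` date is an assumption:

```python
import re
from datetime import datetime

# hypothetical helper, not part of pandas.io.pytables
_TERM = re.compile(r"^\s*(\w+)\s*(>=|<=|!=|==|=|>|<)\s*(.+?)\s*$")
_RENAMES = {"major": "column", "minor": "index"}  # new name -> existing field name


def parse_term(term):
    """Translate one selection term into the existing dict form."""
    if isinstance(term, dict):
        if "field" in term:
            return term  # old-style dict, pass through unchanged
        # shorthand such as dict(minor=['A','B','C']); assumes exactly one key
        (name, value), = term.items()
        return dict(field=_RENAMES.get(name, name), value=value)
    m = _TERM.match(term)
    if m is None:
        raise ValueError("cannot parse selection term: %r" % (term,))
    name, op, raw = m.groups()
    try:
        # assumption: values that look like YYYYMMDD are dates
        value = datetime.strptime(raw, "%Y%m%d")
    except ValueError:
        value = raw  # otherwise keep the raw string
    return dict(field=_RENAMES.get(name, name), op=op, value=value)
```

with this, `[parse_term(t) for t in where]` would turn the proposed list into the list of dicts that select already accepts, and old-style dicts pass through untouched.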

future

not sure that pandas should get really fancy with operations just yet (e.g. 'or' operations and actual value selection)

where = [ ( 'major>20120901' & dict(minor = ['A','B','C']) ) | dict(minor = ['D']) ]
where = [ item['foo']>2.0 ]

but probably necessary once pandas supports 'chunking'-type operations on pytables

need to build a full-fledged selection parser to translate to numexpr-type operations (maybe with a patsy backend?)
BUT this may actually be useful for supporting generic operations of this kind on in-memory panels/frames too
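
a rough sketch of the translation step, assuming terms are already in dict form (`field`, optional `op`, `value`) and that the target is a numexpr-style condition string of the kind PyTables queries take. `to_condition` is a hypothetical name, a list of values is assumed to mean an 'or' of equality tests, and values would still need converting to whatever is actually stored in the table:

```python
def to_condition(terms):
    """Join dict-form terms into one numexpr-style condition string.

    Hypothetical helper for illustration; terms are and-ed together.
    """
    parts = []
    for t in terms:
        field, op, value = t["field"], t.get("op", "=="), t["value"]
        if isinstance(value, (list, tuple)):
            # assumption: a list of values means "any of these"
            alts = ["(%s == %r)" % (field, v) for v in value]
            parts.append("(" + " | ".join(alts) + ")")
        else:
            parts.append("(%s %s %r)" % (field, op, value))
    return " & ".join(parts)


cond = to_condition([
    dict(field="major", op=">", value=20120901),
    dict(field="minor", value=["A", "B"]),
])
# cond == "(major > 20120901) & ((minor == 'A') | (minor == 'B'))"
```

this only covers 'and' across terms plus 'or' within a value list; arbitrary nesting is where a real parser would come in.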

not sure of the use cases here though - I usually just read in roughly the data I need and sub-select from there
unless you have hundreds of millions of rows I don't know if it's necessary to optimize more (in which case it is!)
