
Commit 738b698

mdr223, tli2, chjuncn, vitaglianog, and sivaprasadsudhir authored
Improved Caching and Support for Local Model Execution w/vLLM and Various Bug Fixes (#287)
* Fix broken dependencies (#227)
* Move DataRecord Internal Fields to Have Leading Underscore (#229)
* update README
* 1. support add_columns in Dataset; 2. support run().to_df(); 3. add demo in df-newinterface.py (#78)
* Support add_columns in Dataset. Support demo in df-newinterface.py. Currently we have to do `records, _ = qr3.run()` and then `outputDf = DataRecord.to_df(records)`. I'll try to make `qr3.run().to_df()` work in another PR.
* ruff check --fix
* Support run().to_df(). Update run() to return a DataRecordCollection, so that it will be easier for us to support more features for run() output. We support to_df() in this change. I'll send out follow-up commits to update other demos.
* run check --fix
* fix typo in DataRecordCollection
* Update records.py
* fix tiny bug in mab processor. The code will run into an issue if we don't return any stats for this function in:
```python
max_quality_record_set = self.pick_highest_quality_output(all_source_record_sets)
if (
    not prev_logical_op_is_filter
    or (
        prev_logical_op_is_filter
        and max_quality_record_set.record_op_stats[0].passed_operator
    )
```
* update record.to_df interface: update to `record.to_df(records: list[DataRecord], project_cols: list[str] | None = None)`, which is consistent with other functions in this class.
* Update demo for the new execute() output format
* better way to get plan from output.run()
* fix getting plan from DataRecordCollection. People used to get the plan from execute() of the streaming processor, which is not a good practice. I updated plan_str to plan_stats, and they need to get the physical plan from the processor. Consider better ways to provide the executed physical plan to DataRecordCollection, possibly from stats.
* Update df-newinterface.py
* update code based on comments from Matt:
  1. add cardinality param in add_columns
  2. remove extra testdata files
  3. add __iter__ in DataRecordCollection to help iterate over streaming output.
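The `run()` output described above (a `DataRecordCollection` that supports `__iter__` and a `to_df(records, project_cols=None)` conversion) can be approximated with the minimal sketch below. It is an illustration under assumptions, not Palimpzest's code: `Record`, `RecordCollection`, and `to_rows()` are hypothetical names, and `to_rows()` returns plain dicts so the sketch needs only the standard library, whereas the real `to_df()` presumably builds a DataFrame.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Iterator


@dataclass
class Record:
    """Hypothetical stand-in for a DataRecord: a mapping of field names to values."""

    field_values: dict[str, Any]


@dataclass
class RecordCollection:
    """Hypothetical stand-in for the DataRecordCollection returned by run()."""

    records: list[Record]
    plan_stats: dict[str, Any] = field(default_factory=dict)

    def __iter__(self) -> Iterator[Record]:
        # Iterating the collection yields the output records directly,
        # instead of unpacking a `records, stats` tuple.
        return iter(self.records)

    def __len__(self) -> int:
        return len(self.records)

    def to_rows(self, project_cols: list[str] | None = None) -> list[dict[str, Any]]:
        # Same idea as to_df(records, project_cols=None): keep every field when
        # project_cols is None, otherwise keep only the requested columns.
        if project_cols is None:
            return [dict(rec.field_values) for rec in self.records]
        return [{col: rec.field_values.get(col) for col in project_cols} for rec in self.records]


output = RecordCollection([
    Record({"sender": "alice@example.com", "subject": "Q3 report"}),
    Record({"sender": "bob@example.com", "subject": "Lunch?"}),
])
for rec in output:
    print(rec.field_values["sender"])
print(output.to_rows(project_cols=["subject"]))
```

Keeping the plan statistics on the collection (here a hypothetical `plan_stats` dict) mirrors the commit's move from `plan_str` to `plan_stats`, so callers get records and stats from one object.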
* see if copilot just saved me 20 minutes
* fix package name
* use sed to get version from pyproject.toml
* bump project version; keep docs behind to test ci pipeline
* bumping docs version to match code version
* use new __iter__ method in demos where possible
* add type hint for output of __iter__; use __iter__ in unit tests
* Update download-testdata.sh (#89): added enron-tiny.csv
* Clean up the retrieve API (#79)
* Clean up the retrieve operator interface
* fix comments
* Update to the new to_df() API
* Code update for #84 (#101)
* Create chat.rst (#96)
* Create chat.rst
* Update pyproject.toml: hotfix for chat
* Update conf.py: hotfix for chat.rst
* code update for #84. This implementation basically resolves #84, with one difference from the interface proposed in #84: `.add_columns(cols=[{"name": "sender", "type": "string", "udf": compute_sender}, ...])`. If add_columns() takes cols, udf, and types as params, the function becomes confusing again. Instead, if users need to specify different UDFs for different columns, they should just call add_columns() multiple times, once per column.
* changed types to make use of Python type system; updated use of types in tests; updated docs and README
* update test to match no longer allowing None default

---------

Co-authored-by: Gerardo Vitagliano <vitaglianog@gmail.com>
Co-authored-by: Matthew Russo <mdrusso@mit.edu>

* Skip an operator if this is a duplicate op instead of raise error (#102)
* Create chat.rst (#96)
* Create chat.rst
* Update pyproject.toml: hotfix for chat
* Update conf.py: hotfix for chat.rst
* Skip an operator when it doesn't need any logical op instead of raising an error

#Final Effects
1. Dataset() init only has one responsibility: wrapping a datasource in a Dataset. I think this is a better interface.
2. No extra convert() will be added to the plan.
3. When users add the same op multiple times, e.g. dataset.convert(File).convert(File), the system will just dedup the same op instead of raising an error.
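The dedup behaviour in #102 can be illustrated with a small sketch: a convert whose output field set equals its input field set adds nothing, so the planner can drop it instead of raising an error. `LogicalOp`, `convert()`, and `prune_noop_converts()` are hypothetical names, and modelling a convert as the union of input and target fields is an assumption (it matches the later note that converts now produce union schemas). Filters keep their schema unchanged but still remove records, so the sketch never prunes them.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalOp:
    """Hypothetical logical operator: a kind plus its input/output field sets."""

    kind: str
    input_fields: frozenset[str]
    output_fields: frozenset[str]


def convert(prev: LogicalOp, target_fields: set[str]) -> LogicalOp:
    # Assumption for illustration: a convert outputs the union of its input
    # fields and the target schema's fields.
    return LogicalOp("convert", prev.output_fields, prev.output_fields | frozenset(target_fields))


def prune_noop_converts(ops: list[LogicalOp]) -> list[LogicalOp]:
    """Skip converts whose output schema equals their input schema instead of
    raising an error. Filters keep the same schema but still drop records,
    so they are never pruned here."""
    return [
        op for op in ops
        if not (op.kind == "convert" and op.input_fields == op.output_fields)
    ]


PAPER = {"title", "authors"}
scan = LogicalOp("scan", frozenset(), frozenset({"filename", "contents"}))
first = convert(scan, PAPER)
duplicate = convert(first, PAPER)  # e.g. .convert(Paper).convert(Paper)
filt = LogicalOp("filter", duplicate.output_fields, duplicate.output_fields)

plan = prune_noop_converts([scan, first, duplicate, filt])
print([op.kind for op in plan])  # ['scan', 'convert', 'filter']
```

The same check covers both cases described in #102: a redundant `DefaultSchema` convert after the scan and a user repeating the same convert twice.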
#Issue
Currently Dataset(src, schema) initialization has 2 responsibilities: 1. read source; 2. convert source to schema. When we use the default schema for Dataset init(source, schema=DefaultSchema) for users, the code works like this:
1. Read source to the schema that DataSource provides. This schema is derived by the system, so users don't know it (and don't need to know it).
2. Convert Source schema to DefaultSchema.
So every time, the system makes one extra convert call to convert SourceSchema to DefaultSchema, which is definitely wrong.
#Solution
1. We use the schema from the Datasource if it exists, which is reasonable.
2. If we do 1, then we get a dataset node with no actual op, since its input_schema == output_schema, so I updated a line in the optimizer to just skip the node if it doesn't do anything instead of raising an error.
#Real Examples
##Before
Generated plan:
0. MarshalAndScanDataOp -> PDFFile
1. PDFFile -> LLMConvertBonded -> DefaultSchema
   (contents, filename, text_conte) -> (value)
   Model: Model.GPT_4o
   Prompt Strategy: PromptStrategy.COT_QA
2. DefaultSchema -> MixtureOfAgentsConvert -> ScientificPaper
   (value) -> (contents, filename, paper_auth)
   Prompt Strategy: None
   Proposer Models: [GPT_4o]
   Temperatures: [0.0]
   Aggregator Model: Model.GPT_4o
   Proposer Prompt Strategy: chain-of-thought-mixture-of-agents-proposer
   Aggregator Prompt Strategy: chain-of-thought-mixture-of-agents-aggregation
3. ScientificPaper -> LLMFilter -> ScientificPaper
   (contents, filename, paper_auth) -> (contents, filename, paper_auth)
   Model: Model.GPT_4o
   Filter: The paper mentions phosphorylation of Exo1
4. ScientificPaper -> MixtureOfAgentsConvert -> Reference
   (contents, filename, paper_auth) -> (reference_first_author, refere)
   Prompt Strategy: None
   Proposer Models: [GPT_4o]
   Temperatures: [0.8]
   Aggregator Model: Model.GPT_4o
   Proposer Prompt Strategy: chain-of-thought-mixture-of-agents-proposer
   Aggregator Prompt Strategy: chain-of-thought-mixture-of-agents-aggregation
##After
Generated plan:
0. MarshalAndScanDataOp -> PDFFile
1. PDFFile -> LLMConvertBonded -> ScientificPaper
   (contents, filename, text_conte) -> (contents, filename, paper_auth)
   Model: Model.GPT_4o
   Prompt Strategy: PromptStrategy.COT_QA
2. ScientificPaper -> LLMFilter -> ScientificPaper
   (contents, filename, paper_auth) -> (contents, filename, paper_auth)
   Model: Model.GPT_4o
   Filter: The paper mentions phosphorylation of Exo1
3. ScientificPaper -> MixtureOfAgentsConvert -> Reference
   (contents, filename, paper_auth) -> (reference_first_author, refere)
   Prompt Strategy: None
   Proposer Models: [GPT_4o]
   Temperatures: [0.8]
   Aggregator Model: Model.GPT_4o
   Proposer Prompt Strategy: chain-of-thought-mixture-of-agents-proposer
   Aggregator Prompt Strategy: chain-of-thought-mixture-of-agents-aggregation
* make equality check for new field names a bit more explicit
* fix fixture usage
* update all plans within code base to explicitly convert when needed; and removed unnecessary schemas for reading from datasource

---------

Co-authored-by: Gerardo Vitagliano <vitaglianog@gmail.com>
Co-authored-by: Matthew Russo <mdrusso@mit.edu>

* Refactor demos to use .sem_add_columns or .add_columns instead of convert(), remove Schema from demos when possible. (#104)
* Create chat.rst (#96)
* Create chat.rst
* Update pyproject.toml: hotfix for chat
* Update conf.py: hotfix for chat.rst
* code update for #84. This implementation basically resolves #84, with one difference from the interface proposed in #84: `.add_columns(cols=[{"name": "sender", "type": "string", "udf": compute_sender}, ...])`. If add_columns() takes cols, udf, and types as params, the function becomes confusing again. Instead, if users need to specify different UDFs for different columns, they should just call add_columns() multiple times, once per column.
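The "one `add_columns()` call per UDF" convention from #84 can be sketched in plain Python. This is not Palimpzest's `Dataset.add_columns()`: `add_columns()` here is a hypothetical function over lists of dicts, and `compute_sender`/`compute_subject` are illustrative UDFs (the first reuses the name from the PR text). The point shown is that each call pairs one UDF with the columns it produces, and calls are chained rather than packing several `{"name", "type", "udf"}` dicts into one call.

```python
from __future__ import annotations

from typing import Any, Callable

Row = dict[str, Any]


def add_columns(rows: list[Row], udf: Callable[[Row], Row], cols: list[str]) -> list[Row]:
    """Hypothetical functional stand-in for add_columns(): apply one UDF per call
    and merge the declared output columns into a copy of each row."""
    out = []
    for row in rows:
        computed = udf(row)
        missing = set(cols) - set(computed)
        if missing:
            raise ValueError(f"udf did not produce columns: {sorted(missing)}")
        out.append({**row, **{col: computed[col] for col in cols}})
    return out


def compute_sender(row: Row) -> Row:
    # Take the first "From:" header line as the sender.
    for line in row["contents"].splitlines():
        if line.startswith("From:"):
            return {"sender": line.removeprefix("From:").strip()}
    return {"sender": None}


def compute_subject(row: Row) -> Row:
    # Take the first "Subject:" header line as the subject.
    for line in row["contents"].splitlines():
        if line.startswith("Subject:"):
            return {"subject": line.removeprefix("Subject:").strip()}
    return {"subject": None}


emails = [{"filename": "1.txt", "contents": "From: alice@example.com\nSubject: Q3 report\n\nHi"}]
# One call per UDF, chained, instead of one call carrying a list of column specs.
enriched = add_columns(add_columns(emails, compute_sender, ["sender"]), compute_subject, ["subject"])
print(enriched[0]["sender"], "|", enriched[0]["subject"])
```

Validating that the UDF actually produced each declared column keeps a per-call interface simple while still catching mismatches early.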
* use field_values instead of field_types, since field_values have the actual key-value pairs, while field_types just contain fields and their types. records[0].schema is the schema of the output, which doesn't mean we have already populated the schema into the record.
* Remove .convert() and use .sem_add_columns or .add_columns instead. This change is based on #101 and #102; please review them first, then this change.
  1. This refactors all demos to use .sem_add_columns or .add_columns, and removes .convert().
  2. Remove Schema from demos, except demos using ValidationDataSource and dataset.retrieve(), which need a schema for now. We can refactor these cases later.
* ruff check --fix
* fix unittest
* demos fixed and unit tests running
* fix add_columns --> sem_add_columns in demo
* update quickstart to reflect code changes; shorten text as much as possible
* passing unit tests
* remove convert() everywhere
* fixes to correct errors in demos; update quickstart and docs

---------

Co-authored-by: Gerardo Vitagliano <vitaglianog@gmail.com>
Co-authored-by: Matthew Russo <mdrusso@mit.edu>

* Simplify Datasource (#103)
## Summary of PR changes
**Note 1:** I did not change anything related to val_datasource (including tangential functions like Dataset._set_data_source()), as that will all be modified in a subsequent PR to reflect our discussion re: validation data.
**Note 2:** I have completely commented out datamanager.py and config.py; for now I am willing to leave the code around in case we desperately need it for PalimpChat. However, my hope is that PalimpChat can be tweaked to work without the data manager and those files can be deleted before merging dev into main.
**Note 3:** Despite the branch name, fixing the progress managers will be part of a separate PR.
- Collapsed all four `DataSource` classes down to a single `DataReader` class
- Limit the number of methods the user needs to implement to just `__len__()` and `__getitem__()`
- (Switched from using `get_item() --> __getitem__()` in `DataReader`)
- Provided `DataReader` directly to scan operators (also renamed `DataSourcePhysicalOp --> ScanPhysicalOp`)
- Removed `DataDirectory()` from `src/` entirely; this included commenting out things which made use of the cache (e.g. caching computed `DataRecords` and codegen examples)
- Got rid of `dataset_id` everywhere (which tracks with the previous bullet)
- Removed the `Config` class, which was a relic of a bygone era (and also intertwined with the `DataDirectory()`)
- Updated all demos to use `import palimpzest as pz` to make the import statement(s) more welcoming
- Fixed one bug resulting from converts now producing union schemas. Instead of including the `output_schema` in an operator's `get_id_params()`, we simply report the `generated_fields`.
- Changed `source_id --> source_idx` everywhere (this eliminated some weird renaming logic)
- Finally, I added a large set of documentation for the DataSource class(es)
* Multi-LLM Refinement Pipeline for Query Output Validation (#118)
* Multi-LLM Refinement Pipeline for Query Output Validation (#92)
## Summary of PR
This PR contains the work to add a new `CriticConvert` physical operator to PZ. At a high level, this operator runs a bonded convert, and then asks a critic model if the answer produced by the bonded convert can be improved upon. The original output and the critique are then fed into a refinement model, which produces the improved output. The work to implement this includes:
1. Defining the physical operator in `src/palimpzest/query/operators/critique_and_refine_convert.py`
2. Adding an implementation rule for this physical operator in `src/palimpzest/query/optimizer/rules.py`
3. Adding boolean flag(s) to enable allowing / disallowing this physical optimization
4. Adding base prompts for the critique and refinement generations
One other change which this work spawned was an attempt to improve the management and construction of our prompts, and to decouple this logic from the `BaseGenerator` class. On the management side, I split our single `prompts.py` file into a set of files. On the construction side, I created a `PromptFactory` class which templates prompts based on the `prompt_strategy` and input record. The `PromptFactory` is not a perfect solution, but I think it is a step in the right direction. Finally, I fixed an error which previously filtered out `RAGConvert` operators from being considered by the `Optimizer`, and I made 2-3 more miscellaneous small tweaks.

---------

Co-authored-by: Yash Agarwal <yash94404@gmail.com>
Co-authored-by: Yash Agarwal <yashaga@Yashs-Air.attlocal.net>

* MkDocs Site for Palimpzest API Documentation (#116)
## Summary of PR Changes
1. Changed `docs` to use [MkDocs](https://www.mkdocs.org/) instead of Sphinx
2. Created initial `Getting Started` content
3. Created placeholders for `User Guide` content (to follow in a subsequent PR)
4. Added autogenerated docs for our most user-facing code (we will need to add docstrings to our code in a subsequent PR)
5. Made small tweaks to `src/` to allow users to specify policy using kwargs in `.run()`
6. Renamed the `testdata/enron-tiny/` files so that they're not so damn weird

---------

Co-authored-by: Yash Agarwal <yash94404@gmail.com>
Co-authored-by: Yash Agarwal <yashaga@Yashs-Air.attlocal.net>

* remove registration of sources from CI; only check version bump if there is a code change
* remove filter for only checking version bump when src files changed
* Rename `nocache` --> `cache` everywhere (#128)
* first commit
* Removed myenv
* added to git ignore
* addressed the comments in review
* flip one minor comment
* minor spacing fix
* fix spaces in a few more spots

---------

Co-authored-by: Bari LeBari <barilebari@dhcp-10-29-207-160.dyn.MIT.EDU>
Co-authored-by: muhamed <muhamed@mit.edu>
Co-authored-by: Matthew Russo <mdrusso@mit.edu>

* adding citation (and making 'others' explicit) (#136)
* Make Generator thread-safe (#139)
* fix moa prompt
* fix moa prompt aggregator
* update version
* make generator thread-safe
* update generator to return messages
* address comments
* Begin Process of Improving Index Abstraction(s) in PZ (#138)
* quick and dirty implementation which tracks retrieve costs
* bug fixes and currently unused index code
* add default search func which I forgot to implement and add chromadb to pyproject.toml
* leaving TODO
* hotfix to add cost for retrieve operation
* another hotfix to add ragatouille dependency
* Add logger for PZ (#134)
* add logger for PZ
  1. When verbose=True, we save all logs to log_file and print them to the console;
  2. When verbose=False, we only save ERROR+ logs to the file and print ERROR+.
  I added logging where I think it might be important for execution; we can always add or remove more. I might also update the logging messages based on my later annotation work, but this PR should set up the logging mechanism for now.
* ruff fix
* update code based on comments:
  1. not logging output_records
  2. not logging plan_stats
  3. save the log files to ".pz_logs"

---------

Co-authored-by: Matthew Russo <mdrusso@mit.edu>

* fix merge bug (#141)
* ruff fix
* update log dir and fix tiny bug
* fix merge bug
* Use a singleton API client for operators (#140)
* fix moa prompt
* fix moa prompt aggregator
* update version
* make generator thread-safe
* update generator to return messages
* address comments
* create a singleton API client
* fix linting
* fix logging in generators
* also create parent dir. if missing
* CUAD benchmark (#143)
* fix moa prompt
* fix moa prompt aggregator
* update version
* make generator thread-safe
* update generator to return messages
* address comments
* create a singleton API client
* fix linting
* fix logging in generators
* fix CUAD benchmark
* fix type
* minor fixes
* Limit the Scope of Logging within the Optimizer (#144)
* making it possible to set log level based on env. variable; adding time limit on seven filters test
* deleting instead of commenting out
* Remove Conventional LLM Convert; Update Bonded LLM Convert retry logic (#145)
* use NullHandler in __init__ and let application control logging config (#146)
* use NullHandler in __init__ and let application control logging config
* ruff fix
* Fix Progress Manager and Simplify `execute_plan` methods (#148)
* modifying ProgressManager class to allow for dynamically adding tasks
* beginning to use new progress manager
* initial rewrite of execute_plan methods with new progress manager
* unit tests passing
* trim a few lines
* unit tests passing; changes applied everywhere; MAB and Random coming in a separate PR
* enable final operator to show progress in parallel
* address comments
* The great deletion (#149)
* Adding Preliminary Work on Abacus and MAB Sentinel Execution (#147)
* updating models to avoid llama3
* fix parsing bugs and some generation errors
* don't require json for proposer and code synth generations; fix prompt format instruction for proposers
* fix typo/bug
* fix bugs in generator prep for field_answers; fix bug in filter impl.; other improvements
* adding new file for abacus workload
* fix len
* fix errors with dataset copy; prompt construction; and more
* remove JSON instruction from MOA proposer
* fixed bugs in optimizer configuration, llama 3.3 generation, and filter generation
* clean up demos; fix missing base prompt from map
* add one more missing base prompt
* prepare demo for full run; get embedding cost info from RAGConvert; use reasoning output from Critique
* add script to generate text-embedding-3-small reaction embeddings
* write to .chroma
* run full scale generation
* compute embeddings slowly and add progress bar
* add sleep
* fix import
* add total iters
* create embeddings before ingesting
* fix index start and finish
* load embeddings and insert directly
* make chroma use cosine sim.; finish initial search fcn. for biodex workload; naming tweak in rag convert
* capturing gen stats in Retrieve
* added UDF map operator; rewrote biodex pipeline to match docetl impl.; switched to using __name__ for functions instead of str()
* add optimizations back in
* write data to csv in demo
* limit to same model choice(s) as docetl and lotus
* fix punctuation error(s)
* try run without filter
* remove unused demo file
* remove print
* remove prints
* remove costed_phys_op_ids which were used for debugging
* try slightly diff. approach
* remove temp changes while branch is in PR review
* remove depends_on for map
* fix iteration bug in sentinel processors
* one more hotfix
* fix more errors w/SentinelPlanStats and sentinel processors
* remove logger lib to reduce confusion (#159)
* Update research.md (#160): AISD @ NAACL 2025
* Add Pneuma-Palimpzest Integration Demo (#158)
* Add Pneuma demo
* Remove dataset semantic column addition
* Fix progress managers episode 2 attack of the clones (#156)
* modifying ProgressManager class to allow for dynamically adding tasks
* beginning to use new progress manager
* initial rewrite of execute_plan methods with new progress manager
* unit tests passing
* trim a few lines
* unit tests passing; changes applied everywhere; MAB and Random coming in a separate PR
* enable final operator to show progress in parallel
* initial work to refactor sentinel processors
* passing unit tests
* checking in minor changes
* remove use of setup_logger inside library
* stuff seems to be working
* big print
* turn off rag for test
* try debugging exception
* checking in code before changes to scoring
* finished initial refactoring of mab sentinel execution strategy
* get random sampling execution working with changes
* passing unit tests
* nosentinel progress looks good
* eyeball test is working for progress bars
* remove the old gods
* revert small change
* pull up progress manager logic in parallel execution
* catch errors in generating embeddings
* fix comments
* Merging in Changes for Sentinel Progress Bars; Split Convert (off by default); `demos/enron-demo.py`; and MMQA Benchmark (#163)
* modifying ProgressManager class to allow for dynamically adding tasks
* beginning to use new progress manager
* initial rewrite of execute_plan methods with new progress manager
* unit tests passing
* trim a few lines
* unit tests passing; changes applied everywhere; MAB and Random coming in a separate PR
* enable final operator to show progress in parallel
* initial work to refactor sentinel processors
* passing unit tests
* checking in minor changes
* remove use of setup_logger inside library
* stuff seems to be working
* big print
* turn off rag for test
* try debugging exception
* checking in code before changes to scoring
* finished initial refactoring of mab sentinel execution strategy
* get random sampling execution working with changes
* passing unit tests
* nosentinel progress looks good
* eyeball test is working for progress bars
* remove the old gods
* revert small change
* pull up progress manager logic in parallel execution
* adding prints to generator; turn progress off in favor of verbose for now
* catch errors in generating embeddings
* inspect frontier updates
* remove args.workload
* fix num_inputs in selectivity computation
* pdb in score
* fixed score fn issue
* use execution cache to avoid unnecessary computation; use sentinel stats for updating frontier
* fix progress counter
* debug
* fix empty stats
* only count stats from newly computed results
* fix tuple unpacking
* only update sample counts for llm ops
* de-dup duplicate record
* ugh
* don't forget to increment
* plz
* more plz
* increment
* recycle ops back onto reservoir so they may be reconsidered in the future
* remove pdb
* add progress to script args
* try without rag
* use term recall
* just check in on term recall
* make it easier to turn off progress
* remove pdb
* try to get re-rank to keep all inputs
* try to generate more reactions
* track total LLM calls
* 10x parallelism
* try retrieve directly on fulltext
* up max workers
* adding enron-demo w/optimization
* remove config option
* adding recall and precision to output
* allow operators to be recycled back onto frontier
* revert to using reactions instead of fulltext for similarity
* better cycling of off-frontier operators
* safety check on reservoir ops
* remove pdb
* fixing 5 results per query
* investigate sampling behavior
* check on seeds
* remove pdb
* test SplitConvert
* debug chunking
* fix bug in rag and split convert
* run with chunks
* test chunking logic
* fix chunking logic
* sum list
* remove split merge for now
* minor fixes to CUAD script
* add embedding scripts for mmqa tables and image titles
* address issue with empty titles and title collisions
* prepare script for using clip embeddings for images
* fix bug
* get full space of possible extensions
* debug
* weird bug fix?
* more debug
* fix idiotic mistake
* handle corrupted images and minor things
* add another corrupted image
* another one
* anotha
* more bad images
* last disallow file
* prepare cuad for runs
* specify execution strategy
* up samples
* add sentinel execution strategy to output name
* adding plan str and more stats
* specify no prior
* verbose=False
* fix comment; comment out prints
* make split merge optional for now
* addressing comments
* applying syntax changes to pneuma demo and supporting strings within retrieve
* bump version; fix lint; fix docs
* more docs tweaks; tweaking dependencies
* fix install issues
* one more version fix
* one more version fix
* one more version fix
* one more version fix
* last try
* change runner python version
* actually changing runner python version
* increase time limit for runners
* increase time limit for runners
* Merge in Changes From Final Abacus Work (WIP) (#173)
* modifying ProgressManager class to allow for dynamically adding tasks
* beginning to use new progress manager
* initial rewrite of execute_plan methods with new progress manager
* unit tests passing
* trim a few lines
* unit tests passing; changes applied everywhere; MAB and Random coming in a separate PR
* enable final operator to show progress in parallel
* initial work to refactor sentinel processors
* passing unit tests
* checking in minor changes
* remove use of setup_logger inside library
* stuff seems to be working
* big print
* turn off rag for test
* try debugging exception
* checking in code before changes to scoring
* finished initial refactoring of mab sentinel execution strategy
* get random sampling execution working with changes
* passing unit tests
* nosentinel progress looks good
* eyeball test is working for progress bars
* remove the old gods
* revert small change
* pull up progress manager logic in parallel execution
* adding prints to generator; turn progress off in favor of verbose for now
* catch errors in generating embeddings
* inspect frontier updates
* remove args.workload
* fix num_inputs in selectivity computation
* pdb in score
* fixed score fn issue
* use execution cache to avoid unnecessary computation; use sentinel stats for updating frontier
* fix progress counter
* debug
* fix empty stats
* only count stats from newly computed results
* fix tuple unpacking
* only update sample counts for llm ops
* de-dup duplicate record
* ugh
* don't forget to increment
* plz
* more plz
* increment
* recycle ops back onto reservoir so they may be reconsidered in the future
* remove pdb
* add progress to script args
* try without rag
* use term recall
* just check in on term recall
* make it easier to turn off progress
* remove pdb
* try to get re-rank to keep all inputs
* try to generate more reactions
* track total LLM calls
* 10x parallelism
* try retrieve directly on fulltext
* up max workers
* adding enron-demo w/optimization
* remove config option
* adding recall and precision to output
* allow operators to be recycled back onto frontier
* revert to using reactions instead of fulltext for similarity
* better cycling of off-frontier operators
* safety check on reservoir ops
* remove pdb
* fixing 5 results per query
* investigate sampling behavior
* check on seeds
* remove pdb
* test SplitConvert
* debug chunking
* fix bug in rag and split convert
* run with chunks
* test chunking logic
* fix chunking logic
* sum list
* remove split merge for now
* minor fixes to CUAD script
* add embedding scripts for mmqa tables and image titles
* address issue with empty titles and title collisions
* prepare script for using clip embeddings for images
* fix bug
* get full space of possible extensions
* debug
* weird bug fix?
* more debug
* fix idiotic mistake
* handle corrupted images and minor things
* add another corrupted image
* another one
* anotha
* more bad images
* last disallow file
* prepare cuad for runs
* specify execution strategy
* up samples
* add sentinel execution strategy to output name
* adding plan str and more stats
* specify no prior
* verbose=False
* fix comment; comment out prints
* make split merge optional for now
* addressing comments
* applying syntax changes to pneuma demo and supporting strings within retrieve
* add prints
* debug sample sets
* checking in code before tweaks to mab
* state of repo after running final Abacus experiments
* revert to opt-profiling-data
* removing print statement
* remove prints
* final fixes
* removing ragatouille dependency
* fix ruff lint checks
* bump version
* passing tests locally
* remove pdb
* fix complaint about match
* Move Abacus Research Scripts into Separate Folder (#175)
* re-organizing abacus research-related scripts
* fix model selection and other tweaks
* add data download script
* bump version
* remove scripts from root
* removing python files which were merged back in from main
* Fixed Issue(s) with Aggregate Operator Computation for Movie Queries (WIP) (#182)
* queries 1-4 working for movies
* removing RandomSampling
* Create `Context` Class + `compute` and `search` operators (#186)
* checking in changes
* refactored Dataset
* checking in
* checking in
* checking in
* queries extract final answer now
* checking in changes w/search operator
* adding changes to agents
* add isinstance checks to all executors
* removing script
* remove tools; include in future PR
* Remove `pz.Schema` in Favor of Using `pydantic.BaseModel` (#188)
* made changes throughout codebase and updated unit tests
* checking in; debugging failure with image use case
* simple demo / paper demos working
* eliminate caching features (#195)
* removing all code synthesis (#198)
* removing all code synthesis
* remove unused import
* Using LiteLLM to Manage Generator Clients / Completion APIs (#200)
* use LiteLLM for generators
* remove unused function; add TODO
* Added Anthropic Support; Simplified Rules; Removed Redundant Model Helpers (#202)
* changes after simplifying rules
* passing unit tests; removed unnecessary model helpers
* simplified primitives slightly
* fixing the assertion which used FieldInfo instead of FieldInfo.annotation (#204)
* add support for o4-mini, gemini-2.5-pro, gemini-2.0-flash, llama-4-maverick (#205)
* Adding Semantic Join Operator (#206)
* initial changes to support validator class; fixed bug in generator for images
* adding validator based optimization
* validator agent example working
* using o1 model; made validation more efficient
* added initial nested loops join implementation
* passing tests
* unit tests passing
* unit tests passing
* enron-demo.py working
* join demos in place
* parallel join and other bugfixes (#207)
* audio-demo (#208)
* remove pdb
* adding option to only use gemini models in audio demo
* adding parallelism; fixed bug w/unique_logical_op_id (#209)
* fixed issue which removed pipelined execution of operators in parallel setting (#210)
* Movie bugfixes (#211)
* fixed error in cost computation for gemini models; tested join on movie queries
* make join count monotonic
* removing progress bar updates for join for now
* adding reasoning effort (#212)
* made progress manager more efficient; made join op calculations accurate (#213)
* make groupby ignore None values
* make it possible to specify schema for MemoryDataset; reasoning model fixes
* adding audio-only match in substitution (#214)
* quick fix for audio prompt missing in MoA
* support passing in gemini/vertex credentials path; fix minor bugs in audio generation (#216)
* adding Distinct operator to PZ (#217)
* masking filepaths for sembench; fix audio pricing (#218)
* make GroupBySig a pz. import
* remove email demo
* reproduce abacus results
* add notes about deprecation to scripts for generating priors
* remove unsupported demos
* sem_add_columns -> sem_map
* Dev staging (#220)
* edit cuad abacus scripts to use local data
* edit cuad abacus scripts to use local data
* edit cuad abacus scripts to use local data
* fix: cuad data loader doesn't work via huggingface anymore (#215)
* edit cuad abacus scripts to use local data
* edit cuad abacus scripts to use local data
* edit cuad abacus scripts to use local data

---------

Co-authored-by: mdr223 <mdrusso@mit.edu>

---------

Co-authored-by: Shreya Shankar <ss.shankar505@gmail.com>

* adding early support for vllm models
* changes to appease linter
* remove models now that we have access to gpt-5
* only perform time check on local; CI runners are slow
* Support google api and desc (#222)
* support shreya models and re-support desc
* adding gpt-5-nano to gpt-5 models
* bump version
* fixed merge error
* fixing bug where id column in schema overrides DataRecord.id

---------

Co-authored-by: Jun <130543538+chjuncn@users.noreply.github.com>
Co-authored-by: Gerardo Vitagliano <vitaglianog@gmail.com>
Co-authored-by: Sivaprasad Sudhir <sivaprasad2626@gmail.com>
Co-authored-by: Yash Agarwal <yash94404@gmail.com>
Co-authored-by: Yash Agarwal <yashaga@Yashs-Air.attlocal.net>
Co-authored-by: Bari Bo LeBari <143016395+lilbarbar@users.noreply.github.com>
Co-authored-by: Bari LeBari <barilebari@dhcp-10-29-207-160.dyn.MIT.EDU>
Co-authored-by: muhamed <muhamed@mit.edu>
Co-authored-by: Tranway1 <tranway@qq.com>
Co-authored-by: Luthfi Balaka <luthfibalaka@gmail.com>
Co-authored-by: Shreya Shankar <ss.shankar505@gmail.com>

* Add Optimizations for Filter and Join Operators (#230)
* rename files to reflect that they will contain filter and map physical operators
* passing map unit tests
* passing filter tests
* finished tests
* adding tests for joins and initial embedding join
* adding vllm test
* fixed embedding join
* filter for filepaths instead of assert
* add embedding cost
* fixed full hashes bug with deep copy
* bump version
* undo linting change
* Reorder bug (#232)
* fixing map/filter/join tests for CI which doesn't have GEMINI access; adding test for real estate bug
* added exploration to re-order converts
* separate lack of gemini from ci tests
* Data Record Refactor (#233)
* Refactor DataRecord to hold data in the BaseModel member instead of separately.
* Some type fixes
* local unit tests passing
* enforce data record id uses list of schema fields
* remove unused code from copy
* use function instead of class internals

---------

Co-authored-by: Tianyu Li <litianyu@mit.edu>

* Updating Website to Use Docusaurus (#234)
* adding docusaurus website; still haven't updated doc content and home page
* fix links at bottom of page
* updated pages for website; docs are still not auto-rendered
* updating ci pipelines
* update path to package
* update node version
* update package
* fix build commands
* fix trigger
* fix runner and import
* fix some DataRecord inits
* switch to running llms w/separate flag b/c one test can fail due to bad generation
* changes to be more flexible on types for abacus scripts
* guessing at fix for build path
* removing old website
* remove commented ci code
* remove mkdocs from pyproject
* remove prints
* fix location of CNAME file
* Opt fixes (#236)
* fixed errors in optimizer
* added palimpchat page
* passing unit tests
* also relax types on train datasets
* bump version
* try lowercasing c
* fixed route
* eliminate slowdown from stringifying sentinel plan(s)
* bump version
* allow enron demo to swap filters w/convert
* remove print statements in validator and fix bug introduced for bytes fields
* bump version
* adding min and max
* fixing assertion error
* fix no reasoning prompt templating issue(s)
* add semantic aggregation operator
* bump version
* fix mock call in unit test
* add google analytics tracking
* Updated Website User Guide(s); Renamed `retrieve()` --> `sem_topk()` (#244)
* checking in in-flight changes
* adding code for unmatched records in left/right/outer joins
* optimization stuck
* new mmqa script is functional
* minor bugfixes
* fix naive estimates with new operators
* updated website user guides; renamed retrieve --> top-k
* fix defaults for join op
* bumping version
* fix documentation links
* Add Cost-Based Sample Budget; Fix RAGConvert/Filter for `str | Any` Types (#247)
* checking in in-flight changes
* adding code for unmatched records in left/right/outer joins
* optimization stuck
* new mmqa script is functional
* minor bugfixes
* fix naive estimates with new operators
* updated website user guides; renamed retrieve --> top-k
* add cost-based sample budget; fix rag convert and filter for str | Any fields
* Fix missing comma causing vLLM completions to break (#246)
* bumping version
* Final Changes from Revision for Abacus (#250)
* checking in in-flight changes
* adding code for unmatched records in left/right/outer joins
* optimization stuck
* new mmqa script is functional
* minor bugfixes
* fix naive estimates with new operators
* updated website user guides; renamed retrieve --> top-k
* add cost-based sample budget; fix rag convert and filter for str | Any fields
* pushing local mmqa experiment
* try n=20
* preparing final runs for table 2
* fix thread safety issue w/EmbeddingJoin
* adding full ablation study
* bugfixes in operators
* adding final revision work from local
* updated readme
* adding changes from berners-lee
* remove comments
* fix linting and bump version
* Blebari task 131 (#241)
* .
* .
* minor tweaks * add embedding costs to RecordOpStats * minor tweaks * change comment --------- Co-authored-by: Bari LeBari <barilebari@dhcp-10-29-128-127.dyn.mit.edu> Co-authored-by: Bari LeBari <barilebari@Baris-MacBook-Pro.local> Co-authored-by: Matthew Russo <mdrusso@mit.edu> * adding real-estate-eval-100 to download script * adding real-estate-demo * jczhang add model checks (#254) * adding checks that user has support for models they need * check if available models is empty * trying to resolve dependency * bump version * gemini studio api issue (#257) * recreating the issue * fixing model provider for google AI studio * add try-except back --------- Co-authored-by: Matthew Russo <mdrusso@mit.edu> * bump version * fix model check * Fix no reasoning (#270) * enforce that setting reasoning effort to None turns off reasoning prompts; fix config copy error * bump version * update constants to reflect the cached-input token costs * update GenerationStats * update GenerationStats to include cache token/cost * fix typo * update stats in GenerationStats * prompt caching implementation * split cache tokens into read and creation * restructure prompt caching into PromptCacheManager class * update CacheManager class * caching demo * add claude sonnet 4.0 (temporary) * fix pretty print error for anthropic * propagate cache-related stats from end-to-end * fix bug for gemini model * claude-3-7 deprecated * fix formatting issues * fix formatting issues * fixing comments * update token/cost logic to be disjoint for input and cache * update demo * Generalize Support for LiteLLM Models #265 (#272) * model_info (Model -> ConfiguredModel in constants) - 265 * predictor function for unknown spec * update full list of API keys * add gemini3 and gpt5.2 to constants * return models based on opt obj when models is None * reorganize functions in model info/helper * add tests and update model references and imports * move validation from config to query processor * add json file for 
model score/latency and update predictor function * update model references and imports * update dependencies and related test cases * update Model to have both string and enum * model_info -> model_helper * update model usage in query config * rollback import changes for CuratedModel -> Model * ModelProvider class * update all switch cases to ModelProvider when applicable * reverted CuratedModel changes * add test cases * add additional test cases * fix formatting issues * add prompt caching stats for #262 * restructure Model class * fix Model enum issue * add sorting logic to model class * use singular json file for info fetching * expand model list and updates curated_model_info file * restructure model info fetching, update Model class and test cases * script to update pz_models_information and update get_optimal_models * is_deepseek_model * add audio cache read/creation * remove claude sonnet 3.5 (retired) * add deepseek-chat * add .json files to pyproject.toml so that is packaged too * revert uvicorn dependency * some small tweaks * passing tests --------- Co-authored-by: joycequ <joycequ@mit.edu> Co-authored-by: Matthew Russo <mdrusso@mit.edu> * fixed model function calls * clean up duplicate code to help with summing field stats * update fields for classes in models.py, update usage in generators.py * add test generation file * add test generation file * generator messages * update anthropic stats * update input/cache token stats * remove generator messages from github repo * update generator test cases and implement initial gemini wrapper class * delete output audio tokens and update gemini client class * ruff lint for test cases * fix gemini reasoning effort bug * fix cost and image issues * incorporate all pr comments * make anthropic version more flexible * Revert "make anthropic version more flexible" This reverts commit 8eeed67. 
* floatify everything * all but two tests passing * bump version and relax tests * Local Model Execution (vLLM) #266 (#282) * local vllm execution implementation * update vllm local specs (predictors) * more robust detection of local model capabilities * fix formatting * test script formatting update * adding placeholder for vllm cache tokens * remove prints * remove print * reverted type * fix type annotation * tests passing --------- Co-authored-by: Matthew Russo <mdrusso@mit.edu> * Allowing other provider than OpenAI for embeddings (#283) * Removing hard-coded TEXT_EMBEDDING_3_SMALL in RAG and JOIN operators * remove whitespace * fixed embedding access in RAGFilter * fix id/op_params for RAG ops and EmbeddingJoin; update rules to enforce CLIP cannot be used for text-only * fix value * unit tests passing --------- Co-authored-by: Matthew Russo <mdrusso@mit.edu> * fixed issue #286 and bumped version * fix linter errors --------- Co-authored-by: Tianyu Li <litianyu@mit.edu> Co-authored-by: Jun <130543538+chjuncn@users.noreply.github.com> Co-authored-by: Gerardo Vitagliano <vitaglianog@gmail.com> Co-authored-by: Sivaprasad Sudhir <sivaprasad2626@gmail.com> Co-authored-by: Yash Agarwal <yash94404@gmail.com> Co-authored-by: Yash Agarwal <yashaga@Yashs-Air.attlocal.net> Co-authored-by: Bari Bo LeBari <143016395+lilbarbar@users.noreply.github.com> Co-authored-by: Bari LeBari <barilebari@dhcp-10-29-207-160.dyn.MIT.EDU> Co-authored-by: muhamed <muhamed@mit.edu> Co-authored-by: Tranway1 <tranway@qq.com> Co-authored-by: Luthfi Balaka <luthfibalaka@gmail.com> Co-authored-by: Shreya Shankar <ss.shankar505@gmail.com> Co-authored-by: Griffin Roupe <31631417+frostyfan109@users.noreply.github.com> Co-authored-by: Bari LeBari <barilebari@dhcp-10-29-128-127.dyn.mit.edu> Co-authored-by: Bari LeBari <barilebari@Baris-MacBook-Pro.local> Co-authored-by: Jerry Zhang <122544742+xqlcn@users.noreply.github.com> Co-authored-by: joycequ <joycequ2016@gmail.com> Co-authored-by: joycequu 
<65379523+joycequu@users.noreply.github.com> Co-authored-by: joycequ <joycequ@mit.edu> Co-authored-by: SoTrx <11771975+SoTrx@users.noreply.github.com>
1 parent 71de936 commit 738b698

58 files changed

Lines changed: 7008 additions & 1114 deletions


.gitignore

Lines changed: 9 additions & 0 deletions
@@ -19,6 +19,9 @@ paper-imgs/
 testdata/enron-tiny.csv
 testdata/*/
 testdata/*.tar.gz
+tests/pytest/data/generator_messages/
+scripts/provider_stats/
+scripts/litellm_stats/
 
 # python artifacts
 *.egg-info
@@ -53,8 +56,14 @@ testdata/enron-eval/*.txt
 pyrightconfig.json
 
 myenv/
+pz-env/
 
 # abacus-research data
 abacus-research/cuad-data/*
 abacus-research/opt-profiling-data/*
 abacus-research/parse-answer-errors/*
+
+# stats
+scripts/litellm_stats/
+scripts/provider_stats/
+tests/pytest/data/generator_messages/

abacus-research/helper-scripts/mmqa-baseline.py

Lines changed: 4 additions & 3 deletions
@@ -7,7 +7,7 @@
 import numpy as np
 from openai import OpenAI
 
-from palimpzest.constants import MODEL_CARDS, Cardinality, Model
+from palimpzest.constants import Cardinality, Model
 from palimpzest.query.generators.generators import get_json_from_answer
 
 
@@ -109,8 +109,9 @@ def f1(preds: list | None, targets: list):
 completion = client.chat.completions.create(**payload)
 
 # compute total cost
-usd_per_input_token = MODEL_CARDS[model_name]["usd_per_input_token"]
-usd_per_output_token = MODEL_CARDS[model_name]["usd_per_output_token"]
+model = Model(model_name)
+usd_per_input_token = model.get_usd_per_input_token()
+usd_per_output_token = model.get_usd_per_output_token()
 input_tokens = completion.usage.prompt_tokens
 output_tokens = completion.usage.completion_tokens
 total_cost += input_tokens * usd_per_input_token + output_tokens * usd_per_output_token
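The change above replaces a `MODEL_CARDS[model_name]["usd_per_input_token"]` dictionary lookup with pricing accessors on a `Model` object. A minimal, self-contained sketch of that pattern follows, assuming nothing about Palimpzest internals: an enum whose members look up their own per-token prices, so cost code calls methods instead of indexing a global table. `ToyModel`, `_PRICES`, `completion_cost`, and all prices are hypothetical; they are not Palimpzest's `Model` class or its real pricing data.

```python
from enum import Enum

# Hypothetical USD prices per token as (input, output). Illustrative only;
# these are NOT Palimpzest's pricing data.
_PRICES: dict[str, tuple[float, float]] = {
    "example/small-model": (0.15e-6, 0.60e-6),
    "example/large-model": (2.50e-6, 10.00e-6),
}


class ToyModel(str, Enum):
    """Hypothetical stand-in for an enum that exposes pricing accessors."""

    SMALL = "example/small-model"
    LARGE = "example/large-model"

    def get_usd_per_input_token(self) -> float:
        return _PRICES[self.value][0]

    def get_usd_per_output_token(self) -> float:
        return _PRICES[self.value][1]


def completion_cost(model_name: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one completion, mirroring the total_cost update in the diff."""
    model = ToyModel(model_name)  # lookup by value; raises ValueError if unknown
    return (
        input_tokens * model.get_usd_per_input_token()
        + output_tokens * model.get_usd_per_output_token()
    )


if __name__ == "__main__":
    print(f"${completion_cost('example/small-model', 1_000, 200):.6f}")
```

Keeping prices behind methods means the call site stays the same if the pricing source later changes (for example, to a file loaded at startup); in this sketch an unknown name also fails early with a `ValueError` from the enum lookup rather than a `KeyError` deep inside the cost code.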

demos/caching-demo.py

Lines changed: 318 additions & 0 deletions
@@ -0,0 +1,318 @@
+#!/usr/bin/env python3
+"""
+Realistic Demo showcasing prompt caching capabilities in Palimpzest.
+
+This demo processes multiple employee travel requests against a comprehensive
+Corporate Travel Policy. The policy text (~2000 tokens) is included in the
+system prompt, creating a realistic scenario for prompt caching where a large
+static context is reused across multiple dynamic inputs.
+
+Workload:
+- Context: A lengthy 10-page Corporate Travel & Expense Policy.
+- Input: Short email requests from employees.
+- Task: Analyze each request for policy compliance, identifying violations and reimbursable amounts.
+
+Supported caching providers:
+- OpenAI (GPT-4o, GPT-4o-mini): Automatic prefix caching
+- Anthropic (Claude 3.5 Sonnet/Haiku): Explicit cache_control markers
+- Gemini: Implicit caching
+"""
+
+import argparse
+import os
+import time
+from typing import List
+
+from dotenv import load_dotenv
+
+import palimpzest as pz
+from palimpzest.constants import Model
+from palimpzest.core.lib.schemas import TextFile
+
+load_dotenv()
+
+# =============================================================================
+# MOCK DATA: CORPORATE TRAVEL POLICY (Static Context > 1024 tokens)
+# =============================================================================
+CORPORATE_TRAVEL_POLICY = """
+GLOBAL CORP TRAVEL & EXPENSE POLICY (v2024.1)
+
+SECTION 1: OVERVIEW AND PHILOSOPHY
+Global Corp expects employees to act responsibly and professionally when incurring and submitting costs.
+The company will reimburse employees for reasonable and necessary expenses incurred during approved business travel.
+This policy applies to all employees, contractors, and consultants.
+
+SECTION 2: AIR TRAVEL
+2.1 Booking Window: All domestic flights must be booked at least 14 days in advance. International flights must be booked 21 days in advance.
+2.2 Class of Service:
+- Economy Class: Required for all domestic flights under 6 hours.
+- Premium Economy: Allowed for domestic flights over 6 hours or international flights under 8 hours.
+- Business Class: Allowed for international flights exceeding 8 hours duration.
+- First Class: Strictly prohibited unless approved by the CEO.
+2.3 Ancillary Fees:
+- Checked Bags: Up to two bags reimbursed for trips > 3 days. One bag for trips <= 3 days.
+- Wi-Fi: Reimbursed only if business justification is provided (e.g., "urgent client deadline").
+- Seat Selection: Fees > $50 require VP approval.
+
+SECTION 3: LODGING
+3.1 Hotel Caps (Nightly Rates excluding taxes):
+- Tier 1 Cities (NY, London, Tokyo, SF, Zurich): $350 USD
+- Tier 2 Cities (Chicago, Paris, Berlin, Austin): $250 USD
+- All Other Locations: $175 USD
+3.2 Room Type: Standard single rooms only. Suites are prohibited.
+3.3 Laundry: Reasonable laundry expenses reimbursed for trips exceeding 5 consecutive nights.
+
+SECTION 4: MEALS AND ENTERTAINMENT
+4.1 Daily Meal Allowance (Per Diem):
+- Tier 1 Cities: $100/day
+- Tier 2 Cities: $75/day
+- Others: $60/day
+4.2 Client Entertainment:
+- Must include at least one current or prospective client.
+- Cap is $150 per person (including employees).
+- Names and affiliations of all attendees must be documented.
+4.3 Alcohol:
+- Reimbursable only with dinner.
+- Moderate consumption allowed (max 2 drinks per person).
+- "Top Shelf" liquors prohibited.
+
+SECTION 5: GROUND TRANSPORTATION
+5.1 Ride Share/Taxi: Preferred mode for travel between airport and hotel.
+5.2 Car Rentals:
+- Class: Intermediate/Mid-size or smaller.
+- Insurance: Decline CDW/LDW (covered by corporate policy).
+- Fuel: Pre-paid fuel options are prohibited; cars must be returned full.
+5.3 Rail: Economy/Standard class only. Acela Business Class permitted for Northeast Corridor travel.
+
+SECTION 6: MISCELLANEOUS
+6.1 Tipping:
+- Meals: 15-20%
+- Taxis: 10-15%
+- Bellhop: $1-2 per bag
+6.2 Non-Reimbursable Items:
+- Personal grooming/toiletries.
+- Fines (parking, speeding).
+- Airline club memberships.
+- In-room movies.
+- Lost luggage/property.
+
+SECTION 7: SUBMISSION PROCESS
+Expenses must be submitted within 30 days of trip completion. Receipts required for all expenses > $25.
+"""
+
+# =============================================================================
+# MOCK DATA: EMPLOYEE REQUESTS (Dynamic Inputs)
+# =============================================================================
+EMPLOYEE_REQUESTS = [
+    # Request 1: Compliant
+    """Subject: Trip to London
+I booked a flight to London (8.5 hours) in Business Class for the client summit.
+Hotel is $320/night. Meal expenses were about $90/day.
+Receipts attached.""",
+    # Request 2: Violation (Booking window & First Class)
+    """Subject: Urgent NY Trip
+I need to fly to New York tomorrow. Booked First Class because it was the only seat left.
+Hotel is the Ritz at $500/night.
+Also expensed $40 for in-flight Wi-Fi to finish the Q3 report.""",
+    # Request 3: Violation (Car Rental & Alcohol)
+    """Subject: Austin Conference
+Rented a luxury SUV for the team in Austin.
+Dinner with the team (no clients) came to $800 ($200/person) including 3 bottles of wine.
+Hotel was $240/night.""",
+    # Request 4: Compliant (Tier 2 City)
+    """Subject: Berlin Site Visit
+Flew Economy to Berlin. Hotel was $220/night.
+Took a taxi from TXL ($45 + $5 tip).
+Daily meals averaged $70.""",
+    # Request 5: Violation (Misc items)
+    """Subject: Tokyo Tech Symposium
+Trip duration: 4 days.
+Expensed:
+- Flight (Premium Econ, 11 hours)
+- Hotel ($340/night)
+- Laundry service ($60)
+- Forgotten toothbrush replacement ($15)
+- Parking ticket ($50)
+""",
+]
+
+# Output Schema
+OUTPUT_SCHEMA = [
+    {"name": "status", "type": str, "desc": "One of: 'COMPLIANT', 'PARTIAL_VIOLATION', 'MAJOR_VIOLATION'"},
+    {
+        "name": "violations",
+        "type": str,
+        "desc": "A list of specific policy violations found, referencing the specific section numbers (e.g., 'Violation of Section 2.2'). If compliant, return 'None'.",
+    },
+    {
+        "name": "reimbursable_summary",
+        "type": str,
+        "desc": "A concise summary of what should be reimbursed vs rejected based on the policy text.",
+    },
+    {
+        "name": "flag_for_review",
+        "type": bool,
+        "desc": "True if the request requires manual review by a manager (e.g. for high amounts or ambiguous justifications).",
+    },
+]
+
+TASK_DESC = f"""
+You are an AI auditor for Global Corp. Your job is to review employee travel expense descriptions against the Corporate Travel Policy.
+The full policy text is provided below.
+
+{CORPORATE_TRAVEL_POLICY}
+
+Analyze the input email and determine if the expenses adhere to the policy.
+"""
+
+
+class TravelRequestDataset(pz.IterDataset):
+    """Custom dataset that provides travel requests as text records."""
+
+    def __init__(self, requests: List[str]):
+        super().__init__(id="travel_requests", schema=TextFile)
+        self.requests = requests
+
+    def __len__(self):
+        return len(self.requests)
+
+    def __getitem__(self, idx: int):
+        return {
+            "filename": f"request_{idx + 1}.txt",
+            "contents": self.requests[idx],
+        }
+
+
+# Model mapping (Same as original)
+MODEL_MAPPING = {
+    "gpt-4o": Model.GPT_4o,
+    "gpt-4o-mini": Model.GPT_4o_MINI,
+    "claude-4-0-sonnet": Model.CLAUDE_4_SONNET,
+    # "claude-3-7-sonnet": Model.CLAUDE_3_7_SONNET, # deprecated model testing
+    "claude-4-5-haiku": Model.CLAUDE_4_5_HAIKU,
+    "gemini-2.5-flash": Model.GOOGLE_GEMINI_2_5_FLASH,
+    # "deepseek-v3": Model.DEEPSEEK_V3,
+}
+
+
+def get_model_from_string(model_str: str) -> Model:
+    if model_str.lower() in MODEL_MAPPING:
+        return MODEL_MAPPING[model_str.lower()]
+    for model in Model:
+        if model.value.lower() == model_str.lower():
+            return model
+    raise ValueError(f"Unknown model: {model_str}")
+
+
+def print_cache_stats(execution_stats):
+    """Print cache-related statistics from execution."""
+    print("\n" + "=" * 60)
+    print(" CACHE STATISTICS & COST ANALYSIS")
+    print("=" * 60)
+
+    # Token counts are now disjoint:
+    # - input_text_tokens: regular (non-cached) input tokens
+    # - cache_read_tokens: tokens read from cache (hits)
+    # - cache_creation_tokens: tokens written to cache
+    regular_input = execution_stats.input_text_tokens
+    cache_read = execution_stats.cache_read_tokens
+    cache_creation = execution_stats.cache_creation_tokens
+    total_output = execution_stats.output_text_tokens
+    total_embedding = execution_stats.embedding_input_tokens
+
+    # Logical total = regular + cache read + cache creation
+    logical_total_input = regular_input + cache_read + cache_creation
+
+    print(f"{'Metric':<35} | {'Count':<15}")
+    print("-" * 55)
+    print(f"{'Logical Total Input Tokens':<35} | {logical_total_input:,}")
+    print(f"{' - Regular Input (full rate)':<35} | {regular_input:,}")
+    print(f"{' - Cache Read (discounted)':<35} | {cache_read:,}")
+    print(f"{' - Cache Creation':<35} | {cache_creation:,}")
+    print("-" * 55)
+    print(f"{'Total Output Tokens':<35} | {total_output:,}")
+    if total_embedding > 0:
+        print(f"{'Total Embedding Input Tokens':<35} | {total_embedding:,}")
+    print("-" * 55)
+    print(f"{'Total Execution Cost':<35} | ${execution_stats.total_execution_cost:.6f}")
+
+    # Calculate and display cache hit rate
+    # Hit rate = cache_read / (regular_input + cache_read)
+    total_cacheable = regular_input + cache_read
+    if total_cacheable > 0:
+        hit_rate = (cache_read / total_cacheable) * 100
+        print(f"\nCache Hit Rate: {hit_rate:.1f}%")
+
+
+def main():
+    parser = argparse.ArgumentParser(description="Demo showcasing prompt caching in Palimpzest")
+    parser.add_argument("--model", type=str, default="gpt-4o-mini", help="Model to use")
+    parser.add_argument("--num-records", type=int, default=5, help="Number of requests to process")
+    parser.add_argument("--verbose", action="store_true", help="Enable verbose output")
+    parser.add_argument("--profile", action="store_true", help="Save profiling data")
+
+    args = parser.parse_args()
+    model = get_model_from_string(args.model)
+
+    # Validate env vars (Simplified for brevity)
+    if model.is_provider_openai() and not os.getenv("OPENAI_API_KEY"):
+        print("ERROR: OPENAI_API_KEY not set")
+        return
+    if model.is_provider_anthropic() and not os.getenv("ANTHROPIC_API_KEY"):
+        print("ERROR: ANTHROPIC_API_KEY not set")
+        return
+    if (model.is_provider_google_ai_studio() or model.is_provider_vertex_ai()) and not os.getenv("GOOGLE_API_KEY"):
+        print("ERROR: GOOGLE_API_KEY not set")
+        return
+
+    print("=" * 60)
+    print(" PZ CACHING DEMO: CORPORATE AUDIT")
+    print("=" * 60)
+    print(f"Model: {model.value}")
+    print(
+        f"Policy Context Size: ~{len(CORPORATE_TRAVEL_POLICY.split())} words (~{int(len(CORPORATE_TRAVEL_POLICY.split()) * 1.3)} tokens)"
+    )
+
+    # Repeat the request list if user wants more records than we have mocks
+    base_requests = EMPLOYEE_REQUESTS
+    requests = []
+    while len(requests) < args.num_records:
+        requests.extend(base_requests)
+    requests = requests[: args.num_records]
+
+    print(f"Processing {len(requests)} travel requests...")
+
+    # Build Plan
+    dataset = TravelRequestDataset(requests)
+
+    # The 'desc' field incorporates the huge CORPORATE_TRAVEL_POLICY string.
+    # This ensures the System Prompt is large (>1024 tokens) and identical for all records.
+    plan = dataset.sem_map(OUTPUT_SCHEMA, desc=TASK_DESC)
+
+    config = pz.QueryProcessorConfig(
+        policy=pz.MaxQuality(),
+        verbose=args.verbose,
+        execution_strategy="sequential", # Sequential often easier to debug caching behavior initially
+        available_models=[model],
+    )
+
+    start_time = time.time()
+    result = plan.run(config)
+    end_time = time.time()
+
+    # Output Results
+    print("\n" + "=" * 60)
+    print(" AUDIT RESULTS")
+    print("=" * 60)
+    for i, record in enumerate(result.data_records):
+        print(f"\n[Request {i + 1}]")
+        print(f"Status: {record.status}")
+        print(f"Violations: {record.violations}")
+        print(f"Summary: {record.reimbursable_summary}")
+
+    print_cache_stats(result.execution_stats)
+    print(f"\nWall Clock Time: {end_time - start_time:.2f}s")
+
+
+if __name__ == "__main__":
+    main()
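`print_cache_stats` in the demo treats regular input, cache-read, and cache-creation tokens as disjoint buckets, sums them into a logical input total, and computes the hit rate over regular plus cache-read tokens only. The sketch below reproduces that arithmetic outside Palimpzest and adds an illustrative input-cost formula. `CacheTokenStats`, `input_cost`, and the discount multipliers are hypothetical: providers price cache reads and cache writes differently, so the multipliers are left as parameters rather than presented as real rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheTokenStats:
    """Disjoint input-token buckets, analogous to the fields print_cache_stats reads."""

    regular_input: int  # non-cached input tokens, billed at the full rate
    cache_read: int  # tokens served from the prompt cache (hits)
    cache_creation: int  # tokens written to the cache

    @property
    def logical_total_input(self) -> int:
        # The buckets are disjoint, so the logical total is a plain sum.
        return self.regular_input + self.cache_read + self.cache_creation

    @property
    def hit_rate(self) -> float:
        # Same definition as the demo: cache-creation tokens are excluded.
        cacheable = self.regular_input + self.cache_read
        return self.cache_read / cacheable if cacheable else 0.0


def input_cost(
    stats: CacheTokenStats,
    usd_per_input_token: float,
    read_multiplier: float,
    creation_multiplier: float,
) -> float:
    """Hypothetical pricing: cache reads/writes billed as multiples of the base input rate."""
    weighted_tokens = (
        stats.regular_input
        + read_multiplier * stats.cache_read
        + creation_multiplier * stats.cache_creation
    )
    return usd_per_input_token * weighted_tokens


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the demo.
    stats = CacheTokenStats(regular_input=600, cache_read=2_400, cache_creation=0)
    print(stats.logical_total_input)  # 3000
    print(f"{stats.hit_rate:.1%}")  # 80.0%
    print(f"${input_cost(stats, 1e-6, 0.5, 1.0):.6f}")
```

Leaving `cache_creation` out of the hit-rate denominator matches the demo's comment; including it would instead measure hits as a share of all input tokens, which gives a lower number on runs that populate the cache.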
