Our review
This skill provides a structured reference map and operational rules for using DataFusion and DeltaLake in a repository, enabling accurate API usage through local probes and documentation lookup.
Strengths
- Reduces API guesswork by relying on local references
- Leverages existing code patterns in the repository
- Covers many aspects (Rust, Python, SQL, planning, integration)
- Encourages methodical environment probing
Limitations
- Requires the reference files to be present in the repository
- May not cover all edge cases or entirely new APIs
- Depends on local DataFusion/DeltaLake versions
Use it when developing or debugging DataFusion/DeltaLake code in a repository that includes these reference files.
Avoid it when the repository lacks the reference files, or when exploring entirely new, undocumented APIs.
Security analysis
Safe
The skill only describes how to look up information in local references and probe the environment with safe Bash commands (e.g., version checks). It instructs no destructive actions, external network calls, or obfuscated payloads.
No concerns found
Examples
- Open the DeltaLake integration reference and find how to register a Delta table from a given path using DataFusion's catalog.
- Search the repo for existing Rust UDF implementations, then open the Rust UDF reference to check the current contracts for Scalar UDFs.
- Run a local probe to get the DataFusion version, then open the planning deep dive to understand how predicate pushdown works with DeltaLake scan providers.
name: dfdl_ref
description: DataFusion + DeltaLake operations manual for this repo. DataFusion is the core query engine; DeltaLake provides the storage layer and integrates tightly via scan providers, schema bridging, and predicate pushdown. Use lookup + local probes; do not guess APIs.
allowed-tools: Read, Grep, Glob, Bash
Operating rule: never guess DataFusion/DeltaLake/PyArrow/UDF APIs
When uncertain:
- Probe local environment (versions + available methods).
- Search the repo for how we already use it.
- Open the relevant reference file below (only the section you need).
- Implement using existing local patterns unless the plan says otherwise.
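The first two steps above can be sketched in Python. This is a minimal illustration, not part of the skill itself: the helper names (`probe_version`, `public_names`, `find_usages`) are hypothetical, and the sketch assumes the Python packages are installed under the distribution names `datafusion`, `deltalake`, and `pyarrow`. A missing package simply yields `None` or an empty list.

```python
import importlib
import importlib.metadata
from pathlib import Path


def probe_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return None


def public_names(module_name, attr=None):
    """List public attribute names of a module, or of one attribute on it.

    Returns an empty list when the module or attribute is unavailable,
    so callers can tell "not installed" apart from a guessed API.
    """
    try:
        obj = importlib.import_module(module_name)
    except ImportError:
        return []
    if attr is not None:
        obj = getattr(obj, attr, None)
        if obj is None:
            return []
    return sorted(name for name in dir(obj) if not name.startswith("_"))


def find_usages(root, needle, suffixes=(".py", ".rs")):
    """Return (path, line_number) pairs where `needle` occurs under `root`."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            for lineno, line in enumerate(text.splitlines(), start=1):
                if needle in line:
                    hits.append((str(path), lineno))
    return hits
```

For example, `probe_version("datafusion")` reports the local version, `public_names("datafusion", "SessionContext")` lists the methods actually available on that class in the installed release, and `find_usages(".", "SessionContext")` shows where the repo already uses it. In practice the same probes can be run with the skill's allowed `Bash` and `Grep` tools.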
Reference map (open these files as needed)
- Core DataFusion Python surfaces (IO, catalog, SQL, DataFrame API): reference/datafusion.md
- "Best-in-class deployment gaps" (caching, stats, observability, planning knobs): reference/datafusion_addendum.md
- Planning deep dive (logical/physical plan pipeline, introspection, optimization rules): reference/datafusion_planning.md
- Rust UDF contracts (Scalar/UDAF/UDWF/Async/named args): reference/datafusion_rust_UDFs.md
- Schema management + schema pitfalls: reference/datafusion_schema.md
- DeltaLake ↔ DataFusion integration details: reference/deltalake_datafusion_integration.md
- Advanced Rust integration (PyO3 packaging, wheels, CI, native module distribution): reference/datafusion_deltalake_advanced_rust_integration.md
- DataFusionMixins trait (Delta snapshot schema + predicate parsing helpers): reference/deltalake_datafusionmixins.md
- Plan combination (composing DataFusion plans via joins/unions/CTEs, Delta integration, parameterized queries, plan serialization): reference/datafusion_plan_combination.md
- Rust LogicalPlan programmatic construction (LogicalPlanBuilder, Expr, schema/DFSchema, plan rewriting via TreeNode, extensibility, serialization): reference/Datafusion_logicplan_rust.md
- DataFusion tracing (Rust community extension: execution spans, metrics capture, partial-result previews, rule-phase instrumentation, OpenTelemetry export): reference/datafusion-tracing.md
- DeltaLake core (format/protocol, client APIs, 3-layer model): reference/deltalake.md
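Because the skill depends on these files being present, a quick presence check can be useful before relying on the map. The sketch below is illustrative only: `missing_references` and `read_section` are hypothetical helpers, the paths are copied from the list above, and `read_section` assumes the reference files use `#`-style Markdown headings, which the source does not confirm.

```python
from pathlib import Path

# Paths copied verbatim from the reference map above.
REFERENCE_FILES = [
    "reference/datafusion.md",
    "reference/datafusion_addendum.md",
    "reference/datafusion_planning.md",
    "reference/datafusion_rust_UDFs.md",
    "reference/datafusion_schema.md",
    "reference/deltalake_datafusion_integration.md",
    "reference/datafusion_deltalake_advanced_rust_integration.md",
    "reference/deltalake_datafusionmixins.md",
    "reference/datafusion_plan_combination.md",
    "reference/Datafusion_logicplan_rust.md",
    "reference/datafusion-tracing.md",
    "reference/deltalake.md",
]


def missing_references(repo_root, files=REFERENCE_FILES):
    """Return the reference paths that do not exist under `repo_root`."""
    root = Path(repo_root)
    return [rel for rel in files if not (root / rel).is_file()]


def read_section(markdown_text, heading):
    """Return one section (heading line plus body) from Markdown text.

    The section ends at the next heading of the same or a higher level.
    Assumption: headings start with '#'. Lines beginning with '#' inside
    fenced code blocks would be misread as headings by this simple scan.
    """
    out = []
    level = None
    for line in markdown_text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("#"):
            hashes = len(stripped) - len(stripped.lstrip("#"))
            title = stripped[hashes:].strip()
            if level is None:
                if title == heading:
                    level = hashes
                    out.append(line)
                continue
            if hashes <= level:
                break
        if level is not None:
            out.append(line)
    return "\n".join(out)
```

Calling `missing_references(".")` from the repository root returns an empty list when every file in the map is available. `read_section` supports the "only the section you need" rule by pulling a single headed section instead of loading a whole reference file.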