DataFusion + DeltaLake Operations Reference

Verified · Safe

Supplies a comprehensive reference map for DataFusion and DeltaLake APIs, with strict rules to avoid guessing interfaces. Helps navigate local reference files and apply existing patterns in this repository when using these tools.

by Skills Guide Bot
Data & AI · Intermediate
90 · 6/2/2026
Claude Code · Cursor · Windsurf
#datafusion #deltalake #query-engine #storage-layer #reference

Our review

This skill provides a structured reference map and operational rules for using DataFusion and DeltaLake in a repository, enabling accurate API usage through local probes and documentation lookup.

Strengths

  • Reduces API guesswork by relying on local references
  • Leverages existing code patterns in the repository
  • Covers many aspects (Rust, Python, SQL, planning, integration)
  • Encourages methodical environment probing

Limitations

  • Requires the reference files to be present in the repository
  • May not cover all edge cases or entirely new APIs
  • Depends on the locally installed DataFusion/DeltaLake versions

When to use it

When developing or debugging DataFusion/DeltaLake code in a repository that includes these reference files.

When not to use it

When the repository lacks the reference files or when exploring completely new undocumented APIs.

Security analysis

Safe
Quality score: 90/100

The skill only describes how to look up information in local references and probe the environment with safe Bash commands (e.g., version checks). No destructive actions, external network calls, or obfuscated payloads are instructed.

No concerns found

Examples

Look up Delta table registration
Open the DeltaLake integration reference and find how to register a Delta table from a given path using DataFusion's catalog.
Find UDF patterns
Search the repo for existing Rust UDF implementations, then open the Rust UDF reference to check the current contracts for Scalar UDFs.
Probe environment and check planning
Run a local probe to get the DataFusion version, then open the planning deep dive to understand how predicate pushdown works with DeltaLake scan providers.
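
The repo search in the "Find UDF patterns" example could be sketched as below. This is a minimal illustration, not part of the skill itself: the helper name `find_udf_usages` is hypothetical, and the marker list assumes the repository's Rust code uses DataFusion's `ScalarUDFImpl`, `AggregateUDFImpl`, `WindowUDFImpl`, or `create_udf`. Adjust the pattern to whatever the codebase actually contains.

```python
import re
from pathlib import Path

# Identifiers that commonly mark UDF code in DataFusion's Rust API.
# Which of these this repo actually uses is an assumption; adjust as needed.
UDF_MARKERS = re.compile(
    r"\b(ScalarUDFImpl|AggregateUDFImpl|WindowUDFImpl|create_udf)\b"
)


def find_udf_usages(repo_root="."):
    """Return (path, line_number, line) for each Rust line mentioning a UDF marker."""
    hits = []
    for path in sorted(Path(repo_root).rglob("*.rs")):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable entry: skip it rather than fail the search
        for number, line in enumerate(text.splitlines(), start=1):
            if UDF_MARKERS.search(line):
                hits.append((str(path), number, line.strip()))
    return hits


if __name__ == "__main__":
    for path, number, line in find_udf_usages("."):
        print(f"{path}:{number}: {line}")
```

Once the hits show how the repo already implements UDFs, the Rust UDF reference only needs to be opened for the contract in question.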

name: dfdl_ref
description: DataFusion + DeltaLake operations manual for this repo. DataFusion is the core query engine; DeltaLake provides the storage layer and integrates tightly via scan providers, schema bridging, and predicate pushdown. Use lookup + local probes; do not guess APIs.
allowed-tools: Read, Grep, Glob, Bash

Operating rule: never guess DataFusion/DeltaLake/PyArrow/UDF APIs

When uncertain:

  1. Probe local environment (versions + available methods).
  2. Search the repo for how we already use it.
  3. Open the relevant reference file below (only the section you need).
  4. Implement using existing local patterns unless the plan says otherwise.
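
Step 1 (probing versions and available methods) might look like the following sketch. It uses only the Python standard library; the helper names `installed_version` and `public_names` are hypothetical, and the distribution names are assumed to be the PyPI names `datafusion`, `deltalake`, and `pyarrow`.

```python
import importlib
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def public_names(module_name, contains=""):
    """List public attributes of an importable module, filtered by substring."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return []  # module not installed in this environment
    return sorted(
        name
        for name in dir(module)
        if not name.startswith("_") and contains.lower() in name.lower()
    )


if __name__ == "__main__":
    # Distribution names are assumed to match the PyPI package names.
    for dist in ("datafusion", "deltalake", "pyarrow"):
        print(dist, installed_version(dist) or "not installed")
    # Inspect what the installed module exposes instead of guessing.
    print(public_names("datafusion", contains="context"))
```

Listing what the installed module exposes keeps the implementation tied to the local version rather than to whatever a newer or older release documents.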

Reference map (open these files as needed)

  • Core DataFusion Python surfaces (IO, catalog, SQL, DataFrame API): reference/datafusion.md
  • "Best-in-class deployment gaps" (caching, stats, observability, planning knobs): reference/datafusion_addendum.md
  • Planning deep dive (logical/physical plan pipeline, introspection, optimization rules): reference/datafusion_planning.md
  • Rust UDF contracts (Scalar/UDAF/UDWF/Async/named args): reference/datafusion_rust_UDFs.md
  • Schema management + schema pitfalls: reference/datafusion_schema.md
  • DeltaLake ↔ DataFusion integration details: reference/deltalake_datafusion_integration.md
  • Advanced Rust integration (PyO3 packaging, wheels, CI, native module distribution): reference/datafusion_deltalake_advanced_rust_integration.md
  • DataFusionMixins trait (Delta snapshot schema + predicate parsing helpers): reference/deltalake_datafusionmixins.md
  • Plan combination (composing DataFusion plans via joins/unions/CTEs, Delta integration, parameterized queries, plan serialization): reference/datafusion_plan_combination.md
  • Rust LogicalPlan programmatic construction (LogicalPlanBuilder, Expr, schema/DFSchema, plan rewriting via TreeNode, extensibility, serialization): reference/Datafusion_logicplan_rust.md
  • DataFusion tracing (Rust community extension: execution spans, metrics capture, partial-result previews, rule-phase instrumentation, OpenTelemetry export): reference/datafusion-tracing.md
  • DeltaLake core (format/protocol, client APIs, 3-layer model): reference/deltalake.md
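
To open only the section needed from one of these files, a small helper along these lines may help. It is a sketch under assumptions: `read_section` is a hypothetical name, and it assumes the reference files use Markdown `#` headings. The actual heading structure of each file is not known here.

```python
from pathlib import Path


def read_section(path, heading_text):
    """Return the first Markdown section whose heading contains heading_text.

    The section ends at the next heading of the same or a higher level.
    Returns an empty string if the file or the heading is missing.
    """
    file = Path(path)
    if not file.is_file():
        return ""
    section, level = [], None
    in_fence = False
    for line in file.read_text(encoding="utf-8").splitlines():
        stripped = line.lstrip()
        if stripped.startswith("```"):
            in_fence = not in_fence  # '#' inside code fences is not a heading
        is_heading = stripped.startswith("#") and not in_fence
        if is_heading:
            hashes = len(stripped) - len(stripped.lstrip("#"))
            if level is None:
                if heading_text.lower() in stripped.lower():
                    level = hashes
                    section.append(line)
                continue
            if hashes <= level:
                break  # reached a sibling or parent heading
        if level is not None:
            section.append(line)
    return "\n".join(section)


if __name__ == "__main__":
    # The heading text "pushdown" is a guess at what the file contains.
    print(read_section("reference/datafusion_planning.md", "pushdown"))
```

An empty result means either the file is absent or no heading matched, which is a cue to fall back to Grep over the reference directory.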