Our take
Provides a Python API for querying and processing event data from the Pointline data lake, designed for quantitative research with Polars and Delta Lake.
Strengths
- Exact point-in-time alignment, with as-of joins against symbol metadata.
- Efficient handling of fixed-point integers and time windows.
- Support for multiple exchange data types (trades, quotes, order book updates, CN market tables).
- Idempotent ingestion and deterministic replay guarantees.
Limitations
- Offline processing on a single machine; not distributed.
- Requires familiarity with Delta Lake and Polars.
- Limited to the event-data schemas predefined in the Silver layer.
Use when: you need to run offline quantitative research on high-frequency trading data with strict point-in-time accuracy.
Avoid when: you need real-time data processing, or your data is stored in formats other than Delta Lake and does not follow this library's conventions.
Security analysis
Safe. The skill is a static API reference for a local data processing library. It contains only Python code examples for reading and writing local files using Polars and Delta Lake, with no network, shell, or obfuscated operations. No execution risk is present.
No issues detected
Examples
- Load trades for BTCUSDT on binance-futures from 2024-05-01 to 2024-05-02 from the Pointline Silver data lake, decode the scaled columns, and display the first 10 rows.
- Use Pointline to search for symbols with 'BTC' in the name on binance-futures exchange, returning the top 10 results.
- Build a one-minute clock spine for BTCUSDT on binance-futures from 2024-05-01 to 2024-05-02 using Pointline, then align trade events to that spine and show the resulting DataFrame.
name: pointline-researcher
description: >-
  Pointline data lake research API reference for quant researchers. Use when:
  (1) querying event data from the pointline Silver layer (trades, quotes, orderbook, CN L2/L3 tables),
  (2) discovering or resolving symbols via dim_symbol,
  (3) building or aligning to time-series spines (clock, trades, volume, dollar bars),
  (4) decoding fixed-point scaled integers to floats,
  (5) attaching PIT-correct symbol metadata via as-of joins,
  (6) filtering CN A-share data by trading phase,
  (7) ingesting Bronze files into Silver,
  (8) writing or reviewing code that reads from or writes to the pointline data lake,
  (9) understanding pointline schemas, table specs, or storage layout.
Pointline Data Lake — Research API
PIT-accurate offline data lake for quant research. Polars + Delta Lake, single-machine.
Core invariants: PIT correctness, deterministic replay, idempotent ingestion, lineage traceability via file_id + file_seq.
Quick Start
from pathlib import Path
from pointline.research import (
    load_events, discover_symbols, load_symbol_meta,
    decode_scaled_columns, join_symbol_meta,
    build_spine, align_to_spine, ClockSpineConfig,
    TradingPhase, filter_by_phase, add_phase_column,
)

silver = Path("./data/silver")

# 1. Discover symbols
syms = discover_symbols(silver_root=silver, exchange="binance-futures", q="BTC")

# 2. Load events: time window is [start, end) on ts_event_us
trades = load_events(
    silver_root=silver, table="trades",
    exchange="binance-futures", symbol="BTCUSDT",
    start="2024-05-01", end="2024-05-02",
)

# 3. Decode fixed-point → float (only at research output, never mid-pipeline)
trades = decode_scaled_columns(trades, "trades")  # adds price_decoded, qty_decoded

# 4. Attach PIT symbol metadata
trades = join_symbol_meta(trades, silver_root=silver, columns=["base_asset", "tick_size"])

# 5. Build spine and align
spine = build_spine(
    silver_root=silver, exchange="binance-futures", symbol="BTCUSDT",
    start="2024-05-01", end="2024-05-02",
    builder="clock", config=ClockSpineConfig(step_us=60_000_000),  # 1-min bars
)
aligned = align_to_spine(events=trades, spine=spine)
Timestamp & Fixed-Point Rules
- All timestamps: Int64 UTC microseconds (*_ts_us). Event time = ts_event_us.
- TimestampInput accepts: int (us), str (ISO), date, datetime. Naive datetimes treated as UTC.
- trading_date: derived from ts_event_us in exchange-local tz (crypto=UTC, CN=Asia/Shanghai).
- Prices/quantities: Int64 scaled by PRICE_SCALE = QTY_SCALE = 1_000_000_000.
- Decode only at final research output via decode_scaled_columns(). Never mid-pipeline.
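The timestamp rules above can be sketched with the standard library alone. This is an illustration, not Pointline's implementation: `to_ts_us` and `trading_date` are hypothetical helper names, and the fixed UTC offset stands in for a real time-zone lookup (safe for Asia/Shanghai, which has no DST).

```python
from datetime import date, datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_ts_us(value):
    """Hypothetical helper: normalize int / ISO str / date / datetime to UTC microseconds."""
    if isinstance(value, int):
        return value  # already microseconds since the epoch
    if isinstance(value, str):
        value = datetime.fromisoformat(value)  # "2024-05-01" parses as midnight
    if isinstance(value, date) and not isinstance(value, datetime):
        value = datetime(value.year, value.month, value.day)
    if value.tzinfo is None:
        value = value.replace(tzinfo=timezone.utc)  # naive datetimes are treated as UTC
    # Integer division of timedeltas avoids float rounding at microsecond precision.
    return (value - EPOCH) // timedelta(microseconds=1)

def trading_date(ts_us, utc_offset_hours):
    """Hypothetical helper: calendar date of ts_us in an exchange-local fixed offset."""
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    return (EPOCH + timedelta(microseconds=ts_us)).astimezone(local_tz).date()

start_us = to_ts_us("2024-05-01")
just_before = start_us - 1  # last microsecond of 2024-04-30 in UTC
print(trading_date(just_before, 0))  # crypto (UTC)
print(trading_date(just_before, 8))  # CN (UTC+8)
```

The same instant falls on different trading dates depending on the exchange's local zone, which is why trading_date has to be derived per exchange rather than from the UTC calendar day.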
Research API
load_events()
load_events(
    silver_root: Path, table: str, exchange: str, symbol: str,
    start: TimestampInput, end: TimestampInput,
    columns: list[str] | None = None, include_lineage: bool = False,
) -> pl.DataFrame
- Window: [start, end) on ts_event_us. No implicit dim_symbol join.
- table: "trades", "quotes", "orderbook_updates", "cn_order_events", "cn_tick_events", "cn_l2_snapshots".
- include_lineage=True to include file_id, file_seq.
- Returns sorted by table's tie-break keys.
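To make the half-open window concrete, here is a small stdlib sketch of [start, end) selection over sorted timestamps. `slice_window` is a hypothetical helper used only for illustration; it says nothing about how load_events is implemented internally.

```python
from bisect import bisect_left

def slice_window(ts_sorted, start_us, end_us):
    """Return timestamps t with start_us <= t < end_us (input must be sorted)."""
    lo = bisect_left(ts_sorted, start_us)  # first index with t >= start_us
    hi = bisect_left(ts_sorted, end_us)    # first index with t >= end_us (excluded)
    return ts_sorted[lo:hi]

ts = [100, 200, 200, 300, 400]
first = slice_window(ts, 100, 300)
second = slice_window(ts, 300, 500)
print(first, second)
```

Because the end bound is exclusive, adjacent windows such as [100, 300) and [300, 500) partition the events with no duplicates and no gaps, which matters when loading day by day.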
discover_symbols() / load_symbol_meta()
discover_symbols(silver_root, exchange, q=None, as_of=None, include_meta=False, limit=50)
load_symbol_meta(silver_root, exchange, symbols=None, as_of=None, columns=None)
- as_of=None: returns current rows only. With as_of: rows valid at that timestamp.
- q: text search across exchange_symbol, canonical_symbol, base_asset.
decode_scaled_columns()
decode_scaled_columns(df, table, columns=None, in_place=False, suffix="_decoded")
- Default: adds <col>_decoded columns, preserves originals.
- in_place=True: replaces original columns.
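The decode behaviour described above can be mimicked on plain Python columns. This is a sketch under assumptions: `decode_columns` is a hypothetical stand-in operating on a dict of lists, not the library function, and the sample values are made up.

```python
SCALE = 1_000_000_000  # matches PRICE_SCALE = QTY_SCALE in the rules above

def decode_columns(table, columns, scale=SCALE, in_place=False, suffix="_decoded"):
    """Hypothetical stand-in: divide scaled Int64 columns by the scale to get floats."""
    out = dict(table)  # shallow copy so the caller's table is left untouched
    for col in columns:
        target = col if in_place else col + suffix
        out[target] = [v / scale for v in table[col]]
    return out

# Made-up sample: price 64250.5 and quantity 0.0015 stored as scaled integers.
trades = {"price": [64_250_500_000_000], "qty": [1_500_000]}
decoded = decode_columns(trades, ["price", "qty"])
replaced = decode_columns(trades, ["price", "qty"], in_place=True)
print(decoded)
```

With the default suffix the original integer columns survive next to the floats, so downstream code can still do exact arithmetic on them.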
join_symbol_meta()
join_symbol_meta(df, silver_root, columns, ts_col="ts_event_us")
- PIT as-of join: valid_from_ts_us <= ts_col < valid_until_ts_us.
- Requires exchange, symbol_id, ts_col in df.
- Available metadata columns: exchange_symbol, canonical_symbol, market_type, base_asset, quote_asset, tick_size, lot_size, contract_size, is_current, updated_at_ts_us.
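The validity-window rule can be illustrated with a naive pure-Python join. Everything here is hypothetical sample data: the `pit_join` helper, the tick_size values, and the use of a maximum Int64 as the open-ended valid_until of the current row are assumptions for the sketch, not documented Pointline behaviour.

```python
OPEN_END = 2**63 - 1  # assumption: current rows carry a far-future valid_until

dim_symbol = [
    {"exchange": "binance-futures", "symbol_id": 1, "tick_size": 100,
     "valid_from_ts_us": 0, "valid_until_ts_us": 1_000},
    {"exchange": "binance-futures", "symbol_id": 1, "tick_size": 10,
     "valid_from_ts_us": 1_000, "valid_until_ts_us": OPEN_END},
]

def pit_join(events, dim_rows, column, ts_col="ts_event_us"):
    """Attach `column` from the dim row whose window contains each event's timestamp."""
    joined = []
    for ev in events:
        value = None  # stays None when no version covers the event
        for row in dim_rows:
            same_key = (row["exchange"], row["symbol_id"]) == (ev["exchange"], ev["symbol_id"])
            if same_key and row["valid_from_ts_us"] <= ev[ts_col] < row["valid_until_ts_us"]:
                value = row[column]
                break
        joined.append({**ev, column: value})
    return joined

events = [
    {"exchange": "binance-futures", "symbol_id": 1, "ts_event_us": 999},
    {"exchange": "binance-futures", "symbol_id": 1, "ts_event_us": 1_000},
    {"exchange": "binance-futures", "symbol_id": 2, "ts_event_us": 1_500},
]
joined = pit_join(events, dim_symbol, "tick_size")
print([row["tick_size"] for row in joined])
```

The event at exactly 1_000 picks up the newer version because the lower bound is inclusive and the upper bound exclusive; the unknown symbol_id gets no metadata rather than a guess.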
Spine API
build_spine(silver_root, exchange, symbol, start, end, builder, config)
align_to_spine(events, spine, ts_col="ts_event_us", by=("exchange", "symbol"))
Builders:
| Builder | Config | Description |
|---|---|---|
| "clock" | ClockSpineConfig(step_us=N) | Regular time intervals |
| "trades" | TradesSpineConfig() | One point per unique trade timestamp |
| "volume" | VolumeSpineConfig(volume_threshold_scaled=N) | Cumulative volume bucket crossings |
| "dollar" | DollarSpineConfig(dollar_threshold_scaled=N) | Cumulative notional bucket crossings |
Volume/dollar thresholds use scaled integers (multiply by QTY_SCALE).
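As a rough illustration of scaled thresholds, the sketch below emits a bar boundary whenever cumulative scaled quantity reaches the threshold. The carry-over rule (keep the remainder, at most one boundary per trade) and the `volume_boundaries` name are assumptions; the library's exact bucket semantics are not described here.

```python
QTY_SCALE = 1_000_000_000

def volume_boundaries(trades, threshold_scaled):
    """trades: iterable of (ts_us, qty_scaled). Returns timestamps where a bucket fills."""
    boundaries = []
    cumulative = 0
    for ts_us, qty_scaled in trades:
        cumulative += qty_scaled
        if cumulative >= threshold_scaled:
            boundaries.append(ts_us)
            cumulative %= threshold_scaled  # assumed: remainder carries into the next bucket
    return boundaries

# A threshold of 2.0 units is written as 2 * QTY_SCALE, never as the float 2.0.
threshold = 2 * QTY_SCALE
trades = [
    (1, 500_000_000),    # 0.5
    (2, 1_000_000_000),  # 1.0
    (3, 700_000_000),    # 0.7 -> cumulative 2.2, bucket fills
    (4, 1_900_000_000),  # 1.9 -> cumulative 2.1, bucket fills
    (5, 100_000_000),    # 0.1
]
bars = volume_boundaries(trades, threshold)
print(bars)
```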
align_to_spine() uses forward as-of join (PIT-safe): events at spine boundary map to next bar.
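The forward mapping can be sketched with a binary search. This assumes spine points mark bar ends and that an event exactly on a boundary belongs to the following bar, as stated above; `clock_spine` and `align_forward` are hypothetical helpers, not the library's functions.

```python
from bisect import bisect_right

def clock_spine(start_us, end_us, step_us):
    """Assumed convention: one spine point at the end of each step within (start, end]."""
    return list(range(start_us + step_us, end_us + 1, step_us))

def align_forward(event_ts, spine):
    """Map each event to the first spine timestamp strictly greater than it."""
    aligned = []
    for ts in event_ts:
        i = bisect_right(spine, ts)  # skips spine points <= ts, so ties go to the next bar
        aligned.append(spine[i] if i < len(spine) else None)
    return aligned

spine = clock_spine(0, 180_000_000, 60_000_000)  # three 1-minute bars
events = [0, 59_999_999, 60_000_000, 179_999_999]
aligned = align_forward(events, spine)
print(aligned)
```

Mapping forward means a bar only ever contains events that happened before its timestamp, so a feature read at the bar time cannot see the future.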
CN Trading Phases
add_phase_column(df, exchange="szse", ts_col="ts_event_us", market_type=None)
filter_by_phase(df, exchange="szse", phases=[TradingPhase.MORNING, TradingPhase.AFTERNOON])
Phases: CLOSED, PRE_OPEN (09:15-09:25), MORNING (09:30-11:30), NOON_BREAK (11:30-13:00), AFTERNOON (13:00-14:57), CLOSING (14:57-15:00, SZSE only), AFTER_HOURS (15:05-15:30, STAR/ChiNext only).
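A minimal sketch of phase classification from the windows listed above, for an SZSE main-board symbol. Assumptions: intervals are treated as half-open, AFTER_HOURS is left out, weekends and holidays are ignored, and the `Phase` enum and `classify` function are hypothetical stand-ins for TradingPhase and add_phase_column.

```python
from datetime import datetime, time, timedelta, timezone
from enum import Enum

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
CN_TZ = timezone(timedelta(hours=8))  # Asia/Shanghai has no DST, so a fixed offset works

class Phase(Enum):
    CLOSED = "closed"
    PRE_OPEN = "pre_open"
    MORNING = "morning"
    NOON_BREAK = "noon_break"
    AFTERNOON = "afternoon"
    CLOSING = "closing"

SZSE_PHASES = [
    (time(9, 15), time(9, 25), Phase.PRE_OPEN),
    (time(9, 30), time(11, 30), Phase.MORNING),
    (time(11, 30), time(13, 0), Phase.NOON_BREAK),
    (time(13, 0), time(14, 57), Phase.AFTERNOON),
    (time(14, 57), time(15, 0), Phase.CLOSING),
]

def classify(ts_us):
    """Classify a UTC-microsecond timestamp by its Shanghai wall-clock time."""
    local = (EPOCH + timedelta(microseconds=ts_us)).astimezone(CN_TZ).time()
    for lo, hi, phase in SZSE_PHASES:
        if lo <= local < hi:
            return phase
    return Phase.CLOSED

def local_ts_us(hour, minute, second=0):
    """Test helper: Shanghai wall-clock time on 2024-05-06 as UTC microseconds."""
    dt = datetime(2024, 5, 6, hour, minute, second, tzinfo=CN_TZ)
    return (dt - EPOCH) // timedelta(microseconds=1)

ticks = [local_ts_us(9, 20), local_ts_us(10, 0), local_ts_us(12, 0), local_ts_us(14, 0)]
continuous = [t for t in ticks if classify(t) in (Phase.MORNING, Phase.AFTERNOON)]
print([classify(t).name for t in ticks])
```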
Ingestion
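from pointline.ingestion.pipeline import ingest_file
from pointline.protocols import BronzeFileMetadata

# silver is the Silver root from Quick Start; bronze is the Bronze root Path,
# which this reference does not define.
result = ingest_file(
    meta=BronzeFileMetadata(vendor="tardis", data_type="trades", ...),
    silver_root=silver, bronze_root=bronze,
)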
Idempotent via manifest (vendor, data_type, bronze_path, file_hash).
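The idea of manifest-keyed idempotency can be sketched in memory. This is not the Pointline manifest: the `Manifest` class, the SHA-256 choice, the sequential file_id, and the sample path are all assumptions made for illustration.

```python
import hashlib

class Manifest:
    """Hypothetical in-memory manifest keyed by (vendor, data_type, bronze_path, file_hash)."""

    def __init__(self):
        self._file_ids = {}

    def ingest(self, vendor, data_type, bronze_path, payload):
        file_hash = hashlib.sha256(payload).hexdigest()  # assumed hash algorithm
        key = (vendor, data_type, bronze_path, file_hash)
        if key in self._file_ids:
            # Same file, same content: nothing is written a second time.
            return {"status": "skipped", "file_id": self._file_ids[key]}
        file_id = len(self._file_ids) + 1
        self._file_ids[key] = file_id
        return {"status": "ingested", "file_id": file_id}

manifest = Manifest()
path = "tardis/trades/example.csv.gz"  # made-up path
first = manifest.ingest("tardis", "trades", path, b"row-1\nrow-2\n")
again = manifest.ingest("tardis", "trades", path, b"row-1\nrow-2\n")
changed = manifest.ingest("tardis", "trades", path, b"row-1\nrow-2\nrow-3\n")
print(first, again, changed)
```

Re-running ingestion over the same Bronze file is a no-op, while a file whose content changed hashes to a new key and is treated as new input.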
Schema Reference
For complete table schemas (all columns, types, scales, partition/tie-break keys), see references/schemas.md.
Key Import Paths
from pointline import TRADES, QUOTES, ORDERBOOK_UPDATES, DIM_SYMBOL
from pointline.schemas import get_table_spec, list_table_specs
from pointline.schemas.types import PRICE_SCALE, QTY_SCALE, TableSpec, ColumnSpec
from pointline.ingestion.pipeline import ingest_file
from pointline.protocols import BronzeFileMetadata
from pointline.storage.delta import DeltaEventStore, DeltaDimensionStore
from pointline.storage.delta.layout import table_path
Critical Gotchas
- Never decode mid-pipeline. Fixed-point integers preserve precision; decode only at output.
- As-of joins, not exact. Symbol metadata is SCD2 with validity windows.
- Lunch break discontinuity. Never compute features across 11:30-13:00 CST for CN markets.
- No lookahead. align_to_spine maps forward. PIT coverage check quarantines orphans.
- Deterministic ordering. Rely on tie-break keys, not insertion order.
- Schema-as-code. Always read canonical spec from pointline/schemas/ before touching tables.
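A short demonstration of why decoding mid-pipeline is discouraged: summing ten fills of 0.1 as floats drifts, while summing the scaled integers and decoding once stays exact. The quantities are made up; only the scale constant comes from the rules above.

```python
QTY_SCALE = 1_000_000_000

fills_scaled = [100_000_000] * 10  # ten fills of 0.1 units each

# Decode first, then aggregate: float rounding error accumulates.
decoded_early = sum(q / QTY_SCALE for q in fills_scaled)

# Aggregate on integers, decode once at the output boundary: exact.
total_scaled = sum(fills_scaled)
decoded_late = total_scaled / QTY_SCALE

print(decoded_early == 1.0, decoded_late == 1.0)
```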