Function scanParquet

  • Lazily read from a local or cloud-hosted parquet file (or files).

    This function allows the query optimizer to push down predicates and projections to the scan level, typically increasing performance and reducing memory overhead.


    Parameters

    • source: string

      Path to a file, or paths to several files. A single path may be a globbing pattern.

    • options: ScanParquetOptions = {}

      Options for scanParquet

      • Optional allowMissingColumns?: boolean
      • Optional cache?: boolean
      • Optional cloudOptions?: Map<string, string>
      • Optional glob?: boolean
      • Optional hivePartitioning?: boolean
      • Optional hiveSchema?: unknown
      • Optional includeFilePaths?: string
      • Optional lowMemory?: boolean
      • Optional nRows?: number
      • Optional parallel?: "auto" | "columns" | "row_groups" | "none"
      • Optional rechunk?: boolean
      • Optional retries?: number
      • Optional rowIndexName?: string
      • Optional rowIndexOffset?: number
      • Optional tryParseHiveDates?: boolean
      • Optional useStatistics?: boolean

    Returns LazyDataFrame
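
    Example

    The option shapes listed above can be illustrated with a small, self-contained sketch. It does not import the library: `ScanParquetOptionsSketch` is a local mirror of a few of the listed fields, and `checkScanOptions` is a hypothetical helper, not part of the API. The rule that `rowIndexOffset` only matters when `rowIndexName` is set is an assumption carried over from how Polars documents the equivalent Python parameters. The file path and column name in the trailing comment are likewise invented for illustration.

```typescript
// Local mirror of a subset of the options listed above (illustrative only).
type ParallelStrategy = "auto" | "columns" | "row_groups" | "none";

interface ScanParquetOptionsSketch {
  nRows?: number;
  parallel?: ParallelStrategy;
  rowIndexName?: string;
  rowIndexOffset?: number;
  hivePartitioning?: boolean;
  cloudOptions?: Map<string, string>;
}

// Hypothetical helper: returns human-readable problems found in the options.
// An empty array means no problem was detected by these (assumed) rules.
function checkScanOptions(options: ScanParquetOptionsSketch): string[] {
  const problems: string[] = [];
  if (
    options.nRows !== undefined &&
    (!Number.isInteger(options.nRows) || options.nRows < 0)
  ) {
    problems.push("nRows must be a non-negative integer");
  }
  // Assumption: the offset is only used when a row index column is requested.
  if (options.rowIndexOffset !== undefined && options.rowIndexName === undefined) {
    problems.push("rowIndexOffset has no effect without rowIndexName");
  }
  return problems;
}

const exampleOptions: ScanParquetOptionsSketch = {
  nRows: 1000,
  parallel: "row_groups",
  rowIndexName: "row_nr",
  rowIndexOffset: 0,
};

const exampleProblems = checkScanOptions(exampleOptions);

// With the library installed, the options would be passed roughly like this
// (path and column name are placeholders):
//   import pl from "nodejs-polars";
//   const lazy = pl.scanParquet("data/*.parquet", exampleOptions)
//     .filter(pl.col("amount").gt(0))
//     .select("amount");
//   const df = await lazy.collect();
```

    Because the scan is lazy, the filter and column selection in the commented call are candidates for pushdown to the scan level; nothing is read until the frame is collected.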