DataFrameStatFunctions

Calculates the approximate quantiles of a numerical column.

Value parameters

col: the column to compute quantiles for.
probabilities: quantile probabilities, each in [0.0, 1.0] (e.g. 0.5 is the median).
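relativeError: the relative target precision to achieve (>= 0.0); 0.0 yields exact quantiles, which can be very expensive. Values greater than 1.0 are accepted but give the same result as 1.0.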

Attributes

Returns: the approximate quantiles, one per probability.
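
To make relativeError concrete, the sketch below checks whether a candidate value is an acceptable answer for a given probability. It assumes the usual rank-based reading of the guarantee: with N rows, the exact rank of the returned value lies between floor((p - relativeError) * N) and ceil((p + relativeError) * N). It is written in plain Python purely as an illustration and is not the library's algorithm; the helper name rank_within_tolerance is hypothetical.

```python
import math

def rank_within_tolerance(values, candidate, p, relative_error):
    """Return True if `candidate` is an acceptable approximate p-quantile.

    Assumed guarantee (not taken from this page):
    floor((p - err) * N) <= rank(candidate) <= ceil((p + err) * N),
    where rank is the number of values less than or equal to the candidate.
    """
    n = len(values)
    rank = sum(1 for v in values if v <= candidate)
    low = math.floor((p - relative_error) * n)
    high = math.ceil((p + relative_error) * n)
    return low <= rank <= high

data = list(range(1, 101))  # the integers 1..100

exact_ok = rank_within_tolerance(data, 50, 0.5, 0.0)     # exact median
approx_ok = rank_within_tolerance(data, 53, 0.5, 0.05)   # within 5% of the median rank
too_far = rank_within_tolerance(data, 60, 0.5, 0.05)     # outside the tolerance
```

With relativeError set to 0.0 the window collapses to a single rank, which is why exact quantiles cost more than approximate ones.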

Calculates the approximate quantiles of numerical columns.

Value parameters

cols: the columns to compute quantiles for.
probabilities: quantile probabilities, each in [0.0, 1.0].
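relativeError: the relative target precision to achieve (>= 0.0); 0.0 yields exact quantiles, which can be very expensive. Values greater than 1.0 are accepted but give the same result as 1.0.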

Attributes

Returns: an array of quantile arrays, one inner array per column.

Builds a Bloom filter over the given column, sized for expectedNumItems items with a target false-positive probability fpp.
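
How expectedNumItems and fpp translate into a filter size can be sketched with the textbook Bloom filter formulas. The plain-Python example below assumes those formulas; the exact rounding the library applies may differ, and bloom_parameters is a hypothetical helper, not part of this API.

```python
import math

def bloom_parameters(expected_num_items, fpp):
    """Textbook Bloom filter sizing (an assumption, not confirmed by this page).

    bits   m = -n * ln(p) / (ln 2)^2
    hashes k = (m / n) * ln 2
    """
    if expected_num_items <= 0:
        raise ValueError("expected_num_items must be positive")
    if not 0.0 < fpp < 1.0:
        raise ValueError("fpp must be strictly between 0 and 1")
    num_bits = math.ceil(-expected_num_items * math.log(fpp) / (math.log(2) ** 2))
    num_hashes = max(1, round(num_bits / expected_num_items * math.log(2)))
    return num_bits, num_hashes

# One million expected items at a 3% false-positive target.
bits, hashes = bloom_parameters(1_000_000, 0.03)
```

Under these formulas a lower fpp always requires more bits for the same number of items, which is the trade-off to keep in mind when choosing the two arguments.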

Builds a Bloom filter over the given column, sized for expectedNumItems expected items and a target false-positive probability fpp.

Builds a Bloom filter over the given column with an explicit number of bits.

Calculates the Pearson correlation coefficient of two columns.

Attributes

Returns: the correlation of col1 and col2.
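
For reference, the Pearson coefficient is the covariance of the two columns divided by the product of their standard deviations. The following is a minimal plain-Python sketch of that definition, not the library's implementation; the function name pearson and the choice to return NaN for a constant column are assumptions of this example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when either column is constant
    return cov / math.sqrt(var_x * var_y)

perfect = pearson([1, 2, 3, 4], [2, 4, 6, 8])   # perfectly linear, increasing
inverse = pearson([1, 2, 3], [3, 2, 1])         # perfectly linear, decreasing
```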

Calculates the correlation of two columns using the given method.

Value parameters

method: the correlation method; currently only "pearson" is supported.

Attributes

Returns: the correlation of col1 and col2.

Builds a Count-Min Sketch over the given column with the given relative error (eps), confidence and random seed. The sketch is computed by a server-side aggregate and deserialized on the client.
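
The roles of eps, confidence and seed can be illustrated with a small, self-contained Count-Min Sketch. This plain-Python sketch assumes one common sizing rule (width = ceil(2 / eps), depth = ceil(-ln(1 - confidence) / ln 2)) and uses salted BLAKE2 hashes; neither choice is confirmed by this page, and the class is not the library's implementation.

```python
import hashlib
import math

class CountMinSketch:
    """Minimal Count-Min Sketch: a depth x width table of counters."""

    def __init__(self, eps, confidence, seed):
        # Assumed sizing rule; a smaller eps or higher confidence grows the table.
        self.width = math.ceil(2 / eps)
        self.depth = math.ceil(-math.log(1 - confidence) / math.log(2))
        self.seed = seed
        self.table = [[0] * self.width for _ in range(self.depth)]
        self.total = 0

    def _index(self, item, row):
        # One hash function per row, derived from the seed and the row number.
        key = f"{self.seed}:{row}:{item!r}".encode("utf-8")
        digest = hashlib.blake2b(key, digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, item, count=1):
        self.total += count
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions only ever inflate a counter, so the minimum is the best guess.
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))

sketch = CountMinSketch(eps=0.01, confidence=0.95, seed=42)
for word in ["a", "b", "a", "c", "a"]:
    sketch.add(word)
```

Because collisions can only add to a counter, an estimate is never below the true count; eps and confidence bound how far above it the estimate is likely to be.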

Builds a Count-Min Sketch over the given column.

Calculates the sample covariance of two numerical columns.

Attributes

Returns: the sample covariance of col1 and col2.
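
"Sample" covariance means the sum of products of deviations is divided by n - 1 rather than n. A minimal plain-Python sketch of that definition follows; it is an illustration only, and sample_covariance is a hypothetical name.

```python
def sample_covariance(xs, ys):
    """Sample covariance of two equally long numeric sequences (n - 1 denominator)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

cov = sample_covariance([1, 2, 3, 4], [2, 4, 6, 8])
```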

Computes a pair-wise frequency table (contingency table) of the given columns.

Attributes

Returns: a Dataset containing the contingency table.
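
A contingency table counts how often each pair of values from the two columns occurs together, with 0 for pairs that never occur. The plain-Python sketch below shows that shape on a list of dictionaries; the representation of rows and the function name crosstab are assumptions of this example, not the layout of the returned Dataset.

```python
from collections import Counter

def crosstab(rows, col1, col2):
    """Count co-occurrences of (row[col1], row[col2]) as a nested dict."""
    counts = Counter((row[col1], row[col2]) for row in rows)
    row_keys = sorted({pair[0] for pair in counts})
    col_keys = sorted({pair[1] for pair in counts})
    # Every distinct col1 value gets one entry per distinct col2 value.
    return {r: {c: counts.get((r, c), 0) for c in col_keys} for r in row_keys}

rows = [
    {"key": "a", "value": "x"},
    {"key": "a", "value": "x"},
    {"key": "a", "value": "y"},
    {"key": "b", "value": "y"},
]
table = crosstab(rows, "key", "value")
```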

Finds frequent items for the given columns, with the default support 0.01.

Attributes

Returns: a Dataset of frequent items per column.

Finds frequent items for the given columns.

Value parameters

support: the minimum frequency for an item to be considered frequent, in (0.0, 1.0]; it should be greater than 1e-4.

Attributes

Returns: a Dataset of frequent items per column.
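
One well-known way to find frequent items in a single pass with bounded memory is a Misra-Gries style counter. The plain-Python sketch below uses it to show how support relates to the result; it is an assumption that this family of algorithms is relevant here, and the code is not the library's implementation. Such algorithms return every item whose frequency exceeds support but may also return some that do not.

```python
import math

def frequent_item_candidates(items, support):
    """Return a superset of the items whose relative frequency exceeds `support`.

    Keeps at most ceil(1 / support) - 1 counters (Misra-Gries), so memory
    depends on `support`, not on the number of distinct items.
    """
    if not 0.0 < support <= 1.0:
        raise ValueError("support must be in (0.0, 1.0]")
    capacity = max(1, math.ceil(1 / support) - 1)
    counters = {}
    for item in items:
        if item in counters:
            counters[item] += 1
        elif len(counters) < capacity:
            counters[item] = 1
        else:
            # No free counter: decrement all and drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return set(counters)

stream = ["a"] * 5 + ["b"] * 4 + ["c"]
candidates = frequent_item_candidates(stream, 0.3)
```

A second, exact counting pass over the candidates removes the false positives when that matters.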

Returns a stratified sample without replacement, keyed by the values in col.

Value parameters

col: the column defining the strata.
fractions: a mapping from stratum to sampling fraction, each fraction in [0.0, 1.0]; a stratum missing from the map is treated as having fraction 0.0.
seed: the random seed.

Attributes

Returns: a Dataset containing the stratified sample.
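
Stratified sampling without replacement can be pictured as an independent coin flip per row, with the probability taken from the row's stratum. The plain-Python sketch below illustrates this under two assumptions: rows are dictionaries, and a stratum absent from fractions is sampled with fraction 0.0. The function name sample_by is hypothetical and this is not the library's implementation, so the resulting sample size is only approximately fraction times stratum size.

```python
import random

def sample_by(rows, col, fractions, seed):
    """Keep each row with the probability assigned to its stratum."""
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    return [row for row in rows
            if rng.random() < fractions.get(row[col], 0.0)]

rows = [{"key": k, "id": i} for i, k in enumerate("aaaabbbbcccc")]

# Keep every "a", roughly half of the "b" rows, and no "c" rows.
sample = sample_by(rows, "key", {"a": 1.0, "b": 0.5}, seed=7)
```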
