F2AI API

Submodules

aie_feast.definitions module

class aie_feast.definitions.BaseView(*, name: str, description: Optional[str] = None, entities: List[str] = [], schema: List[FeatureSchema] = [], batch_source: Optional[str] = None, ttl: Optional[Period] = None, tags: Dict[str, str] = {})

Bases: BaseModel

Abstraction of the parts common to FeatureView and LabelView.

batch_source: Optional[str]
description: Optional[str]
entities: List[str]
name: str
schemas: List[FeatureSchema]
tags: Dict[str, str]
ttl: Optional[Period]
class aie_feast.definitions.Entity(*, name: str, description: Optional[str] = None, join_keys: List[str] = [])

Bases: BaseModel

An entity is a key connection between different feature views. Under the hood, join_keys are used to join the feature views. If join_keys is empty, the entity name is used as the default join key.

description: Optional[str]
join_keys: List[str]
name: str
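The default join key rule described above can be illustrated with a minimal sketch. EntitySketch and effective_join_keys are hypothetical names used only for illustration; they are not part of aie_feast, and the real Entity may resolve the default differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EntitySketch:
    """Hypothetical stand-in for Entity; not the library class."""

    name: str
    description: Optional[str] = None
    join_keys: List[str] = field(default_factory=list)

    def effective_join_keys(self) -> List[str]:
        # Assumed behaviour: fall back to the entity name when no join keys are given.
        return self.join_keys if self.join_keys else [self.name]


user = EntitySketch(name="user_id")
account = EntitySketch(name="account", join_keys=["account_id", "region"])
print(user.effective_join_keys())
print(account.effective_join_keys())
```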
class aie_feast.definitions.Feature(*, name: str, dtype: FeatureDTypes, period: Optional[str] = None, schema_type: SchemaType, view_name: str)

Bases: BaseModel

A Feature, including all the information F2AI needs to know about it.

classmethod create_feature_from_schema(schema: FeatureSchema, view_name: str) Feature
classmethod create_from_schema(schema: FeatureSchema, view_name: str, schema_type: SchemaType) Feature
classmethod create_label_from_schema(schema: FeatureSchema, view_name: str) Feature
dtype: FeatureDTypes
name: str
period: Optional[str]
schema_type: SchemaType
view_name: str
class aie_feast.definitions.FeatureSchema(*, name: str, description: Optional[str] = None, dtype: FeatureDTypes)

Bases: BaseModel

A FeatureSchema describes a data column; it carries no table information.

description: Optional[str]
dtype: FeatureDTypes
is_numeric()
name: str
class aie_feast.definitions.FeatureView(*, name: str, description: Optional[str] = None, entities: List[str] = [], schema: List[FeatureSchema] = [], batch_source: Optional[str] = None, ttl: Optional[Period] = None, tags: Dict[str, str] = {})

Bases: BaseView

batch_source: Optional[str]
description: Optional[str]
entities: List[str]
get_feature_names() List[str]
get_feature_objects(is_numeric=False) Set[Feature]
name: str
schemas: List[FeatureSchema]
tags: Dict[str, str]
ttl: Optional[Period]
class aie_feast.definitions.FileSource(*, name: str, description: Optional[str] = None, timestamp_field: Optional[str] = None, created_timestamp_column: Optional[str] = None, tags: Dict[str, str] = {}, file_format: FileFormatEnum = FileFormatEnum.CSV, path: str)

Bases: Source

file_format: FileFormatEnum
path: str
read_file(str_cols: List[str] = [], keep_cols: List[str] = [])
class aie_feast.definitions.LabelView(*, name: str, description: Optional[str] = None, entities: List[str] = [], schema: List[FeatureSchema] = [], batch_source: Optional[str] = None, ttl: Optional[Period] = None, tags: Dict[str, str] = {}, request_source: Optional[str] = None)

Bases: BaseView

get_label_names()
get_label_objects(is_numeric=False) Set[Feature]
request_source: Optional[str]
class aie_feast.definitions.OfflineStore(*, type: OfflineStoreType)

Bases: BaseModel

An abstraction of the functionality an OfflineStore should implement. If you want to contribute a new offline store, this is the core interface.

abstract get_offline_source(service: Service) Source

Get the offline materialized source for a specific service.

Parameters

service (Service) – an instance of Service

Returns

Source

abstract query(query: str, *args, **kwargs) Any

Run a query with the specific offline store, e.g.:

  • if you are using pgsql, this runs the query via psycopg2

  • if you are using spark, this runs the query via sparksql

type: OfflineStoreType
class aie_feast.definitions.OfflineStoreType(value)

Bases: str, Enum

An enumeration of constant choices that indicates how to initialize an OfflineStore from configuration. If you want to add a new type of offline store, you will need to extend this.

FILE = 'file'
PGSQL = 'pgsql'
SPARK = 'spark'
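Because the enumeration derives from both str and Enum, a plain string read from a configuration file can be turned into a member by value. The sketch below mirrors the three values listed above; StoreTypeSketch, store_type_from_cfg, and the "type" config key are assumptions for illustration, not the library's own code.

```python
from enum import Enum


class StoreTypeSketch(str, Enum):
    """Hypothetical mirror of OfflineStoreType with the documented values."""

    FILE = "file"
    PGSQL = "pgsql"
    SPARK = "spark"


def store_type_from_cfg(cfg: dict) -> StoreTypeSketch:
    # Assumption: the configuration carries the store type under a "type" key.
    # Looking up by value raises ValueError for an unknown store type.
    return StoreTypeSketch(cfg["type"])


print(store_type_from_cfg({"type": "pgsql"}))
```

A new store type would need a new member here before a configuration could refer to it.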
class aie_feast.definitions.Period(*, n: int = 1, unit: AvailablePeriods = AvailablePeriods.DAYS)

Bases: BaseModel

A wrapper of different representations of a time range. Useful for converting to underlying utilities such as pandas DateOffset and Postgres interval strings.

classmethod from_str(s: str)

Construct a period from a string, e.g. 10 years, 1day, -1 month.

Parameters

s (str) – string representation of a period

property is_neg
n: int
to_pandas_dateoffset()
to_pgsql_interval()
to_py_timedelta()
unit: AvailablePeriods
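As an illustration of the string forms accepted by from_str (10 years, 1day, -1 month), here is a minimal parsing sketch. The grammar, the helper names parse_period and to_timedelta, and the normalization to plural unit names are assumptions; the actual Period implementation may differ.

```python
import re
from datetime import timedelta
from typing import Tuple

# Assumed grammar: optional sign, integer, optional space, unit name, optional plural "s".
_PERIOD_RE = re.compile(r"^\s*(-?\d+)\s*([a-zA-Z]+?)s?\s*$")

# Units with a fixed length, which datetime.timedelta can represent directly.
_FIXED_UNITS = {"weeks", "days", "hours", "minutes", "seconds"}


def parse_period(s: str) -> Tuple[int, str]:
    """Split a period string into (n, unit), with unit normalized to plural."""
    match = _PERIOD_RE.match(s)
    if match is None:
        raise ValueError(f"cannot parse period: {s!r}")
    return int(match.group(1)), match.group(2).lower() + "s"


def to_timedelta(n: int, unit: str) -> timedelta:
    """Convert a fixed-length period to a timedelta; months and years are rejected."""
    if unit not in _FIXED_UNITS:
        raise ValueError(f"unit {unit!r} has no fixed length")
    return timedelta(**{unit: n})


print(parse_period("10 years"))
print(to_timedelta(*parse_period("-1 days")))
```

Months and years vary in length, which is one reason a calendar-aware offset such as pandas DateOffset is a natural conversion target for them.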
class aie_feast.definitions.SchemaAnchor(*, view_name: str, schema_name: str, period: Optional[str] = None)

Bases: BaseModel

SchemaAnchor links a view to a group of FeatureSchemas, including period information if present.

classmethod from_str(cfg: str) SchemaAnchor

Construct from a string.

Parameters

cfg (str) – a string with a specific format, e.g. {feature_view_name}:{feature_name}:{period}

Returns

SchemaAnchor

classmethod from_strs(cfgs: List[str]) List[SchemaAnchor]

Construct from a list of strings.

Parameters

cfgs (List[str]) –

Returns

List[SchemaAnchor]

get_features_from_views(views: Dict[str, BaseView], is_numeric=False) List[Feature]

With given views, construct a series of features based on this SchemaAnchor.

Parameters
  • views (Dict[str, BaseView]) –

  • is_numeric (bool, optional) – Whether to return only numeric features. Defaults to False.

Returns

List[Feature]

period: Optional[str]
schema_name: str
view_name: str
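The {feature_view_name}:{feature_name}:{period} format accepted by from_str can be sketched as a simple split on colons. AnchorSketch and parse_anchor are hypothetical names, and treating the period segment as optional is an assumption based on period being Optional[str]; the library's parser may accept other forms.

```python
from typing import List, NamedTuple, Optional


class AnchorSketch(NamedTuple):
    """Hypothetical stand-in for SchemaAnchor; not the library class."""

    view_name: str
    schema_name: str
    period: Optional[str] = None


def parse_anchor(cfg: str) -> AnchorSketch:
    parts = cfg.split(":")
    if len(parts) == 2:
        # Assumed: the period segment may be omitted.
        return AnchorSketch(parts[0], parts[1])
    if len(parts) == 3:
        return AnchorSketch(parts[0], parts[1], parts[2])
    raise ValueError(f"expected view:schema[:period], got {cfg!r}")


def parse_anchors(cfgs: List[str]) -> List[AnchorSketch]:
    # List counterpart, analogous in spirit to from_strs.
    return [parse_anchor(cfg) for cfg in cfgs]


print(parse_anchors(["loan_features:credit_score:30 days", "loan_features:age"]))
```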
class aie_feast.definitions.Service(*, name: str, description: Optional[str] = None, features: List[SchemaAnchor] = [], labels: List[SchemaAnchor] = [], ttl: Optional[str] = None, materialize: Optional[str] = 'materialize_table', type: Optional[str] = 'file', dbt: Optional[str] = 'dbt_path')

Bases: BaseModel

A Service is a combination of a group of feature views and label views, usually directly related to a certain AI model. The best practice F2AI suggests is to treat services as immutable. E.g. if you want to train different combinations of features for a specific AI model, you may want to create multiple Services, like: linear_reg_v1, linear_reg_v2.

dbt_path: Optional[str]
description: Optional[str]
features: List[SchemaAnchor]
classmethod from_yaml(cfg: Dict) Service

Construct a Service from parsed yaml config file.

get_entities(feature_views: Dict[str, FeatureView], label_views: Dict[str, LabelView]) Set[str]

Get all entities that appear in this service, without duplicates.

Parameters
  • feature_views (Dict[str, FeatureView]) –

  • label_views (Dict[str, LabelView]) –

Returns

Set[str]

get_feature_entities(feature_views: Dict[str, FeatureView]) Set[Entity]

Get all entities that appear in the feature views related to this service, without duplicates.

Parameters

feature_views (Dict[str, FeatureView]) –

Returns

Set[Entity]

get_feature_names(feature_views: Dict[str, FeatureView], is_numeric=False) Set[Feature]
get_feature_objects(feature_views: Dict[str, FeatureView], is_numeric=False) Set[Feature]

Get all the feature objects included in this service, based on the features' schema anchors.

Parameters
  • feature_views (Dict[str, FeatureView]) – A group of FeatureViews.

  • is_numeric (bool, optional) – Whether to include only numeric features. Defaults to False.

Returns

Set[Feature]

get_feature_views(feature_views: Dict[str, FeatureView]) List[FeatureView]

Get the FeatureViews of this service. This automatically filters out any feature view not given in the parameters.

Parameters

feature_views (Dict[str, FeatureView]) –

Returns

List[FeatureView]

get_label_entities(label_views: Dict[str, LabelView]) Set[str]

Get all entities that appear in the label views related to this service, without duplicates.

Parameters

label_views (Dict[str, LabelView]) –

Returns

Set[str]

get_label_names(label_views: Dict[str, FeatureView], is_numeric=False) Set[Feature]
get_label_objects(label_views: Dict[str, LabelView], is_numeric=False) Set[Feature]

Get all the label objects included in this service, based on the labels' schema anchors.

Parameters
  • label_views (Dict[str, LabelView]) – A group of LabelViews.

  • is_numeric (bool, optional) – Whether to include only numeric labels. Defaults to False.

Returns

Set[Feature]

get_label_view(label_views: Dict[str, LabelView]) LabelView
get_label_views(label_views: Dict[str, LabelView]) List[LabelView]

Get the LabelViews of this service. This automatically filters out any label view not given in the parameters.

Parameters

label_views (Dict[str, LabelView]) –

Returns

List[LabelView]

labels: List[SchemaAnchor]
materialize_path: Optional[str]
materialize_type: Optional[str]
name: str
ttl: Optional[str]
class aie_feast.definitions.Source(*, name: str, description: Optional[str] = None, timestamp_field: Optional[str] = None, created_timestamp_column: Optional[str] = None, tags: Dict[str, str] = {})

Bases: BaseModel

An abstract class which describes the common part of a Source. A source usually defines where to access data and what time semantics it has. In F2AI, there are 2 kinds of time semantics:

  1. timestamp_field: the event timestamp, which represents when the record happened. It is the main part of a point-in-time join.

  2. created_timestamp_field: the created timestamp, which represents when the record was created. It is typically relevant when features are generated over multiple cycles.

created_timestamp_field: Optional[str]
description: Optional[str]
name: str
tags: Dict[str, str]
timestamp_field: Optional[str]
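To make the role of the event timestamp in a point-in-time join concrete, the following sketch looks up the latest record at or before a query time, so that no future information leaks into the result. The function name latest_as_of, the sample data, and the inclusive boundary (<=) are assumptions for illustration; this is not how F2AI implements its joins.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Any, List, Optional, Tuple


def latest_as_of(records: List[Tuple[datetime, Any]], as_of: datetime) -> Optional[Any]:
    """Return the value of the latest record whose event timestamp is <= as_of.

    `records` must be sorted by event timestamp in ascending order.
    """
    timestamps = [ts for ts, _ in records]
    idx = bisect_right(timestamps, as_of)
    return records[idx - 1][1] if idx > 0 else None


# Hypothetical feature history for one entity: (event timestamp, value).
history = [
    (datetime(2023, 1, 1), 600),
    (datetime(2023, 2, 1), 640),
    (datetime(2023, 3, 1), 700),
]
print(latest_as_of(history, datetime(2023, 2, 15)))
```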
class aie_feast.definitions.SqlSource(*, name: str, description: Optional[str] = None, timestamp_field: Optional[str] = None, created_timestamp_column: Optional[str] = None, tags: Dict[str, str] = {}, query: str)

Bases: Source

query: str
aie_feast.definitions.init_offline_store_from_cfg(cfg: Dict[Any]) OfflineStore
aie_feast.definitions.parse_source_yaml(o: Dict, offline_store_type: OfflineStoreType) Source

aie_feast.featurestore module

class aie_feast.featurestore.FeatureStore(project_folder=None, url=None, token=None, projectID=None)

Bases: object

get_dataset(service_name: str, sampler: Optional[callable] = None) Dataset

Get training data, spanning from start to end, from views.

Parameters
  • service_name (str) – name of the Service to use

  • sampler (callable, optional) – sampler

get_features(feature_view: Union[str, FeatureView, Service], entity_df: Union[DataFrame, str], features: Optional[list] = None, include: bool = True, **kwargs) DataFrame

For non-time-series prediction: get the features of entity_df from feature_views.

Parameters
  • feature_view – a single FeatureView or Service (after it is materialized), or its name, to look up.

  • entity_df (pd.DataFrame) – condition.

  • features (List, optional) – features to return. Defaults to None, which means all features.

  • include (bool, optional) – whether to include the timestamp defined in entity_df. Defaults to True.

get_labels(label_view: Union[str, LabelView, Service], entity_df: DataFrame, include: bool = True, **kwargs) DataFrame

For non-time-series prediction: get the labels of entity_df from label_views.

Parameters
  • label_view – a single LabelView or Service (after it is materialized), or its name, to look up.

  • entity_df (pd.DataFrame) – condition

  • include (bool, optional) – whether to include the timestamp defined in entity_df. Defaults to True.

get_latest_entities(view: Union[str, LabelView, Service, FeatureView], entity: Optional[DataFrame] = None) DataFrame

Get the latest entity and its timestamp from a single FeatureView/LabelView or a materialized Service. entity can be None (all joined entities in the view), entity names, or entity values (specific entities).

Parameters

view (Union[str, LabelView, Service, FeatureView]) – view to look up

get_period_features(feature_view: Union[str, FeatureView, Service], entity_df: DataFrame, period: str, features: Optional[List[str]] = None, include: bool = True, **kwargs) DataFrame

For time-series prediction: get the features of entity_df over a past period from feature_views.

Parameters
  • feature_view – a single FeatureView or Service (after it is materialized), or its name, to look up.

  • entity_df (pd.DataFrame) – condition

  • period (str) – length of the look-back window

  • features (List, optional) – features to return. Defaults to None, which means all features.

  • include (bool, optional) – whether to include the timestamp defined in entity_df. Defaults to True.

get_period_labels(label_view: Union[str, LabelView, Service], entity_df: DataFrame, period: str, include: bool = False, **kwargs) DataFrame

For time-series prediction: get the labels of entity_df, from start to end of the period, from label_views.

Parameters
  • label_view – a single LabelView or Service (after it is materialized), or its name, to look up.

  • entity_df (pd.DataFrame) – condition

  • period (str) – length of the look-forward window; can be negative, e.g. -1 days

  • include (bool, optional) – whether to include the timestamp defined in entity_df. Defaults to False.
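One way to read a signed look-forward period is as an offset from each row's timestamp in entity_df: a positive period selects a window after the timestamp, a negative one selects a window before it. The helper window_bounds below is a hypothetical sketch of that interpretation; how F2AI actually treats the window endpoints is governed by include and is not modelled here.

```python
from datetime import datetime, timedelta
from typing import Tuple


def window_bounds(anchor: datetime, period: timedelta) -> Tuple[datetime, datetime]:
    """Return (start, end) of the window spanned by anchor and anchor + period."""
    other = anchor + period
    # A negative period places the window before the anchor timestamp.
    return (min(anchor, other), max(anchor, other))


event_time = datetime(2023, 1, 10)
print(window_bounds(event_time, timedelta(days=7)))
print(window_bounds(event_time, timedelta(days=-1)))
```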

materialize(service_name: str, start: Optional[str] = None, end: Optional[str] = None, fromnow: Optional[str] = None)

incrementally join views to generate tables

Parameters
  • service_name (str) – name of service to materialize

  • start (str) – start of materialization

  • end (str) – end of materialization

  • fromnow (str) – time interval from now

query(*args, **kwargs) DataFrame

Run a query through the configured offline store. How this method is used depends heavily on the type of offline store.

schedule_local_dbt_container(profile_name: str, vars: Dict, dbt_path: str)
stats(view: Union[str, LabelView, Service, FeatureView], entity_df: Optional[DataFrame] = None, features: Optional[List[str]] = None, group_key: Optional[List[str]] = None, fn: str = 'mean', start: Optional[str] = None, end: Optional[str] = None, include: str = 'both', keys_only: bool = False) DataFrame

Get the results of the statistical function fn for entity_df from views, from start to end. Only works for numeric features that vary with time.

Parameters
  • view – the view, or name of the view, to look up

  • entity_df (pd.DataFrame, optional) –

  • group_key (list) – joined columns to compute stats over; only takes effect when entity_df is None. None means stats are computed on the joined entities; [] means no grouping.

  • fn (str, optional) – statistical method: min, max, std, avg, mode, median. Defaults to “mean”.

  • start (str, optional) – start time. Defaults to None. Only takes effect when entity_df is None.

  • end (str, optional) – end time. Defaults to None. Only takes effect when entity_df is None.

  • include (str, optional) – whether to include the start or end timestamp. Defaults to 'both'.

  • keys_only (bool, optional) – whether to take action on keys only. Only available when fn=unique; returns a list.
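The interaction between group_key and fn can be sketched in plain Python: rows are bucketed by the grouping columns, then the named statistic is applied to one numeric feature per bucket. The helper grouped_stat, the row layout, and the mapping of fn names to functions are assumptions for illustration and do not reflect the store-specific implementation.

```python
import statistics
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

# Assumed mapping from fn names to aggregation functions.
_FNS: Dict[str, Callable] = {
    "min": min,
    "max": max,
    "std": statistics.stdev,
    "avg": statistics.mean,
    "mean": statistics.mean,
    "mode": statistics.mode,
    "median": statistics.median,
}


def grouped_stat(
    rows: List[Dict[str, Any]], group_key: List[str], feature: str, fn: str = "mean"
) -> Dict[Tuple, Any]:
    """Aggregate `feature` per group; an empty group_key yields one global group."""
    groups: Dict[Tuple, List[Any]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in group_key)].append(row[feature])
    return {key: _FNS[fn](values) for key, values in groups.items()}


rows = [
    {"user": "a", "amount": 1.0},
    {"user": "a", "amount": 3.0},
    {"user": "b", "amount": 5.0},
]
print(grouped_stat(rows, ["user"], "amount"))
print(grouped_stat(rows, [], "amount", fn="max"))
```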

Module contents

class aie_feast.FeatureStore(project_folder=None, url=None, token=None, projectID=None)

Bases: object

get_dataset(service_name: str, sampler: Optional[callable] = None) Dataset

Get training data, spanning from start to end, from views.

Parameters
  • service_name (str) – name of the Service to use

  • sampler (callable, optional) – sampler

get_features(feature_view: Union[str, FeatureView, Service], entity_df: Union[DataFrame, str], features: Optional[list] = None, include: bool = True, **kwargs) DataFrame

For non-time-series prediction: get the features of entity_df from feature_views.

Parameters
  • feature_view – a single FeatureView or Service (after it is materialized), or its name, to look up.

  • entity_df (pd.DataFrame) – condition.

  • features (List, optional) – features to return. Defaults to None, which means all features.

  • include (bool, optional) – whether to include the timestamp defined in entity_df. Defaults to True.

get_labels(label_view: Union[str, LabelView, Service], entity_df: DataFrame, include: bool = True, **kwargs) DataFrame

For non-time-series prediction: get the labels of entity_df from label_views.

Parameters
  • label_view – a single LabelView or Service (after it is materialized), or its name, to look up.

  • entity_df (pd.DataFrame) – condition

  • include (bool, optional) – whether to include the timestamp defined in entity_df. Defaults to True.

get_latest_entities(view: Union[str, LabelView, Service, FeatureView], entity: Optional[DataFrame] = None) DataFrame

Get the latest entity and its timestamp from a single FeatureView/LabelView or a materialized Service. entity can be None (all joined entities in the view), entity names, or entity values (specific entities).

Parameters

view (Union[str, LabelView, Service, FeatureView]) – view to look up

get_period_features(feature_view: Union[str, FeatureView, Service], entity_df: DataFrame, period: str, features: Optional[List[str]] = None, include: bool = True, **kwargs) DataFrame

For time-series prediction: get the features of entity_df over a past period from feature_views.

Parameters
  • feature_view – a single FeatureView or Service (after it is materialized), or its name, to look up.

  • entity_df (pd.DataFrame) – condition

  • period (str) – length of the look-back window

  • features (List, optional) – features to return. Defaults to None, which means all features.

  • include (bool, optional) – whether to include the timestamp defined in entity_df. Defaults to True.

get_period_labels(label_view: Union[str, LabelView, Service], entity_df: DataFrame, period: str, include: bool = False, **kwargs) DataFrame

For time-series prediction: get the labels of entity_df, from start to end of the period, from label_views.

Parameters
  • label_view – a single LabelView or Service (after it is materialized), or its name, to look up.

  • entity_df (pd.DataFrame) – condition

  • period (str) – length of the look-forward window; can be negative, e.g. -1 days

  • include (bool, optional) – whether to include the timestamp defined in entity_df. Defaults to False.

materialize(service_name: str, start: Optional[str] = None, end: Optional[str] = None, fromnow: Optional[str] = None)

incrementally join views to generate tables

Parameters
  • service_name (str) – name of service to materialize

  • start (str) – start of materialization

  • end (str) – end of materialization

  • fromnow (str) – time interval from now

query(*args, **kwargs) DataFrame

Run a query through the configured offline store. How this method is used depends heavily on the type of offline store.

schedule_local_dbt_container(profile_name: str, vars: Dict, dbt_path: str)
stats(view: Union[str, LabelView, Service, FeatureView], entity_df: Optional[DataFrame] = None, features: Optional[List[str]] = None, group_key: Optional[List[str]] = None, fn: str = 'mean', start: Optional[str] = None, end: Optional[str] = None, include: str = 'both', keys_only: bool = False) DataFrame

Get the results of the statistical function fn for entity_df from views, from start to end. Only works for numeric features that vary with time.

Parameters
  • view – the view, or name of the view, to look up

  • entity_df (pd.DataFrame, optional) –

  • group_key (list) – joined columns to compute stats over; only takes effect when entity_df is None. None means stats are computed on the joined entities; [] means no grouping.

  • fn (str, optional) – statistical method: min, max, std, avg, mode, median. Defaults to “mean”.

  • start (str, optional) – start time. Defaults to None. Only takes effect when entity_df is None.

  • end (str, optional) – end time. Defaults to None. Only takes effect when entity_df is None.

  • include (str, optional) – whether to include the start or end timestamp. Defaults to 'both'.

  • keys_only (bool, optional) – whether to take action on keys only. Only available when fn=unique; returns a list.