F2AI API
Submodules
aie_feast.definitions module
- class aie_feast.definitions.BaseView(*, name: str, description: Optional[str] = None, entities: List[str] = [], schema: List[FeatureSchema] = [], batch_source: Optional[str] = None, ttl: Optional[Period] = None, tags: Dict[str, str] = {})
Bases: BaseModel
Abstraction of the common part of FeatureView and LabelView.
- batch_source: Optional[str]
- description: Optional[str]
- entities: List[str]
- name: str
- schemas: List[FeatureSchema]
- tags: Dict[str, str]
- class aie_feast.definitions.Entity(*, name: str, description: Optional[str] = None, join_keys: List[str] = [])
Bases: BaseModel
An entity is the key connection between different feature views. Under the hood, join_keys are used to join the feature views. If join_keys is empty, the entity name is used as the default join key.
- description: Optional[str]
- join_keys: List[str]
- name: str
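The join-key fallback described for Entity can be sketched as follows. This is a minimal standalone illustration, not the aie_feast implementation: `EntitySketch` and `effective_join_keys` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EntitySketch:
    """Hypothetical stand-in for aie_feast.definitions.Entity."""

    name: str
    description: Optional[str] = None
    join_keys: List[str] = field(default_factory=list)

    def effective_join_keys(self) -> List[str]:
        # Fall back to the entity name when no join key is configured.
        return self.join_keys if self.join_keys else [self.name]


implicit = EntitySketch(name="user_id")
explicit = EntitySketch(name="user", join_keys=["uid"])
print(implicit.effective_join_keys(), explicit.effective_join_keys())
```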
- class aie_feast.definitions.Feature(*, name: str, dtype: FeatureDTypes, period: Optional[str] = None, schema_type: SchemaType, view_name: str)
Bases: BaseModel
A Feature includes all the necessary information that F2AI needs to know about it.
- classmethod create_feature_from_schema(schema: FeatureSchema, view_name: str) -> Feature
- classmethod create_from_schema(schema: FeatureSchema, view_name: str, schema_type: SchemaType) -> Feature
- classmethod create_label_from_schema(schema: FeatureSchema, view_name: str) -> Feature
- dtype: FeatureDTypes
- name: str
- period: Optional[str]
- schema_type: SchemaType
- view_name: str
- class aie_feast.definitions.FeatureSchema(*, name: str, description: Optional[str] = None, dtype: FeatureDTypes)
Bases: BaseModel
A FeatureSchema describes a data column, without any table information.
- description: Optional[str]
- dtype: FeatureDTypes
- is_numeric()
- name: str
- class aie_feast.definitions.FeatureView(*, name: str, description: Optional[str] = None, entities: List[str] = [], schema: List[FeatureSchema] = [], batch_source: Optional[str] = None, ttl: Optional[Period] = None, tags: Dict[str, str] = {})
Bases: BaseView
- batch_source: Optional[str]
- description: Optional[str]
- entities: List[str]
- get_feature_names() -> List[str]
- name: str
- schemas: List[FeatureSchema]
- tags: Dict[str, str]
- class aie_feast.definitions.FileSource(*, name: str, description: Optional[str] = None, timestamp_field: Optional[str] = None, created_timestamp_column: Optional[str] = None, tags: Dict[str, str] = {}, file_format: FileFormatEnum = FileFormatEnum.CSV, path: str)
Bases: Source
- file_format: FileFormatEnum
- path: str
- read_file(str_cols: List[str] = [], keep_cols: List[str] = [])
- class aie_feast.definitions.LabelView(*, name: str, description: Optional[str] = None, entities: List[str] = [], schema: List[FeatureSchema] = [], batch_source: Optional[str] = None, ttl: Optional[Period] = None, tags: Dict[str, str] = {}, request_source: Optional[str] = None)
Bases: BaseView
- get_label_names()
- request_source: Optional[str]
- class aie_feast.definitions.OfflineStore(*, type: OfflineStoreType)
Bases: BaseModel
An abstraction of the functionality an OfflineStore should implement. If you want to contribute a new offline store, this is the core class to implement.
- abstract get_offline_source(service: Service) -> Source
Get the offline materialized source for a specific service.
- Parameters
service (Service) – an instance of Service
- Returns
Source
- abstract query(query: str, *args, **kwargs) -> Any
Run a query against the specific offline store. For example:
if you are using pgsql, the query runs via psycopg2; if you are using spark, the query runs via Spark SQL.
- type: OfflineStoreType
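To illustrate what implementing the abstract `query` method might involve, here is a minimal sketch built on Python's `abc` module. It does not use aie_feast or pydantic: `OfflineStoreSketch` and `InMemoryStoreSketch` are hypothetical classes, and treating the query string as a table name is an assumption made only to keep the example self-contained.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class OfflineStoreSketch(ABC):
    """Hypothetical stand-in for the abstract OfflineStore."""

    @abstractmethod
    def query(self, query: str, *args, **kwargs) -> Any:
        """Run a query against the backing store."""


class InMemoryStoreSketch(OfflineStoreSketch):
    """Toy backend: a 'query' is simply the name of a table held in memory."""

    def __init__(self, tables: Dict[str, List[dict]]):
        self.tables = tables

    def query(self, query: str, *args, **kwargs) -> Any:
        return self.tables[query]


store = InMemoryStoreSketch({"users": [{"id": 1}, {"id": 2}]})
print(store.query("users"))
```

Because `query` is abstract, the base class itself cannot be instantiated; only backends that implement it can.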
- class aie_feast.definitions.OfflineStoreType(value)
Bases: str, Enum
An enumeration of constants that indicates how to initialize an OfflineStore from configuration. To add a new type of offline store, you need to extend this enumeration.
- FILE = 'file'
- PGSQL = 'pgsql'
- SPARK = 'spark'
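A `str`-based `Enum` like this one can be looked up directly from a configuration value. The sketch below mirrors the three documented members. `OfflineStoreTypeSketch` and `store_type_from_cfg` are hypothetical names, and the `"type"` configuration key is an assumption based on the `type` field of OfflineStore.

```python
from enum import Enum


class OfflineStoreTypeSketch(str, Enum):
    """Hypothetical mirror of the documented OfflineStoreType members."""

    FILE = "file"
    PGSQL = "pgsql"
    SPARK = "spark"


def store_type_from_cfg(cfg: dict) -> OfflineStoreTypeSketch:
    # Lookup by value; an unknown type raises ValueError.
    return OfflineStoreTypeSketch(cfg["type"])


print(store_type_from_cfg({"type": "pgsql"}))
```

Because the enum also subclasses `str`, members compare equal to their plain string values, which is convenient when the value comes from YAML or JSON.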
- class aie_feast.definitions.Period(*, n: int = 1, unit: AvailablePeriods = AvailablePeriods.DAYS)
Bases: BaseModel
A wrapper around different representations of a time range. Useful for converting to underlying utilities such as a pandas DateOffset or a Postgres interval string.
- classmethod from_str(s: str)
Construct a period from a string, e.g. 10 years, 1day, -1 month.
- Parameters
s (str) – string representation of a period
- property is_neg
- n: int
- to_pandas_dateoffset()
- to_pgsql_interval()
- to_py_timedelta()
- unit: AvailablePeriods
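The string forms accepted by `from_str` (10 years, 1day, -1 month) could be parsed along these lines. This is a sketch under assumptions, not the library's parser: `parse_period` and `to_timedelta` are hypothetical helpers, and the set of unit names is a guess at what AvailablePeriods contains.

```python
import re
from datetime import timedelta
from typing import Tuple

# Optional sign, an integer, optional whitespace, then a unit with optional plural "s".
_PERIOD_RE = re.compile(r"^\s*(-?\d+)\s*([a-zA-Z]+?)s?\s*$")
_FIXED_UNITS = {"weeks", "days", "hours", "minutes", "seconds"}
_KNOWN_UNITS = _FIXED_UNITS | {"months", "years"}  # assumed unit names


def parse_period(s: str) -> Tuple[int, str]:
    """Split a period string into (n, normalized plural unit)."""
    m = _PERIOD_RE.match(s)
    if m is None:
        raise ValueError(f"cannot parse period: {s!r}")
    n, unit = int(m.group(1)), m.group(2).lower() + "s"
    if unit not in _KNOWN_UNITS:
        raise ValueError(f"unknown unit in period: {s!r}")
    return n, unit


def to_timedelta(n: int, unit: str) -> timedelta:
    # timedelta has no calendar-aware months or years, so only fixed-length
    # units are converted in this sketch.
    if unit not in _FIXED_UNITS:
        raise ValueError(f"{unit} has no fixed length as a timedelta")
    return timedelta(**{unit: n})


print(parse_period("10 years"), to_timedelta(*parse_period("-1 days")))
```

Calendar units such as months and years are where a pandas DateOffset or a Postgres interval is the more natural target, since their length depends on the date they are applied to.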
- class aie_feast.definitions.SchemaAnchor(*, view_name: str, schema_name: str, period: Optional[str] = None)
Bases: BaseModel
A SchemaAnchor links a view to a group of FeatureSchemas, with period information included if there is any.
- classmethod from_str(cfg: str) -> SchemaAnchor
Construct from a string.
- Parameters
cfg (str) – a string in a specific format, e.g. {feature_view_name}:{feature_name}:{period}
- Returns
SchemaAnchor
- classmethod from_strs(cfgs: List[str]) -> List[SchemaAnchor]
Construct from a list of strings.
- Parameters
cfgs (List[str]) –
- Returns
List[SchemaAnchor]
- get_features_from_views(views: Dict[str, BaseView], is_numeric=False) -> List[Feature]
Given a set of views, construct a list of features based on this SchemaAnchor.
- Parameters
views (Dict[str, BaseView]) –
is_numeric (bool, optional) – Whether to return only numeric features. Defaults to False.
- Returns
List[Feature]
- period: Optional[str]
- schema_name: str
- view_name: str
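The `{feature_view_name}:{feature_name}:{period}` format can be split into the three SchemaAnchor fields as shown below. This is an illustrative sketch: `AnchorSketch` and `anchor_from_str` are hypothetical names, and accepting a two-part string when no period is given is an assumption inferred from `period` being optional.

```python
from typing import NamedTuple, Optional


class AnchorSketch(NamedTuple):
    """Hypothetical stand-in for SchemaAnchor."""

    view_name: str
    schema_name: str
    period: Optional[str] = None


def anchor_from_str(cfg: str) -> AnchorSketch:
    parts = cfg.split(":")
    if len(parts) == 2:
        # Assumed short form without a period.
        return AnchorSketch(parts[0], parts[1])
    if len(parts) == 3:
        return AnchorSketch(parts[0], parts[1], parts[2])
    raise ValueError(f"expected view:schema[:period], got {cfg!r}")


print(anchor_from_str("user_view:age"))
print(anchor_from_str("txn_view:amount:7 days"))
```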
- class aie_feast.definitions.Service(*, name: str, description: Optional[str] = None, features: List[SchemaAnchor] = [], labels: List[SchemaAnchor] = [], ttl: Optional[str] = None, materialize: Optional[str] = 'materialize_table', type: Optional[str] = 'file', dbt: Optional[str] = 'dbt_path')
Bases: BaseModel
A Service is a combination of a group of feature views and label views, usually directly related to a certain AI model. The best practice that F2AI suggests is to treat services as immutable. For example, if you want to train a specific AI model on different combinations of features, you may want to create multiple Services, such as linear_reg_v1 and linear_reg_v2.
- dbt_path: Optional[str]
- description: Optional[str]
- features: List[SchemaAnchor]
- get_entities(feature_views: Dict[str, FeatureView], label_views: Dict[str, LabelView]) -> Set[str]
Get all entities that appear in this service, without duplicates.
- Parameters
feature_views (Dict[str, FeatureView]) –
label_views (Dict[str, LabelView]) –
- Returns
Set[str]
- get_feature_entities(feature_views: Dict[str, FeatureView]) -> Set[Entity]
Get all entities that appear in the feature views related to this service, without duplicates.
- Parameters
feature_views (Dict[str, FeatureView]) –
- Returns
Set[Entity]
- get_feature_names(feature_views: Dict[str, FeatureView], is_numeric=False) -> Set[Feature]
- get_feature_objects(feature_views: Dict[str, FeatureView], is_numeric=False) -> Set[Feature]
Get all the feature objects included in this service, based on the features' schema anchors.
- Parameters
feature_views (Dict[str, FeatureView]) – A group of FeatureViews.
is_numeric (bool, optional) – Whether to include only numeric features. Defaults to False.
- Returns
Set[Feature]
- get_feature_views(feature_views: Dict[str, FeatureView]) -> List[FeatureView]
Get the FeatureViews of this service. Feature views that are not present in the given parameter are filtered out automatically.
- Parameters
feature_views (Dict[str, FeatureView]) –
- Returns
List[FeatureView]
- get_label_entities(label_views: Dict[str, LabelView]) -> Set[str]
Get all entities that appear in the label views related to this service, without duplicates.
- Parameters
label_views (Dict[str, LabelView]) –
- Returns
Set[str]
- get_label_names(label_views: Dict[str, FeatureView], is_numeric=False) -> Set[Feature]
- get_label_objects(label_views: Dict[str, LabelView], is_numeric=False) -> Set[Feature]
Get all the label objects included in this service, based on the labels' schema anchors.
- Parameters
label_views (Dict[str, LabelView]) – A group of LabelViews.
is_numeric (bool, optional) – Whether to include only numeric labels. Defaults to False.
- Returns
Set[Feature]
- get_label_views(label_views: Dict[str, LabelView]) -> List[LabelView]
Get the LabelViews of this service. Label views that are not present in the given parameter are filtered out automatically.
- Parameters
label_views (Dict[str, LabelView]) –
- Returns
List[LabelView]
- labels: List[SchemaAnchor]
- materialize_path: Optional[str]
- materialize_type: Optional[str]
- name: str
- ttl: Optional[str]
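The entity-collection methods of Service boil down to a set union over the views a service refers to. The sketch below shows that idea with plain dictionaries. `collect_entities` is a hypothetical helper, and views are reduced to lists of entity names rather than real FeatureView or LabelView objects.

```python
from typing import Dict, List, Set


def collect_entities(anchored_view_names: List[str], views: Dict[str, List[str]]) -> Set[str]:
    """Union the entities of every referenced view that was actually passed in."""
    entities: Set[str] = set()
    for view_name in anchored_view_names:
        # Views missing from the mapping are skipped rather than raising an error.
        entities.update(views.get(view_name, []))
    return entities


views = {
    "user_view": ["user"],
    "txn_view": ["user", "merchant"],
}
print(collect_entities(["user_view", "txn_view", "missing_view"], views))
```

Using a set is what removes the duplicate `user` entity shared by both views.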
- class aie_feast.definitions.Source(*, name: str, description: Optional[str] = None, timestamp_field: Optional[str] = None, created_timestamp_column: Optional[str] = None, tags: Dict[str, str] = {})
Bases: BaseModel
An abstract class which describes the common part of a Source. A source usually defines where to access data and what time semantics the data has. F2AI has two kinds of time semantics:
timestamp_field: the event timestamp, which represents when the record happened. It is the main part of a point-in-time join.
created_timestamp_field: the created timestamp, which represents when the record was created. It is usually relevant when features are generated over multiple cycles.
- created_timestamp_field: Optional[str]
- description: Optional[str]
- name: str
- tags: Dict[str, str]
- timestamp_field: Optional[str]
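The two time semantics can be made concrete with a small point-in-time lookup: for a given entity and reference time, pick the latest record whose event timestamp is not in the future. This is a standalone sketch, not F2AI's join logic. `point_in_time_lookup` and the row layout are hypothetical, and breaking ties between equal event timestamps by the newest created timestamp is an assumption.

```python
from datetime import datetime
from typing import Dict, List, Optional


def point_in_time_lookup(rows: List[Dict], key: str, as_of: datetime) -> Optional[Dict]:
    """Return the latest row for `key` whose event time is not after `as_of`."""
    candidates = [r for r in rows if r["key"] == key and r["event_ts"] <= as_of]
    if not candidates:
        return None
    # Latest event wins; among equal event times, the most recently created
    # record wins (assumed tie-break).
    return max(candidates, key=lambda r: (r["event_ts"], r["created_ts"]))


rows = [
    {"key": "u1", "event_ts": datetime(2022, 1, 1), "created_ts": datetime(2022, 1, 2), "value": 1},
    {"key": "u1", "event_ts": datetime(2022, 1, 1), "created_ts": datetime(2022, 1, 5), "value": 2},
    {"key": "u1", "event_ts": datetime(2022, 2, 1), "created_ts": datetime(2022, 2, 2), "value": 3},
]
print(point_in_time_lookup(rows, "u1", datetime(2022, 1, 15)))
```

The first two rows share an event timestamp, which is the situation the created timestamp is meant to disambiguate when the same feature is regenerated in a later cycle.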
- class aie_feast.definitions.SqlSource(*, name: str, description: Optional[str] = None, timestamp_field: Optional[str] = None, created_timestamp_column: Optional[str] = None, tags: Dict[str, str] = {}, query: str)
Bases: Source
- query: str
- aie_feast.definitions.init_offline_store_from_cfg(cfg: Dict[Any]) -> OfflineStore
- aie_feast.definitions.parse_source_yaml(o: Dict, offline_store_type: OfflineStoreType) -> Source
aie_feast.featurestore module
- class aie_feast.featurestore.FeatureStore(project_folder=None, url=None, token=None, projectID=None)
Bases: object
- get_dataset(service_name: str, sampler: Optional[callable] = None) -> Dataset
Get data for training from views, covering the span from start to end.
- Parameters
service_name (str) – name of the Service to use
sampler (callable, optional) – sampler
- get_features(feature_view: Union[str, FeatureView, Service], entity_df: Union[DataFrame, str], features: Optional[list] = None, include: bool = True, **kwargs) -> DataFrame
Non-time-series prediction use: get the features of entity_df from feature views.
- Parameters
feature_view – A single FeatureView, or the name of a Service (after it has been materialized), to look up.
entity_df (pd.DataFrame) – condition.
features (List, optional) – features to return. Defaults to None, which means all features.
include (bool, optional) – whether to include the timestamp defined in entity_df. Defaults to True.
- get_labels(label_view: Union[str, LabelView, Service], entity_df: DataFrame, include: bool = True, **kwargs) -> DataFrame
Non-time-series prediction use: get the labels of entity_df from label views.
- Parameters
label_view – A single LabelView, or the name of a Service (after it has been materialized), to look up.
entity_df (pd.DataFrame) – condition
include (bool, optional) – whether to include the timestamp defined in entity_df. Defaults to True.
- get_latest_entities(view: Union[str, LabelView, Service, FeatureView], entity: Optional[DataFrame] = None) -> DataFrame
Get the latest entity and its timestamp from a single FeatureView or LabelView, or from a materialized Service. entity can be None (all joined entities in the view), entity names, or entity values (specific entities).
- Parameters
view – view to look up
- get_period_features(feature_view: Union[str, FeatureView, Service], entity_df: DataFrame, period: str, features: Optional[List[str]] = None, include: bool = True, **kwargs) -> DataFrame
Time-series prediction use: get the features of entity_df over the past period from feature views.
- Parameters
feature_view – A single FeatureView, or a Service (after it has been materialized), to look up.
entity_df (pd.DataFrame) – condition
period (str) – length of the look-back window
features (List, optional) – features to return. Defaults to None, which means all features.
include (bool, optional) – include timestamp defined in entity_df or not. Defaults to True.
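The look-back behaviour of get_period_features can be pictured as a window ending at each entity's timestamp. The sketch below is an interpretation, not the library's code. `look_back` is a hypothetical helper, rows are plain `(timestamp, value)` tuples, the window start is assumed to be exclusive, and `include` is assumed to decide whether a record exactly at the entity timestamp is kept.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Row = Tuple[datetime, float]


def look_back(rows: List[Row], as_of: datetime, period: timedelta, include: bool = True) -> List[Row]:
    """Select rows in the window (as_of - period, as_of], or (.., as_of) if not include."""
    start = as_of - period
    selected = []
    for ts, value in rows:
        upper_ok = ts <= as_of if include else ts < as_of
        if start < ts and upper_ok:
            selected.append((ts, value))
    return selected


rows = [(datetime(2022, 1, d), float(d)) for d in range(1, 6)]
print(look_back(rows, datetime(2022, 1, 5), timedelta(days=2)))
```

A look-forward window for labels would mirror this, with the window starting at the entity timestamp instead of ending there.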
- get_period_labels(label_view: Union[str, LabelView, Service], entity_df: DataFrame, period: str, include: bool = False, **kwargs) -> DataFrame
Time-series prediction use: get the labels of entity_df over the period, from start to end, from label views.
- Parameters
label_view – A single LabelView, or the name of a Service (after it has been materialized), to look up.
entity_df (pd.DataFrame) – condition
period (str) – length of the look-forward window; can be negative, e.g. -1 days
include (bool, optional) – whether to include the timestamp defined in entity_df. Defaults to False.
- materialize(service_name: str, start: Optional[str] = None, end: Optional[str] = None, fromnow: Optional[str] = None)
Incrementally join views to generate tables.
- Parameters
service_name (str) – name of the service to materialize
start (str) – start of materialization
end (str) – end of materialization
fromnow (str) – time interval from now
- query(*args, **kwargs) -> DataFrame
Run a query through the configured offline store. The use case of this method depends heavily on the type of offline store.
- schedule_local_dbt_container(profile_name: str, vars: Dict, dbt_path: str)
- stats(view: Union[str, LabelView, Service, FeatureView], entity_df: Optional[DataFrame] = None, features: Optional[List[str]] = None, group_key: Optional[List[str]] = None, fn: str = 'mean', start: Optional[str] = None, end: Optional[str] = None, include: str = 'both', keys_only: bool = False) -> DataFrame
Get the results of the statistical function fn over entity_df from views, from start to end. Only works for numeric features that vary with time.
- Parameters
view – name of the view to look up
entity_df (pd.DataFrame, optional) –
group_key (list) – joined columns to compute stats on; only takes effect when entity_df is None. None means computing stats on the joined entities; [] is also accepted and means no grouping.
fn (str, optional) – statistical method: min, max, std, avg, mode, median. Defaults to “mean”.
start (str, optional) – start time. Defaults to None. Only takes effect when entity_df is None.
end (str, optional) – end time. Defaults to None. Only takes effect when entity_df is None.
include (str, optional) – whether to include the start or end timestamp
keys_only (bool, optional) – whether to take action on keys, only available when fn=unique, return a list
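The grouping behaviour of `stats` can be approximated with the standard library: group values by key, then apply the chosen function to each group. This is a sketch only. `grouped_stat` and the `(key, value)` row layout are hypothetical, and the mapping from `fn` names to functions is an assumption rather than the library's dispatch table.

```python
import statistics
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Assumed mapping from fn names to implementations.
_FNS: Dict[str, Callable] = {
    "mean": statistics.mean,
    "median": statistics.median,
    "mode": statistics.mode,
    "std": statistics.stdev,  # needs at least two values per group
    "min": min,
    "max": max,
}


def grouped_stat(rows: List[Tuple[str, float]], fn: str = "mean") -> Dict[str, float]:
    """Group (key, value) rows by key and apply the statistic named by fn."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {key: _FNS[fn](values) for key, values in groups.items()}


rows = [("a", 1.0), ("a", 3.0), ("b", 5.0)]
print(grouped_stat(rows), grouped_stat(rows, fn="max"))
```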
Module contents
- class aie_feast.FeatureStore(project_folder=None, url=None, token=None, projectID=None)
Bases: object
The same class as aie_feast.featurestore.FeatureStore, exposed at package level. Its methods are documented in the aie_feast.featurestore module section above.