Artificial Intelligence and Machine Learning 46 - offer of personal tuition

 

I am offering personal tuition for AI, DL, ML, statistics, data analysis, predictions, and much, much more. For free and forever.

leopodstanicky@seznam.cz

sk

leo podstanicky

 

 

DNSO AI, ML, DL


terms - ML - machine learning, AI, DL - deep learning, data flow, Spark, Kafka, streaming, predictive analytics, orchestration, pipelines, harvesting, serverless, containerization, Docker, notebook, Ansible, Jupyter, Python, modelling, artifacts, affinity cards, Pandas, scikit-learn, R, OML4Py, feature, fit, overfitting, adaptive sampling, model tuning, classification, regression, Naïve Bayes, random forest, decision tree, neural network, general linear model, GLM - Ridge regression, support vector machine - linear or Gaussian, precision, recall, F1, ROC AUC, confusion matrix

Based on:

Course : Oracle Cloud Infrastructure Foundations

https://mylearn.oracle.com/ou/component/-/108432/142100

https://docs.oracle.com/en/database/oracle/machine-learning/

https://docs.oracle.com/en/cloud/paas/autonomous-database/oml-tour/

 

The aim of this DNSO is to create a working environment for turning a cloud database into a fully predictive analytics engine.
This means using data integration techniques - ETL, data flow tools - Spark, streaming - Kafka, and containerization - Docker, and using DevOps data orchestration pipelines needed for a clear layout and deep understanding of data.


What Is Machine Learning?
Machine learning is a technique that discovers previously unknown relationships in data.

ML automatically searches potentially large stores of data, using sophisticated algorithms to discover patterns and trends that go beyond simple statistical analysis and to build models. Those models can be used to make predictions and forecasts and to categorize data.

The key features are:

Automatic discovery of patterns

Prediction of likely outcomes

Creation of actionable information

Ability to analyze potentially large volumes of data

ML can answer questions that cannot be addressed through traditional deductive query and reporting techniques.

Benefits
A powerful technology that can help you find patterns and relationships within your data.

Find trends and patterns - ML discovers hidden information in your data. DB admins are often already aware of important patterns as a result of working with the data over time; ML can confirm or qualify such empirical observations, in addition to finding new patterns that are not immediately distinguishable through simple observation. It can discover predictive relationships that are not causal, and it can handle large volumes of data, unstructured data, and data in data lakes.

Make data-driven decisions - Many companies have big data, and extracting meaningful information from that data is important for making data-driven business decisions. By leveraging ML algorithms, admins are able to transform data into knowledge and actionable intelligence. As demands change, you are able to make better decisions faster by using ML techniques.

To summarize, ML can:

easily identify trends and patterns
facilitate early anomaly detection
minimize manual intervention by "learning"
handle multidimensional data

Define your business problem

When facing problems such as classifying documents, predicting financial outcomes, or detecting hidden patterns and anomalies, ML can help solve them, provided that you have a clear understanding of the business problem, enough data, and learn to ask the right questions to obtain meaningful results. Note - if there is "not enough data" available, use two techniques for making predictions: divide the existing dataset into two sets, and/or use a random sample of the data, which is often more successful than exhaustive big-table scans while saving resources at the same time.
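A minimal sketch of both techniques, assuming pandas and scikit-learn; the customers.csv file is a made-up placeholder:

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("customers.csv")  # hypothetical data source

# Technique 1: divide the existing dataset into two sets (train/test holdout).
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)

# Technique 2: work on a random sample instead of an exhaustive full-table scan.
sample_df = df.sample(frac=0.1, random_state=42)  # 10% random sample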

The patterns you find may be very different depending on how you formulate the problem.

Predictions have an associated probability (How likely is this prediction to be true?). Prediction probabilities are also known as confidence (How confident can I be of this prediction?). Some forms of predictive ML generate rules, which are conditions that imply a given outcome.

Other forms of ML identify groupings in the data.

What Do You Want to Do?
Multiple ML techniques, also referred to as "mining functions", are available through Oracle Database and Oracle Autonomous Database. Depending on your business problem, you can identify the appropriate mining function, or combination of mining functions, and select the algorithm(s) that may best support the solution.

For some mining functions, you can choose from among multiple algorithms. For specific problems, one technique or algorithm may be a better fit than others, or more than one algorithm can be used to solve the problem. Remember the pitfall of overfitting - selecting too many features makes the data model "noisy" and forces it to follow paths that in fact obscure the sound data model, i.e. it ends up following the noise.

OML provides ML capabilities within Oracle Database by offering a broad set of in-database algorithms to perform a variety of ML techniques such as Classification, Regression, Clustering, Feature Extraction, Anomaly Detection, Association (Market Basket Analysis), and Time Series. Others include Attribute Importance, Row Importance, and Ranking. OML uses built-in features of Oracle Database to maximize scalability, memory efficiency, and performance. OML is also integrated with open source languages such as Python and R. Through the use of open source packages from R and Python, users can extend this set of techniques and algorithms in combination with embedded execution from OML4Py and OML4R.

Discover More Through Interfaces
Oracle supports programming language interfaces for SQL, R, and Python, and no-code user interfaces such as OML AutoML UI and Oracle Data Miner, and REST model management and deployment through OML Services.

Oracle Machine Learning Notebooks (OML Notebooks) is based on Apache Zeppelin technology, enabling you to perform ML in Oracle Autonomous Database (Autonomous Data Warehouse (ADW), Autonomous Transaction Processing (ATP), and Autonomous JSON Database (AJD)). OML Notebooks helps users explore, visualize, and prepare data, and develop and document analytical methodologies.

AutoML User Interface (AutoML UI) provides you no-code automated ML. When you create and run an experiment in AutoML UI, it automatically performs algorithm and feature selection, as well as model tuning and selection, thereby enhancing productivity as well as model accuracy and performance. Business users without extensive data science background can use AutoML UI to create and deploy models.

Oracle Machine Learning Services (OML Services) extends OML functionality to support model deployment and model lifecycle management for both in-database OML models and third-party Open Neural Networks Exchange (ONNX) format ML models through REST APIs. The REST API for Oracle Machine Learning Services provides REST API endpoints hosted on Oracle Autonomous Database. These endpoints enable you to store ML models along with their metadata, and to create scoring endpoints for the models.

Oracle Machine Learning for Python (OML4Py) enables you to run Python commands and scripts for data transformations and for statistical, ML, and graphical analyses on data stored in or accessible through Oracle Autonomous Database service using a Python API. OML4Py is a Python module that enables Python users to manipulate data in database tables and views using Python syntax. OML4Py functions and methods transparently translate a select set of Python functions into SQL for in-database execution. OML4Py users can use Automated Machine Learning (AutoML) to enhance user productivity and ML results through automated algorithm and feature selection, as well as model tuning and selection. Users can use Embedded Python Execution to run user-defined Python functions in Python engines spawned by the Autonomous Database environment.
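As a rough illustration of the OML4Py flow described above - connection details, the CUSTOMERS table, and the AFFINITY_CARD target column are placeholders, so treat this as a sketch rather than a ready-made script:

import oml

oml.connect(user="ml_user", password="***", dsn="mydb_low")  # placeholder credentials

dat = oml.sync(table="CUSTOMERS")          # proxy object; data stays in the database
train, test = dat.split(ratio=(0.8, 0.2))  # in-database train/test split
train_x = train.drop("AFFINITY_CARD")
train_y = train["AFFINITY_CARD"]

model = oml.glm("classification")          # in-database Generalized Linear Model
model = model.fit(train_x, train_y)
pred = model.predict(test.drop("AFFINITY_CARD"))  # scoring runs in the database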

Oracle Machine Learning for R (OML4R) provides a database-centric environment for end-to-end analytical processes in R.

Oracle Machine Learning for SQL (OML4SQL) provides SQL access to powerful, in-database ML algorithms.

Oracle Data Miner (ODMr) is an extension to Oracle SQL Developer. Oracle Data Miner is a graphical user interface to discover hidden patterns, relationships, and insights in data. ODMr provides a drag-and-drop workflow editor to define and capture the steps that users take to explore and prepare data and apply ML technology.

Oracle Machine Learning for Spark (OML4Spark) provides scalable ML algorithms through R API for Spark and Hadoop environments to explore and prepare data and build and deploy ML models. OML4Spark is a component of the Oracle Big Data Connectors and included with Oracle Big Data Service.


https://docs.oracle.com/en/database/oracle/machine-learning/oml4sql/21/mlsql/process-overview.html#GUID-628EF12F-57D4-476A-844B-0461C47918DF

Process Overview
The lifecycle of an ML project is divided into six phases. The process begins by defining a business problem and restating the business problem in terms of an ML objective. The end goal of an ML process is to produce accurate results for solving your business problem.

Workflow
The ML process workflow illustration is based on the CRISP-DM methodology. Each stage in the workflow is illustrated with points that summarize the key tasks. CRISP-DM is the most commonly used methodology for ML.

The following are the phases of the ML process:
Define business goals
Understand data
Prepare data
Develop models
Evaluate
Deploy

Additional links

https://www.datasciencecentral.com/profiles/blogs/crisp-dm-a-standard-methodology-to-ensure-a-good-outcome

https://www.sv-europe.com/crisp-dm-methodology/


Define Business Goals
The first phase of the ML process is to define business objectives. This initial phase of a project focuses on understanding the project objectives and requirements.

Once you have specified the problem from a business perspective, you can formulate it and develop a preliminary implementation plan. Identify success criteria to determine if the ML results meet the business goals defined.

Specify objectives
Determine ML goals
Define success criteria
Produce project plan
Understand Data

The data understanding phase involves data collection and exploration which includes loading the data and analyzing the data for your business problem.

Harvesting data

Harvesting is the first step. Data is gathered from a wide variety of sources and enriched with annotations and attributes, then compared against the Data Catalog and asset inventory.

Assess the various data sources and formats. Load the data and explore the relationships in it so it can be properly integrated. Query and visualize the data to address specific data mining questions such as the distribution of attributes, relationships between pairs or small numbers of attributes, and cardinality, and perform simple statistical analysis. As you take a closer look at the data, you can determine how well it can be used to address the business problem. You can then decide to remove some of the data or add additional data. This is also the time to identify data quality problems such as:
Is the data complete?
Are there missing values in the data?
What types of errors exist in the data and how can they be corrected?

To summarize, in this phase, you will:
Access and collect data
Explore data
Assess data quality
Prepare Data
The preparation phase involves finalizing the data and covers all the tasks involved in getting the data into a format that you can use to build the model.

Data preparation tasks are likely to be performed multiple times, iteratively, and not in any prescribed order. Tasks can include column (attributes) selection as well as selection of rows in a table. You may create views to join data or materialize data as required, especially if data is collected from various sources. To cleanse the data, look for invalid values, foreign key values that don't exist in other tables, and missing and outlier values. To refine the data, you can apply transformations such as aggregations, normalization, generalization, and attribute constructions needed to address the ML problem. For example, you can filter out rows representing outliers in the data or filter columns that have too many missing or identical values.

Additionally, you can add new computed attributes in an effort to tease information closer to the surface of the data. This process is referred to as feature engineering. It is similar to creating virtual columns by computing values from real ones.
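A small pandas sketch of such computed attributes - the orders.csv file and column names are made up for illustration:

import pandas as pd

df = pd.read_csv("orders.csv")  # hypothetical table
df["order_date"] = pd.to_datetime(df["order_date"])

df["price_per_unit"] = df["total_price"] / df["quantity"]  # ratio feature
df["order_month"] = df["order_date"].dt.month              # date decomposition
df["is_bulk_order"] = (df["quantity"] > 100).astype(int)   # threshold flag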


Thoughtful data preparation and feature engineering that capture domain knowledge can significantly improve the patterns discovered through ML. Enabling the data professional to perform data assembly, data preparation, data transformations, and feature engineering inside the Oracle Database is a significant distinction for Oracle.

Note: Oracle Machine Learning supports Automatic Data Preparation (ADP), which greatly simplifies the process of data preparation.

To summarize, in this phase, you will:
Clean, join, and select data
Transform data
Engineer new features

Develop Models
In this phase, you select and apply various modeling techniques and tune the algorithm parameters, called hyperparameters, to desired values.

If the algorithm requires specific data transformations, then you need to step back to the previous phase to apply them to the data. For example, some algorithms allow only numeric columns, so string categorical data must be "exploded" using one-hot encoding prior to modeling. In preliminary model building, it often makes sense to start with a sample of the data, since the full data set might contain millions or billions of rows. Getting a feel for how a given algorithm performs on a subset of data can help identify data quality issues and algorithm setting issues sooner in the process, reducing time-to-initial-results and compute costs.

For supervised ML, data is typically split into train (build) and test data sets using an 80-20% or 60-40% distribution. After splitting the data, build the model with the desired model settings. Use default settings or customize by changing the model setting values. Settings can be specified through OML's PL/SQL, R, and Python APIs. Evaluate model quality through metrics appropriate for the technique. For example, use a confusion matrix, precision, and recall for classification models; RMSE for regression models; cluster similarity metrics for clustering models; and so on.
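The same build-and-evaluate loop, sketched with scikit-learn on a bundled demo dataset (an 80-20 split and one customized setting, max_depth, chosen arbitrarily for illustration):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix, precision_score, recall_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = DecisionTreeClassifier(max_depth=5).fit(X_train, y_train)  # customized setting
y_pred = model.predict(X_test)

print(confusion_matrix(y_test, y_pred))  # trade-offs between error types
print("precision:", precision_score(y_test, y_pred))
print("recall:", recall_score(y_test, y_pred))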

Automated Machine Learning (AutoML) features may also be employed to streamline the iterative modeling process, including algorithm selection, attribute (feature) selection, and model tuning and selection.

To summarize, in this phase, you will:
Explore different algorithms
Build, evaluate, and tune models



Evaluate
At this stage of the project, it is time to evaluate how well the model satisfies the originally-stated business goal.

During this stage, you will determine how well the model meets your business objectives and success criteria, including the trade-offs shown in the confusion matrix and the costs associated with false positives or false negatives.

Perform a thorough review of the process to ensure that important tasks and steps have not been overlooked. This is a quality check based on which you can determine the next steps: deploy the project, initiate further iterations, or test the project in a pre-production environment, constraints permitting.

To summarize, in this phase:
Review business objectives
Assess results against success criteria
Determine next steps

Deploy
Deployment is the use of ML within a target environment. In the deployment phase you can derive data-driven insights and actionable information.

Deployment can involve scoring (applying a model to new data), extracting model details (for example the rules of a decision tree), or integrating ML models within applications, data warehouse infrastructure, or query and reporting tools.

Because Oracle Machine Learning builds and applies ML models inside Oracle Database, the results are immediately available. Reporting tools and dashboards can easily display the results. Additionally, scoring single cases or records at a time with dynamic, batch, or real-time scoring is supported. Data can be scored and the results returned within a single database transaction.

To summarize:
Plan enterprise deployment
Integrate models with application for business needs
Monitor, refresh, retire, and archive models
Report on model effectiveness

Serverless

Data science and ML should concentrate on compute, hence a serverless approach is used. Data is then put into the build phase, trained on as a dataset, and managed according to its source and relevance by different ML techniques.

Tools

Notebooks

Create and Run AutoML Experiment

No-code automation for running experiments.

Steps of AutoML UI pipeline:

OML AutoML UI Interactive Learning

Data source - AutoML experiment - prediction type - select ML model - algorithm selection - adaptive sampling (to speed up feature selection) - feature selection (selects the subset of features most predictive of the target; the goal is to reduce the number of features used) - model tuning (aims to increase model quality based on the selected metric) - feature prediction impact

Create AutoML UI Experiment
To use the Oracle Machine Learning AutoML UI, you start by creating an experiment. An experiment is a unit of work that minimally specifies the data source, prediction target, and prediction type. After the experiment runs successfully, it presents you with a list of ML models in order of model quality according to the selected metric. You can select any of the models for deployment or to generate a notebook. The generated notebook contains Python code using OML4Py and the specific settings AutoML used to produce the model.

In Schema column, select a schema.

Note: While you select the data source, statistics are displayed in the Features grid at the bottom of the experiment page.

In the Predict drop-down list, select the column from the selected table. This is the target for your prediction.
In the Prediction Type field, the prediction type is automatically selected based on your data definition. However, you can override the prediction type from the drop-down list, if data type permits.


Supported Prediction Types are:

Classification: For non-numeric data type, Classification is selected by default.
Regression: For numeric data type, Regression is selected by default.

Maximum Top Models: Select the maximum number of top models to create. The default is 5 models. You can reduce the number of top models to 2 or 3 since tuning models to get the top one for each algorithm requires additional time. If you want to get the initial results even faster, consider the top recommended algorithm. For this, set the Maximum Top Models to 1. This will tune the model for that algorithm.
Maximum Run Duration: This is the maximum time for which the experiment will be allowed to run. If you do not enter a time, then the experiment will be allowed to run for up to the default, which is 8 hours.

Database Service Level: This is the database connection service level and query parallelism level. The default is Low, which results in no parallelism and sets a high runtime limit. You can create many connections with the Low database service level. You can also change your database service level to Medium or High.

High level gives the greatest parallelism but significantly limits the number of concurrent jobs.
Medium level enables some parallelism but allows greater concurrency for job processing.

Model Metric: Select a metric to choose the winning models. The following metrics are supported by AutoML UI:
For Classification, the supported metrics are:
Balanced Accuracy
ROC AUC
F1 (with weighted options). The weighted options are weighted, binary, micro and macro.
Precision (with weighted options)
Recall (with weighted options)
For Regression, the supported metrics are:
R2 (default)
Negative mean squared error
Negative mean absolute error
Negative median absolute error
Algorithm: The supported algorithms depend on the Prediction Type that you have selected. Click the corresponding checkbox to select an algorithm. By default, all the candidate algorithms are selected for consideration as the experiment runs. The supported algorithms for the two Prediction Types:
For Classification, the supported algorithms are:
Decision Tree
Generalized Linear Model
Generalized Linear Model (Ridge Regression)
Neural Network
Random Forest
Support Vector Machine (Gaussian)
Support Vector Machine (Linear)
For Regression, the supported algorithms are:
Generalized Linear Model
Generalized Linear Model (Ridge Regression)
Neural Network
Support Vector Machine (Gaussian)
Support Vector Machine (Linear)

Note: You can remove algorithms from consideration if you have preferences for particular algorithms or have specific requirements. For example, if model transparency is essential, then excluding models such as Neural Network would make sense. Note that some algorithms are more compute intensive than others. For example, Naïve Bayes and Decision Tree are normally faster than Support Vector Machine or Neural Network.
Expand the Features grid to view the statistics of the selected table. The supported statistics are Distinct Values, Minimum, Maximum, Mean, and Standard Deviation. The supported data sources for Features are tables, views and analytic views. The target column that you selected in Predict is highlighted here. After an experiment run is completed, the Features grid displays an additional column Importance. Feature Importance indicates the overall level of sensitivity of prediction to a particular feature.

Features

Start Experiment Options
Click Start to run the experiment and start the AutoML UI workflow, which is displayed in the progress bar. Here, you have the option to select:

Faster Results: Select this option if you want to get candidate models sooner, possibly at the expense of accuracy. This option works with a smaller set of the hyperparameter combinations, and hence yields faster results.

Better Accuracy: Select this option if you want more pipeline combinations to be tried for possibly more accurate models. A pipeline is defined as an algorithm, selected data feature set, and set of algorithm hyperparameters.

Note: This option works with the broader set of hyperparameter options recommended by the internal meta-learning model. Selecting Better Accuracy will make your experiment take longer to run.

Once you start an experiment, the progress bar appears displaying different icons to indicate the status of each stage of the ML workflow in the AutoML experiment. The progress bar also displays the time taken to complete the experiment run.


View an Experiment
In the AutoML UI Experiments page, all the experiments that you have created are listed. Each experiment will be in one of the following stages: Completed, Running, and Ready.

To view an experiment, click the experiment name. The Experiment page displays the details of the selected experiment. It contains the following sections:

Experiment

In this section, you can edit the selected experiment.

Metric Chart

The Model Metric Chart depicts the best metric value over time as the experiment runs. It shows improvement in accuracy as the running of the experiment progresses. The display name depends on the selected model metric when you create the experiment.

Leader Board

When an experiment runs, it starts to show the results in the Leader Board. The Leader Board displays the top performing models relative to the model metric selected along with the algorithm and accuracy. You can view the model details and perform the following tasks:

View Model Details: Click on the Model Name to view the details. The model details are displayed in the Model Details dialog box. You can click multiple models on the Leader Board, and view the model details simultaneously. The Model Details window depicts the following:

Prediction Impact: Displays the importance of the attributes in terms of the target prediction of the models.

Confusion Matrix: Displays the different combination of actual and predicted values by the algorithm in a table. Confusion Matrix serves as a performance measurement of the ML algorithm.

Deploy: Select any model on the Leader Board and click Deploy to deploy the selected model.



Create Notebook: Select any model on the Leader Board and click Create Notebook to recreate the selected model from code (see Create Notebooks from AutoML UI Models below).
Metrics: Click Metrics to select additional metrics to display in the Leader Board. The additional metrics are:

For Classification

https://www.bmc.com/blogs/confusion-precision-recall/

Accuracy: Calculates the proportion of correctly classified cases - both Positive and Negative. For example, if there are a total of TP (True Positives)+TN (True Negatives) correctly classified cases out of TP+TN+FP+FN (True Positives+True Negatives+False Positives+False Negatives) cases, then the formula is: Accuracy = (TP+TN)/(TP+TN+FP+FN)

Balanced Accuracy: Evaluates how good a binary classifier is. It is especially useful when the classes are imbalanced, that is, when one of the two classes appears a lot more often than the other. This often happens in many settings such as Anomaly Detection etc.

Recall: Calculates the proportion of actual Positives that are correctly classified. Recall = TP/(TP + FN)

Precision: Calculates the proportion of predicted Positives that are True Positives. Precision = TP/(TP + FP)

F1 Score: Combines precision and recall into a single number. F1-score is computed using harmonic mean which is calculated by the formula: F1-score = 2 × (precision × recall)/(precision + recall)
For Regression:

R2 (Default): A statistical measure that calculates how close the data are to the fitted regression line. In general, the higher the value of R-squared, the better the model fits your data. The value of R2 is always between 0 and 1, where:
0 indicates that the model explains none of the variability of the response data around its mean.
1 indicates that the model explains all the variability of the response data around its mean.

Negative Mean Squared Error: This is the mean of the squared difference of predicted and true targets.

Negative Mean Absolute Error: This is the mean of the absolute difference of predicted and true targets.

Negative Median Absolute Error: This is the median of the absolute difference between predicted and true targets.
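The classification and regression metrics above follow directly from their definitions; the counts and values in this sketch are made up for illustration:

# Classification metrics from confusion-matrix counts
tp, tn, fp, fn = 80, 90, 10, 20  # illustrative counts

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * (precision * recall) / (precision + recall)

# Regression error metrics, negated so that higher is better (as in AutoML UI)
y_true = [3.0, 5.0, 2.5]
y_pred = [2.5, 5.0, 4.0]
errors = [p - t for p, t in zip(y_pred, y_true)]
neg_mse = -sum(e * e for e in errors) / len(errors)
neg_mae = -sum(abs(e) for e in errors) / len(errors)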


Features

The Features grid displays the statistics of the selected table for the experiment. The supported statistics are Distinct Values, Minimum, Maximum, Mean, and Standard Deviation. The supported data sources for Features are tables, views, and analytic views. The target column that you selected in Predict is highlighted here. After an experiment run is completed, the Features grid displays an additional column, Importance. Feature Importance indicates the overall level of sensitivity of prediction to a particular feature. Hover your cursor over the graph to view the value of Importance. The value is always depicted in the range 0 to 1, with values closer to 1 being more important.

Features Section
Create Notebooks from AutoML UI Models
You can create notebooks using OML4Py code that will recreate the selected model using the same settings. It also illustrates how to score data using the model. This option is helpful if you want to use the code to re-create a similar ML model.

To create a notebook from an AutoML UI model:
Select the model on the Leader Board based on which you want to create your notebook, and click Create Notebook. The Create Notebook dialog opens.


Create Notebook
In the Notebook Name field, enter a name for your notebook.
The REST API endpoint derives the experiment metadata, and determines the following settings as applicable:
Data Source of the experiment (schema.table)
Case ID. If the Case ID for the experiment is not available, then the appropriate message is displayed.
A unique model name based on the current model name is generated
Information related to scoring paragraph:
Case ID: If available, then it merges the Case ID column into the scoring output table
Generate unique predict output table name based on build data source and unique suffix
Prediction column name: PREDICTION
Prediction probability column name: PROBABILITY (applicable only for Classification)
Click OK. The generated notebook is listed in the Notebooks page. Click it to open the notebook.
The generated notebook displays paragraph titles for each paragraph along with the Python code. Once you run the notebook, it displays information related to the notebook as well as the AutoML experiment, such as the experiment name, the workspace and project in which the notebook is present, the user, data, prediction type and prediction target, algorithm, and the time stamp when the notebook was generated.


https://www.oracle.com/a/tech/docs/otn-batch1/oml-automl-ui-tech-brief.pdf


Summary
Oracle Machine Learning (OML) in Oracle Autonomous Database provides access to in-database ML algorithms and functionality. OML AutoML User Interface (UI) makes ML easy, providing an easy-to-use interface that automates repetitive, time-consuming tasks normally performed by expert data scientists, while simplifying ML for non-expert users. OML AutoML UI accelerates the ML process from model building to model deployment.

Specify a data table and the target attribute, and OML AutoML UI builds several models for you to consider. OML AutoML UI automatically preprocesses the data, picks the best in-database candidate algorithm(s) for the experiment, selects the right input data samples and features to improve model quality, and speeds up model tuning. OML AutoML UI builds OML models using the selected algorithms, tunes their hyperparameters, and displays accuracy metrics so users can select the model that best meets their needs. Oracle Machine Learning AutoML User Interface is an easy-to-use tool that automates routine ML steps, including algorithm and feature selection, model tuning, and deployment using in-database OML algorithms. OML AutoML UI enables citizen data scientists to build and select ML models, increases data scientist productivity, and accelerates model deployment.


What is Different about OML AutoML User Interface?
Oracle Machine Learning AutoML User Interface makes it easy to build and deploy ML models. OML AutoML UI, a new component of Oracle Machine Learning on Oracle Autonomous Database, provides a no-code browser-based interface that automates the ML modeling process and simplifies deployment.


OML AutoML UI delivers significant productivity improvements for data scientists as a modeling accelerator - allowing automation to produce an initial model, and then providing the specific hyperparameters to continue tuning or to augment the model directly in a notebook generated by OML AutoML UI, which reproduces the OML4Py code for the specified model.

OML AutoML UI simplifies ML, automates many routine but time-consuming steps, and increases data scientist productivity while reducing the overall compute time required to deliver ML models. With OML AutoML UI:

- Data scientists quickly build ML models and generate notebooks that can be extended and scheduled to run, facilitating data science team collaboration
- Data scientists increase their productivity and easily create, explore, and evaluate multiple models, automating ML best practices while leveraging OML in-database algorithms and ML functionality
- Application developers accelerate ML model deployment via in-database scoring, modifying generated OML notebooks, and using REST endpoints with OML Services for application integration.


Automated Machine Learning Pipeline

OML AutoML UI uses the concept of an experiment to create an ML pipeline that automates several time-consuming and repetitive tasks performed by data scientists: algorithm and feature selection, data sampling, model building and evaluation, and hyperparameter tuning.

OML AutoML User Interface automates four major time-consuming and tedious steps in the ML modeling process - algorithm selection, data sampling, feature selection, and model tuning - to deliver significant productivity improvements for data scientists and citizen data scientists.

Algorithm Selection
In each experiment, AutoML identifies the most promising algorithms for building models for the specified data and target. Using models built from a wide range of datasets, automated algorithm selection uses meta-learning – where based on the distribution of values or meta-features in the data, a pre-built model predicts which algorithms are most likely to produce the best results. The algorithms with the highest scores are later used for model tuning. This helps data scientists and non-expert users to find the best algorithm candidates faster than with exhaustive search. This can also reduce compute costs.

Adaptive Sampling
OML AutoML UI uses adaptive sampling to determine the right sampling percentage such that increasing sample size does not further improve model quality. This also speeds up model building. Further, adaptive sampling detects unbalanced datasets that can cause poor models to be built and employs stratified sampling as necessary to create balanced datasets for building better models.

Feature Selection
Attributes that have no correlation with the target attribute, have too many constants or missing values, or have too high cardinality can reduce model quality while increasing model building and data scoring time. OML AutoML UI pre-processes the input data and automatically removes those attributes that contain little information or, worse, noise. OML AutoML UI first ranks the features and evaluates subsets based on these rankings, using several techniques:

- Correlation score computed by Pearson's correlation coefficient (SQL CORR-related functions)

- Attribute importance from a trained Random Forest model

- Coefficients from a trained Generalized Linear Model

- Attribute importance from the Minimum Description Length model

To support faster response times, automated feature selection uses meta-learning internally to prune the number of possible rankings and feature subsets to be evaluated. This reduces model building and hyperparameter tuning time. OML AutoML UI employs this multifaceted approach for ranking and selecting candidate model input features. Users can manually override to add or remove input features.

Model Tuning
After feature selection, OML AutoML UI builds and tunes multiple ML models, taking advantage of OML's in-database processing. OML AutoML UI performs model tuning for each selected algorithm to find hyperparameter settings that significantly improve the model, but without resorting to exhaustive search. For example, in a decision tree, AutoML explores and optimizes the maximum tree depth and minimum split percentage for the algorithm. In a neural network, AutoML explores a range of hidden layers and neurons per layer. OML AutoML UI offers the user the option to select Better Accuracy, which may take longer, or Faster Results, which will strive to build a good model more quickly. When building models, OML leverages in-database Automatic Data Preparation (ADP) for dealing with missing values, data binning, data normalization, and data scaling.
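OML's internal search is not exhaustive, but the general idea of hyperparameter tuning can be illustrated with a plain scikit-learn grid search over tree depth and minimum split size (dataset and parameter values are arbitrary):

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [3, 5, 8], "min_samples_split": [2, 10, 50]},
    scoring="balanced_accuracy",
    cv=5,
)
grid.fit(X, y)                              # tries each hyperparameter combination
print(grid.best_params_, grid.best_score_)  # winning settings and their score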

Feature Prediction Impact
For each model built, OML AutoML UI uses a model-agnostic global explanation method that provides insights into the model's behavior. It displays the top attributes and their relative influence on the target attribute. After a model has been built, Feature Prediction Impact shuffles the current values of each attribute, assigning them to different records, and estimates and ranks feature importance based on the impact that each shuffled feature has on the trained machine learning model's predictions.
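This shuffle-based approach is a form of permutation importance; scikit-learn implements the same general idea, which can stand in as an illustration (not OML's own code):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Shuffle each feature's values and measure the drop in model score.
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
ranking = result.importances_mean.argsort()[::-1]  # most impactful features first
print(ranking[:5])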

Supported Machine Learning Functions, Algorithms and Model Metrics
OML AutoML UI supports the following ML functions, in-database algorithms, and score metrics:
Classification (binary and multiclass)
Algorithms: Naïve Bayes, Generalized Linear Models, Generalized Linear Models (Ridge Regression), Support Vector Machines (Linear), Support Vector Machine (Gaussian), Decision Tree, Random Forest, Neural Networks
Model Metrics: Accuracy, Balanced Accuracy, F1, Precision, Recall, ROC AUC, Recall Micro, Recall Macro, Recall Weighted, Precision Micro, Precision Macro, Precision Weighted, Confusion Matrix, Prediction Impacts

Regression
Algorithms: Generalized Linear Models, Generalized Linear Models (Ridge Regression), Support Vector Machines (Linear), Support Vector Machine (Gaussian), Neural Network
Model Metrics: R-squared, Negative Mean Squared Error, Negative Mean Absolute Error, Negative Median Absolute Error, Prediction Impacts

Note: OML AutoML UI does not support unstructured text data, nested data resulting from aggregations of transactional data, or partitioned models.


Once OML AutoML UI has built the ML models, data scientists and application developers can use multiple strategies to deploy OML models.
Generate Notebooks for Use in Oracle Machine Learning Notebooks
When OML AutoML UI has finished building models, users can generate a notebook in OML Notebooks from each model produced in the experiment. Such notebooks can be customized, scheduled to run automatically as jobs, and used for data scoring. Data scientists and application developers can modify and extend the generated notebook using OML4Py and OML4SQL interfaces.

Model Repository
OML on Oracle Autonomous Database also supports a Models interface to enable model management and deployment for in-database models.

Model Deployment
Users can deploy in-database models in several ways. As stated above, with OML AutoML UI users can generate OML notebooks from experiment-generated models. In addition, in-database models can be used directly in SQL queries to score data from tools such as SQL Developer, SQL Developer Web, Oracle Application Express (APEX), and Oracle Analytics Cloud. Use models in the same database where the model was built or in a different Oracle Database, whether on premises or cloud, for productionizing OML models using model export and import capabilities. Additionally, OML AutoML UI users can deploy models to Oracle Machine Learning Services, which supports model management and deployment with REST endpoints for OML models.

Summary
OML AutoML UI provides an easy-to-use interface that automates repetitive, time-consuming tasks typically taken by
expert data scientists, while at the same time simplifying ML for non-expert users. OML AutoML UI
accelerates the ML process from model building to model deployment for all users. OML AutoML UI
delivers significant new advances in ML simplicity, power, and functionality for enterprises to transform
data into actionable insights quickly.



Oracle Machine Learning for SQL

Through PL/SQL and SQL APIs, Oracle Machine Learning for SQL (OML4SQL) provides scalable in-database ML algorithms. The parallelized algorithms in the database keep data under database control. There is no need to extract data to separate ML engines, which adds latency to data access and raises concerns about data security, storage, and recency.
The algorithms are fast and scalable, support algorithm-specific automatic data preparation, and can score in batch or real-time. OML4SQL provides explanatory prediction details when scoring data, so you can understand why an individual prediction is made. Furthermore, Oracle's Exadata Smart Scan technology moves scoring processing to the data storage tier, resulting in significant performance gains when scoring data.

In-database ML models are first-class database objects. You can manage access by granting and revoking permissions, auditing user actions, and exporting and importing ML models across databases. With in-database models, deployment is instantaneous through SQL queries that use SQL prediction operators. OML4SQL reduces solution complexity significantly.

Links

https://docs.oracle.com/en/database/oracle/machine-learning/oml4sql/23/books.html
https://docs.oracle.com/en/database/oracle/machine-learning/oml4sql/23/dmprg/machine-learning-SQL.html#GUID-46E7420D-17D0-4683-B9F3-6AB42E9B1EB6
https://docs.oracle.com/en/database/oracle/machine-learning/oml4sql/23/dmprg/oracle-machine-learning-API-highlights.html#GUID-7D00AFBD-EDED-418C-81FB-576A83CA9536
https://docs.oracle.com/en/database/oracle/machine-learning/oml4sql/23/dmprg/preface.html#GUID-AEEC3CFB-AC99-475B-8589-EFE8BF5B6B20

 

 

 

DNSO Devops run cloud architecture

 

terms - bucket, artifact, manifest, Kubernetes=K8s, Docker, kubectl, CI/CD - continuous integration, delivery/deployment, Ansible, Terraform, GitHub, GitLab, containers, lifecycle, versioning, functions, HA - high availability, Cloudshell, tenancy, compartment, Flask, OKE - Oracle Container Engine for Kubernetes, Blue-Green deployment strategy, Canary deployment strategy, storefront, yaml spec file, CM - configuration management, IaC - Infrastructure as code, IAM - identity access management, pods, isolation, pipeline, trigger, PAT - personal access token, control plane, config map, hub and spoke, stack, vault, secrets, volumes, HSM - hardware security module

 

Based on

 

 

https://mylearn.oracle.com/ou/component/-/111402/157972

 

Course : Oracle Cloud Infrastructure DevOps Professional

 

The aim is to automate the whole cloud infrastructure.

 

Prerequisites - experience with application development and OCI.

 

 

The aim of this DNSO idea is managing and configuring infrastructure, designing SW to absorb changes efficiently, creating loosely coupled entities grouped into microservices, and finding ways to deploy and automate them. Secure everything in layers with proper tools - double encryption, which is permanent. Make instant and continuous HW/SW changes, patching, and upgrades. Implementation must be done in its entirety, hence it is best to absorb the whole Oracle DevOps course as presented at the link above. No part of it can be excluded, as it works as one orchestrated architectural design.

 

Use the DevOps lifecycle in eight stages - plan, code, build, test, release, deploy, operate, and monitor. Force and nurture CI and CD - continuous integration and delivery/deployment - using the power of versioning and of repositories like GitHub or GitLab. Use containers and artifacts, and tools like Ansible, Terraform, Grafana, and Kubernetes with their respective plug-ins. Secure OKE - Oracle Kubernetes Engine - with IAM and RBAC rules, Kubernetes secrets, pod security policies, network access rules, and multi-tenancy usage models, and secure Oracle Functions with IAM policies, network sources for restricting access, private network access via Oracle Functions support, etc.

 

Advantage of containers over VMs - much lower overhead, as all containers share the same underlying OS, so a restart of the infrastructure is done in milliseconds instead of seconds or minutes.

 

Main aim - to be able to continuously make changes addressing CM - configuration management - and IaC - infrastructure as code. CM makes sure that all systems have the same dependencies, using Ansible, while Terraform is used for IaC.

i.e.

CM is Ansible

IaC is Terraform

 

Both are versioned on platforms like GitHub and GitLab, from where they can be pulled. Terraform uses a hub-and-spoke topology where spokes can run specified infrastructure settings using the same main script on the hub.

 

OCI Resource manager - OCIRM

It is used for version control and planning of output. The cycle is State - Plan - Refresh - State - Diff - Apply - Infrastructure.

RM is a cloud-based Terraform flow for central management. A stack, a set of resources to manage infrastructure, is created. A plan with a job queue is created; it locks the state and then gets executed, generating a diff file. The plan is then applied to the infrastructure, with logs generated.

RM is then synced with the infrastructure to resolve drift, running drift detection first. Drift detection reports only on resources that Terraform already knew about. To provision new resources, a new stack is created.

 

Templates are made to contain the needed infrastructure, variables, constraints, and a schema.yaml file.

 

Microservices and container orchestration use the container registry and OKE: loose coupling and isolation of microservices in the Kubernetes model, easily coupled, highly available, and failure resistant.

 

The API Gateway is the entry point; behind it are microservices that can use different programming languages. Microservices architecture stands here against monolithic architecture, with multiple advantages.

 

Methodology of microservices using the 12 factors - codebase, dependencies, configuration, backing services, build/release/run, processes, port binding, concurrency, disposability, dev/prod parity, logs, admin processes

 

Benefits of microservices - easier to build and maintain, organised around business capabilities, improved productivity and speed of deployment, increased fault tolerance and isolation, greater scalability and flexibility, simplified security monitoring, autonomous, cross-functional teams

 

Containerization

The reason is that containers are lightweight (a common OS is used for all containers/pods), portable, and secure.

 

Docker - enables you to run lightweight container instances sharing the same kernel; it uses images stored in a repository. An image is then instantiated as a container using the docker run command.
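The same docker run step can also be scripted with the Docker SDK for Python (pip install docker); the image name here is just an example:

import docker

client = docker.from_env()              # talks to the local Docker daemon
client.images.pull("python:3.12-slim")  # fetch the image from the registry
container = client.containers.run(
    "python:3.12-slim", "python -c 'print(1 + 1)'", detach=True)
container.wait()                        # the container shares the host kernel
print(container.logs())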

 

OCI Registry - OCIR - maintains a consistent set of Oracle Docker images, enforcing access rights and security policies.

Benefits - integration, security, HA, anywhere access. One OCIR can accommodate 100,000 images, so granularity can be extreme without any issues.

 

At registry creation, an auth token must be generated to be able to connect using the docker login command in Cloud Shell. From there you pull/push images. Images can be tagged for versioning. A retention policy should then be set.

 

Kubernetes - a container orchestration tool automating the scaling of workloads. It groups containers into logical units called pods; identical pods may be assigned to the same or to other worker nodes, which are all aligned under one master node, or control plane. Containers live within the pods and are created from images. A node is the smallest worker unit and encapsulates applications as containers. You configure pods using a ConfigMap - this uses key-value pairs and keeps your application code separate from configuration, making applications portable; it stores connection strings, variables, public credentials, host names, and URLs, while sensitive values are stored using Secrets.
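A sketch of creating such a ConfigMap with the official Kubernetes Python client (pip install kubernetes); the names and values are placeholders:

from kubernetes import client, config

config.load_kube_config()  # reads the kubeconfig file (~/.kube/config)
v1 = client.CoreV1Api()

cm = client.V1ConfigMap(
    metadata=client.V1ObjectMeta(name="app-config"),
    data={"DB_HOST": "db.example.com", "LOG_LEVEL": "info"},  # non-secret values only
)
v1.create_namespaced_config_map(namespace="default", body=cm)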

 

Kubernetes = K8s - the abbreviation comes from the 8 letters between "K" and "s". K8s uses volumes, which are directories accessible to containers in pods.

 

Control plane - contains the kube-controller-manager, kube-apiserver, and kube-scheduler. These control the kubelets on the K8s worker nodes.

 

Features of K8s:

 

Health checks - readiness to send traffic to the pods, networking, service discovery, load balancing, logging, rolling updates with minimal downtime, managing multiple containers, automatic bin packing, container replication, container autoscaling, volume management, resource usage monitoring

 

OKE - provides a highly available control plane; three versions of K8s are always supported, and when a new one is rolled out, the oldest is supported for another 30 days.

 

Prerequisites to create OKE cluster

 

Access to an OCI tenancy, sufficient quota on resources - service limits, a ready-to-deploy compartment, configured network resources, policies, kubectl, and kubeconfig.

 

OKE needs at least two subnets to function, with their own SLs - security lists - and RTs - route tables. Worker nodes are formed within private subnets, which you use along with an internet gateway, a NAT gateway allowing only outbound traffic, and a service gateway for on-prem customer facilities.

 

Add required policies in format Allow group etc.

 

Then create the cluster, considering node types, workload characteristics, operational considerations, cost optimization, and scaling.

 

Deploy an app in OKE - create a compartment, create a K8s cluster within it, build a Python app with the Flask framework, create a Docker image, push it to the OCI container registry, use Cloud Shell to deploy the Docker app to the cluster, and connect to the app from the internet.
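A minimal Flask app of the kind used in this exercise might look like this (illustrative only):

from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    return "Hello from OKE!"

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)  # listen on all interfaces inside the container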

 

To work with Docker images in Cloud Shell you need to create a deployment manifest, a Kubernetes YAML file. It is then deployed using the kubectl create command.

 

Monitoring OKE cluster

 

- health, capacity, performance of cluster and worker nodes using metrics, alarms and notifications

 

Work requests are logs that help you monitor the processes on any level.

 

Accessing the OKE Dashboard - helps you assess the status of containerized apps. It cannot be used from Cloud Shell, though. The command to prove you are logged in to the OKE cluster is kubectl get nodes.

 

Then apply the recommended descriptor file, i.e. a manifest, using the kubectl apply -f http_path.yaml command.

 

K8s dashboard is a web based app giving you visualization of your environment.

 

The K8s cluster is scaled to optimize resources - for this you enable the autoscaling feature, and the K8s metrics server scans resource usage periodically. This is horizontal autoscaling.

 

Vertical scaling maintains ratios between limits and usage of resources. Scaling is done in the Node Pools pane.

 

For verification use kubectl get nodes -o wide command.

 

CI/CD - continuous integration/delivery-deployment

 

It is based on code repos, triggers, a build pipeline, delivered artifacts, a deploy pipeline, pulling the manifest, pulling the image, deploy targets - OKE, instance group, function - monitoring, notifications, logging, vault, and IAM.

CI

Frequent small code changes and error fixes are continuously integrated, with automatic builds and step tests.

CD

Delivery - final manual approval before deployment; Deployment - all steps automated

 

IAM policies

Assign users to appropriate groups. Create dynamic groups and policies granting users privileges.

 

 

The basic lineup for accessing resources is tenancy and compartment. Dynamic groups are created to manage each resource; add rules to manage resources, then create policies to perform operations within OCI. Policy rules are always written in the parent compartment. The Policy Builder offers preconfigured policies, but manual policy building is also possible.

 

Devops project and code repos

 

Stages - code repos, artifact/container registry, environment, build pipeline, deploy pipeline, IAM policy setting, notifications - create topics and subscriptions - and logging

 

Benefits - keep track of resources, faster SW delivery, enhanced security and reduced risk in delivery, enabled logging, monitoring, and notifications

 

Code repositories - centralized code creation, localized copies of code isolated from one another, strong version control 

 

Mirroring is done every 15 minutes by default.

 

A DevOps project needs a topic; then enable logging.

 

You can clone a GitHub repository to an OCI repo using Cloud Shell: in it, create a new OCI repo, then push the copied files to it. A build_spec.yaml is created that is later used to manage build pipelines.

 

Mirror Code repos

 

Generate a personal access token - PAT - from GitHub, create a vault secret for it, create dynamic groups and IAM policies to access the secret family in the compartment, then mirror the code repo using OCI repos.

 

Artifact registry

 

For SW deployment: ZIP files, library binaries, and others. You use versioning and rollback to previous versions.

These are container image repos, deployment configurations - YAML files, K8s manifests describing the desired state of an object, and general artifacts.

 

Build pipelines

 

parts - OCI devops project, code repo, build spec file, logging

 

A build is associated with a trigger that starts it from GitHub or OCIR on predefined events like push or pull. It is built in stages in the OCI console - build the storefront using the YAML spec file and run it, afterwards review the logs to check correctness.

 

Getting artifacts

 

Passing params into build pipeline, setting output artifacts, running pipeline, examining resulting artifacts.

 

Deployment pipeline

 

A sequence of steps for delivering and deploying artifacts to target environments.

 

Continuous Delivery - uses a manual approval in the Code-Build-Test-Approve-Deploy-Release cycle

 

Continuous Deployment - all steps are automatic

 

Advantages of a deployment pipeline:

- automate global rollout across OCI platforms

- execute deployments in multiple regions, in parallel or serially

- automate deployment to include testing and delivery

 

You can deploy into OKE, instance groups or functions

 

You first test, then use a staging environment, then a canary release, then manually approve and release to production.

 

Deployment strategies

 

BG Deployment strategy

 

Two identical environments are used, where only one of them is active at a given time. The active env is the current one; the new one - the standby - is the test env.

 

After testing is done in the green env, production is gradually transferred to the blue env, which is also used in case of rollbacks or disaster recovery. The switch is done via the LB. This deployment can be done to OKE and to an instance group. Validation of the deployment of a new release is done via Functions.

 

Benefits - risk free, quick rollback, testing, no downtime

 

Drawbacks - resource intensive (resource duplication), errors encountered when changing user routing, possible re-login to apps needed, service outages, managing dependencies is harder (schema changes etc.).

 

 

Canary deployment

 

Canary deployment reduces the risk of rolling out changes by starting with a small subset of users and gradually increasing the set of production users receiving the verified updates. First the changes are tested in a testing environment and then gradually transferred to the canary environment. As with the BG deployment strategy, it is used against OKE and Instance Group environments, using Functions to validate the deployment.

 

Benefits - you can test two app versions at a time while users are online, rollbacks are easy, low risk, zero downtime when updating and/or releasing a new version.

 

Drawbacks - greater overhead, slow rollout, observability is more difficult in a complex environment, some users are still exposed to SW issues.

 

HELM

https://helm.sh/

Used to apply/deploy bulk changes in parallel across different environments, Helm helps to manage K8s apps - define, upgrade, and install even the most complex ones. It is part of the Cloud Native Computing Foundation. Helm uses a Helm chart - a YAML-based package with service, deployment, and values YAML files. Helm creates releases, which are instances of charts running in a K8s cluster.

 

Helm deployment to OKE

Using cluster access in OKE to deploy a Helm chart using a PAS or PAT - personal access secret/token - stored in a vault, using Cloud Shell. You need to create a DevOps project with topics for sending notifications. Mirror the repository from GitHub. Build the pipeline, provide the spec YAML file - it will be used for the container image transfer - and provide the primary code repository. You will push Helm chart artifacts into a Helm repository.

 

For security see the chapter DevSecOps. It involves secrets, putting them in a vault, vault integration with OCI services, cluster, node pool and network security, multitenancy, and Functions security.

 

Observability - see part 6 - how to set up monitoring, notification, logging and events.

 

Other tools used

 

Flask

https://flask.palletsprojects.com/

A lightweight Python web application framework

 

Grafana

https://grafana.com/

An observability platform with rich graphical dashboards

 

 

  

 

 

 
