Okta Enhances Cybersecurity Event Detection with Autoencoders
Quick take - Okta is enhancing its cybersecurity event detection capabilities by integrating unsupervised learning techniques, specifically autoencoders, to improve the analysis of user behavior patterns and address the limitations of current log analysis methods.
Fast Facts
- Okta is enhancing its cybersecurity event detection by integrating unsupervised learning techniques, specifically autoencoders, to improve log analysis and reduce false positives.
- Current log analysis methods are limited by a narrow look-back period, often missing potential threats and increasing workload for security teams.
- The proposed autoencoder approach learns a baseline of normal user behavior from transformed, simplified log data and flags significant deviations from that baseline as anomalies.
- Key anomalies include unusual login locations and changes in application access patterns; the underlying data source, the Okta System Log, captures over 800 unique event types.
- Future experiments will refine detection (notably for event hour and day of the week), explore additional features, and monitor model performance; “impossible travel” is identified as a topic for further research.
Okta Enhances Cybersecurity Event Detection
Okta, a prominent identity and access management company, is advancing its cybersecurity event detection capabilities by addressing the limitations of current log analysis methods.
Limitations of Current Log Analysis
Currently, Okta logs feed a rule-based model with a limited look-back window that covers, on average, only about 20 authentication attempts. Threats whose signals fall outside this window can be missed. The short window also raises the rate of false positives, because behavior that is normal for a user but absent from the recent window can look new. The result is unnecessary alerts and extra work for security teams.
To address these gaps, the proposal is to add unsupervised learning, specifically autoencoders, to the detection pipeline. An autoencoder is a neural network trained to reconstruct its own input after compressing it through a narrower hidden layer. Because it learns to reproduce typical behavior well, inputs it reconstructs poorly stand out as significant deviations from normal user behavior. Using this approach requires first transforming and simplifying the raw log data into features that capture user behavior patterns.
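A toy version of a fixed look-back rule shows how the window size alone can produce false positives. The rule below is hypothetical and is not Okta's rule engine. It flags a login as coming from a "new" location whenever that location is absent from the previous 20 logins, which roughly matches the average look-back the article cites.

```python
from collections import deque

LOOKBACK = 20  # assumption: approximates the ~20-attempt average look-back

def flag_new_locations(locations, lookback=LOOKBACK):
    """Flag each login whose location is absent from the previous `lookback` logins.

    Hypothetical rule for illustration only; not Okta's actual engine.
    """
    window = deque(maxlen=lookback)  # oldest entries drop off automatically
    flags = []
    for loc in locations:
        flags.append(loc not in window)
        window.append(loc)
    return flags

# One "office" login, 24 "home" logins, then a return to the office.
history = ["office"] + ["home"] * 24 + ["office"]
flags = flag_new_locations(history)
print(flags[-1])  # True: the familiar office location has fallen out of the window
```

With a longer window (for example `lookback=30`), the final office login is no longer flagged. A longer history of each user's behavior is what the autoencoder approach aims to exploit.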
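To make the reconstruction idea concrete, here is a minimal sketch that swaps in a *linear* autoencoder. With squared-error loss, a linear autoencoder learns the same subspace as PCA, so it can be fit with an SVD instead of gradient descent. This is not Okta's model. The synthetic features, the one-unit bottleneck, and the 99th-percentile threshold are all assumptions chosen for illustration.

```python
import numpy as np

def fit_linear_autoencoder(X, k):
    """Fit a linear autoencoder with a k-dimensional bottleneck.

    With linear layers and squared-error loss, the optimal encoder/decoder
    span the top-k principal subspace, so an SVD gives the solution directly.
    """
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    W = vt[:k].T  # (d, k) encoder weights; the decoder uses W.T
    return mean, W

def reconstruction_error(X, mean, W):
    """Squared reconstruction error per row (the anomaly score)."""
    Z = (X - mean) @ W       # encode: project into the bottleneck
    X_hat = Z @ W.T + mean   # decode: map back to feature space
    return np.square(X - X_hat).sum(axis=1)

rng = np.random.default_rng(0)
# Synthetic "normal" behavior: three correlated features driven by one latent factor.
latent = rng.normal(size=(500, 1))
normal = np.hstack([latent, 2 * latent, -latent]) + 0.05 * rng.normal(size=(500, 3))
mean, W = fit_linear_autoencoder(normal, k=1)

threshold = np.quantile(reconstruction_error(normal, mean, W), 0.99)
odd = np.array([[1.0, -2.0, 1.0]])  # breaks the usual correlation pattern
print(reconstruction_error(odd, mean, W)[0] > threshold)  # True
```

A real autoencoder adds nonlinear layers and is trained iteratively. The scoring logic stays the same: reconstruct, measure the error, and compare it against a threshold learned from normal data.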
Behavior Detection and Anomaly Identification
Okta's Behavior Detection feature currently analyzes user behavior patterns to identify potential security events, but its rules-based engine can miss a range of threat scenarios. The article therefore recommends using Behavior Detection alongside other security measures rather than relying on it alone for comprehensive threat identification. Detecting meaningful anomalies accurately depends on first establishing a baseline of typical user behavior. Relevant anomalies include logins from unusual locations, changes in which applications a user accesses, and abnormal behavior within short time frames.
The Okta System Log records event-level data as raw JSON and covers more than 800 unique event types, most of them related to user logins and interactions. Which events an organization actually captures depends heavily on how its Okta environment is configured. Security teams often monitor specific events, such as user account lockouts and login rate-limit violations, for anomaly detection.
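The idea of a behavioral baseline can be sketched with two small checks. The first records which locations and apps each user has been seen with. The second detects bursts of activity, as one possible reading of "abnormal behavior within short time frames". The helper names, the event dictionary shape, and the burst parameters are hypothetical and are not taken from the article.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def build_baseline(events):
    """Record every location and app each user has been seen with (hypothetical helper)."""
    baseline = defaultdict(lambda: {"locations": set(), "apps": set()})
    for e in events:
        baseline[e["user"]]["locations"].add(e["location"])
        baseline[e["user"]]["apps"].add(e["app"])
    return dict(baseline)

def deviations(event, baseline):
    """List the ways an event departs from the user's baseline."""
    known = baseline.get(event["user"])
    if known is None:
        return ["unknown user"]
    found = []
    if event["location"] not in known["locations"]:
        found.append("new location")
    if event["app"] not in known["apps"]:
        found.append("new app")
    return found

def has_burst(timestamps, max_events, window):
    """True if more than max_events timestamps fall within any span of length `window`."""
    ts = sorted(timestamps)
    start = 0
    for end in range(len(ts)):
        # Advance the left edge until the span ts[start]..ts[end] fits in `window`.
        while ts[end] - ts[start] > window:
            start += 1
        if end - start + 1 > max_events:
            return True
    return False

history = [
    {"user": "alice", "location": "US", "app": "email"},
    {"user": "alice", "location": "US", "app": "crm"},
]
baseline = build_baseline(history)
print(deviations({"user": "alice", "location": "BR", "app": "payroll"}, baseline))
# ['new location', 'new app']

t0 = datetime(2024, 1, 1, 9, 0)
rapid = [t0 + timedelta(seconds=10 * i) for i in range(8)]
print(has_burst(rapid, max_events=5, window=timedelta(minutes=1)))  # True
```

Exact-match baselines like this are brittle, because any first-time value is flagged. That brittleness is part of the motivation for learning a smoother notion of "typical" with an autoencoder.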
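Working with the System Log starts with parsing each JSON event and keeping only relevant event types. In the sketch below, the field paths (`eventType`, `actor.alternateId`, `client.geographicalContext.geolocation`, `outcome.result`) follow Okta's LogEvent object as we understand it; check them against your own log output. The set of "relevant" event types is an illustrative assumption, and the sample record is synthetic.

```python
import json

# Assumption: an illustrative selection; adjust to what your org actually logs.
RELEVANT_EVENT_TYPES = {"user.session.start", "user.authentication.sso", "user.account.lock"}

def extract_features(raw_line):
    """Parse one System Log JSON event; return a flat feature dict, or None if filtered out."""
    event = json.loads(raw_line)
    if event.get("eventType") not in RELEVANT_EVENT_TYPES:
        return None
    # Nested objects may be missing or null, so fall back to empty dicts.
    geo = (event.get("client") or {}).get("geographicalContext") or {}
    point = geo.get("geolocation") or {}
    return {
        "user": (event.get("actor") or {}).get("alternateId"),
        "event_type": event["eventType"],
        "published": event.get("published"),
        "country": geo.get("country"),
        "lat": point.get("lat"),
        "lon": point.get("lon"),
        "result": (event.get("outcome") or {}).get("result"),
    }

sample = json.dumps({  # synthetic record for illustration
    "eventType": "user.session.start",
    "published": "2024-01-01T09:00:00.000Z",
    "actor": {"alternateId": "alice@example.com"},
    "client": {"geographicalContext": {
        "country": "United States",
        "geolocation": {"lat": 37.77, "lon": -122.42},
    }},
    "outcome": {"result": "SUCCESS"},
})
print(extract_features(sample))
```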
Preparing Data for Autoencoding
Preparing the data for the autoencoder involves several steps:
- Filtering: restrict the dataset to relevant event types.
- Geohashing: encode latitude and longitude as geohashes, which map each coordinate pair to a bounding-box grid cell and so reduce the variability of raw location data.
- Frequency analysis: flag logins to applications that a user accesses only rarely as anomalous.
- Categorical encoding: convert categorical variables to numeric indices with Spark's StringIndexer so they can be used in statistical analysis and model training.
- Bootstrap sampling: resample with replacement so the training dataset better represents typical user behavior.
- Validation: build validation sets by injecting synthetic anomalies, to measure how well the model detects them.
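The preparation steps above can be sketched in plain Python:

- `geohash` follows the standard public geohash algorithm.
- `string_index` imitates the default frequency-descending ordering of Spark's StringIndexer without requiring Spark. At scale you would use Spark itself.
- The 5% rarity cutoff, the helper names, and the anomaly-injection scheme are hypothetical choices for illustration, not details from the article.

```python
import random
from collections import Counter

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat, lon, precision=5):
    """Standard geohash: interleave longitude/latitude bisection bits, 5 bits per character."""
    lat_bounds, lon_bounds = [-90.0, 90.0], [-180.0, 180.0]
    chars, value, nbits, use_lon = [], 0, 0, True
    while len(chars) < precision:
        bounds, x = (lon_bounds, lon) if use_lon else (lat_bounds, lat)
        mid = (bounds[0] + bounds[1]) / 2
        if x >= mid:
            value = (value << 1) | 1
            bounds[0] = mid  # keep the upper half
        else:
            value <<= 1
            bounds[1] = mid  # keep the lower half
        use_lon = not use_lon
        nbits += 1
        if nbits == 5:
            chars.append(_BASE32[value])
            value, nbits = 0, 0
    return "".join(chars)

def rare_values(values, min_share=0.05):
    """Frequency analysis: values whose share of the history is below min_share."""
    counts = Counter(values)
    total = sum(counts.values())
    return {v for v, c in counts.items() if c / total < min_share}

def string_index(values):
    """Map categories to integers, most frequent first (ties broken alphabetically here)."""
    counts = Counter(values)
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {v: i for i, v in enumerate(ordered)}

def bootstrap_sample(rows, size, seed=0):
    """Sample with replacement to build a training set of the requested size."""
    return random.Random(seed).choices(rows, k=size)

def inject_location_anomalies(rows, foreign_cells, n, seed=0):
    """Label real rows 0, then append n copies moved to unseen geohash cells, labeled 1."""
    rnd = random.Random(seed)
    labeled = [dict(r, label=0) for r in rows]
    injected = [dict(r, geohash=rnd.choice(foreign_cells), label=1) for r in rnd.sample(rows, n)]
    return labeled + injected

print(geohash(57.64911, 10.40744, precision=11))  # u4pruydqqvj
apps = ["email"] * 30 + ["crm"] * 10 + ["payroll"]
print(rare_values(apps))   # {'payroll'}
print(string_index(apps))  # {'email': 0, 'crm': 1, 'payroll': 2}
train = bootstrap_sample(apps, size=100)
rows = [{"user": "alice", "geohash": "9q8yy"} for _ in range(10)]  # example cell
val = inject_location_anomalies(rows, foreign_cells=["u4pru"], n=2)
```

Truncating the geohash precision controls how coarse the location cells are. Five characters gives cells a few kilometers across, which absorbs small IP-geolocation jitter.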
Anomaly detection techniques fall into three groups: supervised, unsupervised, and semi-supervised. Autoencoders are singled out as well suited to unsupervised anomaly detection, because labeled examples of anomalous activity are not available. An autoencoder consists of an encoder, which compresses the input into a lower-dimensional representation, and a decoder, which reconstructs the input from that representation. The loss function, which measures how similar the reconstruction is to the original input, is central to training; the article uses Dice Loss for this purpose.
Preliminary results indicate that location anomalies can be detected with high precision, while features such as event hour and day of the week need further refinement. Planned experiments will explore additional features for anomaly detection and scale the workflow to production. Model performance metrics will be tracked and registered in MLflow, and the model will be retrained as needed. “Impossible travel”, where a user appears to log in from geographically distant locations within an implausibly short time, is noted as a potential area for further research. The authors also thank the reviewers and contributors whose insights and suggestions shaped this work.
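The article does not give its exact Dice formulation. The sketch below therefore assumes the common soft Dice loss for inputs scaled to [0, 1], such as one-hot encoded features. Lower values mean a closer reconstruction, so the same per-row value can also serve as an anomaly score. In a real training loop this would be written with the deep learning framework's tensor operations so that gradients flow; NumPy is used here only to show the arithmetic.

```python
import numpy as np

def soft_dice_loss(x, x_hat, eps=1e-6):
    """Per-row soft Dice loss between inputs and reconstructions in [0, 1].

    loss = 1 - (2 * sum(x * x_hat) + eps) / (sum(x) + sum(x_hat) + eps)
    eps avoids division by zero when both rows are all zeros.
    """
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    overlap = (x * x_hat).sum(axis=-1)
    total = x.sum(axis=-1) + x_hat.sum(axis=-1)
    return 1.0 - (2.0 * overlap + eps) / (total + eps)

x = np.array([[0, 1, 0, 1]])
print(soft_dice_loss(x, x))                         # perfect reconstruction: ~0
print(soft_dice_loss(x, np.array([[1, 0, 1, 0]])))  # no overlap: ~1
```

Unlike mean squared error, Dice focuses on the overlap of the "on" entries. That focus can help when most encoded features are zero, as with sparse one-hot categories.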
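“Impossible travel” is named only as a research direction, so the following is a generic sketch rather than Okta's method. It computes the great-circle (haversine) distance between two consecutive logins and flags the pair when the implied speed exceeds a threshold. The 900 km/h cutoff (roughly airliner cruising speed) and the event dictionary shape are assumptions. IP-based geolocation is approximate and VPNs can shift apparent locations, so a production check would need tolerances.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def impossible_travel(prev, curr, max_kmh=900.0):
    """Flag consecutive logins whose implied travel speed exceeds max_kmh (assumed cutoff)."""
    km = haversine_km(prev["lat"], prev["lon"], curr["lat"], curr["lon"])
    hours = (curr["time"] - prev["time"]).total_seconds() / 3600
    if hours <= 0:
        # Simultaneous logins: impossible only if the locations differ.
        return km > 0
    return km / hours > max_kmh

sf = {"lat": 37.77, "lon": -122.42, "time": datetime(2024, 1, 1, 8, 0)}
london_1h = {"lat": 51.51, "lon": -0.13, "time": datetime(2024, 1, 1, 9, 0)}
london_14h = {"lat": 51.51, "lon": -0.13, "time": datetime(2024, 1, 1, 22, 0)}
print(impossible_travel(sf, london_1h))   # True: ~8,600 km in one hour
print(impossible_travel(sf, london_14h))  # False: plausible by air
```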
Original Source: Read the Full Article Here