K-mean clustering and its real use-case in the security domain
What is meant by K-means Clustering?
K-means clustering is a type of Unsupervised Learning, which is used when you have unlabeled data (i.e., data without defined categories or groups). The goal of this algorithm is to find groups in the data, with the number of groups represented by the variable K. The algorithm works iteratively to assign each data point to one of K groups based on the features that are provided. Data points are clustered based on feature similarity.
The results of the K-means clustering algorithm are:
1. The centroids of the K clusters, which can be used to label new data
2. Labels for the training data (each data point is assigned to a single cluster)
Algorithm steps Of K Means:
Step-1: Select the value of K, to decide the number of clusters to be formed.
Step-2: Select random K points which will act as centroids.
Step-3: Assign each data point, based on their distance from the randomly selected points (Centroid), to the nearest/closest centroid which will form the predefined clusters.
Step-4: place a new centroid of each cluster.
Step-5: Repeat step no.3, which reassigns each datapoint to the new closest centroid of each cluster.
Step-6: If any reassignment occurs, then go to step-4 else go to Step 7.
Step-7: FINISH
K-Means Real use-case in the security Domain:
Detecting bots or anomalies:
· Separate valid activity groups from bots
· Group valid activity to clean up outlier detection
Intrusion Detection Model:
Four general intrusion detection model is set up, the first to use collection system, guarantee the connection records in the process of use, and can get clustering analysis of data sets, and then with the help of clustering algorithm distribution connection records, distinguish normal and abnormal connection records. In this study, a k-means algorithm was used to complete cluster analysis. The clustering algorithm results in more clustering, so there are some connection records in each cluster. According to the properties of a given connection record, the properties can be used to determine the two kinds of abnormal clustering and normal clustering. The exception clustering represents the clustering of the abnormal connection records, and the normal clustering represents the clustering of the normal connection records.
In system applications, if you can’t use tagged data, you can’t clearly determine the normal or abnormal condition of the connection record, and then make the clustering tag. Typically, a threshold is used to record the record of the connection above the threshold for the normal clustering, whereas the other is exception clustering. Using cluster analysis result intrusion methods that connection records, first carry on the standardization, and then from the cluster aggregation clustering, to find the right to his central value close to the distance, complete classification operation according to the tag.
Privacy-Preserving Two-Party K-Means Clustering via Secure Approximation:
K-means clustering is a powerful and frequently used technique in data mining. However, privacy breaching is a serious problem if the k-means clustering is used without any security treatment, while privacy is a real concern in many practical applications. Recently, four privacy-preserving solutions based on cryptography have been proposed by different researchers. Unfortunately, none of these four schemes can achieve both security and completeness with good efficiency. In this paper, we present a new scheme to overcome the problems that occurred previously. Our scheme deals with data standardization in order to make the result more reasonable. We show that our scheme is secure and complete with good efficiency.