K-means clustering and its real use-case in the security domain

What is meant by K-means Clustering?

K-means clustering is a type of Unsupervised Learning, which is used when you have unlabeled data (i.e., data without defined categories or groups). The goal of this algorithm is to find groups in the data, with the number of groups represented by the variable K. The algorithm works iteratively to assign each data point to one of K groups based on the features that are provided. Data points are clustered based on feature similarity.

The results of the K-means clustering algorithm are:

1. The centroids of the K clusters, which can be used to label new data

2. Labels for the training data (each data point is assigned to a single cluster)
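To make the first result concrete, here is a minimal sketch of how centroids can label a new data point: the point takes the label of the closest centroid. The centroid values and the helper name `nearest_centroid` are made up for illustration, and Euclidean distance is assumed.

```python
import math

def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

# Hypothetical centroids from an earlier K-means run with K = 2.
centroids = [(1.0, 1.0), (8.0, 9.0)]

# A new, unseen data point gets the label of its nearest centroid.
new_point = (7.5, 8.0)
label = nearest_centroid(new_point, centroids)
print(label)  # prints 1
```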

Algorithm steps of K-means:

Step-1: Select the value of K, the number of clusters to be formed.

Step-2: Select K random points, which will act as the initial centroids.

Step-3: Assign each data point to its nearest centroid, based on its distance from each centroid. This forms the K clusters.

Step-4: Compute a new centroid for each cluster (the mean of the points assigned to it).

Step-5: Repeat Step-3, reassigning each data point to the new closest centroid.

Step-6: If any reassignment occurred, go to Step-4; otherwise go to Step-7.

Step-7: Finish.
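The steps above can be sketched in plain Python roughly as follows. This is an illustrative sketch, not a production implementation: it assumes Euclidean distance, picks the initial centroids from the data points themselves, and the names `k_means`, `max_iter` and `seed` are my own choices.

```python
import math
import random

def k_means(points, k, max_iter=100, seed=0):
    """Plain K-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Step 2: pick K distinct data points as the initial centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Steps 3 and 5: assign every point to its closest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step 6: stop as soon as no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels

# Toy data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = k_means(points, k=2)
print(centroids, labels)
```

Which cluster ends up as index 0 or 1 depends on the random starting centroids, so only the grouping itself is meaningful, not the label numbers.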

K-means real use-case in the security domain:

· Separate valid activity groups from bots

· Group valid activity to clean up outlier detection

To set up a general intrusion detection model, a collection system first gathers the connection records produced while the system is in use, which yields the data set for cluster analysis. A clustering algorithm then groups the connection records so that normal and abnormal records can be told apart. In this study, the k-means algorithm was used for the cluster analysis. The algorithm produces several clusters, each containing some connection records. Based on the properties of the connection records it contains, each cluster can be classed as one of two kinds: abnormal or normal. An abnormal (exception) cluster groups the abnormal connection records, and a normal cluster groups the normal ones.

In real systems, labeled data is often unavailable, so you cannot directly determine whether a connection record is normal or abnormal and tag the clusters accordingly. Typically, a threshold is used instead: clusters holding more connection records than the threshold are tagged as normal, and the others are tagged as abnormal (exception) clusters. To detect intrusions with the clustering result, a new connection record is first standardized, then the cluster whose center is closest to it is found, and the record is classified according to that cluster's tag.
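As a rough sketch of that flow (standardize, tag clusters by a threshold, classify by nearest center), here is one way it could look in Python. Everything concrete here is an assumption for illustration: the two features, the centroid values, the training statistics, the 0.3 threshold, and reading the threshold as a share of all records in a cluster.

```python
import math
from collections import Counter

def standardize(rows, means, stds):
    """Scale each feature to zero mean and unit variance using training statistics."""
    return [tuple((x - m) / s for x, m, s in zip(row, means, stds)) for row in rows]

def label_clusters(labels, k, threshold):
    """Tag a cluster 'normal' if it holds more than `threshold` of all records."""
    counts = Counter(labels)
    total = len(labels)
    return ["normal" if counts[j] / total > threshold else "abnormal" for j in range(k)]

def classify(record, centroids, tags):
    """Give a standardized record the tag of its nearest cluster center."""
    nearest = min(range(len(centroids)), key=lambda j: math.dist(record, centroids[j]))
    return tags[nearest]

# Hypothetical output of a K-means run over standardized connection records.
centroids = [(-0.5, -0.4), (2.0, 2.5)]
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]  # cluster index per training record

# Cluster 0 holds 80% of the records, cluster 1 only 20%.
tags = label_clusters(labels, k=2, threshold=0.3)

# Hypothetical training statistics (mean and standard deviation per feature).
means, stds = (100.0, 5.0), (50.0, 2.0)
new_record = standardize([(210.0, 9.5)], means, stds)[0]
result = classify(new_record, centroids, tags)
print(tags, result)
```

The small cluster is treated as abnormal on the assumption that intrusions are rare compared with normal traffic. If that assumption does not hold for your data, the threshold rule will mislabel clusters.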

K-means clustering is a powerful and frequently used technique in data mining. However, privacy breaches are a serious problem if k-means clustering is used without any security treatment, and privacy is a real concern in many practical applications. Recently, four privacy-preserving solutions based on cryptography have been proposed by different researchers. Unfortunately, none of these four schemes achieves both security and completeness with good efficiency. In this paper, we present a new scheme that overcomes these earlier problems. Our scheme also handles data standardization in order to make the result more reasonable. We show that our scheme is secure and complete with good efficiency.




I always curious about Everything & Try to Discover new things. I like to share my thoughts which some people told me my thought is more unique
