Why do we use Count Encoder?
What is Count Encoder?
I came across this encoder in a Kaggle micro-course [1], and I wondered how it works and in which situations we use it. First of all, I will use the category_encoders package. I didn't find another package that has a count encoder, but I feel this one is enough. The encoder is easy to understand: it tells you how many times each unique value appears in your data. For example, if a color column contains red twice and green and yellow once each, the encoded column color_cnt holds 2 for every red row and 1 for the green and yellow rows. So we can tell how often each color occurs in the data by looking at color_cnt. The smaller the color_cnt value, the rarer the color of that row. This is the gist of this encoder. Well then, in which type of problem do we want to use it?
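To make the color example concrete, here is a minimal sketch of what count encoding computes, written with only the Python standard library. The helper name count_encode and the row order of the colors are my own assumptions for illustration; this is not the package's implementation.

```python
from collections import Counter


def count_encode(values):
    """Replace each category with the number of times it occurs in `values`.

    Hypothetical helper for illustration, not part of category_encoders.
    """
    counts = Counter(values)
    return [counts[v] for v in values]


# Assumed toy data: two red rows, one green row, one yellow row.
colors = ["red", "green", "red", "yellow"]
color_cnt = count_encode(colors)

for color, cnt in zip(colors, color_cnt):
    print(color, cnt)
```

With a pandas DataFrame, the package's CountEncoder class (for example ce.CountEncoder(cols=["color"]) followed by fit_transform) should give the same counts, as far as I understand it.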
Why do we use Count Encoder?
Count Encoder can be useful when we need information about frequency. For example, suppose each row of the data records one customer visit together with whether they bought the lottery. Using Count Encoder on the customer column, we can see the correlation between how many times the customer came and whether they bought the lottery.
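The lottery idea can be sketched roughly as follows. The visit log is made-up data, and the customer names and the one-row-per-visit layout are assumptions of mine, so please read it as an illustration of the idea rather than a real result.

```python
from collections import Counter, defaultdict

# Hypothetical visit log: one (customer, bought_lottery) pair per visit.
visits = [
    ("alice", 1), ("alice", 1), ("alice", 0),
    ("bob", 0),
    ("carol", 1), ("carol", 0),
    ("dave", 0),
]

# Count encoding: how many times each customer appears in the log.
visit_counts = Counter(customer for customer, _ in visits)
encoded = [(visit_counts[customer], bought) for customer, bought in visits]

# Group the purchase flags by the count-encoded value.
by_count = defaultdict(list)
for cnt, bought in encoded:
    by_count[cnt].append(bought)

# Share of visits that ended in a purchase, per visit count.
purchase_rate = {cnt: sum(flags) / len(flags) for cnt, flags in sorted(by_count.items())}
print(purchase_rate)
```

In this toy data the purchase rate rises with the visit count (0.0 for one visit, 0.5 for two, about 0.67 for three), which is the kind of relationship the count-encoded feature lets a model pick up.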
There may be other reasons to use Count Encoder. While writing this article, I searched a lot for why we use this encoder, but I couldn't find a clear answer. So if you find an error in my reasoning, feel free to tell me.
Reference
[1] https://www.kaggle.com/matleonard/categorical-encodings