Hubert Bryłkowski: Who's the king; who's the ruler. How to answer the question before it was posted.

Faramarz
Published on 16 Esfand 1396 (7 March 2018)

I will share a story about hashes: what they're good at, what they're bad at, and, most importantly, how to use them in a not-so-typical way.

I was faced with the challenge of searching a database of about 2 million questions and finding the duplicates among them.
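As a minimal sketch of what ordinary hashing does well and badly, assuming the questions are plain strings: hashing normalized text groups exact duplicates in one linear pass instead of comparing all pairs, but rewording a single word sends a question to a different bucket. The helpers `normalize` and `exact_duplicates` and the sample questions are hypothetical, for illustration only.

```python
import hashlib
import re
from collections import defaultdict


def normalize(text: str) -> str:
    # Hypothetical normalization: lowercase and keep only word characters,
    # so differences in case, punctuation and spacing disappear.
    return " ".join(re.findall(r"\w+", text.lower()))


def exact_duplicates(questions: list[str]) -> list[list[int]]:
    """Group indices of questions whose normalized text hashes identically."""
    buckets: dict[str, list[int]] = defaultdict(list)
    for i, q in enumerate(questions):
        digest = hashlib.sha1(normalize(q).encode("utf-8")).hexdigest()
        buckets[digest].append(i)
    # Only buckets holding more than one question are duplicate groups.
    return [ids for ids in buckets.values() if len(ids) > 1]


questions = [
    "How do I reverse a list in Python?",
    "how do I reverse a list in python",
    "How can I reverse a list in Python?",
]
print(exact_duplicates(questions))  # [[0, 1]]
```

The first two questions collide, while the third, which differs only by "can" instead of "do", lands in its own bucket. Catching that kind of near duplicate is where MinHash and LSH come in.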

It may look like a pretty simple problem, but doing it efficiently was not trivial: naively comparing every pair of 2 million questions means roughly 2 × 10^12 comparisons. I will explain the algorithms I used, discuss their benefits, and show how I tweaked them to fit our needs.

My main topic will be MinHash and locality-sensitive hashing (LSH), with a brief refresher on general-purpose hashing algorithms.
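The abstract contains no code, so the following is only a minimal sketch of the textbook MinHash plus LSH banding technique, not the author's tweaked version. Assumptions, all hypothetical and chosen for illustration: questions are compared as sets of character 3-grams, each of the 128 "permutations" is simulated with a random linear hash h(x) = (a·x + b) mod p, and signatures are split into 32 bands of 4 rows. The fraction of matching signature positions estimates the Jaccard similarity of two shingle sets, and only pairs that share at least one identical band become candidates, so most pairs are never compared at all.

```python
import hashlib
import random
from collections import defaultdict

PRIME = (1 << 61) - 1  # Mersenne prime used as the modulus of the hash family


def shingles(text: str, k: int = 3) -> set[str]:
    # Character k-grams of the lowercased, whitespace-collapsed text.
    t = " ".join(text.lower().split())
    return {t[i:i + k] for i in range(max(1, len(t) - k + 1))}


def stable_hash(s: str) -> int:
    # Deterministic 64-bit hash (Python's built-in hash() is salted per process).
    return int.from_bytes(hashlib.blake2b(s.encode("utf-8"), digest_size=8).digest(), "big")


def make_hash_params(num_perm: int, seed: int = 42) -> list[tuple[int, int]]:
    # One (a, b) pair per simulated permutation.
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(num_perm)]


def minhash(items: set[str], params: list[tuple[int, int]]) -> list[int]:
    # For each hash function h(x) = (a*x + b) mod p, keep the minimum over the set.
    xs = [stable_hash(s) for s in items]
    return [min((a * x + b) % PRIME for x in xs) for a, b in params]


def estimate_jaccard(sig1: list[int], sig2: list[int]) -> float:
    # Fraction of positions where the minima agree estimates |A & B| / |A | B|.
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)


def lsh_candidates(signatures: list[list[int]], bands: int, rows: int) -> set[tuple[int, int]]:
    # Documents sharing an identical band land in the same bucket and become candidates.
    candidates: set[tuple[int, int]] = set()
    for b in range(bands):
        buckets: dict[tuple[int, ...], list[int]] = defaultdict(list)
        for doc_id, sig in enumerate(signatures):
            buckets[tuple(sig[b * rows:(b + 1) * rows])].append(doc_id)
        for ids in buckets.values():
            for i in range(len(ids)):
                for j in range(i + 1, len(ids)):
                    candidates.add((ids[i], ids[j]))
    return candidates


questions = [
    "How do I reverse a list in Python?",
    "How can I reverse a list in Python?",
    "What is the capital of France?",
]
params = make_hash_params(num_perm=128)  # 128 = 32 bands * 4 rows
sigs = [minhash(shingles(q), params) for q in questions]
print(estimate_jaccard(sigs[0], sigs[1]))  # high: the questions share most 3-grams
print(estimate_jaccard(sigs[0], sigs[2]))  # low: almost no shared 3-grams
print(lsh_candidates(sigs, bands=32, rows=4))
```

With b bands of r rows, a pair with Jaccard similarity s becomes a candidate with probability 1 - (1 - s^r)^b, so the choice of b and r trades false positives against false negatives. Candidate pairs would then still be checked with a direct similarity comparison.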
