Update documentation for log_action table about hash and name #14883

Closed
MichaelRoosz opened this issue Sep 10, 2019 · 1 comment

@MichaelRoosz
Contributor

MichaelRoosz commented Sep 10, 2019

While researching the log_action table, I discovered that the chance of a crc32 hash collision is pretty high:

https://preshing.com/20110504/hash-collision-probabilities/ (see at the bottom "Small Collision Probabilities")

In my setup we have about 182 million rows in log_action, which means the chance of at least one collision is already far higher than 1 in 2, if I am not mistaken.
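For a rough sense of scale, the estimate can be sketched with the standard birthday approximation. This is a minimal sketch that assumes crc32 values behave like independent, uniformly distributed 32-bit numbers and that every row has a distinct name. Neither assumption is confirmed for a real log_action table, and the function names are made up for illustration.

```python
import math


def collision_probability(n_rows: int, hash_bits: int = 32) -> float:
    """Approximate chance of at least one collision among n_rows hashes.

    Uses the birthday approximation 1 - exp(-n(n-1) / (2N)),
    where N = 2**hash_bits is the number of possible hash values.
    Assumes hashes are uniform and independent (an idealization).
    """
    space = 2 ** hash_bits
    exponent = n_rows * (n_rows - 1) / (2 * space)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for small x.
    return -math.expm1(-exponent)


def expected_colliding_pairs(n_rows: int, hash_bits: int = 32) -> float:
    """Expected number of row pairs that share a hash value."""
    return n_rows * (n_rows - 1) / (2 * 2 ** hash_bits)


if __name__ == "__main__":
    for rows in (10_000, 77_163, 1_000_000, 182_000_000):
        print(rows, collision_probability(rows), expected_colliding_pairs(rows))
```

Under these assumptions the 50% mark is reached at roughly 77,000 rows. At 182 million rows a collision somewhere in the table is practically certain, and the expected number of colliding pairs is in the millions.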

So it seems important that SQL queries always match on both hash and name.
Maybe this information should be added to the documentation somewhere (https://developer.matomo.org/guides/persistence-and-the-mysql-backend).

Maybe it also makes sense to add a note about always querying for type, hash, and name at the same time so that the index can be used.
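To illustrate the point about matching on type, hash, and name together, here is a small sketch using an in-memory SQLite table. The table layout is a simplified stand-in and an assumption, not the full Matomo schema; the index name and column order are likewise assumed. Python's `zlib.crc32` stands in for the database-side hash, and the second row deliberately reuses the first row's hash to simulate a collision.

```python
import sqlite3
import zlib

# Simplified, assumed stand-in for log_action (not the real Matomo schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE log_action ("
    "idaction INTEGER PRIMARY KEY, name TEXT, hash INTEGER, type INTEGER)"
)
# Assumed index on (type, hash); name is checked after the index lookup.
conn.execute("CREATE INDEX index_type_hash ON log_action (type, hash)")


def crc32(name: str) -> int:
    """Unsigned 32-bit CRC of the UTF-8 encoded name."""
    return zlib.crc32(name.encode("utf-8"))


target = "example.org/pricing"
h = crc32(target)
# The second row simulates a collision: different name, same stored hash.
conn.executemany(
    "INSERT INTO log_action (name, hash, type) VALUES (?, ?, ?)",
    [(target, h, 1), ("example.org/some-other-page", h, 1)],
)


def find_by_hash_only(conn, action_type, name):
    """Unsafe lookup: may return rows for other names with the same hash."""
    return conn.execute(
        "SELECT idaction, name FROM log_action WHERE type = ? AND hash = ?",
        (action_type, crc32(name)),
    ).fetchall()


def find_action(conn, action_type, name):
    """Safe lookup: narrows by type and hash, then confirms the name."""
    return conn.execute(
        "SELECT idaction, name FROM log_action "
        "WHERE type = ? AND hash = ? AND name = ?",
        (action_type, crc32(name), name),
    ).fetchall()


if __name__ == "__main__":
    print(find_by_hash_only(conn, 1, target))  # both rows, including the wrong one
    print(find_action(conn, 1, target))        # only the row for target
```

The hash-only lookup returns both rows, while the lookup that also compares the name returns only the intended one. Filtering on type and hash first is what lets an index like the assumed one narrow the candidates before the name comparison.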

MichaelRoosz changed the title from "Should log_action hash type of crc32 be changed?" to "Update documentation for log_action table about hash and name" on Sep 10, 2019
@tsteur
Member

tsteur commented Sep 10, 2019

I think it rather shows that there is, e.g., a 50% chance of having at least one collision somewhere in the table once about 80K rows are in there. That is fine in this case, though. The table is definitely expected to contain a lot of collisions, and lookups should still be quite fast. You can do a GROUP BY hash with a count(), ordered by count() desc, to get a feel for how often an individual hash is used at most.
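The suggested check could look roughly like the sketch below, again run against an in-memory SQLite table with an assumed, simplified layout and made-up sample rows. On a real MySQL log_action table the same `GROUP BY` statement should apply, but it scans every row and may take a while on a table with hundreds of millions of rows.

```python
import sqlite3

# Simplified, assumed stand-in for log_action with made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE log_action ("
    "idaction INTEGER PRIMARY KEY, name TEXT, hash INTEGER, type INTEGER)"
)
rows = [
    ("a", 100, 1), ("b", 100, 1), ("c", 100, 1),  # three names share hash 100
    ("d", 200, 1), ("e", 200, 1),                 # two names share hash 200
    ("f", 300, 1),                                # hash 300 is unique
]
conn.executemany(
    "INSERT INTO log_action (name, hash, type) VALUES (?, ?, ?)", rows
)

# Count how many rows share each hash, most heavily shared hashes first.
QUERY = """
SELECT hash, COUNT(*) AS cnt
FROM log_action
GROUP BY hash
ORDER BY cnt DESC
LIMIT 10
"""

top = conn.execute(QUERY).fetchall()

if __name__ == "__main__":
    for hash_value, cnt in top:
        print(hash_value, cnt)
```

The first row of the result gives the worst case: the largest number of rows that a lookup by hash alone would have to compare by name.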

tsteur closed this as completed on Sep 10, 2019
tsteur added the "answered" label (For when a question was asked and we referred to forum or answered it.) on Sep 10, 2019
mattab added this to the 3.12.0 milestone on Oct 27, 2019