New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bulk insert reports when archiving #2243
Comments
(In [4239]) Refs #2243
|
Only unit tests. I'll address this when we have parameterized builds running on Jenkins. |
OK cool I actually meant unit tests, that's perfect |
(In [4253]) Fixes #2243
|
Can you use MySQL's native HEX() and UNHEX() functions? I would think this would give us backward compatibility on blobs while also reducing php's runtime memory usage (because the string conversion happens in the db server). |
To use HEX we would have to use LOAD DATA INFILE ....... SET value = HEX(value) but the SET feature is only available as of 5.0.3 For the select part, I gave it a quick try, but it doesn't seem to work. Attached patch for reference. |
Attachment: |
That wasn't what I meant. Let me test something. |
I'm still investigating an alternative to hex2bin/bin2hex because the current method doubles the blob size, but I want to log some observations first: re: r4259. Not sure why it passed (before commenting it out) on your box. The CI server runs 5.3.6, so it's falling back to databaseInsertIterate. I suspect the test fails here because it inserts binary data (e.g., containing NULs) into a varchar column. MySQL documentation says LOAD DATA INFILE uses the database's character set but we haven't explicitly set this in the CREATE DATABASE. Also, the "CHARACTER SET" clause wasn't added to the LOAD DATA INFILE syntax until MySQL 5.0.38. (At minimum, we should set the database charset for consistency.) One thought is conditionals to use the newer features if available, e.g., Zend_Db_Adapters have a getServerVersion() method. |
(In [4263]) refs #2243 - set database's default charset explicitly |
(In [4264]) refs #2243 - use Piwik_Exec() for MYSQLI compatibility, i.e., can't prepare LOAD DATA INFILE when using Piwik_Query(). Tested with 5.1.3, 5.1.6, 5.2.0, and 5.2.17. |
I'll add a test case that uses a table with a blob column It appears the bin2hex/hex2bin is unnecessary, so long as the column's datatype is blob. |
You says it doubles the blob size, but my tests shows that in fact it doesn't. OK for charset, good thinking. I'm just slightly concerned about the added escaping function, but tests are passing which should be fine. When you test, it is good to also not only rely on unit tests: the blob sizes & scope they generate is quite limited. Try to generate loads of data using Visito Generator, then archive the data with the old & new algorithm to see if both work as excpected. You can also easily compare piwik_archive_* db size.
|
But that doesn't disable the bin2hex/hex2bin. When I say "double the size", I'm comparing the old blobs (octets) vs the new blobs (nybbles as hexits). So, DEBUG_DO_NOT_USE_BULK_INSERT doesn't affect the size as it still uses hexits. With the escaping function (used only when writing to the infile), blobs go back to being binary data. |
(In [4272]) refs #2243 - looks good so far |
oh oh I messed up! and I thought that the results were strange, well I should have thought a bit more indeed The tests are failing for missing table: http://qa.piwik.org:8080/webtest/003_UnitTests/001_response_invoke.html when you are confident all is working and tested properly, please ping me and I will test on the demo :) |
Yeah, I forgot to drop the database before running my unit test changes. I'll fix this a.m. |
Fixing the unit tests is taking longer than I thought ... encountering more test runner side-effects (similar to #896). |
(In [4285]) refs #2243 - move test code around |
The non-integration tests are working but the integration tests are broken. I'm getting different errors depending on whether I run Main.test.php or all_tests.php. I've been hacking this since r4287. I'm off for dinner, and will revisit later tonight, unless someone beats me to it. ;) |
blink blink Main.test.php passes from command line and browser. Only failing in the browser when run as part of all_tests.php. |
(In [4294]) fixes #2255
refs #2243
|
Reopening as on Windows at least there are some issues: ArchiveProcessing.test.php
Also, some Main.test.php fail with "no data" when there should be some, which I think is because the blobs don't uncompress properly and Piwik defaults to "no data" rather than throwing the error. Maybe the archiveprocessing failure above can help: it looks like utf8 characters. COuld it be some kind of charset issue? or windows specific issues? Alternatively, we could disable bulk loading on Windows, but only when we are 100% sure that the failing tests are due to windows. |
I hope it's trivial ... I'll take a quick look in the a.m. |
Actually, only ArchiveProcessing.test.php is failing, Main.test.php seems fine |
We may need some more users to test on Windows. ArchiveProcessing.test.php passes for me with r4314 on Windows using:
Can you run On my Windows box, my.ini has:
This is the same as my Ubuntu dev box:
|
Runnning the query on the piwik_tests database, I get:
and the values from mysql.exe
|
FYI the code is now running on demo.piwik.org I had to DROP the archive tables for March & April since the ones built with the previous RC were HEXed and therefore not read properly. After dropping the tables, I ran archiving. The problem is that, apparently the load data infile isn't used, I saw this in the SQL prfiler output
investigating... |
Attachment: Hack for charset mismatch |
matt: can you try the attached patch? |
On demo.piwik.org: There is just an issue that somehow the Piwik_Exec doesn't log the query execution in the profiler like it does with other functions After adding lots of logging the bulk insert code is executed as expected, cool!! |
I applied the patch locally on windows andit fixes the ArchiveProcessing.test.php !!! |
(In [4320]) fixes #2243 - charset mismatch hack; also handle case where dbhost is blank (eg socket), or a pseudo-domain was added to the hostname |
Extract from Custom date range processing with sql profiler enabled (see global.ini.php)
What happens is that the Page URL / Page Titles have many children, ie. thousands, and Piwik will INSERT one at a time, in this case 5313 times. Instead, Piwik should log all these into a file and use LOAD DATA INFILE
I'm pretty sure we'll hit some bugs with some user configurations, so plan for a setting in config file to disable the bulk insert.
The text was updated successfully, but these errors were encountered: