@jorgeuos opened this Issue on August 24th 2022

I have recently detected that I might have personal data within some page URLs, and I need to delete it from the archived blobs.

Is there any solution for this already?

These URLs were removed from the site, but we waited too long to remove them 🤦🏻‍♂️.

  • Matomo Version: >=4.7.1-stable,<5.0.0-b1
  • PHP Version: >=8.0
  • Server Operating System: Linux/Alpine
  • Additionally installed plugins: PrivacyManager

As far as I know, the GDPR tool only works when I still have raw data.

I noticed that the blobs in the archive tables are serialized with PHP functions, and MySQL works a bit differently. So I suspect this needs to be done in PHP, to be able to unserialize the data and then serialize it back again.
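A minimal sketch of that round trip in plain PHP, under loud assumptions: the blob is taken to be a zlib-compressed, serialized array whose entries are themselves serialized row lists, and each row is a plain array with a `label` key. The real Matomo layout is more involved, and `stripRowsFromBlob` is a hypothetical helper name, not a Matomo function.

```php
<?php
// Hypothetical helper (not part of Matomo): removes every row whose 'label'
// contains $needle. The blob layout used here is an assumption for illustration.
function stripRowsFromBlob(string $blob, string $needle): string
{
    $raw = gzuncompress($blob);
    if ($raw === false) {
        throw new RuntimeException('Blob could not be decompressed');
    }
    // Outer level: array of serialized tables (assumed layout).
    $tables = unserialize($raw, ['allowed_classes' => false]);
    foreach ($tables as $id => $serializedRows) {
        $rows = unserialize($serializedRows, ['allowed_classes' => false]);
        // Keep only the rows whose label does not contain the needle.
        $rows = array_filter($rows, function ($row) use ($needle) {
            return strpos((string) ($row['label'] ?? ''), $needle) === false;
        });
        $tables[$id] = serialize(array_values($rows));
    }
    // Serialize and compress again so the result has the same outer shape.
    return gzcompress(serialize($tables));
}

// Toy blob with one row that carries personal data in its URL.
$toy = gzcompress(serialize([
    0 => serialize([
        ['label' => '/checkout?IDENTIFIER=12345', 'nb_visits' => 2],
        ['label' => '/pricing', 'nb_visits' => 7],
    ]),
]));

$clean = unserialize(gzuncompress(stripRowsFromBlob($toy, '?IDENTIFIER=')));
$rows = unserialize($clean[0]);
```

The sketch only shows the decompress, unserialize, filter, serialize, compress order. It does not adjust totals or any other metadata that a real archive may keep alongside the rows.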

Any advice would be appreciated.

This is what I've got so far (work in progress):

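    public function checkForPiiInBlob()
    {
        $needle = "?IDENTIFIER=";
        $dataRows = $this->db->getBlob();
        echo "<pre>";
        foreach ($dataRows as $row) {
            // Archive blob values are zlib-compressed, serialized PHP data.
            $chunks = gzuncompress($row['value']);
            // strpos() returns 0 for a match at offset 0, so compare against false.
            if ($chunks !== false && strpos($chunks, $needle) !== false) {
                $unserialized_multi = Common::safe_unserialize($chunks);
                foreach ($unserialized_multi as $serialized) {
                    if (strpos($serialized, $needle) !== false) {
                        // $chunk is the unserialized table that contains the needle.
                        $chunk = Common::safe_unserialize($serialized);
                        echo "\n";
                    }
                }
            }
        }
        echo "</pre>";
    }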

Obviously, I'm only running this locally to practice and learn about the archiving semantics.

Best regards, Jorge

@sgiehl commented on August 29th 2022 Member

Hi @jorgeuos
As far as I know, it is currently not possible to remove certain records from archived reports.
But in terms of GDPR, it might indeed be useful in some cases to be able to do that even after the raw data has already been removed.
I'll hand this over to our product team for a priority decision.

Btw, the data serialization is a bit more complex in Matomo. Simply unserializing and manipulating the data might leave it in a broken state. I guess to be safe you would need to build a DataTable from it, manipulate it, and serialize it again.
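A rough sketch of the "manipulate it" step, assuming the table object behaves like `Piwik\DataTable` (`getRows()`, `deleteRow()`) and its rows like `Piwik\DataTable\Row` (`getColumn()`). `deleteRowsByLabel` is a hypothetical helper name, and the Matomo calls in the trailing comment are untested here.

```php
<?php
// Hypothetical helper: deletes every row whose label contains $needle.
// $table is assumed to expose getRows() and deleteRow($id) like Piwik\DataTable;
// each row is assumed to expose getColumn($name) like Piwik\DataTable\Row.
function deleteRowsByLabel($table, string $needle): int
{
    $deleted = 0;
    // getRows() hands back an array, so deleting while looping is safe.
    foreach ($table->getRows() as $id => $row) {
        $label = $row->getColumn('label');
        if (is_string($label) && strpos($label, $needle) !== false) {
            $table->deleteRow($id);
            $deleted++;
        }
    }
    return $deleted;
}

// Intended use inside Matomo (assumption, not executed in this sketch):
//   $table = new \Piwik\DataTable();
//   $table->addRowsFromSerializedArray($serialized);
//   deleteRowsByLabel($table, '?IDENTIFIER=');
//   $blobs = $table->getSerialized();
```

This only covers rows of a single table. Subtables, summary rows, aggregated totals, and the week, month, and year archives built from the same days would likely need the same treatment, which may be part of why a naive edit can leave the archive in a broken state.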

@jorgeuos commented on August 30th 2022

Thank you for your reply.

I've done some trial and error with promising results. I'll update my findings here when I've had some more time to try out my PoC.
