Delete page URL in an archived report that doesn't have raw logs. #19667

Open
jorgeuos opened this issue Aug 24, 2022 · 2 comments
Labels
Enhancement For new feature suggestions that enhance Matomo's capabilities or add a new report, new API etc.

Comments

@jorgeuos

I recently discovered that some page URLs may contain personal data, and I need to delete them from the archived blobs.

Is there any solution for this already?

These URLs were removed from the site, but we waited too long to remove them from Matomo 🤦🏻‍♂️.

  • Matomo Version: >=4.7.1-stable,<5.0.0-b1
  • PHP Version: >=8.0
  • Server Operating System: Linux/Alpine
  • Additionally installed plugins: PrivacyManager

As far as I know, the GDPR tool only works when I still have raw data.

I noticed that the blobs in the archive tables are serialized with PHP functions, and MySQL handles such data differently. So I suspect this needs to be done in PHP, so that the data can be unserialized, edited, and serialized back again.
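To illustrate why an in-place string replacement is risky here, the following is a minimal, self-contained sketch in plain PHP. It does not use Matomo's real blob layout: the `$row` array, the `label` and `nb_visits` keys, and the sample URL are assumptions made up for the example. It only shows that PHP's serialized format stores string lengths, so cutting text out of the serialized payload breaks it, while editing the decoded value and re-serializing does not.

```php
<?php
// Illustrative only: a plain PHP array stands in for one report row.
// Matomo's actual archive blobs have a more involved structure.
$row = ['label' => '/account?IDENTIFIER=12345', 'nb_visits' => 3];

// Roughly what a blob column could hold: serialized, then compressed.
$blob = gzcompress(serialize($row));

// Naive approach: cut the parameter out of the serialized text.
// The s:N:"..." length prefix is left unchanged, so parsing fails.
$serialized = gzuncompress($blob);
$naive = str_replace('?IDENTIFIER=12345', '', $serialized);
$broken = @unserialize($naive); // false

// Safer approach: edit the decoded value, then serialize and compress again.
$decoded = unserialize($serialized);
$decoded['label'] = preg_replace('/\?IDENTIFIER=[^&]*/', '', $decoded['label']);
$cleanBlob = gzcompress(serialize($decoded));

// Round trip to confirm the cleaned blob still decodes.
$roundTrip = unserialize(gzuncompress($cleanBlob));
```

In a real Matomo archive the decoded value is not a flat array like this, so the sketch only motivates doing the work in PHP; it is not a recipe for rewriting archive rows.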

Any advice would be appreciated.

This is what I have so far (work in progress):

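    // Assumes `use Piwik\Common;` and `use Piwik\Piwik;` at the top of the file.
    public function checkForPiiInBlob()
    {
        Piwik::checkUserHasSuperUserAccess();
        $needle = "?IDENTIFIER=";
        $dataRows = $this->db->getBlob();
        echo "<pre>";
        foreach ($dataRows as $row) {
            // gzuncompress() returns false if the value cannot be decompressed.
            $chunks = gzuncompress($row['value']);
            // Compare with false explicitly: strpos() returns 0 for a match at offset 0.
            if ($chunks === false || strpos($chunks, $needle) === false) {
                continue;
            }
            $unserialized_multi = Common::safe_unserialize($chunks);
            if (!is_array($unserialized_multi)) {
                continue;
            }
            // Each entry is expected to be a serialized string of its own.
            foreach ($unserialized_multi as $serialized) {
                if (is_string($serialized) && strpos($serialized, $needle) !== false) {
                    $chunk = Common::safe_unserialize($serialized);
                    print_r($chunk);
                    echo "\n";
                }
            }
        }
        echo "</pre>";
    }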

Obviously, I'm only running this locally to practice and learn about the archiving semantics.

Best regards, Jorge

@jorgeuos jorgeuos added the Enhancement For new feature suggestions that enhance Matomo's capabilities or add a new report, new API etc. label Aug 24, 2022
@sgiehl
Member

sgiehl commented Aug 29, 2022

Hi @jorgeuos
As far as I know, it is currently not possible to remove specific records from archived reports.
But in terms of GDPR, it might indeed be useful in some cases to be able to do that even after the raw data has already been removed.
I'll hand this over to our product team for a priority decision.

By the way, data serialization is a bit more complex in Matomo. Simply unserializing and manipulating the data might leave it in a broken state. To be safe, I guess you would need to build a DataTable from it, manipulate it, and serialize it again.

@sgiehl sgiehl added this to the For Prioritization milestone Aug 29, 2022
@jorgeuos
Author

Thank you for your reply.

I've done some trial and error with promising results. I'll post my findings here once I've had more time to try out my PoC.
