Regex/ code to fix corrupt serialized PHP data.


I have a massive multidimensional array that has been serialized by PHP. It was stored in MySQL, but the data field wasn't large enough, so the end has been cut off. I need to extract the data, but unserialize() won't work. Does anyone know of code that can close all the arrays and recalculate the string lengths? It's too much data to fix by hand.

Many thanks.

I think this is almost impossible. Before you can repair your array you need to know how it was damaged. How many children are missing? What was the content?

Sorry, IMHO you can't do it.

Proof:

<?php

$serialized = serialize(
    [
        'one'   => 1,
        'two'   => 'nice',
        'three' => 'will be damaged'
    ]
);

var_dump($serialized); // a:3:{s:3:"one";i:1;s:3:"two";s:4:"nice";s:5:"three";s:15:"will be damaged";}

var_dump(unserialize('a:3:{s:3:"one";i:1;s:3:"two";s:4:"nice";s:5:"tee";s:15:"will be damaged";}')); // please note 'tee'

var_dump(unserialize('a:3:{s:3:"one";i:1;s:3:"two";s:4:"nice";s:5:"three";s:')); // serialized string is truncated

Link: https://ideone.com/uvISQu

Even if you can recalculate the lengths of your keys and values, you cannot trust the data retrieved from this source, because you cannot recover the values themselves. E.g. if the serialized data is an object, its properties won't be accessible anymore.
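Before attempting any repair it helps to know whether a string is actually broken. The following is a minimal sketch, not something from the answer above: is_valid_serialized() is a hypothetical helper name, and it assumes PHP 7.0 or later for the allowed_classes option. It treats serialize(false) as a special case, because that is the one valid payload for which unserialize() also returns false.

```php
<?php
// Hypothetical helper (not part of PHP): report whether a string unserializes
// cleanly, without mistaking a stored boolean false for a failure.
function is_valid_serialized(string $data): bool
{
    // serialize(false) is valid, yet unserialize() returns false for it too.
    if ($data === serialize(false)) {
        return true;
    }
    // allowed_classes => false keeps real objects from being instantiated while probing.
    return @unserialize($data, ['allowed_classes' => false]) !== false;
}

// The three strings from the proof above.
$intact    = 'a:3:{s:3:"one";i:1;s:3:"two";s:4:"nice";s:5:"three";s:15:"will be damaged";}';
$badLength = 'a:3:{s:3:"one";i:1;s:3:"two";s:4:"nice";s:5:"tee";s:15:"will be damaged";}';
$truncated = 'a:3:{s:3:"one";i:1;s:3:"two";s:4:"nice";s:5:"three";s:';

var_dump(is_valid_serialized($intact));    // bool(true)
var_dump(is_valid_serialized($badLength)); // bool(false)
var_dump(is_valid_serialized($truncated)); // bool(false)
```

A check like this only says that something is wrong, not what is missing, which is the point the answer makes.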


This recalculates the lengths of the string elements in a serialized array:

$fixed = preg_replace_callback(
    '/s:([0-9]+):\"(.*?)\";/',
    function ($matches) { return "s:".strlen($matches[2]).':"'.$matches[2].'";';     },
    $serialized
);

However, it doesn't work if your strings contain ";. In that case it's not possible to fix the serialized array string automatically -- manual editing will be needed.
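To make both halves of that statement concrete, here is a small sketch under stated assumptions: recalc_lengths() is just a hypothetical wrapper around the same callback, and the example URLs and strings are invented for illustration. The first case is a length mismatch after a search-and-replace, which the regex repairs; the second is a value containing ";, which it cannot repair.

```php
<?php
// Hypothetical wrapper around the preg_replace_callback() call shown above.
function recalc_lengths(string $serialized): string
{
    return preg_replace_callback(
        '/s:([0-9]+):\"(.*?)\";/',
        function ($m) { return 's:' . strlen($m[2]) . ':"' . $m[2] . '";'; },
        $serialized
    );
}

// Case 1: a search-and-replace changed the text but not the length prefix.
$original = serialize(['url' => 'http://old.example']);
$broken   = str_replace('old.example', 'much-longer.example', $original);

var_dump(@unserialize($broken));                 // bool(false): prefix still says 18
var_dump(unserialize(recalc_lengths($broken)));  // array with the new, longer URL

// Case 2: the value itself contains ";, so the lazy (.*?) stops too early.
$tricky       = serialize(['note' => 'say "hi"; then leave']);
$brokenTricky = str_replace('leave', 'go away', $tricky);

var_dump(@unserialize(recalc_lengths($brokenTricky))); // bool(false): still broken
```

In the second case the regex rewrites the value as s:7:"say "hi"; and leaves the rest of the text dangling, so the result is still not valid.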


I tried everything in this post and nothing worked for me. After hours of pain, here's what I found in the deep pages of Google that finally worked:

function fix_str_length($matches) {
    $string = $matches[2];
    $right_length = strlen($string); // yes, strlen even for UTF-8 characters, PHP wants the mem size, not the char count
    return 's:' . $right_length . ':"' . $string . '";';
}
function fix_serialized($string) {
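    // sanity checks: leave the input alone if it doesn't look serialized or already unserializes
    if ( !preg_match('/^[aOs]:/', $string) ) return $string;
    if ( @unserialize($string) !== false ) return $string;
    // note: this strips newlines everywhere, including inside string values
    $string = preg_replace("%\n%", "", $string);
    // split on the '";' that closes each serialized string
    $data = preg_replace('%";%', "µµµ", $string);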
    $tab = explode("µµµ", $data);
    $new_data = '';
    foreach ($tab as $line) {
        $new_data .= preg_replace_callback('%\bs:(\d+):"(.*)%', 'fix_str_length', $line);
    }
    return $new_data;
}

You call the routine as follows:

//Let's consider we store the serialization inside a txt file
$corruptedSerialization = file_get_contents('corruptedSerialization.txt');

//Try to unserialize original string
$unSerialized = unserialize($corruptedSerialization);

//In case of failure let's try to repair it
if(!$unSerialized){
    $repairedSerialization = fix_serialized($corruptedSerialization);
    $unSerialized = unserialize($repairedSerialization);
}

//Keep your fingers crossed
var_dump($unSerialized);
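The comment in fix_str_length() about using strlen() even for UTF-8 can be checked with a short sketch. The sample string is an assumption chosen for illustration; it is written with \u{...} escapes (PHP 7.0 or later) so the result does not depend on the encoding of the source file.

```php
<?php
// "naïve café" written with escapes: 10 characters, but 12 bytes in UTF-8.
$value = "na\u{EF}ve caf\u{E9}";

var_dump(strlen($value));    // int(12): bytes, which is what serialize() stores
var_dump(serialize($value));

// Build the serialized form by hand, once with the byte count, once with the character count.
$withBytes = 's:' . strlen($value) . ':"' . $value . '";';
$withChars = 's:10:"' . $value . '";';

var_dump($withBytes === serialize($value)); // bool(true)
var_dump(@unserialize($withChars));         // bool(false): prefix is too short
```

So a repair routine that counted characters instead of bytes would break every string containing multibyte characters.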


Solution:

1) try online:

Serialized String Fixer (online tool)

2) Use function:

unserialize( serialize_corrector($serialized_string ) ) ;

code:

function serialize_corrector($serialized_string){
    // at first, check if "fixing" is really needed at all. After that, security checkup.
    if ( @unserialize($serialized_string) === false &&  preg_match('/^[aOs]:/', $serialized_string) ) {
        $serialized_string = preg_replace_callback( '/s\:(\d+)\:\"(.*?)\";/s',    function($matches){return 's:'.strlen($matches[2]).':"'.$matches[2].'";'; },   $serialized_string );
    }
    return $serialized_string;
} 

There is also this script, which I haven't tested.


The following snippet attempts to recursively read and parse a damaged serialized string (blob data), for example one that was too long for its database column and got cut off. Numeric primitives and booleans are guaranteed to be valid; strings may be cut off and array keys may be missing. The routine may be useful if recovering a significant part (not all) of the data is sufficient for you.

class Unserializer
{
    /**
    * Parse blob string tolerating corrupted strings & arrays
    * @param string $str Corrupted blob string
    */
    public static function parseCorruptedBlob(&$str)
    {
        // array pattern:    a:236:{...;}
        // integer pattern:  i:123;
        // double pattern:   d:329.0001122;
        // boolean pattern:  b:1; or b:0;
        // string pattern:   s:14:"date_departure";
        // null pattern:     N;
        // not supported: object O:{...}, reference R:{...}

        // NOTES:
        // - primitive types (bool, int, float) except for string are guaranteed uncorrupted
        // - arrays are tolerant to corrupted keys/values
        // - references & objects are not supported
        // - we use single byte string length calculation (strlen rather than mb_strlen) since source string is ISO-8859-2, not utf-8

        if(preg_match('/^a:(\d+):{/', $str, $match)){
            list($pattern, $cntItems) = $match;
            $str = substr($str, strlen($pattern));
            $array = [];
            for($i=0; $i<$cntItems; ++$i){
                $key = self::parseCorruptedBlob($str);
                if(trim((string) $key)!==''){ // hmm, we won't allow null and "" as keys..
                    $array[$key] = self::parseCorruptedBlob($str);
                }
            }
            $str = ltrim($str, '}'); // closing array bracket
            return $array;
        }elseif(preg_match('/^s:(\d+):/', $str, $match)){
            list($pattern, $length) = $match;
            $str = substr($str, strlen($pattern));
            $val = substr($str, 0, $length + 2); // include also surrounding double quotes
            $str = substr($str, strlen($val) + 1); // include also semicolon
            $val = trim($val, '"'); // remove surrounding double quotes
            if(preg_match('/^a:(\d+):{/', $val)){
                // parse instantly another serialized array
                return (array) self::parseCorruptedBlob($val);
            }else{
                return (string) $val;
            }
        }elseif(preg_match('/^i:(-?\d+);/', $str, $match)){ // also accept negative integers
            list($pattern, $val) = $match;
            $str = substr($str, strlen($pattern));
            return (int) $val;
        }elseif(preg_match('/^d:(-?[\d.]+(?:E[+-]?\d+)?);/i', $str, $match)){ // also accept negative values and exponents
            list($pattern, $val) = $match;
            $str = substr($str, strlen($pattern));
            return (float) $val;
        }elseif(preg_match('/^b:(0|1);/', $str, $match)){
            list($pattern, $val) = $match;
            $str = substr($str, strlen($pattern));
            return (bool) $val;
        }elseif(preg_match('/^N;/', $str, $match)){
            $str = substr($str, strlen('N;'));
            return null;
        }
    }
}

// usage:
$unserialized = Unserializer::parseCorruptedBlob($serializedString);
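A different angle on the same problem, closer to what the question literally asks for ("close all the arrays"), is to repair the truncated string itself so that the native unserialize() accepts it. The following is only a sketch under assumptions: close_truncated() is a hypothetical helper, it handles arrays, strings, integers, floats, booleans and null (no objects or references), and it simply throws away the incomplete tail plus any key left without a value.

```php
<?php
// Sketch only: close_truncated() is a hypothetical helper, not part of PHP.
function close_truncated(string $s): string
{
    $out   = [];  // complete tokens copied so far
    $stack = [];  // open arrays: index of their header in $out and items (keys + values) seen
    $pos   = 0;
    $len   = strlen($s);

    while ($pos < $len) {
        if (preg_match('/\Ga:\d+:\{/', $s, $m, 0, $pos)) {
            // Array header: remember where it is so the count can be rewritten later.
            $stack[] = ['header' => count($out), 'items' => 0];
            $token = $m[0];
        } elseif ($s[$pos] === '}' && $stack) {
            // Array closed properly: it counts as one item of its parent.
            array_pop($stack);
            $token = '}';
            if ($stack) { $stack[count($stack) - 1]['items']++; }
        } elseif (preg_match('/\G(?:i:-?\d+;|b:[01];|N;|d:[^;]+;)/', $s, $m, 0, $pos)) {
            $token = $m[0];
            if ($stack) { $stack[count($stack) - 1]['items']++; }
        } elseif (preg_match('/\Gs:(\d+):"/', $s, $m, 0, $pos)
            && substr($s, $pos + strlen($m[0]) + (int) $m[1], 2) === '";') {
            // String whose declared byte length is fully present and terminated.
            $token = substr($s, $pos, strlen($m[0]) + (int) $m[1] + 2);
            if ($stack) { $stack[count($stack) - 1]['items']++; }
        } else {
            break; // incomplete or unsupported token: stop copying here
        }
        $out[] = $token;
        $pos  += strlen($token);
    }

    // Close whatever is still open, innermost array first.
    while ($stack) {
        $frame = array_pop($stack);
        if ($frame['items'] % 2 === 1) {
            array_pop($out);       // dangling key whose value was cut off
            $frame['items']--;
        }
        $out[$frame['header']] = 'a:' . intdiv($frame['items'], 2) . ':{';
        $out[] = '}';
        if ($stack) { $stack[count($stack) - 1]['items']++; }
    }
    return implode('', $out);
}

// Usage with the truncated string from the first answer:
$truncated = 'a:3:{s:3:"one";i:1;s:3:"two";s:4:"nice";s:5:"three";s:';
var_dump(close_truncated($truncated));
var_dump(unserialize(close_truncated($truncated)));
```

As the first answer points out, the result cannot be trusted as complete: everything after the cut is gone, and the rewritten array counts only reflect what survived.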


After doing further research I have found a workaround. According to this blog post: "It turns out that if there's a ", ', :, or ; in any of the array values the serialization gets corrupted."


serialize() returns a string containing a byte-stream representation of any value that can be stored in PHP. unserialize() can use this string to recreate the original variable values. Using serialize() to save an object will save all variables in the object. The methods of the object will not be saved.

Comments
  • This may be a useful resource for some people finding this question - I've used it many times and it's worked well every time: github.com/Blogestudio/Fix-Serialization (granted this would likely not help where a large portion of the string has been cut off - only when you've done a search and replace and the string lengths are off)
  • do you have a better answer?
  • why do you want to do anything at all?
  • Ultimately, it is not your answer that is wrong, it is the question that is Unclear / Cannot be reproduced. I got sucked in by all of the other answers that dropped in byte count adjusting snippets and didn't read the question well enough. This page should be closed and I should find a better home for my answer. My apologies for poking your old post.
  • I don't necessarily agree. The scenario OP described is a real one: given some serialized data that has been damaged, OP wanted to know if there's any way to fix it using regular expressions. I still think there's no way to do that (at least not with regular expressions), because it would be guesswork.
  • Yeah. Okay, again, I think you are right. I think your answer is the only correct answer on the page. I'll be removing mine when I get a chance. Your downvote tally is misleading. Perhaps you could rephrase your wording so that it doesn't look like you are asking questions.
  • Strings containing double quotes do work since the last edit with the ";" added
  • How should I apply this to a WordPress scenario where I ran a search and replace on the .sql file and the serialized data got out of sync? Please help!
  • Thanks! I finally used this script and worked very well: interconnectit.com/products/…
  • Thanks for the nifty tool.