AWS: how to fix an S3 event replacing spaces with '+' signs in object key names in the JSON

I have a Lambda function that copies objects from bucket 'A' to bucket 'B'. Everything was working fine until an object named 'New Text Document.txt' was created in bucket 'A'. The JSON built for the S3 event contains the key as "key": "New+Text+Document.txt".

The spaces got replaced with '+'. From searching the web, I know this is a known issue, but I am not sure how to fix it: the incoming JSON itself contains '+', and '+' can actually be part of the file name, as in 'New+Text Document.txt'.

So I cannot blindly replace '+' with ' ' in my Lambda function.

Because of this, the code fails to find the file in the bucket.

Please suggest.

I came across this while looking for a solution for a Lambda written in Python instead of Java. "urllib.parse.unquote_plus" worked for me; it properly handled a file name containing both spaces and + signs:

from urllib.parse import unquote_plus
import boto3


bucket = 'testBucket1234'
# uploaded file with name 'foo + bar.txt' for test, s3 Put event passes following encoded object_key
object_key = 'foo %2B bar.txt'
print(object_key)
object_key = unquote_plus(object_key)
print(object_key)

client = boto3.client('s3')
client.get_object(Bucket=bucket, Key=object_key)
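
For what it's worth, here is a minimal sketch of how the same decoding could be applied to every record of an incoming event rather than to a hard-coded string. The helper name decoded_keys and the sample event (bucket name "bucket-a") are made up for illustration; only the Records / s3 / object / key layout is taken from the event shown in this thread.

```python
from urllib.parse import unquote_plus


def decoded_keys(event):
    """Return (bucket, key) pairs with each object key URL-decoded."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # unquote_plus turns '+' into a space and '%2B' into a literal '+'
        key = unquote_plus(s3["object"]["key"])
        pairs.append((bucket, key))
    return pairs


# Hypothetical event, trimmed to the fields used above
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "bucket-a"},
                "object": {"key": "New+Text+Document.txt"},
            }
        }
    ]
}

print(decoded_keys(sample_event))
```

The decoded key is what you would then pass as Key to get_object or a copy call.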


What I have done to fix this is

java.net.URLDecoder.decode(b.getS3().getObject().getKey(), "UTF-8")


{
    "Records": [
        {
            "s3": {
                "object": {
                    "key": "New+Text+Document.txt"
                }
            }
        }
    ]
}

So now the JSON value "New+Text+Document.txt" gets converted to New Text Document.txt, correctly.

This fixed my issue. Please tell me whether this is a correct solution, and whether there are any corner cases that could break my implementation.
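
One way to probe the corner cases is a small round-trip check, sketched here in Python. It assumes the event key is form-style URL-encoded (space as '+', a literal '+' as '%2B'), which matches the encoded keys quoted in this thread. The function simulate_event_key is a hypothetical stand-in for what S3 puts in the event, not an AWS API.

```python
from urllib.parse import quote_plus, unquote_plus


def simulate_event_key(name):
    # Assumption: mimics the event encoding (space -> '+', '+' -> '%2B');
    # '/' is left as-is so prefixes stay readable
    return quote_plus(name, safe="/")


names = [
    "New Text Document.txt",
    "New+Text Document.txt",
    "New+Text+Document.txt",
    "reports/2020 Q1+Q2.csv",
]

for name in names:
    encoded = simulate_event_key(name)
    decoded = unquote_plus(encoded)
    naive = encoded.replace("+", " ")  # the "blind" replacement from the question
    print(f"{name!r} -> {encoded!r} -> {decoded!r} (naive: {naive!r})")
    assert decoded == name
```

Under that assumption, names with spaces, plus signs, or both all decode back to the original, while the blind replacement leaves '%2B' in the key whenever the name contains a real '+'.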


I think in Java you should use:

getS3().getObject().getUrlDecodedKey()

method that returns decoded key, instead of

getS3().getObject().getKey()


Since we are sharing solutions for other runtimes, here is how to do it in Node.js:

const srcKey = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, " "));
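
The order of the two steps in that one-liner matters: '+' has to be replaced before percent-decoding, otherwise a '%2B' that decodes to a real '+' would also be turned into a space. A rough Python illustration of the same idea (the function names decode_right and decode_wrong are only for this example):

```python
from urllib.parse import unquote


def decode_right(key):
    # Replace '+' with a space first, then percent-decode
    return unquote(key.replace("+", " "))


def decode_wrong(key):
    # Percent-decoding first turns '%2B' into '+', which is then lost
    return unquote(key).replace("+", " ")


encoded = "foo+%2B+bar.txt"  # encoded form of 'foo + bar.txt'
print(repr(decode_right(encoded)))
print(repr(decode_wrong(encoded)))
```

The first call gives back 'foo + bar.txt'; the second loses the plus sign and returns three spaces in its place.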

From the AWS docs here


I agree with Scott. In my case the create-object event encoded the colon (:) in the key as %3A, so I had to replace it to get the correct S3 key.

Python code:

    import json
    import logging

    logger = logging.getLogger()
    logger.setLevel(logging.INFO)

    def lambda_handler(event, context):
        logger.info('Event: %s', json.dumps(event))
        source_bucket = event['Records'][0]['s3']['bucket']['name']
        key_old = event['Records'][0]['s3']['object']['key']
        # The event percent-encodes ':' as '%3A'; restore the colon
        key = key_old.replace('%3A', ':')
        logger.info('key value')
        logger.info(key)

Comments
  • If you express the name as an HTML url, you could avoid this kind of "collision" : space becomes %20 and + becomes %2B ? You can then convert it back to the real character.
  • Thanks @LoneWanderer, but this is a json value that I get from S3 put event.
  • Got it, but I think you are out of luck ... If you have to try every combination of + and ` ` by opening a file to find out what the real file name was, you can get into trouble ... Can't you just forbid + in file names? Sounds violent, but hey ...
  • @LoneWanderer there's an entrenched bug in S3's internal object key representation, presumably a SOAP holdover. %20 and + in a PUT URI are both stored internally as the character +. Both symbols in a URI mean ASCII 32... meanwhile, %2B is stored as %2B, even though no browser would ever escape + as %2B in a path (that should only happen in the query string). If you upload a file called foo+bar or foo%20bar, you can actually download the same file as either foo+bar or foo%20bar. That is the same object.
  • What do I have to do if my file is named with a plus sign and a space?
  • This should be the correct solution. Unless there are edge/corner cases not handled in an expected/sensible fashion by java.net.URLDecoder.decode(), your solution seems exactly correct.
  • The problem is that 1. "New+Text+Document.txt", 2. "New Text Document.txt", and 3. "New Text+Document.txt" will all look the same in the event (key: "New+Text+Document.txt"). Your code will fail on cases 1 and 3.
  • same issue in golang, fixed with url.QueryUnescape(s3key) from net/url
  • the problem he's describing and that led me here is that the lambda 'create object' event trigger is what includes the + for space, which means you don't have an object yet because the key (as returned by the event) doesn't match any objects in the bucket.
  • I have the exact same problem as in the question. This solution solves it using a native method available on the object, simple and elegant. It returns the key without the encoding. The subsequent getObject operation finds the file key successfully and moves the file from Bucket A to Bucket B.