How to Update AWS S3 Metadata

Amazon S3 is an object storage service: data is stored as objects. In the setup described here, each block of data is stored alongside a metadata file, meta.json.

The metadata file describes the data. For example, the metadata of a time-series block records the block's start and end time (minTime and maxTime, in milliseconds since the Unix epoch).
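Because minTime and maxTime are millisecond timestamps, converting them to a readable date helps when checking which block covers which period. A minimal sketch, assuming GNU date as found on most Linux systems; the function name ms_to_utc is hypothetical:

```shell
# Convert a millisecond Unix timestamp, as used by minTime/maxTime in
# meta.json, to an ISO-8601 UTC string.
# Assumes GNU date (Linux); BSD/macOS date needs "date -u -r SECONDS" instead.
ms_to_utc() {
  local ms="$1"
  # Integer division drops the milliseconds; date -d "@N" reads N as epoch seconds.
  date -u -d "@$((ms / 1000))" +%Y-%m-%dT%H:%M:%SZ
}
```

For the meta.json example below, ms_to_utc 1586354400000 prints 2020-04-08T14:00:00Z and ms_to_utc 1586361600000 prints 2020-04-08T16:00:00Z, so that block covers two hours.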

I use S3 to store Prometheus TSDB data, which is uploaded with Thanos (https://thanos.io/).

Thanos uploads the data and adds some labels of its own to the metadata, under the thanos key.

For example, a metadata file looks like this:

[:vishnu:root@opstest1.sjc2 ~]# cat /opt/thanos/data/01E5DBQZGNSP678AH4TW2499H4/meta.json
{
	"ulid": "01E5DBQZGNSP528AH1TW2499H4",
	"minTime": 1586354400000,
	"maxTime": 1586361600000,
	"stats": {
		"numSamples": 712719314,
		"numSeries": 1769555,
		"numChunks": 5999569
	},
	"compaction": {
		"level": 1,
		"sources": [
			"01E5DBQZGNSP528AH1TW2499H4"
		]
	},
	"version": 1,
	"thanos": {
		"labels": {
			"region": "sjc2",
			"monitor": "sjc2"
		},
		"downsample": {
			"resolution": 0
		},
		"source": "sidecar"
	}
}

S3 does not allow an object to be edited in place once it has been uploaded. To update the metadata, we have to download the meta.json file from S3, edit it locally, and upload it again, overwriting the original object.

My labels were not unique across all the blocks: while I was testing Prometheus and Thanos during the first three months, the labels were not set properly, so I needed to update the labels in every block to make them unique. If the labels are not unique across all the blocks, compaction will not work properly. Editing one or two metadata files manually is fine, but with hundreds or thousands of blocks it becomes impractical, so I wrote a script to automate updating the labels.
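Before rewriting anything, it can help to see which label sets exist and how many blocks carry each one, which makes label sets that should have been distinct easier to spot. A minimal sketch, assuming every block's meta.json has already been downloaded into DIR/<block>/meta.json, for example with aws s3 cp s3://prometheus-os/ ./blocks --recursive --exclude "*" --include "*/meta.json". The function name label_set_counts is hypothetical:

```shell
# Group downloaded meta.json files by their Thanos label set and report how
# many blocks share each set, most common first.
# Assumes the layout DIR/<block>/meta.json and that jq is installed.
label_set_counts() {
  local dir="$1"
  local f
  for f in "$dir"/*/meta.json; do
    [ -e "$f" ] || continue   # the glob matched nothing
    # -c prints one compact line per block; -S sorts keys so that identical
    # label sets always print identically.
    jq -cS '.thanos.labels' "$f"
  done | sort | uniq -c | sort -rn
}
```

Running label_set_counts ./blocks prints one line per distinct label set, prefixed with the number of blocks that carry it.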

Steps to update the Labels in meta.json

First, install the AWS CLI on the local system and configure it with credentials that can read from and write to the S3 bucket. Installation instructions are at https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2-linux.html
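Before scripting anything, it is worth confirming that the CLI is installed and can reach the bucket. A minimal check, assuming the bucket name is passed as an argument (prometheus-os in this post); the function name check_s3_access is hypothetical:

```shell
# Sanity check: is the aws CLI available, and can it list the bucket?
check_s3_access() {
  local bucket="$1"
  if ! command -v aws >/dev/null 2>&1; then
    echo "aws CLI not found in PATH" >&2
    return 1
  fi
  # Listing the bucket root exits non-zero if credentials or permissions
  # are missing.
  aws s3 ls "s3://${bucket}/" >/dev/null
}
```

For example, check_s3_access prometheus-os && echo "bucket reachable".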

Download only the meta.json file of a block from S3 with the following command:

aws s3 cp s3://prometheus-os/01E5DBQZGNSP678AH4TW2499H4 /tmp/ --recursive --exclude "*" --include "meta.json"
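When the block name is known, copying the meta.json key directly produces the same /tmp/meta.json as the filtered --recursive command, and it returns an error if the object does not exist, which is convenient in scripts. A minimal sketch; download_meta and its argument order are hypothetical:

```shell
# Download a single block's meta.json to a local path.
# Arguments: bucket name, block ULID (the S3 prefix), destination file.
download_meta() {
  local bucket="$1" block="$2" dest="$3"
  aws s3 cp "s3://${bucket}/${block}/meta.json" "$dest"
}
```

For example, download_meta prometheus-os 01E5DBQZGNSP678AH4TW2499H4 /tmp/meta.json.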

As meta.json is a JSON file, I used the jq command-line tool to replace the labels.
Installation details for jq are at https://stedolan.github.io/jq/

jq addresses fields in a JSON document as dot-separated paths, such as .thanos.labels.monitor. The command below first deletes the monitor label, then sets the replica label to prometheus1a and the region label to sjc2. If a label already exists, its value is overwritten; if it does not, the label is created with the given value. jq writes the result to standard output and leaves the input file untouched, so the output is written to a new file that then replaces /tmp/meta.json.

jq 'del(.thanos.labels.monitor) | .thanos.labels.replica = "prometheus1a" | .thanos.labels.region = "sjc2"' /tmp/meta.json > /tmp/meta.json.new && mv /tmp/meta.json.new /tmp/meta.json
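The command above hard-codes the label values. When they differ from block to block (for example, a different replica value per Prometheus server), a small function that takes them as arguments can help. This sketch passes the values with jq's --arg option, so they are quoted safely instead of being spliced into the jq program text, and it only replaces the file when jq succeeds. The function name relabel_meta is hypothetical:

```shell
# Rewrite the Thanos labels of a meta.json file in place:
# drop "monitor", set "replica" and "region" to the given values.
relabel_meta() {
  local meta_file="$1" replica="$2" region="$3"
  local tmp
  tmp="$(mktemp)" || return 1
  if jq --arg replica "$replica" --arg region "$region" \
      'del(.thanos.labels.monitor)
       | .thanos.labels.replica = $replica
       | .thanos.labels.region = $region' \
      "$meta_file" > "$tmp"; then
    # Only overwrite the original when jq succeeded (e.g. the input was valid JSON).
    mv "$tmp" "$meta_file"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

For example, relabel_meta /tmp/meta.json prometheus1a sjc2 leaves /tmp/meta.json with the labels region=sjc2 and replica=prometheus1a.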

Then upload the edited meta.json file back to S3, overwriting the original object:

aws s3 cp /tmp/meta.json s3://prometheus-os/01E5DBQZGNSP678AH4TW2499H4/meta.json
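To confirm that the upload replaced the object, the file can be read straight back from S3: passing - as the destination of aws s3 cp streams the object to standard output, where jq can print the labels. A minimal sketch; upload_meta and its argument order are hypothetical:

```shell
# Upload an edited meta.json to its block prefix, then read it back from S3
# and print the labels to confirm the change landed.
# Arguments: bucket name, block ULID (the S3 prefix), local meta.json path.
upload_meta() {
  local bucket="$1" block="$2" src="$3"
  aws s3 cp "$src" "s3://${bucket}/${block}/meta.json" || return 1
  # "-" as the destination streams the object to stdout.
  aws s3 cp "s3://${bucket}/${block}/meta.json" - | jq -c '.thanos.labels'
}
```

For example, upload_meta prometheus-os 01E5DBQZGNSP678AH4TW2499H4 /tmp/meta.json uploads the file and then prints the labels now stored in S3.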

To update every block, wrap the above steps in a shell script that loops over the meta.json files of all the blocks.
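One possible shape for such a script is sketched below. It is not the author's script, and it assumes that every top-level prefix in the bucket is a block ULID and that every block gets the same replica and region values; prefixes without a meta.json (for example, non-block prefixes) are skipped. Back up the meta.json files before running anything like this. The function names are hypothetical:

```shell
# Sketch: rewrite the Thanos labels in every block's meta.json in a bucket.
# Assumptions: each top-level prefix is a block ULID, and all blocks get the
# same replica/region values.

# Print top-level prefixes ("PRE <name>/" lines from aws s3 ls) without the
# trailing slash.
list_blocks() {
  aws s3 ls "s3://$1/" | awk '$1 == "PRE" { sub(/\/$/, "", $2); print $2 }'
}

relabel_all_blocks() {
  local bucket="$1" replica="$2" region="$3"
  local workdir block meta
  workdir="$(mktemp -d)" || return 1

  # Block ULIDs contain no whitespace, so plain word splitting is safe here.
  for block in $(list_blocks "$bucket"); do
    meta="${workdir}/${block}.json"

    # 1. Download the block's meta.json; prefixes without one are skipped.
    if ! aws s3 cp "s3://${bucket}/${block}/meta.json" "$meta"; then
      echo "skip ${block}: no meta.json downloaded" >&2
      continue
    fi

    # 2. Rewrite the labels into a new file; skip the block if jq fails.
    if ! jq --arg replica "$replica" --arg region "$region" \
        'del(.thanos.labels.monitor)
         | .thanos.labels.replica = $replica
         | .thanos.labels.region = $region' \
        "$meta" > "${meta}.new"; then
      echo "skip ${block}: jq failed" >&2
      continue
    fi

    # 3. Upload the edited file over the original object.
    aws s3 cp "${meta}.new" "s3://${bucket}/${block}/meta.json" \
      || echo "upload failed for ${block}" >&2
  done

  rm -rf "$workdir"
}
```

For the bucket in this post, the call would be relabel_all_blocks prometheus-os prometheus1a sjc2. If blocks from different Prometheus servers need different replica values, the loop would need a mapping from block to value instead of a single argument.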
