Write a method that determines if a given data set represents a valid UTF-8 encoding.
Question
Write a method that determines if a given data set represents a valid UTF-8 encoding.
Prototype: def validUTF8(data)
Return: True if data is a valid UTF-8 encoding, else return False
A character in UTF-8 can be 1 to 4 bytes long
The data set can contain multiple characters
The data will be represented by a list of integers
Each integer represents 1 byte of data, therefore you only need to handle the 8 least significant bits of each integer
Solution
Here is a Python solution for the problem:
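def validUTF8(data):
    # Number of continuation bytes still expected for the current character
    count = 0
    for num in data:
        # Keep only the 8 least significant bits of each integer
        num &= 0xFF
        if count == 0:
            # Leading byte: the high bits give the character length
            if (num >> 5) == 0b110:
                count = 1
            elif (num >> 4) == 0b1110:
                count = 2
            elif (num >> 3) == 0b11110:
                count = 3
            elif (num >> 7):
                # 10xxxxxx or 11111xxx cannot start a character
                return False
        else:
            # Continuation byte: must look like 10xxxxxx
            if (num >> 6) != 0b10:
                return False
            count -= 1
    return count == 0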
This function iterates over the bytes in data, using count to track how many continuation bytes the current character still needs. When count is 0, the byte starts a new character, and its most significant bits give the length: 110 starts a 2-byte character, 1110 a 3-byte character, and 11110 a 4-byte character, while a leading 0 is a complete 1-byte character. Any other leading byte (one beginning with 10 or with 11111) is rejected. When count is greater than 0, the byte must begin with 10, as every non-first byte of a character does; otherwise the function returns False. After the last byte, the function returns True only if count == 0, meaning no character was left incomplete.
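One way to sanity-check a validator like this is to compare it against Python's built-in UTF-8 decoder on a few inputs. The sketch below is only a cross-check, not part of the solution above, and the helper name strict_valid_utf8 is made up for illustration. Note that the built-in decoder is stricter than a pure bit-pattern check: it also rejects overlong forms such as the two bytes 0xC0 0x80, which the bit-pattern approach accepts.

```python
def strict_valid_utf8(data):
    """Hypothetical helper: validate by asking Python's own UTF-8 decoder."""
    # Keep only the 8 least significant bits of each integer, as the task states
    raw = bytes(b & 0xFF for b in data)
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(strict_valid_utf8([197, 130, 1]))   # True: a 2-byte character, then a 1-byte one
print(strict_valid_utf8([235, 140, 4]))   # False: 3-byte lead, but the third byte is not 10xxxxxx
print(strict_valid_utf8([467, 130, 1]))   # True: 467 & 0xFF == 211, a 2-byte lead
print(strict_valid_utf8([0xC0, 0x80]))    # False: overlong form, rejected by the decoder
```

If the two approaches disagree on an input, the difference is most likely one of these stricter rules rather than a bug in the bit checks, so it helps to decide up front how strict the exercise expects the validation to be.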