Question
In this step you should define a function read_data() that reads words from a text file into your lexicon.

This function should take two arguments: the lexicon that words should be loaded into, and the name of the file to load.

From the assignment brief, the criteria for a word are:
- A word is obtained as a sequence of characters separated by whitespace
- Words should be stored in lowercase
- Any numbers or punctuation characters should be removed
- Unique words should only be stored once in the lexicon (you can use the frequency of the word to show that a word appears twice in the files)

Hint: Below is an example of how you can remove all punctuation and numeric characters from a string:

    my_str = 'H1-ell.0o'
    # Remove all punctuation and numeric characters
    my_str = ''.join([ch for ch in my_str if ch.isalpha()])
    print(my_str)
    >>> Hello
Solution 1
Sure, here is a Python function that meets the criteria you described:
    def read_data(lexicon, filename):
        with open(filename, 'r') as file:
            for line in file:
                for word in line.split():
                    # Remove punctuation and numbers
                    word = ''.join([ch for ch in word if ch.isalpha()])
                    # Convert to lowercase
                    word = word.lower()
                    # Skip tokens that had no letters at all (e.g. '42' or '--'),
                    # otherwise the empty string would be stored as a word
                    if not word:
                        continue
                    # Add to lexicon if not already present
                    if word not in lexicon:
                        lexicon[word] = 1
                    else:
                        lexicon[word] += 1
This function opens the file with the given filename, reads it line by line, splits each line into words, removes punctuation and numbers from each word, converts each word to lowercase, and adds each unique word to the lexicon with a count of how many times it appears in the file. If a word is already in the lexicon, it increments the count for that word.
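To see the word criteria in action, here is a small self-contained sketch. It restates the function compactly (using dict.get for the count) and skips tokens that become empty after cleaning, which is an assumption about the intended behaviour rather than something the brief states. The sample text and the temporary file are made up for the demonstration.

```python
import os
import tempfile


def read_data(lexicon, filename):
    """Load words from filename into lexicon, mapping word -> frequency."""
    with open(filename, 'r') as file:
        for line in file:
            for token in line.split():
                # Keep alphabetic characters only, then lowercase
                word = ''.join(ch for ch in token if ch.isalpha()).lower()
                if not word:
                    continue  # assumption: ignore tokens with no letters
                lexicon[word] = lexicon.get(word, 0) + 1


# Hypothetical sample text written to a temporary file for the demo
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("The cat, the CAT and 42 dogs!\nH1-ell.0o the-re\n")
    sample_path = tmp.name

lexicon = {}
read_data(lexicon, sample_path)
os.remove(sample_path)

print(lexicon)
# {'the': 2, 'cat': 2, 'and': 1, 'dogs': 1, 'hello': 1, 'there': 1}
```

Note that '42' contributes nothing, and 'the-re' becomes the single word 'there' because the hyphen is removed without splitting the token.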
Solution 2
Sure, here is a Python function that meets the criteria you described:
    def read_data(lexicon, filename):
        with open(filename, 'r') as file:
            for line in file:
                # Remove punctuation and numbers, keeping letters and whitespace
                line = ''.join(ch for ch in line if ch.isalpha() or ch.isspace())
                # Convert to lowercase and split into words
                words = line.lower().split()
                # Add words to lexicon
                for word in words:
                    if word not in lexicon:
                        lexicon[word] = 1
                    else:
                        lexicon[word] += 1
This function opens the file with the given filename, reads it line by line, removes punctuation and numbers from each line, converts the line to lowercase, splits it into words, and adds each word to the lexicon. If a word is already in the lexicon, it increments the count for that word.
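If the lexicon is allowed to be a collections.Counter (an assumption; the brief leaves the representation up to you), the counting branch can be delegated to Counter.update, which increments the count of every item in an iterable. The sketch below keeps the same line-level cleaning; the sample file contents are invented for the demonstration.

```python
import os
import tempfile
from collections import Counter


def read_data(lexicon, filename):
    """Variant that assumes lexicon is a collections.Counter."""
    with open(filename, 'r') as file:
        for line in file:
            # Drop everything except letters and whitespace
            cleaned = ''.join(ch for ch in line if ch.isalpha() or ch.isspace())
            # Counter.update adds one to the count of each word in the list
            lexicon.update(cleaned.lower().split())


# Hypothetical sample file, created in a temporary directory
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample.txt')
    with open(path, 'w') as f:
        f.write("One fish, two fish.\nRed fish; blue fish 2!\n")
    lexicon = Counter()
    read_data(lexicon, path)

print(lexicon['fish'])  # 4
```

Because the cleaning happens before the split, a token made only of digits or punctuation disappears entirely and never produces an empty word.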
Similar Questions
This task involves creating a lexicon that can store your collection of Word objects, and populating it with words loaded from two text files (in1.txt and in2.txt).

More details on each of these steps are provided below; in short, for this task you should:
- Create and initialize a variable that represents your lexicon.
- Define a function read_data() that can read words from a text file into your lexicon (keeping in mind the constraints from the assignment brief).
- Call the read_data() function to load all words from the text files in1.txt and in2.txt into your lexicon.

Step 1: It is up to you to decide on how you want to represent your lexicon. It may be useful to refer back to the labs to see how we represented collections of Person objects for the sorting algorithms that you implemented.

Step 2: In this step you should define a function read_data() that reads words from a text file into your lexicon. This function should take two arguments: the lexicon that words should be loaded into, and the name of the file to load.

From the assignment brief, the criteria for a word are:
- A word is obtained as a sequence of characters separated by whitespace
- Words should be stored in lowercase
- Any numbers or punctuation characters should be removed
- Unique words should only be stored once in the lexicon (you can use the frequency of the word to show that a word appears twice in the files)

Hint: Below is an example of how you can remove all punctuation and numeric characters from a string:

    my_str = 'H1-ell.0o'
    # Remove all punctuation and numeric characters
    my_str = ''.join([ch for ch in my_str if ch.isalpha()])
    print(my_str)
    >>> Hello

Do not be concerned with initializing the neighbours per-word. This will be done in Task 4.

Step 3: In this step, you should make two calls to your read_data() function to load both in1.txt and in2.txt into your lexicon. When calling this function, you can hardcode the names of these files.
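The three steps can be sketched as follows. This is only an illustration under stated assumptions: the lexicon is a plain dict of word to frequency standing in for the Word-object collection, load_lexicon is a hypothetical wrapper name, and the two input files are replaced by small made-up stand-ins so the example runs anywhere.

```python
import os
import tempfile


def read_data(lexicon, filename):
    """Step 2: load cleaned, lowercase words from filename into lexicon."""
    with open(filename, 'r') as file:
        for line in file:
            for token in line.split():
                word = ''.join(ch for ch in token if ch.isalpha()).lower()
                if word:  # assumption: ignore tokens with no letters
                    lexicon[word] = lexicon.get(word, 0) + 1


def load_lexicon(directory='.'):
    """Hypothetical wrapper covering Step 1 and Step 3."""
    lexicon = {}                          # Step 1: create the lexicon
    for name in ('in1.txt', 'in2.txt'):   # Step 3: hardcoded file names
        read_data(lexicon, os.path.join(directory, name))
    return lexicon


# Stand-in files with invented contents, written to a temporary directory
with tempfile.TemporaryDirectory() as tmp_dir:
    samples = (('in1.txt', 'Apple pie, apple tart.'),
               ('in2.txt', 'An APPLE a day!'))
    for name, text in samples:
        with open(os.path.join(tmp_dir, name), 'w') as f:
            f.write(text)
    lexicon = load_lexicon(tmp_dir)

print(lexicon)
# {'apple': 3, 'pie': 1, 'tart': 1, 'an': 1, 'a': 1, 'day': 1}
```

Passing the same lexicon to both calls is what lets a word's frequency accumulate across the two files.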
From the assignment brief, along with the spelling of the word, a word in your lexicon needs to store the following information:
- The frequency: How many times the word appears in the input files.
- The list of neighbours: A neighbour of a word w is a word that is of the same length and differs from w by only one letter.

The best way this can be implemented is as a Word class, where you can populate your lexicon with Word objects.

Your task in this section is to create a class Word that is used to represent a Word in your lexicon. When you do this, you will need to:
- Think about what instance variables should be defined (and how they should be initialized)
- Think about what methods you need to implement for this class

In a future task, you will need to sort your lexicon full of Word objects. In the labs you saw a similar example where you needed to sort a collection of Person objects. It may be useful to refer back to this to see what methods were required.

You may find that once you attempt the following tasks, you need to come back to this class and add additional methods.
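One possible shape for such a class is sketched below. The attribute names, the choice to order words alphabetically by spelling, and the add_word helper are all assumptions made for illustration; the brief only fixes that a word stores its spelling, frequency, and neighbours. The comparison methods are what allow sorted() to work on Word objects.

```python
from functools import total_ordering


@total_ordering
class Word:
    """One lexicon entry: spelling, frequency, and list of neighbours."""

    def __init__(self, spelling):
        self.spelling = spelling
        self.frequency = 1      # first occurrence counts as one
        self.neighbours = []    # left empty here; populated in a later task

    def __eq__(self, other):
        if not isinstance(other, Word):
            return NotImplemented
        return self.spelling == other.spelling

    def __lt__(self, other):
        # Assumed ordering: alphabetical by spelling
        if not isinstance(other, Word):
            return NotImplemented
        return self.spelling < other.spelling

    def __hash__(self):
        # Defining __eq__ removes the default hash, so restore one
        return hash(self.spelling)

    def __repr__(self):
        return f"Word({self.spelling!r}, frequency={self.frequency})"


def add_word(lexicon, spelling):
    """Hypothetical helper: store a new Word or bump an existing one."""
    if spelling in lexicon:
        lexicon[spelling].frequency += 1
    else:
        lexicon[spelling] = Word(spelling)


lexicon = {}  # assumed representation: spelling -> Word
for spelling in ["pear", "apple", "pear", "fig"]:
    add_word(lexicon, spelling)

ordered = sorted(lexicon.values())
print(ordered)
# [Word('apple', frequency=1), Word('fig', frequency=1), Word('pear', frequency=2)]
```

With this layout, read_data() would call something like add_word() for each cleaned word instead of incrementing an integer directly.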
Define a function named write_words_start_vowel(filename, names) which takes a filename and a list of words as parameters. The function should write the words that start with a vowel from the list to the file specified by the filename parameter. Each line contains one word only.

Note:
- Remember to close the file.
- You can assume that the parameter list is not empty.
- The function needs to handle both uppercase and lowercase letters.
- The print_contents() function is used for marking purposes. You can assume that it is given in CodeRunner. Do not provide its implementation.

For example:

Test Result
    write_words_start_vowel('bdon483.txt', ['life', 'is', 'a', 'long', 'journey', 'with', 'problems', 'to', 'solve'])
    print_contents('bdon483.txt')
Open the file romeo.txt and read it line by line. For each line, split the line into a list of words using the split() method. The program should build a list of words. For each word on each line check to see if the word is already in the list and if not append it to the list. When the program completes, sort and print the resulting words in python sort() order as shown in the desired output.
You can read a single word at a time from a file using
Select one:
a. fgetc function
b. fscanf function
c. Both A and B
d. None of above