We ran across this one in the OECD healthcare data. The country names had numbers appended, which served as footnotes in the original spreadsheet but looked dumb when we used them as index labels. The question is how to eliminate them. A short version of the country names is
`names = ['Australia 1', 'Canada 2', 'Chile 3', 'United States 1']`
Do each of these in a separate code cell:
1. Set `us = names[-1]` and call the `rsplit()` method on us.
What do you get?
2. Consult the documentation for `rsplit` to split `us` into two
pieces, the country name and the number 1. How would you extract
just the country name?
3. Use a loop to strip the numbers from all of the elements of
`names`.
4. Use a list comprehension to strip the numbers from all of the
elements of `names`.
Python Code:
#Defining dataset
names = ['Australia 1', 'Canada 2', 'Chile 3', 'United States 1']
1. us = names[-1]
This sets us to 'United States 1'.
Calling us.rsplit() with no arguments splits on whitespace and returns ['United', 'States', '1'].
2. According to the documentation, rsplit takes two optional arguments: the separator and the maximum number of splits to perform, counted from the right.
rsplit(sep=None, maxsplit=-1)
To get the country and the number, set sep=" " and maxsplit=1, i.e. us.rsplit(" ", 1), which returns ['United States', '1'].
To extract just the country name, take the first element: us.rsplit(" ", 1)[0], which gives 'United States'.
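3. countries = []
for country in names:
    # keep element [0], the country name; element [1] is the footnote number
    countries.append(country.rsplit(" ", 1)[0])
output (the value of countries): ['Australia', 'Canada', 'Chile', 'United States']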
4. [country.rsplit(" ", 1)[0] for country in names]
This returns the same list: ['Australia', 'Canada', 'Chile', 'United States']
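To see why the answers above use rsplit with maxsplit=1 rather than split, the short comparison below may help. It is only an illustrative sketch; the variable names `from_right` and `from_left` are ours and not part of the exercise.

```python
names = ['Australia 1', 'Canada 2', 'Chile 3', 'United States 1']
us = names[-1]

# rsplit counts its single split from the right, so the country name stays whole
from_right = us.rsplit(" ", 1)
# split counts from the left, so a multi-word name is cut after its first word
from_left = us.split(" ", 1)

print(from_right)  # ['United States', '1']
print(from_left)   # ['United', 'States 1']
```

For single-word names such as 'Chile 3' the two calls give the same result; the difference only shows up when the country name itself contains a space.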
Complete code:
#Defining dataset
names = ['Australia 1', 'Canada 2', 'Chile 3', 'United States 1']
#1. Set `us = names[-1]` and call the `rsplit()` method on us.
#What do you get?
us = names[-1]
us.rsplit()
#2. Consult the documentation for `rsplit` to split `us` into
#two pieces, the country name and the number 1.
#How would you extract just the country name?
#to split `us` into two pieces
us.rsplit(" ",1)
#extract just the country name
us.rsplit(" ",1)[0]
#3. Use a loop to strip the numbers from all of the elements of
#`names`.
countries = []
for country in names:
    # keep element [0], the country name; element [1] is the footnote number
    countries.append(country.rsplit(" ", 1)[0])
#4. Use a list comprehension to strip the numbers from all of
#the elements of `names`.
countries_LC = [country.rsplit(" ", 1)[0] for country in names]
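If some labels had no footnote number, taking element [0] after the split would cut off the last word of the name. One possible guard is to drop the last piece only when it is all digits. The sketch below uses a hypothetical helper `strip_footnote` (not part of the original exercise) and an invented extra label 'New Zealand', both for illustration only.

```python
def strip_footnote(label):
    """Return label without a trailing footnote number, if it has one."""
    parts = label.rsplit(" ", 1)
    # Assumption: a footnote is a final space-separated piece made only of digits
    if len(parts) == 2 and parts[1].isdigit():
        return parts[0]
    # No trailing number found, so leave the label unchanged
    return label

names = ['Australia 1', 'Canada 2', 'Chile 3', 'United States 1']
cleaned = [strip_footnote(name) for name in names]

print(cleaned)                        # ['Australia', 'Canada', 'Chile', 'United States']
print(strip_footnote('New Zealand'))  # New Zealand
```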