Counting Words in Python

Problem

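Implement a function count_words() in Python that takes a string s and a number n as input and returns the n most frequent words in s. The return value should be a list of tuples, the n most frequent words paired with their counts, [(<word>, <count>), (<word>, <count>), ...], sorted in descending order of count.
You may assume that all input is lowercase and contains no punctuation or other characters (only letters and single separating spaces). Words with the same count are ordered alphabetically.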
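
The ordering rule above (descending count, then alphabetical order among ties) can be expressed as a single sort key. The sketch below is only an illustration: the `pairs` list and the `rank_key` helper are hypothetical names that do not appear in the original post.

```python
# Hypothetical (word, count) pairs, used only to illustrate the ordering rule.
pairs = [("the", 1), ("butter", 2), ("a", 1), ("betty", 1)]


def rank_key(pair):
    """Sort key: higher counts first, then words in alphabetical order."""
    word, count = pair
    # Negating the count makes an ascending sort put larger counts first.
    return (-count, word)


ranked = sorted(pairs, key=rank_key)
print(ranked)  # [('butter', 2), ('a', 1), ('betty', 1), ('the', 1)]
```

Negating the count works here because counts are integers; the word itself then decides the order among equal counts.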

Example

print(count_words("betty bought a bit of butter but the butter was bitter", 3))

Output:

[('butter', 2), ('a', 1), ('betty', 1)]

Implementation

"""Count words."""
import string


def count_words(s, n):
    """Return the n most frequently occurring words in s."""
    # Count the number of occurrences of each word in s.
    hist = {}
    for x in s.split():
        # Defensive normalization; the stated input is already lowercase and unpunctuated.
        x = x.strip(string.punctuation + string.whitespace).lower()
        hist[x] = hist.get(x, 0) + 1
    # Sort by count in descending order (alphabetically in case of ties).
    # The first sort orders the words alphabetically; the second sort is stable,
    # even with reverse=True, so words with equal counts keep that order.
    t = sorted(hist.items())
    t = sorted(t, key=lambda i: i[1], reverse=True)
    # Return the top n words as a list of tuples (<word>, <count>).
    return t[:n]


def test_run():
    """Test count_words() with some inputs."""
    print(count_words("cat bat mat cat bat cat", 3))
    print(count_words("betty bought a bit of butter but the butter was bitter", 3))


if __name__ == '__main__':
    test_run()
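
As a possible alternative to the implementation above, the counting step can be handed to `collections.Counter` from the standard library and the two sorts merged into one composite key. This is a sketch, not the original author's method; the name `count_words_compact` is made up for illustration, and the code assumes the input already meets the stated constraints (lowercase, no punctuation).

```python
from collections import Counter


def count_words_compact(s, n):
    """Return the n most frequent words in s as (word, count) tuples."""
    # Counter builds the word -> count histogram in one step.
    counts = Counter(s.split())
    # One sort: descending count first, alphabetical order among ties.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


print(count_words_compact("cat bat mat cat bat cat", 3))
# [('cat', 3), ('bat', 2), ('mat', 1)]
```

`Counter.most_common(n)` is deliberately not used here: it does not sort tied words alphabetically, so it would not satisfy the tie-breaking rule in the problem statement.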