Counting duplicate items in a list

Question

2008/12/11 卢熙 <[email protected]>

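  • The goal is the following result:

    alist = ['aaa', 'ccc', 'bbb', 'aaa', 'aaa', 'ccc']
    adict = fn(alist)
    print adict  # {'aaa': 3, 'bbb': 1, 'ccc': 2}
  • In practice len(alist) may well exceed 100,000. How should fn be written so that it does this job as efficiently as possible?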

Approach 1: for

萧萧 <[email protected]>
reply-to        [email protected]
to      [email protected]
date    Thu, Dec 11, 2008 at 22:51
subject [CPyUG:73576] Re: How to efficiently count duplicate items in a list

>>> alist = ['aaa', 'ccc', 'bbb', 'aaa', 'aaa', 'ccc']
>>> adict = {}
>>> for i in alist:
...     try:
...         adict[i] += 1
...     except KeyError:
...         adict.setdefault(i, 1)
...
>>> adict
{'aaa': 3, 'bbb': 1, 'ccc': 2}

Approach 2: count()

萧萧 <[email protected]>
reply-to        [email protected]
to      [email protected]
date    Fri, Dec 12, 2008 at 11:18

    alist = ['aaa', 'ccc', 'bbb', 'aaa', 'aaa', 'ccc']
    adict = dict([(i, alist.count(i)) for i in set(alist)])
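
Each alist.count(i) call scans the whole list, so this approach makes one full pass per distinct value, while the loop-based approaches make a single pass in total. The sketch below is written for Python 3 and only checks that the two strategies agree. The helper names count_with_count and count_single_pass are illustrative and do not come from the thread.

```python
def count_with_count(items):
    # One full scan of `items` per distinct value:
    # roughly len(set(items)) * len(items) comparisons.
    return {value: items.count(value) for value in set(items)}


def count_single_pass(items):
    # One scan in total, however many distinct values there are.
    counts = {}
    for value in items:
        counts[value] = counts.get(value, 0) + 1
    return counts


alist = ['aaa', 'ccc', 'bbb', 'aaa', 'aaa', 'ccc']
assert count_with_count(alist) == count_single_pass(alist)
print(count_single_pass(alist) == {'aaa': 3, 'bbb': 1, 'ccc': 2})  # True
```

With only three distinct values the difference is negligible. It grows with the number of distinct values, which matters for lists like the ones benchmarked later in the thread.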

Approach 3: fromkeys()

don li <[email protected]>
reply-to        [email protected]
to      [email protected]
date    Fri, Dec 12, 2008 at 11:53

    alist = ['aaa', 'ccc', 'bbb', 'aaa', 'aaa', 'ccc']
    # Pre-fill every key with 0 so the loop never hits a missing key.
    adict = dict.fromkeys(alist, 0)

    for a in alist:
        adict[a] += 1

Approach 4: .get()

    alist = ['aaa', 'ccc', 'bbb', 'aaa', 'aaa', 'ccc']
    adict = {}
    for e in alist:
        # get() returns 0 for a key that has not been seen yet.
        adict[e] = adict.get(e, 0) + 1
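
For reference, the .get() loop can be wrapped into the fn(alist) function that the question asks for. This is a minimal sketch in Python 3 syntax, not code posted in the thread.

```python
def fn(alist):
    """Return a dict mapping each item of alist to its number of occurrences."""
    adict = {}
    for e in alist:
        adict[e] = adict.get(e, 0) + 1
    return adict


alist = ['aaa', 'ccc', 'bbb', 'aaa', 'aaa', 'ccc']
adict = fn(alist)
print(adict == {'aaa': 3, 'bbb': 1, 'ccc': 2})  # True
```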

Comparison

[email protected]>
reply-to        [email protected]
to      python-cn`CPyUG`华蟒用户组 <[email protected]>
date    Sat, Dec 13, 2008 at 01:26
subject [CPyUG:73653] Re: How to efficiently count duplicate items in a list

time python test2.py
865149

real    0m4.840s
user    0m4.610s
sys     0m0.210s


time python test3.py
865113

real    0m5.724s
user    0m5.490s
sys     0m0.220s
test2.py

    #!/usr/bin/env python
    # Python 2 script: test with has_key() before updating.

    import random

    li = []
    d = {}
    for i in range(10 ** 6 * 2):
        li.append(int(random.random() * 10 ** 6))

    for e in li:
        if d.has_key(e):
            d[e] = d[e] + 1
        else:
            d[e] = 1

    print len(d)
test3.py

    #!/usr/bin/env python
    # Python 2 script: update first, fall back to 1 on a missing key.
    import random

    li = []
    d = {}
    for i in range(10 ** 6 * 2):
        li.append(int(random.random() * 10 ** 6))

    for e in li:
        try:
            d[e] = d[e] + 1
        except KeyError:
            d[e] = 1

    print len(d)
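
Both scripts print len(d), about 865,000 distinct keys among 2,000,000 items. In the try/except version the except branch runs once for every distinct key, so with this data a large share of the iterations take the exception path. That may account for part of the gap between the two timings, though the thread does not say so. The sketch below (Python 3, scaled down, with an arbitrary seed and a hypothetical misses counter) only verifies the once-per-distinct-key claim.

```python
import random

random.seed(0)  # arbitrary seed, only so that repeated runs agree
N = 10 ** 4     # scaled down from the 10 ** 6 used in the thread

li = [int(random.random() * N) for _ in range(2 * N)]

d = {}
misses = 0      # hypothetical counter, not in the original scripts
for e in li:
    try:
        d[e] = d[e] + 1
    except KeyError:
        misses += 1   # first time this key is seen
        d[e] = 1

# The except branch runs exactly once per distinct key.
print(misses == len(d))  # True
```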
[email protected]

reply-to        [email protected]
to      [email protected]
date    Sat, Dec 13, 2008 at 03:05
subject [CPyUG:73655] Re: How to efficiently count duplicate items in a list

$ time python   test_dict_speed.py           
(5.0090830326080322, 9.3741579055786133)

real    0m33.376s
user    0m32.002s
sys     0m0.872s

$ cat test_dict_speed.py 

    import random, time
    MAX = 10**6

    ls = [random.randint(1, MAX) for x in xrange(2*MAX)]

    t0 = time.time()

    # Variant 1: dict.get() with a default of 0
    d = {}
    for x in ls: d[x] = d.get(x, 0) + 1
    t1 = time.time()

    # Variant 2: try/except with plain assignment
    d = {}
    for e in ls:
        try: d[e] = d[e] + 1
        except KeyError: d[e] = 1
    t2 = time.time()

    # Variant 3: try/except with setdefault()
    d = {}
    for e in ls:
        try: d[e] += 1
        except KeyError: d.setdefault(e, 1)
    t3 = time.time()

    print (t1 - t0, t2 - t1, t3 - t2)
Results (two runs)

(1.5039999485015869, 2.1619999408721924, 2.2820000648498535)
(1.4950001239776611, 2.2029998302459717, 2.2360000610351562)

The ranking by elapsed time is the same in both runs; this one is still the better choice: for x in ls: d[x] = d.get(x, 0) + 1
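
A single time.time() measurement around each loop is sensitive to whatever else the machine is doing. As a sketch, the same three loops can be run under the standard timeit module, which repeats each measurement. This is Python 3 code. The function names and the reduced MAX are illustrative choices, and the numbers it prints are not comparable with the Python 2 results quoted above.

```python
import random
import timeit

MAX = 10 ** 4  # scaled down from 10 ** 6 so the sketch finishes quickly
ls = [random.randint(1, MAX) for _ in range(2 * MAX)]


def with_get():
    d = {}
    for x in ls:
        d[x] = d.get(x, 0) + 1
    return d


def with_try_assign():
    d = {}
    for e in ls:
        try:
            d[e] = d[e] + 1
        except KeyError:
            d[e] = 1
    return d


def with_try_setdefault():
    d = {}
    for e in ls:
        try:
            d[e] += 1
        except KeyError:
            d.setdefault(e, 1)
    return d


# All variants must produce the same counts before timings are compared.
assert with_get() == with_try_assign() == with_try_setdefault()

for func in (with_get, with_try_assign, with_try_setdefault):
    # Best of 3 repeats, 5 calls each, to reduce noise from other processes.
    best = min(timeit.repeat(func, number=5, repeat=3))
    print(func.__name__, round(best, 4))
```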

Conclusion

[email protected]>
reply-to        [email protected]
to      python-cn`CPyUG`华蟒用户组 <[email protected]>
date    Sun, Dec 14, 2008 at 23:40
subject [CPyUG:73747] Re: How to efficiently count duplicate items in a list

    import random, time
    MAX = 10**6

    ls = [random.randint(1, MAX) for x in xrange(2*MAX)]

    t0 = time.time()

    # Variant 1: dict.get() with a default of 0
    d = {}
    for e in ls: d[e] = d.get(e, 0) + 1
    t1 = time.time()

    # Variant 2: try/except with plain assignment
    d = {}
    for e in ls:
        try: d[e] = d[e] + 1
        except KeyError: d[e] = 1
    t2 = time.time()

    # Variant 3: try/except with setdefault()
    d = {}
    for e in ls:
        try: d[e] += 1
        except KeyError: d.setdefault(e, 1)
    t3 = time.time()

    # Variant 4: defaultdict(int) creates missing keys with value 0
    from collections import defaultdict
    d = defaultdict(int)
    for e in ls:
        d[e] += 1
    t4 = time.time()

    print (t1 - t0, t2 - t1, t3 - t2, t4 - t3)
Results (three runs)

(1.3619999885559082, 2.187000036239624, 2.3610000610351562, 1.4879999160766602)
(1.3420000076293945, 2.1319999694824219, 2.2860000133514404, 1.4579999446868896)
(1.3270001411437988, 2.1959998607635498, 2.2860000133514404, 1.4579999446868896)

This one still comes out slightly ahead: for x in ls: d[x] = d.get(x, 0) + 1
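
Python versions released after this thread (2.7 and 3.1 onwards) also ship collections.Counter, a dict subclass made for this kind of tally. It was not among the variants timed here, so the sketch below (Python 3) only shows that it gives the same counts as the defaultdict loop and makes no claim about its speed.

```python
from collections import Counter, defaultdict

alist = ['aaa', 'ccc', 'bbb', 'aaa', 'aaa', 'ccc']

# Counter builds the item -> count mapping in one call.
counts = Counter(alist)

# The defaultdict loop from the script above, for comparison.
dd = defaultdict(int)
for e in alist:
    dd[e] += 1

assert dict(counts) == dict(dd) == {'aaa': 3, 'bbb': 1, 'ccc': 2}
print(counts.most_common(1))  # [('aaa', 3)]
```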


Feedback

Created by -- ZoomQuiet [2008-12-12 01:33:16]
