Read The Fine Source (bangerlee)<br />
<br />
<b>Bottleneck analysis of ptmalloc2, glibc's memory manager</b> (2013-01-30)<br />
<br />
I recently ran into a performance problem in the memory management of glibc-2.4, and this post analyzes it.<br />
<br />
The symptom was that the business program responded slowly while its CPU usage climbed. I first suspected the program's own logic, so to confirm that I began by sampling with oprofile.<br />
<br />
oprofile can profile processes, shared libraries, kernel modules and the kernel itself, and reports where CPU time is actually being spent.<br />
<br />
The oprofile samples showed that it was glibc, not the application, that was burning the CPU. After installing the debuginfo package matching the glibc version, the hot spot turned out to be the following piece of code:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg2jKIbI2-E4D0TqOcZL89m3Gco-9vKeFAPFmgt31Cvg_rnidFmEkW75b2w8NGskWr9hcXEwQTUAULHm8kPwaw5zk4lGXROB_JZZNrGDqe9jCN0P-9hY4S6fX4kpVCL-ZJ1Qdv3MXg_FbU/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2013-01-20+%E4%B8%8A%E5%8D%8810.28.41.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg2jKIbI2-E4D0TqOcZL89m3Gco-9vKeFAPFmgt31Cvg_rnidFmEkW75b2w8NGskWr9hcXEwQTUAULHm8kPwaw5zk4lGXROB_JZZNrGDqe9jCN0P-9hY4S6fX4kpVCL-ZJ1Qdv3MXg_FbU/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2013-01-20+%E4%B8%8A%E5%8D%8810.28.41.png" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
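Why does the while loop above consume so much CPU? Combining the application code with the way ptmalloc2 in glibc manages memory leads to the following analysis.<br />
<br />
First, ptmalloc2 organizes memory as follows:<br />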
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhGnVZ3dQIEgJhVQ79D_Kj-tvrHSpKKt5croSb9_PKEiv4zN_s_tvryQ2bMO2MFSeXTPEXbNgjAgzETbbdADNRmoWxfzOzeUkjSJY6s8McFbyHFschWzmNh8i1-3vTDStdoPpmwbvXa3xU/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2013-01-20+%E4%B8%8A%E5%8D%8810.35.02.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhGnVZ3dQIEgJhVQ79D_Kj-tvrHSpKKt5croSb9_PKEiv4zN_s_tvryQ2bMO2MFSeXTPEXbNgjAgzETbbdADNRmoWxfzOzeUkjSJY6s8McFbyHFschWzmNh8i1-3vTDStdoPpmwbvXa3xU/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2013-01-20+%E4%B8%8A%E5%8D%8810.35.02.png" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
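Here a chunk is a block of memory. Small bins hold linked lists of chunks from 16 to 512 bytes, with adjacent bins 8 bytes apart; large bins hold linked lists of chunks larger than 512 bytes whose sizes are not uniform; and the unsorted bin holds chunks that have been released by free/delete but not yet sorted into a small bin or a large bin.<br />
<br />
Every list under the small bins holds chunks of one fixed size; for example, the chunks linked under 16 above are all 16 bytes. Each list under the large bins covers a range of sizes instead: the chunks linked under 576 may be anywhere in [576, 640) bytes, and they are kept sorted from largest to smallest.<br />
<br />
After free/delete is called, the memory is not handed straight back to the operating system; it returns to ptmalloc2's management. A free chunk that belongs in the large bins is placed in the matching bin, and each large bin forms a list ordered from largest to smallest.<br />
<br />
The following structure represents a free chunk:<br />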
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhpBLZPqCLuflbXmLMJMz1PshoHD4R0mCBLXA1TN8Hhb4a6q-U6YFlfS-U5PGN8qoWV0J4IUaa6ydlPWl59OlJ2ow2Mm8KCbzNXgG8qf-xiuVxuHLwL8zadoRlHaWmgDC5lflVNS-cwAkE/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2013-01-30+%E4%B8%8B%E5%8D%884.08.17.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhpBLZPqCLuflbXmLMJMz1PshoHD4R0mCBLXA1TN8Hhb4a6q-U6YFlfS-U5PGN8qoWV0J4IUaa6ydlPWl59OlJ2ow2Mm8KCbzNXgG8qf-xiuVxuHLwL8zadoRlHaWmgDC5lflVNS-cwAkE/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2013-01-30+%E4%B8%8B%E5%8D%884.08.17.png" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
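When malloc/new is then called for a block larger than 512 bytes, ptmalloc2 uses best fit: it walks one list under the large bins through the fd pointers, exactly as in the code shown above. With N free chunks in that list, finding the best-fit chunk costs O(N), so when N is large every subsequent malloc/new of more than 512 bytes becomes very slow.<br />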
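To make the O(N) cost concrete, here is a small Python sketch of a best-fit search over one large bin. It is a model, not glibc code: the function name best_fit_linear and the plain list of sizes are assumptions made for illustration, whereas real chunks are linked through fd/bk pointers.<br />

```python
def best_fit_linear(bin_sizes, request):
    """Scan a large bin (sorted largest -> smallest) from its small end.

    Returns (chunk_size, steps); chunk_size is None when nothing fits.
    """
    steps = 0
    for size in reversed(bin_sizes):   # smallest chunk first
        steps += 1
        if size >= request:            # first fit from the small end is the best fit
            return size, steps
    return None, steps


# 10,000 free chunks of 576 bytes plus a single chunk of 632 bytes
bin_sizes = [632] + [576] * 10_000
print(best_fit_linear(bin_sizes, 600))   # (632, 10001)
```

Every one of the 10,000 smaller chunks is visited before the fitting one is found, which is the behavior the profile points at.<br />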
<br />
<br />
<div style="font: 16.0px STSong; margin: 0.0px 0.0px 0.0px 0.0px;">
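To mitigate this, glibc-2.6 shipped a patch. In the chunk structure, alongside the existing fd and bk pointers of the doubly linked chunk list, it adds fd_nextsize and bk_nextsize pointers, which link the first chunk of each distinct size into a second doubly linked list so that chunks of the same size can be skipped:</div>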
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixB6d_hUqUobJPBf_TjmLPUfcMF_7JT00sb4abAu3QmZPt3TQNkIGcdHIpN77g9FKVY3Gb78uKghGRw77AR2uNcoD9CyJfB86brCeMB5s_hoe5_U7fdXW18fWLqgfEKAXTfKiCijhBVHU/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2013-01-30+%E4%B8%8B%E5%8D%884.16.00.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixB6d_hUqUobJPBf_TjmLPUfcMF_7JT00sb4abAu3QmZPt3TQNkIGcdHIpN77g9FKVY3Gb78uKghGRw77AR2uNcoD9CyJfB86brCeMB5s_hoe5_U7fdXW18fWLqgfEKAXTfKiCijhBVHU/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2013-01-30+%E4%B8%8B%E5%8D%884.16.00.png" /></a></div>
<div style="font: 16.0px STSong; margin: 0.0px 0.0px 0.0px 0.0px;">
<br /></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<div style="font: 16.0px STSong; margin: 0.0px 0.0px 0.0px 0.0px;">
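When a large bin is searched for a chunk of a given size, the list is traversed through fd_nextsize rather than fd, which skips over chunks of equal size and avoids pointless steps:</div>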
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj9KuiCAnEFSNaZoddSkA9lybIo9Eqoko7T69mJQVd9zeMWZ_DwzWAt-OuQPkK9VL3gAvdKRqUr1cOzthK5V6P1rZJBiwr_B3NCbxaDyYTk7gFgOXtNDQMLYdQ1lZ1S4Zd-HRqjZYDPqDs/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2013-01-30+%E4%B8%8B%E5%8D%884.25.06.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj9KuiCAnEFSNaZoddSkA9lybIo9Eqoko7T69mJQVd9zeMWZ_DwzWAt-OuQPkK9VL3gAvdKRqUr1cOzthK5V6P1rZJBiwr_B3NCbxaDyYTk7gFgOXtNDQMLYdQ1lZ1S4Zd-HRqjZYDPqDs/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2013-01-30+%E4%B8%8B%E5%8D%884.25.06.png" /></a></div>
<div style="font: 16.0px STSong; margin: 0.0px 0.0px 0.0px 0.0px;">
<br /></div>
<br />
<br />
<br />
<br />
<br />
<div style="font: 16.0px STSong; margin: 0.0px 0.0px 0.0px 0.0px;">
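This does not truly solve the problem, of course. The approach pays off when many chunks share the same size; with a large number of chunks over 512 bytes that all differ in size, malloc calls are still slow to return.</div>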
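The effect of the size-skipping links, and their limit, can be sketched in Python. This is a simplified model under stated assumptions: best_fit_skip is a hypothetical name, all chunks are treated as one sorted list regardless of bin boundaries, and the list of distinct sizes is built on the fly, whereas glibc maintains the nextsize links when chunks are freed.<br />

```python
from itertools import groupby


def best_fit_skip(bin_sizes, request):
    """Best-fit search that visits one chunk per distinct size.

    bin_sizes is sorted largest -> smallest.
    Returns (chunk_size, steps); chunk_size is None when nothing fits.
    """
    # One entry per distinct size, smallest first: a stand-in for
    # following the nextsize pointers instead of fd/bk.
    distinct = [size for size, _ in groupby(reversed(bin_sizes))]
    steps = 0
    for size in distinct:
        steps += 1
        if size >= request:
            return size, steps
    return None, steps


same = [632] + [576] * 10_000          # many chunks, only two distinct sizes
mixed = list(range(20_576, 575, -8))   # 2,501 chunks, every size different
print(best_fit_skip(same, 600))        # (632, 2)
print(best_fit_skip(mixed, 20_000))    # (20000, 2429)
```

With many equal-sized chunks the search collapses to two steps, but when every chunk has its own size the walk is as long as before.<br />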
<div style="font: 16.0px STSong; margin: 0.0px 0.0px 0.0px 0.0px;">
<br /></div>
<div style="font: 16.0px STSong; margin: 0.0px 0.0px 0.0px 0.0px;">
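In addition, struct mallinfo and the mallinfo() call return statistics about malloc and help when analyzing ptmalloc2 problems. The ordblks field of struct mallinfo is the number of free chunks, which can be read as the N above.</div>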
<div style="font: 16.0px STSong; margin: 0.0px 0.0px 0.0px 0.0px;">
<br /></div>
<div style="font: 16.0px STSong; margin: 0.0px 0.0px 0.0px 0.0px;">
References:<br />
<br /></div>
<div style="font: 16.0px STSong; margin: 0.0px 0.0px 0.0px 0.0px;">
Problem description: <a href="http://www.sourceware.org/bugzilla/show_bug.cgi?id=4349" target="_blank">_int_malloc extremely slow with ordblks free chunks</a></div>
<div style="font: 16.0px STSong; margin: 0.0px 0.0px 0.0px 0.0px;">
Code that reproduces the problem: <a href="http://www.sourceware.org/bugzilla/attachment.cgi?id=1675" target="_blank">sizetest.cpp</a></div>
<div style="font: 16.0px STSong; margin: 0.0px 0.0px 0.0px 0.0px;">
The glibc-2.6 patch for this problem: <a href="http://lists.debian.org/debian-glibc/2007/05/msg00420.html" target="_blank">serious performance regression in glibc-2.5 malloc</a></div>
<div style="font: 16.0px STSong; margin: 0.0px 0.0px 0.0px 0.0px;">
<br /></div>
<br />
Have fun!<br />
<br />
<b>Showing the contents of a process's memory space with dumpmem</b> (2013-01-25)<br />
<br />
When memory usage climbs for no obvious reason, system commands such as top show which process is consuming the memory. To go one step further and find out what those processes have actually loaded into memory, we can use a tool called dumpmem.<br />
<br />
dumpmem is written in C on top of ptrace, and it can dump the contents of a process's memory space online, while that process is running.<br />
<br />
In the kernel, each process's virtual memory space is managed by an mm_struct, and each address range within it by a vm_area_struct; /proc/$PID/maps lists all the address ranges of a process:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjr1DMCjIW4JkJL-uW_aMk25gTEOVJ_sR9sh1uFNi08Mf7980i62emuqGr_CW2SdEJ9Ch4Ib_Uyd00GM_6xD9U2aVBhLGiAvJYqcayc8gi2xQuF5bbg2L2ZQkMoi9nRHEM0_ZxqaxoWFgg/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2013-01-25+%E4%B8%8B%E5%8D%884.34.06.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjr1DMCjIW4JkJL-uW_aMk25gTEOVJ_sR9sh1uFNi08Mf7980i62emuqGr_CW2SdEJ9Ch4Ib_Uyd00GM_6xD9U2aVBhLGiAvJYqcayc8gi2xQuF5bbg2L2ZQkMoi9nRHEM0_ZxqaxoWFgg/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2013-01-25+%E4%B8%8B%E5%8D%884.34.06.png" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
dumpmem first reads /proc/$PID/maps to obtain all virtual address ranges of the target process and records them in the following structure:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQ6Wcd36R3uuWmNwNgZcB8cthGetyiyy0NhptS_6qQL9c0DfwLooXOE7ibu_mFLRirps3MdgA0prSemo79ETA1XOO93EsnHFKxEGYTkZLm4Rd9_obN4AfsAwDvKioyqYysB4nbLfR2naI/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2013-01-25+%E4%B8%8B%E5%8D%884.36.35.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQ6Wcd36R3uuWmNwNgZcB8cthGetyiyy0NhptS_6qQL9c0DfwLooXOE7ibu_mFLRirps3MdgA0prSemo79ETA1XOO93EsnHFKxEGYTkZLm4Rd9_obN4AfsAwDvKioyqYysB4nbLfR2naI/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2013-01-25+%E4%B8%8B%E5%8D%884.36.35.png" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
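As an illustration of this first step, the Python sketch below parses one line in the maps format into a small record. It is not dumpmem's code (dumpmem is written in C); the Region type, the parse_maps_line function and the sample line are all made up for this example.<br />

```python
from typing import NamedTuple


class Region(NamedTuple):
    start: int
    end: int
    perms: str
    path: str


def parse_maps_line(line):
    """Parse one line of a /proc/PID/maps file into a Region."""
    # Fields: address-range perms offset dev inode [path]
    fields = line.split(None, 5)
    start, end = (int(part, 16) for part in fields[0].split("-"))
    path = fields[5].strip() if len(fields) > 5 else ""
    return Region(start, end, fields[1], path)


# Hypothetical sample line in the maps format
sample = "00400000-0040b000 r-xp 00000000 08:01 1048602    /bin/cat"
region = parse_maps_line(sample)
print(hex(region.start), region.end - region.start, region.perms, region.path)
# 0x400000 45056 r-xp /bin/cat
```

Each Region gives the start and end addresses that a later read loop would walk through.<br />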
Finally, it passes the PTRACE_PEEKDATA request to ptrace and dumps the contents at those addresses to a file:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiI4SYPyXIrbLk-a-dghMakyCrslQlAY_Ih57GnY0STZCaLdikqFLP1EyDz08pBTLrd5tNmdEyilC_XagXM4OMm8RFI0qyhnW-yEnndn89b-blaTW0W72m2eyQ0yPC2eIVQ8zUynO7rA24/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2013-01-25+%E4%B8%8B%E5%8D%884.39.47.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiI4SYPyXIrbLk-a-dghMakyCrslQlAY_Ih57GnY0STZCaLdikqFLP1EyDz08pBTLrd5tNmdEyilC_XagXM4OMm8RFI0qyhnW-yEnndn89b-blaTW0W72m2eyQ0yPC2eIVQ8zUynO7rA24/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2013-01-25+%E4%B8%8B%E5%8D%884.39.47.png" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
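Besides PTRACE_PEEKDATA, which reads out the process's memory, requests such as PTRACE_ATTACH and PTRACE_TRACEME set up process tracing, which is needed when dumping the memory of a live process.<br />
<br />
ptrace "traces" the execution of a target process. When the traced process makes a system call or executes a machine instruction, the tracing (parent) process can intercept it, which makes it possible to inspect the traced process's memory and registers, and even to modify them or redirect its execution.<br />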
<br />
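Have fun!<br />
<br />
<b>Spin locks in user space: folly's MicroSpinLock</b> (2013-01-10)<br />
<br />
A spin lock (such as pthread_spin_lock in pthreads) keeps its thread on the CPU until another thread leaves the critical section and marks the lock as released. Compared with a mutex built on the OS sleep/wakeup mechanism (such as the pthreads mutex), a spin lock suits short critical sections and latency-sensitive code.<br />
<br />
When the critical section lasts longer, however, a traditional spin lock burns a lot of CPU. A user-space process can also be preempted by a higher-priority process or kernel thread, so the duration of a user-space critical section cannot be predicted accurately.<br />
<br />
Is there a way to behave like a traditional spin lock when the critical section is short, yet yield the CPU and cut consumption when it lasts longer?<br />
<br />
MicroSpinLock in Facebook's folly library is a spin lock refined to meet exactly this need. Let's look at how it is implemented.<br />
<br />
The core of MicroSpinLock is the CAS (compare-and-swap) primitive. CAS is atomic and relies on hardware support; on x86 it maps to the cmpxchg instruction. Its logic is:<br />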
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgV_Bq_km6qDdxNzFjsd0X8f9f1WIUujL-f_jeVWNsByMRPe6kirmfY8Q-KtPLnY4RdC3WqfbE63Ak0vKuDY08A8ojfCizmw3XYFm0Zf0hhyphenhyphenS3yXMv97R1sLnXbv7ZWkUnwKP9th8eM1pg/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2013-01-11+%E4%B8%8A%E5%8D%889.29.44.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgV_Bq_km6qDdxNzFjsd0X8f9f1WIUujL-f_jeVWNsByMRPe6kirmfY8Q-KtPLnY4RdC3WqfbE63Ak0vKuDY08A8ojfCizmw3XYFm0Zf0hhyphenhyphenS3yXMv97R1sLnXbv7ZWkUnwKP9th8eM1pg/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2013-01-11+%E4%B8%8A%E5%8D%889.29.44.png" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
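Here addr is the address of the lock (which can be an unsigned char variable), expected is the value the current process believes the lock's first byte should hold, and newValue is the value the current process wants to store into that byte.<br />
<br />
With the CAS primitive, a spin lock can be designed as follows:<br />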
<br />
<ol>
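<li>Define two lock states: enum{ FREE=0, LOCKED=1 }</li>
<li>The current process calls cas(&lock, FREE, LOCKED) to take the lock (in effect, it tries to change the lock's first byte to 1)</li>
<li>If the lock's first byte is not the expected value 0, another process already holds the lock and cas returns false</li>
<li>The current process repeats this until cas returns true, at which point it holds the lock</li>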
</ol>
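The four steps can be sketched in Python. Python has no hardware CAS, so this sketch simulates the atomicity with a threading.Lock inside a hypothetical Cell class; it illustrates the retry loop only and is not how folly implements it.<br />

```python
import threading

FREE, LOCKED = 0, 1


class Cell:
    """One lock byte with a simulated atomic compare-and-swap."""

    def __init__(self):
        self.value = FREE
        self._atomic = threading.Lock()   # stands in for the atomicity of cmpxchg

    def cas(self, expected, new_value):
        with self._atomic:
            if self.value != expected:
                return False
            self.value = new_value
            return True


def spin_lock(cell):
    while not cell.cas(FREE, LOCKED):     # retry until the swap succeeds
        pass


def spin_unlock(cell):
    cell.cas(LOCKED, FREE)


cell, counter = Cell(), [0]


def worker():
    for _ in range(1000):
        spin_lock(cell)
        counter[0] += 1                   # critical section
        spin_unlock(cell)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter[0])   # 4000
```

Four threads incrementing a shared counter under the lock always end at 4000, because only one thread at a time gets a true result from cas.<br />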
<br />
That logic leads to the following implementation:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi5YqDk5VjlKGk5o8DQgMbziKNXTYTYbSrCzWo7FanxWl6EadvSWZmeboFXlg-4LEbiRsI-mVRZsh18EUy4X8l6_1oq1n5D6CVXAVZH4FAMZeoQSsS_P0f6_V0KsC-rdmFetjCyeXr728o/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2013-01-11+%E4%B8%8A%E5%8D%8810.24.40.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi5YqDk5VjlKGk5o8DQgMbziKNXTYTYbSrCzWo7FanxWl6EadvSWZmeboFXlg-4LEbiRsI-mVRZsh18EUy4X8l6_1oq1n5D6CVXAVZH4FAMZeoQSsS_P0f6_V0KsC-rdmFetjCyeXr728o/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2013-01-11+%E4%B8%8A%E5%8D%8810.24.40.png" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
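With a CAS-based spin lock in hand, the next question is how to make it give up the CPU when the critical section runs long. A simple approach is a counter, spinCount: each attempt to lock increments it, and once it reaches a threshold we call sleep so the process sleeps and yields the CPU. That gives the following implementation:<br />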
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjH_sQktd1zCrP798rnVJIqQeNdzEl-JV39jmDG9XeCjL5vDkoal0i7EnGOSrX8cADWAImkZcVbeF-7m-fu5COLaW-zhtp7oOngQYAjw9PSn1xU5w-kiAKec63IcSEXweb9YmJYoYuD0LE/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2013-01-11+%E4%B8%8A%E5%8D%8810.31.34.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjH_sQktd1zCrP798rnVJIqQeNdzEl-JV39jmDG9XeCjL5vDkoal0i7EnGOSrX8cADWAImkZcVbeF-7m-fu5COLaW-zhtp7oOngQYAjw9PSn1xU5w-kiAKec63IcSEXweb9YmJYoYuD0LE/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2013-01-11+%E4%B8%8A%E5%8D%8810.31.34.png" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
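Since we assume spin locks are used for short critical sections, most acquisitions see no contention and sleep is rarely called. Adding one statement removes the unnecessary code path in that common case:<br />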
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiA90CeD3Fg4TDZb3vy0XR6HUj9wvjLsDDUGO273YANrFdIYj8dNxU7ob9o9ft2WHTxPND4x1t7VQ-ZZa7mT4savBNh3w0iS_9hzRTLuYI1l4RrGNIbYYSN9fFbNjE56X4nOrrB5f6MXTg/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2013-01-11+%E4%B8%8A%E5%8D%8810.47.13.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiA90CeD3Fg4TDZb3vy0XR6HUj9wvjLsDDUGO273YANrFdIYj8dNxU7ob9o9ft2WHTxPND4x1t7VQ-ZZa7mT4savBNh3w0iS_9hzRTLuYI1l4RrGNIbYYSN9fFbNjE56X4nOrrB5f6MXTg/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2013-01-11+%E4%B8%8A%E5%8D%8810.47.13.png" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
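And when there is contention, there is no point in attempting another locking operation while the lock has not been released, so the code can be changed to:<br />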
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiRbdSD4toMi9iHQf4CwRRjnXup7t8YHGs-O1Aegk6eCWWJRjFN54fDVdDQYW6Dfk9BGsBmI7S6epdDuftqG8_Mkw_oYd99tP9Cy4tGP7T4hEDri0dXCUe2FjA8-uqwiPZ3Ttfxyu5a3Cs/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2013-01-11+%E4%B8%8A%E5%8D%8810.50.45.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiRbdSD4toMi9iHQf4CwRRjnXup7t8YHGs-O1Aegk6eCWWJRjFN54fDVdDQYW6Dfk9BGsBmI7S6epdDuftqG8_Mkw_oYd99tP9Cy4tGP7T4hEDri0dXCUe2FjA8-uqwiPZ3Ttfxyu5a3Cs/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2013-01-11+%E4%B8%8A%E5%8D%8810.50.45.png" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
This is the double-check optimization for locks: as long as lock == FREE does not hold, the cost of an unnecessary locking attempt is avoided.<br />
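Putting the three refinements together (a CAS fast path, a read-only wait while the lock is held, and sleeping after too many spins) gives roughly the shape below. This is a Python sketch under stated assumptions, not folly's source: the class name SleepingSpinLock, the MAX_SPINS threshold and the 0.5 ms sleep are all chosen for illustration, and the CAS is again simulated with a threading.Lock.<br />

```python
import threading
import time

FREE, LOCKED = 0, 1
MAX_SPINS = 1000      # assumed threshold, picked only for this sketch


class SleepingSpinLock:
    def __init__(self):
        self.state = FREE
        self._atomic = threading.Lock()    # simulates the atomicity of cmpxchg

    def _cas(self, expected, new_value):
        with self._atomic:
            if self.state != expected:
                return False
            self.state = new_value
            return True

    def lock(self):
        spin_count = 0
        while not self._cas(FREE, LOCKED):     # uncontended case: one CAS and done
            while self.state != FREE:          # contended: only read, no CAS
                spin_count += 1
                if spin_count > MAX_SPINS:
                    time.sleep(0.0005)         # yield the CPU instead of spinning

    def unlock(self):
        self.state = FREE


lock, total = SleepingSpinLock(), [0]


def worker():
    for _ in range(500):
        lock.lock()
        total[0] += 1                          # critical section
        lock.unlock()


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(total[0])   # 2000
```

The inner loop only reads the state, so waiting threads attempt the CAS again only after they have seen the lock become FREE.<br />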
<br />
The folly MicroSpinLock source:<br />
<a href="https://github.com/facebook/folly/blob/master/folly/SmallLocks.h">https://github.com/facebook/folly/blob/master/folly/SmallLocks.h</a><br />
<br />
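Have fun!<br />
<br />
<b>LogLog, a cardinality estimation algorithm</b> (2013-01-05)<br />
<br />
The number of distinct items in a data set is called its cardinality. Counting cardinality comes up often in practice, for example a site's UV (unique visitors) or the number of distinct IPs that clicked an item in a shop.<br />
<br />
A bitmap computes cardinality exactly, and bit operations make it easy to combine counts per hour, day or week. But when the value range is huge, even a bitmap that spends one bit per item still uses a fair amount of memory (about 12 MB for 100 million items).<br />
<br />
LogLog estimates cardinality instead, trading some accuracy for very little memory (about 1 KB for 100 million items). The name comes from its memory cost: each bucket only needs about log2(log2(N)) bits. The algorithm is:<br />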
<br />
<ul>
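<li>Hash every item in the data set</li>
<li>Use the last K bits of the hash value as the bucket number</li>
<li>Count the trailing zeros of the remaining hash value (without those K bits) and keep the largest count seen in each bucket</li>
<li>The cardinality is then approximately 2**X * num_buckets * 0.79402, where X = (sum of the per-bucket maximum trailing-zero counts) / num_buckets</li>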
</ul>
<br />
The magic number 0.79402 above is a constant that corrects the estimator's bias. A simple implementation of LogLog looks like this:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgVA0Ahva3YTQ_sHSPj1d7WYBh43adg3hlXCQPvqShsaj6IgqQ_3wRPMXkarqJUrfku1S8gqr-NpEB_L8PHkEGLogdygT3yLFWNBeLKitW2JtbCFqRtcfqtkNge4h1ldfBoyQPPAzsq-cU/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2013-01-05+%E4%B8%8B%E5%8D%888.14.09.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgVA0Ahva3YTQ_sHSPj1d7WYBh43adg3hlXCQPvqShsaj6IgqQ_3wRPMXkarqJUrfku1S8gqr-NpEB_L8PHkEGLogdygT3yLFWNBeLKitW2JtbCFqRtcfqtkNge4h1ldfBoyQPPAzsq-cU/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2013-01-05+%E4%B8%8B%E5%8D%888.14.09.png" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
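For readers who cannot see the screenshot, here is a runnable Python sketch of the steps listed above. It is written for this post rather than copied from the image: the helper names are mine, and MD5 is used only as a convenient deterministic hash (an assumption; any well-mixed hash would do).<br />

```python
import hashlib


def hash64(value):
    """Deterministic 64-bit hash (MD5 is used here only for convenience)."""
    return int.from_bytes(hashlib.md5(str(value).encode()).digest()[:8], "big")


def trailing_zeroes(num, width):
    """Number of trailing zero bits in num, capped at width."""
    if num == 0:
        return width
    count = 0
    while (num >> count) & 1 == 0:
        count += 1
    return count


def estimate_cardinality(values, k):
    """LogLog estimate using 2**k buckets."""
    num_buckets = 2 ** k
    max_zeroes = [0] * num_buckets
    for value in values:
        h = hash64(value)
        bucket = h & (num_buckets - 1)     # low k bits select the bucket
        rest = h >> k                      # remaining bits feed the zero count
        zeroes = trailing_zeroes(rest, 64 - k)
        max_zeroes[bucket] = max(max_zeroes[bucket], zeroes)
    average = sum(max_zeroes) / num_buckets
    return 2 ** average * num_buckets * 0.79402


# 100,000 distinct values, each seen three times
values = [i % 100_000 for i in range(300_000)]
print(round(estimate_cardinality(values, 10)))
```

Repeated items do not change the result, since each bucket only keeps a maximum; the state is one small counter per bucket, which is where the memory saving comes from.<br />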
References: <a href="http://blog.notdot.net/2012/09/Dam-Cool-Algorithms-Cardinality-Estimation" target="_blank">Damn Cool Algorithms: Cardinality Estimation</a><br />
<br />
<a href="http://www.ic.unicamp.br/~celio/peer2peer/math/bitmap-algorithms/durand03loglog.pdf" target="_blank">LogLog counting of large cardinalities</a><br />
<br />
Have fun!<br />
<br />
<b>Finding the IP addresses of China's firewall devices</b> (2012-12-19)<br />
<br />
A couple of days ago I came across <a href="https://github.com/mothran/mongol/blob/master/mongol.py" style="color: blue;">mongol.py</a>, a Python script for finding the IPs of GFC (Great Firewall of China) devices. It looked interesting, so I studied it.<br />
<br />
The script is based on the paper <a href="http://pam2011.gatech.edu/papers/pam2011--Xu.pdf"><span class="Apple-style-span" style="color: blue;">Internet Censorship in China: Where Does the Filtering Occur?</span></a>; mongol.py is essentially an implementation of the algorithm in section 4.3 of that paper. Besides the algorithm for locating firewall devices, the paper also reports the following:<br />
<br />
<ul>
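<li>Judging from current experiments, censorship is triggered only by a stateful connection (one that has completed the three-way handshake) plus a sensitive keyword</li>
<li>Traffic to foreign destinations is censored strictly, while domestic traffic relies mainly on social control (such as manual review)</li>
<li>Of the two major domestic ISPs, China Telecom places its filtering devices mainly in provincial metropolitan networks and China Netcom mainly in the backbone; because filtering devices also exist at the provincial level, the GFC is in fact capable of censoring domestic traffic as well</li>
<li>Once censorship kicks in, the path stays blocked for a while, and during that time later packets are blocked even if they contain no sensitive keyword</li>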
</ul>
<br />
<br />
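mongol.py takes an IP as its argument and does the following:<br />
<br />
First it creates a socket, connects to port 80 of the given IP, and sends a GET message:<br />
GET / HTTP/1.1 \r\n<br />
Host: ip \r\n<br />
\r\n<br />
<br />
By the time connect returns, the three-way handshake is complete. The script then reads the response and checks the status code: 200 OK, 302 Redirect or 401 Unauthorized means a usable connection can be established to port 80 of the target IP.<br />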
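The request and the status check in this first step can be sketched in a few lines of Python. This is not mongol.py's code: build_probe and is_usable are hypothetical helpers, 192.0.2.1 is a documentation-only address, and nothing is sent over the network.<br />

```python
def build_probe(host, path="/"):
    """Build the raw GET request that would be sent to port 80 of the target."""
    request = "GET {} HTTP/1.1\r\nHost: {}\r\n\r\n".format(path, host)
    return request.encode("ascii")


def is_usable(response):
    """True when the status line reports 200, 302 or 401."""
    status_line = response.split(b"\r\n", 1)[0]
    parts = status_line.split()
    return len(parts) >= 2 and parts[1] in (b"200", b"302", b"401")


print(build_probe("192.0.2.1"))
# b'GET / HTTP/1.1\r\nHost: 192.0.2.1\r\n\r\n'
print(is_usable(b"HTTP/1.1 302 Found\r\nLocation: /home\r\n\r\n"))   # True
print(is_usable(b"HTTP/1.1 404 Not Found\r\n\r\n"))                  # False
```

In a real run the bytes from build_probe would be written to a connected socket and the first bytes read back would be passed to is_usable.<br />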
<br />
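Next, for a usable connection, it uses scapy to run ackattack (roughly a traceroute) and records the IPs of the intermediate routers between the local machine and the target IP. One of the recorded routers may well be the GFC device; since no sensitive keyword has been sent yet, censorship has not been triggered at this point.<br />
<br />
After that it creates a new socket, connects to the target IP again, and this time sends a GET message containing a sensitive keyword:<br />
GET /tibetalk \r\n<br />
Host: ip \r\n<br />
\r\n<br />
<br />
If a socket error follows the send, the GFC device on that path has sent an RST packet to the local machine (and one to the target IP as well): the censorship mechanism has been triggered.<br />
<br />
Finally it runs ackattack once more. After the local machine receives the RST, the path to the target IP stays blocked for a while, during which even an ACK packet without any sensitive keyword is dropped, so the last IP address that the trace reaches is the address of the GFC device.<br />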
<br />
<br />
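It seems that plain traceroute is not built on a TCP three-way handshake; otherwise simply running traceroute against Facebook would reveal the IP of the firewall server. As for whether only stateful connections trigger censorship, that can also be checked with netcat.<br />
<br />
To find GFC devices across all of China, the authors of the paper above describe another interesting method: they use the <a href="http://www.gov.cn/zwgk/2008-04/23/content_952239.htm" style="color: blue;">Chinese government portal's directory of department and local sites</a> and various web directory sites to collect websites from every part of the country, and feed those as the target IP arguments to the detection tool.<br />
<br />
Have fun!<br />
<br />
<b>The WeChat Official Accounts Platform open API</b> (2012-12-15)<br />
<br />
The WeChat Official Accounts Platform is a feature built for people with a louder voice: celebrities, real-estate tycoons, or insiders who know more of an industry's gossip. Its slogan says everyone has their own brand, but in an age that is already overloaded with information, who would set up a dedicated channel to follow the trivia of an ordinary person's life?<br />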
<br />
In the spirit of tinkering I took a look at the <a href="http://mp.weixin.qq.com/cgi-bin/callbackprofile?t=wxm-callbackapi&type=info&lang=zh_CN" style="color: blue;" target="_blank">platform's open API</a>. It currently offers just two interfaces:<br />
<br />
<div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">
</div>
<ul>
<li>Validity check for the URL being connected to the platform</li>
<li>Replying to messages from ordinary WeChat users</li>
</ul>
<br />
Before using it, some settings need to be filled in, including a token and a URL:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgk_kRxPBAEagYRmbKJn94B_JYaQRFDw7c06moGYenK2ZO3L06wHhY3mQ_DAVm2zYh4tlY73XfOs7qaH1me6X5fXzryHuaUGGtbB9nZUWWlXvrw1IBn-Ot3g6Afm4WEPhAeY_wRUDpNlsI/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-12-16+%E4%B8%8A%E5%8D%8810.17.22.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="238" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgk_kRxPBAEagYRmbKJn94B_JYaQRFDw7c06moGYenK2ZO3L06wHhY3mQ_DAVm2zYh4tlY73XfOs7qaH1me6X5fXzryHuaUGGtbB9nZUWWlXvrw1IBn-Ot3g6Afm4WEPhAeY_wRUDpNlsI/s400/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-12-16+%E4%B8%8A%E5%8D%8810.17.22.png" width="400" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
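For the second feature: when an ordinary WeChat user sends a message to the official account, the platform in turn POSTs it to the URL configured above, with data such as:<br />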
<br />
<ul>
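<li><b>Text message</b>: the text content, the recipient's and sender's WeChat IDs, and so on</li>
<li><b>Location message</b>: latitude, longitude and related information</li>
<li><b>Image message</b>: the image link and related information</li>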
</ul>
<div>
The application we deploy at that URL can answer the POST with a text message or an image-and-text message.</div>
<div>
<br /></div>
<div>
The open API could perhaps be used to build text-based lookup services or location-based applications.</div>
<div>
<br /></div>
<div>
Have fun!</div>
<br />
<b>Login authorization for Sina Weibo Open Platform apps</b> (2012-12-14)<br />
<br />
In the previous post, "<a href="http://bangerlee.blogspot.com/2012/12/blog-post.html" target="_blank"><span class="Apple-style-span" style="color: blue;">Scraping data with the Sina Weibo Open Platform</span></a>", we learned how to fetch topic data through the <a href="http://open.weibo.com/wiki/Trends/statuses#JSON.E7.A4.BA.E4.BE.8B" target="_blank"><span class="Apple-style-span" style="color: blue;">trends/statuses</span></a> interface. Unlike trends/statuses, some data interfaces can only be called after login authorization, for example <a href="http://open.weibo.com/wiki/2/comments/show" target="_blank"><span class="Apple-style-span" style="color: blue;">2/comments/show</span></a>, which retrieves comments. Let's see how to perform that authorization.<br />
<br />
Login goes through the <a href="http://open.weibo.com/wiki/Oauth2/authorize" target="_blank"><span class="Apple-style-span" style="color: blue;">oauth2/authorize</span></a> interface, which accepts the following parameters:<br />
<ul>
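<li><b>client_id</b>: the app_key you applied for</li>
<li><b>response_type</b>: the type of value returned, either code or token; a code is exchanged for an access_token later</li>
<li><b>display</b>: the kind of client the authorization page is shown on; default means a browser</li>
<li><b>redirect_uri</b>: the authorization callback address, which must match the one configured on the open platform</li>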
</ul>
<br />
The parameters can be sent either by POST or by GET, as in this example:<br />
https://<span class="Apple-style-span" style="font-family: arial; font-size: 14px; line-height: 23px;">api.weibo.com/oauth2/authorize?</span><span class="Apple-style-span" style="font-family: arial; font-size: 14px; line-height: 23px;"><b><span class="Apple-style-span" style="color: blue;">redirect_uri</span></b>=http://</span><span class="Apple-style-span" style="font-family: arial; font-size: 14px; line-height: 23px;">liuxiaofang.sinaapp.com/callback?url=/</span><span class="Apple-style-span" style="font-family: arial; font-size: 14px; line-height: 23px;">init-comments&</span><span class="Apple-style-span" style="font-family: arial; font-size: 14px; line-height: 23px;">ids=</span><span class="Apple-style-span" style="font-family: arial; font-size: 14px; line-height: 23px;">3522096338448283&</span><span class="Apple-style-span" style="font-family: arial; font-size: 14px; line-height: 23px;"><span class="Apple-style-span" style="color: blue;"><b>response_type</b></span>=code&<span class="Apple-style-span" style="color: blue;"><b>client_id</b></span>=622387540&<span class="Apple-style-span" style="color: blue;"><b>display</b></span>=default</span><br />
<br />
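The URL above is written in human-readable form; before it is sent to the application server it still has to be encoded (for example with urllib.quote in Python).<br />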
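A minimal sketch of that encoding step, assuming Python 3, where the encoding helpers live in urllib.parse rather than in urllib as they did in Python 2. The function name build_authorize_url is hypothetical; the endpoint and parameter values are the ones from the example above.<br />

```python
from urllib.parse import urlencode, urlsplit, parse_qs

AUTHORIZE_URL = "https://api.weibo.com/oauth2/authorize"


def build_authorize_url(client_id, redirect_uri, display="default"):
    """Return the authorize URL with every parameter percent-encoded."""
    params = {
        "client_id": client_id,
        "response_type": "code",
        "redirect_uri": redirect_uri,
        "display": display,
    }
    return AUTHORIZE_URL + "?" + urlencode(params)


callback = ("http://liuxiaofang.sinaapp.com/callback"
            "?url=/init-comments&ids=3522096338448283")
url = build_authorize_url("622387540", callback)
print(url)
```

Encoding matters here because the callback address carries its own ? and & characters; left unencoded, they would be read as parameters of the authorize request itself.<br />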
<br />
The callback address mentioned here is configured under app page -> API management -> authorization settings.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgvpIJxn-X6zqmev-w8pKngZvue-SOjUy4pQ8kyLp7Zlms-drvFdd_GvfN3ex8sIORWV0B1ZZojlviDWLtVapgZdOh42cfh4p5FPoRjiZbawhxXNuXXWhyg6Kg0s5Iqws2ZTMNUKBQADuk/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-12-15+%E4%B8%8A%E5%8D%8811.35.13.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="202" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgvpIJxn-X6zqmev-w8pKngZvue-SOjUy4pQ8kyLp7Zlms-drvFdd_GvfN3ex8sIORWV0B1ZZojlviDWLtVapgZdOh42cfh4p5FPoRjiZbawhxXNuXXWhyg6Kg0s5Iqws2ZTMNUKBQADuk/s400/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-12-15+%E4%B8%8A%E5%8D%8811.35.13.png" width="400" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
Once the URL has been sent correctly, the following login page appears:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiimW09pDhv_aLepsA0tARAxMZpyIih4s-YkweE2-eN5PkpBsB0x7yfR6us5R9jZ3aUMnk8LE_Lt-BewLxBQxH39SCWFuGy88SlPCn8-LQKDf3n4cHzLiE8Qyrm6CkANilM2PhU3YradVc/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-12-13+%E4%B8%8B%E5%8D%8811.25.01.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="202" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiimW09pDhv_aLepsA0tARAxMZpyIih4s-YkweE2-eN5PkpBsB0x7yfR6us5R9jZ3aUMnk8LE_Lt-BewLxBQxH39SCWFuGy88SlPCn8-LQKDf3n4cHzLiE8Qyrm6CkANilM2PhU3YradVc/s400/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-12-13+%E4%B8%8B%E5%8D%8811.25.01.png" width="400" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
After a successful login, the browser is redirected to the redirect_uri we set earlier, with a code value appended:<br />
<span class="Apple-style-span" style="font-family: arial; font-size: 14px; line-height: 23px;">http://</span><span class="Apple-style-span" style="font-family: arial; font-size: 14px; line-height: 23px;">liuxiaofang.sinaapp.com/callback?url=/</span><span class="Apple-style-span" style="font-family: arial; font-size: 14px; line-height: 23px;">init-comments&</span><span class="Apple-style-span" style="font-family: arial; font-size: 14px; line-height: 23px;">ids=</span><span class="Apple-style-span" style="font-family: arial; font-size: 14px; line-height: 23px;">3522096338448283&code=123456</span><br />
<span class="Apple-style-span" style="font-family: arial;"><span class="Apple-style-span" style="font-size: 14px; line-height: 23px;"><br /></span></span>
<span class="Apple-style-span" style="font-family: arial;"><span class="Apple-style-span" style="font-size: 14px; line-height: 23px;">With the code we can request an access_token. The interface for that is <a href="http://open.weibo.com/wiki/Oauth2/access_token" target="_blank"><span class="Apple-style-span" style="color: blue;">oauth2/access_token</span></a>, which accepts the following parameters:</span></span><br />
<ul>
<li><span class="Apple-style-span" style="font-family: arial;"><span class="Apple-style-span" style="font-size: 14px; line-height: 23px;"><b>grant_type</b>: the request type; because we are exchanging the code obtained from </span></span><span class="Apple-style-span" style="font-family: arial; font-size: 14px; line-height: 23px;">authorize, the value here should be authorization_code</span></li>
<li><span class="Apple-style-span" style="font-family: arial; font-size: 14px; line-height: 23px;"><b>code</b></span><span class="Apple-style-span" style="font-family: arial; font-size: 14px; line-height: 23px;">: the code value obtained above</span></li>
<li><span class="Apple-style-span" style="font-family: arial; font-size: 14px; line-height: 23px;"><b>client_id</b></span><span class="Apple-style-span" style="font-family: arial; font-size: 14px; line-height: 23px;">: the app_key you applied for</span></li>
<li><span class="Apple-style-span" style="font-family: arial; font-size: 14px; line-height: 23px;"><b>client_secret</b></span><span class="Apple-style-span" style="font-family: arial; font-size: 14px; line-height: 23px;">: the app_secret you applied for</span></li>
<li><span class="Apple-style-span" style="font-family: arial;"><span class="Apple-style-span" style="font-size: 14px; line-height: 23px;"><b>redirect_uri</b>: the callback address, </span></span>which must match the one configured on the open platform</li>
</ul>
Parameters must be sent to <span class="Apple-style-span" style="font-family: arial; font-size: 14px; line-height: 23px;">oauth2/access_token by POST, for example:</span><br />
<span class="Apple-style-span" style="font-family: arial; font-size: 14px; line-height: 23px;">https://api.weibo.com/</span><span class="Apple-style-span" style="font-family: arial; font-size: 14px; line-height: 23px;">oauth2/access_token</span><br />
<span class="Apple-style-span" style="font-family: arial; font-size: 14px; line-height: 23px;"><span class="Apple-style-span" style="color: blue;"><b>grant_type</b></span> = </span><span class="Apple-style-span" style="font-family: arial; font-size: 14px; line-height: 23px;">authorization_code</span><br />
<span class="Apple-style-span" style="font-family: arial; font-size: 14px; line-height: 23px;"><b><span class="Apple-style-span" style="color: blue;">client_id</span></b> = </span><span class="Apple-style-span" style="font-family: arial; font-size: 14px; line-height: 23px;">622387540</span><br />
<span class="Apple-style-span" style="font-family: arial;"><span class="Apple-style-span" style="font-size: 14px; line-height: 23px;"><b><span class="Apple-style-span" style="color: blue;">client_secret</span></b> = 123456</span></span><br />
<span class="Apple-style-span" style="font-family: arial;"><span class="Apple-style-span" style="font-size: 14px; line-height: 23px;"><b><span class="Apple-style-span" style="color: blue;">redirect_uri</span></b> = </span></span><span class="Apple-style-span" style="font-family: arial; font-size: 14px; line-height: 23px;">http://</span><span class="Apple-style-span" style="font-family: arial; font-size: 14px; line-height: 23px;">liuxiaofang.sinaapp.com/callback?</span><br />
<span class="Apple-style-span" style="font-family: arial;"><span class="Apple-style-span" style="font-size: 14px; line-height: 23px;"><b><span class="Apple-style-span" style="color: blue;">code</span></b> = 123456</span></span><br />
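As a sketch, the same POST can be assembled with Python 3's standard library. build_token_request is a hypothetical helper; the request object is only built and printed here, not sent, and the secret and code are the placeholder values from the example above.<br />

```python
from urllib.parse import urlencode
from urllib.request import Request

TOKEN_URL = "https://api.weibo.com/oauth2/access_token"


def build_token_request(client_id, client_secret, redirect_uri, code):
    """Build (but do not send) the POST that exchanges a code for a token."""
    body = urlencode({
        "grant_type": "authorization_code",
        "client_id": client_id,
        "client_secret": client_secret,
        "redirect_uri": redirect_uri,
        "code": code,
    }).encode("ascii")
    return Request(TOKEN_URL, data=body, method="POST")


req = build_token_request(
    "622387540", "123456", "http://liuxiaofang.sinaapp.com/callback?", "123456"
)
print(req.get_method(), req.full_url)
print(req.data.decode())
```

Passing the request to urllib.request.urlopen would perform the exchange; that call is left out so the sketch runs without network access or real credentials.<br />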
<br />
Once the request has been sent correctly, you are taken to an authorization page similar to this one:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhGJaWie4iTViNwrQjODoPBjhHCBU4BcnkNAmt7j8S439ns4WRw2rMdBJt1c4TzZ17GcmcGrg84lZX5v5DNb38AdnBzqg1X80Ya06qjpuYITvLEkXjAuYRLvN369Jy1sV2hV7Hcd-_Nf5A/s1600/OAuth2_intro.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="231" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhGJaWie4iTViNwrQjODoPBjhHCBU4BcnkNAmt7j8S439ns4WRw2rMdBJt1c4TzZ17GcmcGrg84lZX5v5DNb38AdnBzqg1X80Ya06qjpuYITvLEkXjAuYRLvN369Jy1sV2hV7Hcd-_Nf5A/s400/OAuth2_intro.png" width="400" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
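After authorization completes, the browser is redirected to the callback address configured earlier, and the access_token and its expires_in timeout can be read from the response.<br />
<br />
With the access_token in hand, we can finally call interfaces such as 2/comments/show that require prior login and authorization. The request below uses GET to fetch the comments on a given weibo id, with the access_token placed in the request header:<br />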
<br />
https://api.weibo.com/2/comments/show?id=3522885349661782<br />
Authorization: OAuth2 123456<br />
<br />
The Sina server then returns information such as the comment id, comment text, comment creation time, and comment author.<br />
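<br />
The request above, and the token fields mentioned earlier, can be sketched in Python with only the standard library. This is a rough illustration, not official SDK code: the helper names build_comments_request and parse_token_response are made up for this example, and the token response is assumed to be a JSON object carrying access_token and expires_in. The request is only constructed here, never sent.<br />

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_comments_request(weibo_id, access_token):
    """Build (but do not send) the GET request for 2/comments/show."""
    url = "https://api.weibo.com/2/comments/show?" + urlencode({"id": weibo_id})
    # The token travels in the Authorization header, as in the text above.
    headers = {"Authorization": "OAuth2 " + access_token}
    return Request(url, headers=headers, method="GET")


def parse_token_response(body):
    """Assumed shape: a JSON object with access_token and expires_in."""
    data = json.loads(body)
    return data["access_token"], int(data["expires_in"])


req = build_comments_request(3522885349661782, "123456")
print(req.full_url)
print(req.get_header("Authorization"))
```

Passing req to urllib.request.urlopen would actually perform the call; that step is left out because it needs a valid token.<br />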
<br />
You are welcome to visit my site built on SAE and the Sina open platform: <b><a href="http://liuxiaofang.sinaapp.com/" target="_blank"><span class="Apple-style-span" style="color: blue;">带上猫咪去旅行</span></a></b><br />
<br />
Have fun!<br />
<br />
<b>The Python web framework bottle</b> (bangerlee, 2012-12-11)<br />
bottle is a Python WSGI framework that ships as a single .py file and bundles routing, redirects, templates, and request/response handling. Its basic usage is described below.<br />
<br />
First import the relevant functions and create a Bottle object:<br />
<br />
from bottle import Bottle, jinja2_template as template, static_file, redirect, request, response, run<br />
app = Bottle()<br />
<br />
<b>Router</b><br />
Using Python decorators, several URLs can be mapped to one handler function:<br />
<br />
@app.get('/')<br />
@app.get('/index')<br />
def index():<br />
&nbsp;&nbsp;&nbsp;&nbsp;return 'Hello bottle!'<br />
<br />
<b>template</b><br />
Templates separate back-end code from front-end code and make the back-end code easier to reuse:<br />
<br />
@app.get('/login')<br />
def login():<br />
&nbsp;&nbsp;&nbsp;&nbsp;return template("login.html", handler=get_site_info())<br />
<br />
@app.post('/login')<br />
def login_post():<br />
&nbsp;&nbsp;&nbsp;&nbsp;return UserService.login()<br />
<br />
<b>redirect</b><br />
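bottle provides a redirect function for page redirection, for example sending the user to the login page after logging out:<br />
<br />
@app.get('/log-out')<br />
def log_out():<br />
&nbsp;&nbsp;&nbsp;&nbsp;UserService.log_out()<br />
&nbsp;&nbsp;&nbsp;&nbsp;redirect('/login', 302)<br />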
<br />
<b>request/response</b><br />
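bottle provides request and response objects, which make it easy to work with request and response data:<br />
<br />
@app.get('/admin')<br />
def admin():<br />
&nbsp;&nbsp;&nbsp;&nbsp;_status = request.query.get('status', None)<br />
&nbsp;&nbsp;&nbsp;&nbsp;response.set_cookie('status', _status)<br />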
<br />
<b>static_file</b><br />
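Web pages include static content such as HTML, JS, and images. Requests for this content do not need dedicated routes: put the static files in one folder, and the following code handles every static file request:<br />
<br />
@app.get('/static/&lt;filename:re:.*&gt;')<br />
def server_static_file(filename):<br />
&nbsp;&nbsp;&nbsp;&nbsp;return static_file(filename, root='./static/')<br />
<br />
Finally, call run to start the back-end service:<br />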
run(app, host='localhost', port=8080)<br />
<br />
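Have fun!<br />
<br />
<b>The social comment system 多说 (Duoshuo)</b> (bangerlee, 2012-12-10)<br />
多说 is a comment system that integrates the comment plugins of several social networks, including Sina Weibo, Douban, and Renren. Through 多说, site articles and blog posts that used to stand alone can be linked to social networks, drawing on their popularity to liven up the site.<br />
<br />
多说 is an open-source comment system and is very simple to use. Register on the 多说 website first; registration yields a code snippet, and pasting that snippet anywhere between &lt;body&gt; and &lt;/body&gt; in the page source enables the 多说 comment system. The result looks like this:<br />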
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgEoxOODUE8_vs3lfI2CD4R-mI8Z290kO0aZTFcHK6XlmDN4MuSkKadyd9zDIHfqGRwXN81ewX0JQAUWVqko7rpgsekkQnOWkF10CclBqFC_JJCx-pMTYtOCqKdmtOPpLQKOWt9wi1F7yc/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-12-10+%E4%B8%8B%E5%8D%8810.07.16.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="188" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgEoxOODUE8_vs3lfI2CD4R-mI8Z290kO0aZTFcHK6XlmDN4MuSkKadyd9zDIHfqGRwXN81ewX0JQAUWVqko7rpgsekkQnOWkF10CclBqFC_JJCx-pMTYtOCqKdmtOPpLQKOWt9wi1F7yc/s400/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-12-10+%E4%B8%8B%E5%8D%8810.07.16.png" width="400" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
Have fun!<br />
<br />
<b>Fetching data through the Sina Weibo open platform</b> (bangerlee, 2012-12-09)<br />
<div>
The Sina Weibo open platform offers developers many APIs for reading or modifying all kinds of data, such as weibo posts, comments, topics, favorites, and user tags. The following shows how to use the "topic" API to access specific data.</div>
<div>
<br /></div>
<div>
Topics on Sina Weibo are enclosed in ##. Topic data is accessed through the <a href="http://open.weibo.com/wiki/Trends/statuses#JSON.E7.A4.BA.E4.BE.8B" target="_blank"><span class="Apple-style-span" style="color: blue;">trends/statuses</span></a> interface, which accepts the following parameters:</div>
<div>
<ul>
<li><b>source</b> : the app_key you applied for</li>
<li><b>trend_name</b> : the topic to fetch</li>
<li><b>count</b> : the number of entries to fetch</li>
</ul>
</div>
<div>
<br /></div>
<div>
Access the following URL with a GET request (or directly in a browser):</div>
<div>
http://api.t.sina.com.cn/trends/statuses.json?count=40&source=31641035&trend_name=带上猫咪去旅行</div>
<div>
<br /></div>
<div>
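This URL requests at most 40 weibo entries whose topic contains the keyword "带上猫咪去旅行" ("travel with a cat"). Accessing it returns data of the following form:</div>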
<div>
<br /></div>
<div>
<div>
[{</div>
<div>
"created_at":"Fri Dec 07 23:01:47 +0800 2012",</div>
<div>
"id":3520736712709956,</div>
<div>
"<span class="Apple-style-span" style="color: blue;">text</span>":"#带上猫咪去旅行图站#低调内测上线 http://t.cn/zjJypQ5",</div>
<div>
"source":"<a href=\"http://weibo.com\" rel=\"nofollow\">新浪微博</a>",</div>
<div>
"<span class="Apple-style-span" style="color: blue;">thumbnail_pic</span>":"http://ww4.sinaimg.cn/thumbnail/66f77025gw1dzlk18f7r3j.jpg",</div>
<div>
"bmiddle_pic":"http://ww4.sinaimg.cn/bmiddle/66f77025gw1dzlk18f7r3j.jpg",</div>
<div>
"original_pic":"http://ww4.sinaimg.cn/large/66f77025gw1dzlk18f7r3j.jpg",</div>
<div>
"<span class="Apple-style-span" style="color: orange;">user</span>":</div>
<div>
{"id":1727492133,</div>
<div>
"screen_name":"bangerlee",</div>
<div>
"<span class="Apple-style-span" style="color: blue;">name</span>":"bangerlee",</div>
<div>
"province":"44",</div>
<div>
"city":"1",</div>
<div>
"location":"广东 广州",</div>
<div>
"gender":"m",</div>
<div>
"created_at":"Fri Apr 09 15:12:15 +0800 2010"}</div>
<div>
}]</div>
</div>
<div>
<br /></div>
<div>
The returned weibo data contains the keyword we searched for, <span class="Apple-style-span" style="color: blue;">#带上猫咪去旅行#</span>, along with the weibo text, the weibo image URLs, the user name, and other information.</div>
<div>
<br /></div>
<div>
A small Python program can do the data fetching:</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh3oaWylHb89e9xIksGpNidJXtuEfwzhZiAvG2urHiWLGzB9yOn1_rIn_ksS2wCWd04_8fPtXrwF5G11Z0BN0UFTDEHh_MPQFkz9lHfcaJv_6gsFExSxWNLSwhRE6rejIjBwTmXjODNuxQ/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-12-09+%E4%B8%8B%E5%8D%8810.59.25.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh3oaWylHb89e9xIksGpNidJXtuEfwzhZiAvG2urHiWLGzB9yOn1_rIn_ksS2wCWd04_8fPtXrwF5G11Z0BN0UFTDEHh_MPQFkz9lHfcaJv_6gsFExSxWNLSwhRE6rejIjBwTmXjODNuxQ/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-12-09+%E4%B8%8B%E5%8D%8810.59.25.png" /></a></div>
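<div>
The program in the screenshot is not available as text. A minimal sketch of what such a script might look like follows, under stated assumptions: the function names build_url and summarize are hypothetical, the sample data is copied from the response shown earlier, and no network request is made.</div>

```python
import json
from urllib.parse import urlencode

API = "http://api.t.sina.com.cn/trends/statuses.json"


def build_url(trend_name, source, count=40):
    """Compose the trends/statuses URL; urlencode percent-encodes the topic."""
    params = [("count", count), ("source", source), ("trend_name", trend_name)]
    return API + "?" + urlencode(params)


def summarize(body):
    """Return (user name, text, thumbnail URL) for each status in the JSON body."""
    rows = []
    for status in json.loads(body):
        rows.append((status["user"]["name"],
                     status["text"],
                     status.get("thumbnail_pic")))
    return rows


# Sample data taken from the response shown earlier (most fields omitted).
sample = json.dumps([{
    "text": "#带上猫咪去旅行图站#低调内测上线 http://t.cn/zjJypQ5",
    "thumbnail_pic": "http://ww4.sinaimg.cn/thumbnail/66f77025gw1dzlk18f7r3j.jpg",
    "user": {"name": "bangerlee"},
}])

for name, text, pic in summarize(sample):
    print(name)
    print(text)
    print(pic)
```

<div>
In a live script, the body would come from fetching build_url(...) with urllib.request.urlopen instead of the inline sample.</div>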
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
Running the program above gives:</div>
<div>
<div>
linux # python get_data.py </div>
<div>
bangerlee</div>
<div>
#带上猫咪去旅行图站#低调内测上线 <a href="http://t.cn/zjJypQ5"><span class="Apple-style-span" style="color: blue;">http://t.cn/zjJypQ5</span></a></div>
<div>
http://ww4.sinaimg.cn/thumbnail/66f77025gw1dzlk18f7r3j.jpg</div>
</div>
<div>
<br /></div>
<div>
Have fun!</div>
<br />
<b>vmtouch, a small tool for controlling the file system cache</b> (bangerlee, 2012-11-27)<br />
One purpose of the <a href="http://blog.chinaunix.net/uid-27119491-id-3306046.html" target="_blank"><span class="Apple-style-span" style="color: blue;">file system cache</span></a> is to speed up file reads. vmtouch manages the file system cache and is an interesting little tool. It provides the following functions:<br />
<ol>
<li>Query how much of a file or directory is loaded in the cache</li>
<li>Load a file or directory into the cache</li>
<li>Evict a file or directory from the cache</li>
<li>Lock pages in the cache</li>
</ol>
<br />
An example of using vmtouch:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh-n1Nt71d5Pff2MepZ9KM_NKHkRhupEDlV08s3HGiT2qxDmKihKrW1iSzo1M2ekWEQBT8GCzLZPc6InF1rpv6D8gXN6NfMzzjFXBF8tQoJPXHaEMW89l01VueYvf3Ez3a6pq8s8Et2dyM/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-11-27+%E4%B8%8B%E5%8D%8810.43.53.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh-n1Nt71d5Pff2MepZ9KM_NKHkRhupEDlV08s3HGiT2qxDmKihKrW1iSzo1M2ekWEQBT8GCzLZPc6InF1rpv6D8gXN6NfMzzjFXBF8tQoJPXHaEMW89l01VueYvf3Ez3a6pq8s8Et2dyM/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-11-27+%E4%B8%8B%E5%8D%8810.43.53.png" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
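At first a.txt is loaded in the cache, occupying 84 pages, 336K in total (84 pages of 4K each). vmtouch -e evicts the file from the cache. After tail reads part of a.txt, vmtouch shows that part of a.txt has been loaded into the cache again, occupying 10 pages, or 40K.<br />
<br />
The following describes how vmtouch is implemented:<br />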
<br />
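<b>Recursive file lookup</b>: The argument passed to vmtouch may be a directory, and the goal is to operate on every file under it; since a directory may contain other directories, recursion is needed. The vmtouch_crawl function in vmtouch is recursive: if its path argument refers to a directory, it keeps recursing. Whether a path is a directory is determined from the st_mode field of the stat structure. vmtouch_crawl uses the stat, opendir, and readdir calls.<br />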
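<br />
The recursion described above can be sketched in Python. This is an illustration of the idea, not a port of vmtouch_crawl: the function name crawl and the visit callback are invented here, and using lstat so that symbolic links are never followed is a choice made for this sketch.<br />

```python
import os
import stat


def crawl(path, visit):
    """Call visit(path) for every regular file at or below path."""
    st = os.lstat(path)                  # lstat: do not follow symbolic links
    if stat.S_ISDIR(st.st_mode):         # directory: recurse into each entry
        for name in sorted(os.listdir(path)):
            crawl(os.path.join(path, name), visit)
    elif stat.S_ISREG(st.st_mode):       # regular file: the recursion ends here
        visit(path)
    # anything else (symlink, device, socket) is skipped
```

For example, crawl('.', print) prints every regular file under the current directory.<br />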
<br />
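<b>Opening and mapping files</b>: Each file under the directory eventually ends the recursion, at which point vmtouch_crawl calls vmtouch_file to process it. vmtouch_file first checks the file type and skips symbolic links and overly large files (larger than 500*1024*1024 bytes), then calls open and mmap to open the file and map it into virtual memory.<br />
<br />
<b>Query</b>: vmtouch_file calls mincore to query how much of a file is in the cache. The first argument to the mincore system call is the value returned by mmap, the second is the file length, and the third is a pointer to a buffer of pages_in_file bytes. Based on the mincore result, counters such as pages_in_core are incremented for each page that is resident in memory.<br />
<br />
<b>Loading into memory</b>: To load a file into the cache, it is enough to read it. vmtouch_file loads the file into the cache by accessing mem, the mapped region:<br />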
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhhqLv-rHC8P6jB84B10kJJT3AmI8nSmO8OmhgFKUNsm8sy3nGKsb2AdY1QSI28MSjbBRpR5DxfQfPc1pGEeTCuSVs9UgEFEh6wwiAQ7vrWfOnmFyBPu1Nn77vwQXfvvuO_SgM-_wRyOGA/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-11-28+%E4%B8%8A%E5%8D%8812.12.01.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhhqLv-rHC8P6jB84B10kJJT3AmI8nSmO8OmhgFKUNsm8sy3nGKsb2AdY1QSI28MSjbBRpR5DxfQfPc1pGEeTCuSVs9UgEFEh6wwiAQ7vrWfOnmFyBPu1Nn77vwQXfvvuO_SgM-_wRyOGA/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-11-28+%E4%B8%8A%E5%8D%8812.12.01.png" /></a></div>
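<br />
The same page-touching idea can be expressed in Python with the standard mmap module. This is a sketch of the technique rather than the code in the screenshot; touch_pages is a name invented for this example.<br />

```python
import mmap


def touch_pages(path):
    """Read one byte from every page of the file so the kernel caches it.

    Returns the number of pages touched.
    """
    with open(path, "rb") as f:
        size = f.seek(0, 2)              # seek to the end to get the file length
        if size == 0:
            return 0                     # an empty file cannot be mapped
        with mmap.mmap(f.fileno(), size, access=mmap.ACCESS_READ) as mem:
            junk = 0
            for offset in range(0, size, mmap.PAGESIZE):
                junk += mem[offset]      # the read faults the page in
    return (size + mmap.PAGESIZE - 1) // mmap.PAGESIZE
```

Reading a single byte per page is enough because the kernel brings in data from disk a whole page at a time.<br />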
<br />
<br />
<br />
<br />
<br />
<br />
<b><br /></b>
<b>Evicting the cache</b>: On Linux, the cache for a file can be dropped by calling posix_fadvise, whose prototype is: int posix_fadvise(int fd, off_t offset, off_t len, int advice); Passing the file descriptor, the file size, and the POSIX_FADV_DONTNEED flag evicts the cache.<br />
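<br />
Python exposes the same call as os.posix_fadvise on Unix systems, so the eviction step can be sketched as below. The function name evict is hypothetical, and the hasattr guard is only there so the sketch does not fail on platforms without the call. The request is advisory, so the kernel may keep some pages (dirty ones, for instance).<br />

```python
import os


def evict(path):
    """Ask the kernel to drop the cached pages of path; returns the file size."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        if hasattr(os, "posix_fadvise"):     # not available on every platform
            # offset 0 and len = file size cover the whole file
            os.posix_fadvise(fd, 0, size, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return size
```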
<br />
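<b>Memory locking</b>: Sometimes we want certain data to stay resident in memory and never be swapped out. This can be done with the mlock call, whose first argument is mem and whose second argument is the file length.<br />
<br />
vmtouch can be used to "warm up" the file system cache, deliberately evicting cold data and keeping hot data resident in memory, which reduces page faults and raises the cache hit rate.<br />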
<br />
Have fun!<br />
<br />
<b>rtags: a tag management module built on node.js and redis</b> (bangerlee, 2012-11-22)<br />
<b>Introduction</b><br />
Tags show up everywhere when we browse the web:<br />
<ul>
<li>On a personal weibo home page you can see your own or other people's tags, and the weibo system recommends people who share tags with you</li>
<li>When reading blogs, most posts carry tags that indicate the topic and make searching and browsing easier</li>
<li>When shopping online we often search for goods by tag, for example selecting "winter wear" + "men" + "coats" to filter clothing</li>
</ul>
<a href="https://github.com/bangerlee/rtags.git" target="_blank">rtags</a> is a node.js module for tag management. It uses the redis set data structure to store tags and related information.<br />
<br />
<b>API</b><br />
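rtags provides the following interfaces:<br />
<ol>
<li>Add an item and its tags: Tag#add(tags, id[, fn])</li>
<li>Query the tags of an item: Tag#queryID(id, fn)</li>
<li>Query the tags shared by two items: Tag#queryID(id1, id2, fn)</li>
<li>Query the items that have given tags: Tag#queryTag(tags, fn)</li>
<li>Delete tags from an item: Tag#delTag(tags, id[, fn])</li>
<li>Delete an item: Tag#remove(id[, fn])</li>
</ol>
<b>Example</b><br />
First call rtags.createTag to create a Tag instance, passing a string that names the category of items, for example 'blogs' for blog posts or 'clothes' for clothing:<br />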
var tag = rtags.createTag('blogs');<br />
<br />
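Then add items of that category and their tags. Tag#add takes two arguments: the first is the item's tags, separated by commas if there is more than one; the second is the item's id. The code below uses the index into strs as the id:<br />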
<br />
var strs = [];<br />
strs.push('travel,photography,food,music,shopping');<br />
strs.push('music,party,food,girl');<br />
strs.push('mac,computer,cpu,memory,disk');<br />
strs.push('linux,kernel,linus,1991');<br />
strs.push('kernel,process,lock,time,linux');<br />
<br />
strs.forEach(function(str, i){ tag.add(str, i); });<br />
<br />
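After the calls above, redis holds the tag data for the blog posts and we can run queries. To find the tags of an item, call Tag#queryID with the item id and a callback (supplied through end in the code below); the result is passed to the callback as an array in ids:<br />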
<br />
tag<br />
.queryID(id = '3')<br />
.end(function(err, ids){<br />
if (err) throw err;<br />
console.log('Tags for "%s":', id);<br />
var tags = ids.join(' ');<br />
console.log(' - %s', tags);<br />
});<br />
<br />
The code above queries the tags of the blog post with id '3'. Running it prints:<br />
<br />
Tags for "3":<br />
- kernel linux linus 1991<br />
<br />
To find the tags that two items share, call Tag#queryID again, this time passing the ids of both items and a callback:<br />
<br />
tag<br />
.queryID(id1 = '3', id2 = '4')<br />
.end(function(err, ids){<br />
if (err) throw err;<br />
console.log('Tags for "%s" and "%s" both have:', id1, id2);<br />
var tags = ids.join(' ');<br />
console.log(' - %s', tags);<br />
});<br />
<br />
The code above queries the tags shared by the blog posts with ids '3' and '4'. The result is:<br />
<br />
Tags for "3" and "4" both have:<br />
- kernel linux<br />
<br />
rtags can also search for items by tag: call Tag#queryTag with the tags and a callback, separating multiple tags with commas:<br />
<br />
tag<br />
.queryTag(tags = 'music,food')<br />
.end(function(err, ids){<br />
if (err) throw err;<br />
console.log('The objects own the "%s" tags:', tags);<br />
var id = ids.join(' ');<br />
console.log(' - %s', id);<br />
process.exit();<br />
});<br />
<br />
The code above finds the blog posts that have both the 'music' and 'food' tags. Its output is:<br />
<br />
The objects own the "music,food" tags:<br />
- 0 1<br />
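<br />
The three queries above are set operations, which plain Python sets can mimic. The sketch below is an in-memory stand-in, not the rtags implementation: the class TagStore and its method names are invented here, and the two dictionaries only approximate how tag data might be laid out in redis sets, with set intersection playing the role of redis SINTER.<br />

```python
class TagStore:
    """In-memory stand-in for a set-based tag layout (not the rtags code)."""

    def __init__(self):
        self.tags_of = {}     # item id -> set of tags
        self.items_of = {}    # tag -> set of item ids

    def add(self, tags, item_id):
        for tag in tags.split(","):
            self.tags_of.setdefault(item_id, set()).add(tag)
            self.items_of.setdefault(tag, set()).add(item_id)

    def query_id(self, *ids):
        """Tags of one item, or the tags shared by several items."""
        return set.intersection(*[self.tags_of.get(i, set()) for i in ids])

    def query_tag(self, tags):
        """Items that carry every one of the comma-separated tags."""
        return set.intersection(
            *[self.items_of.get(t, set()) for t in tags.split(",")])


store = TagStore()
strs = [
    "travel,photography,food,music,shopping",
    "music,party,food,girl",
    "mac,computer,cpu,memory,disk",
    "linux,kernel,linus,1991",
    "kernel,process,lock,time,linux",
]
for i, s in enumerate(strs):
    store.add(s, i)

print(sorted(store.query_id(3, 4)))
print(sorted(store.query_tag("music,food")))
```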
<br />
<b>Installation</b><br />
rtags is installed with the following command, which also installs the redis module that rtags depends on:<br />
$ npm install rtags<br />
<br />
The rtags source can also be fetched from github with the following command:<br />
$ git clone git@github.com:bangerlee/rtags.git<br />
<br />
After starting redis-server and installing the should module, we can run the example in the rtags source directory:<br />
$ cd rtags/test<br />
$ node index.js<br />
<br />
github repository: <a href="https://github.com/bangerlee/rtags.git">https://github.com/bangerlee/rtags.git</a><br />
Feel free to git pull/fork/clone.<br />
<br />
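Have fun!<br />
<br />
<b>Generating QR codes with libqrencode</b> (bangerlee, 2012-11-16)<br />
libqrencode is a library written in C for generating two-dimensional barcodes in the QR code format. Whereas ZXing supports one-dimensional and two-dimensional barcodes in many formats, libqrencode is simpler: it targets only the most common format, QR code, and can only encode.<br />
<br />
The interfaces libqrencode provides are documented in detail in qrencode.h in the source tree. Besides the programming interface, libqrencode also provides a ready-made program for generating QR codes. After libqrencode is installed, the qrencode program is generated in the source directory. It is used as follows:<br />
./qrencode -s 5 -o bangerlee.png bangerlee.blogspot.com<br />
<br />
The command above encodes the string "bangerlee.blogspot.com" as a QR code, where -s sets the size in pixels of each black or white module and -o names the output image file.<br />
<br />
libqrencode supports encoding Chinese text.<br />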
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj2l9SedT7yg95NsEdOlcyprh3qXFpoBYZy0B9UOqDx2ysFhweDTZD1Uxb2v95-Mwabzz87OAu4kpZdRHTGXSf_Kl1uWPkQrWbG_eXACUdHwlmpPZbzPG6R7cbX6gczEvY47BRAks2EXR0/s1600/sina.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj2l9SedT7yg95NsEdOlcyprh3qXFpoBYZy0B9UOqDx2ysFhweDTZD1Uxb2v95-Mwabzz87OAu4kpZdRHTGXSf_Kl1uWPkQrWbG_eXACUdHwlmpPZbzPG6R7cbX6gczEvY47BRAks2EXR0/s1600/sina.png" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
Have fun!<br />
<br />
<b>Short URLs with Node.js and MongoDB: the open-source project short</b> (bangerlee, 2012-11-13)<br />
MongoDB is a distributed document database that stores data as BSON, a binary form of JSON.<br />
<br />
Consider designing database tables for blog posts. With a relational database, the posts are stored in one table and the comments in a separate table. With MongoDB, comments can be embedded in the post itself, so everything about a complete post is stored in one place:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiiI2VwWLrFJYA8k9eCSUFZnCd2gka3OzE94a1cDCtPGObxhNzaU39sVnNMPQvnRn1kuzexygizMNjcKD6tK-PuPth6tBg7s1OIEE-zRQCqc-hjA_ZwLV7sPZjcnI3OsrLq3af7fswRXD8/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-11-13+%E4%B8%8B%E5%8D%8810.45.54.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiiI2VwWLrFJYA8k9eCSUFZnCd2gka3OzE94a1cDCtPGObxhNzaU39sVnNMPQvnRn1kuzexygizMNjcKD6tK-PuPth6tBg7s1OIEE-zRQCqc-hjA_ZwLV7sPZjcnI3OsrLq3af7fswRXD8/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-11-13+%E4%B8%8B%E5%8D%8810.45.54.png" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
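Next we look at how to implement short URLs with Node.js and MongoDB, mainly using the Mongoose module for Node.js.<br />
<br />
First design how a short URL is stored in MongoDB. Besides the two required fields, the original URL and the short URL, other related information such as creation time and visitors can be stored too. The schema is as follows:<br />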
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh7ZE4hfWHADc71RZFoC-gHLO1KLxsGsCG05D4Oy55AXIZcFhz7MxVYaohcP_tsySsxw1_GVzUMq4fgg1IX27sxeZAw6UNa6n7I20bOd_M0nVPyOfLjzooNI4r-DTYSvFc8ZnWZzEz_MlY/s1600/%25E5%25B1%258F%25E5%25B9%2595%25E5%25BF%25AB%25E7%2585%25A7+2012-11-13+%25E4%25B8%258B%25E5%258D%258810.46.09.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh7ZE4hfWHADc71RZFoC-gHLO1KLxsGsCG05D4Oy55AXIZcFhz7MxVYaohcP_tsySsxw1_GVzUMq4fgg1IX27sxeZAw6UNa6n7I20bOd_M0nVPyOfLjzooNI4r-DTYSvFc8ZnWZzEz_MlY/s1600/%25E5%25B1%258F%25E5%25B9%2595%25E5%25BF%25AB%25E7%2585%25A7+2012-11-13+%25E4%25B8%258B%25E5%258D%258810.46.09.png" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
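In the schema above, URL is the address before shortening and hash is the short URL.<br />
<br />
Next consider the interface, which is very simple: generate takes a URL and returns the short URL, and retrieve takes a short URL and returns the original URL.<br />
<br />
Finally, a hash function is needed to associate a URL with its short URL. generate calls this hash function.<br />
<br />
The generate function:<br />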
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg4MXpfsv8sdb7-10pG2afR-MlK3IhBolUZULrUV025wP31P-R2R_uNCP46j0fwlTm45XBBD03GSiNCTUs7CYgaiYBTuaT8CsxjzUO72FO5qVET7q8ai8xQlcOaiZ4-2kuxZ2uUEairVP0/s1600/%25E5%25B1%258F%25E5%25B9%2595%25E5%25BF%25AB%25E7%2585%25A7+2012-11-13+%25E4%25B8%258B%25E5%258D%258810.47.50.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg4MXpfsv8sdb7-10pG2afR-MlK3IhBolUZULrUV025wP31P-R2R_uNCP46j0fwlTm45XBBD03GSiNCTUs7CYgaiYBTuaT8CsxjzUO72FO5qVET7q8ai8xQlcOaiZ4-2kuxZ2uUEairVP0/s1600/%25E5%25B1%258F%25E5%25B9%2595%25E5%25BF%25AB%25E7%2585%25A7+2012-11-13+%25E4%25B8%258B%25E5%258D%258810.47.50.png" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
The code above uses mongoose's save interface to store the short URL data on the MongoDB server.<br />
<br />
The retrieve function:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjN9lWSU6VtbXK0wbqBTh28e60yzSKSduPXnc28pqY8_4emHlY_6LYBnf7jW7WCC86DTSRQxoZHVvnDzwORqh4m30uAnMebuKEYtiBFd7F-qeZEYG1fqENOIz_WjWFzEpttP8Td84p3Wv4/s1600/%25E5%25B1%258F%25E5%25B9%2595%25E5%25BF%25AB%25E7%2585%25A7+2012-11-13+%25E4%25B8%258B%25E5%258D%258810.48.15.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjN9lWSU6VtbXK0wbqBTh28e60yzSKSduPXnc28pqY8_4emHlY_6LYBnf7jW7WCC86DTSRQxoZHVvnDzwORqh4m30uAnMebuKEYtiBFd7F-qeZEYG1fqENOIz_WjWFzEpttP8Td84p3Wv4/s1600/%25E5%25B1%258F%25E5%25B9%2595%25E5%25BF%25AB%25E7%2585%25A7+2012-11-13+%25E4%25B8%258B%25E5%258D%258810.48.15.png" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
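The findByHash function calls mongoose's findOne interface, which looks up the short URL entry on the MongoDB server by the hash value passed in. After the lookup, findByHash calls mongoose's update interface to update fields such as hits in the entry.<br />
<br />
The hash function:<br />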
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjpuo6F2NWohNsbWmpL512GODFq2OBq6pSrEFvaG0xdP32ePhJewfLi2pYW_nFS_n4uwm9cyt9xTsQr6JpocT8nnaFXY7QgFcWvc0YA_tMpB9eoarSYI3nhjLKXUJ-LAq1JThttljjAIjQ/s1600/%25E5%25B1%258F%25E5%25B9%2595%25E5%25BF%25AB%25E7%2585%25A7+2012-11-13+%25E4%25B8%258B%25E5%258D%258810.48.41.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjpuo6F2NWohNsbWmpL512GODFq2OBq6pSrEFvaG0xdP32ePhJewfLi2pYW_nFS_n4uwm9cyt9xTsQr6JpocT8nnaFXY7QgFcWvc0YA_tMpB9eoarSYI3nhjLKXUJ-LAq1JThttljjAIjQ/s1600/%25E5%25B1%258F%25E5%25B9%2595%25E5%25BF%25AB%25E7%2585%25A7+2012-11-13+%25E4%25B8%258B%25E5%258D%258810.48.41.png" /></a></div>
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
The hash function is very simple: hasher maps a URL to a 6-character string over [0-9a-zA-Z].<br />
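<br />
As a rough Python illustration of the generate/retrieve/hash trio, the sketch below derives a 6-character base-62 string from an MD5 digest. This scheme is an assumption made for the example and is not the hasher from the short project (which appears only in the screenshot above); the dictionary store stands in for the MongoDB collection, and collisions are not handled.<br />

```python
import hashlib
import string

# The 62 characters of [0-9a-zA-Z]
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def short_hash(url, length=6):
    """Map a URL to a fixed-length base-62 string (illustrative scheme)."""
    number = int.from_bytes(hashlib.md5(url.encode("utf-8")).digest(), "big")
    chars = []
    for _ in range(length):
        number, index = divmod(number, len(ALPHABET))
        chars.append(ALPHABET[index])
    return "".join(chars)


store = {}   # hash -> original URL, standing in for the database


def generate(url):
    """Take a URL, remember it, and return its short hash."""
    key = short_hash(url)
    store[key] = url
    return key


def retrieve(key):
    """Take a short hash and return the original URL (None if unknown)."""
    return store.get(key)


key = generate("http://bangerlee.blogspot.com/")
print(key, "->", retrieve(key))
```

A real service would also have to decide what to do when two URLs produce the same 6 characters, for example by checking the store before saving.<br />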
<br />
After calling generate to create short URLs for http://nodejs.org/ and http://bangerlee.blogspot.com/, querying the data with mongo shows:<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjLD3-I9Zlx0tN8HlEXcLQmHzQCdcyDGdmX23gbyIDzMfx__JFaWUmr3c182ydSuEUVSsjrkbbBb0rEnAVA8gXCKzANa3QMh2LnlQP3xZCFzz7bTXm3XJAlwh2B6zFy8fEzVULCpG8XLAY/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-11-14+%E4%B8%8A%E5%8D%8812.15.50.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjLD3-I9Zlx0tN8HlEXcLQmHzQCdcyDGdmX23gbyIDzMfx__JFaWUmr3c182ydSuEUVSsjrkbbBb0rEnAVA8gXCKzANa3QMh2LnlQP3xZCFzz7bTXm3XJAlwh2B6zFy8fEzVULCpG8XLAY/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-11-14+%E4%B8%8A%E5%8D%8812.15.50.png" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
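With the short URL functions above, we can go further and build a server that performs short URL redirects. At its core it calls retrieve with the hash, fetches the corresponding URL from the MongoDB server, and redirects to it:<br />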
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi-dtDxB85a1qaLteb7Bt6t-CN6UsbqL9ZZ6W1cEX8t6yhc7HpfOtB0GfJlkZo_3y0yWM9AboGAyWiROuvP6cRAcxqDGM5smgFtr6xpFsmMKAfhOB5AHnQ0vbYISA24tPCW5BzRgTwpkuo/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-11-14+%E4%B8%8A%E5%8D%8812.01.32.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi-dtDxB85a1qaLteb7Bt6t-CN6UsbqL9ZZ6W1cEX8t6yhc7HpfOtB0GfJlkZo_3y0yWM9AboGAyWiROuvP6cRAcxqDGM5smgFtr6xpFsmMKAfhOB5AHnQ0vbYISA24tPCW5BzRgTwpkuo/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-11-14+%E4%B8%8A%E5%8D%8812.01.32.png" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
Run the server program above, then enter http://localhost:8080/GHJwvl in the address bar; pressing Enter redirects to http://bangerlee.blogspot.com/ .<br />
<br />
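Have fun!<br />
<br />
<b>A Redis and node.js example: the English search engine Reds</b> (bangerlee, 2012-11-08)<br />
Besides serving as a cache layer like memcached, Redis can also be used for the following:<br />
<ul>
<li><b>Persistence</b>: AOF or RDB</li>
<li><b>Message queue</b>: use the Redis list data structure, or a sorted set for a weighted message queue</li>
<li><b>Log collector</b>: multiple endpoints write log messages to Redis, and a single dedicated thread writes all the logs to disk</li>
<li><b>Recording social relationships</b>: store each person's friends in a set; the common friends of two people are simply the intersection of the two sets</li>
</ul>
The scenarios that suit Redis go well beyond this list. Below we look at Reds, an English search engine implemented with Redis and node.js, to learn how Redis is used.<br />
<br />
Reds uses the node.js Redis module and the natural module for English natural-language processing. It provides the following functions:<br />
Calling Reds's search.index interface adds English sentences to the Redis server, such as:<br />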
<ul>
<li>'Tobi wants 4 dollars'</li>
<li>'Loki is a ferret'</li>
<li>'Tobi is also a ferret'</li>
<li>'Jane is a bitchy ferret'</li>
<li>'Tobi is employed by LearnBoost'</li>
</ul>
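The search interfaces search.query, search.type, and search.end find the sentences that match a query. For the sentences above, a query for sentences containing both 'jane' and 'bitchy' returns 'Jane is a bitchy ferret'; a query for sentences containing 'jane' or 'dollars' returns 'Jane is a bitchy ferret' and 'Tobi wants 4 dollars'.<br />
<br />
The interfaces are called as follows:<br />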
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhnCxXTvAUTrgDl7hgG6QqEYM162yxy4To0AXprm7ym4b0YOfpsN_HIhyWSCBev9O7HeTs0XvEvgu68kqDmJZSxsF6RRdH71fgx66LuR8-mgIxUyDEI1Mb9qr8j4r4FEe_ZdBx7820Y4T8/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-11-08+%E4%B8%8B%E5%8D%887.16.05.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhnCxXTvAUTrgDl7hgG6QqEYM162yxy4To0AXprm7ym4b0YOfpsN_HIhyWSCBev9O7HeTs0XvEvgu68kqDmJZSxsF6RRdH71fgx66LuR8-mgIxUyDEI1Mb9qr8j4r4FEe_ZdBx7820Y4T8/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-11-08+%E4%B8%8B%E5%8D%887.16.05.png" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
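reds.createSearch creates a search object, with 'misc' passed in as its identifier. Sentences inserted and words queried through this search object are all scoped to that identifier.<br />
<br />
search.index processes a sentence, splits it into words, and stores the result in sorted sets on the redis server. Its first argument is the sentence to store, the second is the sentence's id, and the optional third is a function to run once the sentence has been inserted. The source of search.index is as follows:<br />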
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh3TrRzQDc1Vxvu1IlJGFU1AJozhxbcqCUjIPA9xJyRYMRkvQHJov0-eW2BhpXzZJVIZu4EYmjghVgh8wUgt8ogziOTfUH4nVVmvWY0uEaDh3YUozeLMd9YjzRdzryQ0yIAHg3KccJ_Mto/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-11-08+%E4%B8%8B%E5%8D%887.35.22.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh3TrRzQDc1Vxvu1IlJGFU1AJozhxbcqCUjIPA9xJyRYMRkvQHJov0-eW2BhpXzZJVIZu4EYmjghVgh8wUgt8ogziOTfUH4nVVmvWY0uEaDh3YUozeLMd9YjzRdzryQ0yIAHg3KccJ_Mto/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-11-08+%E4%B8%8B%E5%8D%887.35.22.png" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<ul>
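<li>key is the identifier of the search object, i.e. 'misc'</li>
<li>db is the redis client object created by the redis.createClient() call</li>
<li>For the sentence str being inserted, words extracts the tokens made up of [a-zA-Z0-9] characters, stripStopWords filters out the stop words in str (in English, words such as a/the/to are stop words), and stem then reduces each remaining word to its stem (for example, cats/catty/catlike all have the stem cat). After this step the original sentence 'Tobi wants 4 dollars' becomes the array ['tobi', 'want', 'dollar']</li>
<li>countWords counts how many times each word occurs in that array</li>
<li>metaphoneMap is also part of the natural-language processing. It calls the metaphone function of the natural module to produce a phonetic code for each word (for 'tobi' it returns 'TB'). metaphoneMap finally returns an object with the following content:</li>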
</ul>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiJqL74-x4TFphDn4WL3fy-xBI3YAWOlLqzYXKhA_4Pk0BWpN8AK-DxudZ5cPdQKaj9pvVlvQkevdIBG-qE1ryrMzcHBKgcKnacodVtyRE66-ULYdR9KOKx5U0kRmVwQl1bvDc_IodRw30/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-11-08+%E4%B8%8B%E5%8D%888.56.54.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiJqL74-x4TFphDn4WL3fy-xBI3YAWOlLqzYXKhA_4Pk0BWpN8AK-DxudZ5cPdQKaj9pvVlvQkevdIBG-qE1ryrMzcHBKgcKnacodVtyRE66-ULYdR9KOKx5U0kRmVwQl1bvDc_IodRw30/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-11-08+%E4%B8%8B%E5%8D%888.56.54.png" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
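Everything above is natural-language processing code in reds. Redis itself comes into play next. The Redis zadd command has the format:<br />
zadd key score member [score member ...]<br />
key is the key, score is the weight, and member is the content; further score/member pairs are optional.<br />
<br />
Looking at the code above again, zadd is called for each word in the sentence to add two records. For 'tobi':<br />
zadd misc:word:TB 1 1<br />
zadd misc:object:1 1 TB<br />
<br />
After the whole sentence is stored, we have:<br />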
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiVke0NctSmHPeQ695sueT15FMwJbOsQ_-i8xU5XsVqLRUis-qUG_AR97XnMVag2X3CXCNzz4AC5BXo-ixWGMph1B_Qo8wxXUsNjLUVJkGI_-0M9pl1ydMhZYLzpBQW3EioZsSA8GX9UE4/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-11-08+%E4%B8%8B%E5%8D%889.38.29.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiVke0NctSmHPeQ695sueT15FMwJbOsQ_-i8xU5XsVqLRUis-qUG_AR97XnMVag2X3CXCNzz4AC5BXo-ixWGMph1B_Qo8wxXUsNjLUVJkGI_-0M9pl1ydMhZYLzpBQW3EioZsSA8GX9UE4/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-11-08+%E4%B8%8B%E5%8D%889.38.29.png" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
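That is how words are inserted into Redis sorted sets. Two records are inserted for each word: one uses the word's phonetic code as the key, the number of occurrences as the score, and the sentence id as the member; the other uses the sentence id as the key, the number of occurrences as the score, and the word's phonetic code as the member.<br />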
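<br />
The two-records-per-word layout can be sketched in Python, with a dictionary of dictionaries standing in for Redis sorted sets (key -> {member: score}). Several simplifications here are assumptions of this sketch and differ from reds: there is no stemming or metaphone step, so the lowercased word itself appears in the key; the stop-word list is a tiny made-up one; and the function name index is only borrowed from the interface name.<br />

```python
import re
from collections import Counter

STOP_WORDS = {"a", "is", "the", "to", "by", "also"}   # tiny illustrative list


def index(db, key, text, doc_id):
    """Add the two sorted-set records per word to db and return db."""
    words = [w.lower() for w in re.findall(r"\w+", text)]
    counts = Counter(w for w in words if w not in STOP_WORDS)
    for word, count in counts.items():
        # like: zadd <key>:word:<word> <count> <id>
        db.setdefault(f"{key}:word:{word}", {})[doc_id] = count
        # like: zadd <key>:object:<id> <count> <word>
        db.setdefault(f"{key}:object:{doc_id}", {})[word] = count
    return db


db = index({}, "misc", "Tobi wants 4 dollars", 1)
for k in sorted(db):
    print(k, db[k])
```

The word-keyed records answer "which sentences contain this word", while the object-keyed record lists every word stored for a sentence, which is what makes removing a sentence later possible.<br />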
<br />
Now for the query process, which is mainly done by the end function:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjIaAt1KO-mLBMzl3IyS-e4fKz3gq8p4eaf9iSK3qVlHFHecl9j1tAQejvfhHRg7OKeng8gOohIwdL-KZ9hgeYFYKiV7nDUP6EFOeDkRxjHuDPxa53MNxpJc53ZCaTCg6aU8m4HvzD44kM/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-11-08+%E4%B8%8B%E5%8D%889.50.42.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjIaAt1KO-mLBMzl3IyS-e4fKz3gq8p4eaf9iSK3qVlHFHecl9j1tAQejvfhHRg7OKeng8gOohIwdL-KZ9hgeYFYKiV7nDUP6EFOeDkRxjHuDPxa53MNxpJc53ZCaTCg6aU8m4HvzD44kM/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-11-08+%E4%B8%8B%E5%8D%889.50.42.png" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
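The query code likewise first computes the stems of the query words; metaphoneKeys then returns keys in the same format used when the words were inserted. For 'tobi', metaphoneKeys returns 'misc:word:TB'.<br />
<br />
The code wrapped in db.multi performs a union or an intersection on Redis sorted sets. The type is set through search.type and defaults to 'and', i.e. intersection, for which the Redis command is zinterstore. zinterstore has the format:<br />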
<br />
<div style="font: 16.0px Times; margin: 0.0px 0.0px 0.0px 0.0px;">
zinterstore destination numkeys key [key …]</div>
<br />
This can be read as creating a set named 'destination': a member that appears under every one of the given keys satisfies the intersection condition and becomes an element of 'destination'.<br />
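<br />
The semantics of the intersection, and of its union counterpart zunionstore, can be modelled in a few lines of Python. The sketch again uses a dictionary of {member: score} dictionaries in place of Redis; scores are summed, which is the default aggregation of both Redis commands. The key names and member ids in the example data are hypothetical.<br />

```python
def zinterstore(db, destination, keys):
    """Keep members present under every key; their scores are summed."""
    sets = [db.get(k, {}) for k in keys]
    common = set(sets[0]).intersection(*sets[1:])
    db[destination] = {m: sum(s[m] for s in sets) for m in common}
    return len(db[destination])


def zunionstore(db, destination, keys):
    """Keep members present under any key; their scores are summed."""
    result = {}
    for k in keys:
        for member, score in db.get(k, {}).items():
            result[member] = result.get(member, 0) + score
    db[destination] = result
    return len(result)


def zrevrange(db, key):
    """Members ordered from highest to lowest score (ties: reverse lexical)."""
    items = db.get(key, {}).items()
    ordered = sorted(items, key=lambda kv: (kv[1], kv[0]), reverse=True)
    return [member for member, _ in ordered]


# Hypothetical data: sentence ids stored under two word keys.
db = {
    "misc:word:TB": {"1": 1, "3": 1, "5": 1},
    "misc:word:TLR": {"1": 1},
}
zinterstore(db, "tmp:and", ["misc:word:TB", "misc:word:TLR"])
zunionstore(db, "tmp:or", ["misc:word:TB", "misc:word:TLR"])
print(zrevrange(db, "tmp:and"))
print(zrevrange(db, "tmp:or"))
```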
<br />
Along the lines of the query example at the beginning of this article, querying for sentences that contain both 'tobi' and 'dollar' gives:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi41g4UpNPXB3fx3nGmBwgV4XAGGqLZSXM6nob60ILe0mDrEkrellexDafP0kCEzSkV1iDsB1mgTENX6ntk77XvilViaNf766mZcPe8sxZQt3H3CylDhEr1hmi_Vjls5gm9PCeP1VGjIac/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-11-08+%E4%B8%8B%E5%8D%8810.14.32.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi41g4UpNPXB3fx3nGmBwgV4XAGGqLZSXM6nob60ILe0mDrEkrellexDafP0kCEzSkV1iDsB1mgTENX6ntk77XvilViaNf766mZcPe8sxZQt3H3CylDhEr1hmi_Vjls5gm9PCeP1VGjIac/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-11-08+%E4%B8%8B%E5%8D%8810.14.32.png" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
zrevrange returns id 1, which corresponds to the sentence inserted with search.index at the beginning of this article.<br />
<br />
<div style="font: 16.0px Times; margin: 0.0px 0.0px 0.0px 0.0px;">
For a union, call search.type('or'); the corresponding Redis command is zunionstore.<br />
<span style="font: 16.0px STSong;"><br /></span>
Have fun!</div>
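bangerleehttp://www.blogger.com/profile/00090060391197685879noreply@blogger.com0tag:blogger.com,1999:blog-6962690516396325668.post-29434120885455234372012-11-03T22:25:00.000-07:002012-11-03T22:25:24.598-07:00Redis source code walkthrough
Like memcached, Redis is an in-memory K-V database, but it also offers features such as data persistence, which makes it more powerful than memcached. Below we walk through the Redis code to get a picture of its overall framework.<br />
<br />
We start from the main function of the Redis server. main calls initServerConfig to give default values to the global variable server, of type redisServer, filling in fields such as the default port (6379) and the number of DBs (16). initServerConfig then calls populateCommandTable to classify the Redis commands held in the global variable redisCommandTable. The classified commands are stored in the global variable commandTable, and lookupCommand is the function used to look a command up.<br />
<br />
main then calls loadServerConfig, which overrides the fields of server with the settings in redis.conf. Next main calls initServer, which does most of the server initialization. initServer makes the following calls:<br />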
<br />
<br />
<ul>
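<li>Calls listCreate to create the lists that hold clients in different states, filling in the clients, clients_to_close, slaves, monitors and similar fields of server</li>
<li>Calls createSharedObjects to create shared objects. Specific strings and integers used by the server internals are wrapped by Redis as shared objects and stored in the global variable shared, so that other data structures can reuse them. For example, the string object wrapping the line terminator '\r\n' is used by both reply messages and error messages, and structures such as strings, sets and lists all use integer objects for IDs, reference counts and so on</li>
<li>Calls aeCreateEventLoop, which calls aeApiCreate, which in turn calls epoll_create to create an epoll instance</li>
<li>Calls zmalloc to allocate memory for the DBs, then allocates a dict for each DB</li>
<li>Calls anetTcpServer, which underneath calls system interfaces such as socket, setsockopt, bind and listen to start listening on the port</li>
<li>Calls aeCreateTimeEvent to add serverCron to the time event queue. serverCron performs periodic tasks for the Redis server, such as disconnecting clients that have timed out</li>
<li>Calls aeCreateFileEvent to add acceptTcpHandler to the I/O event queue. acceptTcpHandler calls anetTcpAccept, which calls anetGenericAccept; anetGenericAccept runs a while(1) loop around the accept system call, waiting for client connections</li>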
</ul>
<br />
<br />
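Back in main, main calls aeMain. aeMain first calls beforesleep; if AOF is configured, beforesleep calls flushAppendOnlyFile to flush in-memory data to disk. aeMain then calls aeProcessEvents to handle time events and I/O events.<br />
<br />
That covers the code path of Redis server startup. Next we look at how the Redis server handles commands.<br />
<br />
During server initialization, initServer registers the acceptTcpHandler I/O event function, which calls accept in a loop while waiting for clients. When a client connects, acceptCommonHandler, called from acceptTcpHandler, runs. It calls createClient, which creates a redisClient object c, selects a DB, records the file descriptor returned by accept in the fd field of c, appends c to the server.clients list, and calls aeCreateFileEvent to add readQueryFromClient to the I/O event queue. readQueryFromClient is the function that receives client commands.<br />
<br />
When a client sends a command to the server, readQueryFromClient is invoked. It calls processInputBuffer to parse the command, processInputBuffer calls processCommand to validate it, and processCommand finally invokes cmd->proc to execute it. For the set command, cmd->proc points to setCommand.<br />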
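The lookup-then-dispatch step can be sketched in Python as a table that maps command names to handler functions. This is only an illustration of the idea, not Redis code: the Command class, the handlers and the reply strings are all hypothetical stand-ins for redisCommand, setCommand and friends.<br />

```python
class Command:
    """One command table entry: a name, a handler and the expected argc."""

    def __init__(self, name, proc, arity):
        self.name = name
        self.proc = proc
        self.arity = arity


def set_command(db, argv):
    db[argv[1]] = argv[2]
    return "+OK"


def get_command(db, argv):
    return db.get(argv[1])


# Hypothetical command table, keyed by lower-case command name.
COMMAND_TABLE = {
    c.name: c
    for c in (Command("set", set_command, 3), Command("get", get_command, 2))
}


def process_command(db, argv):
    """Validate a parsed command, then call its handler (the cmd->proc step)."""
    cmd = COMMAND_TABLE.get(argv[0].lower())
    if cmd is None:
        return "-ERR unknown command"
    if len(argv) != cmd.arity:
        return "-ERR wrong number of arguments"
    return cmd.proc(db, argv)


db = {}
print(process_command(db, ["SET", "k", "v"]))  # +OK
print(process_command(db, ["get", "k"]))       # v
```

Keeping validation in one place and the per-command logic behind a function pointer is what lets new commands be added by extending the table alone.<br />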
<br />
Have fun!bangerleehttp://www.blogger.com/profile/00090060391197685879noreply@blogger.com0tag:blogger.com,1999:blog-6962690516396325668.post-20566134022002139552012-10-31T08:05:00.001-07:002012-10-31T08:06:54.431-07:00libevent source code walkthrough
libevent responds to events (a file descriptor becoming readable or writable, a timeout, a signal) and calls a designated function to handle them. The main concepts in libevent are:<br />
<br />
<ol>
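<li>Event demultiplexer: epoll, kqueue, select and so on</li>
<li>Event source: the events of interest registered on a given file descriptor, such as I/O read/write, timer and signal events</li>
<li>Event handler: called when the event fires</li>
<li>Reactor: the event management interface. After events are registered it enters a loop and calls the event handler whenever an event becomes ready</li>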
</ol>
<br />
<br />
The main libevent interfaces are:<br />
<br />
<ul>
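<li>event_init: initializes the libevent library and creates an event_base instance</li>
<li>event_set: initializes an event, setting its callback function and the events it is interested in</li>
<li>event_base_set: sets the event_base an event belongs to, i.e. states which event_base the event will be registered on</li>
<li>event_add: actually adds the event</li>
<li>event_base_dispatch: puts the program into an infinite loop, waiting for ready events</li>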
</ul>
<br />
<br />
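Below is how each of these functions is implemented (based on libevent-2.0.20-stable).<br />
<br />
<b>event_init</b>: calls event_base_new_with_config, which selects the event demultiplexer. The available demultiplexers, such as epoll, kqueue and select, are stored in the eventops array. Selection starts from index 0 of that array, and the chosen one is stored in the evsel field. evsel->init is then called, which ends up in the initialization routine of the chosen demultiplexer; for epoll, that leads to epoll_create.<br />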
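The "first usable entry wins" selection can be sketched in Python using the standard select module as a stand-in for the platform backends. The EVENTOPS list below is a hypothetical, shortened analogue of the eventops array, and the disabled parameter is an assumption added to make the fallback visible; neither is libevent API.<br />

```python
import select

# Candidate backends in preference order, loosely analogous to eventops.
# Each name is also the attribute to probe on Python's select module.
EVENTOPS = ["kqueue", "epoll", "poll", "select"]


def choose_backend(disabled=()):
    """Return the first backend that is available and not disabled."""
    for name in EVENTOPS:
        if name in disabled:
            continue
        if hasattr(select, name):  # is this backend present on the platform?
            return name
    raise RuntimeError("no event backend available")


print(choose_backend())
print(choose_backend(disabled=("kqueue", "epoll", "poll")))  # select
```

The first call prints whichever backend the current platform offers first; the second call always falls back to select because every other candidate is disabled.<br />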
<br />
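<b>event_set</b>: calls event_assign, which fills in the file descriptor, event type, event handler and other fields of the event structure.<br />
<br />
<b>event_base_set</b>: simply sets the ev_base field of the event structure to the given value.<br />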
<br />
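<b>event_add</b>: calls event_add_internal, which calls a different function depending on the event type: evmap_io_add for I/O and evmap_signal_add for signals. It then calls event_queue_insert to add the event to the queue of registered events (EVLIST_INSERTED).<br />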
<br />
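Both evmap_io_add and evmap_signal_add call evsel->add, whose job is to invoke the interface function of the concrete event demultiplexer and complete the registration. For epoll, the add function is epoll_nochangelist_add, which calls epoll_apply_one_change, which in turn calls epoll_ctl to register the event.<br />
<br />
<b>event_base_dispatch</b>: calls event_base_loop, which calls evsel->dispatch, the dispatch function registered by the event demultiplexer. For epoll this is epoll_dispatch, which calls epoll_wait. The file descriptor to monitor was set when event_set was called, and epoll_wait waits for I/O events on that descriptor.<br />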
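The register-then-dispatch flow described above can be tried out with Python's standard selectors module, which wraps epoll, kqueue or select depending on the platform. This is a minimal sketch of the reactor pattern, not a port of libevent: the Reactor class and its method names are invented for illustration, and a socketpair stands in for a real client connection.<br />

```python
import selectors
import socket


class Reactor:
    """Tiny reactor: register a callback per file object, then dispatch."""

    def __init__(self):
        self.sel = selectors.DefaultSelector()

    def add(self, fileobj, callback):
        # Comparable in spirit to event_set + event_add for a read event.
        self.sel.register(fileobj, selectors.EVENT_READ, callback)

    def dispatch_once(self, timeout=1.0):
        # One pass of the loop: wait for readiness, then run the handlers.
        fired = 0
        for key, _mask in self.sel.select(timeout):
            key.data(key.fileobj)
            fired += 1
        return fired

    def close(self):
        self.sel.close()


received = []


def on_readable(sock):
    received.append(sock.recv(1024))


reader, writer = socket.socketpair()
reactor = Reactor()
reactor.add(reader, on_readable)
writer.sendall(b"ping")
fired = reactor.dispatch_once()
print(received)  # [b'ping']
reactor.close()
reader.close()
writer.close()
```

A real dispatch function would call dispatch_once in an endless loop; a single pass is enough here to show the handler being invoked once the descriptor is readable.<br />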
<br />
Have fun!bangerleehttp://www.blogger.com/profile/00090060391197685879noreply@blogger.com0tag:blogger.com,1999:blog-6962690516396325668.post-43365699964057223472012-10-30T05:35:00.001-07:002012-10-30T05:38:21.825-07:00memcached client consistent hashing implementation: libketama
With memcached, it is the client that decides which memcached server a K-V pair is stored on. Below we analyze libketama, a library that implements a consistent hashing algorithm for memcached clients.<br />
<div>
<br /></div>
<div>
<b>Usage</b></div>
<div>
libketama provides a memcached server configuration file. We first fill in the ip and memory of each server in that file:</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3UC89141sM__Eko9njKUJYOCaAc6lIdoF6w5vONy8Gn9m-s9_woXpMZVjVR5pFlK2DhRoP2wTwu8hzVkKQoeu4yXzAZFOkS2ZEX9ZPXgpZjj_UgpXwlwDG8644x8WE44HQ2pd2lUhvc4/s1600/%25E5%25B1%258F%25E5%25B9%2595%25E5%25BF%25AB%25E7%2585%25A7+2012-10-30+%25E4%25B8%258B%25E5%258D%25887.16.50.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3UC89141sM__Eko9njKUJYOCaAc6lIdoF6w5vONy8Gn9m-s9_woXpMZVjVR5pFlK2DhRoP2wTwu8hzVkKQoeu4yXzAZFOkS2ZEX9ZPXgpZjj_UgpXwlwDG8644x8WE44HQ2pd2lUhvc4/s1600/%25E5%25B1%258F%25E5%25B9%2595%25E5%25BF%25AB%25E7%2585%25A7+2012-10-30+%25E4%25B8%258B%25E5%258D%25887.16.50.png" /></a></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
Then we write code against the libketama interfaces: given the key of a K-V pair, libketama returns the ip of the memcached server on which that pair will be stored.</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiPhlwjW_kYciP5UwoxeBP_I54T7qEjYvkfzyDBC8tlXr_6BtVUIMl7TqPkIlsKTMStt9Ei4vKZA9Hum4YnDltc2wjAkU0w7RpHIR27njiP5pVj4_ClsUGqVWRLRpxtz5PJMR264bvHRXo/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-10-30+%E4%B8%8B%E5%8D%887.25.59.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiPhlwjW_kYciP5UwoxeBP_I54T7qEjYvkfzyDBC8tlXr_6BtVUIMl7TqPkIlsKTMStt9Ei4vKZA9Hum4YnDltc2wjAkU0w7RpHIR27njiP5pVj4_ClsUGqVWRLRpxtz5PJMR264bvHRXo/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-10-30+%E4%B8%8B%E5%8D%887.25.59.png" /></a></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br />
<br />
The code above briefly shows how libketama is used, through the ketama_roll, ketama_hashi, ketama_get_server and ketama_smoke interfaces it provides.</div>
<div>
<br /></div>
<div>
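<b>Building the consistent hashing model</b></div>
<div>
Building a consistent hashing model requires modeling two objects: a circle, and the virtual nodes on that circle. libketama models the circle and the virtual nodes with the continuum and mcs structures respectively:</div>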
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhAva_rJhNRv5HF9QfpeEGU2tQWXXhM-RqLTqa-2TzksAGv-K3WY21tZoEfKzi7wLpsD5g-IdFrqaql4vlS8KCCgZWpiPtfuu1BHzyi5BHw4nF-T8DOKGb3wP1BcB3Du9ISxyGOtSIcS8Y/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-10-30+%E4%B8%8B%E5%8D%887.35.35.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhAva_rJhNRv5HF9QfpeEGU2tQWXXhM-RqLTqa-2TzksAGv-K3WY21tZoEfKzi7wLpsD5g-IdFrqaql4vlS8KCCgZWpiPtfuu1BHzyi5BHw4nF-T8DOKGb3wP1BcB3Du9ISxyGOtSIcS8Y/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-10-30+%E4%B8%8B%E5%8D%887.35.35.png" /></a></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br />
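Here numpoints records the number of virtual nodes on the circle, modtime records the modification time of the memcached server configuration file, and array is the array of mcs virtual nodes on the circle.</div>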
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg8nUqg0wEopVg9czQDbSDWCcQ3i0k8zJ0EommUWxGZASY0507E6cSsk26Nqm0ZL8gJz6WxoilTczPKzPlbwuXsopSkFkLWMZNcQ1-4UOij6PvzgWQ_QFbIPfG78y_J4iaDCPjnVDywrR8/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-10-30+%E4%B8%8B%E5%8D%887.36.49.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg8nUqg0wEopVg9czQDbSDWCcQ3i0k8zJ0EommUWxGZASY0507E6cSsk26Nqm0ZL8gJz6WxoilTczPKzPlbwuXsopSkFkLWMZNcQ1-4UOij6PvzgWQ_QFbIPfG78y_J4iaDCPjnVDywrR8/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-10-30+%E4%B8%8B%E5%8D%887.36.49.png" /></a></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br />
Here point records the position of the virtual node on the circle.</div>
<div>
<br /></div>
<div>
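<b>Model construction</b></div>
<div>
The ketama_roll interface builds the consistent hashing model. It calls ketama_create_continuum, which first calls read_server_definition to read the memcached server configuration file and stores the information in the following structure:</div>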
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj2-fZq47obZUF54RTBhhmViKyaoUO9ZDlA2RS8IFjFTZf4xNQmKdNVoNiYk_Mlf-tTv2PPfFJWKoegHQSoGsnRNG5rh6YvcIMF1HpU_7BG_cX48fV-xe1Hg0lXMx8Xh5GGH5HJsI77EQc/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-10-30+%E4%B8%8B%E5%8D%887.53.01.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj2-fZq47obZUF54RTBhhmViKyaoUO9ZDlA2RS8IFjFTZf4xNQmKdNVoNiYk_Mlf-tTv2PPfFJWKoegHQSoGsnRNG5rh6YvcIMF1HpU_7BG_cX48fV-xe1Hg0lXMx8Xh5GGH5HJsI77EQc/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-10-30+%E4%B8%8B%E5%8D%887.53.01.png" /></a></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br />
The virtual nodes are built next. Based on the number of configured servers, numservers*160 virtual nodes are created in total:</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjBX4TMPW_r52SASqVOMzP6gAj_Z9UUbVJL_UKl5AqqJ05PtTecDr7YGzT54KPIymbVG7O_r2upswIO18lT0oInkXjHWTv99-409G03ZpKK7PLDzlOg2FxMAH0uQ9oWcIYyLlrhk0CInK4/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-10-30+%E4%B8%8B%E5%8D%887.56.31.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjBX4TMPW_r52SASqVOMzP6gAj_Z9UUbVJL_UKl5AqqJ05PtTecDr7YGzT54KPIymbVG7O_r2upswIO18lT0oInkXjHWTv99-409G03ZpKK7PLDzlOg2FxMAH0uQ9oWcIYyLlrhk0CInK4/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-10-30+%E4%B8%8B%E5%8D%887.56.31.png" /></a></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br />
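In the code above, a weight pct is first computed for each physical server, and the number of virtual nodes created for that server, ks*4, is derived from the weight.</div>
<div>
<br />
Next, ketama_md5_digest computes the md5 value of "ip-i" (for example "10.0.1.1:11211-0"), and the exact position of the virtual node, i.e. the point value in the mcs structure, is derived from that md5 value.</div>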
<div>
<br /></div>
<div>
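Once the point values of all virtual nodes are determined, the points are sorted, and the total number of virtual nodes and the mcs array are placed in shared memory. This completes the construction of the consistent hashing model.</div>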
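The construction steps can be sketched in Python as follows. This is an approximation under stated assumptions, not a port of libketama: it assumes that ks is floor(pct * 40 * numservers), which matches the numservers*160 total mentioned above when all servers have equal memory, and that each md5 digest yields four points by reading its 16 bytes as four little-endian 32-bit integers. Shared memory is left out entirely.<br />

```python
import hashlib
import math


def build_continuum(servers):
    """Return a sorted list of (point, address) virtual nodes.

    `servers` is a list of (address, memory) pairs, as in the config file.
    """
    total_memory = sum(mem for _addr, mem in servers)
    continuum = []
    for addr, mem in servers:
        pct = mem / total_memory                  # weight of this server
        ks = math.floor(pct * 40 * len(servers))  # digests for this server (assumed formula)
        for k in range(ks):
            digest = hashlib.md5(f"{addr}-{k}".encode()).digest()
            for h in range(4):                    # four points per digest
                point = int.from_bytes(digest[h * 4:h * 4 + 4], "little")
                continuum.append((point, addr))
    continuum.sort()                              # order the nodes around the circle
    return continuum


# Hypothetical server list: two servers with equal memory.
ring = build_continuum([("10.0.1.1:11211", 600), ("10.0.1.2:11211", 600)])
print(len(ring))  # 320
```

With two equally weighted servers each one contributes 40 digests and therefore 160 points, giving 320 virtual nodes in total.<br />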
<div>
<br /></div>
<div>
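<b>Getting the ip from a key</b></div>
<div>
Which memcached should a K-V pair go to? First we call the ketama_hashi interface to compute kh, the md5 hash of the key. It is computed in the same way as the point value of a virtual node.</div>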
<div>
<br /></div>
<div>
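Then we call ketama_get_server. The virtual nodes were already sorted by point value when they were built, so ketama_get_server uses binary search: if it finds the first point value greater than kh, it returns the corresponding mcs structure; if no such point exists, it returns the first mcs structure in the virtual node array.</div>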
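The lookup can be sketched in Python with the standard bisect module. The sketch follows the description above (first point greater than the key hash, wrapping to the first node otherwise); boundary handling for a hash exactly equal to a point may differ in the real library. The function names and the hand-picked continuum are hypothetical.<br />

```python
import bisect
import hashlib


def ketama_hash(key):
    """First four md5 bytes of the key, read as a little-endian integer."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "little")


def get_server(continuum, kh):
    """Find the owner of hash kh in a sorted list of (point, address)."""
    points = [p for p, _addr in continuum]
    i = bisect.bisect_right(points, kh)  # index of the first point greater than kh
    if i == len(points):                 # past the last point: wrap around
        i = 0
    return continuum[i][1]


# Hypothetical, hand-picked continuum with three virtual nodes.
continuum = [
    (100, "10.0.1.1:11211"),
    (2000, "10.0.1.2:11211"),
    (30000, "10.0.1.1:11211"),
]
print(get_server(continuum, 150))    # 10.0.1.2:11211
print(get_server(continuum, 99999))  # 10.0.1.1:11211
```

In real use the second argument would be ketama_hash(key). Because only the neighbouring virtual node decides the owner, adding or removing a server moves just the keys that fall in the affected arcs of the circle.<br />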
<div>
<br /></div>
<div>
Have fun!</div>
<div>
<br /></div>
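bangerleehttp://www.blogger.com/profile/00090060391197685879noreply@blogger.com0tag:blogger.com,1999:blog-6962690516396325668.post-11320896971506858242012-10-29T09:34:00.000-07:002012-10-29T09:51:02.374-07:00Memory management in memcached
memcached is a distributed in-memory K-V cache service whose core is memory management. Below we look at how memcached manages memory and read through that part of the code. This article is based on memcached 1.4.15.<br />
<br />
memcached manages memory with a mechanism similar to the slab allocator in the Linux kernel:<br />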
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgUyTTkhTNDwEMcgzTinfQbjL08LDhyphenhyphenMIZ6C7JOD_DAeJxIitbq-k_W7-mXhJg2TvwfugD3DRNeb84nS-XY9_AiB6APSAw_o0gYu0mXrJE7-uGyK6B9a3mIIJhkkR7SFtRBudSuTqT4UTM/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-10-29+%E4%B8%8A%E5%8D%8812.32.22.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="195" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgUyTTkhTNDwEMcgzTinfQbjL08LDhyphenhyphenMIZ6C7JOD_DAeJxIitbq-k_W7-mXhJg2TvwfugD3DRNeb84nS-XY9_AiB6APSAw_o0gYu0mXrJE7-uGyK6B9a3mIIJhkkR7SFtRBudSuTqT4UTM/s400/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-10-29+%E4%B8%8A%E5%8D%8812.32.22.png" width="400" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
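A contiguous 1M page is divided into equally sized chunks, and a slab class manages chunks of one particular size. A K-V pair forms an item, and an item is placed in one chunk.<br />
<br />
Besides the structures above, memcached uses a linked list named slots to manage free chunks:<br />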
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjcmg1kr0b8uMlb1n0W5N5hxvZrFUA2f5buJImAJf1lKqgfXluJ0r-GRDukSydaVB8uGkh_ciEYUqritExS-uh6Tlntv_zvNSIM1r8PTTqdNSdx6FRuygXtn79lV6aqj_oAsZS1m21AKh4/s1600/%25E5%25B1%258F%25E5%25B9%2595%25E5%25BF%25AB%25E7%2585%25A7+2012-10-29+%25E4%25B8%258A%25E5%258D%258812.33.40.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="42" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjcmg1kr0b8uMlb1n0W5N5hxvZrFUA2f5buJImAJf1lKqgfXluJ0r-GRDukSydaVB8uGkh_ciEYUqritExS-uh6Tlntv_zvNSIM1r8PTTqdNSdx6FRuygXtn79lV6aqj_oAsZS1m21AKh4/s400/%25E5%25B1%258F%25E5%25B9%2595%25E5%25BF%25AB%25E7%2585%25A7+2012-10-29+%25E4%25B8%258A%25E5%258D%258812.33.40.png" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: -webkit-auto;">
<br /></div>
<div class="separator" style="clear: both; text-align: -webkit-auto;">
Memory management in memcached is mostly a matter of maintaining slabclass and the slots list. The slabclass_t structure is as follows:</div>
<div class="separator" style="clear: both; text-align: -webkit-auto;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEigp0KnOa7TWQf_KMhVGE6F1dkki390MvDzibEPfqWzRL9tN1y7zw1MZZ0HI_iDdWWSZvIOk9wqW5tetB4nNCWRvjlhUdxyYNlKvPMlag82A7RXfumIfldo3uzk3m9MJxHS2bnP5lFA_9s/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-10-29+%E4%B8%8A%E5%8D%881.12.28.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEigp0KnOa7TWQf_KMhVGE6F1dkki390MvDzibEPfqWzRL9tN1y7zw1MZZ0HI_iDdWWSZvIOk9wqW5tetB4nNCWRvjlhUdxyYNlKvPMlag82A7RXfumIfldo3uzk3m9MJxHS2bnP5lFA_9s/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-10-29+%E4%B8%8A%E5%8D%881.12.28.png" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: -webkit-auto;">
Below we analyze the relevant memcached code for the initialization of the memory management structures and for the add and delete operations.</div>
<div class="separator" style="clear: both; text-align: -webkit-auto;">
<br /></div>
<div class="separator" style="clear: both; text-align: -webkit-auto;">
<b>Initializing the memory management structures</b></div>
<div class="separator" style="clear: both; text-align: -webkit-auto;">
<b><br /></b></div>
<div class="separator" style="clear: both; text-align: -webkit-auto;">
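In main, slabs_init is called to perform the initialization related to memory management. slabs_init initializes the slabclass array, whose length is MAX_NUMBER_OF_SLAB_CLASSES, and fills in the size and perslab values of each slab class. The first size is sizeof(item) + settings.chunk_size, so the minimum is 96. Each subsequent class is the previous size multiplied by factor (1.25 by default), and any size that is not a multiple of 8 is rounded up to one.</div>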
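The size progression can be reproduced with a short Python sketch. It is an approximation of the loop in slabs_init, not a copy of it: the 48-byte item header is an assumption derived from the minimum of 96 mentioned above (with a default chunk_size of 48), and the parameter names and the max_classes limit are illustrative.<br />

```python
CHUNK_ALIGN_BYTES = 8


def slab_class_sizes(item_header=48, chunk_size=48, factor=1.25,
                     item_size_max=1024 * 1024, max_classes=200):
    """Return a list of (size, perslab) pairs, one per slab class."""
    classes = []
    size = item_header + chunk_size
    while len(classes) < max_classes - 1 and size <= item_size_max / factor:
        # Round size up to the next multiple of CHUNK_ALIGN_BYTES.
        if size % CHUNK_ALIGN_BYTES:
            size += CHUNK_ALIGN_BYTES - size % CHUNK_ALIGN_BYTES
        classes.append((size, item_size_max // size))
        size = int(size * factor)
    # The last class holds a single item as large as a whole page.
    classes.append((item_size_max, 1))
    return classes


sizes = [size for size, _perslab in slab_class_sizes()]
print(sizes[:6])  # [96, 120, 152, 192, 240, 304]
```

A larger factor gives fewer classes with more wasted space per item, while a smaller factor gives more classes that fit items more tightly.<br />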
<div class="separator" style="clear: both; text-align: -webkit-auto;">
<br /></div>
<div class="separator" style="clear: both; text-align: -webkit-auto;">
Initialization is simple, and once it completes memcached has still not actually obtained any physical memory from the operating system.</div>
<div class="separator" style="clear: both; text-align: -webkit-auto;">
<br /></div>
<div class="separator" style="clear: both; text-align: -webkit-auto;">
<b>The add operation</b></div>
<div class="separator" style="clear: both; text-align: -webkit-auto;">
<b><br /></b></div>
<div class="separator" style="clear: both; text-align: -webkit-auto;">
Once the memcached service is started, it listens on its port and waits for requests. After a memcached client performs an add, the call chain is:</div>
<div class="separator" style="clear: both; text-align: -webkit-auto;">
<br /></div>
<div class="separator" style="clear: both; text-align: -webkit-auto;">
event_handler -> drive_machine -> try_read_command -> process_command -> process_update_command -> item_alloc -> do_item_alloc -> slabs_alloc -> do_slabs_alloc</div>
<div class="separator" style="clear: both; text-align: -webkit-auto;">
<br /></div>
<div class="separator" style="clear: both; text-align: -webkit-auto;">
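In do_item_alloc, slabs_clsid is called. slabs_clsid computes the index of the slabclass the item should go into from the total size of the item being added (header, key and value), not from the key length alone. It does so by walking the slabclass array and comparing that size with slabclass->size.</div>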
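The linear scan can be sketched in Python as below. The class sizes are a hypothetical excerpt, and the convention of returning 0 for "no class fits" is an assumption of this sketch that mirrors how the class index is treated as 1-based.<br />

```python
def slabs_clsid(class_sizes, item_size):
    """Return the 1-based index of the smallest class that fits item_size.

    Returns 0 when the size is zero or larger than the biggest class.
    """
    if item_size == 0:
        return 0
    for clsid, size in enumerate(class_sizes, start=1):
        if item_size <= size:
            return clsid
    return 0


sizes = [96, 120, 152, 192]     # hypothetical first few class sizes
print(slabs_clsid(sizes, 100))  # 2
print(slabs_clsid(sizes, 500))  # 0
```

A 100-byte item does not fit the 96-byte class, so it lands in the 120-byte class and wastes 20 bytes of that chunk.<br />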
<div class="separator" style="clear: both; text-align: -webkit-auto;">
<br /></div>
<div class="separator" style="clear: both; text-align: -webkit-auto;">
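In do_slabs_alloc, the first check is whether a free chunk exists, i.e. whether sl_curr is zero. If sl_curr is non-zero, a chunk is taken from slots; otherwise do_slabs_newslab is called to obtain a new page.</div>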
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjeLkK-klQ47PQJ3sp0ff8CH6dWc3Hv0VRkSgWy2IfqWQGJnyrDi_0aWeKEUmWKJArsd60kNrWPF4EDfBXafF0ccepsWXoX6UYzeArtCtuM7sc1RtESwTxDd1aq5LyEmdCJTeW-xwAg9uk/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-10-29+%E4%B8%8B%E5%8D%888.56.54.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjeLkK-klQ47PQJ3sp0ff8CH6dWc3Hv0VRkSgWy2IfqWQGJnyrDi_0aWeKEUmWKJArsd60kNrWPF4EDfBXafF0ccepsWXoX6UYzeArtCtuM7sc1RtESwTxDd1aq5LyEmdCJTeW-xwAg9uk/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-10-29+%E4%B8%8B%E5%8D%888.56.54.png" /></a></div>
<div class="separator" style="clear: both; text-align: -webkit-auto;">
do_slabs_newslab -> split_slab_page_into_freelist -> do_slabs_free</div>
<div class="separator" style="clear: both; text-align: -webkit-auto;">
<br /></div>
<div class="separator" style="clear: both; text-align: -webkit-auto;">
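do_slabs_newslab calls memory_allocate to request 1M of memory from the system, then calls split_slab_page_into_freelist to divide that 1M into equally sized chunks. split_slab_page_into_freelist calls do_slabs_free on each chunk, adding the newly created free chunks to the slots list. Once that is done, do_slabs_newslab adds the newly obtained page to the slab_list array.</div>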
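The interplay between the free list and page allocation can be modelled in a few lines of Python. This is a bookkeeping sketch only: chunks are represented as (page index, offset) tuples rather than real memory, a Python list plays the role of the slots linked list, and the class and method names are invented for illustration.<br />

```python
class SlabClass:
    """Free-list bookkeeping for one slab class (sizes in bytes)."""

    PAGE_SIZE = 1024 * 1024

    def __init__(self, size):
        self.size = size     # chunk size of this class
        self.slots = []      # free chunks; the end of the list is the head
        self.slab_list = []  # pages obtained so far

    def _new_slab(self):
        # Carve a fresh page into equal chunks and put them all on slots.
        page = len(self.slab_list)
        self.slab_list.append(page)
        for n in range(self.PAGE_SIZE // self.size):
            self.free((page, n * self.size))

    def alloc(self):
        if not self.slots:   # no free chunk left, like sl_curr == 0
            self._new_slab()
        return self.slots.pop()

    def free(self, chunk):
        # The chunk goes back to the free list, not to the system.
        self.slots.append(chunk)


cls = SlabClass(96)
a = cls.alloc()
print(len(cls.slab_list), len(cls.slots))  # 1 10921
cls.free(a)
b = cls.alloc()
print(a == b)  # True
```

The first alloc triggers a page request and leaves 10921 of the 10922 chunks free. A freed chunk goes to the head of the list, so it is the first one handed out again.<br />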
<div class="separator" style="clear: both; text-align: -webkit-auto;">
<br /></div>
<div class="separator" style="clear: both; text-align: -webkit-auto;">
With a free chunk obtained, do_item_alloc fills the key, the key length, the value and other fields into the chunk:</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgIVX55assL0pOngpdBmO5Nqc1rwbgeRd5_9kuMGBYPl1JiB1aWn46b2436mzGMGCmwINaa_aR0bO6Hgb7Ek-BzPVCfBcdych8W9r4eipLKpxYLDUs-Puvq74XTqbUGW3shGlFd_Qfy5_Y/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-10-29+%E4%B8%8B%E5%8D%889.55.12.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgIVX55assL0pOngpdBmO5Nqc1rwbgeRd5_9kuMGBYPl1JiB1aWn46b2436mzGMGCmwINaa_aR0bO6Hgb7Ek-BzPVCfBcdych8W9r4eipLKpxYLDUs-Puvq74XTqbUGW3shGlFd_Qfy5_Y/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-10-29+%E4%B8%8B%E5%8D%889.55.12.png" /></a></div>
<div class="separator" style="clear: both; text-align: -webkit-auto;">
<b>The delete operation</b></div>
<div class="separator" style="clear: both; text-align: -webkit-auto;">
<b><br /></b></div>
<div class="separator" style="clear: both; text-align: -webkit-auto;">
For a delete operation, the call chain is:</div>
<div class="separator" style="clear: both; text-align: -webkit-auto;">
event_handler -> drive_machine -> try_read_command -> process_command -> process_delete_command -> item_get -> do_item_get -> item_remove -> do_item_remove -> item_free -> slabs_free -> do_slabs_free</div>
<div class="separator" style="clear: both; text-align: -webkit-auto;">
<br /></div>
<div class="separator" style="clear: both; text-align: -webkit-auto;">
In do_slabs_free, the memory is not actually returned to the system. Instead it is placed at the head of the slots list of the corresponding slab class:</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgSXKVCZH0xOgcIqmc3RZHM4Nx38cfqIDkkR-hE8NviHhvEvfJ_AJh6h65Ekk74IANr3kFV6GI1wTfvJpseOTYo3srETgRbYrLSXweOM0QUoxsCp4ye0wPW_3YS1KVdXbKVJIXSvultYCg/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-10-29+%E4%B8%8B%E5%8D%8810.09.52.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgSXKVCZH0xOgcIqmc3RZHM4Nx38cfqIDkkR-hE8NviHhvEvfJ_AJh6h65Ekk74IANr3kFV6GI1wTfvJpseOTYo3srETgRbYrLSXweOM0QUoxsCp4ye0wPW_3YS1KVdXbKVJIXSvultYCg/s1600/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7+2012-10-29+%E4%B8%8B%E5%8D%8810.09.52.png" /></a></div>
<div class="separator" style="clear: both; text-align: -webkit-auto;">
<br /></div>
<div class="separator" style="clear: both; text-align: -webkit-auto;">
Have fun!</div>
bangerleehttp://www.blogger.com/profile/00090060391197685879noreply@blogger.com0tag:blogger.com,1999:blog-6962690516396325668.post-12426199562364289712012-10-28T06:54:00.000-07:002012-10-28T07:04:26.364-07:00Browsing code quickly with cscope in vim
With vim and cscope together, following and reading code becomes very convenient.<br />
<div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">
<br /></div>
<div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">
After installing cscope, run the following commands:</div>
<div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">
<br /></div>
<div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">
# cd /usr/src/linux</div>
<div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">
# find . -name '*.h' -o -name '*.c' > cscope.files</div>
<div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">
# cscope -b -k -q</div>
<div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">
# ctags -R</div>
<div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">
# echo 'cs add .' >> /etc/vimrc</div>
<br />
After that, vim supports code navigation whenever it is started from <span style="font-family: 'Times New Roman'; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">/usr/src/linux</span>.<br />
<br />
Running <span style="font-family: 'Times New Roman'; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"><span style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">:cs help</span></span> displays the cscope help text:<br />
<br />
<div style="background-color: #f2f2f2; color: #3a3a3a; font: 13.0px Courier; margin: 0.0px 0.0px 0.0px 0.0px;">
cscope commands:</div>
<div style="background-color: #f2f2f2; color: #3a3a3a; font: 13.0px Courier; margin: 0.0px 0.0px 0.0px 0.0px;">
add : Add a new database (Usage: add file|dir [pre-path] [flags])</div>
<div style="background-color: #f2f2f2; color: #3a3a3a; font: 13.0px Courier; margin: 0.0px 0.0px 0.0px 0.0px;">
find : Query for a pattern (Usage: find c|d|e|f|g|i|s|t name)</div>
<div style="background-color: #f2f2f2; color: #3a3a3a; font: 13.0px Courier; margin: 0.0px 0.0px 0.0px 0.0px;">
c: Find functions calling this function</div>
<div style="background-color: #f2f2f2; color: #3a3a3a; font: 13.0px Courier; margin: 0.0px 0.0px 0.0px 0.0px;">
d: Find functions called by this function</div>
<div style="background-color: #f2f2f2; color: #3a3a3a; font: 13.0px Courier; margin: 0.0px 0.0px 0.0px 0.0px;">
e: Find this egrep pattern</div>
<div style="background-color: #f2f2f2; color: #3a3a3a; font: 13.0px Courier; margin: 0.0px 0.0px 0.0px 0.0px;">
f: Find this file</div>
<div style="background-color: #f2f2f2; color: #3a3a3a; font: 13.0px Courier; margin: 0.0px 0.0px 0.0px 0.0px;">
g: Find this definition</div>
<div style="background-color: #f2f2f2; color: #3a3a3a; font: 13.0px Courier; margin: 0.0px 0.0px 0.0px 0.0px;">
i: Find files #including this file</div>
<div style="background-color: #f2f2f2; color: #3a3a3a; font: 13.0px Courier; margin: 0.0px 0.0px 0.0px 0.0px;">
s: Find this C symbol</div>
<div style="background-color: #f2f2f2; color: #3a3a3a; font: 13.0px Courier; margin: 0.0px 0.0px 0.0px 0.0px;">
t: Find assignments to</div>
<div style="background-color: #f2f2f2; color: #3a3a3a; font: 13.0px Courier; margin: 0.0px 0.0px 0.0px 0.0px;">
help : Show this message (Usage: help)</div>
<div style="background-color: #f2f2f2; color: #3a3a3a; font: 13.0px Courier; margin: 0.0px 0.0px 0.0px 0.0px;">
kill : Kill a connection (Usage: kill #)</div>
<div style="background-color: #f2f2f2; color: #3a3a3a; font: 13.0px Courier; margin: 0.0px 0.0px 0.0px 0.0px;">
reset: Reinit all connections (Usage: reset)</div>
<div style="background-color: #f2f2f2; color: #3a3a3a; font: 13.0px Courier; margin: 0.0px 0.0px 0.0px 0.0px;">
show : Show connections (Usage: show)</div>
<br />
<br />
Common commands:<br />
<span style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"><span style="font-family: 'Times New Roman'; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">:cs find g hash</span></span> # find the definition of the function or variable named hash<br />
<span style="font-family: 'Times New Roman'; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"><span style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">:cs find e hash<b> </b></span></span> # find lines of code that contain the string hash<br />
<br />
Common shortcuts (pressed in normal mode):<br />
<span style="font-family: 'Times New Roman'; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"><span style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">Ctrl + ] </span><strong style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"> </strong></span> # jump to the definition of the symbol under the cursor<br />
<span style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"><span style="font-family: 'Times New Roman'; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">Ctrl + T<b> </b></span></span> # go back to the previous position<br />
<br />
<span style="font-family: 'Times New Roman'; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">Have fun!</span><br />
<br />bangerleehttp://www.blogger.com/profile/00090060391197685879noreply@blogger.com0