{"componentChunkName":"component---src-pages-author-author-yaml-id-js","path":"/author/mark-duan/","result":{"data":{"allMarkdownRemark":{"edges":[{"node":{"id":"b90e979e-b048-5ef9-b24c-54baec7809f9","html":"<p>Indexes are a standard way to speed up queries in a database system, and MongoDB, a document-based database, is no different. This article gives an overview of the index types in MongoDB for query optimization.</p>\n<h3 id=\"index-in-mongo\" style=\"position:relative;\"><a href=\"#index-in-mongo\" aria-label=\"index in mongo permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Index in Mongo:</h3>\n<h4 id=\"default\" style=\"position:relative;\"><a href=\"#default\" aria-label=\"default permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Default</h4>\n<p>Every collection has a default index on the _id field. By default, _id is an ObjectId, a 12-byte BSON type that guarantees uniqueness within the collection. 
The ObjectId is generated from a timestamp, a machine ID, a process ID, and a process-local incremental counter.</p>\n<h4 id=\"single-field\" style=\"position:relative;\"><a href=\"#single-field\" aria-label=\"single field permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Single Field</h4>\n<p>For a single-field index and sort operations, the sort order (i.e. ascending or descending) of the index key does not matter because MongoDB can traverse the index in either direction. The value given for the field in the index specification is the index type: 1 specifies ascending order and -1 specifies descending order.</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"\" data-index=\"0\"><code class=\"grvsc-code\"><span class=\"grvsc-line\">db.friends.createIndex( { &quot;name&quot; : 1 } )</span></code></pre>\n<h4 id=\"\" style=\"position:relative;\"><a href=\"#\" aria-label=\" permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Compound Field</h4>\n<p>The order of the fields listed in a compound index is significant. 
For instance, if a compound index consists of { userid: 1, score: -1 }, the index sorts first by userid and then, within each userid value, sorts by score.</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"\" data-index=\"1\"><code class=\"grvsc-code\"><span class=\"grvsc-line\">db.products.createIndex( { &quot;item&quot;: 1, &quot;stock&quot;: 1 } )</span></code></pre>\n<h4 id=\"-1\" style=\"position:relative;\"><a href=\"#-1\" aria-label=\" 1 permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Multikey Index</h4>\n<p>MongoDB uses multikey indexes to index the content of arrays. It creates a separate index entry for every element of the array. You do not need to declare a multikey index explicitly; MongoDB creates one automatically when an indexed field holds an array.</p>\n<h4 id=\"text-index\" style=\"position:relative;\"><a href=\"#text-index\" aria-label=\"text index permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Text Index</h4>\n<p>A collection can have at most one text index.<br>\nPerformance costs of a text index:<br>\nText indexes can be large. 
They contain one index entry for each unique post-stemmed word in each indexed field for each document inserted.<br>\nText indexes also reduce insertion throughput because MongoDB must add an index entry for each unique post-stemmed word in each indexed field of each new source document.</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"\" data-index=\"2\"><code class=\"grvsc-code\"><span class=\"grvsc-line\">db.reviews.createIndex( { comments: &quot;text&quot; } )</span></code></pre>\n<h4 id=\"hash-index\" style=\"position:relative;\"><a href=\"#hash-index\" aria-label=\"hash index permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Hash index</h4>\n<p>A hashed index lets you query content by the hash of its value. The hash is computed from the field value by a hash function designed so that distinct values rarely share a hash. The main advantage is speed: in theory, a hash lookup takes O(1) time on average, whereas a lookup in a normal tree-based index such as a binary search tree takes O(log N). 
The disadvantage, however, is that a hashed index cannot serve range queries efficiently, so range searches are much slower than with a normal index.</p>\n<p>Here is an example that creates a hashed index in the mongo shell:</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"\" data-index=\"3\"><code class=\"grvsc-code\"><span class=\"grvsc-line\">db.active.createIndex( { a: &quot;hashed&quot; } )</span></code></pre>\n<style class=\"grvsc-styles\">\n  .grvsc-container {\n    overflow: auto;\n    -webkit-overflow-scrolling: touch;\n    padding-top: 1rem;\n    padding-top: var(--grvsc-padding-top, var(--grvsc-padding-v, 1rem));\n    padding-bottom: 1rem;\n    padding-bottom: var(--grvsc-padding-bottom, var(--grvsc-padding-v, 1rem));\n    border-radius: 8px;\n    border-radius: var(--grvsc-border-radius, 8px);\n    font-feature-settings: normal;\n  }\n  \n  .grvsc-code {\n    display: inline-block;\n    min-width: 100%;\n  }\n  \n  .grvsc-line {\n    display: inline-block;\n    box-sizing: border-box;\n    width: 100%;\n    padding-left: 1.5rem;\n    padding-left: var(--grvsc-padding-left, var(--grvsc-padding-h, 1.5rem));\n    padding-right: 1.5rem;\n    padding-right: var(--grvsc-padding-right, var(--grvsc-padding-h, 1.5rem));\n  }\n  \n  .grvsc-line-highlighted {\n    background-color: var(--grvsc-line-highlighted-background-color, transparent);\n    box-shadow: inset var(--grvsc-line-highlighted-border-width, 4px) 0 0 0 var(--grvsc-line-highlighted-border-color, transparent);\n  }\n  \n  .dark-default-dark {\n    background-color: #1E1E1E;\n    color: #D4D4D4;\n  }\n</style>","frontmatter":{"title":"Index in MongoDB","author":{"id":"Mark Duan","github":null,"avatar":null},"date":"September 01, 2015","updated_date":null,"tags":["MongoDB","Database"],"coverImage":{"childImageSharp":{"fluid":{"aspectRatio":1.5037593984962405,"src":"/static/e5befb74a597194f6d10cdce76d28a00/58556/mongo-db-index1.webp","srcSet":"/static/e5befb74a597194f6d10cdce76d28a00/61e93/mongo-db-index1.webp 
200w,\n/static/e5befb74a597194f6d10cdce76d28a00/1f5c5/mongo-db-index1.webp 400w,\n/static/e5befb74a597194f6d10cdce76d28a00/58556/mongo-db-index1.webp 800w,\n/static/e5befb74a597194f6d10cdce76d28a00/99238/mongo-db-index1.webp 1200w,\n/static/e5befb74a597194f6d10cdce76d28a00/b40dc/mongo-db-index1.webp 1278w","sizes":"(max-width: 800px) 100vw, 800px"}}}},"fields":{"authorId":"Mark Duan","slug":"/engineering/index-in-mongodb/"}}},{"node":{"id":"c0dc4a92-84b9-51a7-aa76-02b1ac827c77","html":"<p>As in my previous blog post, I used a Python web crawler library to crawl static websites. Scrapy supports custom downloader middleware, which can be used to handle content on a site such as JavaScript.</p>\n<p>However, Scrapy already takes care of much of the underlying implementation: for example, it uses its own dispatcher and has a pipeline for processing the parsed content after download. One drawback of such a library is that strange bugs are hard to track down because it runs jobs in parallel.</p>\n<p>In this tutorial, I want to show the structure of a simple and efficient web crawler.</p>\n<p>First of all, we need a scheduler that can run jobs in parallel, because most of the time is spent waiting on requests. I use <a href=\"http://www.gevent.org/\">gevent</a> to schedule the jobs. 
Gevent uses <a href=\"http://libevent.org/\">libevent</a> as its underlying library and combines multithreading and event-based techniques to run the jobs in parallel.</p>\n<p>Here is the sample code:</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"python\" data-index=\"0\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk15\">import</span><span class=\"mtk1\"> gevent</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk15\">from</span><span class=\"mtk1\"> gevent </span><span class=\"mtk15\">import</span><span class=\"mtk1\"> Greenlet</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk15\">from</span><span class=\"mtk1\"> gevent </span><span class=\"mtk15\">import</span><span class=\"mtk1\"> monkey</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk15\">from</span><span class=\"mtk1\"> gevent.queue </span><span class=\"mtk15\">import</span><span class=\"mtk1\"> Queue</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk15\">from</span><span class=\"mtk1\"> selenium </span><span class=\"mtk15\">import</span><span class=\"mtk1\"> webdriver</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">monkey.patch_socket()</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk4\">class</span><span class=\"mtk1\"> </span><span class=\"mtk10\">WebCrawler</span><span class=\"mtk1\">:</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk4\">def</span><span class=\"mtk1\"> </span><span class=\"mtk11\">__init__</span><span class=\"mtk1\">(</span><span class=\"mtk12\">self</span><span class=\"mtk1\">,</span><span class=\"mtk12\">urls</span><span class=\"mtk1\">=[],</span><span class=\"mtk12\">num_worker</span><span class=\"mtk1\"> = </span><span class=\"mtk7\">1</span><span class=\"mtk1\">):</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        </span><span class=\"mtk4\">self</span><span class=\"mtk1\">.url_queue = Queue()</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        </span><span class=\"mtk15\">for</span><span class=\"mtk1\"> url </span><span class=\"mtk4\">in</span><span class=\"mtk1\"> urls:</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">            </span><span class=\"mtk4\">self</span><span class=\"mtk1\">.url_queue.put(url)</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        </span><span class=\"mtk4\">self</span><span class=\"mtk1\">.num_worker = num_worker</span></span>\n<span 
class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk4\">def</span><span class=\"mtk1\"> </span><span class=\"mtk11\">worker</span><span class=\"mtk1\">(</span><span class=\"mtk12\">self</span><span class=\"mtk1\">,</span><span class=\"mtk12\">pid</span><span class=\"mtk1\">):</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        driver = </span><span class=\"mtk4\">self</span><span class=\"mtk1\">.initializeAnImegaDisabledDriver()  </span><span class=\"mtk3\"># initialize the webdriver</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        </span><span class=\"mtk3\">#</span><span class=\"mtk4\">TODO</span><span class=\"mtk3\"> catch the exception</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        </span><span class=\"mtk15\">while</span><span class=\"mtk1\"> </span><span class=\"mtk4\">not</span><span class=\"mtk1\"> </span><span class=\"mtk4\">self</span><span class=\"mtk1\">.url_queue.empty():</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">            url = </span><span class=\"mtk4\">self</span><span class=\"mtk1\">.url_queue.get()</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">            driver.get(url)</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">            elem = driver.find_elements_by_xpath(</span><span class=\"mtk8\">&quot;//script | //iframe | //img&quot;</span><span class=\"mtk1\">) </span><span class=\"mtk3\"># get these elements from the web page</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk4\">def</span><span class=\"mtk1\"> </span><span class=\"mtk11\">run</span><span class=\"mtk1\">(</span><span class=\"mtk12\">self</span><span class=\"mtk1\">):</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        jobs = [gevent.spawn(</span><span class=\"mtk4\">self</span><span class=\"mtk1\">.worker,i) 
</span><span class=\"mtk15\">for</span><span class=\"mtk1\"> i </span><span class=\"mtk4\">in</span><span class=\"mtk1\"> </span><span class=\"mtk12\">range</span><span class=\"mtk1\">(</span><span class=\"mtk4\">self</span><span class=\"mtk1\">.num_worker)]</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        gevent.joinall(jobs)  </span><span class=\"mtk3\"># wait for all workers to finish</span></span></code></pre>\n<p>The next part is the headless browser. I run PhantomJS with <code>--webdriver=4444 --disk-cache=true --ignore-ssl-errors=true --load-images=false --max-disk-cache-size=100000</code>. The options are described in detail in its documentation.</p>\n<p>PhantomJS uses Selenium WebDriver as its front end to handle requests, with WebKit and Qt as the underlying browser engine and controller. It has memory leak bugs, so it can consume a ton of memory, and it can only use one CPU core, although you can run many PhantomJS instances on different ports. I first wrote a daemon process to monitor its memory and status, but later realized that a Perl script can check the status of the process and send it a kill signal when it exceeds a limit such as 1 GB of memory.</p>\n<p>To speed up the crawler, I first check each website with a static browser: if a site is badly written, a deadlock may occur, so I simply skip such sites.</p>\n<style class=\"grvsc-styles\">\n  .grvsc-container {\n    overflow: auto;\n    -webkit-overflow-scrolling: touch;\n    padding-top: 1rem;\n    padding-top: var(--grvsc-padding-top, var(--grvsc-padding-v, 1rem));\n    padding-bottom: 1rem;\n    padding-bottom: var(--grvsc-padding-bottom, var(--grvsc-padding-v, 1rem));\n    border-radius: 8px;\n    border-radius: var(--grvsc-border-radius, 8px);\n    font-feature-settings: normal;\n  }\n  \n  .grvsc-code {\n    display: inline-block;\n    min-width: 100%;\n  }\n  \n  .grvsc-line {\n    display: inline-block;\n    box-sizing: border-box;\n    width: 100%;\n    padding-left: 1.5rem;\n    padding-left: var(--grvsc-padding-left, var(--grvsc-padding-h, 
1.5rem));\n    padding-right: 1.5rem;\n    padding-right: var(--grvsc-padding-right, var(--grvsc-padding-h, 1.5rem));\n  }\n  \n  .grvsc-line-highlighted {\n    background-color: var(--grvsc-line-highlighted-background-color, transparent);\n    box-shadow: inset var(--grvsc-line-highlighted-border-width, 4px) 0 0 0 var(--grvsc-line-highlighted-border-color, transparent);\n  }\n  \n  .dark-default-dark {\n    background-color: #1E1E1E;\n    color: #D4D4D4;\n  }\n  .dark-default-dark .mtk15 { color: #C586C0; }\n  .dark-default-dark .mtk1 { color: #D4D4D4; }\n  .dark-default-dark .mtk4 { color: #569CD6; }\n  .dark-default-dark .mtk10 { color: #4EC9B0; }\n  .dark-default-dark .mtk11 { color: #DCDCAA; }\n  .dark-default-dark .mtk12 { color: #9CDCFE; }\n  .dark-default-dark .mtk7 { color: #B5CEA8; }\n  .dark-default-dark .mtk3 { color: #6A9955; }\n  .dark-default-dark .mtk8 { color: #CE9178; }\n</style>","frontmatter":{"title":"Write a highly efficient python Web Crawler","author":{"id":"Mark Duan","github":null,"avatar":null},"date":"July 14, 2015","updated_date":null,"tags":["Python","Coding"],"coverImage":{"childImageSharp":{"fluid":{"aspectRatio":1,"src":"/static/c068d4f41cc4f2a33e3bd2737ca105df/7fbdd/python-web-crawler.webp","srcSet":"/static/c068d4f41cc4f2a33e3bd2737ca105df/61e93/python-web-crawler.webp 200w,\n/static/c068d4f41cc4f2a33e3bd2737ca105df/1f5c5/python-web-crawler.webp 400w,\n/static/c068d4f41cc4f2a33e3bd2737ca105df/7fbdd/python-web-crawler.webp 610w","sizes":"(max-width: 610px) 100vw, 610px"}}}},"fields":{"authorId":"Mark Duan","slug":"/engineering/write-a-highly-efficient-python-web-crawler/"}}},{"node":{"id":"177215ed-17df-5704-99e4-6260d06c47fb","html":"<p><a href=\"https://github.com/memcached\">Memcached</a> is one of the most popular open-source in-memory key-value caching systems. 
I will briefly talk about the design of memory management of memcached.</p>\n<h3 id=\"chunk-and-slab\" style=\"position:relative;\"><a href=\"#chunk-and-slab\" aria-label=\"chunk and slab permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Chunk and Slab</h3>\n<p><img src=\"/361a88243c7d3be246f1e738d71cc778/memcached1.webp\" alt=\"memcached-1\"></p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"cpp\" data-index=\"0\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk3\">/* powers-of-N allocation structures */</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk4\">typedef</span><span class=\"mtk1\"> </span><span class=\"mtk4\">struct</span><span class=\"mtk1\"> {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk4\">unsigned</span><span class=\"mtk1\"> </span><span class=\"mtk4\">int</span><span class=\"mtk1\"> size;</span><span class=\"mtk3\">      /* sizes of items */</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk4\">unsigned</span><span class=\"mtk1\"> </span><span class=\"mtk4\">int</span><span class=\"mtk1\"> perslab;</span><span class=\"mtk3\">   /* how many items per slab */</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk4\">void</span><span class=\"mtk1\"> *slots;</span><span class=\"mtk3\">           /* list of item ptrs */</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span 
class=\"mtk4\">unsigned</span><span class=\"mtk1\"> </span><span class=\"mtk4\">int</span><span class=\"mtk1\"> sl_curr;</span><span class=\"mtk3\">   /* total free items in list */</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk4\">unsigned</span><span class=\"mtk1\"> </span><span class=\"mtk4\">int</span><span class=\"mtk1\"> slabs;</span><span class=\"mtk3\">     /* how many slabs were allocated for this class */</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk4\">void</span><span class=\"mtk1\"> **slab_list;</span><span class=\"mtk3\">       /* array of slab pointers */</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk4\">unsigned</span><span class=\"mtk1\"> </span><span class=\"mtk4\">int</span><span class=\"mtk1\"> list_size;</span><span class=\"mtk3\"> /* size of prev array */</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk4\">unsigned</span><span class=\"mtk1\"> </span><span class=\"mtk4\">int</span><span class=\"mtk1\"> killing;</span><span class=\"mtk3\">  /* index+1 of dying slab, or zero if none */</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk4\">size_t</span><span class=\"mtk1\"> requested;</span><span class=\"mtk3\"> /* The number of requested bytes */</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">} </span><span class=\"mtk10\">slabclass_t</span><span class=\"mtk1\">;</span></span></code></pre>\n<p>This is the struct declaration of slabclass_t. Each slab class contains the same size of chunk, but different classes have different chunk sizes. 
The chunk size of each class is calculated by this loop:</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"cpp\" data-index=\"1\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk1\">&lt;!--</span><span class=\"mtk15\">while</span><span class=\"mtk1\"> (++i &lt; POWER_LARGEST && size </span></span></code></pre>\n<p>The growth factor is set with the -f option when memcached is started, and it controls how the chunk size changes between slab classes. On each iteration of the loop, the size is multiplied by this factor.</p>\n<p><img src=\"/af60c92dc1627ea9224338bbf10df1a8/memcached-2.webp\" alt=\"memcached-2\"></p>\n<p>You can see this information by starting memcached with -vvv, or by running the following command over a telnet connection to the server IP and port:</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"cpp\" data-index=\"2\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk1\">stats slabs</span></span></code></pre>\n<p>Every time new memory needs to be allocated, memcached scans the slab classes to find the most suitable class for the chunk.</p>\n<h3 id=\"rebalance-and-reassign-slab-memory\" style=\"position:relative;\"><a href=\"#rebalance-and-reassign-slab-memory\" aria-label=\"rebalance and reassign slab memory permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Rebalance and reassign slab memory:</h3>\n<p>From memcached’s wiki:</p>\n<p><strong>Overview</strong>: Memcached 1.4.11. Fixes race conditions and crashes introduced in 1.4.10. 
Adds the ability to rebalance and reassign slab memory.</p>\n<h4 id=\"slab-reassignment\" style=\"position:relative;\"><a href=\"#slab-reassignment\" aria-label=\"slab reassignment permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Slab Reassignment</h4>\n<p>Long running instances of memcached memory may run into an issue where all available memory has been assigned to a specific slab class (say items of roughly size 100 bytes). Later the application starts storing more of its data into a different slab class (items around 200 bytes). Memcached could not use the 100 byte chunks to satisfy the 200 byte requests, and thus you would be able to store very few 200 byte items.</p>\n<p>1.4.11 introduces the ability to reassign slab pages. This is a <strong>beta</strong> feature and the commands may change for the next few releases, so <strong>please</strong> keep this in mind. 
When the commands are finalized they will be noted in the release notes.</p>\n<h4 id=\"slab-automove\" style=\"position:relative;\"><a href=\"#slab-automove\" aria-label=\"slab automove permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Slab Automove</h4>\n<p>While slab reassign is a manual feature, there is also the start of an automatic memory reassignment algorithm.</p>\n<p>From the source code in <a href=\"https://github.com/memcached/memcached/blob/master/slabs.c#L232\">slabs.c</a>, we can see that memcached uses two threads to monitor the slab classes: one does maintenance and the other rebalances the classes.</p>\n<p>Memcached defines a global variable <em>struct slab_rebalance slab_rebal</em>, which stores the start and end information of the slab being moved. s_clsid is the source slab class ID and d_clsid is the destination slab class ID. 
The detailed blog in Chinese <a href=\"http://blog.chinaunix.net/uid-27767798-id-3404133.html\">memcached源码分析—–slab automove和slab rebalance</a> could be helpful.</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"cpp\" data-index=\"3\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk4\">struct</span><span class=\"mtk1\"> </span><span class=\"mtk10\">slab_rebalance</span><span class=\"mtk1\"> {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk4\">void</span><span class=\"mtk1\"> *slab_start;</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk4\">void</span><span class=\"mtk1\"> *slab_end;</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk4\">void</span><span class=\"mtk1\"> *slab_pos;</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk4\">int</span><span class=\"mtk1\"> s_clsid;</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk4\">int</span><span class=\"mtk1\"> d_clsid;</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk4\">int</span><span class=\"mtk1\"> busy_items;</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk4\">uint8_t</span><span class=\"mtk1\"> done;</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">};</span></span></code></pre>\n<h3 id=\"memory-pool\" style=\"position:relative;\"><a href=\"#memory-pool\" aria-label=\"memory pool permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 
0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Memory Pool</h3>\n<p>Memcached implements its own memory pool, which avoids repeated system memory allocation and memory fragmentation. This makes memory use efficient and easy to manage. Here is a demo implementation of a memory pool; basically, it is a large pre-allocated chunk of memory.</p>\n<pre class=\"grvsc-container dark-default-dark\" data-language=\"cpp\" data-index=\"4\"><code class=\"grvsc-code\"><span class=\"grvsc-line\"><span class=\"mtk1\"> mem_avail) {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        </span><span class=\"mtk15\">return</span><span class=\"mtk1\"> </span><span class=\"mtk4\">NULL</span><span class=\"mtk1\">;</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    }</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk3\">    /* mem_current pointer _must_ be aligned!!! */</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk15\">if</span><span class=\"mtk1\"> (size % CHUNK_ALIGN_BYTES) {</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">        size += CHUNK_ALIGN_BYTES - (size % CHUNK_ALIGN_BYTES);</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    }</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    mem_current = ((</span><span class=\"mtk4\">char</span><span class=\"mtk1\">*)mem_current) + size;</span></span>\n<span class=\"grvsc-line\"><span class=\"mtk1\">    </span><span class=\"mtk15\">if</span><span class=\"mtk1\"> (size </span></span></code></pre>\n<style class=\"grvsc-styles\">\n  .grvsc-container {\n    overflow: auto;\n    -webkit-overflow-scrolling: touch;\n    padding-top: 1rem;\n    padding-top: var(--grvsc-padding-top, var(--grvsc-padding-v, 1rem));\n    padding-bottom: 1rem;\n    padding-bottom: var(--grvsc-padding-bottom, var(--grvsc-padding-v, 1rem));\n    
border-radius: 8px;\n    border-radius: var(--grvsc-border-radius, 8px);\n    font-feature-settings: normal;\n  }\n  \n  .grvsc-code {\n    display: inline-block;\n    min-width: 100%;\n  }\n  \n  .grvsc-line {\n    display: inline-block;\n    box-sizing: border-box;\n    width: 100%;\n    padding-left: 1.5rem;\n    padding-left: var(--grvsc-padding-left, var(--grvsc-padding-h, 1.5rem));\n    padding-right: 1.5rem;\n    padding-right: var(--grvsc-padding-right, var(--grvsc-padding-h, 1.5rem));\n  }\n  \n  .grvsc-line-highlighted {\n    background-color: var(--grvsc-line-highlighted-background-color, transparent);\n    box-shadow: inset var(--grvsc-line-highlighted-border-width, 4px) 0 0 0 var(--grvsc-line-highlighted-border-color, transparent);\n  }\n  \n  .dark-default-dark {\n    background-color: #1E1E1E;\n    color: #D4D4D4;\n  }\n  .dark-default-dark .mtk3 { color: #6A9955; }\n  .dark-default-dark .mtk4 { color: #569CD6; }\n  .dark-default-dark .mtk1 { color: #D4D4D4; }\n  .dark-default-dark .mtk10 { color: #4EC9B0; }\n  .dark-default-dark .mtk15 { color: #C586C0; }\n</style>","frontmatter":{"title":"Memcached Memory Management","author":{"id":"Mark Duan","github":null,"avatar":null},"date":"July 07, 2015","updated_date":null,"tags":["Memory Management"],"coverImage":{"childImageSharp":{"fluid":{"aspectRatio":1,"src":"/static/283a834484b5b25cf8471e04a8e2ff1b/7fbdd/memcached-memory-management.webp","srcSet":"/static/283a834484b5b25cf8471e04a8e2ff1b/61e93/memcached-memory-management.webp 200w,\n/static/283a834484b5b25cf8471e04a8e2ff1b/1f5c5/memcached-memory-management.webp 400w,\n/static/283a834484b5b25cf8471e04a8e2ff1b/7fbdd/memcached-memory-management.webp 610w","sizes":"(max-width: 610px) 100vw, 610px"}}}},"fields":{"authorId":"Mark Duan","slug":"/engineering/memcach-memory-management/"}}}]},"authorYaml":{"id":"Mark Duan","bio":null,"github":null,"stackoverflow":null,"linkedin":null,"medium":null,"twitter":null,"avatar":null}},"pageContext":{"id":"Mark 
Duan","__params":{"id":"mark-duan"}}},"staticQueryHashes":["1171199041","1384082988","2100481360","23180105","528864852"]}