{"id":283,"date":"2010-05-20T19:17:53","date_gmt":"2010-05-20T18:17:53","guid":{"rendered":"http:\/\/www.hlynes.com\/?p=283"},"modified":"2010-05-20T19:19:21","modified_gmt":"2010-05-20T18:19:21","slug":"stepping-through-large-database-tables-in-python","status":"publish","type":"post","link":"http:\/\/www.hlynes.com\/?p=283","title":{"rendered":"Stepping Through Large Database Tables in Python"},"content":{"rendered":"<p>In order to report usage on our PBSPro compute cluster at work I wrote a simple set of python scripts to dump the accounting information into a MySQL database. This has been working fine for the last year churning out reports every month.<br \/>\nThis week I had cause to generate some statics aggregated across the whole three years of the data in the database. I&#8217;m using a mixture of Elixir and SQLalchemy to talk to the database. Normally I would do something like this:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nmybigtablequery = MyBigTable.query()\r\n\r\nfor job in mybigtablequery:\r\n    if job.attribute = &quot;thing&quot;:\r\n        dosomething()\r\n<\/pre>\n<p>Which worked fine when the database was quite small. I was horrified to see that as this loop went on my python process used more and more memory because the database connection object never throws away a row once it has been loaded. Fortunately I found an <a href=\"http:\/\/stackoverflow.com\/questions\/1145905\/scanning-huge-tables-with-sqlalchemy-using-the-orm\/1217947#1217947\">answer<\/a> on stackoverflow.<br \/>\nSo I ended up doing the following:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\ndef batch_query(query,batch=10000):\r\n    offset = 0\r\n    while True:\r\n        for elem in query.limit(batch).offset(offset):\r\n            r = True\r\n            yield elem\r\n        offset += batch\r\n        if not r:\r\n            break\r\n        r = False\r\n\r\nmybigtablequery = MyBigTable.query()\r\n\r\nfor job in batch_query(mybigtablequery,50000):\r\n    if job.attribute = &quot;thing&quot;:\r\n        dosomething()\r\n<\/pre>\n<p>&#8220;batch&#8221; is just an integer defining how many rows will be fetched by each query. The larger this is the more memory the python interpreter will use but the more efficiently the code will run.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In order to report usage on our PBSPro compute cluster at work I wrote a simple set of python scripts to dump the accounting information into a MySQL database. This has been working fine for the last year churning out reports every month. This week I had cause to generate some statics aggregated across the &hellip; <a href=\"http:\/\/www.hlynes.com\/?p=283\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Stepping Through Large Database Tables in Python<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9],"tags":[],"class_list":["post-283","post","type-post","status-publish","format-standard","hentry","category-tech"],"_links":{"self":[{"href":"http:\/\/www.hlynes.com\/index.php?rest_route=\/wp\/v2\/posts\/283"}],"collection":[{"href":"http:\/\/www.hlynes.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.hlynes.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.hlynes.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/www.hlynes.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=283"}],"version-history":[{"count":8,"href":"http:\/\/www.hlynes.com\/index.php?rest_route=\/wp\/v2\/posts\/283\/revisions"}],"predecessor-version":[{"id":291,"href":"http:\/\/www.hlynes.com\/index.php?rest_route=\/wp\/v2\/posts\/283\/revisions\/291"}],"wp:attachment":[{"href":"http:\/\/www.hlynes.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=283"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.hlynes.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=283"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.hlynes.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=283"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}