{"id":127,"date":"2019-12-06T12:32:07","date_gmt":"2019-12-06T11:32:07","guid":{"rendered":"https:\/\/blog.nubisoft.pl\/?p=127"},"modified":"2020-01-14T20:49:46","modified_gmt":"2020-01-14T19:49:46","slug":"the-concept-of-map-reduce-computing-approach-on-example-of-python-declarative-vs-imperative-iteration","status":"publish","type":"post","link":"https:\/\/nubisoft.io\/blog\/the-concept-of-map-reduce-computing-approach-on-example-of-python-declarative-vs-imperative-iteration\/","title":{"rendered":"The concept of the map-reduce computing approach, with an example in Python (declarative vs imperative iteration)"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\"> This post is a St. Nicholas&#8217; Day gift for one of my students, who asked me to explain the concept of map-reduce, an approach used to solve computationally complex problems. I will do this with an example in Python, a language known for its brevity. At the same time, I am glad to test Enlighter, a new plug-in installed in our WordPress that displays source code more nicely \ud83d\ude09<\/h2>\n\n\n\n<p> Map-reduce is a concept inherent in parallel data processing. It consists of dividing a computational problem into smaller sub-problems, solving these sub-problems independently (and therefore possibly in parallel), and then merging their partial solutions into one solution of the initial problem. <\/p>\n\n\n\n<p> Perhaps the best-known implementations of this concept are the <a href=\"https:\/\/hadoop.apache.org\/\">Apache Hadoop<\/a> and the slightly younger <a href=\"https:\/\/spark.apache.org\/\">Apache Spark<\/a> frameworks. Here, however, the concept will be presented in the simplest possible way, using the Python programming language.<\/p>\n\n\n\n<p> Let&#8217;s assume that we have a large data set &#8211; e.g. 
our customers&#8217; foot lengths in centimetres, which we want to convert into European shoe sizes.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># (customer name, foot length in cm)\nfeet_size = [(\"Alan\", 17), (\"Bob\", 18), (\"Carol\", 14), (\"Dean\", 15), (\"Elvis\", 16)]<\/pre>\n\n\n\n<p>Then we define the conversion function using lambda syntax. It keeps the customer&#8217;s name and turns the foot length into a European shoe size with the formula 3\/2 &#215; (foot length + 1.5 cm).<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># (name, foot length in cm) -> (name, European shoe size)\nto_shoe_size = lambda data: (data[0], 3 \/ 2.0 * (data[1] + 1.5))<\/pre>\n\n\n\n<p>Now we apply this function to every record with map and display the converted data set.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">shoe_size = list(map(to_shoe_size, feet_size))\nprint(shoe_size)<\/pre>\n\n\n\n<p>And we get the following result.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">[('Alan', 27.75), ('Bob', 29.25), ('Carol', 23.25), ('Dean', 24.75), ('Elvis', 26.25)]<\/pre>\n\n\n\n<p>Next, let us use filter to keep, for example, only the records whose value is above the average.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import statistics\n\n# mean shoe size over all customers\navg = statistics.mean(map(lambda x: x[1], 
shoe_size))\n\n# keep only the records above the average\nabove_avg_shoe_size = filter(lambda x: x[1] > avg, shoe_size)\nprint(list(above_avg_shoe_size))<\/pre>\n\n\n\n<p>And we get the following result.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">[('Alan', 27.75), ('Bob', 29.25)]<\/pre>\n\n\n\n<p>Now, if we want to find the smallest of the above-average values, we can use the reduce function, which repeatedly combines two values into one until a single result remains.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">from functools import reduce\n\n# reduce feeds each partial result back in as the first argument, so we pass it\n# plain numbers (not (name, size) tuples) and combine them pairwise with min\nmin_above_avg = reduce(min, map(lambda x: x[1], filter(lambda x: x[1] > avg, shoe_size)))\nprint(min_above_avg)<\/pre>\n\n\n\n<p>And we get the following result.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">27.75<\/pre>\n\n\n\n<p><strong>That&#8217;s all. What do you think about the constructions described above? Are they more readable and self-explanatory than loop-based iteration, or quite the contrary? <\/strong><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"> Other points of view <\/h2>\n\n\n\n<ol class=\"wp-block-list\"><li>It is worth noting that in Python 3, reduce is no longer a built-in function; it has been moved to the functools module. This was deliberate and is justified by the far better readability of the classic for-loop. 
<a href=\"https:\/\/www.artima.com\/weblogs\/viewpost.jsp?thread=98196\">Here<\/a> you can find a more detailed explanation by the creator of Python himself, while the specification of the reduce function and a description of how to replace it with a for-loop can be found <a href=\"https:\/\/thepythonguru.com\/python-builtin-functions\/reduce\/\">here<\/a>.<\/li><li>The map, filter and reduce functions used in the above example do NOT run in parallel: they process the elements one by one in a single process and have been used for illustrative purposes only. However, the same map-reduce model can be executed in parallel in Python, e.g. with <code>multiprocessing.Pool.map<\/code> from the standard library or with PySpark, the Python API of Apache Spark. <\/li><\/ol>\n","protected":false},"excerpt":{"rendered":"<p>This post is a St. Nicholas&#8217; Day gift for one of my students, who asked me to explain the concept of map-reduce, an approach used to solve computationally complex problems. I will do this with an example in Python, a language known for its brevity. At the same time, I am glad to test Enlighter, a new [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":128,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_case_study_excerpt":"","footnotes":""},"categories":[2,3],"tags":[36,7],"class_list":["post-127","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-science","category-software-development","tag-mapreduce","tag-python"],"_links":{"self":[{"href":"https:\/\/nubisoft.io\/blog\/wp-json\/wp\/v2\/posts\/127","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nubisoft.io\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nubisoft.io\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nubisoft.io\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/nubisoft.io\/blog\/wp-json\/wp\/v2\/comments?post=127"}],"version-history":[{"count":10,"href":"https:\/\/nubisoft.io\/blog\/wp-json\/wp\/v2\/posts\/127\/revisions"}],"predecessor-
version":[{"id":196,"href":"https:\/\/nubisoft.io\/blog\/wp-json\/wp\/v2\/posts\/127\/revisions\/196"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/nubisoft.io\/blog\/wp-json\/wp\/v2\/media\/128"}],"wp:attachment":[{"href":"https:\/\/nubisoft.io\/blog\/wp-json\/wp\/v2\/media?parent=127"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nubisoft.io\/blog\/wp-json\/wp\/v2\/categories?post=127"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nubisoft.io\/blog\/wp-json\/wp\/v2\/tags?post=127"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}