reword some sections of the custom loss section

Also update references to the loss functions defined in adaptive.

Joseph Weston authored on 19/02/2018 18:39:18
Showing 1 changed file
... ...
@@ -448,22 +448,22 @@
448 448
    "cell_type": "markdown",
449 449
    "metadata": {},
450 450
    "source": [
451
-    "# Custom point choosing logic for 1D and 2D"
451
+    "# Custom adaptive logic for 1D and 2D"
452 452
    ]
453 453
   },
454 454
   {
455 455
    "cell_type": "markdown",
456 456
    "metadata": {},
457 457
    "source": [
458
-    "The `Learner1D` and `Learner2D` implement a certain logic for chosing points based on the existing data.\n",
458
+    "`Learner1D` and `Learner2D` both work on the principle of subdividing their domain into subdomains, and assigning a property to each subdomain, which we call the *loss*. The algorithm for choosing the best place to evaluate our function is then simply: *take the subdomain with the largest loss and add a point in its center, creating new subdomains around this point*.\n",
459 459
     "\n",
460
-    "For some functions this default stratagy might not work, for example you'll run into trouble when you learn functions that contain divergencies.\n",
460
+    "The *loss function* that defines the loss per subdomain is the canonical place to define what regions of the domain are \"interesting\".\n",
461
+    "The default loss function for `Learner1D` and `Learner2D` is sufficient for a wide range of common cases, but it is by no means a panacea. For example, the default loss function will tend to get stuck on divergences.\n",
461 462
     "\n",
462
-    "Both the `Learner1D` and `Learner2D` allow you to use a custom loss function, which you specify as an argument in the learner. See the doc-string of `Learner1D` and `Learner2D` to see what `loss_per_interval` and `loss_per_triangle` need to return and take as input.\n",
463
+    "Both the `Learner1D` and `Learner2D` allow you to specify a *custom loss function*. Below we illustrate how you would go about writing your own loss function. The documentation for `Learner1D` and `Learner2D` specifies the signature that your loss function needs to have in order for it to work with `adaptive`.\n",
463 464
     "\n",
464
-    "As an example we implement a homogeneous sampling strategy (which of course is not the best way of handling divergencies).\n",
465 465
     "\n",
466
-    "Note that both these loss functions are also available from `adaptive.learner.learner1d.uniform_sampling` and `adaptive.learner.learner2d.uniform_sampling`."
466
+    "Say we want to properly sample a function that contains divergences. A simple (but naive) strategy is to *uniformly* sample the domain:\n"
467 467
    ]
468 468
   },
469 469
   {
... ...
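
To make the "largest loss first" rule in the reworded cell above concrete, here is a minimal, self-contained sketch of that greedy loop for a 1D domain. The names (`greedy_sample_1d`, `loss`) are invented for illustration; this is not how `adaptive` implements its learners, it only mirrors the strategy described in the markdown text.

    def greedy_sample_1d(f, a, b, loss, n_points):
        """Repeatedly split the subinterval with the largest loss at its midpoint."""
        xs = [a, b]
        ys = [f(a), f(b)]
        for _ in range(n_points):
            # loss of every current subinterval, computed from the data we already have
            losses = [loss((xs[i], xs[i + 1]), (ys[i], ys[i + 1]))
                      for i in range(len(xs) - 1)]
            i = max(range(len(losses)), key=losses.__getitem__)
            x_mid = (xs[i] + xs[i + 1]) / 2   # add a point in the center ...
            xs.insert(i + 1, x_mid)           # ... creating two new subintervals
            ys.insert(i + 1, f(x_mid))
        return xs, ys

    # with the subinterval length as the loss this reduces to uniform sampling
    xs, ys = greedy_sample_1d(lambda x: x**2, -1, 1,
                              loss=lambda xs, ys: xs[1] - xs[0], n_points=20)
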
@@ -473,6 +473,7 @@
473 473
    "outputs": [],
474 474
    "source": [
475 475
     "def uniform_sampling_1d(interval, scale, function_values):\n",
476
+    "    # Note that we never use 'function_values'; the loss is just the size of the subdomain\n",
476 477
     "    x_left, x_right = interval\n",
477 478
     "    x_scale, _ = scale\n",
478 479
     "    dx = (x_right - x_left) / x_scale\n",
... ...
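
The hunk above defines `uniform_sampling_1d`, but the cell that actually uses it lies outside the lines shown. For reference, a custom 1D loss is passed through the `loss_per_interval` argument of `Learner1D` (mentioned in the removed markdown above); the function and goal below are stand-ins, not the notebook's.

    import adaptive

    def f_example(x):
        return x**3  # stand-in; any scalar function of one variable works

    learner = adaptive.Learner1D(f_example, bounds=(-1, 1),
                                 loss_per_interval=uniform_sampling_1d)
    runner = adaptive.BlockingRunner(learner, goal=lambda l: l.loss() < 0.01)
    learner.plot()
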
@@ -492,7 +493,9 @@
492 493
    "metadata": {},
493 494
    "outputs": [],
494 495
    "source": [
495
-    "%%opts EdgePaths (color='w')\n",
496
+    "%%opts EdgePaths (color='w') Image [logz=True]\n",
497
+    "\n",
498
+    "from adaptive.runner import SequentialExecutor\n",
496 499
     "\n",
497 500
     "def uniform_sampling_2d(ip):\n",
498 501
     "    from adaptive.learner.learner2D import areas\n",
... ...
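
The same hunk adds an import of `SequentialExecutor`, though the place where it is used is not visible in the lines shown. As a rough sketch (assuming the runner accepts an `executor` argument, as in adaptive's documented API), it would be plugged in like this, evaluating points one at a time in the current process, which is handy when debugging a custom loss:

    from adaptive.runner import SequentialExecutor

    runner = adaptive.Runner(learner, executor=SequentialExecutor(),
                             goal=lambda l: l.loss() < 0.02)
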
@@ -504,18 +507,32 @@
504 507
     "    return 1 / (x**2 + y**2)\n",
505 508
     "\n",
506 509
     "learner = adaptive.Learner2D(f_divergent_2d, [(-1, 1), (-1, 1)], loss_per_triangle=uniform_sampling_2d)\n",
507
-    "runner = adaptive.BlockingRunner(learner, goal=lambda l: l.loss() < 0.02)\n",
508
-    "learner.plot(tri_alpha=0.3)"
510
+    "\n",
511
+    "# this takes a while, so use the async Runner so we know *something* is happening\n",
512
+    "runner = adaptive.Runner(learner, goal=lambda l: l.loss() < 0.02)\n",
513
+    "runner.live_info()\n",
514
+    "runner.live_plot(update_interval=0.2,\n",
515
+    "                 plotter=lambda l: l.plot(tri_alpha=0.3).relabel('1 / (x^2 + y^2) in log scale'))"
516
+   ]
517
+  },
518
+  {
519
+   "cell_type": "markdown",
520
+   "metadata": {},
521
+   "source": [
522
+    "The uniform sampling strategy is a common case to benchmark against, so the 1D and 2D versions are included in `adaptive` as `adaptive.learner.learner1D.uniform_sampling` and `adaptive.learner.learner2D.uniform_sampling`."
509 523
    ]
510 524
   },
511 525
   {
512 526
    "cell_type": "markdown",
513 527
    "metadata": {},
514 528
    "source": [
515
-    "#### Doing better\n",
516
-    "Of course we can improve on the the above result, since just homogeneous sampling is usually the dumbest way to sample.\n",
529
+    "### Doing better\n",
530
+    "\n",
531
+    "Of course, using `adaptive` for uniform sampling is a bit of a waste!\n",
532
+    "\n",
533
+    "Let's see if we can do a bit better. Below we define a loss per subdomain that scales with the degree of nonlinearity of the function (this is very similar to the default loss function for `Learner2D`), but which is 0 for subdomains smaller than a certain area, and infinite for subdomains larger than a certain area.\n",
517 534
     "\n",
518
-    "The loss function (slightly more general version) below is available as `adaptive.learner.learner2D.resolution_loss`."
535
+    "A loss defined in this way means that the adaptive algorithm will first prioritise subdomains that are too large (infinite loss). After all subdomains are appropriately small it will prioritise places where the function is very nonlinear, but will ignore subdomains that are too small (0 loss)."
519 536
    ]
520 537
   },
521 538
   {
... ...
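
The "Doing better" cell above describes the prioritisation rule in words: infinite loss for subdomains that are too large, zero loss for subdomains that are too small, and a nonlinearity-based loss in between. Below is a schematic of just that rule, not the actual `resolution_loss` defined in the next hunk; `areas` and `nonlinearity` are assumed to be per-subdomain arrays.

    import numpy as np

    def schematic_loss(areas, nonlinearity, min_area, max_area):
        areas = np.asarray(areas, dtype=float)
        loss = np.asarray(nonlinearity, dtype=float).copy()
        loss[areas > max_area] = np.inf   # too large: refine these first
        loss[areas < min_area] = 0.0      # small enough: never refine further
        return loss
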
@@ -529,13 +546,17 @@
529 546
     "def resolution_loss(ip, min_distance=0, max_distance=1):\n",
530 547
     "    \"\"\"min_distance and max_distance should be in between 0 and 1\n",
531 548
     "    because the total area is normalized to 1.\"\"\"\n",
549
+    "\n",
532 550
     "    from adaptive.learner.learner2D import areas, deviations\n",
551
+    "\n",
533 552
     "    A = areas(ip)\n",
534 553
     "\n",
535
-    "    # `deviations` returns an array of the same length as the\n",
536
-    "    # vector your function to be learned returns, so 1 in this case.\n",
537
-    "    # Its value represents the deviation from the linear estimate based\n",
538
-    "    # on the gradients inside each triangle.\n",
554
+    "    # 'deviations' returns an array of shape '(n, len(ip))', where\n",
555
+    "    # 'n' is the dimension of the output of the learned function.\n",
556
+    "    # In this case we know that the learned function returns a scalar,\n",
557
+    "    # so 'deviations' returns an array of shape '(1, len(ip))'.\n",
558
+    "    # It represents the deviation of the function value from a linear estimate\n",
559
+    "    # over each triangular subdomain.\n",
539 560
     "    dev = deviations(ip)[0]\n",
540 561
     "    \n",
541 562
     "    # we add terms of the same dimension: dev == [distance], A == [distance**2]\n",
... ...
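
The elided part of this cell (between this hunk and the next) constructs the learner with the new loss. Because `loss_per_triangle` is called with only the interpolator, non-default thresholds for `resolution_loss` have to be fixed beforehand, e.g. with `functools.partial`. The value below is a placeholder; the one actually used in the notebook is not shown in this diff.

    from functools import partial

    loss = partial(resolution_loss, min_distance=0.01)

    learner = adaptive.Learner2D(f_divergent_2d, [(-1, 1), (-1, 1)],
                                 loss_per_triangle=loss)
    runner = adaptive.BlockingRunner(learner, goal=lambda l: l.loss() < 0.02)
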
@@ -556,6 +577,15 @@
556 577
     "learner.plot(tri_alpha=0.3).relabel('1 / (x^2 + y^2) in log scale')"
557 578
    ]
558 579
   },
580
+  {
581
+   "cell_type": "markdown",
582
+   "metadata": {},
583
+   "source": [
584
+    "Awesome! We zoom in on the singularity, while still sampling the rest of the domain reasonably well.\n",
585
+    "\n",
586
+    "The above strategy is available as `adaptive.learner.learner2D.resolution_loss`."
587
+   ]
588
+  },
559 589
   {
560 590
    "cell_type": "markdown",
561 591
    "metadata": {},