Browse code

Add coding posts

Joseph Weston authored on 28/03/2021 06:12:17
Showing 17 changed files
@@ -81,17 +81,17 @@ type = "application/rss+xml"
   url = "about"
 [[menu.main]]
   name = "Contact"
-  weight = 1
+  weight = 2
   url = "contact"
-# [[menu.main]]
-#   name = "Blog"
-#   weight = 1
-#   url  = "posts"
+[[menu.main]]
+  name = "Blog"
+  weight = 3
+  url  = "posts"
 [[menu.main]]
   name = "Publications"
-  weight = 1
+  weight = 4
   url  = "publications"
 [[menu.main]]
   name = "CV"
-  weight = 2
+  weight = 5
   url  = "cv.pdf"
new file mode 100644
@@ -0,0 +1,24 @@
---
title: Adaptive vs. Parallel computation
date: 2017-11-08
tags:
  - coding
draft: true
---

Often we run simulations for several values of a parameter: we want to
*sweep* over a parameter space and look for *features* in
the simulation results (e.g. points where the simulated quantity changes abruptly).

Homogeneous sampling is simple but kind of dumb. We are using a computer -- can't
we do better?

Yes we can! We can try and sample in an *adaptive* manner, that is, we choose points
in "interesting" regions of parameter space by inspecting the values of the function
that we have evaluated thus far.
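A minimal sketch of the idea in Python (everything here is hypothetical and purely illustrative -- real adaptive samplers use much better loss functions): always refine the interval across which the function value jumps the most, a crude stand-in for "interesting".

```python
# Toy 1D adaptive sampler: repeatedly bisect the interval where the
# function changes the most.
def adaptive_sample(f, a, b, n):
    xs = [a, b]
    ys = {x: f(x) for x in xs}
    while len(xs) < n:
        # Score each interval by the jump in function value across it.
        i = max(range(len(xs) - 1),
                key=lambda i: abs(ys[xs[i + 1]] - ys[xs[i]]))
        mid = (xs[i] + xs[i + 1]) / 2
        ys[mid] = f(mid)
        xs.insert(i + 1, mid)  # keep the points sorted
    return xs, [ys[x] for x in xs]

# A step function: the sampler piles points up around the jump at x = 0
# instead of spreading them homogeneously over [-1, 1].
xs, ys = adaptive_sample(lambda x: 0.0 if x < 0 else 1.0, -1.0, 1.0, 20)
```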

There are of course challenges to making a "good" adaptive sampler, but essentially
any problem that people have with a particular method can be summarized as
"my idea of what constitutes an 'interesting region of parameter space' differs
from yours". This is nevertheless an interesting discussion, and will probably appear
in the form of a blog post at a later stage.
new file mode 100644
@@ -0,0 +1,120 @@
---
title: Tracking down bugs in GCC
date: 2016-03-01
tags:
  - coding
  - C
---

I *think* I may have found a bug
in [msp430-gcc][mspgcc], the GNU C compiler for the MSP430 series of
microcontrollers. While I was
hacking on a tiny event loop to power the devices in my personal intranet of
things I discovered that something was going a bit crazy.
Cracking out [mspdebug][mspdebug] I noticed that at a certain point control was
jumping to a seemingly random location in memory that did not contain valid
instructions, causing the microcontroller to reset. Weird! Where was this
happening, and why?

I managed to track the problem down to the main event loop that pops events off
a FIFO and acts on them. An "event" in this context is a two-element C struct
consisting of a function pointer and a data pointer. Even when I provided a
perfectly valid function pointer, my code was still jumping to an arbitrary
position in memory and resetting. The plot thickens; it looks like I'm going to
have to get my hands dirty and dig around a bit in the generated assembly!
After some more back-and-forth between my C source and the assembly I
managed to construct a minimal example that illustrates the problem:

```C
// problem_test.c
typedef struct {
    void (*function)(void*) ;
    void *data ;
} event_t ;

extern void placeholder(event_t*) ;
extern void test_function(void*) ;

int main(void) {
    event_t e ;
    e.function = test_function ;  // set to valid function pointer
    e.data = (void*) 0x03 ;  // arbitrary data
    placeholder(&e) ;  // prevent everything from being optimised away
    e.function(e.data) ;
}
```

When the above code is compiled with optimisations disabled it produces correct
output. The output of `msp430-gcc -O0 -S -c problem_test.c` is shown below.
For clarity I have removed the assembler directives and added in-line
comments.

```nasm
main:
; stack setup and allocation of space for `event_t e`
mov r1, r4
add #2, r4
sub #4, r1
; `e.function = test_function`
mov #test_function, -6(r4)
; `e.data = 0x03`
mov #3, -4(r4)
; call `placeholder(&e)`
mov r4, r15
add #llo(-6), r15
call    #placeholder
; call `e.function(e.data)`
mov -6(r4), r14
mov -4(r4), r15
call    r14
; de-allocate stack space for `e`
add #4, r1
```

This code is correct; however, if we now enable optimisations, compiling
with `msp430-gcc -O1 -S -c problem_test.c` (`-O1` and `-O2`
produce the same output for the above C code), we get the following
assembly:

```nasm
main:
; allocate space for `event_t e` on the stack
sub #4, r1
; `e.function = test_function`
mov #test_function, @r1
; `e.data = 0x03`
mov #3, 2(r1)
; call `placeholder(&e)`
mov r1, r15
call    #placeholder
; move `e.data` into r15
mov 2(r1), r15
; ??? call `e.data(e.data)` ???
call    2(r1)
; de-allocate stack space for `e`
add #4, r1
```

The second- and third-to-last lines are the most important ones.
We know that `r1` points to the top of the stack, and so the values
of `e.function` and `e.data` can be found at `0(r1)` and `2(r1)`
respectively, as each is a pointer, and hence 2 bytes wide on the
MSP430 architecture. Despite this we clearly see that there is
a `call 2(r1)` -- the program is going to jump to the address in
`e.data` and start executing the data it finds there as if it
were machine code! Clearly for sufficiently arbitrary data we will
very quickly run into something that is not a valid machine instruction
and the microcontroller will reset.
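Incidentally, the offsets 0 and 2 simply reflect the layout of `event_t`: two 2-byte pointers packed back to back. We can mimic that layout with Python's `struct` module (purely illustrative; `<H` stands in for a little-endian 2-byte MSP430 pointer, and the pointer values here are made up):

```python
import struct

# event_t on MSP430: {function pointer, data pointer}, 2 bytes each.
event = struct.pack('<HH', 0xC0DE, 0x0003)  # (e.function, e.data)

size = struct.calcsize('<HH')                     # whole struct: 4 bytes
function = struct.unpack_from('<H', event, 0)[0]  # e.function at offset 0
data = struct.unpack_from('<H', event, 2)[0]      # e.data at offset 2
```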

So, it appears that we have found the source of the problem, although it
is still not clear why the wrong offsets are calculated when optimisations
are enabled; I will submit a bug report when I have a moment.
As a workaround I noticed that if I use a global variable for the
`event_t` then everything works correctly, even with optimisations enabled.
Luckily for my actual use case this is a viable option, so I will be
able to keep working until a fix is released.


[mspgcc]: http://www.ti.com/tool/msp430-gcc-opensource
[mspdebug]: https://github.com/dlbeer/mspdebug
new file mode 100644
@@ -0,0 +1,133 @@
---
title: How I learned to stop worrying and love the rebase
date: 2018-10-20
tags:
  - coding
  - git
---

I've been using Git for nearly ten years now. Ten years is a long time, and I've been able to try
different approaches and evaluate how effective they are in my workflow. I've also had the opportunity to
teach Git to others; both to colleagues in an informal environment, and to students in the more structured
environment of the Casimir graduate school programming course. This experience has given me the chance to reflect on the
Git workflow and how best to use the tool.

There's one topic in particular that often comes up among people who have used Git for a while, and
there never seems to be any consensus on how to use it properly: `git rebase`.

## What is `rebase`?

Let's start with a quick recap of what `git rebase` does for us. Let's say that we're developing a new
feature on an aptly-named branch:

                                              ◯—◯ ← feature
                                             ╱
                                        ◯—◯—◯ ← master

We then pull in some changes from master, so that the histories for the master and feature
branches are now divergent:

                                              ◯—◯ ← feature
                                             ╱
                                        ◯—◯—◯—◯—◯ ← master

Now, if the changes made on `master` were made to the same places in the same files as the
changes on `feature`, then we know that when we finally merge our feature branch we're going
to get conflicts. It's a general rule that the longer you leave a branch un-merged, the
more likely it is that you are going to get conflicts. Generally, while we're developing on
`feature` we're going to want to incorporate the changes from `master` every so often, so
that we don't have to deal with all the merge conflicts at once during the final merge.
At this point we have 2 options for incorporating the changes from `master`:

                                          ◯—◯—◯ ← feature      ╮
                                         ╱   ╱                 │ merge
                                    ◯—◯—◯—◯—◯ ← master         ╯

                                              ◯—◯ ← feature    ╮
                                             ╱                 │ rebase
                                    ◯—◯—◯—◯—◯ ← master         ╯

See what we did? Rebase allows us to "chop" the link attaching the base
of the `feature` branch and re-attach it (re-*base*, geddit?) to the commit
where `master` is pointing now.

Then we add a couple more commits and merge:

                                      ◯—◯—◯—◯—◯ ← feature      ╮
                                     ╱   ╱     ╲               │ merge
                                ◯—◯—◯—◯—◯———————◯ ← master     ╯

                                          ◯—◯—◯—◯ ← feature    ╮
                                         ╱       ╲             │ rebase
                                ◯—◯—◯—◯—◯—————————◯ ← master   ╯

Using `rebase` in this way allows us to maintain an almost-linear history (i.e. we could
always fast-forward when merging instead of creating an explicit merge commit), which makes
it easier to understand what we've done.

### Interactive `rebase`

The above usage of rebase is pretty uncontentious; you start to get divided opinions when
you start talking about *interactive rebase*, which allows us to rewrite history in more
exotic ways. For example, we can use interactive rebase to re-order commits or squash them
together:

                                              A B C D
                                              ◯—◯—◯—◯ ← feature
                                             ╱
                                    ◯—◯—◯—◯—◯ ← master

                                              C' B' A+D
                                              ◯——◯———◯ ← feature
                                             ╱
                                    ◯—◯—◯—◯—◯ ← master

Developing is an inherently iterative process; your understanding of a problem evolves
as you work on the solution. This means that the logical separation of ideas may not
become apparent until *after* the fact. Git rebase can help us express the *logical*
set of changes, rather than the (convoluted) set of changes as they actually happened.

### So what's the problem?

Rebase *rewrites history*. Each git commit contains a pointer to the parent commit(s), so
when we rebase a set of commits they won't hash to the same values as they did before the
rebase, even though the *changeset* may be the same.
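To make "won't hash to the same values" concrete, here is a toy model in Python (an illustration of the idea only, not Git's actual object format):

```python
import hashlib

def commit_id(parent_id, change):
    # A commit's identity covers its parent's identity, so changing the
    # parent changes the id -- even when the change itself is identical.
    return hashlib.sha1((parent_id + change).encode()).hexdigest()

root = commit_id("", "initial commit")
feature = commit_id(root, "add feature")           # branched off root

master = commit_id(root, "unrelated master work")  # master moved on
rebased = commit_id(master, "add feature")         # same change, new parent

assert feature != rebased  # identical changeset, different commit id
```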

This rewriting of history makes it problematic to use rebase on branches that are also being
worked on by other people, and it's generally accepted wisdom not to use rebase on any
branch that you've pushed to a remote repository (i.e. made public).


## My Git workflow

When conducting scientific experiments, one will typically
keep a lab book, which contains notes, observations and key results as they occur. The
goal of keeping a lab book is to make sure that *you don't forget what you were doing*.
The goal of a lab book is, however, *not* to communicate results to a wider community.
A lab book — despite being an accurate record — requires *context* to understand; it
is messy, and does not present information in a way that someone without the relevant
context can easily understand. A *scientific article*
— on the other hand — is designed to disseminate information to a wide audience, and to give
the necessary context to understand any conclusions. When doing science, *both* of these
ways of working are necessary: an *accurate recollection* of what has been done, and then
a *reorganisation* and *reinterpretation* of what was done.

In my daily work I use Git as both a *lab book* and a *scientific article*. When I am developing
a new feature or fixing a bug I will create a new branch, and then start experimenting; committing
whenever I make incremental progress towards my goal. This incremental progress will certainly include
many dead-ends and false starts, and that's fine. By committing early and committing often I can ensure
that any work I do won't be lost. However, when it's time to explain to other people what I've done, it's
time to *make sense* of that history. This is when I'll go through my lab book of commits and use the
power of `rebase` to sequence everything into *logical* changes. When my changes are reviewed there will
typically be small fixups (refactoring, naming fixes etc.). During the review I make these changes
as separate commits, which makes it easier for the reviewer to see that I have applied their suggestions.
Once the reviewer is happy I do one final pass with interactive rebase to incorporate the changes
into the commits where they make the most sense. I then rebase on top of the branch into which I'm
merging and perform the merge using the `--no-ff` option (to ensure that an explicit merge commit is made).

Enforcing this strategy for merging in changes has a few nice features. Firstly, the history is essentially
linear — any merges could have been "fast-forward" — which makes it easier to visualise in tools like `tig`
or `gitk`. Secondly, preserving the individual commits from each merge means that anyone looking back in
history can see the logical set of changes that went into implementing a particular feature or bugfix.
Finally, cleaning up the commits (i.e. not merging the "lab book" into the master branch) means that
anyone looking back in history will not have to sift through endless trivia to get to the meat of a changeset.
new file mode 100644
@@ -0,0 +1,142 @@
---
title: Google sheets authenticator for Jupyterhub
date: 2018-01-17
tags:
  - coding
  - jupyter
---

Back in November I was again involved in running the
programming course for the [Casimir graduate school][casimir]
of the universities of Delft and Leiden. In addition
to the usual tweaks to the material in response to the previous year's
feedback, we also wanted to tweak the setup of our programming environment.

[casimir]: https://casimir.researchschool.nl/

The course is taught in Python and we provide a [Jupyter][jupyter]-based
environment for our learners for the duration of the course by running our own
deployment of [JupyterHub][jupyterhub].
We've found that it's very effective in getting everyone up and running as quickly as
possible, as everyone has the same environment and it's super easy to push updates
to the course materials (though that's more due to the fact that we use Docker).

[jupyter]: https://jupyter.org/
[jupyterhub]: https://jupyterhub.readthedocs.io/en/latest/

When we ran the course in 2016 we were still relative noobs when it came
to Jupyterhub deployments, but after a year of experience setting up around 10 different
Jupyterhubs (with the help of our ever-evolving Ansible role!)
we were starting to get the hang of things. One thing in particular that we wanted
to streamline was the signup process.

When signing up for the course people give their
Github username (which we use in the Git portion of the course). This means that
we can use the [OAuthenticator][oauthenticator] module. However,
we still need to whitelist the usernames of participants, otherwise we'd be letting
anyone with a Github account into our environment!

[oauthenticator]: https://github.com/jupyterhub/oauthenticator

We had a few options as to how to do this. Last year we just manually added the names
to a whitelist file, but this is not optimal because the file is only read when the
hub starts, meaning that any people who sign up late need to be added manually (or
we'd have to bounce the hub just to update the whitelist).
In addition we wanted to be able to give people access to the hub as soon as
they signed up, so they could have time to get used to it and work through some of
the preliminary material if they wanted. Manually adding people just wasn't going
to cut it.
Another possibility was to make all the participants request access to a Github
organization (which we would set up specifically for the course) and use the new
"group whitelisting" functionality of OAuthenticator to whitelist everyone in that
organization. This was not ideal either, as we would need to manually accept each
participant's request to join the organization, and the whole point was to avoid
`O(N_participants)` effort!

The solution that we came up with was pretty hacky, but actually ended up
working perfectly for us. Learners would sign up using a google form that we
had prepared, and the submitted form data is magically added to a google docs
spreadsheet set up for the purpose.
Our idea was to "share" the google sheet via a web link,
which we could then fetch from within our whitelisting logic. While this might
seem insanely insecure (it seems like we're making private data public by sharing
using the web link), it's actually not that bad. The URLs that google docs
generates contain a random string of 20 or so alphanumeric characters that's
probably got as much entropy as a reasonable passphrase (sounds like a good topic
for a future blog post!). It goes without saying that we only hit this URL using
HTTPS and don't ever share it around in non-secure channels.
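That entropy claim is easy to sanity-check with a back-of-the-envelope calculation (assuming 20 characters drawn uniformly from the 62 alphanumerics):

```python
import math

# Each character drawn uniformly from a-z, A-Z, 0-9 carries log2(62) bits.
bits = 20 * math.log2(62)
print(round(bits))  # 119 bits -- comparable to a strong passphrase
```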

The following 50(ish) line snippet is the whole thing! (also available
as a [gist][gist]).

[gist]: https://gist.github.com/jbweston/389fad330108f12c816b21da162fb123

```python
import csv

from tornado import gen
from tornado.httpclient import AsyncHTTPClient


@gen.coroutine
def get_whitelist(sheets_url, usernames_field):
    # Get CSV from sheet
    client = AsyncHTTPClient()
    resp = yield client.fetch(sheets_url)
    raw_csv = resp.body.decode('utf-8', 'replace').split('\n')

    reader = csv.reader(raw_csv)

    # Extract column index of usernames
    headers = next(reader)
    try:
        username_column = headers.index(usernames_field)
    except ValueError:
        raise ValueError('header field "{}" not found in sheet {}'
                         .format(usernames_field, sheets_url))

    # Skip blank rows (e.g. a trailing newline in the CSV)
    usernames = [row[username_column] for row in reader if row]
    return usernames


class SheetWhitelister:

    sheets_url = 'https://docs.google.com/spreadsheets/d/xxxxxx'
    usernames_column = 'Github username'

    @gen.coroutine
    def check_whitelist(self, username):
        if super().check_whitelist(username):
            return True
        try:
            whitelist = yield get_whitelist(self.sheets_url,
                                            self.usernames_column)
            self.log.info('Retrieved users from spreadsheet: {}'
                          .format(whitelist))
            self.whitelist.update(whitelist)
        except Exception:
            self.log.error('Failed to fetch usernames from spreadsheet',
                           exc_info=True)
        return (username in self.whitelist)
```

The above defines a mixin class, `SheetWhitelister`, that we can use with an
existing Jupyterhub authenticator to "plug in" the custom whitelisting
logic. To actually use it in the Jupyterhub config we'd need to combine
it with an existing authenticator (e.g. Github), as below:

```python
from oauthenticator.github import GitHubOAuthenticator

class GithubWithSheets(SheetWhitelister, GitHubOAuthenticator):
    pass

c.JupyterHub.authenticator_class = GithubWithSheets
```

I'm really not a fan of the mixin class pattern because you always need
to make these boilerplate classes that combine all the required
functionality, and combining these behaviours at runtime
is more cumbersome. Give me a nice functional strategy pattern any day!
But hey, it works so I can't complain, and hopefully somebody on the internet
will find this useful.
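For what it's worth, the functional alternative being wished for might look roughly like this (a hypothetical sketch only -- Jupyterhub authenticators do not actually accept pluggable check functions like this):

```python
# Compose username checks as plain functions instead of mixin classes.
def any_of(*checks):
    # Allow a username if any of the given checks passes.
    return lambda username: any(check(username) for check in checks)

def static_whitelist(names):
    allowed = set(names)
    return lambda username: username in allowed

def sheet_whitelist(fetch_usernames):
    # `fetch_usernames` would hit the spreadsheet; here it's a stub.
    return lambda username: username in fetch_usernames()

check = any_of(static_whitelist({'alice'}),
               sheet_whitelist(lambda: ['bob', 'carol']))
```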
new file mode 100644
@@ -0,0 +1,350 @@
---
title: Fizzbuzz in Haskell
date: 2018-02-20
tags:
  - coding
  - haskell
---

Continuing in the vein of cool Haskell examples I find on the internet, this
post is going to be about a particularly epic [fizzbuzz][fb] implementation that
I saw in a [three-year-old Reddit thread][reddit]. Now, the OP in that thread
had a serviceable but run-of-the-mill fizzbuzz implementation, but what caught
my eye was the top-voted comment. The author (who has since, sadly,
deleted their account, or I would have credited them here) had accomplished
fizzbuzz in a mere 2 lines of code! Here is the snippet copied verbatim:

```haskell
let (m ~> str) x = str <$ guard (x `mod` m == 0)
in map (fromMaybe . show <*> 3 ~> "fizz" <> 5 ~> "buzz")
```
[reddit]: https://www.reddit.com/r/haskell/comments/2cum9p/i_did_a_haskell_fizzbuzz/
[fb]: http://wiki.c2.com/?FizzBuzzTest

Seeing this was one of those moments where you just say "oh man, I *have* to
understand how this works!". Luckily there were a few people in that thread who
had already hashed out explanations, so I could already get the gist of what was
going on. This post is going to be an attempt to explain the above two lines to
myself.

#### Let's go
The fizzbuzz two-liner is a single expression with a `let` binding that defines
an operator called `~>`. We shall put the `let` binding to one side for the
moment and concentrate just on the core expression:

```haskell
map (fromMaybe . show <*> 3 ~> "fizz" <> 5 ~> "buzz")
```
OK so we're using the function `map`, which has the signature `map :: (a -> b) ->
[a] -> [b]`, and we've applied it to a single argument, meaning that the
bit in parentheses must be a function `a -> b`. Now, the core of fizzbuzz is all
about turning integers into strings (arbitrary integers into their string
representation, multiples of 3 into "fizz" etc.) so we can probably assume that we
will be mapping over a list of integers and producing a list of strings.

We can test this hypothesis by loading the two-liner into GHCi (we have to add
the imports -- which I got by [hoogling][hoogle] the function names that GHCi
didn't know about).

```haskell
λ> import Control.Monad (guard)
λ> import Data.Monoid ((<>))
λ> import Data.Maybe (fromMaybe)
λ> let (m ~> str) x = str <$ guard (x `mod` m == 0)
λ> let core = (fromMaybe . show <*> 3 ~> "fizz" <> 5 ~> "buzz")
λ> :t core
core :: (Show a, Integral a) => a -> String
```
This seems to check out; the type signature looks a bit weird because Haskell
derives the most general signature it can, but we can interpret it as `core ::
Integer -> String`.

[hoogle]: https://www.haskell.org/hoogle/

#### From abstract to concrete
Ok, so now we're going to start from the `core` expression (adding clarifying
parentheses):

```haskell
(fromMaybe . show) <*> (3 ~> "fizz" <> 5 ~> "buzz")
```
Let's analyse this from the outside in by first looking at the types of the
arguments on either side of the `<*>`:

```haskell
λ> :t (fromMaybe . show)
fromMaybe . show :: Show a => a -> Maybe String -> String
λ> :t (3 ~> "fizz" <> 5 ~> "buzz")
... :: (Alternative f, Integral a) => a -> f String
```
Hmm, the first one is kind of understandable, but the second one is still quite
abstract. In order to make this more concrete we could try to glue these pieces
together with `<*>`. Let's remind ourselves of the signature for `<*>`:

```haskell
λ> :t (<*>)
(<*>) :: Applicative f => f (a -> b) -> f a -> f b
```
Now we have all the ingredients; let's try and match the type signatures for
the previous expressions with the (very abstract) one for `<*>`:

```haskell
f     (a            -> b)      -> f     a         -> f     b
a' -> (Maybe String -> String) -> a' -> f' String -> a' -> String
```
So the `Applicative` structure `f` matches up with the `a' ->`, and the `f'`
matches up with the `Maybe`. Given that we know that the whole combination needs
to give something of type `Integer -> String`, this fixes `a'` in the above to
be `Integer` (and hence `f` to be "a function that takes an integer").

Just to make things crystal clear let's rewrite the signatures for the two
sub-expressions using the concrete types that we managed to deduce:

```haskell
(fromMaybe . show) :: Integer -> Maybe String -> String
(3 ~> "fizz" <> 5 ~> "buzz") :: Integer -> Maybe String
```
This is pretty cool; by combining several expressions that individually have
very abstract types we've managed to deduce *concrete* types for these
expressions!

We can also see that by using `<*>` we're using the [`Applicative` instance of
functions][app] to elide the `Integer` parameter to the two sub-expressions. We
could rewrite `core` like so:

```haskell
core n = fromMaybe (show n) $ (3 ~> "fizz" <> 5 ~> "buzz") n
```
which is, in my opinion, more explicit but much less readable!

[app]: https://hackage.haskell.org/package/base-4.10.1.0/docs/src/GHC.Base.html#local-6989586621679017723

#### Down the layers
Now that we have these concrete types we can start understanding how everything
fits together.

`fromMaybe` has signature `a -> Maybe a -> a`; it takes a default value and a
`Maybe` value, and returns the default value if the `Maybe` is `Nothing`. In code:

```haskell
fromMaybe a (Just b) = b
fromMaybe a Nothing = a
```
In `core` the default value is `show n`, where `n`
is the number we're fizz-buzzing. This makes sense, as if `n` is not divisible
by 3 or 5 then we should show just the number itself.

We can therefore see that `3 ~> "fizz" <> 5 ~> "buzz"` takes `n` and should
return `Nothing` if `n` is not divisible by 3 or 5, and `Just "something"`
otherwise.

Given this, it makes sense to first look at `3 ~> "fizz"` in
isolation. If we look at the type signature for `<>`:

```haskell
λ> :t (<>)
(<>) :: Monoid m => m -> m -> m
```
we see that it takes two things of type `m` and produces a third thing of the
same type. We can therefore deduce that the type of `3 ~> "fizz"` is the same as
that of the whole expression `3 ~> "fizz" <> 5 ~> "buzz"`, and is therefore `Integer ->
Maybe String`.

To understand how `3 ~> "fizz"` works we'll first have to look at the definition
of `~>` again:

```haskell
(m ~> str) x = str <$ guard (x `mod` m == 0)
```
Ok, the last bit, ``x `mod` m == 0``, is clearly checking whether `x` is
divisible by `m`. Let's look at the signatures of `<$` and `guard`:

```haskell
λ> :t (<$)
(<$) :: Functor f => a -> f b -> f a
λ> :t guard
guard :: Alternative f => Bool -> f ()
```
Ok, so `<$` seems to take two arguments, the second one being a functorial one,
and returns the first value in the functorial context of the second value. If I
had to guess I would say that it's implemented like so:

```haskell
a <$ fb = fmap (const a) fb
```
or, in point-free style:

```haskell
(<$) = fmap . const
```
Looking at the definition of `~>` again we can see that the expression evaluates
to `str` put into the functorial context of ``guard (x `mod` m ==
0)``. What the hell does that mean?

Once again we're getting hit by the fact that the type signatures of the
individual pieces are too general; we need to put stuff back into context and
"match up the types" to understand what is really going on.

We know that ``str <$ guard (x `mod` m == 0)`` must have type `Maybe String`,
and we know that `str` has type `String` and that `guard` returns an `f ()` where `f`
is some functor (`Alternative` being a subclass of `Functor`). We can therefore
see that ``guard (x `mod` m == 0)`` must have type `Maybe ()`. This
means that the only values this expression can have are `Just ()` and `Nothing`.

Combined with the `<$` we can therefore see that `(m ~> str) x` evaluates to
`Just str` when `m` divides `x`, and `Nothing` otherwise.

+
198
+##### Down, down, down
199
+
200
+So now we've understood *that* layer of structure, let's see if we can
201
+understand the combination `3 ~> "fizz <> 5 ~> "buzz`. Because we'll be
202
+referring to this thing a few times, I'm going to give it the name `buzzer`, so
203
+
204
+```haskell
205
+buzzer :: Int -> Maybe String
206
+buzzer = 3 ~> "fizz" <> 5 ~> "buzz"
207
+```
208
+The expressions on either side of the `<>` are *functions* from `Integer` to
209
+`Maybe String`. `<>` is [defined as follows][func] between functions:
210
+
211
+```haskell
212
+(f <> g) x = f x <> g x
213
+```
214
+[func]: https://hackage.haskell.org/package/base-4.6.0.1/docs/src/Data-Monoid.html#line-105
215
+
216
+so clearly for this to work `f` and `g` must have the same signature, *and*
217
+the return value must itself be a monoid. We know that `f` and `g` return
218
+`Maybe String` for our case. `Maybe` is indeed a monoid if the thing that
219
+it contains is also a monoid; we just identify `Nothing` with the monoidal
220
+identity for the contained values and we're done. `String` is, of course,
221
+a monoid with the empty string as its identity element and concatenation
222
+as its `<>`.
223

Putting all this together we can see how `buzzer` actually
works. We can explicitly treat each case: not divisible by 3 or 5, divisible
by either 3 or 5, divisible by both 3 and 5.

When we apply `buzzer` to a number that is neither divisible by 3 nor by 5
then both of the subexpressions evaluate to `Nothing` and we get
`Nothing <> Nothing`, which is just `Nothing`. In the second case we get
either `Just "fizz" <> Nothing` or `Nothing <> Just "buzz"`, which evaluate
to `Just "fizz"` and `Just "buzz"` respectively (thanks to the monoid on
`Maybe`). In the final case we get `Just "fizz" <> Just "buzz"`, which
evaluates to `Just ("fizz" <> "buzz")`, which is `Just "fizzbuzz"`.

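The same case analysis can be mimicked in Python, with `None` playing `Nothing` and string concatenation as the monoid (a loose translation for readers less fluent in Haskell, not anything from the original post):

```python
def rule(m, s):
    # Analogue of `m ~> s`: s on multiples of m, else None.
    return lambda n: s if n % m == 0 else None

def combine(f, g):
    # Analogue of `<>` on (Integer -> Maybe String): None is the
    # identity, and two hits are concatenated ("fizz" + "buzz").
    def h(n):
        a, b = f(n), g(n)
        if a is None:
            return b
        if b is None:
            return a
        return a + b
    return h

buzzer = combine(rule(3, "fizz"), rule(5, "buzz"))

def fizzbuzz(n):
    # Analogue of `fromMaybe . show <*> buzzer`.
    result = buzzer(n)
    return str(n) if result is None else result
```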
+#### Putting it all together
237
+Now comes the question of how we would rewrite this fizzbuzz so that it's
238
+easier to understand. On one hand we want to use abstraction to help us reveal
239
+the actual structure of the problem (without getting bogged down in the messy
240
+details) and on the other hand we don't want to abstract into the stratosphere
241
+so that it's no longer clear what our intention is.
242
+
243
+My compromise would probably look something like this:
244
+
245
+```haskell
246
247
+import Data.Monoid ((<>))
248
+import Data.Maybe (fromMaybe)
249
+
250
+(m ~> str) x = if x `mod` m == 0
251
+    then Just str
252
+    else Nothing
253
+
254
+fizz_or_buzz :: Integer -> Maybe String
255
+fizz_or_buzz =
256
+        3 ~> "fizz"
257
+    <>  5 ~> "buzz"
258
+
259
+fizzbuzz :: Integer -> String
260
+fizzbuzz = fromMaybe <$> show <*> fizz_or_buzz
261
+
262
+main = traverse putStrLn $ map fizzbuzz [1..100]
263
+```
264
+
265
+Essentially I made the following changes:
266
+
267
++ I preferred an explicit 'if-then-else' over the use of `guard` and `<$`,
268
+  but did not apply a type signature to `~>` as I feel it would obscure, rather
269
+  than clarify, meaning.
270
++ I put an explicit type signature on the piece that handles the fizzing and
271
+  buzzing, but kept the abstract monoidal composition. I think that even if
272
+  someone is not 100% clear on how all the monoid instances interact, the
273
+  signature and definition make it obvious what this piece is doing. In addition
274
+  the formatting makes it easy for someone else to modify the code, say to
275
+  add printing of "baz" if the number is divisible by 7, or to reverse the
276
+  order of "fizz" and "buzz".
277
++ I preferred using applicative style for both of the arguments to `fromMaybe`;
278
+  in my opinion this clarifies intent drastically.
279
+
280
+So in the end we have not actually changed too much: the code still works in
281
+essentially the same way; I just clarified intent by adding
282
+explicit names to things, adding type signatures, and using explicit
283
+language features as opposed to what I consider excessive abstract logic.
284
+
285
+Of course, the changes I made are coming from a place of ignorance; I am a
286
+total Haskell noob, so the things that are not obvious to me could well be
287
+obvious for a Haskell veteran. For example, the fact that I chose to keep the
288
+`fromMaybe <$> show <*> fizz_or_buzz` is due to the fact that I understand and
289
+know how to use the applicative instance of functions; maybe if I had more
290
+experience using `guard` and `<$` I would find the initial two-liner clearer
291
+than my explicit 'if-then-else'. I guess only time will tell.
292
+
293
+
294
+### Thoughts
295
+
296
+#### Spaghetti
297
+People complain about object oriented programming because when you make a method
298
+call you have no idea what code is actually getting called ('cause dynamic
299
+dispatch + having to follow the object's method resolution order). I would posit
300
+that finding the definition of any of the functionality defined in a typeclass
301
+is the same thing. From a function definition it is sometimes impossible to know
302
+what code will actually be run because it can depend on the type of the
303
+arguments; you need to go to the call site to find out what will happen.
304
+
305
+In addition, I find that the abstract level at which Haskell operates sometimes
306
+confuses more than it helps. Even though `a -> b` and `Maybe a` both have
307
+monoid instances, the meaning is *totally* different for the two. In my opinion
308
+this is a case where treating things too generally can actually obscure meaning.
309
+
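As a concrete illustration (the example values are mine), here are the two monoids side by side:

```haskell
-- Both uses of `<>` below are "the monoid", but they mean very different things.
combinedMaybe :: Maybe String
combinedMaybe = Just "foo" <> Nothing <> Just "bar"  -- collect results, tolerating failure

combinedFunc :: Int -> [Int]
combinedFunc = (: []) <> (\x -> [x * 2])             -- pointwise combination of functions

main :: IO ()
main = do
    print combinedMaybe     -- Just "foobar"
    print (combinedFunc 3)  -- [3,6]
```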
310
+#### Work from the outside in
311
+I found that the complexity from overgeneralising can be combated by working
312
+top-down. You first need to figure out the type of the top-level/outermost
313
+expression and work inwards. If you start out trying to understand the types for
314
+the constituent expressions, often they will be too general for you to be able to
315
+understand why they are being used in the first place.
316
+
317
+By starting from the outermost expression you can apply the technique of
318
+"matching up the types" to figure out what is going on one layer down, and then
319
+carry on recursively like this until you have the concrete types for the
320
+innermost expressions.
321
+
322
+#### Is there such a thing as being *too* general?
323
+Abstraction is in some sense the essence of programming computers. It allows us
324
+to see the forest instead of the trees and often enables thinking about
325
+problems in a more fruitful way, i.e. *closer to the domain in which the problem
326
+was originally defined*. Many languages define abstract (as opposed to concrete)
327
+concepts. Python (my go-to language) has the concept of a `sequence`, an
328
+`iterable`, a `mapping` etc. These are all useful concepts, as they signal
329
+*intent*; we can define an algorithm that works on any `iterable`, and this
330
+gives us the freedom to pass it an array, linked list, or anything else that can
331
+be iterated over. Someone reading the algorithm doesn't need to care about the
332
+actual type that is passed in to understand what is going on.
333
+
334
+Haskell takes this one step further with `Functor`, `Applicative` and `Monad`, and
335
+I am yet to be convinced that this is actually useful for a wide variety of
336
+cases. Even if `Maybe` and `List` can both formally be considered as applicative
337
+functors, the applicative instances for these two types *mean totally different
338
+things*. `Maybe` represents computations that can fail, whereas the `List`
339
+`Applicative` instance represents all possible combinations of the provided
340
+computations. If I write some code that does something with a general
341
+`Applicative`, I don't really know what the code *means* before I apply it to
342
+concrete types. This means that *even if* I can formulate an algorithm using
343
+only `Applicative`, *naming* this thing sensibly is going to be a real
344
+challenge.
345
+
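For example (the values are mine), the very same expression shape reads completely differently at the two types:

```haskell
-- One Applicative expression shape, two meanings at two concrete types.
mayFail :: Maybe Int
mayFail = (+) <$> Just 1 <*> Just 2      -- Just 3: a computation that may fail

allCombos :: [Int]
allCombos = (+) <$> [1, 2] <*> [10, 20]  -- [11,21,12,22]: every combination

main :: IO ()
main = print (mayFail, allCombos)
```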
346
+On the other hand, some very smart people clearly think that thinking at this
347
+level of abstraction *does* produce better software, and I am still very new to
348
+Haskell and functional programming in general. I would really like to see a good
349
+set of concrete examples that show how abstracting into the stratosphere like
350
+this is actually beneficial and produces code that is more maintainable.
0 351
new file mode 100644
... ...
@@ -0,0 +1,202 @@
1
+---
2
+title: Writing a snake clone in Haskell, part 1
3
+date: 2017-11-16
4
+tags:
5
+  - coding
6
+  - haskell
7
+---
8
+
9
+After my recent dive into Haskell I was keen to try a small project to
10
+test out what I had learned. After watching a bunch of YouTube videos
11
+from various Haskell conferences I came across one by
12
+[Moss Collum](https://github.com/moss) where he describes how he built
13
+a series of Rogue-like games in Haskell over the course of a week.
14
+
15
+I took a look at Moss' [code](https://github.com/moss/haskell-roguelike-challenge)
16
+and thought it was a pretty neat idea, however I wanted to try and make
17
+a snake game instead (nostalgia for the Nokia 3310, I guess). If you don't
18
+know what snake is, here's a sweet GIF of a Russian guy getting a perfect game:
19
+
20
+![snake](https://media.giphy.com/media/8D0yR4ylkAC1G/giphy.gif)
21
+
22
+You move a snake around the screen trying to gobble up pieces of food. The snake
23
+moves forward 1 space every second or so autonomously, and each piece
24
+of food you eat makes the snake grow 1 space longer. You die if the snake hits
25
+the walls or its own tail.
26
+
27
+The snake games that I saw on Hackage all seemed to be projects for the author
28
+to learn how to use a specific library, and I found that as a consequence the
29
+code logic was somewhat obscured. I wanted something *much* simpler:
30
+a terminal application controlled by sending keypresses to `stdin`, and with
31
+ASCII "graphics". I specifically wanted to avoid using game libraries; after all, my aim
32
+was to exercise my Haskell knowledge, not to make a novel gaming experience!
33
+
34
+
35
+### Let's go
36
+
37
+All of the code described here is available [on Github](https://github.com/jbweston/haskell-snake).
38
+
39
+#### First iteration
40
+
41
+I started out by looking at some of Moss' code to see an example of how
42
+I could proceed. I decided that the first thing I would do
43
+would be to have a snake of fixed length moving around the screen
44
+in response to keypresses: no "food" to grow the snake, no boundaries,
45
+no collision detection and (most importantly) the snake does not move
46
+by itself.
47
+
48
+The best way of proceeding seemed to be to model the game as a sequence
49
+of transformations on an initial state of the game's world. The
50
+transformations to apply are determined by the commands typed by the
51
+player. Moss' code took advantage of Haskell's lazy IO to get an
52
+(infinite) list of keypresses from `stdin` and then used this
53
+as the sequence of transformations. This is captured by the
54
+following code:
55
+
56
+```haskell
57
+parseInput :: [Char] -> [Direction]
58
+...
59
+advance :: World -> Direction -> World
60
+...
61
+input <- getContents
62
+let states = scanl advance initialWorld (parseInput input)
63
+```
64
+
65
+The last two lines are from the `main` function, and the preceding
66
+lines are the type signatures necessary to understand them. We
67
+can see that we first take the raw input from the user (via the
68
+`getContents` IO action) and parse the sequence of raw keypresses
69
+(the infinite list of `Char`) into a sequence of `Direction`s in
70
+which to move the snake. We then do a left scan of the `advance`
71
+function over this sequence of directions, starting with the
72
+world in its initial state, to generate a sequence of states
73
+of the world! `parseInput` also handles quitting the game when
74
+the user presses `q`. We model this by terminating the
75
+sequence of directions when we detect that `q` was typed.
76
+
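The actual `parseInput` lives in the linked repository; a minimal sketch of the idea, with key bindings I've assumed purely for illustration, might look like this:

```haskell
-- A hypothetical sketch of parseInput: stop at 'q', keep only movement keys.
-- The key bindings (wasd) are my assumption, not necessarily the repo's.
data Direction = North | South | East | West deriving (Show, Eq)

parseInput :: [Char] -> [Direction]
parseInput = map toDirection . filter (`elem` "wasd") . takeWhile (/= 'q')
  where
    toDirection 'w' = North
    toDirection 'a' = West
    toDirection 's' = South
    toDirection 'd' = East
```

Thanks to laziness this works just as well on the infinite list of characters that `getContents` produces.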
77
+Once we have this sequence of game worlds we just need to
78
+draw them to the screen. Naively I initially did the
79
+following [^1]:
80
+
81
+```haskell
82
+drawWorld :: World -> IO ()
83
+...
84
+mapM_ (\s -> clearScreen >> drawWorld s) states
85
+```
86
+
87
+i.e. I cleared the screen before drawing the new state. Unfortunately
88
+this caused the screen to flicker every time the world state
89
+updated, and I guessed (correctly) that it was because of the
90
+`clearScreen` taking just long enough to be noticeable. My solution
91
+was instead to "update" the screen:
92
+
93
+```haskell
94
+drawUpdate :: (World, World) -> IO ()
95
+...
96
+mapM_ drawUpdate $ zip states (tail states)
97
+```
98
+
99
+`drawUpdate` is actually pretty dumb; it just "deletes" the snake
100
+in the previous world by writing a space character to every position
101
+the snake occupied, then draws the snake position in the new world
102
+by writing a `@` at every position it occupies.
103
+
104
+The result can be seen below
105
+
106
+<video src="/images/snake/basic.webm" autoplay loop></video>
107
+
108
+This is smashing, but is clearly not really a snake game yet!
109
+We have to add a few more ingredients to make it more like
110
+the game I remember from the old Nokia phones.
111
+
112
+[^1]: `mapM_` maps a function that returns an IO action (more generally,
113
+     any monadic value) over a list, then sequences those actions, discarding their results.
114
+
115
+#### Adding extra ingredients
116
+
117
+The first thing to do was to actually make it possible to lose the game.
118
+This involved detecting collisions between the snake and itself or with
119
+the boundary. After these additions I had something that looks like this:
120
+
121
+<video src="/images/snake/with-walls.webm" autoplay loop></video>
122
+
123
+The final piece of the puzzle (for now) was to add the food that could
124
+be eaten and would reappear in a random location. This led to the
125
+final iteration:
126
+
127
+<video src="/images/snake/simple-complete.webm" autoplay loop></video>
128
+
129
+This is already starting to look a lot like what I had initially envisioned!
130
+The next step (which I will detail in a subsequent post) is to make the
131
+snake move in the last direction selected every second or so. This will
132
+probably require a rewrite of much of the code; we'll need to have
133
+another "source" for direction commands, and probably different threads
134
+to do the waiting.
135
+
136
+
137
+### Thoughts
138
+
139
+Writing this short program was really a lot of fun. In addition, it
140
+taught me a bunch of stuff about writing Haskell programs! Below
141
+are a few points that I came to appreciate during this project.
142
+
143
+
144
+#### Type signatures are your documentation
145
+
146
+I was startled by how much intention could be gleaned just from the
147
+type signatures and sensibly naming the functions. Given that I had
148
+some context about the program as a whole I found that the meaning
149
+of most of the functions became self evident. For example, given that
150
+I know that the  `World` datatype contains the state of the game world,
151
+and `Direction` is an order to move the snake in a particular
152
+direction, the meaning of
153
+
154
+```haskell
155
+advance :: World -> Direction -> World
156
+```
157
+
158
+is obviously "advance the state of the game world in response to
159
+an order to move in a particular direction".
160
+
161
+I realise that my perspective on this is pretty skewed due to the short length of
162
+the program I was writing (you can hold the context of the whole program in
163
+your mind at once), but I get the impression that even with longer programs
164
+this concept that the type declarations *are* (for many functions) your documentation
165
+is quite prevalent. This is very different to Python, for example, where we are
166
+encouraged to document every function and detail its parameters.
167
+
168
+
169
+#### If your program compiles, chances are it is correct
170
+
171
+I had read this claim in several places on the web and was initially sceptical,
172
+but can now anecdotally confirm it to be true! I reckon this surprising property
173
+of Haskell is due to the fact that Haskell programs naturally need to be decomposed
174
+into teeny weeny functions that do literally only one thing. As far as I can tell
175
+this need to decompose your program into much smaller pieces than you otherwise
176
+would is a consequence of Haskell's purity. We can't hold any mutable state in "variables",
177
+and each function returns an output and does nothing else, so there's not really much
178
+"room" to do much else than just quickly compute a value and return it. It thus
179
+becomes abundantly clear when you are writing a function whether it is correct or not,
180
+as you often only have to verify that a few expressions are correct.
181
+
182
+Let's take the `advance` function (before we added the food) as an example.
183
+We want it to move the snake
184
+in a particular direction, unless that direction is the *opposite* direction
185
+to the direction in which the snake is currently moving (in which case
186
+`advance` should not change the state of the world). In code:
187
+
188
+```haskell
189
+advance :: World -> Direction -> World
190
+advance w newDir
191
+    | newDir == opposite (direction w) = w
192
+    | otherwise = World { snake = slither (snake w) newDir
193
+                        , direction = newDir
194
+                        }
195
+```
196
+
197
+The above code is obviously correct; if the new direction is opposite to the
198
+current direction, then just return the current world state, otherwise
199
+return a state of the world where the snake has slithered in the new direction.
200
+Of course we still need to verify that `opposite` and `slither` are implemented
201
+correctly, but because they have similarly restricted scopes it becomes just as
202
+easy to verify their correctness.
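For instance, a plausible `opposite` (the real one is in the repository) is small enough to verify at a glance:

```haskell
-- A sketch of `opposite`; the Direction constructors are assumed.
data Direction = North | South | East | West deriving (Eq, Show)

opposite :: Direction -> Direction
opposite North = South
opposite South = North
opposite East  = West
opposite West  = East
```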
0 203
new file mode 100644
... ...
@@ -0,0 +1,143 @@
1
+---
2
+title: Writing a snake clone in Haskell, part 2
3
+date: 2018-01-27
4
+tags:
5
+  - coding
6
+  - haskell
7
+---
8
+
9
+In a [previous post](haskell-snake) I talked a bit about writing a snake game in
10
+Haskell. At the end of the post we had a working game, but there was one ingredient
11
+missing: the snake would not go anywhere by itself! The fundamental problem was that
12
+our game was being driven by Haskell's [lazy IO][lazy-io]. Whenever a new character
13
+appeared on `stdin` the runtime would crank the handle on our Haskell code,
14
+transforming this character into a sequence of IO actions that the runtime evaluates
15
+to print the game world to the screen.
16
+This use of lazy IO meant that basically all of the logic (except
17
+drawing to the screen) could take place outside the IO monad in nice, pure code.
18
+
19
+[lazy-io]: http://book.realworldhaskell.org/read/io.html#io.lazy
20
+
21
+The challenge now was to find a way of inserting an extra stream of "fake messages
22
+from the keyboard" that would be delivered at regular intervals (these would make
23
+the snake move forward without me having to type a key). It seemed to
24
+make sense to retain the "pipeline" structure of the code, so I thought about
25
+modifying it as illustrated by the following ascii-art:
26
+
27
+    directions from  >-+-------------------------+-> update game world
28
+       keyboard        |                         |    and draw update
29
+                       +-> forward most recent >-+
30
+                             every X seconds
31
+
32
+I came across the [Pipes](https://wiki.haskell.org/Pipes) library pretty
33
+quickly, and was delighted to see that the *first example* in the
34
+`pipes-concurrency` tutorial [is a game][pipes]! Essentially all I
35
+had to do was launch 3 threads that would run the above 3 components,
36
+with each one either feeding messages to, or reading messages from,
37
+a mailbox. The above diagram translates into the following haskell
38
+(inside the IO monad)
39
+
40
+```haskell
41
+(mO, mI) <- spawn unbounded
42
+(dO, dI) <- spawn $ latest West
43
+
44
+let inputTask = getDirections >-> to (mO <> dO)
45
+    delayedTask = from dI >-> rateLimit 1 >-> to mO
46
+    drawingTask = for (from mI >-> transitions initialWorld)
47
+                      (lift . drawUpdate)
48
+```
49
+
50
+We first create some mailboxes: the main one (`mO` and `mI`), which
51
+`drawingTask` will draw directions from, and the one that will handle
52
+the delayed directions (`dO` and `dI`). Then we build up some pipelines
53
+that feed messages to, and consume messages from, these mailboxes.
54
+All we need to do now is to run each of these pipelines in a separate
55
+thread using the `async` function. This is a bit involved
56
+because we first need to "unwrap" the pipeline into an IO action using
57
+`runEffect` (and perform garbage collection ¯\\\_(ツ)\_/¯).
58
+
59
+```haskell
60
+let run p = async $ runEffect p >> performGC
61
+tasks <- sequence $ map run [inputTask, delayedTask, drawingTask]
62
+waitAny tasks
63
+```
64
+
65
+[pipes]: https://hackage.haskell.org/package/pipes-concurrency-2.0.0/docs/Pipes-Concurrent-Tutorial.html
66
+
67
+
68
+The full code is [on Github][snake].
69
+
70
+[snake]: https://github.com/jbweston/haskell-snake
71
+
72
+
73
+### Thoughts
74
+
75
+#### Lots of stuff happens in monads
76
+I previously had the impression that Haskell code was super readable because
77
+it was composed of teeny tiny functions that only do one thing. However, after
78
+reading a bit of Haskell code (for example the [`Pipes.Concurrent`][concurrent]
79
+library) I realised that a lot of Haskell code is written inside monads which,
80
+in my opinion, harms readability. When I say that the code "happens in monads"
81
+what I really mean is that code is written using Haskell's [do notation][do]
82
+that allows you to write code that looks like it's imperative, but is really
83
+just a bunch of monadic compositions:
84
+
85
+```haskell
86
+do
87
+    x <- x_monad
88
+    y <- returns_a_monad x
89
+    return (x + y)
90
+```
91
+
92
+the above contrived example is equivalent to the following chain of monadic
93
+bind operations:
94
+
95
+```haskell
96
+x_monad >>= (\x -> returns_a_monad x
97
+             >>=
98
+               (\y -> return (x + y)))
99
+```
100
+
101
+which is certainly more difficult to read than the do notation!
102
+However, because it is easy to build up a lot of context when using do
103
+notation, I find it goes a bit against the grain of composing tiny
104
+functions that do only one thing. Hopefully as I gain competence in
105
+Haskell I'll be able to overcome these hurdles.
106
+
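Instantiating the contrived example above at a concrete monad (`Maybe`, with values I've picked) shows that the two forms really are the same thing:

```haskell
-- The do block and its desugared form, evaluated at Maybe.
withDo :: Maybe Int
withDo = do
    x <- Just 1
    y <- Just (x + 1)
    return (x + y)

withBind :: Maybe Int
withBind = Just 1 >>= (\x -> Just (x + 1) >>= (\y -> return (x + y)))

main :: IO ()
main = print (withDo, withBind)  -- (Just 3,Just 3)
```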
107
+[concurrent]: https://github.com/Gabriel439/Haskell-Pipes-Concurrency-Library/
108
+[do]: https://en.wikibooks.org/wiki/Haskell/do_Notation
109
+
110
+
111
+#### Haskell's import style is scary
112
+
113
+The language I have worked in most in recent years is Python. The
114
+[zen of Python][zen] teaches us that *explicit is better than implicit*,
115
+because it makes code easier to reason about. Given this, I find Haskell's
116
+default mode when importing modules somewhat scary. In Haskell, when you
117
+say `import foo`, this is equivalent to saying `from foo import *` in
118
+Python. This means that you get a bunch of arbitrary names injected into
119
+your namespace. This isn't quite as bad as `import *` in Python from
120
+a code-correctness perspective because Haskell is statically typed, and
121
+so any problems will (most probably) be caught at compile time. From a
122
+code readability perspective, however, I find it to be a complete nightmare;
123
+someone reading the code has no idea where an (often cryptically named)
124
+function comes from!  For example, `Pipes.Concurrent` exports a function
125
+called `spawn` that *creates a new mailbox*. Someone reading the code may
126
+naturally assume that `spawn` has something to do with creating new threads, 
127
+but without knowing even what module it comes from, it's very difficult to
128
+tell. Now Haskell experts may well respond with "read the code and the
129
+meaning will be obvious" or merely "get gud", but I would posit that *the whole
130
+point* of things like clear variable names and explicit imports is that
131
+you *shouldn't have to* "get gud" to get a sense of what some code
132
+is trying to do. Maintaining mental context is hard, and as
133
+communicators we should try and reduce the burden by not requiring people
134
+to retain excess information, such as which modules export exactly which functions.
135
+
136
+I am, of course, aware that Haskell has several variants of its import
137
+syntax, such as `import qualified` (which requires you to prepend the namespace,
138
+as you would with a regular `import` in Python) or by specifying explicitly
139
+which names should be imported. However, the overwhelming majority of Haskell
140
+code that I have read so far has made use of the unqualified syntax, making it
141
+more difficult than necessary to decipher people's code.
142
+
143
+[zen]: https://www.python.org/dev/peps/pep-0020/
0 144
new file mode 100644
... ...
@@ -0,0 +1,66 @@
1
+---
2
+title: Diving into Haskell
3
+date: 2017-11-08
4
+tags:
5
+  - coding
6
+  - haskell
7
+---
8
+
9
+Haskell has been, for a number of years, a language that I have always wanted to
10
+dive into. I've heard it lauded as the language of "true hackers",
11
+and it's somewhat of a sign that you've made it as a developer if
12
+you can make sense of its terse syntax and seemingly arcane concepts.
13
+No mutation? No for-loops? What?! How do you get *anything done* in
14
+the language if it doesn't have these most basic of control flow
15
+mechanisms?
16
+
17
+Well, the other day I saw the following snippet as a way of generating
18
+the Fibonacci sequence in Haskell:
19
+
20
+```haskell
21
+fibonacci = 1 : 1 : zipWith (+) fibonacci (tail fibonacci)
22
+```
23
+
24
+and I immediately knew that I needed Haskell in my life.
25
+I didn't even fully understand it on
26
+first glance; at that point all I knew about Haskell was that *spaces*
27
+were used for function application, rather than the more traditional `()`,
28
+but already I could see the outline of what the solution
29
+meant. The Fibonacci sequence is defined as `1`, `1`, then the sum of
30
+the previous number with the one before that, recursively. *But that's
31
+exactly what the above code says*. Even to the relatively untrained eye (mine)
32
+we can kind of see that the code is telling us to start with two `1`'s, then
33
+mash together the sequence we are currently building *with itself* (dropping
34
+the first element) using `+` as the "mashing operator".
35
+
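We can convince ourselves that the one-liner does what the prose says by taking a prefix of the sequence:

```haskell
-- The one-liner from above, plus a demonstration that it really works.
fibonacci :: [Integer]
fibonacci = 1 : 1 : zipWith (+) fibonacci (tail fibonacci)

main :: IO ()
main = print (take 8 fibonacci)  -- [1,1,2,3,5,8,13,21]
```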
36
+Let's contrast this to a least-effort implementation in Python that
37
+generates the same sequence:
38
+
39
+```python
40
+def fibs():
41
+    x = y = 1
42
+    yield y
43
+    while True:
44
+        yield x
45
+        x, y = x + y, x
46
+```
47
+
48
+This is, in my opinion, much harder to read than the Haskell version.
49
+I'm not exactly  sure why; maybe it's because the Haskell version is so
50
+terse that you can hold it all in your mind's eye at once, or maybe it's
51
+got something to do with the way our brains process recursion vs. mutating
52
+values. In any case this example was enough to hook me.
53
+
54
+I devoured the sublime "[Learn You a Haskell for Great Good!](http://learnyouahaskell.com/)"
55
+in the space of about a week, although I'm sure it will take a while before I
56
+fully digest the *meaning* of, e.g., functors and applicative functors (even if the mathematical
57
+definition is trivial). I think it's a testament to the quality of the exposition
58
+of this book that I was left with the distinct impression of having "got" monads
59
+after only a few readings (although I'm probably way off the mark). I'm not going
60
+to fall into the "[monads are like burritos](https://byorgey.wordpress.com/2009/01/12/abstraction-intuition-and-the-monad-tutorial-fallacy/)"
61
+trap, though; as far as I can tell, they appear to be just a particularly useful design pattern,
62
+and I am by far not the [first person](https://www.stephanboyer.com/post/9/monads-part-1-a-design-pattern)
63
+to draw this conclusion.
64
+
65
+My next step in Haskell is going to be to tackle a small project of very limited scope,
66
+to see if I can write anything beyond tutorial code; should be fun!
0 67
new file mode 100644
... ...
@@ -0,0 +1,118 @@
1
+---
2
+title: Isolating a Jupyterhub deployment
3
+date: 2017-02-02
4
+tags:
5
+  - docker
6
+  - jupyter
7
+---
8
+
9
+In the [research group][qt] that I am a part of we use [Jupyter][jupyter] and
10
+associated projects a *lot*.  In addition to the local Jupyter instances that
11
+people may run on their private machines, we also have a [Jupyterhub][jhub]
12
+deployment that spawns Jupyter servers in Docker containers that we use for
13
+research purposes, as well as other deployments that we use for guest
14
+researchers and teaching, among other things.
15
+
16
+One really useful recent addition to the Jupyter ecosystem is an authenticator
17
+plugin for Jupyterhub by [yuvipanda][yv] that will give a user a temporary
18
+account that will expire when they log out. Along with the [idle notebook
19
+culler][cull], this effectively allows us to set up a [tmpnb][tmpnb]
20
+deployment, but using all the existing infrastructure we have for deploying and
21
+managing Jupyterhub instances. We want to use this to host an interactive
22
+tutorial for our quantum transport simulation tool, [Kwant][kwant], that anyone
23
+can try out from wherever they are!
24
+
25
+While this would be really awesome, there is currently one problem:
26
+we run everything on our own hardware in the university, so giving random
27
+people on the internet access to Jupyter notebook servers inside the
28
+university firewall is a recipe for disaster. To get around this problem we will
29
+use the networking capabilities of Docker along with a few iptables rules
30
+to secure our deployment.
31
+
32
+#### Docker networking
33
+
34
+When you create a new Docker container it will, by default, be attached to
35
+the default network bridge used by Docker. All containers connected to the same bridge
36
+will be on the same IP subnet. Restricting access between containers
37
+in this configuration is possible but cumbersome (you'd need to write firewall rules
38
+targeting each container individually). It is much simpler to first create a new
39
+"docker network", to which you attach all the containers you want to have a similar network
40
+configuration.
41
+
42
+```bash
43
+$ docker network create --driver=bridge my_new_network
44
+48d08d196dc853e58c6115a6fab96ce84028ab68d6fa5d596c91adb406efb3ac
45
+```
46
+
47
+The above command creates a network called `my_new_network`, which we can attach
48
+newly created containers to when invoking `docker run`:
49
+
50
+```bash
51
+$ docker run --network=my_new_network debian:latest
52
+```
53
+
54
+In the context of Jupyterhub, this last step is actually done with the following
55
+configuration in `jupyterhub_config.py`:
56
+
57
+```python
58
+c.DockerSpawner.network_name = 'my_new_network'
59
+```
60
+
61
+When we execute `docker network create`, the Docker daemon actually creates a virtual
62
+ethernet bridge in the kernel. We can inspect this with `brctl`.
63
+
64
+```bash
65
+$ brctl show
66
+bridge name bridge id       STP enabled interfaces
67
+br-48d08d196dc8     8000.024245cf35a7   no
68
+docker0     8000.0242874f9221   no
69
+```
70
+
71
+We can see that our new docker network actually corresponds to the bridge interface `br-48d08d196dc8`.
72
+When a new Docker container is created its virtual network interface is attached to this
73
+bridge interface, just as if a physical machine were plugged into an Ethernet switch.
74
+
75
+If we want a more manageable name for the virtual bridge, say `my_bridge`, we can pass it as an argument to
76
+`docker network create`:
77
+
78
+```bash
79
+$ docker network create --driver=bridge -o "com.docker.network.bridge.name"="my_bridge" my_new_network
80
+```
81
+
82
+
83
+#### Applying IPTables rules
84
+We can now use the bridge interface in IPTables rules to control access to docker containers connected
85
+to it.
86
+For example, if we want to prevent all containers on the network from accessing the internet, we
87
+could apply the following IPTables rule:
88
+
89
+```bash
90
+$ iptables -I DOCKER-ISOLATION -i my_bridge ! -o my_bridge -m conntrack --ctstate NEW -j REJECT
91
+```
92
+
93
+The above command says the following: please reject packets that arrive on `my_bridge` and are destined
94
+for a different interface, and which correspond to a new connection (for TCP, those with the `SYN` flag set), and
95
+insert this rule before any others on the `DOCKER-ISOLATION` chain. The `DOCKER-ISOLATION` chain is
96
+installed by the Docker daemon when it starts, and is jumped to from the `FORWARD` chain.
97
+
98
+One final thing to be aware of is the kernel configuration setting
99
+`net.bridge.bridge-nf-call-iptables`. The docker containers are connected to the same network
100
+bridge, which operates on the link layer. This means that packets destined for hosts attached
101
+to the same bridge don't need to go up to the IP layer of the network stack for the kernel to
102
+process them, which means that in principle IPTables does not act on packets that are exchanged
103
+between containers on the docker network. This behaviour can, however, be controlled with the above
104
+kernel configuration. This could be useful if, for example, we want to prevent any traffic
105
+between containers on `my_new_network`:
106
+
107
+```bash
108
+$ sysctl net.bridge.bridge-nf-call-iptables=1
109
+$ iptables -I DOCKER-ISOLATION -i my_bridge -o my_bridge -j DROP
110
+```
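Note that a setting applied with `sysctl` in this way does not survive a reboot. A sketch of how to make it persistent (the file name is arbitrary, and this assumes a distribution that reads configuration from `/etc/sysctl.d/`):

```shell
# Persist the bridge-netfilter setting across reboots
echo 'net.bridge.bridge-nf-call-iptables = 1' | sudo tee /etc/sysctl.d/99-bridge-nf.conf
# Reload all sysctl configuration files so the setting takes effect now
sudo sysctl --system
```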
111
+
112
+[qt]: https://quantumtinkerer.tudelft.nl
113
+[jupyter]: https://jupyter.org
114
+[jhub]: https://jupyterhub.readthedocs.io/en/latest/
115
+[yv]: https://github.com/yuvipanda
116
+[cull]: https://github.com/jupyterhub/jupyterhub/tree/master/examples/cull-idle
117
+[tmpnb]: https://github.com/jupyter/tmpnb
118
+[kwant]: https://kwant-project.org
0 119
new file mode 100644
... ...
@@ -0,0 +1,18 @@
1
+---
2
+title: Kwant Tutorial
3
+date: 2019-02-27
4
+tags:
5
+  - coding
6
+  - python
7
+  - kwant
8
+---
9
+
10
+I recently gave a talk at the University of Maryland about Kwant and using it for
11
+quantum transport. The tutorial contains an introduction to the main features of Kwant,
12
+and also a relatively in-depth discussion of the internal linear algebra that Kwant uses.
13
+
14
+I made the slides using a Jupyter notebook; they are available
15
+[on GitHub](https://github.com/jbweston/maryland-kwant-tutorial/) and are executable
16
+[on Binder](https://mybinder.org/v2/gh/jbweston/maryland-kwant-tutorial/master?filepath=index.ipynb).
17
+
18
+Happy Kwanting!
0 19
new file mode 100644
... ...
@@ -0,0 +1,38 @@
1
+---
2
+title: Markov Chain Monte Carlo for decryption
3
+date: 2018-11-20
4
+tags:
5
+  - coding
6
+  - haskell
7
+  - markov-chain
8
+draft: true
9
+---
10
+
11
+Each year I teach part of the Python programming course at the
12
+Casimir research school, and each year I try and think of more
13
+short projects to offer the participants during the latter half
14
+of the course. While fishing for ideas I came across an incredibly
15
+cool idea: using Markov chains to break classic cryptographic ciphers.
16
+
17
++ Found this paper
18
++ Idea is:
19
+  - Analyze a reference text and obtain bigram frequencies
20
+  - Construct a score function for a decryption key by finding
21
+    the frequencies of bigrams in the decrypted text
22
+  - Use this score function with the metropolis-hastings algorithm
23
+    to walk around the key space
24
++ Coded up a solution in Python in a couple of hours, also wanted
25
+  to give it a try in Haskell, to test out iHaskell and see how good
26
+  Haskell is for "exploratory" work
27
+
28
++ TL;DR for exploratory work Haskell seems too restrictive. Mediocre
29
+  library documentation and overly abstracted types make error messages
30
+  impossible to debug
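The idea in the bullets above can be written down quite concisely in Python. Below is a minimal sketch (all the names are mine, and the proposal move, swapping two letters of the key, is the standard choice for substitution ciphers):

```python
import math
import random
import string
from collections import Counter

ALPHABET = string.ascii_lowercase

def bigram_log_probs(reference_text):
    """Log-probabilities of letter bigrams, estimated from a reference text."""
    letters = [c for c in reference_text.lower() if c in ALPHABET]
    counts = Counter(zip(letters, letters[1:]))
    total = sum(counts.values())
    # add-one smoothing so that unseen bigrams do not contribute log(0)
    return {(a, b): math.log((counts[a, b] + 1) / (total + 26 * 26))
            for a in ALPHABET for b in ALPHABET}

def score(text, log_probs):
    """Score a candidate decryption by the plausibility of its bigrams."""
    letters = [c for c in text.lower() if c in ALPHABET]
    return sum(log_probs[pair] for pair in zip(letters, letters[1:]))

def decrypt(ciphertext, key):
    """Apply a substitution key: a string that is a permutation of a-z."""
    return ciphertext.lower().translate(str.maketrans(ALPHABET, key))

def metropolis(ciphertext, log_probs, steps=10_000, rng=random):
    """Walk over the key space with Metropolis-Hastings, proposing letter swaps."""
    key = list(ALPHABET)
    current = score(decrypt(ciphertext, "".join(key)), log_probs)
    best_key, best = key[:], current
    for _ in range(steps):
        i, j = rng.sample(range(26), 2)
        key[i], key[j] = key[j], key[i]  # propose: swap two letters of the key
        proposed = score(decrypt(ciphertext, "".join(key)), log_probs)
        if proposed >= current or rng.random() < math.exp(proposed - current):
            current = proposed  # accept the proposal
            if current > best:
                best_key, best = key[:], current
        else:
            key[i], key[j] = key[j], key[i]  # reject: undo the swap
    return "".join(best_key)
```

For a real cipher you would estimate the bigram frequencies from a long reference text (the paper uses *War and Peace*-scale corpora) and run many more steps, possibly with random restarts.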
31
+
32
+---
33
+
34
++ Keys are just maps between characters, we make RVars of them
35
++ Trying to make sense of the required pieces of RVars is intense
36
++ We need to run the whole markov chain before we can get the results; not cool!
37
+  Somewhere in our monad stack we are inserting some strictness; we need to find
38
+  out where!
0 39
new file mode 100644
... ...
@@ -0,0 +1,79 @@
1
+---
2
+title: Python + Postscript = Profit!
3
+date: 2016-03-12
4
+tags:
5
+  - coding
6
+---
7
+
8
+While setting up the computing environment for the "Introduction to
9
+Computational Quantum Nanoelectronics" [tutorial][mm16] at the APS March
10
+Meeting, I came across the problem that I needed to generate 150 chits of paper
11
+with login information on them. While all the login info was available in plain
12
+text form, this didn't really lend itself well to easy printing.  Given the
13
+number of chits we would have to generate I didn't really feel like manually
14
+copy/pasting/formatting the contents of the text file using word processing
15
+software. A colleague suggested that I take a look at [Postscript][ps], which
16
+is a language for creating vector graphics. It's pretty bare-bones in
17
+terms of the features it offers (it's mainly meant as the output of
18
+sophisticated document processors such as TeX), but for getting a few lines of text
19
+laid out on a page it's perfect. The Python snippet below shows how easy it
20
+is to write a basic Postscript generator.
21
+
22
+```python
23
+import sys
+
+# usage: python this_script.py < logins.txt > chits.ps
+# NB: the "%!PS-Adobe-2.0" magic line must be the very first bytes of the
+# output, so the header string starts right after `"""\` and is not indented.
+postscript_header = """\
+%!PS-Adobe-2.0
+
+/Inconsolata findfont
+50 scalefont
+setfont
+
+%%Pages: {0}
+"""
+
+postscript_page = """
+%%Page: {0} {0}
+%%BeginPageSetup
+  90 rotate 0 -595 translate
+%%EndPageSetup
+
+newpath
+50 400 moveto
+(user: {1}) show
+50 300 moveto
+(password: {2}) show
+showpage
+"""
+
+pages = [postscript_header]
+pagenum = 0  # in case stdin is empty
+for pagenum, line in enumerate(sys.stdin, 1):
+    user, passwd = line.split()
+    pages.append(postscript_page.format(pagenum, user, passwd))
+
+# now we know the number of pages, fill in the %%Pages: header comment
+pages[0] = pages[0].format(pagenum)
+print('\n'.join(pages))
57
+```
58
+
59
+The above snippet takes username/password pairs from `stdin`
60
+and writes a postscript document to `stdout`. It displays a single
61
+username/password pair per page in 50pt Inconsolata[^1], oriented
62
+landscape. The result can be read using most standard document viewers,
63
+and when printing, the output can be compacted somewhat by putting
64
+several logical pages on each physical page. The raw postscript can also
65
+be converted into other formats such as PDF, which is useful because the
66
+fonts are embedded directly into the document, meaning that the
67
+document can be easily shared.
68
+
69
+Now that I've seen just how easy it is to generate proper documents
70
+with Python and postscript I'm sure that I'll be integrating it
71
+into my workflow more often!
72
+
73
+
74
+[^1]: This font is advantageous for username/password combinations
75
+      as it distinguishes zeros from O's by putting a slash through
76
+      the former.
77
+
78
+[mm16]: http://kwant-project.org/mm16
79
+[ps]: https://en.wikipedia.org/wiki/PostScript
0 80
new file mode 100644
... ...
@@ -0,0 +1,103 @@
1
+---
2
+title: Stop squashing your commits
3
+date: 2018-11-20
4
+tags:
5
+  - coding
6
+  - git
7
+draft: true
8
+---
9
+
10
+Actually, don't. Or do. Well, actually, it depends.
11
+The question of whether or not you should squash your git commits together is practically a holy war at this point,
12
+but I would like to give my two cents on the issue, if only to clarify my thoughts to myself.
13
+
14
+### Background
15
++ when starting out, people are taught to *commit early, commit often*
16
+When people start learning Git, one of the mantras that they first learn is:
17
+
18
+> Commit early, commit often.
19
+
20
+The idea is to get people used to the idea that operations in git are *cheap*,
21
+  and that the advantages gained by telling git about changes
22
+  greatly outweigh the disadvantages.
23
++ in fact, because all but a few git operations are *local*, you don't need to worry about having everything fleshed
24
+  out before taking advantage of git.
25
++ implicit in the above is the idea
26
+
27
+
28
+### Git workflows
29
++ other people have discussed git workflow, but not really gone into detail about how to craft commits
30
++ hack until a feature is fully fleshed out, then commit it
31
+  - goes against *commit early, commit often*
32
+  - don't do this
33
+ hack until a feature is fully fleshed out, committing at random
34
+  - git delivers diminishing returns if commits don't correspond to logical changes
35
+    - hard to skip back to a "working" state
36
+    - harder to see what has actually been changed without contextual and correct commit messages 
37
+  - commits can be "at random" to a greater extent, but the idea is that commits are made according
38
+    to some scheme that is divorced from the code (e.g. committing before you go to lunch, or at the end of
39
+    the day).
40
++ split feature into smaller pieces, then tackle these pieces one at a time, making a single commit for each
41
+  - this is the "ideal", we should strive for this
42
+  - difficult to do in practice because it presupposes that you already understand the problem well enough
43
+    to split it up.
44
+
45
+###
46
+
47
+
48
+The idea is to get people
49
+used to the idea that, when using git, commits are *cheap*. This is in contrast to older version control systems
50
+(VCS) where committing is often a rather hefty operation, and where the idea of making *several commits a day* is
51
+pretty crazy. The unintended consequence of this mentality
52
+
53
+The fundamental point is that *git history is a historical record*. This might seem like a tautology, but
54
+my point is that there are several ways of interpreting git history.
55
+
56
+
57
+### The git history as an eye-witness account / lab book
58
+History is *what actually happened*.
59
+"from the trenches". You feel like you're there; oh the trailing whitespace; oh the humanity!.
60
+Get accounts from both sides of the battle.
61
+
62
++ immutability is a public good
63
++ don't get problems when collaborating
64
+
65
+### The git history as a historical account / scientific article
66
+History is written by the victors. When writing a scientific article, you have to keep in mind
67
+that the majority of your readers don't care whether you spent several weeks on a particularly
68
+difficult calculation, trying many routes that ended up as dead ends, all they care about is
69
+the *correct* method for obtaining the results.
70
+
71
+Of course, it may be that you publish something incorrect, and later have to write another paper
72
+to correct the previous one (rare in the academic world, but very common when writing code). This
73
+is to some extent unavoidable, but it does not diminish the importance of taking the time to distill
74
+down the experience into a digestible portion.
75
+
76
+----
77
+*Why are you recording all of history*? Why not use dropbox, or dropbox + some history (say 1 week / 1 month).
78
+Could be several reasons:
79
+
80
++ auditing (knowing who changed what). Might be important for businesses for knowing who to promote/fire
81
++ improving understanding
82
++ increased control / granularity
83
+
84
+### The git history as a lab book
85
+An immutable record of things "as they went down". Many dead ends and mistakes.
86
+Requires extra work to see what the actual progress was.
87
+
88
+### The git history as a journal article
89
+Curated to make the content as understandable as possible. Nobody cares that you spent 3
90
+days tracking down a particularly insidious bug.
91
+
92
+
93
+### Two distinct modes of operation
94
++ A personal record
95
+  - a quick backup in case I accidentally `rm` something
96
+  - allows to explore various paths
97
+
98
++ An account for others to understand the decisions that went into making the code.
99
+  If there is something that I don't understand, I will usually grep the git log for that
100
+  change. 
101
+
102
+- could use branches as the "article" and individual commits as the "lab book". The message of the merge commit
103
+  should be the same as the message of the single commit in the "article" way of working.
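A minimal sketch of this "branch as article, commits as lab book" idea (the repository, branch, and commit messages below are all illustrative):

```shell
# Create a throwaway repository to demonstrate the idea
git init --quiet demo && cd demo
git config user.email you@example.com && git config user.name "You"
git commit --quiet --allow-empty -m "initial commit"

# "Lab book": many small commits on a feature branch
git checkout --quiet -b fix-parser
git commit --quiet --allow-empty -m "wip: first attempt"
git commit --quiet --allow-empty -m "fix trailing whitespace"

# "Article": a single merge commit whose message summarizes the change
git checkout --quiet -    # back to the branch we started on
git merge --no-ff -m "Fix parser handling of nested quotes" fix-parser

git log --oneline --merges    # shows only the curated summary
```

The `--no-ff` flag forces a merge commit even when a fast-forward would be possible, so the "article" level summary always exists.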
0 104
new file mode 100644
... ...
@@ -0,0 +1,87 @@
1
+---
2
+title: A case for rebase
3
+date: 2017-10-23
4
+tags:
5
+  - coding
6
+  - git
7
+---
8
+
9
+There are a lot of Git tutorials on the web that teach people to use `git pull`
10
+when first teaching them about working with remote repositories and collaboration.
11
+I would like to put forward the position that this is a Bad Idea (TM), and that
12
+it is more instructive to teach people to use `git fetch` followed by an explicit
13
+`git merge`.
14
+
15
+I understand the temptation of teaching people to just `git pull`, because it's
16
+a single command (rather than 2) and often it "just werks". On the other hand
17
+I get the impression that teaching people only `git pull`
18
+reinforces an incorrect mental model that causes a ton of confusion when
19
+there are (as there inevitably are) conflicts with the remote repository.
20
+In addition, I've noticed that often people just want to see what their collaborators have
21
+done, without necessarily incorporating those changes into their own work.
22
+Teaching the two operations separately enables this workflow; without it you have
23
+to introduce `git reset` just so that people can get themselves back to their previous
24
+state!
25
+
26
+Because working with a remote repository is essentially (pedants, please contain
27
+yourselves) working with multiple branches, I personally think that it is really useful to
28
+teach branches *before* remote repositories[^1]. Once people have the concept of
29
+branches down, it's then a pretty small leap to "by the way, you can fetch
30
+the state of *other people's branches* with `git fetch`". You then explain that
31
+the branch shows up on your local machine as `origin/whatever-branch-name`, and
32
+that you shouldn't try and make commits directly on this branch because it's
33
+"owned" by `origin`. At this point it's probably a good idea to show what happens
34
+when the remote repository is updated by somebody else, so that there is a "fork"
35
+in the history:
36
+
37
+          ◯—◯ ← origin/master
38
+         ╱
39
+    ◯—◯—◯—◯—◯ ← master
40
+
41
+[^1]: This is, of course, tough if you are teaching a Github-centric workflow.
42
+      One way around this may be to get people to initialize their local repositories
43
+      by cloning, and then forget about the remote entirely until the time is right.
44
+
45
+You can then say "ok, `origin/master` and `master` now contain *different things*;
46
+we need to incorporate the changes on `origin/master` into our own".
47
+With that you introduce `git merge`, and can show the updated history after that
48
+operation:
49
+
50
+          ◯—◯ ← origin/master
51
+         ╱   ╲
52
+    ◯—◯—◯—◯—◯—◯ ← master
53
+
54
+then you can `git push origin master` and show what that does locally:
55
+
56
+          ◯—◯
57
+         ╱   ╲
58
+    ◯—◯—◯—◯—◯—◯ ← master, origin/master
59
+
60
+Teaching this sequence of operations, it is abundantly clear that `git fetch` only
61
+updates `origin/master`; it will *never affect what you are working on right now*.
62
+It's the way that you see what other people are working on, while you also continue
63
+working on your own thing. It's also clear that `git merge` *totally affects what
64
+you're working on right now*, so you'd better get yourself into a place where
65
+you're ready to have your files modified as git magically incorporates all those
66
+sweet sweet changes that your buddy just pushed.
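The whole sequence can be tried out locally with a toy "remote" (all repository names and commit messages here are illustrative):

```shell
# A bare repository stands in for the shared remote
git init --quiet --bare remote.git

# Alice clones it and pushes an initial commit
git clone --quiet remote.git alice
(cd alice \
 && git config user.email alice@example.com && git config user.name Alice \
 && git commit --quiet --allow-empty -m "initial commit" \
 && git push --quiet origin HEAD)

# Bob clones, then commits some work of his own
git clone --quiet remote.git bob
(cd bob \
 && git config user.email bob@example.com && git config user.name Bob \
 && git commit --quiet --allow-empty -m "bob's work")

# Meanwhile Alice pushes more work: the histories have now forked
(cd alice \
 && git commit --quiet --allow-empty -m "alice's work" \
 && git push --quiet origin HEAD)

# Bob fetches: only the origin/* refs move; his own branch is untouched...
(cd bob && git fetch --quiet origin)
# ...so he can inspect what is new, and only then merge it in
(cd bob && git log --oneline ..origin/HEAD)
(cd bob && git merge --quiet --no-edit origin/HEAD)
```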
67
+
68
+This workflow also mitigates the common pitfall of:
69
+
70
+    $ git push
71
+        To git-example-origin
72
+        ! [rejected]        master -> master (fetch first)
73
+        error: failed to push some refs to 'git-example-origin'
74
+    $ git pull
75
+        Auto-merging
76
+        CONFLICT (content): Merge conflict in hello-world
77
+        Recorded preimage for 'hello-world'
78
+        Automatic merge failed; fix conflicts and then commit the result.
79
+
80
+So instead of "congratulations, your code is now full of conflict markers,
81
+have fun!" you get to inspect the changes that were introduced by the remote
82
+*before* you try to merge them in. This means you can anticipate if there
83
+will be any problems, and know what to expect when you try to merge.
84
+
85
+You could even imagine running `git fetch` periodically to keep `origin`
86
+up to date with any changes on the remote. This would be complete
87
+madness if you tried to do the same thing with `git pull`!
0 88
new file mode 100644
... ...
@@ -0,0 +1,125 @@
1
+---
2
+title: On thinking differently
3
+date: 2018-02-07
4
+tags:
5
+  - coding
6
+draft: true
7
+---
8
+
9
+This post is about an experience I had while solving a kata-style coding
10
+exercise. While the problem itself was very well defined and had a simple
11
+solution, I was very taken aback that I did not see the *most* elegant and
12
+simple solution, despite my proclaimed fluency with programmatic problem
13
+solving. This experience taught me that I still have a lot to learn about
14
+thinking outside the box, and I'm writing it down here mainly to try and
15
+articulate my thoughts to myself.
16
+
17
+#### Let's begin
18
+
19
+A colleague of my partner regularly posts small coding exercises to the
20
+company Slack channel. I think that this is a great idea for several reasons:
21
+
22
++ it gives people practice at translating problem specifications into code,
23
++ it makes people think about problems that are different to those on which
24
+  they work day to day,
25
++ and it provides a central point for discussions about the merits of different
26
+  ways of attacking problems, in addition to coding style.
27
+
28
+The exercises do not even have to be very complicated (in fact I think that this
29
+is best); the most recent exercise was as follows:
30
+
31
+> Write a function that returns the most commonly occurring alphabetic character
32
+> in a string, treating uppercase and lowercase letters as equivalent.
33
+>
34
+> If two characters occur equally often, then the one that occurs earlier in
35
+> the alphabet should be returned.
36
+
37
+#### My solution
38
+
39
+Seems simple enough, right? I coded up the simplest solution I could think
40
+of in a few minutes
41
+
42
+    :::python
43
+    from collections import Counter
44
+
45
+    def most_common(s):
46
+        s = (c for c in s.lower() if c.isalpha())
47
+        most_common, count = max(Counter(s).items(),
48
+                                 key=lambda c: (c[1], -ord(c[0])))
49
+        return most_common
50
+
51
+I will call the above code "solution 1".
52
+I was convinced that this was the optimal solution:
53
+
54
++ We filter out only the characters we care about, so the counting logic
55
+  does not run for characters that we will later throw away,
56
++ We use a generator expression to avoid making a copy of the (potentially large)
57
+  string in memory
58
++ We make a single pass over the input string
59
+
60
+#### The *other* solution
61
+
62
+This was the solution that was posted to the Slack channel after everyone had
63
+submitted theirs:
64
+
65
+    :::python
66
+    from string import ascii_lowercase
67
+
68
+    def most_common(s):
69
+        return max(ascii_lowercase, key=s.lower().count)
70
+
71
+I will call this code "solution 2". Just looking at it, this is *much* cleaner
72
+than solution 1 (although, embarrassingly, it actually took me a minute to
73
+understand how it handles the edge case where two characters have the same
74
+count). It also works in a fundamentally different way to solution 1:
75
+here we iterate over the characters that we are interested in (`ascii_lowercase`)
76
+and compare them based on the number of times that they occur in the input
77
+string, taking the character with the maximum count. If several characters
78
+have the same counts, then `max` will choose the one that occurred first
79
+(it has the [same semantics as a stable sort][max-doc]).
80
+
81
+[max-doc]: https://docs.python.org/3/library/functions.html#max
82
+
83
+Despite its readability I was initially skeptical because we make *26 passes
84
+over the input string*, rather than just 1. It is also the case that even if the
85
+input string contains only the character 'a' (for example) we will still iterate
86
+through the damn string 25 more times, counting up the occurrences of 'b', 'c'
87
+and so on! This is even though we *know* that it doesn't contain anything but `a`s after the
88
+first iteration. My partner had actually tried to solve the problem in a
89
+similar manner to this, but I had dismissed it as suboptimal for the
90
+aforementioned reason. I said to myself "*sure, this seems cleaner, but
91
+there's **no way** that it's more efficient*".
92
+
93
+This is why it was a huge shock to me that *solution 2 actually outperforms
94
+solution 1* in terms of run time.
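A quick (and unscientific) way to check this for yourself; the exact numbers depend on your machine and Python version, so none are quoted here:

```python
import timeit
from collections import Counter
from string import ascii_lowercase

def most_common_1(s):
    # solution 1: single pass, filter + Counter
    filtered = (c for c in s.lower() if c.isalpha())
    char, count = max(Counter(filtered).items(), key=lambda c: (c[1], -ord(c[0])))
    return char

def most_common_2(s):
    # solution 2: 26 passes, each one a tight C loop
    return max(ascii_lowercase, key=s.lower().count)

text = "the quick brown fox jumps over the lazy dog " * 1000

# sanity check: both solutions agree
assert most_common_1(text) == most_common_2(text) == "o"

for f in (most_common_1, most_common_2):
    print(f.__name__, timeit.timeit(lambda: f(text), number=10))
```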
95
+
96
+What I failed to account for, is that *in this case we don't care about
97
+asymptotic complexity*. Subconsciously I had been thinking: "*hm, if the problem
98
+requirements change and we now want to find the most commonly occurring unicode
99
+character then we would have to iterate over the input string [a hundred
100
+thousand times][unicode]; not cool!*". However, the problem very clearly states
101
+that *we only care about ascii lowercase characters*. In this regime
102
+solution 2 performs way better because the counting of individual characters is
103
+done by the builtin string method `str.count`, which uses a [tight C
104
+loop][fastcount]. Compare this to solution 1, where we iterate over the input
105
+string in a [python loop][counter], incurring the additional cost of a
106
+dictionary lookup, integer addition, and an `isalpha()` check from Python, phew!
107
+
108
+[unicode]: https://en.wikipedia.org/wiki/List_of_Unicode_characters
109
+[fastcount]: https://github.com/python/cpython/blob/master/Objects/stringlib/fastsearch.h#L187
110
+[counter]: https://github.com/python/cpython/blob/master/Lib/collections/__init__.py#L486
111
+
112
+#### Thoughts
113
+This blog post is mainly just me proving to myself that, in this instance, *I was
114
+wrong*. My solution was inferior in every possible metric. This was initially quite hard to swallow, as
115
+before seeing solution 2 I was fully convinced that it was impossible within the
116
+confines of Python to express a solution more cleanly and efficiently. Boy have I
117
+got a lot to learn!
118
+
119
+It was also a good reminder for me to make sure that I actually optimise
120
+my designs for the *intended use case*. I have a natural tendency to try and
121
+write code that solves a problem more general than the one initially
122
+formulated. Although I'll try and justify this as making the code more
123
+"reusable" or "extensible", the real reason is probably just that I enjoy
124
+extracting the abstract structure of a problem. I really have to work on not
125
+[abstracting into the stratosphere](https://www.joelonsoftware.com/2001/04/21/dont-let-architecture-astronauts-scare-you/).
0 126
new file mode 100644
... ...
@@ -0,0 +1,41 @@
1
+---
2
+title: April fools!
3
+date: 2018-04-05
4
+tags:
5
+  - physics
6
+draft: true
7
+---
8
+
9
+So on April 1st Anton and I posted to the group's [blog](https://quantumtinkerer.tudelft.nl/blog/machine-learning-articles/) about
10
+a fascinating project that we'd been working on in the preceding month. We had been using "advanced machine learning techniques" to
11
+conduct sentiment analysis on scientific articles to see if they contain irrefutable evidence for various breakthroughs such as
12
+a working quantum computer or (from our own field) Majorana zero modes. Try it for yourself below!
13
+
14
+<iframe class="centered" src="https://ai.weston.cloud"></iframe>
15
+
16
+To the untrained eye it even looks pretty plausible: there's
17
+a flashy animation when the "predicting" happens, and it even shows you the name of the article that you asked about. Of course
18
+digging even a little bit into the source code reveals that we're using a
19
+[somewhat simplistic model](https://gitlab.kwant-project.org/jbweston/is-it-majoranas/blob/master/backend/Main.hs#L21)
20
+(no chance that we're overfitting here!), nevertheless we manage to get 100% accurate results!
21
+
22
+What surprised me the most was that Anton was managing to sustain conversations with colleagues about this "project"
23
+and people seemed to be treating it seriously! Of course it's entirely possible that they were just playing along,
24
+and that we were, in fact, the ones who were being trolled.
25
+
26
+In any case it was a fun little project for me, as my main goal was to test out the
27
+[Elm language](http://elm-lang.org/) for building frontends for webapps.
28
+We'd switched to the React framework for the rewrite of our [Zesje](http://gitlab.kwant-project.org/zesje/zesje) grading software but
29
+I was eager to see what pure functional programming could bring to the game with respect to managing state. Although in principle I
30
+find the idea of modelling a webapp as a well-defined state machine appealing, in practice I found that for such a small project the
31
+hoop-jumping was more hassle than it was worth.
32
+
33
+I was also interested in seeing how easy it would be to write a small web API in Haskell.
34
+The excellent [Haskell Programming from First Principles](https://haskellbook.com) uses the Scotty web framework in several examples, so I thought
35
+I'd give that a go. Again, I think that the limited scope of the project really hindered any possible gains that pure functional
36
+programming could provide. Even if pure functional programming gives an asymptotic advantage (in terms of development time and
37
+confidence about a codebase), the relatively large prefactor associated with getting anything done is really significant.
38
+For example, I had to research and import 3 separate network-related libraries to be able to serve the API (`Scotty`), return
39
+HTTP 400 responses (`Network.HTTP.Types`) and send web requests to other APIs (`Network.Wreq`), in addition to another library
40
+(Lens) for accessing attributes from the responses from the `Wreq` library (really). Sometimes it feels like Haskell makes
41
+the simple things much more complicated than they need to be (even if it does make some complicated things easier).