Posted by: fdmanana | December 8, 2010

Adding posix_fallocate to Erlang/OTP

posix_fallocate is an optional POSIX function that reserves disk space for a file. It guarantees that subsequent writes to the reserved range will not fail for lack of disk space, as long as the total amount written doesn’t exceed the allocated amount.

A further advantage is that, because the space is allocated up front, the kernel can try to allocate contiguous disk blocks, which can speed up IO operations.

A patch adding support for it was recently submitted to, and accepted into, Erlang/OTP:

http://www.erlang.org/cgi-bin/ezmlm-cgi?3:sss:1716:201012:mdkoigfchoccmjhpglji#b

https://github.com/erlang/otp/commit/ea3dfb992c769a7d47de1892284b125212d13179
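As a rough illustration of how preallocation looks from Erlang code, here is a minimal sketch. It assumes the functionality is exposed as file:allocate/3, which is how later OTP releases document it; the module name and the preallocate/2 helper are made up for this example and are not part of the patch.

```erlang
-module(prealloc_sketch).
-export([preallocate/2]).

%% Reserve Bytes bytes for the file at Path, starting at offset 0.
%% The file is opened with [read, write] so an existing file is not truncated.
%% Returns ok, or {error, Reason} if opening or allocating fails
%% (for example when the platform or file system lacks support).
preallocate(Path, Bytes) when is_integer(Bytes), Bytes > 0 ->
    case file:open(Path, [read, write, raw, binary]) of
        {ok, Fd} ->
            %% Assumption: file:allocate(IoDevice, Offset, Length) is available.
            Result = file:allocate(Fd, 0, Bytes),
            ok = file:close(Fd),
            Result;
        {error, _} = Error ->
            Error
    end.
```

Calling prealloc_sketch:preallocate("/tmp/data.bin", 1048576) would try to reserve 1 MB before any data is written. Whether the reserved blocks end up contiguous is up to the file system.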

Posted by: fdmanana | October 5, 2010

Purely Functional Data Structures

A good find on Google Books, “Purely Functional Data Structures” by Chris Okasaki:

Purely Functional Data Structures (Chris Okasaki)

Now in my TOREAD list.

Posted by: fdmanana | September 27, 2010

Streaming the body of HTTP POST/PUT requests with Erlang OTP

Yesterday I submitted a patch to the Erlang OTP erlang-patches mailing list that adds a feature the httpc module has lacked for a long time:

Streaming the body of HTTP PUT and POST requests.

It has just been merged into OTP’s pu branch:

http://github.com/erlang/otp/commit/0ae050e3240f1aa68d8d648a36191246f33374b4

http://www.erlang.org/cgi-bin/ezmlm-cgi?3:sss:1483:201009:fcbeggiaekkjadoghldl#b

Hopefully it will get into the next R14 release.

UPDATE: A few days later I submitted another patch on top of that one:

http://github.com/erlang/otp/commit/2809acda106cdd081746d2f2b7d4ddd8c96eff76

http://www.erlang.org/cgi-bin/ezmlm-cgi?3:sss:1483:201009:fcbeggiaekkjadoghldl#b

It adds support for automatically chunking (HTTP chunked Transfer-Encoding) the payload based on what the streaming function returns on each call.

UPDATE: I simplified the implementation and the API for the case where the chunked transfer-encoding header is added automatically. The new full patch:
https://github.com/erlang/otp/commit/6ec259d2828ac44ee71c7b32392497ba1712ed48
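To make the idea of a streaming function more concrete, here is a small sketch. It assumes the producer is a {Fun, Acc} pair whose function returns eof or {ok, Data, NewAcc} on each call, which is the shape documented for httpc in later OTP releases; the exact interface in the patches above may differ. The module and function names are invented for the example, and drain/1 only imitates what the client does with each returned piece when chunked transfer encoding is used.

```erlang
-module(stream_body_sketch).
-export([producer/2, next/1, encode_chunk/1, drain/1]).

%% Build a {Fun, Acc} pair that hands out Bin in pieces of at most Size bytes.
producer(Bin, Size) when is_binary(Bin), is_integer(Size), Size > 0 ->
    {fun next/1, {Bin, Size}}.

%% One step of the producer: return the next piece, or eof when nothing is left.
next({<<>>, _Size}) ->
    eof;
next({Bin, Size}) when byte_size(Bin) =< Size ->
    {ok, Bin, {<<>>, Size}};
next({Bin, Size}) ->
    <<Piece:Size/binary, Rest/binary>> = Bin,
    {ok, Piece, {Rest, Size}}.

%% Frame one piece as an HTTP/1.1 chunk: hex length, CRLF, data, CRLF.
encode_chunk(Data) ->
    [integer_to_list(iolist_size(Data), 16), "\r\n", Data, "\r\n"].

%% Call the producer until it returns eof and build the chunked wire format,
%% ending with the zero-length chunk that terminates the body.
drain({Fun, Acc}) ->
    drain(Fun, Acc, []).

drain(Fun, Acc, Out) ->
    case Fun(Acc) of
        eof ->
            iolist_to_binary(lists:reverse(["0\r\n\r\n" | Out]));
        {ok, Data, NewAcc} ->
            drain(Fun, NewAcc, [encode_chunk(Data) | Out])
    end.
```

For example, stream_body_sketch:drain(stream_body_sketch:producer(<<"abcde">>, 3)) yields <<"3\r\nabc\r\n2\r\nde\r\n0\r\n\r\n">>. With the real client you would hand the producer to httpc as the request body instead of draining it yourself.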

Posted by: fdmanana | September 24, 2010

The new SSL implementation in Erlang OTP

Recently I tried out the new SSL implementation in OTP. It first appeared in the R12 series and is now the default in R14. Unlike the “old” implementation, which was basically a wrapper around the OpenSSL library, this one is written mostly in Erlang and only uses the cryptographic functions that OpenSSL provides.

My motivation for trying it out was that I was occasionally getting errors like the following from the “old” SSL implementation:


** {error,{badinfo,{tcp,#Port,
                        <<"\r\n6d\r\n,\n{\"seq\":70,\"id\":\"97b36d5003934d0c9dd58057b05fa167
\",\"changes\":[{\"rev\":\"1-0d6deda5b380ae207ba87a7a3a32d0a1\"}]}\r\n6d\r\n,\n{\"seq\":71,\"id\":
\"8a1c475b8dc5426e9172d6b970ae7c03\",\"changes\":[{\"rev
\":\"1-72851f645fb6ab77f36866cbe505d82c\"}]}\r\n6d\r\n,\n{\"seq\":72,\"id\":
\"fdb1d5b1c5b24ce481463ad668c13c40\",\"changes\":[{\"rev\":\"1-
c37b5444eec8375631c326a0e77ca427\"}]}\r\n6d\r\n,\n{\"seq\":73,\"id\":
\"b612465dafc44699b09d8bef5d4d4d8d\",\"changes\":[{\"rev\":\"1-
be951f78ba830f5a1002abe0ce479c2d\"}]}\r\n6d\r\n,\n{\"seq\":74,\"id\":
\"d2c2b5a771ef4b57b6d58fce2808cf7c\",\"changes\":[{\"rev\":\"1-
c628443ff4dd7c3d9b4fd226727e2841\"}]}\r\n6d\r\n,\n{\"seq\":75,\"id\":
\"8d669c377f08442981ce2d18a21d920b\",\"changes\":[{\"rev
\":\"1-6db3a14c76701b87b0686412093ac103\"}]}\r\n6d\r\n,\n{\"seq\":76,\"id\":
\"367bf0948d9d459582d187c9232844b8\",\"changes\":[{\"rev
\":\"1-16ae7cf1c04c4f7c024493de1f18c8ed\"}]}\r\n6d\r\n,\n{\"seq\":77,\"id\":
\"f2c805327ae740098e5db221c3f27b4b\",\"changes\":[{\"rev\":\"1-
b22aa541f7e353a4cd430a9293239c77\"}]}\r\n6d\r\n,\n{\"seq\":78,\"id\":
\"6ddf8033cec845c8986ee4bd03ff8ed6\",\"changes\":[{\"rev
\":\"1-23f5957d250f5079277e6e4a86def1f1\"}]}\r\n6d\r\n,\n{\"seq\":79,\"id\":
\"738365bd4fed44158516211847c13616\",\"changes\":[{\"rev
\":\"1-6dcd375366f107fb2575c8eda6c6bdec\"}]}\r\n6d\r\n,\n{\"seq\":80,\"id\":
\"2d66c797761b4506934d00b2fd260f90\",\"changes\":[{\"rev\":\"1-
cc7dddd31fd753a9b4577607ce321cef\"}]}\r\n6d\r\n,\n{\"seq\":81,\"id\":
\"0c01c012d4f540a3a015d57681a0af4f\",\"changes\":[{\"rev\":\"1-
ff288fbba546fbfbf78c602e2fa39ea2\"}]}\r\n6d\r\n,\n{\"seq\":82,\"id\":
\"dc8a7ff04d37428ea83c3515a801bd32\",\"changes\":[{\"rev\":\"1-2">>}}}

(Yes, this was CouchDB related). So I tried the following code in OTP R13B03 and R13B04:


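%% ?HOST is assumed to be defined elsewhere in the module as a string,
%% e.g. -define(HOST, "example.org"). (placeholder host name)

test() ->
    %% Raw HTTP/1.1 request to send over the SSL socket.
    Request = iolist_to_binary([
        "GET / HTTP/1.1\r\n",
        "Host: ", ?HOST, "\r\n",
        "Accept: */*\r\n",
        "Connection: close\r\n", "\r\n"
    ]),
    %% The ssl application depends on crypto and public_key.
    application:start(crypto),
    application:start(public_key),
    application:start(ssl),
    Options = [
                {ssl_imp, new},        % select the new SSL implementation
                binary,
                {nodelay, true},
                {active, false},
                {verify, verify_peer}, % validate the server certificate chain
                {depth, 3},
                {cacertfile, "/etc/ssl/certs/ca-certificates.crt"}
    ],
    {ok, S} = ssl:connect(?HOST, 443, Options),
    ok = ssl:send(S, Request),
    loop(S),
    ssl:close(S).

%% Receive one message at a time until the peer closes or an error occurs.
loop(S) ->
    ssl:setopts(S, [{active, once}]),
    receive
        {ssl, S, Data} ->
            io:format("received data:  ~p~n", [Data]),
            loop(S);
        {ssl_closed, S} ->
            io:format("socket closed~n", []);
        {ssl_error, S, Error} ->
            io:format("socket error:  ~p~n", [Error])
    end.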

And I was getting the following stack trace when ssl:connect/3 was called:


=ERROR REPORT==== 17-Sep-2010::18:33:04 ===
SSL: 1056: error:{error,
                  {badmatch,
                   {error,
                    {asn1,
                     {'Type not compatible with table constraint',
                      {{badmatch,{error,{asn1,{wrong_tag,{5,16}}}}},
                       [{'OTP-PUB-KEY','dec_Dss-Parms',2},
                        {'OTP-PUB-KEY',dec_SignatureAlgorithm,2},
                        {'OTP-PUB-KEY',dec_OTPTBSCertificate,2},
                        {'OTP-PUB-KEY',dec_OTPCertificate,2},
                        {'OTP-PUB-KEY',decode,2},
                        {pubkey_cert_records,decode_cert,1},
                        {public_key,pkix_decode_cert,2},
                        {ssl_certificate_db,add_certs,3}]}}}}}} /etc/ssl/certs/ca-certificates.crt
  [{ssl_connection,init_certificates,2},
   {ssl_connection,ssl_init,2},
   {ssl_connection,init,1},
   {gen_fsm,init_it,6},
   {proc_lib,init_p_do_apply,3}]

I found this odd, since the trusted certificates file I was providing was in the PEM format (supported according to the man page) and it worked with the “old” SSL implementation.

I posted a message to the erlang-bugs mailing list reporting the issue, since it seemed to me that it was a regression:

http://www.erlang.org/cgi-bin/ezmlm-cgi?2:sss:2031:201009:nkpigljldefpimkjppbn#b

It turned out to be a true regression.
Fortunately, Ingela Anderton Andin, from the OTP team, quickly responded and worked on a few patches against the R14B release, which I tried out until it worked. Those patches are all available at her github account: http://github.com/IngelaAndin

(I must say github is one of my favourite free services on the Web, congratulations to the creators and maintainers).

A special thanks to Ingela for her quick response.
I squashed the relevant commits into a single patch to apply against R14B and it’s available here:

Since Ubuntu is using R13B03 and cannot update to R14B so soon (it’s a very recent release and, besides desktopcouch/couchdb, they have other packages that depend on Erlang OTP), I prepared an equivalent patch for them to apply against R13B03 (the ssl and public_key code differs quite a lot between R13 and R14), available in the following github gist:

http://gist.github.com/594316

As part of that same erlang-bugs thread, it was also suggested to add an extra possible value to those passed to the certificate chain validation function (option verify_fun), so that it can distinguish between unknown CAs (not listed in the trusted certificates file) and certificates self-signed by the peer (something common in intranets). Currently, as of R14B, the term {bad_cert, unknown_ca} is used to signal both cases (unknown CA and self-signed certificate).

The suggestion was accepted by the OTP team and is now available in the development branches (it should make it into the next release of the OTP R14 series):

http://github.com/IngelaAndin/otp/commit/3962e4f5d7a496c32862b05eeab026837a6ff681

After that commit, an unknown CA error is still represented by the term {bad_cert, unknown_ca}, while a self-signed certificate is now represented by the term {bad_cert, selfsigned_peer} (the “old” SSL implementation also allowed distinguishing the two cases).
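Here is a sketch of how a validation callback could use the distinction. It assumes the three-argument verify_fun form, {verify_fun, {Fun, InitialState}}, as documented for the ssl application in later OTP releases; the callback interface in R14B itself may differ. The module name and the policy (accept self-signed peers, reject unknown CAs) are only an illustration, and such a policy only makes sense where self-signed peers are expected, such as an intranet.

```erlang
-module(verify_sketch).
-export([verify/3, ssl_options/1]).

%% Validation callback (assumed three-argument verify_fun style).
%% Accept a certificate self-signed by the peer.
verify(_Cert, {bad_cert, selfsigned_peer}, State) ->
    {valid, State};
%% Reject a chain whose root CA is not in the trusted certificates file.
verify(_Cert, {bad_cert, unknown_ca} = Reason, _State) ->
    {fail, Reason};
%% Reject every other certificate error.
verify(_Cert, {bad_cert, _} = Reason, _State) ->
    {fail, Reason};
%% Leave unrecognized extensions to the default handling.
verify(_Cert, {extension, _}, State) ->
    {unknown, State};
verify(_Cert, valid, State) ->
    {valid, State};
verify(_Cert, valid_peer, State) ->
    {valid, State}.

%% Options that could be passed to ssl:connect/3 along with the callback.
ssl_options(CaCertFile) ->
    [{verify, verify_peer},
     {cacertfile, CaCertFile},
     {verify_fun, {fun verify/3, []}}].
```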

Conclusion:

If you use the new SSL implementation (the default in R14), don’t expect to be able to use the system’s CA certificates file on an Ubuntu system (or on a Linux Caixa Mágica system). You’ll have to apply one of the patches available in the gists mentioned above.

I’m a bit surprised that I was the first to find and report this issue/regression.

A big thanks to Ingela Andin from the OTP team for the quick response.

Posted by: fdmanana | September 2, 2010

List concatenation in Erlang

Recently I looked into the myth that the list concatenation operator (++) in Erlang is inefficient.
It is discussed in section 2.4 of The Eight Myths of Erlang Performance.

The meaning of “inefficient” here is in comparison with other approaches. A common approach I see very often in Erlang code is:

lists:flatten( [ List1, List2 ] )

I decided to write a little performance test that compares the following approaches:

  • List1 ++ List2
  • lists:flatten( [ List1, List2 ] )
  • lists:append( List1, List2 )
  • lists:append( [ List1, List2 ] )

The tests’ code is:


-module(teste).
-compile(export_all).

-define(ITERS, 100).
-define(LIST_SIZE, 1000000).

concat_plus_plus(L1, L2) -> L1 ++ L2.

run() ->
    crypto:start(),

    {ok, T1, Dev1} = run_test(?ITERS, ?MODULE, concat_plus_plus, fun gen_args_2/0),
    io:format("Operator ++: ~p iterations, each list with ~p elements, "
        "average time of ~p milisecs, standard deviation: ~p~n",
        [?ITERS, ?LIST_SIZE, T1, Dev1]),

    {ok, T2, Dev2} = run_test(?ITERS, lists, flatten, fun gen_args_1/0),
    io:format("lists:flatten: ~p iterations, each list with ~p elements, "
        "average time of ~p milisecs, standard deviation: ~p~n",
        [?ITERS, ?LIST_SIZE, T2, Dev2]),

    {ok, T3, Dev3} = run_test(?ITERS, lists, append, fun gen_args_2/0),
    io:format("lists:append(L1, L2): ~p iterations, each list with ~p elements, "
        "average time of ~p milisecs, standard deviation: ~p~n",
        [?ITERS, ?LIST_SIZE, T3, Dev3]),

    {ok, T4, Dev4} = run_test(?ITERS, lists, append, fun gen_args_1/0),
    io:format("lists:append( [L1, L2] ): ~p iterations, each list with ~p elements, "
        "average time of ~p milisecs, standard deviation: ~p~n",
        [?ITERS, ?LIST_SIZE, T4, Dev4]).

run_test(Times, Mod, Fun, GenArgs) ->
    Ts = lists:foldl(        
        fun(_, Acc) ->
           Args = GenArgs(),
           {T, _} = timer:tc(Mod, Fun, Args),
           [T | Acc]
        end,
        [], lists:seq(1, Times)),
    Avg = lists:sum(Ts) / length(Ts),
    {ok, round(Avg / 1000), round(std_dev(Ts, Avg) / 1000)}.

std_dev(Values, Avg) ->
    Sums = lists:foldl(
        fun(V, Acc) -> D = V - Avg, Acc + (D * D) end,
        0, Values),
    math:sqrt(Sums / (length(Values) - 1)).

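%% Two random lists passed as two separate arguments: Fun(L1, L2).
%% Note: crypto:rand_bytes/1 exists in R13B04 but was removed from later
%% OTP releases, where crypto:strong_rand_bytes/1 is the replacement.
gen_args_2() ->
    L1 = binary_to_list(crypto:rand_bytes(?LIST_SIZE)),
    L2 = binary_to_list(crypto:rand_bytes(?LIST_SIZE)),
    [L1, L2].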

gen_args_1() ->
    L1 = binary_to_list(crypto:rand_bytes(?LIST_SIZE)),
    L2 = binary_to_list(crypto:rand_bytes(?LIST_SIZE)),
    [[L1, L2]].

Running the tests 3 times in a row:


Erlang R13B04 (erts-5.7.5)  [64-bit] [smp:2:2] [rq:2] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.7.5  (abort with ^G)
1> c(teste).
{ok,teste}
2> teste:run().
Operator ++: 100 iterations, each list with 1000000 elements, average time of 32 milisecs,
standard deviation: 23
lists:flatten: 100 iterations, each list with 1000000 elements, average time of 138 milisecs,
standard deviation: 39
lists:append(L1, L2): 100 iterations, each list with 1000000 elements, average time of 59 milisecs,
standard deviation: 6
lists:append( [L1, L2] ): 100 iterations, each list with 1000000 elements, average time of 82 milisecs,
standard deviation: 18
ok
3> teste:run().
Operator ++: 100 iterations, each list with 1000000 elements, average time of 66 milisecs,
standard deviation: 22
lists:flatten: 100 iterations, each list with 1000000 elements, average time of 151 milisecs,
standard deviation: 51
lists:append(L1, L2): 100 iterations, each list with 1000000 elements, average time of 34 milisecs,
standard deviation: 22
lists:append( [L1, L2] ): 100 iterations, each list with 1000000 elements, average time of 98 milisecs,
standard deviation: 38
ok
4> 
4> teste:run().
Operator ++: 100 iterations, each list with 1000000 elements, average time of 35 milisecs,
standard deviation: 26
lists:flatten: 100 iterations, each list with 1000000 elements, average time of 155 milisecs,
standard deviation: 52
lists:append(L1, L2): 100 iterations, each list with 1000000 elements, average time of 63 milisecs,
standard deviation: 15
lists:append( [L1, L2] ): 100 iterations, each list with 1000000 elements, average time of 89 milisecs,
standard deviation: 34
ok
5> 

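Across the three runs, the ++ operator and lists:append/2 were the fastest (averages between 32 and 66 milliseconds, with too much variance to separate the two), lists:append/1 came next (82 to 98 milliseconds) and lists:flatten/1 was clearly the slowest (138 to 155 milliseconds). That the first two are so close is no surprise, since lists:append/2 is itself defined in terms of ++. So in the end either the ++ operator or the lists:append/2 function is the best approach.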
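Speed is not the only difference: lists:flatten/1 also flattens any nested lists inside the arguments, so it is only interchangeable with the other three when both lists are flat. A small sketch to show this (the module and function names are invented for the example):

```erlang
-module(concat_sketch).
-export([results/2]).

%% Concatenate L1 and L2 with each of the four approaches compared above
%% and return the results tagged by approach.
results(L1, L2) ->
    [{plus_plus, L1 ++ L2},
     {append_2, lists:append(L1, L2)},
     {append_1, lists:append([L1, L2])},
     {flatten, lists:flatten([L1, L2])}].
```

With flat inputs such as concat_sketch:results([1,2], [3]) all four results are [1,2,3]. With a nested input such as concat_sketch:results([1,[2]], [3]) the first three give [1,[2],3] while flatten gives [1,2,3].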

I wonder whether this applies to Caml as well (operator @ versus the functions in the List module).
The List module man page for Caml explicitly says that the implementations of the functions append, concat and flatten are not tail-recursive. This suggests to me that underneath they’re implemented in pure Caml rather than in C.
My Caml skills are too rusty by now, and I would need some time to write similar test code in Caml.
Maybe I’ll do it for a future post.

Posted by: fdmanana | July 14, 2010

My CouchDB retrospective

This is a summary of how I got into the Apache CouchDB community.

In late summer 2009, my friend Sérgio Veiga told me he was using Erlang at his job and how cool both the language and the Erlang OTP platform are. He knew I had been a fan of OCaml and Prolog back in my academic years, and so expected I would immediately embrace Erlang.

I started by reading Joe Armstrong’s book Programming Erlang. Then I thought about creating some Erlang project, but I had no ideas for something useful and original, and I didn’t feel like porting a library or framework from another language. So I decided to google for existing open source projects written in Erlang. Among the first I found was CouchDB.

Back then I was not familiar at all with NoSQL. The closest thing I knew about was probably memcached. At the time my job was related to Java enterprise (J2EE, Spring, Hibernate, etc), so I immediately started to sympathize with CouchDB’s simplicity and base principles. I started to realize how unnecessarily complex (and counterproductive) the Java enterprise + Oracle + Hibernate world is.

So I decided to start contributing to CouchDB. I started by searching for the simplest tasks in the Apache Software Foundation’s issue tracking system (JIRA), just to get to know CouchDB better. The first issue I tackled was for Futon, the administration Web UI. This task only involved JavaScript, CSS and HTML. It was very straightforward.

Afterwards I started looking into issues involving the Erlang side of CouchDB. The first JIRA issue involving Erlang that I solved was in fact a minor new feature for the CouchDB 0.11 release. This issue was a challenge at the time because it required understanding the existing HTTP server layer code, some HTTP details I was not aware of (chunked transfer encoding, the Content-MD5 header, content encodings) and the code for streaming attachments into a DB (which forced me to learn how the core DB code works and the storage details). Paul Davis helped me improve the patch and get it committed (thanks Paul). That feature also led to a small patch for the Mochiweb project.

Those 2 patches (CouchDB ticket 558 and Mochiweb) were the first real Erlang code I ever wrote.

I then started looking for more (and more complex) issues to solve in their JIRA system and started following the development mailing list and the IRC channel. By far most of the contributions I made were aimed at fixing existing bugs or implementing features requested by the community. Bringing completely new ideas to the project was never easy, as I had never developed an application using CouchDB (and still haven’t), yet I was able to contribute code.

Last month I was elected committer and today Apache CouchDB 1.0.0 was released. For me it has been a very rewarding experience for 2 reasons:

  • Technically I learned a LOT: from Erlang and OTP principles to an alternative paradigm for data modeling and storage, many HTTP and REST details, and a new vision of JavaScript and its potential (CouchApps and CouchDB-related projects implemented with node.js). This learning came not just from reading existing code and writing new code but also from interacting with the committers and other developers.
  • The community. CouchDB’s community is very friendly, helpful and dedicated. This community has been growing fast.

CouchDB is growing and taking a unique direction towards data modeling, replication and Web applications.

Posted by: fdmanana | June 24, 2010

Embracing Concurrency at Scale

A very good talk about distributed systems and high concurrency given by Justin Sheehy:

http://www.infoq.com/presentations/Embracing-Concurrency-At-Scale

Posted by: fdmanana | June 21, 2010

OAuth

I’m now reading all (or most of) the OAuth implementation details:

http://hueniverse.com/oauth/guide/

It was about time to read it…

Posted by: fdmanana | June 16, 2010

2 interesting articles

Today Wout Mertens sent an email to CouchDB’s development mailing list pointing to 2 interesting articles:

http://fdmanana.files.wordpress.com/2010/06/you_are_doing_it_wrong_server_performance.pdf

and

http://fdmanana.files.wordpress.com/2010/06/cache-oblivious-string-btrees.pdf

Just found this interesting article:

http://fdmanana.files.wordpress.com/2010/06/combining_events_and_threads.pdf
