Posted by: fdmanana | September 2, 2010

List concatenation in Erlang

Recently I looked at the myth that tells us that the list concatenation operator in Erlang is inefficient.
This is mentioned at The Eight Myths of Erlang Performance section 2.4.

The meaning of “inefficient” here is in comparison with other approaches. A common approach I see very often in Erlang code is:

lists:flatten( [ List1, List2 ] )

I decided to write a little performance test that compares the following approaches:

  • List1 ++ List2
  • lists:flatten( [ List1, List2 ] )
  • lists:append( List1, List2 )
  • lists:append( [ List1, List2 ] )

The tests’ code is:


-module(teste).
-compile(export_all).

-define(ITERS, 100).
-define(LIST_SIZE, 1000000).

concat_plus_plus(L1, L2) -> L1 ++ L2.

run() ->
    crypto:start(),

    {ok, T1, Dev1} = run_test(?ITERS, ?MODULE, concat_plus_plus, fun gen_args_2/0),
    io:format("Operator ++: ~p iterations, each list with ~p elements, "
        "average time of ~p milisecs, standard deviation: ~p~n",
        [?ITERS, ?LIST_SIZE, T1, Dev1]),

    {ok, T2, Dev2} = run_test(?ITERS, lists, flatten, fun gen_args_1/0),
    io:format("lists:flatten: ~p iterations, each list with ~p elements, "
        "average time of ~p milisecs, standard deviation: ~p~n",
        [?ITERS, ?LIST_SIZE, T2, Dev2]),

    {ok, T3, Dev3} = run_test(?ITERS, lists, append, fun gen_args_2/0),
    io:format("lists:append(L1, L2): ~p iterations, each list with ~p elements, "
        "average time of ~p milisecs, standard deviation: ~p~n",
        [?ITERS, ?LIST_SIZE, T3, Dev3]),

    {ok, T4, Dev4} = run_test(?ITERS, lists, append, fun gen_args_1/0),
    io:format("lists:append( [L1, L2] ): ~p iterations, each list with ~p elements, "
        "average time of ~p milisecs, standard deviation: ~p~n",
        [?ITERS, ?LIST_SIZE, T4, Dev4]).

run_test(Times, Mod, Fun, GenArgs) ->
    Ts = lists:foldl(        
        fun(_, Acc) ->
           Args = GenArgs(),
           {T, _} = timer:tc(Mod, Fun, Args),
           [T | Acc]
        end,
        [], lists:seq(1, Times)),
    Avg = lists:sum(Ts) / length(Ts),
    {ok, round(Avg / 1000), round(std_dev(Ts, Avg) / 1000)}.

std_dev(Values, Avg) ->
    Sums = lists:foldl(
        fun(V, Acc) -> D = V - Avg, Acc + (D * D) end,
        0, Values),
    math:sqrt(Sums / (length(Values) - 1)).

gen_args_2() ->
    L1 = binary_to_list(crypto:rand_bytes(?LIST_SIZE)),
    L2 = binary_to_list(crypto:rand_bytes(?LIST_SIZE)),
    [L1, L2].

gen_args_1() ->
    L1 = binary_to_list(crypto:rand_bytes(?LIST_SIZE)),
    L2 = binary_to_list(crypto:rand_bytes(?LIST_SIZE)),
    [[L1, L2]].

Running the tests 3 times in a row:


Erlang R13B04 (erts-5.7.5)  [64-bit] [smp:2:2] [rq:2] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.7.5  (abort with ^G)
1> c(teste).
{ok,teste}
2> teste:run().
Operator ++: 100 iterations, each list with 1000000 elements, average time of 32 milisecs,
standard deviation: 23
lists:flatten: 100 iterations, each list with 1000000 elements, average time of 138 milisecs,
standard deviation: 39
lists:append(L1, L2): 100 iterations, each list with 1000000 elements, average time of 59 milisecs,
standard deviation: 6
lists:append( [L1, L2] ): 100 iterations, each list with 1000000 elements, average time of 82 milisecs,
standard deviation: 18
ok
3> teste:run().
Operator ++: 100 iterations, each list with 1000000 elements, average time of 66 milisecs,
standard deviation: 22
lists:flatten: 100 iterations, each list with 1000000 elements, average time of 151 milisecs,
standard deviation: 51
lists:append(L1, L2): 100 iterations, each list with 1000000 elements, average time of 34 milisecs,
standard deviation: 22
lists:append( [L1, L2] ): 100 iterations, each list with 1000000 elements, average time of 98 milisecs,s
tandard deviation: 38
ok
4> 
4> teste:run().
Operator ++: 100 iterations, each list with 1000000 elements, average time of 35 milisecs,
standard deviation: 26
lists:flatten: 100 iterations, each list with 1000000 elements, average time of 155 milisecs,
standard deviation: 52
lists:append(L1, L2): 100 iterations, each list with 1000000 elements, average time of 63 milisecs,
standard deviation: 15
lists:append( [L1, L2] ): 100 iterations, each list with 1000000 elements, average time of 89 milisecs,
standard deviation: 34
ok
5> 

So in the end either the ++ operator or the lists:append function are the best approaches.

I’m wondering if this applies to Caml as well (operator @ versus functions in the List module).
The List module man page for Caml explicitily says that the implementation for the functions append, concat and flatten are not tail recursive. This gives me the idea that underneath they’re not implemented in C but in pure Caml.
My Caml skills are now too rusty, and would need some time to write similar test code in Caml.
Maybe I’ll do it for a future post.

About these ads

Responses

  1. Actually, if you “read the source Luke” you will see that

    lists:flatten/1 with [L,L] operates with the | operator, by explicitly traversing each element in both lists.

    lists:append/2 with L,L simply expands into ++
    “append(L1, L2) -> L1 ++ L2.”

    lists:append/1 with [L,L] operates with ++ by recursion as flatten
    “append([E]) -> E;
    append([H|T]) -> H ++ append(T);
    append([]) -> [].”

    So it would seem that ++ would be the smarter choice as it is used as a buildingblock by append, and does not explicitly traverse elements in the erts like flatten.

    Cheers
    /G : http://erlcode.wordpress.com/

  2. Hi Gian,

    Thanks for the analysis.
    I haven’t gone as deep as looking at how OTP implements those functions.

    lists:flatten/1 was clear to me that it traverses all the elements in the list to check which are lists (recursively)

    > lists:flatten( [ [1,2,3] , [ [4] ] ] ).
    [1,2,3,4]

  3. Oh, well you should :)

    *well known and overly used Matrix reference with oral medicine goes here*


Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Categories

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: