Sunday, 25 January 2015

type_traits: GCC vs Clang

Recently I was writing another article for the Polish developer journal "Programista" (Eng. "Programmer"). This time I decided to focus on the type_traits library: what we can find inside, how it is implemented and, finally, what we can expect in the future (mainly in terms of compile-time reflection).

While writing I was not referring to any particular implementation, but I had the Clang and GCC implementations open on the second monitor the whole time. I noticed some differences that I'd like to document here.

First things first. The whole post is based on the type_traits headers that ship with GCC (libstdc++) and Clang (libc++).

The first peculiar difference is visible right at the top of GCC's type_traits, which opens with a set of interesting helper templates: __and_, __or_, __not_, and so on:
    template<typename...>
      struct __or_;

    template<>
      struct __or_<>
      : public false_type
      { };

    template<typename _B1>
      struct __or_<_B1>
      : public _B1
      { };

They are used in the following fashion:
    /// is_unsigned
    template<typename _Tp>
      struct is_unsigned
      : public __and_<is_arithmetic<_Tp>, __not_<is_signed<_Tp>>>::type
      { };
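
To make the idea concrete, here is a minimal sketch of how such combinators can be put together. The names my_and, my_not and my_is_unsigned are made up for this post; this is not libstdc++'s actual code, only the same technique, assuming a C++11 compiler:

```cpp
#include <type_traits>

// Hypothetical names for this sketch; not the identifiers libstdc++ uses.

// Conjunction of traits. Declared first, then specialized by arity.
template <typename...>
struct my_and;

// No operands: vacuously true.
template <>
struct my_and<> : std::true_type {};

// One operand: the result is that operand.
template <typename B1>
struct my_and<B1> : B1 {};

// Two or more operands: if B1 is false the result is B1 and the remaining
// operands are never instantiated (short-circuit); otherwise recurse.
template <typename B1, typename B2, typename... Bn>
struct my_and<B1, B2, Bn...>
    : std::conditional<B1::value, my_and<B2, Bn...>, B1>::type {};

// Negation of a single trait.
template <typename B>
struct my_not : std::integral_constant<bool, !B::value> {};

// Reads almost like the sentence "arithmetic and not signed".
template <typename T>
struct my_is_unsigned
    : my_and<std::is_arithmetic<T>, my_not<std::is_signed<T>>>::type {};

static_assert(my_is_unsigned<unsigned int>::value, "unsigned int is unsigned");
static_assert(!my_is_unsigned<int>::value, "int is signed");
static_assert(!my_is_unsigned<double>::value, "double is signed");
static_assert(!my_is_unsigned<void*>::value, "pointers are not arithmetic");
```

The price for the readability is that every use of my_and and my_not is one more class template for the compiler to instantiate.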

At first glance everything looks nice. However, you won't find such helpers in Clang's type_traits implementation. Clang's approach is to use standard template mechanisms, such as template specialization:
    // is_unsigned

    template <class _Tp, bool = is_integral<_Tp>::value>
    struct __libcpp_is_unsigned_impl : public integral_constant<bool, _Tp(0) < _Tp(-1)> {};

    template <class _Tp>
    struct __libcpp_is_unsigned_impl<_Tp, false> : public false_type {};  // floating point

    template <class _Tp, bool = is_arithmetic<_Tp>::value>
    struct __libcpp_is_unsigned : public __libcpp_is_unsigned_impl<_Tp> {};

    template <class _Tp> struct __libcpp_is_unsigned<_Tp, false> : public false_type {};

    template <class _Tp> struct _LIBCPP_TYPE_VIS_ONLY is_unsigned : public __libcpp_is_unsigned<_Tp> {};
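
Stripped of the library-internal macros, the specialization style can be sketched as below. The names spec_is_unsigned and spec_is_unsigned_impl are invented for this example, and the code is only an illustration of the pattern, not a copy of libc++:

```cpp
#include <type_traits>

// Hypothetical names for this sketch; not libc++'s own identifiers.

// Integral types: an unsigned T wraps -1 around to its maximum value,
// so 0 < T(-1) holds exactly for unsigned types.
template <typename T, bool = std::is_integral<T>::value>
struct spec_is_unsigned_impl
    : std::integral_constant<bool, (T(0) < T(-1))> {};

// Arithmetic but not integral, i.e. floating point: never unsigned.
template <typename T>
struct spec_is_unsigned_impl<T, false> : std::false_type {};

// The bool default argument selects between the primary template
// (arithmetic types) and the specialization below (everything else).
template <typename T, bool = std::is_arithmetic<T>::value>
struct spec_is_unsigned : spec_is_unsigned_impl<T> {};

template <typename T>
struct spec_is_unsigned<T, false> : std::false_type {};

static_assert(spec_is_unsigned<unsigned int>::value, "unsigned int is unsigned");
static_assert(!spec_is_unsigned<int>::value, "int is signed");
static_assert(!spec_is_unsigned<double>::value, "floating point is signed");
static_assert(!spec_is_unsigned<int*>::value, "pointers are not arithmetic");
```

The branching lives in the defaulted bool parameter, so no helper such as __and_ or __not_ is involved, but the reader has to decode two levels of dispatch to see what the trait means.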

There's no doubt - GCC's version is more human-friendly. We can read it like a book and everything is clear. Reading Clang's version is much harder: the code is bloated with template machinery and there are a lot of quirky helpers.
On the other hand, Clang developers use only what the C++ language itself provides, without an extra layer of helper templates. Therefore, at least in theory, compilation should take less time for Clang's implementation. It's not that easy to test this, but there's a fancy new tool out there - templight. It allows us to debug and profile template instantiations ;)
After some time playing with templight I got the following results for a simple program that just includes the type_traits library, in two versions: GCC's and Clang's.

                              GCC        Clang
    Template instantiations*  3          5
    Template memoizations*    59         45
    Maximum memory usage      ~957kB     ~2332kB

When we sum instantiations and memoizations, it turns out that the assumption was right - Clang's implementation of the type_traits library should be slightly faster. But of course it depends on how the compilers are implemented.
Another interesting fact is that compiling Clang's type_traits consumes a lot more memory than compiling GCC's version.
Both versions of type_traits were compiled using Clang 3.6 (SVN) with the templight plugin, so there's no data for the GCC compiler. It may be that, when compiled by GCC, GCC's version of type_traits does better.

The second thing that I've noticed is that Clang's type_traits strives to rely on compiler built-ins as little as possible, while GCC's uses them freely. Good examples are the implementations of std::is_class and std::is_enum.
GCC's approach here is to simply use compiler built-ins, such as __is_class and __is_enum:
    /// is_class
    template<typename _Tp>
      struct is_class
      : public integral_constant<bool, __is_class(_Tp)>
      { };
    ...
    /// is_enum
    template<typename _Tp>
      struct is_enum
      : public integral_constant<bool, __is_enum(_Tp)>
      { };

Both names are registered as C++-only reserved words in the compiler itself:
    { "__is_class",   RID_IS_CLASS,   D_CXXONLY },
    { "__is_enum",    RID_IS_ENUM,    D_CXXONLY },
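
Nothing stops user code from calling these built-ins directly, which shows how little work the library does in this style. A small sketch, with the caveat that __is_class and __is_enum are compiler extensions (GCC and Clang both accept them) rather than standard C++, and that the wrapper names builtin_is_class and builtin_is_enum are made up here:

```cpp
#include <type_traits>

// Hypothetical wrappers for this sketch. __is_class and __is_enum are
// compiler extensions, so this is not portable standard C++.
template <typename T>
struct builtin_is_class : std::integral_constant<bool, __is_class(T)> {};

template <typename T>
struct builtin_is_enum : std::integral_constant<bool, __is_enum(T)> {};

struct Widget {};
enum Color { red, green };

static_assert(builtin_is_class<Widget>::value, "Widget is a class");
static_assert(!builtin_is_class<Color>::value, "an enum is not a class");
static_assert(builtin_is_enum<Color>::value, "Color is an enum");
static_assert(!builtin_is_enum<int>::value, "int is not an enum");
```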

Clang, on the contrary, utilizes metaprogramming tricks developed by the community. In the case of std::is_class, the implementation is based on function overloading: the first overload of __is_class_imp::__test accepts a pointer to member, so it is viable only if the type is a class or a union. A union would pass this test too, so a second check, !is_union, is needed as well. Simple and brilliant.
    namespace __is_class_imp
    {
    template <class _Tp> char  __test(int _Tp::*);
    template <class _Tp> __two __test(...);
    }

    template <class _Tp> struct _LIBCPP_TYPE_VIS_ONLY is_class
        : public integral_constant<bool, sizeof(__is_class_imp::__test<_Tp>(0)) == 1 && !is_union<_Tp>::value> {};
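
The overloading trick is easy to try in isolation. Below is a minimal sketch of it; the namespace class_sketch and the types yes_type, no_type, Point and Number are invented for the example, and std::is_union is borrowed from the standard library because telling a union from a class still needs compiler support:

```cpp
#include <type_traits>

namespace class_sketch {  // hypothetical namespace for this example

typedef char yes_type;            // sizeof == 1
struct no_type { char pad[2]; };  // sizeof == 2

// "int T::*" is a valid type only when T is a class or a union, so for
// every other T this overload is silently dropped (SFINAE).
template <typename T> yes_type test(int T::*);
// Fallback; an ellipsis is the worst possible match, so it is chosen
// only when the overload above is not viable.
template <typename T> no_type test(...);

// Neither overload is ever defined: sizeof only inspects the return
// type of the call that overload resolution would pick.
template <typename T>
struct is_class
    : std::integral_constant<bool,
          sizeof(test<T>(0)) == sizeof(yes_type) &&
          !std::is_union<T>::value> {};

}  // namespace class_sketch

struct Point { int x; int y; };
union Number { int i; float f; };

static_assert(class_sketch::is_class<Point>::value, "Point is a class");
static_assert(!class_sketch::is_class<Number>::value, "unions are excluded");
static_assert(!class_sketch::is_class<int>::value, "int has no members");
static_assert(!class_sketch::is_class<Point*>::value, "a pointer is not a class");
```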


In the case of enums there's also an interesting bit in Clang's implementation:
    template <class _Tp> struct _LIBCPP_TYPE_VIS_ONLY is_enum
        : public integral_constant<bool, !is_void<_Tp>::value             &&
                                         !is_integral<_Tp>::value         &&
                                         !is_floating_point<_Tp>::value   &&
                                         !is_array<_Tp>::value            &&
                                         !is_pointer<_Tp>::value          &&
                                         !is_reference<_Tp>::value        &&
                                         !is_member_pointer<_Tp>::value   &&
                                         !is_union<_Tp>::value            &&
                                         !is_class<_Tp>::value            &&
                                         !is_function<_Tp>::value         > {};

Voila! No built-ins needed ;)
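
The elimination idea can be reproduced on top of the standard traits. A small sketch follows; the name elim_is_enum and the test types are made up, and the std:: traits used as building blocks may themselves be implemented with built-ins:

```cpp
#include <type_traits>

// Hypothetical trait for this sketch: a type is an enum if it belongs to
// none of the other categories listed here.
template <typename T>
struct elim_is_enum
    : std::integral_constant<bool,
          !std::is_void<T>::value           &&
          !std::is_integral<T>::value       &&
          !std::is_floating_point<T>::value &&
          !std::is_array<T>::value          &&
          !std::is_pointer<T>::value        &&
          !std::is_reference<T>::value      &&
          !std::is_member_pointer<T>::value &&
          !std::is_union<T>::value          &&
          !std::is_class<T>::value          &&
          !std::is_function<T>::value> {};

enum Color { red, green };
enum class Mode : char { fast, slow };
struct Widget {};

static_assert(elim_is_enum<Color>::value, "unscoped enum");
static_assert(elim_is_enum<Mode>::value, "scoped enum");
static_assert(!elim_is_enum<int>::value, "int is integral");
static_assert(!elim_is_enum<Widget>::value, "Widget is a class");
static_assert(!elim_is_enum<void()>::value, "void() is a function type");
```

An elimination list is only as good as the list itself: in this sketch std::nullptr_t passes all ten checks, so elim_is_enum would report it as an enum.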

The last thing I'd like to mention here is std::is_function. GCC's approach here is a bit ridiculous, because it needs a lot of specializations (there are 24 of them so I don't want to include 'em all).
    template<typename _Res, typename... _ArgTypes>
      struct is_function<_Res(_ArgTypes......) volatile &&>
      : public true_type { };

    template<typename _Res, typename... _ArgTypes>
      struct is_function<_Res(_ArgTypes...) const volatile>
      : public true_type { };

    template<typename _Res, typename... _ArgTypes>
      struct is_function<_Res(_ArgTypes...) const volatile &>
      : public true_type { };

    template<typename _Res, typename... _ArgTypes>
      struct is_function<_Res(_ArgTypes...) const volatile &&>
      : public true_type { };

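I think that the six-dot _ArgTypes...... in the first specialization is a good example of how to make an implementation look like a nightmare. It looks like double (or nested) unpacking, but it is actually a pack expansion immediately followed by a C-style variadic ellipsis, i.e. a shorthand for _ArgTypes..., ... that matches functions such as printf. When you have a hammer in your hand, everything looks like a nail, doesn't it ;)?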
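
A short sketch makes the six dots less mysterious. The helper names signatures and takes_c_varargs are invented for this example; the code only assumes a C++11 compiler:

```cpp
#include <type_traits>

// Hypothetical helper: the same function type spelled two ways.
template <typename... Args>
struct signatures {
    typedef void six_dots(Args......);      // pack expansion + C ellipsis
    typedef void with_comma(Args..., ...);  // same thing, comma written out
};

// Both spellings name one and the same type.
static_assert(std::is_same<signatures<int, char>::six_dots,
                           signatures<int, char>::with_comma>::value,
              "the comma before the C ellipsis is optional");
static_assert(std::is_same<signatures<int, char>::six_dots,
                           void(int, char, ...)>::value,
              "expands to an ordinary C-style variadic signature");

// Hypothetical trait: true for function types that accept C-style
// variable arguments, e.g. int(const char*, ...).
template <typename T>
struct takes_c_varargs : std::false_type {};

template <typename R, typename... Args>
struct takes_c_varargs<R(Args......)> : std::true_type {};

static_assert(takes_c_varargs<int(const char*, ...)>::value, "printf-like");
static_assert(!takes_c_varargs<int(int)>::value, "plain function");
```

Because "with C varargs" and "without" are different function types, each combination of cv-qualifiers and ref-qualifiers needs both spellings, which is where a count like 24 specializations comes from.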



* Memoization - "Memoization means we are _not_ instantiating a template because it is already instantiated (but we entered a context where we would have had to if it was not already instantiated)."
