Tuesday, April 25, 2017

C++: on the dollar sign

In most programming languages there are sane rules that specify what can be an identifier and what cannot. Most of the time it's even intuitive - just something that matches [_a-zA-Z][_a-zA-Z0-9]*. There are languages that allow more (e.g. $ in PHP/JS, or % in LabTalk). How about C++? The answer to this question may come as a bit of a surprise.
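For illustration, the classic rule can be expressed as a quick Python check (CLASSIC_IDENT is just a made-up name for this snippet):

```python
import re

# The "classic" identifier rule found in most languages: a letter or
# underscore, followed by any mix of letters, digits, and underscores.
CLASSIC_IDENT = re.compile(r'[_a-zA-Z][_a-zA-Z0-9]*\Z')

print(bool(CLASSIC_IDENT.match('foo_bar2')))  # True
print(bool(CLASSIC_IDENT.match('2fast')))     # False
print(bool(CLASSIC_IDENT.match('$this')))     # False - but read on...
```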

Almost a year ago I had a little argument with a friend of mine about whether the dollar sign is allowed in C++ identifiers. In other words, it was about whether e.g. int $this = 1; is legal C++ or not.
I claimed it wasn't possible. My friend, on the other hand, recalled a friend of his who mentioned that dollars are fine.

The first line of defense is of course the nearest compiler. I decided to fire one up and simply check what happens when I compile the following fragment of code.

auto $foo() {
    int $bar = 1;
    return $bar;
}

At the time I had gcc-4.9.3 installed on my system (a prehistoric version, I know ;-)). For the record, the command was: g++ dollar.cpp -std=c++1y -c -Wall -Wextra -Werror.

And to my surprise... it compiled without a single complaint. Moreover, clang and MSVC gulped it down without complaining as well. Well, Sławek, I said to myself, even if you've been mastering something for years, there's still a lot that can surprise you. BTW, such a conclusion puts titles like the following in a much funnier light.


It was a normal office day and we had other work to get done, so I reluctantly accepted this as just another dark corner. After a couple of hours I forgot about the situation, and it resurfaced... a couple of weeks later.

So, fast forward a couple of weeks. I was preparing something related to C++ and accidentally found a reference to the dollar sign in the GCC documentation. It was a nice feeling, because I knew I would fill this hole in my knowledge in a matter of minutes. So what was the reason compilers were happily accepting dollar signs?
Let me quote an excerpt from the GCC documentation, which speaks for itself :)
GCC allows the ‘$’ character in identifiers as an extension for most targets. This is true regardless of the std= switch, since this extension cannot conflict with standards-conforming programs. When preprocessing assembler, however, dollars are not identifier characters by default.
Currently the targets that by default do not permit ‘$’ are AVR, IP2K, MMIX, MIPS Irix 3, ARM aout, and PowerPC targets for the AIX operating system.
You can override the default with -fdollars-in-identifiers or -fno-dollars-in-identifiers. See -fdollars-in-identifiers.

I think the three most important things are:
  1. It doesn't work when preprocessing assembler.
  2. It doesn't seem to be correlated with the -std switch.
  3. Some architectures do not permit it at all.
What got me thinking was the list of architectures. It took me a couple of minutes to find out that e.g. the ARM assembler doesn't accept the dollar sign in symbol names. So any assembly code generated by GCC for ARM would not assemble if a dollar sign were used. That's a plausible explanation of why GCC disallows the character on those architectures. It doesn't explain why compilers allow it on the others, though.

GCC could theoretically mitigate the problem on particular architectures by replacing $ signs with some other character, but then a bunch of other problems would appear: possible name conflicts, name mangling/demangling yielding incorrect values, and finally it wouldn't be possible to export such "changed" symbols from a library. In other words: disaster.

What about the standard?

After thinking about it for a minute I had a strong need to see what exactly an identifier is. So I opened N3797 and quickly found the section I was looking for, namely (surprise, surprise) 2.11 Identifiers. So what does this section say?

identifier:
    identifier-nondigit
    identifier identifier-nondigit
    identifier digit
identifier-nondigit:
    nondigit
    universal-character-name
    other implementation-defined characters

Right after the formal definition there is an explanation which refers to sections E.1 and E.2, but that's not important here. There is one more thing that appears in the formal definition, and it's extremely easy to miss: "other implementation-defined characters". What does it mean? Yup - the compiler is allowed to accept any other character in identifiers at will.

P.S. Surprisingly, cppcheck 1.71 doesn't report the $ sign in identifiers as a problem at all.

Friday, January 6, 2017

Getting all parent directories of a path

edit: reddit updates

A few minutes ago I needed to solve the trivial problem of getting all parent directories of a path. It's very easy to do imperatively, but that simply wouldn't satisfy me. Hence, I challenged myself to do it declaratively in Python.

The problem is simple, but let me put an example on the table, so it's even easier to imagine what we are talking about.

Given some path, e.g.

/home/szborows/code/the-best-project-in-the-world

You want to get the following list of parents:

/home/szborows/code
/home/szborows
/home
/

It's trivial to do this using split and then some for loop. But how to make it more declarative?
Thinking more mathematically (mathematicians will perhaps cry out to heaven for vengeance after reading on, but let me at least try...), we simply want all the prefixes of some ordered sequence S. So we can generate pairs of numbers (1, y) representing the prefixes, where y belongs to [1, len S). We can actually ignore the constant 1 and just operate on single numbers.
In Python, to generate numbers starting from len(path) and going down, we can utilize range() and [::-1] (an idiom for reversing a collection). Then join() can be applied to the split path, sliced from 1 to y. That's it. And now a demonstration:

>>> path = '/a/b/c/d'
>>> ['/' + '/'.join(path.split('/')[1:l]) for l in range(len(path.split('/')))[::-1] if l]
['/a/b/c', '/a/b', '/a', '/']

But what about performance? Which one will be faster - the imperative or the declarative approach? Intuition suggests that the imperative version will win, but let's check.

The picture below shows timeit (n=1000000) results on my machine (i5-6200U, Python 3.5.2+) for three paths:

short_path = '/lel'
regular_path = '/jakie/jest/srednie/zagniezdzenie?'
long_path = '/z/lekka/dlugasna/sciezka/co/by/pierdzielnik/mial/troche/roboty'

The implementations used:

def imper1(path):
    result = []
    for i in range(1, len(path.split('/'))):
        y = '/'.join(path.split('/')[:i]) or '/'
        result.append(y)
    return result

def imper2(path):
    i = len(path) - 1
    l = []
    while i > 0:
        while i != 0 and path[i] != '/':
            i -= 1
        l.append(path[:i] or '/')
        i -= 1
    return l

def decl1(path):
    return ['/' + '/'.join(path.split('/')[1:l])
            for l in range(len(path.split('/')))[::-1] if l]

def decl2(path):
    return ['/' + '/'.join(path.split('/')[1:-l])
            for l in range(-len(path.split('/'))+1, 1) if l] 
 
# decl3 hidden. read on ;-)
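For completeness, the measurements were presumably gathered with a harness along these lines (a sketch - the exact timeit invocation wasn't shown; imper1 is repeated here so the snippet runs standalone):

```python
import timeit

def imper1(path):
    # Same implementation as above, repeated so this snippet is self-contained.
    result = []
    for i in range(1, len(path.split('/'))):
        y = '/'.join(path.split('/')[:i]) or '/'
        result.append(y)
    return result

regular_path = '/jakie/jest/srednie/zagniezdzenie?'

# number=1000000 matches the n quoted above; use fewer repetitions
# for a quick local run.
elapsed = timeit.timeit(lambda: imper1(regular_path), number=100000)
print('imper1: %.3fs' % elapsed)
```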


It started with imper1 and decl1. I noticed that the imperative version was faster. I tried to speed up the declarative function by replacing [::-1] with some number tricks. It helped, but not to the extent I anticipated. Then I thought about speeding up imper1 using lower-level constructs. Unsurprisingly, while loops and explicit index checks were faster. Let me temporarily ignore decl3 and play a little with CPython bytecode.

Looking at my results, not everything is so obvious. decl{1,2} turned out to have decent performance for a 4-part path, which looks like a reasonable average.

I disassembled decl1 and decl2 to see the difference in bytecode. The diff is shown below.

30 CALL_FUNCTION    1 (1 positional, 0 keyword pair) | 30 CALL_FUNCTION    1 (1 positional, 0 keyword pair)
33 CALL_FUNCTION    1 (1 positional, 0 keyword pair) | 33 CALL_FUNCTION    1 (1 positional, 0 keyword pair)
36 CALL_FUNCTION    1 (1 positional, 0 keyword pair) | 36 UNARY_NEGATIVE
39 LOAD_CONST       0 (None)                         | 37 LOAD_CONST       4 (1)  
42 LOAD_CONST       0 (None)                         | 40 BINARY_ADD
45 LOAD_CONST       5 (-1)                           | 41 LOAD_CONST       4 (1)  
48 BUILD_SLICE      3                                | 44 CALL_FUNCTION    2 (2 positional, 0 keyword pair)
51 BINARY_SUBSCR



As we can see, [::-1] is implemented as three loads followed by BUILD_SLICE and BINARY_SUBSCR. I think this could be optimized if there were a special opcode like BUILD_REV_SLICE. My slightly-optimized decl2 is faster because one UNARY_NEGATIVE and one BINARY_ADD cost less than LOAD_CONST, BUILD_SLICE and BINARY_SUBSCR. The performance gain here is pretty obvious - no matter what, decl2 must be faster.

What about decl2 vs imper1?
It's more complicated, and it was a little surprise that such longer bytecode can be slower than its shorter counterpart.

  3           0 BUILD_LIST               0        
              3 STORE_FAST               1 (result)
                                             
  4           6 SETUP_LOOP              91 (to 100)                     
              9 LOAD_GLOBAL              0 (range)
             12 LOAD_CONST               1 (1)
             15 LOAD_GLOBAL              1 (len)
             18 LOAD_FAST                0 (path)
             21 LOAD_ATTR                2 (split)
             24 LOAD_CONST               2 ('/')
             27 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             30 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             33 CALL_FUNCTION            2 (2 positional, 0 keyword pair)
             36 GET_ITER                        
        >>   37 FOR_ITER                59 (to 99)
             40 STORE_FAST               2 (i)

  5          43 LOAD_CONST               2 ('/')
             46 LOAD_ATTR                3 (join)
             49 LOAD_FAST                0 (path)
             52 LOAD_ATTR                2 (split)
             55 LOAD_CONST               2 ('/')
             58 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             61 LOAD_CONST               0 (None)
             64 LOAD_FAST                2 (i)
             67 BUILD_SLICE              2
             70 BINARY_SUBSCR
             71 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             74 JUMP_IF_TRUE_OR_POP     80
             77 LOAD_CONST               2 ('/')
        >>   80 STORE_FAST               3 (y)

  6          83 LOAD_FAST                1 (result)
             86 LOAD_ATTR                4 (append)
             89 LOAD_FAST                3 (y)
             92 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             95 POP_TOP
             96 JUMP_ABSOLUTE           37
        >>   99 POP_BLOCK

  7     >>  100 LOAD_FAST                1 (result)
            103 RETURN_VALUE


The culprit was the LOAD_CONST in decl{1,2} that loads the list comprehension as a code object. Let's see what it looks like, just for the record.

>>> dis.dis(decl2.__code__.co_consts[1])
 21           0 BUILD_LIST               0
              3 LOAD_FAST                0 (.0)
        >>    6 FOR_ITER                51 (to 60)
              9 STORE_FAST               1 (l)
             12 LOAD_FAST                1 (l)
             15 POP_JUMP_IF_FALSE        6
             18 LOAD_CONST               0 ('/')
             21 LOAD_CONST               0 ('/')
             24 LOAD_ATTR                0 (join)
             27 LOAD_DEREF               0 (path)
             30 LOAD_ATTR                1 (split)
             33 LOAD_CONST               0 ('/')
             36 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             39 LOAD_CONST               1 (1)
             42 LOAD_FAST                1 (l)
             45 UNARY_NEGATIVE
             46 BUILD_SLICE              2
             49 BINARY_SUBSCR
             50 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             53 BINARY_ADD
             54 LIST_APPEND              2
             57 JUMP_ABSOLUTE            6
        >>   60 RETURN_VALUE


So this is how a list comprehension looks when compiled to bytecode. Nice! Now the performance results make more sense. In the project I was working on, my function for getting all parent paths was called in one place and contributed perhaps less than 5% of the execution time of the whole application, so it would not make sense to optimize this piece of code. But it was a delightful journey into the internals of CPython, wasn't it?

Now, let's get back to decl3. What did I do to make my declarative implementation 2x faster on the average case and for the right-side outliers? Well... I just reluctantly resigned from putting everything on one line and saved path.split('/') into a separate variable. That's it.
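Based on that description, decl3 presumably looked something like this (a sketch - the exact code was kept hidden above):

```python
def decl3(path):
    # Split once and reuse the result, instead of calling split()
    # on every iteration the way decl1/decl2 do.
    parts = path.split('/')
    return ['/' + '/'.join(parts[1:l])
            for l in range(len(parts))[::-1] if l]
```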

So what are the takeaways?
  • the declarative method turned out to be faster than the hand-crafted imperative one employing low-level constructs.
    Why? Good question! Maybe the bytecode compiler knows how to produce optimized code when it encounters a list comprehension? But I have written no CPython code, so this is only my speculation.
  • trying to put everything on one line can hurt - in the described case the repeated split() calls were the major performance drag
reddit-related updates:
Dunj3 outpaced me ;) - his implementation, which is better both w.r.t. "declarativeness" and performance:
list(itertools.accumulate(path.split('/'), curry(os.sep.join)))
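Dunj3's one-liner relies on a curry helper (presumably from a functional-programming library); a stdlib-only sketch of the same accumulate idea, post-processed to match the parent lists shown earlier, could look like this:

```python
import itertools

def parents(path):
    # Accumulate path components into growing prefixes, then keep only
    # the proper parents (drop '' and the full path itself), deepest first.
    prefixes = list(itertools.accumulate(
        path.split('/'), lambda acc, part: acc + '/' + part))
    return [p or '/' for p in prefixes[:-1]][::-1]

print(parents('/a/b/c/d'))  # ['/a/b/c', '/a/b', '/a', '/']
```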

syntax highlighting done with https://tohtml.com/python/

Tuesday, January 3, 2017

Logstash + filebeat: Invalid Frame Type, received: 1

A post for googlers who stumble on the same issue - it seems that "overconfiguration" is not a great idea with Filebeat and Logstash.

I decided to explicitly set ssl.verification_mode to none in my Filebeat config, and then I got the following Filebeat and Logstash errors:

filebeat_1  | 2017/01/03 07:43:49.136717 single.go:140: ERR Connecting error publishing events (retrying): EOF
filebeat_1  | 2017/01/03 07:43:50.152824 single.go:140: ERR Connecting error publishing events (retrying): EOF
filebeat_1  | 2017/01/03 07:43:52.157279 single.go:140: ERR Connecting error publishing events (retrying): EOF
filebeat_1  | 2017/01/03 07:43:56.173144 single.go:140: ERR Connecting error publishing events (retrying): EOF                                                          
filebeat_1  | 2017/01/03 07:44:04.189167 single.go:140: ERR Connecting error publishing events (retrying): EOF



logstash_1       | 07:42:35.714 [Api Webserver] INFO  logstash.agent - Successfully started Logstash API endpoint {:port=>9600}                                                                                                         
logstash_1       | 07:43:49.135 [nioEventLoopGroup-4-1] ERROR org.logstash.beats.BeatsHandler - Exception: org.logstash.beats.BeatsParser$InvalidFrameProtocolException: Invalid Frame Type, received: 3                                      
logstash_1       | 07:43:49.139 [nioEventLoopGroup-4-1] ERROR org.logstash.beats.BeatsHandler - Exception: org.logstash.beats.BeatsParser$InvalidFrameProtocolException: Invalid Frame Type, received: 1                                      
logstash_1       | 07:43:50.150 [nioEventLoopGroup-4-2] ERROR org.logstash.beats.BeatsHandler - Exception: org.logstash.beats.BeatsParser$InvalidFrameProtocolException: Invalid Frame Type, received: 3                                      
logstash_1       | 07:43:50.154 [nioEventLoopGroup-4-2] ERROR org.logstash.beats.BeatsHandler - Exception: org.logstash.beats.BeatsParser$InvalidFrameProtocolException: Invalid Frame Type, received: 1                        
logstash_1       | 07:43:52.156 [nioEventLoopGroup-4-3] ERROR org.logstash.beats.BeatsHandler - Exception: org.logstash.beats.BeatsParser$InvalidFrameProtocolException: Invalid Frame Type, received: 3                                      
logstash_1       | 07:43:52.157 [nioEventLoopGroup-4-3] ERROR org.logstash.beats.BeatsHandler - Exception: org.logstash.beats.BeatsParser$InvalidFrameProtocolException: Invalid Frame Type, received: 1                                      
logstash_1       | 07:43:56.170 [nioEventLoopGroup-4-4] ERROR org.logstash.beats.BeatsHandler - Exception: org.logstash.beats.BeatsParser$InvalidFrameProtocolException: Invalid Frame Type, received: 3                                      
logstash_1       | 07:43:56.175 [nioEventLoopGroup-4-4] ERROR org.logstash.beats.BeatsHandler - Exception: org.logstash.beats.BeatsParser$InvalidFrameProtocolException: Invalid Frame Type, received: 1
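
For the record, the relevant part of my Filebeat config looked roughly like this (a sketch - the host is made up); removing the explicit ssl.verification_mode line made the errors go away:

```yaml
output.logstash:
  hosts: ["logstash:5044"]
  # Setting this explicitly (with no other ssl.* options) triggered the
  # Invalid Frame Type errors above; deleting the line fixed it.
  ssl.verification_mode: none
```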


It seems it's better to stay quiet with Filebeat :) Hopefully this helps resolve your issue.

Tuesday, December 13, 2016

std::queue's big default footprint in assembly code

Recently I've been quite busy, and now I'm kind of scrounging my way back into the C++ world. A friend of mine told me about the IncludeOS project, and I thought it might be a pretty good exercise to put my hands on the keyboard and help with this wonderful project.

To be honest, the learning curve is quite steep (or I'm getting too old to learn this fast) and I'm still distracted by a lot of other things, so no big deliverables so far... but just by watching the discussions on Gitter and integrating them with what I know, I spotted a probably obvious, but slightly surprising, thing about std::queue.

std::queue is not a container. Wait, what?, you ask. It's a container adapter. It has no implementation of its own. Instead, it takes another container, uses it as the underlying storage, and just provides a convenient interface for the end user. By the way, it isn't the only one; there are others, like std::stack and std::priority_queue.

One of the dimensions in which C++ shines is its options for customization. We can customize things like memory allocators. In container adapters, we can replace the underlying container if we decide the one chosen by the library writers isn't a good match for us.

By default, perhaps because std::queue requires fast access at both the beginning and the end, its underlying container is std::deque, which provides O(1) pushing/popping at both ends. A perfect match, isn't it?

Well, yes - if you care about performance at the cost of increased binary size. As it turns out, by simply changing std::deque to std::vector:

std::queue<int> qd;
std::queue<int, std::vector<int>> qv;

the generated assembly for x86-64 clang 3.8 (-O3 -std=c++14) is 502 and 144 lines, respectively.

I know that in most contexts binary size is a secondary consideration, but I still think it's interesting that the difference is so big. In other words, there must be a lot going on under the bonnet of std::deque. I don't recommend changing deque to vector in production, though - popping from the front of a std::vector is O(n), so it can seriously damage your performance.

You can play around with the code here: https://godbolt.org/g/XaLhS7 (code based on Voultapher example).

Wednesday, September 7, 2016

elasticdiff

Recently I needed to compare two ElasticSearch indices. To be honest, I was pretty sure I'd find something on the Internet. It was a surprise that no such tool exists. I thought this was a good time to pay off a portion of my debt to open-source :)

Enter ElasticDiff

elasticdiff is a simple project hosted on GitHub which lets you compare two ES indices without pain. It's in early development, so don't expect much, but for simple indices it works pretty well.

Usage is quite trivial:
python3 elasticdiff.py -t docType -i key http://1.2.3.4/index1 http://2.3.4.5/index1



The output was designed to imitate the diff command:
only in left: a5
only in left: a1
only in right: a4
only in right: a2
entries for key a3 differ
---
+++
@@ -1,4 +1,4 @@
 {
   "key": "a3",
-  "value": "DEF"
+  "value": "XYZ"
 }
Summary:
2 entries only in left index
2 entries only in right index
1 common entries
  0 of them are the same
  1 of them differ

Hopefully someone will find this tool useful.

More information is available on the GitHub page. I'm planning to release it on PyPI when it gets more mature. I will update this post when that happens.

Monday, July 25, 2016

Sunday, May 8, 2016

me @ ACCU 2016 - what every C++ programmer should know about modern compilers

In Poland we have the proverb "pierwsze koty za płoty"*. Basically, we say it when we do something for the first time in our lives. After finding a translation, I was quite amused - "the first pancake is always spoiled". Recently I had my debut abroad at the ACCU 2016 conference. You decide whether it was a spoiled pancake or not :)

My short 15-minute talk was about modern compilers. I start with a basic introduction, which may be a bit surprising, then I move on to the C++ standard's perspective and undefined behavior, then there's a part about optimizations, and finally I cover the ecosystem that has grown around compilers in recent years (sanitizers, third-party tools based on compilers, etc.). Enjoy!


Slides: with comments / without comments

GCC 6.1 released [link]
If GCC 6.1 had been released two weeks earlier, I would definitely have included one slide about how verbose and helpful compilers are nowadays.
Several years ago they were unhelpful and very quiet - even if something was going terribly wrong (and the programmer would discover it at run-time) they happily compiled, assuming that the programmer knew what she was doing.
And now... what a change. Compilers have become so friendly that GCC 6.1 can even let you know that you have unresolved git conflict markers in a file!

* the direct translation doesn't quite work :) - "first cats over the fences".