이전글
https://mtding00.tistory.com/7
__fastcall
Microsoft Specific
The __fastcall calling convention specifies that arguments to functions are to be passed in registers, when possible. This calling convention only applies to the x86 architecture. The following list shows the implementation of this calling convention.
요소 | 구현 |
인수 전달 순서 | 왼쪽에서 오른쪽으로 인수 목록에서 발견된 처음 두 개의 DWORD 이하 인수는 ECX 및 EDX 레지스터로 전달되고, 다른 모든 인수는 오른쪽에서 왼쪽으로 스택에 전달됩니다. |
스택 유지관리 책임 | 호출된 함수가 스택에서 인수를 꺼냅니다. |
이름 규칙 |
At 기호 (@) 맨 앞 이름 앞에 at 기호 뒤에 바이트 수 (10 진수) 매개 변수에서 목록이 이름에 접미사로 합니다. At sign (@) is prefixed to names; an at sign followed by the number of bytes (in decimal) in the parameter list is suffixed to names. |
대/소문자 변환규칙 | 대/소문자 변환은 수행되지 않습니다. |
참고_
Future compiler versions may use different registers to store parameters.
Using the /Gr compiler option causes each function in the module to compile as __fastcall unless the function is declared by using a conflicting attribute, or the name of the function is main.
The __fastcall keyword is accepted and ignored by the compilers that target ARM and x64 architectures; on an x64 chip, by convention, the first four arguments are passed in registers when possible, and additional arguments are passed on the stack. For more information, see x64 Calling Convention. On an ARM chip, up to four integer arguments and eight floating-point arguments may be passed in registers, and additional arguments are passed on the stack.
For non-static class functions, if the function is defined out-of-line, the calling convention modifier does not have to be specified on the out-of-line definition. That is, for class non-static member methods, the calling convention specified during declaration is assumed at the point of definition. Given this class definition:
C++복사
struct CMyClass { void __fastcall mymethod(); };
이 코드는:
C++복사
void CMyClass::mymethod() { return; }
이 코드와 같습니다:
C++복사
void __fastcall CMyClass::mymethod() { return; }
For compatibility with previous versions, _fastcall is a synonym for __fastcall unless compiler option /Za (Disable language extensions) is specified.
Example
In the following example, the function DeleteAggrWrapper is passed arguments in registers:
C++복사
1
2
3
4
5
6
|
// Example of the __fastcall keyword
#define FASTCALL __fastcall
void FASTCALL DeleteAggrWrapper(void* pWrapper);
// Example of the __ fastcall keyword on function pointer
typedef BOOL (__fastcall *funcname_ptr)(void * arg1, const char * arg2, DWORD flags, ...);
|
END Microsoft Specific
See also
__fastcall의 규칙은 해당 함수의 인자들이 스택이 아닌 레지스터에 저장된다는 것이다. (그 속도가 빨라 지나간다는 표현을 쓴 듯 보임) 여튼 이 규칙은 오로지 x86에서만 이루어지며, 앞 장에서 말한 x64은 무조건 __fastcall을 취한다는 내용과 다소 어긋나는 것을 보여준다.
표를 보면 x86에서 인자는 처음 두 개가 register에 저장되고 나머지 뒤쪽 인자들은 stack에 저장되는 것을 보여준다.
이는 표 밑 단락에 더욱더 자세히 설명이 되어있는데 x64에서는 처음 4개의 원소는 __fastcall처럼 register에 저장이 되지만 그 이상의 인자들은 stack쪽으로 간다는 것이다.
x86에서 인자 4개 이상 시
1
2
3
4
5
6
7
8
9
10
11
12
13
|
; Line 133
push 3
mov edx, 2
mov ecx, 1
call ?Func4@@YIHHHH@Z ; Func4
; Line 134
push 6
push 5
push 4
push 3
mov edx, 2
mov ecx, 1
call ?Func4x@@YIHHHHHHH@Z ; Func4x
|
x64에서 인자 4개 이상 시
1
2
3
4
5
6
7
8
9
10
11
12
13
|
; Line 133
mov r8d, 3
mov edx, 2
mov ecx, 1
call ?Func4@@YAHHHH@Z ; Func4
; Line 134
mov DWORD PTR [rsp+40], 6
mov DWORD PTR [rsp+32], 5
mov r9d, 4
mov r8d, 3
mov edx, 2
mov ecx, 1
call ?Func4x@@YAHHHHHHH@Z ; Func4x
|
번외 - __fastcall로 가변인자 사용시
1
2
3
4
5
6
7
8
9
10
|
; Line 138
push 7
push 6
push 5
push 4
push 3
push 2
push 1
call ?Func4test@@YAHHHHZZ ; Func4test
add esp, 28 ; 0000001cH
|
1
2
3
4
5
6
7
8
9
|
; Line 138
mov DWORD PTR [rsp+48], 7
mov DWORD PTR [rsp+40], 6
mov DWORD PTR [rsp+32], 5
mov r9d, 4
mov r8d, 3
mov edx, 2
mov ecx, 1
call ?Func4test@@YAHHHHZZ ; Func4test
|
x86에선 stdcall을 사용한 형태가 되었으며, x64에선 그냥 기존 방식대로 되었다.
__vectorcall
Microsoft Specific
The __vectorcall calling convention specifies that arguments to functions are to be passed in registers, when possible. __vectorcall uses more registers for arguments than __fastcall or the default x64 calling convention use. The __vectorcall calling convention is only supported in native code on x86 and x64 processors that include Streaming SIMD Extensions 2 (SSE2) and above. Use __vectorcall to speed functions that pass several floating-point or SIMD vector arguments and perform operations that take advantage of the arguments loaded in registers. The following list shows the features that are common to the x86 and x64 implementations of __vectorcall. The differences are explained later in this article.
요소 | 구현 |
C 이름 데코레이션 규칙 |
함수 이름은 두 개의 "at" 기호 붙습니다 (@@) 뒤에 매개 변수 목록의 10 진수) (의 바이트 수입니다. Function names are suffixed with two "at" signs (@@) followed by the number of bytes (in decimal) in the parameter list. |
대/소문자 변환 규칙 | 대/소문자 변환은 수행되지 않습니다. |
Using the /Gv compiler option causes each function in the module to compile as __vectorcall unless the function is a member function, is declared with a conflicting calling convention attribute, uses a vararg variable argument list, or has the name main.
You can pass three kinds of arguments by register in __vectorcall functions: integer typevalues, vector type values, and homogeneous vector aggregate (HVA) values.
An integer type satisfies two requirements: it fits in the native register size of the processor—for example, 4 bytes on an x86 machine or 8 bytes on an x64 machine—and it’s convertible to an integer of register length and back again without changing its bit representation. For example, any type that can be promoted to int on x86 (long long on x64)—for example, a char or short—or that can be cast to int (long long on x64) and back to its original type without change is an integer type. Integer types include pointer, reference, and struct or union types of 4 bytes (8 bytes on x64) or less. On x64 platforms, larger struct and union types are passed by reference to memory allocated by the caller; on x86 platforms, they are passed by value on the stack.
A vector type is either a floating-point type—for example, a float or double—or an SIMD vector type—for example, __m128 or __m256.
An HVA type is a composite type of up to four data members that have identical vector types. An HVA type has the same alignment requirement as the vector type of its members. This is an example of an HVA struct definition that contains three identical vector types and has 32-byte alignment:
C++복사
typedef struct { __m256 x; __m256 y; __m256 z; } hva3; // 3 element HVA type on __m256
Declare your functions explicitly with the __vectorcall keyword in header files to allow separately compiled code to link without errors. Functions must be prototyped to use __vectorcall, and can’t use a vararg variable length argument list.
A member function may be declared by using the __vectorcall specifier. The hidden thispointer is passed by register as the first integer type argument.
On ARM machines, __vectorcall is accepted and ignored by the compiler.
For non-static class member functions, if the function is defined out-of-line, the calling convention modifier does not have to be specified on the out-of-line definition. That is, for class non-static members, the calling convention specified during declaration is assumed at the point of definition. Given this class definition:
C++복사
struct MyClass { void __vectorcall mymethod(); };
이 코드는:
C++복사
void MyClass::mymethod() { return; }
is equivalent to this:
이 코드와 같습니다:
C++복사
void __vectorcall MyClass::mymethod() { return; }
The __vectorcall calling convention modifier must be specified when a pointer to a __vectorcall function is created. The next example creates a typedef for a pointer to a __vectorcall function that takes four double arguments and returns an __m256 value:
C++복사
typedef __m256 (__vectorcall * vcfnptr)(double, double, double, double);
For compatibility with previous versions, _vectorcall is a synonym for __vectorcall unless compiler option /Za (Disable language extensions) is specified.
__vectorcall convention on x64
The __vectorcall calling convention on x64 extends the standard x64 calling convention to take advantage of additional registers. Both integer type arguments and vector type arguments are mapped to registers based on position in the argument list. HVA arguments are allocated to unused vector registers.
When any of the first four arguments in order from left to right are integer type arguments, they are passed in the register that corresponds to that position—RCX, RDX, R8, or R9. A hidden this pointer is treated as the first integer type argument. When an HVA argument in one of the first four arguments can’t be passed in the available registers, a reference to caller-allocated memory is passed in the corresponding integer type register instead. Integer type arguments after the fourth parameter position are passed on the stack.
When any of the first six arguments in order from left to right are vector type arguments, they are passed by value in SSE vector registers 0 to 5 according to argument position.Floating-point and __m128 types are passed in XMM registers, and __m256 types are passed in YMM registers. This differs from the standard x64 calling convention, because the vector types are passed by value instead of by reference, and additional registers are used. The shadow stack space allocated for vector type arguments is fixed at 8 bytes, and the /homeparams option does not apply. Vector type arguments in the seventh and later parameter positions are passed on the stack by reference to memory allocated by the caller.
After registers are allocated for vector arguments, the data members of HVA arguments are allocated, in ascending order, to unused vector registers XMM0 to XMM5 (or YMM0 to YMM5, for __m256 types), as long as there are enough registers available for the entire HVA. If not enough registers are available, the HVA argument is passed by reference to memory allocated by the caller. The stack shadow space for an HVA argument is fixed at 8 bytes with undefined content. HVA arguments are assigned to registers in order from left to right in the parameter list, and may be in any position. HVA arguments in one of the first four argument positions that are not assigned to vector registers are passed by reference in the integer register that corresponds to that position. HVA arguments passed by reference after the fourth parameter position are pushed on the stack.
Results of __vectorcall functions are returned by value in registers when possible. Results of integer type, including structs or unions of 8 bytes or less, are returned by value in RAX. Vector type results are returned by value in XMM0 or YMM0, depending on size.HVA results have each data element returned by value in registers XMM0:XMM3 or YMM0:YMM3, depending on element size. Result types that don't fit in the corresponding registers are returned by reference to memory allocated by the caller.
The stack is maintained by the caller in the x64 implementation of __vectorcall. The caller prolog and epilog code allocates and clears the stack for the called function. Arguments are pushed on the stack from right to left, and shadow stack space is allocated for arguments passed in registers.
Examples:
예제 코드는 접은 글 아래에서 어셈블리와 같이 보도록 하겠다.
__vectorcall convention on x86
The __vectorcall calling convention follows the __fastcall convention for 32-bit integer type arguments, and takes advantage of the SSE vector registers for vector type and HVA arguments.
The first two integer type arguments found in the parameter list from left to right are placed in ECX and EDX, respectively. A hidden this pointer is treated as the first integer type argument, and is passed in ECX. The first six vector type arguments are passed by value through SSE vector registers 0 to 5, in the XMM or YMM registers, depending on argument size.
The first six vector type arguments in order from left to right are passed by value in SSE vector registers 0 to 5. Floating-point and __m128 types are passed in XMM registers, and __m256 types are passed in YMM registers. No shadow stack space is allocated for vector type arguments passed by register. The seventh and subsequent vector type arguments are passed on the stack by reference to memory allocated by the caller. The limitation of compiler error C2719 does not apply to these arguments.
After registers are allocated for vector arguments, the data members of HVA arguments are allocated in ascending order to unused vector registers XMM0 to XMM5 (or YMM0 to YMM5, for __m256 types), as long as there are enough registers available for the entire HVA. If not enough registers are available, the HVA argument is passed on the stack by reference to memory allocated by the caller. No stack shadow space for an HVA argument is allocated. HVA arguments are assigned to registers in order from left to right in the parameter list, and may be in any position.
Results of __vectorcall functions are returned by value in registers when possible. Results of integer type, including structs or unions of 4 bytes or less, are returned by value in EAX. Integer type structs or unions of 8 bytes or less are returned by value in EDX:EAX.Vector type results are returned by value in XMM0 or YMM0, depending on size. HVA results have each data element returned by value in registers XMM0:XMM3 or YMM0:YMM3, depending on element size. Other result types are returned by reference to memory allocated by the caller.
The x86 implementation of __vectorcall follows the convention of arguments pushed on the stack from right to left by the caller, and the called function clears the stack just before it returns. Only arguments that are not placed in registers are pushed on the stack.
Examples:
x86 예제 코드 역시 접은 글 아래에서 어셈블리어와 같이보도록 하겠다.
End Microsoft Specific
See also
__vectorcall은 기본적으로 __fastcall을 확장한 형태이다. 확장되었다고 하여서 레지스터로 받는 매개변수의 갯수가 변경됐다는 것이 아니라, 한 번에 받는 크기가 최대 __m256(32바이트)까지 확장되었다는 것을 말하는 것 같다. 어셈블리를 보면 실제로도 ymm을 이용해 256 크기의 인자를 생성해 함수를 호출하는 것을 볼 수 있다.
예제 코드는 기본적인 인자는 __fastcall과 같게 동작하며, 중간중간 섞인 인자, 그리고 크기 128를 넘는 인자들을 MSDN에서 준비해 그것을 바탕으로 어셈블리를 뽑기로 하였다.
또한 __fastcall에서 x64와 x86와의 차이와 같게 레지스터로 받는 갯수가 각각 4개와 2개로 같다. 이 개수가 초과되면 스택으로 넘겨진다. 단, xmm과 ymm은 모두 6개를 받는다. 또한 정수형은 왼쪽에서 오른쪽 xmm or ymm은 오른쪽에서 왼쪽이다. 이거 핵심임ㅋㅋㅋ. 이것에 대한 예제는 역시 아래에 보여줄 것이다.
__vectorcall은 SSE2를 지원하는데 이는 XMM을 확장한 언어로 보면 되겠다. 더 자세한 것은 다음 자료를 타고 참고하기 바라고 더 알아보고 싶으면 직접 알아보기 바란다...;;
예제 코드 원본
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
|
// crt_vc64.c
// Build for amd64 with: cl /arch:AVX /W3 /FAs crt_vc64.c
// This example creates an annotated assembly listing in
#include <intrin.h>
#include <xmmintrin.h>
typedef struct {
__m128 array[2];
} hva2; // 2 element HVA type on __m128
typedef struct {
__m256 array[4];
} hva4; // 4 element HVA type on __m256
// Example 1: All vectors
// Passes a in XMM0, b in XMM1, c in YMM2, d in XMM3, e in YMM4.
// Return value in XMM0.
__m128 __vectorcall
example1(__m128 a, __m128 b, __m256 c, __m128 d, __m256 e) {
return d;
}
// Example 2: Mixed int, float and vector parameters
// Passes a in RCX, b in XMM1, c in R8, d in XMM3, e in YMM4,
// f in XMM5, g pushed on stack.
// Return value in YMM0.
__m256 __vectorcall
example2(int a, __m128 b, int c, __m128 d, __m256 e, float f, int g) {
return e;
}
// Example 3: Mixed int and HVA parameters
// Passes a in RCX, c in R8, d in R9, and e pushed on stack.
// Passes b by element in [XMM0:XMM1];
// b's stack shadow area is 8-bytes of undefined value.
// Return value in XMM0.
__m128 __vectorcall example3(int a, hva2 b, int c, int d, int e) {
return b.array[0];
}
// Example 4: Discontiguous HVA
// Passes a in RCX, b in XMM1, d in XMM3, and e is pushed on stack.
// Passes c by element in [YMM0,YMM2,YMM4,YMM5], discontiguous because
// vector arguments b and d were allocated first.
// Shadow area for c is an 8-byte undefined value.
// Return value in XMM0.
float __vectorcall example4(int a, float b, hva4 c, __m128 d, int e) {
return b;
}
// Example 5: Multiple HVA arguments
// Passes a in RCX, c in R8, e pushed on stack.
// Passes b in [XMM0:XMM1], d in [YMM2:YMM5], each with
// stack shadow areas of an 8-byte undefined value.
// Return value in RAX.
int __vectorcall example5(int a, hva2 b, int c, hva4 d, int e) {
return c + e;
}
// Example 6: HVA argument passed by reference, returned by register
// Passes a in [XMM0:XMM1], b passed by reference in RDX, c in YMM2,
// d in [XMM3:XMM4].
// Register space was insufficient for b, but not for d.
// Return value in [YMM0:YMM3].
hva4 __vectorcall example6(hva2 a, hva4 b, __m256 c, hva2 d) {
return b;
}
int __cdecl main( void )
{
hva4 h4;
hva2 h2;
int i;
float f;
__m128 a, b, d;
__m256 c, e;
a = b = d = _mm_set1_ps(3.0f);
c = e = _mm256_set1_ps(5.0f);
b = example1(a, b, c, d, e);
e = example2(1, b, 3, d, e, 6.0f, 7);
d = example3(1, h2, 3, 4, 5);
f = example4(1, 2.0f, h4, d, 5);
i = example5(1, h2, 3, h4, 5);
h4 = example6(h2, h4, c, h2);
}
|
참고 - 이 코드의 주석은 x64기준으로 되어있다.
x64 어셈블리
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
|
; Line 250
vmovups ymm4, YMMWORD PTR e$[rbp]
movaps xmm3, XMMWORD PTR d$[rbp]
vmovups ymm2, YMMWORD PTR c$[rbp]
movaps xmm1, XMMWORD PTR b$[rbp]
movaps xmm0, XMMWORD PTR a$[rbp]
call ?example1@@YQ?AT__m128@@T1@0T__m256@@01@Z ; example1
movaps XMMWORD PTR $T9[rbp], xmm0
movaps xmm0, XMMWORD PTR $T9[rbp]
movaps XMMWORD PTR b$[rbp], xmm0 // __m128, __m128, __m256, __m128, __m256 - return __m128
; Line 251
mov DWORD PTR [rsp+48], 7
movss xmm5, DWORD PTR __real@40c00000
vmovups ymm4, YMMWORD PTR e$[rbp]
movaps xmm3, XMMWORD PTR d$[rbp]
mov r8d, 3
movaps xmm1, XMMWORD PTR b$[rbp]
mov ecx, 1
call ?example2@@YQ?AT__m256@@HT__m128@@H0T1@MH@Z ; example2
vmovups YMMWORD PTR $T10[rbp], ymm0
vmovups ymm0, YMMWORD PTR $T10[rbp]
vmovups YMMWORD PTR e$[rbp], ymm0 // int, __m128, int, __m128, __m256, float, int - return __m256
; Line 252
mov DWORD PTR [rsp+32], 5
mov r9d, 4
mov r8d, 3
movaps xmm0, XMMWORD PTR h2$[rbp]
movaps xmm1, XMMWORD PTR h2$[rbp+16]
mov ecx, 1
call ?example3@@YQ?AT__m128@@HUhva2@@HHH@Z ; example3
movaps XMMWORD PTR $T11[rbp], xmm0
movaps xmm0, XMMWORD PTR $T11[rbp]
movaps XMMWORD PTR d$[rbp], xmm0 // int, struct(size == __m128), int, int, int - return __m128
; Line 253
mov DWORD PTR [rsp+32], 5
movaps xmm3, XMMWORD PTR d$[rbp]
vmovups ymm0, YMMWORD PTR h4$[rbp]
vmovups ymm2, YMMWORD PTR h4$[rbp+32]
vmovups ymm4, YMMWORD PTR h4$[rbp+64]
vmovups ymm5, YMMWORD PTR h4$[rbp+96]
movss xmm1, DWORD PTR __real@40000000
mov ecx, 1
call ?example4@@YQMHMUhva4@@T__m128@@H@Z ; example4
movss DWORD PTR f$[rbp], xmm0 // int, float, struct(size == __m256), __m128, int - return float
; Line 254
mov DWORD PTR [rsp+32], 5
vmovups ymm2, YMMWORD PTR h4$[rbp]
vmovups ymm3, YMMWORD PTR h4$[rbp+32]
vmovups ymm4, YMMWORD PTR h4$[rbp+64]
vmovups ymm5, YMMWORD PTR h4$[rbp+96]
mov r8d, 3
movaps xmm0, XMMWORD PTR h2$[rbp]
movaps xmm1, XMMWORD PTR h2$[rbp+16]
mov ecx, 1
call ?example5@@YQHHUhva2@@HUhva4@@H@Z ; example5
mov DWORD PTR i$[rbp], eax // int, struct(size == __m128), int, struct(size == __m256), int - return int
; Line 255
lea rax, QWORD PTR $T13[rbp]
lea rcx, QWORD PTR h4$[rbp]
mov rdi, rax
mov rsi, rcx
mov ecx, 128 ; 00000080H
rep movsb
movaps xmm3, XMMWORD PTR h2$[rbp]
movaps xmm4, XMMWORD PTR h2$[rbp+16]
vmovups ymm2, YMMWORD PTR c$[rbp]
lea rdx, QWORD PTR $T13[rbp]
movaps xmm0, XMMWORD PTR h2$[rbp]
movaps xmm1, XMMWORD PTR h2$[rbp+16]
call ?example6@@YQ?AUhva4@@Uhva2@@U1@T__m256@@0@Z ; example6
vmovups YMMWORD PTR $T14[rbp+96], ymm3
vmovups YMMWORD PTR $T14[rbp+64], ymm2
vmovups YMMWORD PTR $T14[rbp+32], ymm1
vmovups YMMWORD PTR $T14[rbp], ymm0
lea rax, QWORD PTR $T12[rbp]
lea rcx, QWORD PTR $T14[rbp]
mov rdi, rax
mov rsi, rcx
mov ecx, 128 ; 00000080H
rep movsb
lea rax, QWORD PTR h4$[rbp]
lea rcx, QWORD PTR $T12[rbp]
mov rdi, rax
mov rsi, rcx
mov ecx, 128 ; 00000080H
rep movsb // struct(size == __m128), struct(size == __m256), __m256, struct(size == __m128) - return struct(size == __m128)
|
example1 [인자] // __m128, __m128, __m256, __m128, __m256 // [반환] return __m128
xmm0, xmm1, ymm2, xmm3, ymm4 // xmm0
example2 [인자] // int, __m128, int, __m128, __m256, float, int // [반환] return __m256
ecx(레지스터), xmm1, r8d(레지스터), xmm3, ymm4, xmm5, 스택(xmm이 아니므로 6번째는 스택에 전달) // ymm0
example3 [인자] // int, struct(size == __m128), int, int, int // [반환] return __m128
ecx(레지스터), xmm0~1, r8d(레지스터), r9d(레지스터), 스택(xmm이 아니므로 5번째는 스택에 전달) // xmm0
example4 [인자] // int, float, struct(size == __m256), __m128, int // [반환] return float
ecx(레지스터), xmm1, ymm(0, 2, 4, 5), xmm3, 스택(xmm이 아니므로 5번째는 스택에 전달) // xmm0
example5 [인자] // int, struct(size == __m128), int, struct(size == __m256), int // [반환] return int
ecx(레지스터), xmm0~1, r8d(레지스터), ymm(2,3,4,5), 스택(xmm이 아니므로 5번째는 스택에 전달) // eax
example6
[인자] struct(size == __m128), struct(size == __m256), __m256, struct(size == __m128)
[반환] return struct(size == __m128)
xmm0~1, rdx(레지스터), ymm2, xmm3~4 // ymm0~3
+ rdx -> edx의 확장판
정수가 레지스터 4번째를 넘어가면 스텍으로 밀리는 것이 특징이다.
xmm 이나 ymm은 6번째지만 말이다.
x86 어셈블리
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
|
; Line 250
vmovups ymm4, YMMWORD PTR _e$[ebp]
movaps xmm3, XMMWORD PTR _d$[ebp]
vmovups ymm2, YMMWORD PTR _c$[ebp]
movaps xmm1, XMMWORD PTR _b$[ebp]
movaps xmm0, XMMWORD PTR _a$[ebp]
call ?example1@@YQ?AT__m128@@T1@0T__m256@@01@Z ; example1
movaps XMMWORD PTR $T6[ebp], xmm0
movaps xmm0, XMMWORD PTR $T6[ebp]
movaps XMMWORD PTR _b$[ebp], xmm0 // __m128, __m128, __m256, __m128, __m256 - return __m128
; Line 251
push 7
vmovups ymm2, YMMWORD PTR _e$[ebp]
movaps xmm1, XMMWORD PTR _d$[ebp]
movaps xmm0, XMMWORD PTR _b$[ebp]
movss xmm3, DWORD PTR __real@40c00000
mov edx, 3
mov ecx, 1
call ?example2@@YQ?AT__m256@@HT__m128@@H0T1@MH@Z ; example2
vmovups YMMWORD PTR $T5[ebp], ymm0
vmovups ymm0, YMMWORD PTR $T5[ebp]
vmovups YMMWORD PTR _e$[ebp], ymm0 // int, __m128, int, __m128, __m256, float, int - return __m256
; Line 252
push 5
push 4
mov edx, 3
mov ecx, 1
movaps xmm0, XMMWORD PTR _h2$[ebp]
movaps xmm1, XMMWORD PTR _h2$[ebp+16]
call ?example3@@YQ?AT__m128@@HUhva2@@HHH@Z ; example3
movaps XMMWORD PTR $T4[ebp], xmm0
movaps xmm0, XMMWORD PTR $T4[ebp]
movaps XMMWORD PTR _d$[ebp], xmm0 // int, struct(size == __m128), int, int, int - return __m128
; Line 253
movaps xmm1, XMMWORD PTR _d$[ebp]
mov edx, 5
movss xmm0, DWORD PTR __real@40000000
mov ecx, 1
vmovups ymm2, YMMWORD PTR _h4$[ebp]
vmovups ymm3, YMMWORD PTR _h4$[ebp+32]
vmovups ymm4, YMMWORD PTR _h4$[ebp+64]
vmovups ymm5, YMMWORD PTR _h4$[ebp+96]
call ?example4@@YQMHMUhva4@@T__m128@@H@Z ; example4
movss DWORD PTR _f$[ebp], xmm0 // int, float, struct(size == __m256), __m128, int - return float
; Line 254
push 5
mov edx, 3
mov ecx, 1
movaps xmm0, XMMWORD PTR _h2$[ebp]
movaps xmm1, XMMWORD PTR _h2$[ebp+16]
vmovups ymm2, YMMWORD PTR _h4$[ebp]
vmovups ymm3, YMMWORD PTR _h4$[ebp+32]
vmovups ymm4, YMMWORD PTR _h4$[ebp+64]
vmovups ymm5, YMMWORD PTR _h4$[ebp+96]
call ?example5@@YQHHUhva2@@HUhva4@@H@Z ; example5
mov DWORD PTR _i$[ebp], eax // int, struct(size == __m128), int, struct(size == __m256), int - return int
; Line 255
vmovups ymm0, YMMWORD PTR _c$[ebp]
mov ecx, 32 ; 00000020H
lea esi, DWORD PTR _h4$[ebp]
lea edi, DWORD PTR $T2[ebp]
rep movsd
lea ecx, DWORD PTR $T2[ebp]
movaps xmm1, XMMWORD PTR _h2$[ebp]
movaps xmm2, XMMWORD PTR _h2$[ebp+16]
movaps xmm3, XMMWORD PTR _h2$[ebp]
movaps xmm4, XMMWORD PTR _h2$[ebp+16]
call ?example6@@YQ?AUhva4@@Uhva2@@U1@T__m256@@0@Z ; example6
vmovups YMMWORD PTR $T1[ebp+96], ymm3
vmovups YMMWORD PTR $T1[ebp+64], ymm2
vmovups YMMWORD PTR $T1[ebp+32], ymm1
vmovups YMMWORD PTR $T1[ebp], ymm0
mov ecx, 32 ; 00000020H
lea esi, DWORD PTR $T1[ebp]
lea edi, DWORD PTR $T3[ebp]
rep movsd
mov ecx, 32 ; 00000020H
lea esi, DWORD PTR $T3[ebp]
lea edi, DWORD PTR _h4$[ebp]
rep movsd // struct(size == __m128), struct(size == __m256), __m256, struct(size == __m128) - return struct(size == __m128)
|
example1 [인자] // __m128, __m128, __m256, __m128, __m256 // [반환] return __m128
xmm0, xmm1, ymm2, xmm3, ymm4 // xmm0
example2 [인자] // int, __m128, int, __m128, __m256, float, int // [반환] return __m256
ecx(레지스터), xmm0, edx(레지스터), xmm1, ymm2, xmm3, 스택(xmm이 아니므로 6번째는 스택에 전달) // ymm0
example3 [인자] // int, struct(size == __m128), int, int, int // [반환] return __m128
ecx(레지스터), xmm0~1, edx(레지스터), 스택, 스택(xmm이 아니므로 3, 4번째는 스택에 전달) // xmm0
example4 [인자] // int, float, struct(size == __m256), __m128, int // [반환] return float
ecx(레지스터), xmm0, ymm(2, 3, 4, 5), xmm1, edx(레지스터) // xmm0
example5 [인자] // int, struct(size == __m128), int, struct(size == __m256), int // [반환] return int
ecx(레지스터), xmm1~2, edx(레지스터), ymm(2,3,4,5), 스택(xmm이 아니므로 5번째는 스택에 전달) // eax
example6
[인자] struct(size == __m128), struct(size == __m256), __m256, struct(size == __m128)
[반환] return struct(size == __m128)
xmm0~1, ecx(레지스터), ymm0, xmm3~4 // ymm0~3
+ rdx -> edx의 확장판
정수가 레지스터 4번째를 넘어가면 스텍으로 밀리는 것이 특징이다.
정수는 갯수를 x86에서 따로 센다.
xmm or ymm와 정수 개수 차이 예제
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
|
float __vectorcall example7(float a, float b, float c, float d, float e, float f, float g, float h, float i, float j, float k, float l) {
return b;
}
int __vectorcall example8(int a, int b, int c, int d, int e, int f, int g, int h, int i, int j, int k, int l) {
return b;
}
float __fastcall example7x(float a, float b, float c, float d, float e, float f, float g, float h, float i, float j, float k, float l) {
return b;
}
int __fastcall example8x(int a, int b, int c, int d, int e, int f, int g, int h, int i, int j, int k, int l) {
return b;
}
int __cdecl main(void)
{
int i;
float f;
f = example7x(1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f, 9.0f, 10.0f, 11.0f, 12.0f);
i = example8x(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12);
}
|
x86 어셈블리
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
|
; Line 271
push ecx
movss xmm0, DWORD PTR __real@41400000
movss DWORD PTR [esp], xmm0
push ecx
movss xmm0, DWORD PTR __real@41300000
movss DWORD PTR [esp], xmm0
push ecx
movss xmm0, DWORD PTR __real@41200000
movss DWORD PTR [esp], xmm0
push ecx
movss xmm0, DWORD PTR __real@41100000
movss DWORD PTR [esp], xmm0
push ecx
movss xmm0, DWORD PTR __real@41000000
movss DWORD PTR [esp], xmm0
push ecx
movss xmm0, DWORD PTR __real@40e00000
movss DWORD PTR [esp], xmm0
movss xmm5, DWORD PTR __real@40c00000
movss xmm4, DWORD PTR __real@40a00000
movss xmm3, DWORD PTR __real@40800000
movss xmm2, DWORD PTR __real@40400000
movss xmm1, DWORD PTR __real@40000000
movss xmm0, DWORD PTR __real@3f800000
call ?example7@@YQMMMMMMMMMMMMM@Z ; example7 - __vectorcall
movss DWORD PTR _f$[ebp], xmm0
; Line 272
push 12 ; 0000000cH
push 11 ; 0000000bH
push 10 ; 0000000aH
push 9
push 8
push 7
push 6
push 5
push 4
push 3
mov edx, 2
mov ecx, 1
call ?example8@@YQHHHHHHHHHHHHH@Z ; example8 - __vectorcall
mov DWORD PTR _i$[ebp], eax
; Line 273
push ecx
movss xmm0, DWORD PTR __real@41400000
movss DWORD PTR [esp], xmm0
push ecx
movss xmm0, DWORD PTR __real@41300000
movss DWORD PTR [esp], xmm0
push ecx
movss xmm0, DWORD PTR __real@41200000
movss DWORD PTR [esp], xmm0
push ecx
movss xmm0, DWORD PTR __real@41100000
movss DWORD PTR [esp], xmm0
push ecx
movss xmm0, DWORD PTR __real@41000000
movss DWORD PTR [esp], xmm0
push ecx
movss xmm0, DWORD PTR __real@40e00000
movss DWORD PTR [esp], xmm0
push ecx
movss xmm0, DWORD PTR __real@40c00000
movss DWORD PTR [esp], xmm0
push ecx
movss xmm0, DWORD PTR __real@40a00000
movss DWORD PTR [esp], xmm0
push ecx
movss xmm0, DWORD PTR __real@40800000
movss DWORD PTR [esp], xmm0
push ecx
movss xmm0, DWORD PTR __real@40400000
movss DWORD PTR [esp], xmm0
push ecx
movss xmm0, DWORD PTR __real@40000000
movss DWORD PTR [esp], xmm0
push ecx
movss xmm0, DWORD PTR __real@3f800000
movss DWORD PTR [esp], xmm0
call ?example7x@@YIMMMMMMMMMMMMM@Z ; example7x - __fastcall
fstp DWORD PTR _f$[ebp]
; Line 274
push 12 ; 0000000cH
push 11 ; 0000000bH
push 10 ; 0000000aH
push 9
push 8
push 7
push 6
push 5
push 4
push 3
mov edx, 2
mov ecx, 1
call ?example8x@@YIHHHHHHHHHHHHH@Z ; example8x - __fastcall
mov DWORD PTR _i$[ebp], eax
|
x64 어셈블리
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
|
; Line 271
movss xmm0, DWORD PTR __real@41400000
movss DWORD PTR [rsp+88], xmm0
movss xmm0, DWORD PTR __real@41300000
movss DWORD PTR [rsp+80], xmm0
movss xmm0, DWORD PTR __real@41200000
movss DWORD PTR [rsp+72], xmm0
movss xmm0, DWORD PTR __real@41100000
movss DWORD PTR [rsp+64], xmm0
movss xmm0, DWORD PTR __real@41000000
movss DWORD PTR [rsp+56], xmm0
movss xmm0, DWORD PTR __real@40e00000
movss DWORD PTR [rsp+48], xmm0
movss xmm5, DWORD PTR __real@40c00000
movss xmm4, DWORD PTR __real@40a00000
movss xmm3, DWORD PTR __real@40800000
movss xmm2, DWORD PTR __real@40400000
movss xmm1, DWORD PTR __real@40000000
movss xmm0, DWORD PTR __real@3f800000
call ?example7@@YQMMMMMMMMMMMMM@Z ; example7 - __vectorcall
movss DWORD PTR f$[rbp], xmm0
; Line 272
mov DWORD PTR [rsp+88], 12
mov DWORD PTR [rsp+80], 11
mov DWORD PTR [rsp+72], 10
mov DWORD PTR [rsp+64], 9
mov DWORD PTR [rsp+56], 8
mov DWORD PTR [rsp+48], 7
mov DWORD PTR [rsp+40], 6
mov DWORD PTR [rsp+32], 5
mov r9d, 4
mov r8d, 3
mov edx, 2
mov ecx, 1
call ?example8@@YQHHHHHHHHHHHHH@Z ; example8 - __vectorcall
mov DWORD PTR i$[rbp], eax
; Line 273
movss xmm0, DWORD PTR __real@41400000
movss DWORD PTR [rsp+88], xmm0
movss xmm0, DWORD PTR __real@41300000
movss DWORD PTR [rsp+80], xmm0
movss xmm0, DWORD PTR __real@41200000
movss DWORD PTR [rsp+72], xmm0
movss xmm0, DWORD PTR __real@41100000
movss DWORD PTR [rsp+64], xmm0
movss xmm0, DWORD PTR __real@41000000
movss DWORD PTR [rsp+56], xmm0
movss xmm0, DWORD PTR __real@40e00000
movss DWORD PTR [rsp+48], xmm0
movss xmm0, DWORD PTR __real@40c00000
movss DWORD PTR [rsp+40], xmm0
movss xmm0, DWORD PTR __real@40a00000
movss DWORD PTR [rsp+32], xmm0
movss xmm3, DWORD PTR __real@40800000
movss xmm2, DWORD PTR __real@40400000
movss xmm1, DWORD PTR __real@40000000
movss xmm0, DWORD PTR __real@3f800000
call ?example7x@@YAMMMMMMMMMMMMM@Z ; example7x - __fastcall
movss DWORD PTR f$[rbp], xmm0
; Line 274
mov DWORD PTR [rsp+88], 12
mov DWORD PTR [rsp+80], 11
mov DWORD PTR [rsp+72], 10
mov DWORD PTR [rsp+64], 9
mov DWORD PTR [rsp+56], 8
mov DWORD PTR [rsp+48], 7
mov DWORD PTR [rsp+40], 6
mov DWORD PTR [rsp+32], 5
mov r9d, 4
mov r8d, 3
mov edx, 2
mov ecx, 1
call ?example8x@@YAHHHHHHHHHHHHH@Z ; example8x - __fastcall
mov DWORD PTR i$[rbp], eax
|
__vectorcall
x86 - xmm 6개, 정수형 2개
x64 - xmm 6개, 정수형 4개
__fastcall
x86 - xmm 4개, 정수형 2개
x64 - xmm 4개, 정수형 4개
다음글
'C++' 카테고리의 다른 글
간단한 어셈블리어 모음 (0) | 2019.12.23 |
---|---|
함수 호출 규약 - __thiscall, __clrcall (0) | 2019.12.23 |
함수 호출 규약 - __cdecl, __stdcall (0) | 2019.12.23 |
THREAD를 생성하는 5가지 방법 (0) | 2019.12.23 |
메모리 (0) | 2019.12.23 |