test, 1, 2, 3…
Posts made by slowhash
-
RE: [Dev] NeoScrypt GPU Miner - Public Beta Test
Anyone following the Nvidia neoscrypt development?
-
RE: [Dev] NeoScrypt Hardware Comparison Site
NeoScrypt CPUminer updated to v2.4.1, up to 50% performance increase.
-e, --engine=N choose a NeoScrypt hashing engine
0 integer (default)
1 SSE2
2 SSE2 4-way
I was not aware of different ‘engines’ for hashing on CPUs, as I don’t CPU mine. What is the reason for them?
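For anyone else wondering the same thing, here is a rough sketch of the usual idea behind such options. It is not taken from the cpuminer source: the function names and the mixing step are made up for illustration, and I'm assuming "4-way" means four independent hashes processed side by side. The same arithmetic can run on one state at a time with plain integer code, or on four states in lockstep, which is the shape a 128-bit SSE2 register handles in one instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* Hypothetical "integer engine": one add-rotate-xor step on one state. */
static void mix_scalar(uint32_t s[4])
{
    s[0] += s[1];
    s[3] = rotl32(s[3] ^ s[0], 16);
    s[2] += s[3];
    s[1] = rotl32(s[1] ^ s[2], 12);
}

/* Hypothetical "4-way engine": the same step on four independent states.
 * Layout is s[word][lane]; each loop over the four lanes is the kind of
 * work a single 128-bit vector instruction could do at once. */
static void mix_4way(uint32_t s[4][4])
{
    int lane;
    for (lane = 0; lane < 4; ++lane) s[0][lane] += s[1][lane];
    for (lane = 0; lane < 4; ++lane) s[3][lane] = rotl32(s[3][lane] ^ s[0][lane], 16);
    for (lane = 0; lane < 4; ++lane) s[2][lane] += s[3][lane];
    for (lane = 0; lane < 4; ++lane) s[1][lane] = rotl32(s[1][lane] ^ s[2][lane], 12);
}

/* Returns 1 if both versions produce identical results for all lanes. */
int engines_agree(void)
{
    uint32_t scalar[4][4]; /* scalar[lane][word] */
    uint32_t packed[4][4]; /* packed[word][lane] */
    int lane, word;

    for (lane = 0; lane < 4; ++lane)
        for (word = 0; word < 4; ++word) {
            uint32_t v = 0x9E3779B9u * (uint32_t)(lane * 4 + word + 1);
            scalar[lane][word] = v;
            packed[word][lane] = v;
        }

    for (lane = 0; lane < 4; ++lane)
        mix_scalar(scalar[lane]);
    mix_4way(packed);

    for (lane = 0; lane < 4; ++lane)
        for (word = 0; word < 4; ++word)
            if (scalar[lane][word] != packed[word][lane])
                return 0;
    return 1;
}

/* Aborts if the two versions ever disagree. */
void self_check(void)
{
    assert(engines_agree());
}
```

Both versions give the same answers; the choice only affects how much work gets done per instruction, which would explain why a miner lets you pick whichever one your CPU handles best.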
-
RE: [Dev] NeoScrypt GPU Miner - Public Beta Test
^^ and like you said, the compiler probably fixed that.
But people are trying, in order to make mistakes like that. :)
-
RE: [Dev] NeoScrypt GPU Miner - Public Beta Test
Nice to know that the interest in improving the kernel didn’t go away when Wolf said he was keeping his improvements to himself…
I’m not in any way faulting him for that decision, but that doesn’t mean that I have to like it either… ;)
BTW, I got into X11 mining, and guess who showed up as the top kernel writer… lol
-
RE: [Dev] NeoScrypt GPU Miner - Public Beta Test
There are people doing minor mods to the latest wolf kernel, and bumping the speed up just a tad. One decreased my speed by about 1.5%, the other increased by about 1.5%, but combined they gave me about 9 kh/s on my 290’s, roughly 2.5%.
// NeoScrypt(128, 2, 1) with Salsa20/20 and ChaCha20/20
// Stupid AMD compiler ignores the unroll pragma in these two
#define SALSA_SMALL_UNROLL 3
#define CHACHA_SMALL_UNROLL 3

// If SMALL_BLAKE2S is defined, BLAKE2S_UNROLL is interpreted
// as the unroll factor; must divide cleanly into ten.
// Usually a bad idea.
//#define SMALL_BLAKE2S
//#define BLAKE2S_UNROLL 5

#define BLOCK_SIZE 64U
#define FASTKDF_BUFFER_SIZE 256U
#ifndef PASSWORD_LEN
#define PASSWORD_LEN 80U
#endif

#if !defined(cl_khr_byte_addressable_store)
#error "Device does not support unaligned stores"
#endif

// Swaps 128 bytes at a time without using temp vars
void SwapBytes128(void *restrict A, void *restrict B, uint len)
{
    // len is a byte count; each iteration swaps one 128-byte ulong16
    #pragma unroll 2
    for(int i = 0; i < (len >> 7); ++i)
    {
        ((ulong16 *)A)[i] ^= ((ulong16 *)B)[i];
        ((ulong16 *)B)[i] ^= ((ulong16 *)A)[i];
        ((ulong16 *)A)[i] ^= ((ulong16 *)B)[i];
    }
}

void CopyBytes128(void *restrict dst, const void *restrict src, uint len)
{
    // Unlike SwapBytes128, len here counts 128-byte blocks, not bytes
    #pragma unroll 2
    for(int i = 0; i < len; ++i)
        ((ulong16 *)dst)[i] = ((ulong16 *)src)[i];
}

void CopyBytes(void *restrict dst, const void *restrict src, uint len)
{
    for(int i = 0; i < len; ++i)
        ((uchar *)dst)[i] = ((uchar *)src)[i];
}

//
// a bit of byte alignment checking goes a long ways…
//
void XORBytesInPlace(void *restrict dst, const void *restrict src, uint mod)
{
    switch(mod % 4)
    {
        case 0:
            #pragma unroll 2
            for(int i = 0; i < 4; i+=2)
            {
                ((uint2 *)dst)[i] ^= ((uint2 *)src)[i];
                ((uint2 *)dst)[i+1] ^= ((uint2 *)src)[i+1];
            }
            break;

        case 2:
            #pragma unroll 8
            for(int i = 0; i < 16; i+=2)
            {
                ((uchar2 *)dst)[i] ^= ((uchar2 *)src)[i];
                ((uchar2 *)dst)[i+1] ^= ((uchar2 *)src)[i+1];
            }
            break;

        default:
            #pragma unroll 8
            for(int i = 0; i < 31; i+=4)
            {
                ((uchar *)dst)[i] ^= ((uchar *)src)[i];
                ((uchar *)dst)[i+1] ^= ((uchar *)src)[i+1];
                ((uchar *)dst)[i+2] ^= ((uchar *)src)[i+2];
                ((uchar *)dst)[i+3] ^= ((uchar *)src)[i+3];
            }
    }
}

void XORBytes(void *restrict dst, const void *restrict src1, const void *restrict src2, uint len)
{
    #pragma unroll 1
    for(int i = 0; i < len; ++i)
        ((uchar *)dst)[i] = ((uchar *)src1)[i] ^ ((uchar *)src2)[i];
}

// Blake2S
#define BLAKE2S_BLOCK_SIZE 64U
#define BLAKE2S_OUT_SIZE 32U
#define BLAKE2S_KEY_SIZE 32U

static const __constant uint BLAKE2S_IV[8] =
{
    0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0xA54FF53A,
    0x510E527F, 0x9B05688C, 0x1F83D9AB, 0x5BE0CD19
};

static const __constant uchar BLAKE2S_SIGMA[10][16] =
{
    { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 } ,
    { 14, 10, 4, 8, 9, 15, 13, 6, 1, 12, 0, 2, 11, 7, 5, 3 } ,
    { 11, 8, 12, 0, 5, 2, 15, 13, 10, 14, 3, 6, 7, 1, 9, 4 } ,
    { 7, 9, 3, 1, 13, 12, 11, 14, 2, 6, 5, 10, 4, 0, 15, 8 } ,
    { 9, 0, 5, 7, 2, 4, 10, 15, 14, 1, 11, 12, 6, 8, 3, 13 } ,
    { 2, 12, 6, 10, 0, 11, 8, 3, 4, 13, 7, 5, 15, 14, 1, 9 } ,
    { 12, 5, 1, 15, 14, 13, 4, 10, 0, 7, 6, 3, 9, 2, 8, 11 } ,
    { 13, 11, 7, 14, 12, 1, 3, 9, 5, 0, 15, 4, 8, 6, 2, 10 } ,
    { 6, 15, 14, 9, 11, 3, 0, 8, 12, 2, 13, 7, 1, 4, 10, 5 } ,
    { 10, 2, 8, 4, 7, 6, 1, 5, 15, 11, 9, 14, 3, 12, 13, 0 } ,
};

#define BLAKE_G(idx0, idx1, a, b, c, d, key) do { \
    a += b + key[BLAKE2S_SIGMA[idx0][idx1]]; \
    d = rotate(d ^ a, 16U); \
    c += d; \
    b = rotate(b ^ c, 20U); \
    a += b + key[BLAKE2S_SIGMA[idx0][idx1 + 1]]; \
    d = rotate(d ^ a, 24U); \
    c += d; \
    b = rotate(b ^ c, 25U); \
} while(0)

void Blake2S(uint *restrict inout, const uint *restrict inkey)
{
    uint16 V;
    uint8 tmpblock;

    // Load first block (IV into V.lo) and constants (IV into V.hi)
    V.lo = V.hi = vload8(0U, BLAKE2S_IV);

    // XOR with initial constant
    V.s0 ^= 0x01012020;

    // Copy input block for later
    tmpblock = V.lo;

    // XOR length of message so far (including this block)
    // There are two uints for this field, but high uint is zero
    V.sc ^= BLAKE2S_BLOCK_SIZE;

    // Compress state, using the key as the key
#ifdef SMALL_BLAKE2S
    #pragma unroll BLAKE2S_UNROLL
#else
    #pragma unroll
#endif
    for(int x = 0; x < 10; ++x)
    {
        BLAKE_G(x, 0x00, V.s0, V.s4, V.s8, V.sc, inkey);
        BLAKE_G(x, 0x02, V.s1, V.s5, V.s9, V.sd, inkey);
        BLAKE_G(x, 0x04, V.s2, V.s6, V.sa, V.se, inkey);
        BLAKE_G(x, 0x06, V.s3, V.s7, V.sb, V.sf, inkey);
        BLAKE_G(x, 0x08, V.s0, V.s5, V.sa, V.sf, inkey);
        BLAKE_G(x, 0x0A, V.s1, V.s6, V.sb, V.sc, inkey);
        BLAKE_G(x, 0x0C, V.s2, V.s7, V.s8, V.sd, inkey);
        BLAKE_G(x, 0x0E, V.s3, V.s4, V.s9, V.se, inkey);
    }

    // XOR low part of state with the high part,
    // then with the original input block.
    V.lo ^= V.hi ^ tmpblock;

    // Load constants (IV into V.hi)
    V.hi = vload8(0U, BLAKE2S_IV);

    // Copy input block for later
    tmpblock = V.lo;

    // XOR length of message into block again
    V.sc ^= BLAKE2S_BLOCK_SIZE << 1;

    // Last block compression - XOR final constant into state
    V.se ^= 0xFFFFFFFFU;

    // Compress block, using the input as the key
#ifdef SMALL_BLAKE2S
    #pragma unroll BLAKE2S_UNROLL
#else
    #pragma unroll
#endif
    for(int x = 0; x < 10; ++x)
    {
        BLAKE_G(x, 0x00, V.s0, V.s4, V.s8, V.sc, inout);
        BLAKE_G(x, 0x02, V.s1, V.s5, V.s9, V.sd, inout);
        BLAKE_G(x, 0x04, V.s2, V.s6, V.sa, V.se, inout);
        BLAKE_G(x, 0x06, V.s3, V.s7, V.sb, V.sf, inout);
        BLAKE_G(x, 0x08, V.s0, V.s5, V.sa, V.sf, inout);
        BLAKE_G(x, 0x0A, V.s1, V.s6, V.sb, V.sc, inout);
        BLAKE_G(x, 0x0C, V.s2, V.s7, V.s8, V.sd, inout);
        BLAKE_G(x, 0x0E, V.s3, V.s4, V.s9, V.se, inout);
    }

    // XOR low part of state with high part, then with input block
    V.lo ^= V.hi ^ tmpblock;

    // Store result in input/output buffer
    vstore8(V.lo, 0, inout);
}

/* FastKDF, a fast buffered key derivation function:
* FASTKDF_BUFFER_SIZE must be a power of 2;
* password_len, salt_len and output_len should not exceed FASTKDF_BUFFER_SIZE;
* prf_output_size must be
-
RE: [Dev] NeoScrypt Hardware Comparison Site
You can bump the Sapphire R9 290 Tri-X OC up from 333 to 340… The GPUs came back from warranty replacement, and can get the juice cranked up a little with fans that aren’t failing…
-
RE: [Dev] NeoScrypt Hardware Comparison Site
Will get this added ASAP. Having a few delays with stuff: the usual festive stuff with family, and being unable to access a fully working PC to actually do things on.
Recently tested an R7 240 (it’s pointless, don’t try).
Got the core clock to 1300 MHz (default 780), and the memory from 800 to 1300 also (tried upping the voltage from 1.15 V all the way up to 1.4 V but still couldn’t push past that 1300 MHz). Huge boost in hashrate, from around 12 kh/s to about 20 kh/s. This didn’t last long, sadly, as the card’s VRMs blew after just 3 days. Poor thing. Oh well, it was a free GPU :D
If someone hands you a 7990, please don’t do the same overclock till burnout test… lol
-
RE: [Dev] NeoScrypt Hardware Comparison Site
+1 for Wolf’s kernel. Took this card from 78 kh/s to 200+ kh/s.
Wolf’s kernel took my 290’s from 70 to 330, and my 290x’s from 70 to 302… (%^@$^ Elpida memory)
-
RE: Download the latest NeoGpuMiner (3.7.8) and Sgminer (5.0.1-git) here!
My guess is that they are building drivers for faster gaming frame rates, not mining.
But what do I know? lol
-
RE: Download the latest NeoGpuMiner (3.7.8) and Sgminer (5.0.1-git) here!
I heard this happens with 14.12, you sure you’re not running that?
Absolutely 100% positive that I’m running 14.9 and not 14.12.
-
RE: [Dev] NeoScrypt GPU Miner - Public Beta Test
Both of those builds won’t run on my rig at all. Noted in thread.
-
RE: Download the latest NeoGpuMiner (3.7.8) and Sgminer (5.0.1-git) here!
CGminer 3.7.8 also immediately crashes with no errors, running 14.9 drivers.
-
RE: Download the latest NeoGpuMiner (3.7.8) and Sgminer (5.0.1-git) here!
This SGminer crashes immediately with no error message.
14.9 drivers are installed, and 14.6 DLLs are in the folder, and it does not matter whether the bins from a working version of SGminer are in the folder or not.
Is this expected behavior?
-
RE: Download the latest NeoGpuMiner (3.7.8) and Sgminer (5.0.1-git) here!
So just to verify, this is just slightly optimized mining software, and uses the Wolf kernel that is not optimized for the Hawaii cards?
-
RE: [Dev] NeoScrypt GPU Miner - Public Beta Test
Looking into it now, thank you. :D
-
RE: [Dev] NeoScrypt GPU Miner - Public Beta Test
I’ve been putting in 60+ hour weeks at work and dealing with vehicle issues and work xmas parties, so I haven’t had much of a chance to follow NeoScrypt development.
Has anyone anywhere compiled an SGminer for windoz that will work with Wolf’s latest kernel for the 290/290x GPUs?
-
RE: [Dev] NeoScrypt Hardware Comparison Site
Type: GPU
Hashrate: 20.38 Kh/s
Vendor: AMD HD 6850
Est. Power: 127w
Miner: cgminer 3.7.8
Proof: -I 12 -w 32
I’m looking to help someone set up a 6850 GPU, and ran across this.
The info shows -I12, but the image shows -I14.
Also, isn’t 98°C a little high for a temp?
-
RE: [Dev] NeoScrypt Hardware Comparison Site
Yes, the 280x also has Hynix, and that is part of why it beats the XFX. But XFX made a choice to put that cheaper memory in their card, yet charge the same price for it as other cards that have better memory. So I am making the choice to never buy another XFX card. :P
Yes, clock back your cards to where they will be rock solid stable if you are going to leave them unattended for 2 weeks.
And the new style is much easier to read than the old style!!
-
RE: [Dev] NeoScrypt Hardware Comparison Site
Well, success!! (of a sort)
I managed to get the HIS 280x (GPU 0) to hash faster (303 kh/s) than I’ve been able to get the XFX 290x to hash reliably (GPU 1 & 2 @ 301 kh/s)…
When I get the 280x to max speed, I’ll upload the stats and proof to the site, but there will never be another XFX GPU in my future.
– edit – up to 307k