ERROR: Argument is of incorrect size in Dict at dict.jl:228 #2402

diegozea · 2013-02-25T17:59:08Z

This works fine on Julia 0.1, but fails on a more recent version
(Version 0.1.0+110766073.r3778):

julia> using BioSeq
ERROR: error compiling assign: box: argument is of incorrect size
 in Dict at dict.jl:228
 in include_from_node1 at loading.jl:76
 in include_from_node1 at loading.jl:76
 in reload_path at loading.jl:96
 in require at loading.jl:48
at /home/dzea/.julia/BioSeq/src/alphabets.jl:59
at /home/dzea/.julia/BioSeq/src/BioSeq.jl:64

julia> using BioSeq

julia> (Nucleotide=>Nucleotide)[
                     '.'=>'.',
                     '-'=>'-',
                     'A'=>'T',
                     'B'=>'V',
                     'C'=>'G',
                     'D'=>'H',
                     'G'=>'C',
                     'H'=>'D',
                     'K'=>'M',
                     'M'=>'K',
                     'T'=>'A',
                     'V'=>'B',
                     'S'=>'S',
                     'W'=>'W',
                     'R'=>'Y',
                     'Y'=>'R',
                     'N'=>'N',
                     'X'=>'X',
                     'a'=>'t',
                     'b'=>'v',
                     'c'=>'g',
                     'd'=>'h',
                     'g'=>'c',
                     'h'=>'d',
                     'k'=>'m',
                     'm'=>'k',
                     't'=>'a',
                     'v'=>'b',
                     's'=>'s',
                     'w'=>'w',
                     'r'=>'y',
                     'y'=>'r',
                     'x'=>'x',
                     'n'=>'n' ]
ERROR: error compiling assign: box: argument is of incorrect size
 in Dict at dict.jl:228


pao · 2013-02-25T18:01:37Z

Nucleotide is an 8-bit bitstype defined here: https://github.com/diegozea/BioSeq.jl/blob/master/src/nucleotide.jl#L3

diegozea · 2013-02-25T18:08:55Z

It looks like the problem is with the conversion between Char and Nucleotide:

julia> convert(Nucleotide,'A')
ERROR: error compiling convert: box: argument is of incorrect size

I defined the convert rule as follows:

convert{T<:ASCIIChar}(::Type{T}, x::Char) = box(T,trunc8(unbox(Char,x)))

I don't see the problem. It worked fine in the past.

diegozea · 2013-02-25T18:11:00Z

Maybe it is because of this: 24268cc

pao · 2013-02-25T18:12:42Z

@diegozea Have you done the make clean mentioned in 24268cc's commit message?

vtjnash · 2013-02-25T18:16:30Z

As you pointed out, the trunc8 intrinsic was removed from the language in 24268cc in favor of using trunc_int.

StefanKarpinski · 2013-02-25T18:19:30Z

The fact that Nucleotide is a subtype of something called ASCIIChar is so jarringly wrong, I have a hard time getting past it, never mind that an eight bit char type makes no sense in Julia's string system in the first place. Yes, the trunc8 intrinsic is gone. This kind of thing is why I recommended converting to Uint8 and then relying on the predefined behavior of that. Intrinsics are not really part of Julia's stable API, but rather a means for the system to bootstrap itself. As soon as the immutable branch gets merged, there will be even less call to use intrinsics for this.

diegozea · 2013-02-25T18:34:53Z

Thanks @pao, I need to do the make clean.

@StefanKarpinski, what would you recommend I do with this? Is the problem with ASCIIChar the name, or the whole idea of getting Char's functions on an 8-bit bitstype? I opened this issue on BioSeq to avoid making noise here: diegozea/BioSeq.jl#1

StefanKarpinski · 2013-02-25T18:57:14Z

A nucleotide is not a character. They're completely unrelated things. If it's convenient to allow transparent conversion between characters and nucleotides, that's fine, but making Nucleotide an actual subtype of ASCIIChar makes no sense. How is ASCIIChar even an abstract type? What? Char isn't abstract. How is ASCIIChar abstract? The very concept of ASCIIChar is nonsensical: a character doesn't have an encoding, a string does. A string is an encoded representation of a sequence of characters: the characters are the encoded values – they themselves don't have an encoding. The phrase "ASCII character" is just used to mean "characters that can be encoded by ASCII".

Historically, strings happened to be a convenient way to represent nucleotides in languages that didn't have the tools to make a proper representation. Here, in a language that gives you every possible tool to do this the right way, I see Nucleotide <: ASCIIChar and it makes me feel like I'm taking crazy pills. You ended up copy-and-pasting with changes all the Char code anyway, so where's the abstraction benefit? There is none. This stuff has nothing to do with ASCII or text strings. At the very beginning I suggested that the right approach was to abstract out common operations on sequences that could apply to both strings and things like nucleotide sequences, but that advice was completely ignored and instead we have nonsense like Nucleotide <: ASCIIChar, so I've really given up on this at this point.

diegozea · 2013-02-25T20:03:34Z

@StefanKarpinski I didn't ignore your advice!
Maybe I don't completely understand what you want (a draft would help a lot) and/or I failed in the implementation (you can look at the many versions I tried over the last month on the mailing list).
Now that the package is on GitHub, you can fork it and collaborate on it.
I opened an issue for this here: diegozea/BioSeq.jl#1
I hope you don't give up.
I'm really working hard on this! And I need a lot of help.
Thanks

StefanKarpinski · 2013-02-25T22:31:05Z

I do appreciate all your hard work on this and the amount of time you've spent working on this and other Julia code. I'm not angry at all, but rather frustrated. The frustration stems from my sense that you have repeatedly asked what the best way to do something is, gotten advice that wasn't what you had in mind, and then done the complete opposite. Early on, I wrote the following on the subject of whether you should represent DNA sequences using strings and characters:

> In other languages, co-opting strings for bioinformatics data may be the best approach, but for Julia, that's definitely not what I would recommend. Instead, I strongly suggest creating custom bits types and either working with arrays of them or making some custom sequence container. (There may be a case for a Stringlike abstraction above String, for generic code that applies equally well to sequences of things like DNA base pairs and strings of characters.)
>
> You can, for example, define a 4-bit BasePair type, and have a BasePairArray that uses only 4 bits per pair (like how BitArray packs 8 values into each byte). You can even have a string literal representation for these arrays – dna"CGAATAACG". While Keno is quite right that you would have to define operations on BasePair objects, that seems reasonable – I can't see why integer arithmetic makes sense for BasePairs. It might also be useful to have a Codon type or an AminoAcid type and work with sequences of those. Much of the sequence manipulation logic and code can be shared between types.
>
> ...
>
> The bottom line is that DNA sequences are not strings and they're definitely not Unicode strings. The fact that they've been crammed into strings in Perl, Python, etc. is merely a historical artifact of those languages not being able to define new, appropriate data types for representing DNA. It becomes increasingly awkward and awful to use strings to represent DNA as those languages improve their Unicode support.
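A minimal sketch of the packed representation described in the quote above. It is written in current Julia syntax, which postdates this thread. The names `BasePair`, `BasePairArray`, and `@dna_str` follow the quote but are hypothetical here, not BioSeq's API. The encoding (one bit per base, two 4-bit codes per byte) is an assumption chosen for illustration.

```julia
# Hypothetical 4-bit nucleotide code; only the low 4 bits of `code` are used.
# Assumed encoding: A=0x1, C=0x2, G=0x4, T=0x8.
struct BasePair
    code::UInt8
end

function BasePair(c::Char)
    i = findfirst(==(c), "ACGT")
    i === nothing && throw(ArgumentError("not a nucleotide: $c"))
    return BasePair(0x01 << (i - 1))
end

Base.Char(b::BasePair) = "ACGT"[trailing_zeros(b.code) + 1]

# A read-only vector that stores two 4-bit codes in each byte.
struct BasePairArray <: AbstractVector{BasePair}
    data::Vector{UInt8}
    len::Int
end

Base.size(s::BasePairArray) = (s.len,)
Base.IndexStyle(::Type{BasePairArray}) = IndexLinear()

function Base.getindex(s::BasePairArray, i::Int)
    @boundscheck checkbounds(s, i)
    byte = s.data[(i + 1) >> 1]
    # Odd positions live in the low nibble, even positions in the high nibble.
    return BasePair(isodd(i) ? byte & 0x0f : byte >> 4)
end

function BasePairArray(str::AbstractString)
    n = length(str)
    data = zeros(UInt8, (n + 1) >> 1)
    for (i, c) in enumerate(str)
        code = BasePair(c).code
        data[(i + 1) >> 1] |= isodd(i) ? code : code << 4
    end
    return BasePairArray(data, n)
end

# String literal form: dna"CGAATAACG"
macro dna_str(s)
    BasePairArray(s)
end
```

With these definitions, `dna"CGAATAACG"` builds a nine-element `BasePairArray` backed by five bytes, and because it subtypes `AbstractVector`, generic array code such as `length`, iteration, and broadcasting works on it without further definitions.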

I would say that's a pretty clear "no, don't use strings and characters to represent DNA sequences". Now we have DNA sequences represented as strings and characters. Later on, on the subject of how to implement the Char8 type, which I clearly said shouldn't exist at all, I recommended against copying all the intrinsics in char.jl, saying specifically:

> Intrinsics are a very low level mechanism that allows the language to bootstrap itself, and probably shouldn't be used by very much user-level code. Instead, I would recommend providing conversions to and from Uint8 and defining Char8 behavior by translating to Uint8 and back. Translation between these types is actually a no-op because they have identical representations; it only affects the behavior associated with those byte values.
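A minimal sketch of that recommendation, applied to the `Nucleotide` type from this issue. It assumes current Julia syntax rather than the syntax of the time (`bitstype` is now `primitive type`, `Uint8` is now `UInt8`). The specific constructors and the `complement` table are illustrative choices, not BioSeq's code.

```julia
# An 8-bit bits type with no behavior of its own.
primitive type Nucleotide 8 end

# The only low-level step: view the same 8 bits as the other type.
Nucleotide(x::UInt8) = reinterpret(Nucleotide, x)
Base.UInt8(n::Nucleotide) = reinterpret(UInt8, n)

# Everything else goes through UInt8, using its predefined behavior.
Nucleotide(c::Char) = Nucleotide(UInt8(c))    # throws for code points above 0xff
Base.Char(n::Nucleotide) = Char(UInt8(n))
Base.convert(::Type{Nucleotide}, c::Char) = Nucleotide(c)
Base.isless(a::Nucleotide, b::Nucleotide) = UInt8(a) < UInt8(b)
Base.show(io::IO, n::Nucleotide) = print(io, "Nucleotide('", Char(n), "')")

# A small complement table, built with explicit conversions.
complement = Dict(Nucleotide(k) => Nucleotide(v)
                  for (k, v) in ('A' => 'T', 'C' => 'G', 'G' => 'C', 'T' => 'A'))
```

No intrinsic such as `trunc8` or `box` appears, so a change like the one in 24268cc would not affect this code. The table is built with explicit `Nucleotide(...)` calls instead of relying on implicit key conversion from `Char`.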

Instead, Char8 is implemented using intrinsics. Perhaps I was unclear, or I'm too verbose, or you just chose to pay more attention to other simpler statements, but in any case, this is where my frustration comes from. But that's ok. I don't expect everyone to do things the way I suggest, nor would I want that to be the case.

Perhaps we can take another shot at this stuff once the immutable branch gets merged, since that will allow a much simpler representation of a nucleotide using an immutable composite type.

diegozea · 2013-02-26T23:02:26Z

> I strongly suggest creating custom bits types and either working with arrays of them or making some custom sequence container. (There may be a case for a Stringlike abstraction above String, for generic code that applies equally well to sequences of things like DNA base pairs and strings of characters.)

@StefanKarpinski I need some hints on what you think such an abstraction could look like. I had problems in the past because the AbstractArray and String interfaces are not the same, and I can't inherit from both. So I tried using both (with a composite type as the container), and I feel that AbstractArray is much better. So I chose to use Array (which, for example, gains the great power of SubArrays and DArrays!) and I define every method to work on Strings too. Is there another way? After what you said about not co-opting Strings, because of immutability and other reasons, I think users shouldn't have to use ASCIIString as a sequence. Because of that, at some point I thought about not offering an interface for Strings at all. But... what do you think?

StefanKarpinski · 2013-02-27T14:19:18Z

Sorry for the slow reply – yesterday was a bit of a short/busy day for me. For now, I think that making your own bits type and having arrays of those is the best way forward. There is already some overlap in the interfaces of strings and arrays, but that is an area where we definitely need more work. Fortunately, duck typing is the ultimate polymorphic escape hatch – even when you can't coinherit, you can always just duck type. I see you've already made some specific changes, so I'll leave comments there instead...
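A small sketch of what duck typing looks like in practice, in current Julia syntax. The function name `composition` is hypothetical. It has no type annotation on its argument, so it accepts anything that supports iteration and `eltype`, whether that is a `String` or an array.

```julia
# Count how often each element occurs in any iterable sequence.
function composition(seq)
    counts = Dict{eltype(seq),Int}()
    for x in seq
        counts[x] = get(counts, x, 0) + 1
    end
    return counts
end

composition("GATTACA")              # a String iterates over Chars
composition(UInt8[0x01, 0x02, 0x02]) # a Vector iterates over its elements
```

The same method body serves both calls, even though `String` and `Vector` share no common supertype below `Any`.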


StefanKarpinski closed this as completed Feb 25, 2013

diegozea mentioned this issue Feb 25, 2013

Reorganization of BioSeq types diegozea/BioSeq.jl#1

Closed

4 tasks

ghost mentioned this issue Oct 9, 2014

"box: argument is of incorrect size" when assigning to a Dict #8637

Closed
