ERROR: Argument is of incorrect size in Dict at dict.jl:228 #2402

Closed
diegozea opened this issue Feb 25, 2013 · 12 comments

@diegozea
Contributor

This works fine on Julia 0.1, but fails on a more recent version
(Version 0.1.0+110766073.r3778):

julia> using BioSeq
ERROR: error compiling assign: box: argument is of incorrect size
 in Dict at dict.jl:228
 in include_from_node1 at loading.jl:76
 in include_from_node1 at loading.jl:76
 in reload_path at loading.jl:96
 in require at loading.jl:48
at /home/dzea/.julia/BioSeq/src/alphabets.jl:59
at /home/dzea/.julia/BioSeq/src/BioSeq.jl:64

julia> using BioSeq

julia> (Nucleotide=>Nucleotide)[
                     '.'=>'.',
                     '-'=>'-',
                     'A'=>'T',
                     'B'=>'V',
                     'C'=>'G',
                     'D'=>'H',
                     'G'=>'C',
                     'H'=>'D',
                     'K'=>'M',
                     'M'=>'K',
                     'T'=>'A',
                     'V'=>'B',
                     'S'=>'S',
                     'W'=>'W',
                     'R'=>'Y',
                     'Y'=>'R',
                     'N'=>'N',
                     'X'=>'X',
                     'a'=>'t',
                     'b'=>'v',
                     'c'=>'g',
                     'd'=>'h',
                     'g'=>'c',
                     'h'=>'d',
                     'k'=>'m',
                     'm'=>'k',
                     't'=>'a',
                     'v'=>'b',
                     's'=>'s',
                     'w'=>'w',
                     'r'=>'y',
                     'y'=>'r',
                     'x'=>'x',
                     'n'=>'n' ]
ERROR: error compiling assign: box: argument is of incorrect size
 in Dict at dict.jl:228
@pao
Member

pao commented Feb 25, 2013

Nucleotide is an 8-bit bitstype defined here: https://github.com/diegozea/BioSeq.jl/blob/master/src/nucleotide.jl#L3

@diegozea
Contributor Author

It looks like the problem is with the conversion between Char and Nucleotide:

julia> convert(Nucleotide,'A')
ERROR: error compiling convert: box: argument is of incorrect size

I defined the convert rule as follows:

convert{T<:ASCIIChar}(::Type{T}, x::Char) = box(T,trunc8(unbox(Char,x)))

I don't see the problem. It worked fine in the past.

@diegozea
Contributor Author

Maybe it is because of this: 24268cc

@pao
Member

pao commented Feb 25, 2013

@diegozea Have you done the make clean mentioned in 24268cc's commit message?

@vtjnash
Member

vtjnash commented Feb 25, 2013

As you pointed out, the trunc8 intrinsic was removed from the language in 24268cc in favor of using trunc_int.

@StefanKarpinski
Member

The fact that Nucleotide is a subtype of something called ASCIIChar is so jarringly wrong, I have a hard time getting past it, never mind that an eight bit char type makes no sense in Julia's string system in the first place. Yes, the trunc8 intrinsic is gone. This kind of thing is why I recommended converting to Uint8 and then relying on the predefined behavior of that. Intrinsics are not really part of Julia's stable API, but rather a means for the system to bootstrap itself. As soon as the immutable branch gets merged, there will be even less call to use intrinsics for this.
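
A minimal sketch of the Uint8-based approach recommended above, written in present-day Julia syntax, which postdates this thread (`Uint8` is now spelled `UInt8`). The `Nucleotide` declaration is a hypothetical stand-in, not the BioSeq definition, and the set of methods shown is only an assumption about what such a type might need.

```julia
# Hypothetical stand-in for the package's 8-bit type (current Julia syntax).
primitive type Nucleotide 8 end

# Reinterpreting between same-size primitive types only relabels the bits.
Nucleotide(b::UInt8) = reinterpret(Nucleotide, b)
Base.UInt8(n::Nucleotide) = reinterpret(UInt8, n)

# Char conversions go through UInt8; UInt8(c) throws for code points above 0xff.
Base.convert(::Type{Nucleotide}, c::Char) = Nucleotide(UInt8(c))
Base.convert(::Type{Char}, n::Nucleotide) = Char(UInt8(n))

# Behavior is borrowed from UInt8 rather than rebuilt from intrinsics.
Base.:(==)(a::Nucleotide, b::Nucleotide) = UInt8(a) == UInt8(b)
Base.isless(a::Nucleotide, b::Nucleotide) = UInt8(a) < UInt8(b)
Base.show(io::IO, n::Nucleotide) = print(io, "Nucleotide('", Char(UInt8(n)), "')")

a = convert(Nucleotide, 'A')
```

Nothing in this sketch calls `box`, `unbox`, or a truncation intrinsic, so a change like the removal of `trunc8` would not affect it.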

@diegozea
Contributor Author

Thanks @pao, I need to do the make clean.

@StefanKarpinski, what would you recommend I do about this? Is the problem with ASCIIChar the name, or the whole idea of getting Char's functions on an 8-bit bitstype? I opened this issue on BioSeq so as not to make noise here: diegozea/BioSeq.jl#1

@StefanKarpinski
Member

A nucleotide is not a character. They're completely unrelated things. If it's convenient to allow transparent conversion between characters and nucleotides, that's fine, but making Nucleotide an actual subtype of ASCIIChar makes no sense. How is ASCIIChar even an abstract type? What? Char isn't abstract. How is ASCIIChar abstract? The very concept of ASCIIChar is nonsensical: a character doesn't have an encoding, a string does. A string is an encoded representation of a sequence of characters: the characters are the encoded values – they themselves don't have an encoding. The phrase "ASCII character" is just used to mean "characters that can be encoded by ASCII".

Historically, strings happened to be a convenient way to represent nucleotides in languages that didn't have the tools to make a proper representation. Here, in a language that gives you every possible tool to do this the right way, I see Nucleotide <: ASCIIChar and it makes me feel like I'm taking crazy pills. You ended up copy-and-pasting with changes all the Char code anyway, so where's the abstraction benefit? There is none. This stuff has nothing to do with ASCII or text strings. At the very beginning I suggested that the right approach was to abstract out common operations on sequences that could apply to both strings and things like nucleotide sequences, but that advice was completely ignored and instead we have nonsense like Nucleotide <: ASCIIChar, so I've really given up on this at this point.

@diegozea
Contributor Author

@StefanKarpinski I didn't ignore your advice!
Maybe I don't completely understand what you want (a draft would help a lot) and/or I failed in the implementation (you can look at the many versions I tried over the last month on the mailing list).
Now that the package is on GitHub, you can fork it and collaborate on it.
I opened an issue for this here: diegozea/BioSeq.jl#1
I hope you don't give up.
I'm really working hard on this! And I need a lot of help.
Thanks

@StefanKarpinski
Member

I do appreciate all your hard work on this and the amount of time you've spent working on this and other Julia code. I'm not angry at all, but rather frustrated. The frustration stems from my sense that you have repeatedly asked what the best way to do something is, gotten advice that wasn't what you had in mind, and then done the complete opposite. Early on, I wrote the following on the subject of whether you should represent DNA sequences using strings and characters:

> In other languages, co-opting strings for bioinformatics data may be the best approach, but for Julia, that's definitely not what I would recommend. Instead, I strongly suggest creating custom bits types and either working with arrays of them or making some custom sequence container. (There may be a case for a Stringlike abstraction above String, to which generic code that applies equally well to sequences of things like DNA base pairs and strings of characters.)
>
> You can, for example, define a 4-bit BasePair type, and have a BasePairArray that uses only 4 bits per pair (like how BitArray packs 8 values into each byte). You can even have a string literal representation for these arrays – dna"CGAATAACG". While Keno is quite right that you would have to define operations on BasePair objects, that seems reasonable – I can't see why integer arithmetic makes sense for BasePairs. It might also be useful to have a Codon type or an AminoAcid type and work with sequences of those. Much of the sequence manipulation logic and code can be shared between types.
>
> ...
>
> The bottom line is that DNA sequences are not strings and they're definitely not Unicode strings. The fact that they've been crammed into strings in Perl, Python, etc. is merely a historical artifact of those languages not being able to define new, appropriate data types for representing DNA. It becomes increasingly awkward and awful to use strings to represent DNA as those languages improve their Unicode support.
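
As a rough illustration of the `dna"..."` literal mentioned in that quote, here is a sketch in present-day Julia syntax. The `Nucleotide` struct, the four-letter alphabet check, and the plain `Vector` result are all assumptions made for brevity; the 4-bit packing described in the quote is not attempted here.

```julia
# Hypothetical one-field nucleotide type; the name and layout are illustrative.
struct Nucleotide
    code::UInt8
end

# Defining a macro named dna_str makes the literal syntax dna"..." available.
macro dna_str(s)
    # Validate once, when the macro is expanded, rather than on every evaluation.
    for c in s
        c in "ACGT" || error("invalid nucleotide: $c")
    end
    codes = [UInt8(c) for c in s]
    # Build a fresh Vector{Nucleotide} each time the literal is evaluated.
    return :(Nucleotide[Nucleotide(b) for b in $codes])
end

seq = dna"CGAATAACG"
```

Because validation runs during macro expansion, a malformed literal is rejected when the code is loaded, before anything runs.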

I would say that's a pretty clear "no, don't use strings and characters to represent DNA sequences". Now we have DNA sequences represented as strings and characters. Later on, on the subject of how to implement the Char8 type, which I clearly said shouldn't exist at all, I recommended against copying all the intrinsics in char.jl, saying specifically:

> Intrinsics are a very low level mechanism that allows the language to bootstrap itself, and probably shouldn't be used by very much user-level code. Instead, I would recommend providing conversions to and from Uint8 and defining Char8 behavior by translating to Uint8 and back. Translation between these types is actually a no-op because they have identical representations; it only affects the behavior associated with those byte values.

Instead, Char8 is implemented using intrinsics. Perhaps I was unclear, or I'm too verbose, or you just chose to pay more attention to other simpler statements, but in any case, this is where my frustration comes from. But that's ok. I don't expect everyone to do things the way I suggest, nor would I want that to be the case.

Perhaps we can take another shot at this stuff once the immutable branch gets merged, since that will allow a much simpler representation of a nucleotide using an immutable composite type.
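
For a sense of what an immutable composite representation could look like, here is a sketch in present-day Julia, where immutable composite types are declared with `struct`. The names `Nucleotide`, `COMPLEMENT`, and `complement` are hypothetical, and only the four unambiguous uppercase pairs from the table at the top of this issue are included.

```julia
# Immutable composite type wrapping one byte; no intrinsics are involved.
struct Nucleotide
    code::UInt8
end

# Convenience conversions to and from Char.
Nucleotide(c::Char) = Nucleotide(UInt8(c))
Base.Char(n::Nucleotide) = Char(n.code)

# Immutable structs with plain-bits fields compare and hash by content,
# so they can be used as Dict keys without extra definitions.
const COMPLEMENT = Dict(Nucleotide(a) => Nucleotide(b)
                        for (a, b) in zip("ACGT", "TGCA"))

complement(n::Nucleotide) = COMPLEMENT[n]

Char(complement(Nucleotide('A')))  # 'T'
```

A value of this type occupies a single byte, so arrays of it should be as compact as arrays of an 8-bit bits type.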

@diegozea
Contributor Author

> I strongly suggest creating custom bits types and either working with arrays of them or making some custom sequence container. (There may be a case for a Stringlike abstraction above String, to which generic code that applies equally well to sequences of things like DNA base pairs and strings of characters.)

@StefanKarpinski I need some hints on what you think such an abstraction could look like. I had problems in the past because the AbstractArray and String interfaces are not the same, and I can't inherit from both. So I tried using both (with a composite type as the container), and I feel that AbstractArray is much better. So I chose to use Array (and, for example, gain the great power of SubArrays and DArrays!) and I defined every method to work on Strings too. Is there another way? After your words about not co-opting Strings, because of immutability and other reasons, I think users shouldn't have to use ASCIIString as a sequence. Because of that, at some point I thought about not offering an interface for Strings. But... what do you think?

@StefanKarpinski
Member

Sorry for the slow reply – yesterday was a bit of a short/busy day for me. For now, I think that making your own bits type and having arrays of those is the best way forward. There is already some overlap in the interfaces of strings and arrays, but that is an area where we definitely need more work. Fortunately, duck typing is the ultimate polymorphic escape hatch – even when you can't coinherit, you can always just duck type. I see you've already made some specific changes, so I'll leave comments there instead...
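
A small sketch of the duck-typing point, in present-day Julia syntax: a function with no type annotations works on a string and on an array of a custom bits type alike, as long as both can be iterated and their elements support `==`. The function `longest_run` and the `Nucleotide` declaration are hypothetical examples, not part of BioSeq.

```julia
# No type annotations: any iterable whose elements support == is accepted.
function longest_run(seq)
    best = 0
    current = 0
    previous = nothing
    for item in seq
        # Extend the run if this item equals the previous one, else restart at 1.
        current = (previous !== nothing && item == previous) ? current + 1 : 1
        best = max(best, current)
        previous = item
    end
    return best
end

# Hypothetical 8-bit bits type standing in for the package's Nucleotide.
primitive type Nucleotide 8 end
Nucleotide(c::Char) = reinterpret(Nucleotide, UInt8(c))

longest_run("AAACCG")                               # 3
longest_run([Nucleotide(c) for c in "GATTTTACA"])   # 4
```

The string and the nucleotide array share no supertype that `longest_run` depends on; only the iteration protocol and `==` are used.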

staticfloat added a commit that referenced this issue Mar 6, 2021
This should include the recent `is_stdlib()` fixes.  Short commit log:

```
7a9d9654 (HEAD -> master, origin/master, origin/HEAD) [ext/HSG]: Store next release _and_ latest nightly (#2418)
7b870924 [ext/HSG]: Enable generating historical stdlibs on macOS (#2417)
5d496193 Update Project.toml
feada149 only use the stdlib version cache if a custom version is given to the resolver
bae808dc Fix Markdown table formatting (#2416)
6e8b6214 Update docstrings for io kwargs, some io kwarg fixes, update stdlib list (#2402)
c2e3879e Mark the "STDLIBS_BY_VERSION up-to-date" test as broken (#2409)
```