Segmentation Fine-Tuning leads to 0 line detection #683

Open
agombert opened this issue Feb 4, 2025 · 0 comments

agombert commented Feb 4, 2025

Hey,

Thanks for the great work. I have some questions about fine-tuning the segmentation model, and I suspect the problem comes from the format of my input data. I've been following this link to produce well-formed XML for my JPG images, but after fine-tuning (even after just 1 epoch) I don't get any lines 👀.

Here is an example of one of my XML files:

<?xml version="1.0" ?>
<alto xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.loc.gov/standards/alto/ns-v4#" xsi:schemaLocation="http://www.loc.gov/standards/alto/ns-v4# http://www.loc.gov/standards/alto/v4/alto-4-3.xsd">
    <Description>
        <MeasurementUnit>pixel</MeasurementUnit>
        <sourceImageInformation>
            <fileName>/home/ubuntu/data/20250204_line_detection/images/FRANOM22_COLH78_0261_0232_4.jpg</fileName>
        </sourceImageInformation>
    </Description>
    <Tags>
        <OtherTag DESCRIPTION="line type" ID="LINE_TYPE_1" TYPE="type" LABEL="default"/>
        <OtherTag DESCRIPTION="region type" ID="REGION_TYPE_1" TYPE="region" LABEL="text"/>
    </Tags>
    <Layout>
        <Page WIDTH="491" HEIGHT="722" PHYSICAL_IMG_NR="0" ID="page_0">
            <PrintSpace HPOS="0" VPOS="0" WIDTH="491" HEIGHT="722">
                <TextBlock ID="42c3ee03-8810-4b48-9eb6-ddcb9f23321d" HPOS="0" VPOS="0" WIDTH="491" HEIGHT="722" TAGREFS="REGION_TYPE_1">
                    <TextLine ID="780aac20-46ee-4072-a714-04c14590713b" HPOS="99" VPOS="48" WIDTH="358" HEIGHT="3" BASELINE="99 48 205 50 342 51 457 50" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                    <TextLine ID="f9e0050e-4014-437b-bdb6-fc85e4059699" HPOS="107" VPOS="99" WIDTH="327" HEIGHT="1" BASELINE="107 100 194 99 277 100 434 100" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                    <TextLine ID="9edd2cd0-e6a5-4579-8734-afafc03819e0" HPOS="101" VPOS="135" WIDTH="196" HEIGHT="8" BASELINE="101 143 167 141 297 135" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                    <TextLine ID="5f86381b-c704-4cbd-a1b7-b6295f0ed4a3" HPOS="106" VPOS="177" WIDTH="361" HEIGHT="8" BASELINE="106 185 248 177 467 181" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                    <TextLine ID="b4366d1e-151d-4e8c-9960-3bb7437a9c29" HPOS="102" VPOS="226" WIDTH="176" HEIGHT="1" BASELINE="102 226 167 227 278 226" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                    <TextLine ID="166863f8-0ebe-488f-97ba-3e62a88aeb79" HPOS="100" VPOS="267" WIDTH="317" HEIGHT="3" BASELINE="100 270 224 268 417 267" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                    <TextLine ID="dab2d94e-3ffd-42c2-bf69-7e3d05292b8c" HPOS="100" VPOS="311" WIDTH="254" HEIGHT="1" BASELINE="100 312 192 311 354 312" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                    <TextLine ID="35552902-4556-490c-bcae-3fe2826655c6" HPOS="101" VPOS="352" WIDTH="172" HEIGHT="2" BASELINE="101 352 183 354 273 352" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                    <TextLine ID="f1fbad86-de43-411b-8233-99030c1235ee" HPOS="104" VPOS="395" WIDTH="268" HEIGHT="4" BASELINE="104 397 191 399 372 395" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                    <TextLine ID="f76cd89c-df93-446a-baa8-489ca7a4991e" HPOS="106" VPOS="438" WIDTH="205" HEIGHT="3" BASELINE="106 441 188 438 311 441" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                    <TextLine ID="9da1cfdf-e9b1-4322-9171-22843062be75" HPOS="108" VPOS="477" WIDTH="170" HEIGHT="7" BASELINE="108 481 166 477 278 484" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                    <TextLine ID="ebea3fb5-1b3f-4ce4-9509-33504388e3a8" HPOS="102" VPOS="526" WIDTH="189" HEIGHT="5" BASELINE="102 531 181 528 291 526" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                    <TextLine ID="d45f7502-66d5-4ed2-9a55-6f8a6db7a7dd" HPOS="89" VPOS="572" WIDTH="395" HEIGHT="8" BASELINE="89 580 246 572 417 575 484 575" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                    <TextLine ID="6d5e459f-6b18-493b-bcd8-8f2082620791" HPOS="83" VPOS="619" WIDTH="403" HEIGHT="7" BASELINE="83 626 217 625 315 625 486 619" TAGREFS="LINE_TYPE_1">
                        <String CONTENT=""/>
                    </TextLine>
                </TextBlock>
            </PrintSpace>
        </Page>
    </Layout>
</alto>

And here are the BASELINE points plotted on the image:
Image
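To rule out malformed annotations before training, the BASELINE attributes can also be checked programmatically. The following is a minimal sketch using only the Python standard library, assuming ALTO v4 files shaped like the one above. The helper names `parse_baseline` and `check_alto` are hypothetical, and `SAMPLE` is an abridged copy of the file above with shortened IDs. It only verifies that each TextLine has a baseline of at least two points lying inside the Page dimensions; it does not tell whether the files match what ketos expects.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the xmlns attribute of the ALTO file above.
ALTO_NS = {"a": "http://www.loc.gov/standards/alto/ns-v4#"}

# Abridged sample (first two lines of the file above, IDs shortened).
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout>
    <Page WIDTH="491" HEIGHT="722" PHYSICAL_IMG_NR="0" ID="page_0">
      <PrintSpace HPOS="0" VPOS="0" WIDTH="491" HEIGHT="722">
        <TextBlock ID="block_1" HPOS="0" VPOS="0" WIDTH="491" HEIGHT="722">
          <TextLine ID="line_1" BASELINE="99 48 205 50 342 51 457 50"/>
          <TextLine ID="line_2" BASELINE="107 100 194 99 277 100 434 100"/>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def parse_baseline(value):
    """Turn 'x0 y0 x1 y1 ...' into a list of (x, y) integer pairs."""
    nums = [int(float(tok)) for tok in value.replace(",", " ").split()]
    if len(nums) % 2:
        raise ValueError(f"odd number of coordinates: {value!r}")
    return list(zip(nums[0::2], nums[1::2]))


def check_alto(xml_text):
    """Return (number of TextLine elements, list of problems found)."""
    root = ET.fromstring(xml_text)
    page = root.find(".//a:Page", ALTO_NS)
    if page is None:
        return 0, ["no Page element found"]
    width, height = int(page.get("WIDTH")), int(page.get("HEIGHT"))

    problems = []
    lines = root.findall(".//a:TextLine", ALTO_NS)
    for line in lines:
        line_id = line.get("ID")
        baseline = line.get("BASELINE")
        if not baseline:
            problems.append(f"{line_id}: missing BASELINE")
            continue
        points = parse_baseline(baseline)
        if len(points) < 2:
            problems.append(f"{line_id}: baseline has fewer than 2 points")
        for x, y in points:
            # Flag any point that falls outside the declared page size.
            if not (0 <= x <= width and 0 <= y <= height):
                problems.append(
                    f"{line_id}: point ({x}, {y}) outside {width}x{height} page"
                )
    return len(lines), problems
```

Calling `check_alto(open(path, encoding="utf-8").read())` on each file in the `alto_xml` directory gives the line count and any out-of-bounds baseline points per page.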

Then I'm using:

ketos -vvv segtrain -i /home/ubuntu/models/blla.mlmodel -f xml /home/ubuntu/data/20250204_line_detection/alto_xml/*.xml -cl -o /home/ubuntu/models/ft_kraken -d cuda:0

Training appears to run normally, but mean_iu stays around 0.25 and even decreases:

[02/04/25 15:54:37] INFO validation run: accuracy 0.9899430871009827 mean_acc 0.9899430871009827 mean_iu 0.2532690465450287 freq_iu 0.96146160364151
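To follow how mean_iu evolves across epochs, the validation lines can be pulled out of a saved training log. This is a rough sketch that assumes every validation line has the same shape as the single line quoted above; the function name `parse_validation_line` is hypothetical and the format is inferred from that one line only.

```python
import re

# Matches "<metric name> <float>" pairs as they appear in the line above.
METRIC_RE = re.compile(
    r"(accuracy|mean_acc|mean_iu|freq_iu)\s+([0-9]*\.?[0-9]+(?:[eE][-+]?\d+)?)"
)


def parse_validation_line(line):
    """Return a dict of metrics for a validation line, or None otherwise."""
    if "validation run:" not in line:
        return None
    return {name: float(value) for name, value in METRIC_RE.findall(line)}


def mean_iu_history(log_lines):
    """Collect mean_iu from every validation line, in order of appearance."""
    history = []
    for line in log_lines:
        metrics = parse_validation_line(line)
        if metrics and "mean_iu" in metrics:
            history.append(metrics["mean_iu"])
    return history
```

Feeding the lines of a captured `ketos -vvv segtrain` log to `mean_iu_history` yields one value per validation run, which makes a flat or decreasing trend easy to spot.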

After a few epochs, when I run inference with the fine-tuned model, no lines are detected at all.

Also, I'm using only 30 images to test the training before annotating more and scaling up the process. Do you have any idea why this is not working?
